Master thesis : Toward functional and distributed R2RML processor

Saillez, Brieuc

Defense date: 4-sep-2023/5-sep-2023 • Permanent URL: `http://hdl.handle.net/2268.2/18377`

Details

Title:	Master thesis : Toward functional and distributed R2RML processor
Author:	Saillez, Brieuc
Defense date:	4-sep-2023/5-sep-2023
Supervisor(s):	Debruyne, Christophe
Jury member(s):	Louveaux, Quentin; Fontaine, Pascal
Language:	English
Number of pages:	58
Discipline(s):	Engineering, computing & technology > Computer science
Additional URL:	https://gitlab.uliege.be/Brieuc.Saillez/tfe
Institution(s):	Université de Liège, Liège, Belgium
Degree:	Master in computer science, specialized focus in "intelligent systems"
Faculty:	Master's theses of the Faculty of Applied Sciences

Abstract

[en] The Resource Description Framework (RDF) offers multiple advantages for data storage, so it can be worthwhile to transform data held in relational databases into RDF datasets. One prominent approach for generating RDF datasets from relational databases is the W3C RDB to RDF Mapping Language (R2RML). Existing R2RML processors face challenges related to computing time and memory consumption, particularly when dealing with large-scale relational databases. This master's thesis presents a functional and distributed solution for implementing an R2RML processor that runs on a cluster: a purely functional Scala implementation based on Apache Spark. The approach consists of four steps: parsing with an updated Java parser taken from an existing implementation, transforming the resulting Java objects into Scala abstract data types (ADTs), a preprocessing step that rewrites referencing object maps into new triples maps, and generating and writing the data. The work is distributed at the granularity of relational data rows. For modestly sized databases, the solution is slow because of the overhead introduced by Apache Spark. When run on a cluster, it generates data quickly without excessive memory consumption. On very large datasets, however, it suffers from memory problems, which can be solved.
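The preprocessing step that rewrites referencing object maps into new triples maps can be illustrated with a small, pure Scala sketch. Every type and function below (`TriplesMap`, `RefObject`, `RefRewrite.rewrite`, `Demo`, and so on) is hypothetical and much simpler than the thesis's actual ADT. The sketch builds the "effective SQL query" and "joint SQL query" as defined in the W3C R2RML Recommendation, then moves each referencing object map into its own triples map whose object is the parent's subject map. It assumes a toy EMP/DEPT mapping, does not use Apache Spark, does not generate RDF, and ignores graph maps, term types, and column-name clashes between child and parent columns.

```scala
// Hypothetical, simplified R2RML model; not the thesis's actual ADT.
sealed trait LogicalTable
final case class BaseTable(name: String) extends LogicalTable // rr:tableName
final case class SqlView(query: String) extends LogicalTable  // rr:sqlQuery

sealed trait TermMap
final case class Constant(value: String) extends TermMap   // rr:constant
final case class Column(name: String) extends TermMap      // rr:column
final case class Template(pattern: String) extends TermMap // rr:template

final case class JoinCondition(child: String, parent: String)

sealed trait ObjectMap
final case class PlainObject(term: TermMap) extends ObjectMap
// Referencing object map: points to a parent triples map by identifier.
final case class RefObject(parentId: String, joins: List[JoinCondition]) extends ObjectMap

final case class PredicateObjectMap(predicates: List[TermMap], objects: List[ObjectMap])

final case class TriplesMap(id: String, table: LogicalTable, subject: TermMap, poms: List[PredicateObjectMap])

object RefRewrite {
  // Effective SQL query of a logical table, following the R2RML Recommendation.
  def effectiveQuery(t: LogicalTable): String = t match {
    case BaseTable(name) => s"SELECT * FROM $name"
    case SqlView(query)  => query
  }

  // Joint SQL query of a referencing object map, following the R2RML Recommendation.
  // Without join conditions, child and parent are required to share the same logical table.
  def jointQuery(child: TriplesMap, parent: TriplesMap, joins: List[JoinCondition]): String =
    if (joins.isEmpty) s"SELECT * FROM (${effectiveQuery(child.table)}) AS tmp"
    else {
      val conds = joins.map(j => s"child.${j.child}=parent.${j.parent}").mkString(" AND ")
      s"SELECT * FROM (${effectiveQuery(child.table)}) AS child, " +
        s"(${effectiveQuery(parent.table)}) AS parent WHERE $conds"
    }

  // Pure rewrite: each RefObject becomes a new triples map over the joint query, keeping the
  // child's subject map and using the parent's subject map as a plain object.
  // Simplification: clashing column names between child and parent under SELECT * are ignored.
  def rewrite(maps: List[TriplesMap]): List[TriplesMap] = {
    val byId = maps.map(m => m.id -> m).toMap
    maps.flatMap { m =>
      // Original map keeps only its non-referencing object maps.
      val kept = m.poms
        .map(p => p.copy(objects = p.objects.collect { case o: PlainObject => o }))
        .filter(_.objects.nonEmpty)
      val refs = m.poms.flatMap(p => p.objects.collect { case r: RefObject => (p.predicates, r) })
      val generated = refs.zipWithIndex.map { case ((preds, RefObject(parentId, joins)), i) =>
        val parent = byId(parentId) // throws NoSuchElementException for an unknown parent id
        TriplesMap(
          id = s"${m.id}_ref$i",
          table = SqlView(jointQuery(m, parent, joins)),
          subject = m.subject,
          poms = List(PredicateObjectMap(preds, List(PlainObject(parent.subject))))
        )
      }
      m.copy(poms = kept) :: generated
    }
  }
}

object Demo {
  // Toy mapping in the style of the EMP/DEPT example of the R2RML Recommendation.
  val dept = TriplesMap("Dept", BaseTable("DEPT"), Template("http://example.com/dept/{DEPTNO}"), Nil)
  val emp = TriplesMap(
    "Emp",
    BaseTable("EMP"),
    Template("http://example.com/emp/{EMPNO}"),
    List(
      PredicateObjectMap(List(Constant("http://example.com/ns#name")), List(PlainObject(Column("ENAME")))),
      PredicateObjectMap(
        List(Constant("http://example.com/ns#department")),
        List(RefObject("Dept", List(JoinCondition("DEPTNO", "DEPTNO"))))
      )
    )
  )

  def main(args: Array[String]): Unit =
    RefRewrite.rewrite(List(emp, dept)).foreach(println)
}
```

Running `Demo.main` prints three maps: `Emp` keeping only its `name` mapping, a new `Emp_ref0` over the joint query of `EMP` and `DEPT`, and `Dept` unchanged. Because the rewrite is a pure function over immutable case classes, it could plausibly run once before any rows are read, after which each triples map depends on a single logical table, which would suit a row-based distribution of the work.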

File(s)

Document(s)

TFE.pdf
Description:
Size: 708.78 kB
Format: Adobe PDF

TFE_Abstract.pdf
Description:
Size: 61.62 kB
Format: Adobe PDF

All documents available on MatheO are protected by copyright and subject to the usual rules of fair use.
The University of Liège does not guarantee the scientific quality of these student works or the accuracy of all the information they contain.
