In this paper, we reported on preliminary tests of adding guided link traversal support to the Comunica query engine using a rule-based reachability approach.
Our preliminary results show that, compared to predicate-based reachability, our rule-based reachability criterion can significantly reduce the execution time of queries aligned with the constraints of the hypermedia descriptions,
opening the possibility of faster and more versatile traversal-based query execution over fragmented RDF documents.
Our experiment also highlights that the size of the internal quad store might have more impact on performance than noted in previous studies.
In future work, we will perform more exhaustive evaluations of other types of domain-oriented fragmentation strategies, such as string-based and geospatial fragmentations,
and investigate how to generalize our approach to support more expressive reasoning for online source selection during traversal-based queries.
Furthermore, our results show that there is still room for optimization, for instance by pruning quads that can no longer contribute to a query result from the internal quad store while link traversal is ongoing.

We refer to our approach as a rule-based reachability criterion.
Our approach builds upon the concept of structural assumptions~\cite{taelman2023} to exploit the structural properties of TREE-annotated datasets.
We therefore interpret the hypermedia descriptions of constraints in TREE fragments as boolean expressions $E$ ($?t \geq \text{2022-01-03T09:47:59.000000}$ in Figure~\ref{lst:system}).
Upon discovery of a document, the query engine gathers the relevant triples to form the boolean expression of the constraint on the data of reachable fragments.
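To make this step concrete, the Python sketch below shows how the triples of a single TREE relation could be turned into an atomic boolean expression $E$.
It is an illustration under stated assumptions, not the Comunica implementation: the relation is assumed to be already parsed into a dictionary keyed by predicate (a hypothetical layout), only the comparison relation types of the TREE specification are covered, and values are assumed to be ISO~8601 timestamps.
The function name \texttt{relation\_to\_expression} is hypothetical.

\begin{lstlisting}[language=Python]
from datetime import datetime

# Comparison relation types of the TREE specification and the operator
# each one denotes (subset only; string and geospatial relations omitted).
TREE_RELATION_OPERATORS = {
    "tree:GreaterThanRelation": ">",
    "tree:GreaterThanOrEqualToRelation": ">=",
    "tree:LessThanRelation": "<",
    "tree:LessThanOrEqualToRelation": "<=",
    "tree:EqualToRelation": "=",
}


def relation_to_expression(relation, variable):
    """Build the boolean expression E of one TREE relation (hypothetical helper).

    `relation` holds the relation's triples in a dict keyed by predicate;
    `variable` is the query variable bound to the relation's tree:path,
    supplied by the caller. Returns E as a symbolic
    (variable, operator, value) triple plus the IRI of the next fragment.
    """
    op = TREE_RELATION_OPERATORS[relation["rdf:type"]]
    # Assumption: tree:value is an ISO 8601 timestamp literal.
    value = datetime.fromisoformat(relation["tree:value"])
    return (variable, op, value), relation["tree:node"]


# Example relation mirroring the timestamp constraint discussed above.
relation = {
    "rdf:type": "tree:GreaterThanOrEqualToRelation",
    "tree:path": "saref:hasTimestamp",
    "tree:value": "2022-01-03T09:47:59.000000",
    "tree:node": "ex:nextNode",
}
expression, next_node = relation_to_expression(relation, "?t")
\end{lstlisting}

Keeping $E$ symbolic, rather than compiling it into an executable predicate, lets the source selection component combine it with the pushed-down filter $F$ before deciding whether a link is worth following.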
Once this expression is parsed, the filter expression $F$ of the SPARQL query is \textit{pushed down} into the engine's source selection component.
The source selection component can be formalized as a reachability criterion~\sepfootnote{sf:reachabilityCriterion}
hold true given that $x$ is the variable targeted by $E_i$ and $i$ is the link towards the next fragment (\texttt{ex:nextNode} from \texttt{ex:node tree:node ex:nextNode} in Figure~\ref{lst:system}).
The variable targeted by $E$ is the object of the query triple pattern whose predicate is the value \texttt{?target} of the triple \texttt{?s tree:path ?target} that defines the fragmentation path (\texttt{saref:hasTimestamp} in Figure~\ref{lst:system}).
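The definition of the targeted variable can be illustrated with the following sketch, which scans the triple patterns of the query for one whose predicate equals the object of \texttt{?s tree:path ?target}.
The list-of-tuples representation of the query, the predicate \texttt{ex:hasValue}, and the function name \texttt{targeted\_variable} are hypothetical; the sketch only returns variables (patterns with a constant object are skipped) and does not handle complex \texttt{tree:path} values such as SHACL property paths.

\begin{lstlisting}[language=Python]
def targeted_variable(query_patterns, tree_path):
    """Return the query variable constrained by a TREE relation, or None.

    `query_patterns` is a hypothetical list of (subject, predicate, object)
    triple patterns from the SPARQL query; `tree_path` is the object of the
    `?s tree:path ?target` triple. The targeted variable is the object of a
    pattern whose predicate equals that path.
    """
    for _subject, predicate, obj in query_patterns:
        # Only variables (written with a leading '?') can be targeted.
        if predicate == tree_path and obj.startswith("?"):
            return obj
    # The relation does not constrain any variable of this query.
    return None


patterns = [
    ("?s", "saref:hasTimestamp", "?t"),
    ("?s", "ex:hasValue", "?v"),
]
variable = targeted_variable(patterns, "saref:hasTimestamp")
\end{lstlisting}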
If the criterion is satisfied, the IRI of the next fragment is added to the link queue; otherwise, the IRI is pruned.
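The reachability decision can be sketched as follows, assuming that the criterion amounts to checking whether $E \land F$ is satisfiable and that $E$ and the pushed-down filter $F$ are conjunctions of comparisons on the same variable.
Satisfiability is checked here by intersecting intervals, a simple stand-in for a general boolean solver; timestamps are treated as a dense domain, and disjunctions, negations, and non-comparison relations are out of scope.
All function names are hypothetical.

\begin{lstlisting}[language=Python]
from datetime import datetime


def to_interval(op, value):
    """Map the constraint `?x op value` to (low, low_closed, high, high_closed).
    A bound of None means the interval is unbounded on that side."""
    if op == ">":
        return (value, False, None, False)
    if op == ">=":
        return (value, True, None, False)
    if op == "<":
        return (None, False, value, False)
    if op == "<=":
        return (None, False, value, True)
    if op == "=":
        return (value, True, value, True)
    raise ValueError(f"unsupported operator: {op}")


def intersect(a, b):
    """Intersect two intervals, keeping the tighter bound on each side."""
    low, low_closed, high, high_closed = a
    b_low, b_low_closed, b_high, b_high_closed = b
    # Larger lower bound wins; on a tie, an open bound is the tighter one.
    if b_low is not None and (low is None or b_low > low
                              or (b_low == low and not b_low_closed)):
        low, low_closed = b_low, b_low_closed
    # Smaller upper bound wins; on a tie, an open bound is the tighter one.
    if b_high is not None and (high is None or b_high < high
                               or (b_high == high and not b_high_closed)):
        high, high_closed = b_high, b_high_closed
    return (low, low_closed, high, high_closed)


def is_satisfiable(constraints):
    """True if some value satisfies every (op, value) constraint at once."""
    interval = (None, False, None, False)
    for op, value in constraints:
        interval = intersect(interval, to_interval(op, value))
    low, low_closed, high, high_closed = interval
    if low is None or high is None or low < high:
        # Dense-domain assumption: a non-degenerate interval is non-empty.
        return True
    return low == high and low_closed and high_closed


def follow_link(link_queue, next_node, relation_expr, filter_exprs):
    """Enqueue next_node if E AND F is satisfiable; otherwise prune it."""
    if is_satisfiable([relation_expr, *filter_exprs]):
        link_queue.append(next_node)
        return True
    return False


link_queue = []  # a plain list stands in for the engine's link queue
E = (">=", datetime(2022, 1, 3, 9, 47, 59))
pruned = not follow_link(link_queue, "ex:nextNode", E, [("<", datetime(2022, 1, 1))])
followed = follow_link(link_queue, "ex:nextNode", E, [(">", datetime(2022, 1, 5))])
\end{lstlisting}

With $E \equiv ?t \geq \text{2022-01-03T09:47:59}$, a filter requiring $?t$ to precede 1 January 2022 makes $E \land F$ unsatisfiable, so \texttt{ex:nextNode} is pruned, whereas a filter requiring $?t$ to follow 5 January 2022 keeps the link.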
The process is schematized in Figure~\ref{fig:process}.

For Q3, the reduction is 33\%; this smaller performance gain might be caused by the six-fold increase in HTTP requests.
This is an interesting observation, because a reduction in HTTP requests does not translate into a reduction in execution time.
Previous research has proposed that inefficient query plans might be the bottleneck of some queries in structured environments~\cite{taelman2023,eschauzier_quweda_2023}.
However, our results suggest that the size of the internal quad store might have a larger impact on performance than noted in previous studies.
Since large-scale guided link traversal over the web will result in the acquisition of a large number of triples, an interesting direction for future research is to find ways to remove from the internal quad store the quads that can no longer lead to a query result.
Query Q4 could not be answered with any setup, because it requires processing a larger number of fragments than the other queries.

For example, in the case of periodic measurements of sensor data, a fragmentation can be made on the publication date of each data entity.
A fragment can be considered an RDF document published on a server.
TREE aims to describe dataset fragmentations in ways that enable clients to easily fetch query-relevant subsets.
The data within a fragment are bound by constraints expressed through hypermedia descriptions~\cite{thomasFieldingPhdThesis}.
Each fragment contains relations to other fragments, and those relations carry the constraints on the data of every fragment reachable through them.
In this paper, we refer to those constraints as domain-specific expressions.
They can be expressions such as $?t > \text{2022-09-01T00:00:00.000000} \implies \text{ex:afterFirstSeptember}$,
given that $?t$ is the publication date of the sensor data and the right-hand side of the implication is the location of the data satisfying the constraint.
to define a mechanism of traversal centered around rules.

In this paper, we propose to use a boolean solver as the main link-pruning mechanism of a reachability criterion for traversing TREE documents.
The logical operators are defined by the \href{https://w3id.org/tree/specification/}{TREE specification}.~\sepfootnote{sf:treeSpec}
As a concrete use case, we consider the publication of (historical) sensor data.
An example query is presented in Figure~\ref{lst:system}, along with the triples, expressed using the TREE specification, that represent the link between two documents.

The constraint describes publication times ($?t$) where $?t \geq \text{2022-01-03T09:47:59.000000}$.}