Variable SPARQL Service Locations

by Mr.Anderson

The SPARQL grammar, in particular the ServiceGraphPattern production, indicates that it should be possible to specify either an IRI or a variable as a service location.[1] The W3C’s “SPARQL 1.1 Federated Query” recommendation[2] does not, however, define an evaluation semantics for a service variable. It indicates only that “the service call for any solution depends on that variable’s binding in that solution”.

While this suggestion seems sensible, it neglects that, without further specification, the only semantics available to define the scope of the variable and, as a consequence, “that solution”, is the standard bottom-up reduction which applies to SPARQL evaluation. That reduction leaves the variable, at the point of reference, at best unbound and likely statically undefined.

Dydra had chosen to avoid this deficiency of the specification by proscribing variable service locations. As part of our work to make historical repository revisions available as first-class datasets and to integrate access to revision metadata stored in provenance records, we recognize the value of computing revision identifiers for use as internal federation locations.

In General

Given this uncertain situation, consider the following example:

PREFIX foaf: <>
SELECT ?name
WHERE  {
   FILTER ( ?S != "lee" )
   ?P foaf:givenName ?G ;
      foaf:surname ?S .
   BIND(CONCAT('', ?G, '/', ?S) AS ?location)
   SERVICE ?location {
    ?P foaf:mbox ?mbox ;
       foaf:topic_interest ?interest .
   }
}

Given that query, the validator[3] follows the SPARQL recommendation and indicates the following algebra structure:

 1 (base <http://example/base/>
 2   (prefix ((foaf: <>))
 3     (project (?name)
 4       (filter (!= ?S "lee")
 5         (join
 6           (extend ((?location (concat "" ?G "/" ?S)))
 7             (bgp
 8               (triple ?P foaf:givenName ?G)
 9               (triple ?P foaf:surname ?S)
10             ))
11           (service ?location
12             (bgp
13               (triple ?P foaf:mbox ?mbox)
14               (triple ?P foaf:topic_interest ?interest)
15             )))))))

Based on this algebra, what is the scope of the ?location variable? What is the scope of the ?P variable? Of ?S?

If, as described by the SPARQL recommendation[4], by analogy to a sub-select, the scope of ?location on line 11 is the lexical contour of lines 11 to 14, then the variable is, at the point of reference, not only unbound but statically undefined. To follow the suggestion in the SPARQL federation recommendation[2], the scope of ?location must instead be in some way analogous to that of ?S in the BIND and FILTER clauses above. One way to accomplish that is to apply the same rewriting operation as is employed for those two clauses and to supply the respective solution field to an extended SERVICE operator as an additional argument. For example:

(base <http://example/base/>
  (prefix ((foaf: <>))
    (project (?name)
      (filter (!= ?S "lee")
        (service ?location
          (bgp
            (triple ?P foaf:mbox ?mbox)
            (triple ?P foaf:topic_interest ?interest))
          (extend ((?location (concat "" ?G "/" ?S)))
            (bgp
              (triple ?P foaf:givenName ?G)
              (triple ?P foaf:surname ?S))))))))

That is, when the location is a variable, the SERVICE operator should accept an optional third argument. Its value must be a solution field, which acts as the source for variable bindings. For each solution in that field, the respective location binding serves as the service IRI, and the solution field returned by the service is joined with the respective input solution to produce the result of the SERVICE request. Where no such source clause is present, the effect is to replace it with a unit table, and any binding must be supplied as a request URL query argument.
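The evaluation rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dydra's implementation: solutions are modelled as plain dictionaries, and the names eval_service, call_service, request_bindings and fake_service are hypothetical stand-ins for the operator, the remote request, the bindings supplied with the request URL, and a test endpoint.

```python
from typing import Callable, Dict, Iterable, List, Optional

Solution = Dict[str, str]  # variable name -> bound term


def compatible(a: Solution, b: Solution) -> bool:
    """Two solutions are compatible when they agree on every shared variable."""
    return all(b[var] == term for var, term in a.items() if var in b)


def eval_service(
    location_var: str,
    call_service: Callable[[str], Iterable[Solution]],
    source: Optional[Iterable[Solution]] = None,
    request_bindings: Optional[Solution] = None,
) -> List[Solution]:
    """Evaluate SERVICE ?location_var with an optional source solution field."""
    if source is None:
        # No source clause: use a unit table (one solution), seeded only with
        # bindings supplied by the request (hypothetical parameter).
        source = [dict(request_bindings or {})]
    results: List[Solution] = []
    for solution in source:
        location = solution.get(location_var)
        if location is None:
            continue  # unbound in this solution: the service is not called
        for remote in call_service(location):
            if compatible(solution, remote):
                # Join the remote solution with the input solution.
                results.append({**solution, **remote})
    return results


def fake_service(iri: str) -> List[Solution]:
    """Hypothetical endpoint lookup used only for this illustration."""
    data = {
        "http://example/alice/smith": [
            {"P": "p1", "mbox": "mailto:alice@example"},
            {"P": "p9", "mbox": "mailto:other@example"},
        ]
    }
    return data.get(iri, [])


source = [
    {"P": "p1", "G": "alice", "S": "smith",
     "location": "http://example/alice/smith"},
    {"P": "p2", "G": "bob", "S": "jones"},  # ?location is unbound here
]
rows = eval_service("location", fake_service, source)
```

In this sketch the second source solution triggers no request at all, and the remote solution whose ?P disagrees with the input solution is dropped by the join.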

This treatment defines the algebra for SERVICE by analogy to both BIND and a sub-select, and it follows the standard semantics for variable scope and binding while distinguishing between the two. It allows a straightforward implementation, for which the notion of “boundedness”[5] suffices. It obviates the need for additional notions such as “safety” or “strong boundedness”, and it avoids satisfiability issues.

If the service clause is within the scope of its service variable, then the variable is defined. That is to be determined by static analysis of the expression. If the variable is undefined, the service clause is statically invalid. The “safety” notion reduces to the determination, at runtime, as to whether the variable is bound in a solution. If it is bound, the service operation executes; otherwise, it does not. For this, one does not need to determine whether the expression is, in general, satisfiable, but just whether the particular dataset satisfied it, which is established by evaluating the query.
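The two checks can be separated as in the following sketch. It assumes a toy encoding of the algebra as nested tuples, with the source clause as the optional last element of a service form. The names defined_vars, service_is_valid and should_call are hypothetical, and bindings supplied through the request URL are not modelled.

```python
def defined_vars(op) -> set:
    """Variables that an algebra form can define, found by static analysis."""
    kind = op[0]
    if kind == "bgp":
        return {term for triple in op[1] for term in triple
                if term.startswith("?")}
    if kind == "extend":
        _, var, inner = op
        return {var} | defined_vars(inner)
    if kind == "join":
        return defined_vars(op[1]) | defined_vars(op[2])
    if kind == "service":
        _, _var, pattern, source = op
        found = defined_vars(pattern)
        if source is not None:
            found |= defined_vars(source)
        return found
    raise ValueError(f"unknown operator: {kind}")


def service_is_valid(op) -> bool:
    """Static check: the service variable must be defined by the source clause."""
    _, var, _pattern, source = op
    return source is not None and var in defined_vars(source)


def should_call(var: str, solution: dict) -> bool:
    """Runtime check: call the service only where the variable is bound."""
    return var in solution


pattern = ("bgp", [("?P", "foaf:mbox", "?mbox"),
                   ("?P", "foaf:topic_interest", "?interest")])
source = ("extend", "?location",
          ("bgp", [("?P", "foaf:givenName", "?G"),
                   ("?P", "foaf:surname", "?S")]))

rewritten = ("service", "?location", pattern, source)  # extended form
original = ("service", "?location", pattern, None)     # no source clause

valid_rewritten = service_is_valid(rewritten)
valid_original = service_is_valid(original)
```

Neither function reasons about satisfiability: the static check inspects only the shape of the expression, and the runtime check inspects only the solution at hand.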

An Example

As a simple demonstration of how one might “mash up” related datasets, consider a common question that one might pose in the course of ontology curation: at which revision was a term deprecated?

If one has available provenance records for the dataset, a simple query, such as:

SELECT ?title ?uuid
WHERE {
  ?revision a <urn:dydra:Revision> .
  { GRAPH ?revision {
    ?revision dc:title ?title .
    ?revision doap:revision ?uuid
  } }
}

can enumerate the revisions:

title  uuid
0.91   b3f29119-f775-eb4b-bf32-624047b9e5f8
0.95   06b8189e-837e-d544-96fc-ed3e6150cb1b
0.97   22a3a384-9b0d-094a-9abd-5ee011753a56
0.99   9771658b-e649-644b-9d86-9137a28b5f10
1.0a   9586b1e2-7349-984f-8d9c-d8d64b5ffe5c
1.0b   c3fff81d-26b6-854e-bd8b-9df14df4fd43
1.0c   72077fba-92bb-ee49-8994-851b366af215
1.0d   0e449ba8-a385-6042-91da-acdcb73bcf19
1.0e   d4cd4f1e-7bf3-6f4c-bd65-7d8e6624e280
1.0f   51b99e45-6c66-9f4e-990d-d1d820d1068d
1.1    1599b409-6c14-a743-a2e5-49586ed51b22
1.2    0492e706-ac17-3048-b5b1-7a0642dd4985
1.4    80b21344-a30a-0749-8a81-d45d36bec341
1.5    3ff20ec5-dffb-5d43-8e05-d58586ca4e9a
1.6    8de0056e-e6a3-ca42-8f46-79237caf8e2c
1.7    c930063b-2a19-ba46-94c5-b82a3115636e
1.8    e520ddb7-3064-b747-b456-384f1351e867
1.9    aa8692dd-91a0-944e-ac5d-a9988c408062
1.91   8662c279-f9a6-7f48-89d6-3522b260e49f
1.92   3d8971ca-5a8f-8747-95d2-1b9019163aad
1.93   13bb6cfc-63f8-e144-a791-4d12e607788e
2.0    0498477c-d3d7-b244-a6fb-1935a91afe8f
2.1    444216bf-88d1-0a49-8dc6-e62f92d10d2e

Given these, a federated query can combine the provenance information with the dataset revisions to locate the changes:

PREFIX sd: <>
SELECT ?versionId ?title ?term ?comment
  { SERVICE <http://localhost/schema-org-test/provenance> {
      SELECT ?location ?versionId ?endpoint ?title
      WHERE {
        ?revision a <urn:dydra:Revision> .
        { GRAPH ?revision {
            ?revision sd:endpoint ?endpoint .
            ?revision dc:title ?title .
            ?revision doap:revision ?versionId .
        } }
      }
    }
    SERVICE ?endpoint {
      ?term rdfs:comment ?comment .
      FILTER (regex(?comment, '.*deprecat.*'))
    }
  }


versionId title term comment
0498477c-d3d7-b244-a6fb-1935a91afe8f 2.0 <> This property is deprecated, alongside the UserInteraction types on which it depended.
444216bf-88d1-0a49-8dc6-e62f92d10d2e 2.1 <> This property is deprecated, alongside the UserInteraction types on which it depended.

[5] "Federating queries in SPARQL 1.1: Syntax, semantics and evaluation", C. Buil-Aranda,