Repository Revision History

by Arto

We have now enabled repository revisioning for all Dydra evaluation users on the dydra.com cloud service. Customers with dedicated and on-site servers will receive this feature in their next scheduled system upgrade.

Repository revisioning means that your Dydra repositories maintain revision history that permits read-only access to snapshots of the previous contents of the repository. Every transaction committed (via a SPARQL request or a file import job) on a repository will create a new revision snapshot, enabling you to download and query both the current and previous state of the repository. Further, you can also download the difference between successive revisions to obtain the sets of statements removed and added in the respective transaction.

schema-org-test/schema-org history

Repository revisioning is currently supported both in the user interface (a work in progress) and, more importantly, at the level of the various application programming interfaces (APIs) provided by the service.

In the user interface, repository owners will now see the current repository revision identifier prominently displayed in the sidebar under the overall repository statistics. As depicted here, Dydra repository revisions are identified by universally unique identifiers (UUIDs).

This revision UUID is updated on each committed transaction, and is also output as the response entity tag (the HTTP ETag header) in protocol responses. This affords Dydra applications access to unprecedentedly precise and reliable cache identifiers, the effect of which was already praised in a recent survey of the state of the art.
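
To make the caching idea concrete, here is a minimal sketch of a client-side cache keyed on the response entity tag. It relies only on standard HTTP semantics (the If-None-Match request header and the 304 Not Modified status). The class name RevisionCache and its methods are hypothetical, and whether a given endpoint answers conditional requests with 304 is an assumption you should verify against the service.

```python
from typing import Dict, Optional, Tuple


class RevisionCache:
    """Hypothetical in-memory cache that treats the ETag as an opaque revision key."""

    def __init__(self) -> None:
        # Maps a request URL to the (etag, body) pair of the last full response.
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def conditional_headers(self, url: str) -> Dict[str, str]:
        """Return headers that ask the server to skip the body if nothing changed."""
        entry = self._entries.get(url)
        return {"If-None-Match": entry[0]} if entry else {}

    def update(self, url: str, status: int, etag: Optional[str], body: bytes) -> bytes:
        """Record a response and return the body the caller should use."""
        if status == 304 and url in self._entries:
            # Not Modified: the repository revision is unchanged, reuse the cached body.
            return self._entries[url][1]
        if status == 200 and etag:
            # A new revision was committed (or this is the first fetch): store it.
            self._entries[url] = (etag, body)
        return body
```

An application would send the headers from conditional_headers() with each request and pass the resulting status, ETag and body to update(). Because the entity tag changes on every committed transaction, a cached body is reused only while the repository is at the same revision.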

On the API level, repositories' SPARQL Protocol and SPARQL Graph Store HTTP Protocol endpoints support a new ?revision=UUID URL parameter that can be used to address a specific previous revision of a repository by specifying its UUID string in this parameter. We are working on updating the API documentation to reflect all this.
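
As a rough illustration, the following sketch builds a request URL that addresses a specific revision. Only the revision parameter name comes from the description above. The helper name revision_url and the endpoint path in the usage note are assumptions made for the example, so check the API documentation for the actual endpoint locations.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def revision_url(endpoint: str, revision: str) -> str:
    """Return `endpoint` with a revision=UUID query parameter set (hypothetical helper)."""
    # Validate and normalise the UUID string; raises ValueError if it is malformed.
    rev = str(uuid.UUID(revision))
    parts = urlsplit(endpoint)
    # Keep any existing query parameters, but replace an earlier revision value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "revision"]
    query.append(("revision", rev))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, calling revision_url("https://dydra.com/schema-org-test/schema-org/sparql", "879ccfcc-52c2-8049-8d43-23772758b8c4") yields the same URL with ?revision=879ccfcc-52c2-8049-8d43-23772758b8c4 appended. The /sparql path here is an assumed endpoint location, not one confirmed by this post.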

Note that in the user interface, the repository’s current revision identifier serves as a link to the repository’s revision history page. For example, the revision history for the public schema-org-test/schema-org demo repository hosted at Dydra looks like this:

schema-org-test/schema-org history

This repository was created for demonstration purposes from the versioned Schema.org data dumps available at LOV by additively loading each data dump in a batch script. Each successive transaction loaded a subsequent published version of the ontology; existing statements remained unmodified and new statements were added to the repository, visible from that revision onwards.

As mentioned and depicted above, each repository revision is presently publicly identified only by a UUID (which serves as a link to download a snapshot of the repository contents at that revision in N-Quads format). We are working on permitting users to store custom provenance metadata that will enable tagging specific revisions with arbitrary string labels. Our previous blog post on variable SPARQL service locations includes an example of such provenance records.

Similarly, users will soon be able to provide a validity timestamp for each revision, helping us to present e.g. ontology history more intelligently than just by the transaction time shown currently. We will elaborate more on this bitemporal aspect of Dydra’s revision history in an upcoming blog post.

Note that on the revision history screen, the revision statistics to the right of the UUID not only indicate how many statements were removed and added in that revision, but also form a clickable link for downloading a patch/diff file of that revision in N-Quads format.

By downloading the diff for schema.org version 1.8 (revision 879ccfcc-52c2-8049-8d43-23772758b8c4 in this repository) we can see that this revision of the ontology added 11 new statements:

--- schema-org-test/schema-org.nq   (revision 133ac8b1-83b7-ef43-a8ea-c948511402ac)
+++ schema-org-test/schema-org.nq   (revision 879ccfcc-52c2-8049-8d43-23772758b8c4)
+<http://schema.org/encodesCreativeWork> <http://www.w3.org/2000/01/rdf-schema#comment> "The CreativeWork encoded by this media object." .
+<http://schema.org/isPartOf> <http://www.w3.org/2000/01/rdf-schema#comment> "Indicates a CreativeWork that this CreativeWork is (in some sense) part of." .
+<http://schema.org/isPartOf> <http://schema.org/domainIncludes> <http://schema.org/CreativeWork> .
+<http://schema.org/isPartOf> <http://schema.org/rangeIncludes> <http://schema.org/CreativeWork> .
+<http://schema.org/encodings> <http://www.w3.org/2000/01/rdf-schema#comment> "A media object that encodes this CreativeWork (legacy spelling; see singular form, encoding)." .
+<http://schema.org/associatedMedia> <http://www.w3.org/2000/01/rdf-schema#comment> "A media object that encodes this CreativeWork. This property is a synonym for encoding." .
+<http://schema.org/encoding> <http://www.w3.org/2000/01/rdf-schema#comment> "A media object that encodes this CreativeWork. This property is a synonym for associatedMedia." .
+<http://schema.org/WebSite> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2000/01/rdf-schema#Class> .
+<http://schema.org/WebSite> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://schema.org/CreativeWork> .
+<http://schema.org/WebSite> <http://www.w3.org/2000/01/rdf-schema#comment> "A WebSite is a set of related web pages and other items typically served from a single web domain and accessible via URLs." .
+<http://schema.org/WebSite> <http://www.w3.org/2000/01/rdf-schema#label> "WebSite" .
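
If you want to process such a diff programmatically, a short sketch along the following lines may help. It assumes the layout shown above: two header lines beginning with "--- " and "+++ ", followed by one N-Quads statement per line prefixed with "+" (added) or "-" (removed). The function and type names are hypothetical and not part of any Dydra client library.

```python
from typing import List, NamedTuple


class DiffSummary(NamedTuple):
    removed: List[str]
    added: List[str]


def summarize_diff(text: str) -> DiffSummary:
    """Split a revision diff into its removed and added statements."""
    removed: List[str] = []
    added: List[str] = []
    for line in text.splitlines():
        # Header lines name the two revisions being compared; skip them.
        # A statement starts with "<" or "_:", so it cannot be mistaken for a header.
        if line.startswith(("--- ", "+++ ")):
            continue
        if line.startswith("+"):
            added.append(line[1:])
        elif line.startswith("-"):
            removed.append(line[1:])
    return DiffSummary(removed, added)
```

Applied to the diff above, len(summarize_diff(text).added) would give the number of statements added in that revision, matching the statistics shown on the revision history page.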

Speaking for our whole team, we are excited to roll out this major feature publicly at long last. Although we haven’t previously exposed revisioning to users, Dydra was designed and built from the get-go on a transactional, append-only MVCC storage substrate that implements first-class repository revisions. Indeed, the Dydra concept itself originated directly from our previous research into bitemporal databases and our appreciation for content-addressable storage, as implemented in particular in Plan 9 and the Git revision control system.

We should note that we haven’t as yet settled on an explicit repository revision retention policy for evaluation accounts. That is, while we obviously safeguard the current snapshots of all repositories hosted at dydra.com, we do intend to occasionally garbage-collect superfluous revision history that nobody is making use of. However, in case you do need to retain permanent history for your repository in order to e.g. host an ontology with Dydra, we would be happy to accommodate such a request.

If you have thoughts or requests in this regard, we’d welcome any feedback you care to provide. We can be best reached at info@dydra.com and @dydradata.
