SP2B Benchmark Results

by Arto

SP2Bench (hereafter SP2B) is a challenging and comprehensive SPARQL performance benchmark suite designed to cover the most important SPARQL constructs and operator constellations using a broad range of RDF data access patterns. While other benchmark suites for RDF data access exist (BSBM and LUBM are two well-known examples), we’ve found SP2B to be, overall, the most helpful metric by which to track and evaluate performance gains in Dydra’s query processing. Not coincidentally, SP2B’s main author also wrote one of the definitive works on SPARQL query optimization, which is exactly the kind of light reading we like to enjoy in our copious free time…

Results

We obtained the following results (note the log scale) on a standalone deployment of Dydra’s proprietary query engine, affectionately known as SPOCQ, on a machine with hardware specifications roughly comparable to those used for previously published SP2B results:

SP2B results overview

We benchmarked four input dataset sizes ranging from 10,000 to 1,000,000 triples (indicated as 10K, 50K, 250K, and 1M). Following the methodology of the comprehensive SP2B analysis published by Revelytix last year, we executed each query up to 25 times for a given dataset size, and averaged the response times for each query/dataset combination after discarding possible outliers (the several lowest and highest values) from the sample. The query timeout was defined as 1,800 seconds.
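The averaging step described above can be sketched as a trimmed mean. This is only an illustrative sketch, not the script we actually ran: the function name `trimmed_mean`, the `trim` parameter, and the sample timings are hypothetical, and the number of values discarded at each end is an assumption, since the text above says only "several".

```python
from statistics import mean


def trimmed_mean(times, trim=3):
    """Average response times after dropping the `trim` lowest and
    `trim` highest values (the trim count is an assumed parameter)."""
    if len(times) <= 2 * trim:
        raise ValueError("need more than 2 * trim samples")
    ordered = sorted(times)
    kept = ordered[trim:len(ordered) - trim]
    return mean(kept)


# Hypothetical response times (in seconds) for one query/dataset combination.
samples = [0.031, 0.029, 0.030, 0.250, 0.028, 0.032, 0.030, 0.001, 0.031, 0.029]
print(round(trimmed_mean(samples, trim=2), 4))
```

Sorting before slicing means a single slow run (such as a cold cache) or a single implausibly fast run does not skew the reported figure.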

The hardware used in these benchmarks was a dedicated Linux server with a dual-core AMD Athlon 64 X2 6000+ processor, 8 GB of DDR2 PC667 RAM, and 750 GB of SATA disk storage in a software-based RAID-1 configuration. This is a relatively underpowered server by present-day standards, given that e.g. my MacBook Pro laptop outperforms it on most tasks (including these benchmarks), but it does have the benefit of being roughly comparable both to the Amazon EC2 instance size used in the Revelytix analysis and to the hardware used in the original SP2B papers.

Comments

Several of the SP2B queries—in particular Q4, Q5a, Q6, and Q7—are tough on any SPARQL engine, and in published results it has been typical to see many implementations fail some of these already at dataset sizes of 50,000 to 250,000 triples. So far as we know, SPOCQ is the only native SPARQL implementation that correctly completes all SP2B queries on the 250,000 triple dataset within the specified timeout (1,800 seconds) and without returning bad data or experiencing other failures. Likewise, we correctly complete everything but Q5a (on which see a comment further below) on the 1,000,000 triple dataset as well.

Q1, Q3b, Q3c, Q10, and Q12c

As depicted above, SPOCQ’s execution time was constant on a number of queries (specifically Q1, Q3b, Q3c, Q10, and Q12c), regardless of dataset size. The execution times for these queries all measured in the 20-40 millisecond range, depending on the exact query. Of the SP2B queries, these are the most similar to the types of day-to-day queries we actually observe being executed on the Dydra platform, and they showcase the very efficient indexing we do in SPOCQ’s storage substrate.

A Detailed Look at Q7

The SP2B query Q7 is designed to test a SPARQL engine’s ability to handle nested closed-world negation (CWN). Previously-published benchmark results indicate that along with Q4, this query has proved the most difficult for the majority of SPARQL implementations, with very few managing to complete it on datasets larger than 10,000 to 50,000 triples. We’re happy to report that our SPARQL implementation is among those select few:

SP2B Q7 comparison

The above chart combines our results with Revelytix’s comprehensive SP2B benchmark results. The depicted 1,800+ second bars here indicate either a timeout or a failure to return correct results (see pp. 38-39 of the Revelytix analysis for more details).

None of the implementations Revelytix benchmarked were able to complete SP2B Q7 on 1,000,000 triples within a one-hour timeout. SPOCQ completes the task in 80 seconds.

While we benchmarked on more or less comparable hardware and with comparable methodology, we do not claim that the comparison in the preceding chart is valid as such; take it with a grain of salt. It is indicative, however, of the amount of work we have put, and are putting, into preparing Dydra’s query engine for the demands we expect it to face as we exit our beta stage.

A Detailed Look at Q4

The SP2B query Q4 deals with long graph chains and produces a very large solution sequence, quadratic in the size of the input dataset. It is probably the most difficult of the SP2B queries, with few SPARQL implementations managing to finish it on input datasets larger than 50,000 triples.

SP2B Q4 comparison

As with Q7, this chart draws on data from the aforementioned Revelytix analysis (see pp. 32-33 of their report for details), and the same caveats certainly apply to this comparison. Nonetheless, of the implementations Revelytix benchmarked, only Oracle completed SP2B Q4 on 1,000,000 triples within a one-hour timeout. They reported a time of 522 seconds for Oracle. SPOCQ completes the task in 134 seconds.

A Special Note on Q5a

No existing SPARQL implementation does well on Q5a for larger datasets. We believe this is due to an oversight in the SP2B specification, where Q5a is defined as using a plain equality comparison in its FILTER condition, yet it is suggested that this makes for an implicit join that can be identified and optimized for. However, since joins in SPARQL are in fact defined in terms of a sameTerm comparison, such an optimization cannot be safely performed in the general case.

We have therefore also included results for an amended version of Q5a, named Q5a′:

SELECT DISTINCT ?person ?name
WHERE { ?article rdf:type bench:Article.
        ?article dc:creator ?person.
        ?inproc rdf:type bench:Inproceedings.
        ?inproc dc:creator ?person2.
        ?person foaf:name ?name.
        ?person2 foaf:name ?name2
        FILTER(sameTerm(?name,?name2)) }

Q5a′ simply substitutes sameTerm(?name, ?name2) in place of the ?name = ?name2 comparison, allowing the join to be optimized for. Q5a′ runs in comparable time to Q5b, as intended by the authors of SP2B. We suggest that others benchmarking SP2B note execution times for Q5a′ as well.
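To make the difference between the two comparisons concrete, here is a toy model in Python. It is a simplified sketch and not how SPOCQ or any other engine implements these operators: the `Literal` class and the `same_term` and `value_equal` functions are hypothetical names, and `value_equal` covers only two numeric datatypes. The point is that `=` compares values, so two distinct terms can be equal, while a join matches on term identity.

```python
from dataclasses import dataclass
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC = {XSD + "integer", XSD + "decimal"}


@dataclass(frozen=True)
class Literal:
    """Toy RDF literal: a lexical form plus a datatype IRI."""
    lexical: str
    datatype: str = XSD + "string"


def same_term(a, b):
    """Term identity: same lexical form and same datatype."""
    return a == b


def value_equal(a, b):
    """Simplified model of '=': numeric literals compare by value,
    everything else falls back to term identity (an assumption of
    this sketch; real SPARQL has more cases)."""
    if a.datatype in NUMERIC and b.datatype in NUMERIC:
        return Decimal(a.lexical) == Decimal(b.lexical)
    return same_term(a, b)


one = Literal("1", XSD + "integer")
padded_one = Literal("01", XSD + "integer")

print(value_equal(one, padded_one))  # equal as values
print(same_term(one, padded_one))    # but not the same term
```

In this model, a join keyed on term identity would miss the pair that the `=` filter accepts, which is why an engine cannot rewrite an `=` filter into a join without knowing more about the data. The names in the SP2B dataset are plain strings, for which the two comparisons agree, so Q5a′ returns the same answers on that data.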

Caveats

While our production cluster has considerably more aggregate horsepower than the benchmark machine used for the above, benchmarking on it wouldn’t make for a very meaningful comparison, given that all previously published SP2B results have come from single-machine deployments. So, the figures given here should be considered first and foremost a baseline capability demonstration of the technology that the Dydra platform is built on.

Further, during our ongoing beta period we are enforcing a maximum query execution time of 30 seconds, which of course would tend to preclude executing long-running analytic queries of the SP2B kind. If you have special evaluation needs you’d like to discuss with us, please contact Ben Lavender (ben@dydra.com).

Credits

In closing, we would like to express our thanks to the Freiburg University Database Group, the authors of the SP2B performance benchmark suite. SP2B has provided us with an invaluable yardstick by which to mark our weekly improvements to Dydra’s query processing throughput. Anyone developing a SPARQL engine today without exposing it to the non-trivial and tough queries of SP2B is doing themselves a serious disservice, as attested to by the difficulty most SPARQL implementations have with the more strenuous SP2B queries. SP2B is truly the gold standard of SPARQL benchmarks, ignored at one’s own peril.
