BGP Statement Pattern Counts

by James

A recent developer mailing list inquiry caught our attention a few days ago. The author wondered, “[can] anyone … give me a recommendation for ‘how big a sane query can get’?” His use case was to apply a SPARQL query to the LOD dataset in order to test for the presence of “patterns for human associations”. There was mention of subgraph sizes in excess of 400K triples but, as it turned out, his efforts appeared to be thwarted by much lower limits which his query processor set on the number of statement patterns in a BGP.

The case made me wonder where our limits lie and what accounts for them. To this end, I used a generator for simple count queries:

* (defun generate-bgp (count)
  `(spocq.a:|select| (spocq.a:|bgp| ,@(loop for i from 0 below count
                                        collect '(spocq.a:|triple| ?::|s| ?::|p| ?::|o|)))
     ((?::|count| (spocq:|count| SPOCQ.S:*)))))

GENERATE-BGP

which generates trivial queries, such as:

* (generate-bgp 4)

(spocq:|select|
 (spocq:|bgp|
   (spocq:|triple| ?::|s| ?::|p| ?::|o|)
   (spocq:|triple| ?::|s| ?::|p| ?::|o|)
   (spocq:|triple| ?::|s| ?::|p| ?::|o|)
   (spocq:|triple| ?::|s| ?::|p| ?::|o|))
 ((?::|count| (spocq:|count| SPOCQ.S:*))))
* (defun run-test-bgp (count)
    (run-sparql (generate-bgp count) :repository-id "james/bgp-tests"))

RUN-TEST-BGP

with a single-statement repository, just to exercise the query invocation, without any concern for solution field size,

* (run-test-bgp 4)

((1))
(?::|count|)
#<QUERY [E7DA6E00-E215-11E4-90B0-000000000000/NIL, select@COMPLETE, james/bgp-tests] {100AFD9199}>

In order to gauge the resource use in terms of execution time, it sufficed to simply profile the significant operators:

* (sb-profile:profile rdfcache:match rdfcache:count spocq-compile run-agp-thread)

* (loop for count = 2 then (* count 2) until (> count (* 32 1024))
    do (let ((start (get-internal-run-time)))
         (sb-profile:reset)
         (run-test-bgp count)
         (format *trace-output* "~%~%~a ~ams" count (- (get-internal-run-time) start))
         (sb-profile:report)
         (format *trace-output* "~%")))

The result is a transcript of the elapsed times, function call counts and cumulative run-time for the respective operators, in this case, on a six-core CPU. For example:

2 101ms
  seconds  |     gc     |   consed   | calls |  sec/call  |  name  
--------------------------------------------------------
     0.091 |      0.000 | 11,168,640 |     3 |   0.030333 | SPOCQ-COMPILE
     0.001 |      0.000 |     32,576 |     1 |   0.001000 | RDFCACHE:MATCH
     0.000 |      0.000 |          0 |     1 |   0.000000 | RDFCACHE:COUNT
     0.000 |      0.000 |          0 |     1 |   0.000000 | RUN-AGP-THREAD
--------------------------------------------------------
     0.092 |      0.000 | 11,201,216 |     6 |            | Total

For this query form, in which the single BGP is compiled into one monolithic pattern-matching function, the results are as follows:

pattern count milliseconds compile count match
2 101 91 1 0
4 37 23 1 1
8 125 108 1 1
16 147 137 5 0
32 228 212 2 2
64 447 420 7 1
128 1043 1018 5 2
256 3726 3678 15 1
512 27592 27496 17 4
1024 153774 153615 37 5
1152 216749 216714 26 7

Above 1K patterns, however, the run-time reaches a compilation limit. The compiler exhausts its dynamic binding stack, either when precompiling the BGP or, if the query processor is set to interpret rather than compile queries, when interpreting the pattern’s source form.
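The scaling itself is straightforward to reproduce outside of the query processor. As a rough analogy (a stand-alone sketch in plain SBCL with an arbitrary, invented clause shape, not the spocq compiler), one can time the compilation of a single generated function whose body grows with the clause count and observe that the cost grows much faster than linearly:

* (defun time-monolithic-compile (count)
    ;; compile one function with COUNT generated clauses and
    ;; return the elapsed internal run time
    (let ((start (get-internal-run-time)))
      (compile nil `(lambda (s p o)
                      (+ ,@(loop for i from 0 below count
                                 collect `(if (and (eql s ,i) p o) 1 0)))))
      (- (get-internal-run-time) start)))

TIME-MONOLITHIC-COMPILE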

As an alternative, one could limit each BGP to a single statement.

* (defun generate-singleton-bgp (count)
    (let ((body (loop with bgp = '(spocq.a:|bgp| (spocq.a:|triple| ?::|s| ?::|p| ?::|o|))
                  for form = bgp then `(spocq.a:|join| ,bgp ,form)
                  for i from 0 below count
                  finally (return form))))
      `(spocq.a:|select| ,body ((?::|count| (spocq:|count| SPOCQ.S:*))))))

GENERATE-SINGLETON-BGP

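This generates, for example (modulo which package prefixes the printer elides), a right-nested join of single-pattern BGPs:

* (generate-singleton-bgp 3)

(spocq.a:|select|
 (spocq.a:|join|
  (spocq.a:|bgp| (spocq.a:|triple| ?::|s| ?::|p| ?::|o|))
  (spocq.a:|join|
   (spocq.a:|bgp| (spocq.a:|triple| ?::|s| ?::|p| ?::|o|))
   (spocq.a:|bgp| (spocq.a:|triple| ?::|s| ?::|p| ?::|o|))))
 ((?::|count| (spocq:|count| SPOCQ.S:*))))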
An attempt with queries of this form progressed up to 16K patterns:

pattern count  milliseconds
1 42
2 36
4 49
8 60
16 93
32 136
64 286
128 540
256 1237
512 2191
1024 5670
2048 12538
4096 64937
8192 138365
16384 567760

but then also ran into limits. This time they were system limits on connections to the store: the 16K single-pattern BGPs required that many store connections, all active in parallel, with twice that many active threads, one for each BGP and one for each consequent JOIN operator. Even reaching this limit required some reconfiguration, once to raise the page map limit to permit that many simultaneous store connections, and once to raise the compiler’s stack space to permit it to compile a query with that degree of join nesting.

In any event, given these two limits, 16K parallel BGPs and 1K statements per BGP, it seemed possible to achieve rather large statement counts. As indeed it was. With a configuration option limiting the BGP statement pattern count to 256, the effective limits were quite high:

count milliseconds compile count match (net gc)
2 39 30 1 0
4 109 97 0 0
8 114 108 1 1
16 144 136 1 1
32 231 218 8 0
64 432 412 6 0
128 41 32 20 0
256 47 32 23 1
512 59 41 45 0
1024 99 61 133 2
2048 155 70 419 2
4096 313 95 1636 2
8192 672 405 6645 73
16384 1965 216 43239 524
32768 8125 369 366529 25771
65536 7217 709 2077912 49093
131072 20522 2066 23031994 62938
262144 62569 4128 98516780 665346
524288 291660 13022 1380142800 2563251

The degenerate case of a single BGP with thousands of statement patterns turns out not to be all that interesting. If it is implemented with value propagation, only one thread is involved and, depending on the dataset statistics, it would likely execute half the statements, but one match at a time. For larger sizes, in any event, it fails early due to stack limits in the compiler.

The path to testing large pattern counts is to force partitioning into joined subexpressions, with which, even within the bounds of

  • stack space limits in the query processor run-time which constrain operator depth,
  • memory mapping limits which constrain the number of simultaneous connections to the store,
  • stack space limits within the compiler which limit both BGP statement count and operator depth,
  • file descriptor limits which constrain connections to the store, and
  • eventual thread count limits which constrain total operator and BGP count,

it is possible to evaluate queries with 512K statements. The progression of elapsed times does indicate, however, that significant contention occurs as the count increases, which suggests the results would improve if the processor were to limit the number of BGPs which it executes in parallel.
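A minimal sketch of such a partitioning step (a hypothetical helper for illustration, not the processor’s actual configuration mechanism) would split the statement patterns into BGPs of bounded size and combine those BGPs with nested JOIN operators:

* (defun partition-bgp (patterns &key (max-count 256))
    ;; split PATTERNS into groups of at most MAX-COUNT statement
    ;; patterns, wrap each group as a BGP, and fold the BGPs into
    ;; nested JOIN forms
    (let ((groups (loop while patterns
                        collect (loop repeat max-count
                                      while patterns
                                      collect (pop patterns)))))
      (reduce (lambda (form group)
                `(spocq.a:|join| (spocq.a:|bgp| ,@group) ,form))
              (rest groups)
              :initial-value `(spocq.a:|bgp| ,@(first groups)))))

PARTITION-BGP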
