Significantly enhancing the performance of RDF systems is a goal of the LOD2 project, and one way to foster this is through benchmarking activities.
In the domain of relational databases, consensus around benchmarks and standard benchmarking practices (such as independent audits) was created through the Transaction Processing Performance Council (TPC). Its benchmarks, such as TPC-C for transaction processing and TPC-H for business intelligence (BI), rapidly improved the state of the art by orders of magnitude in performance/$ over the past two decades. (My personal background is also in the relational database space, and I recently went through all the motions of a professionally audited TPC benchmark submission with my BI database spin-off VectorWise and its recent fastest-per-core TPC-H results.)
We think that opportunities for benchmark-driven improvement exist in the RDF space as well. Performance improvements will help make RDF technology more industry grade, and hence better accepted in IT as a whole. Similarly, well-defined benchmarks, with consensus on benchmark practices and with transparent and comparable results, will make it easier for IT organizations to choose RDF-based technology over other (e.g. relational) technology.
For this reason, LOD2 benchmarking activities encompass:
- building a community consisting of (commercial) RDF store vendors.
- involving the vendors and the academic RDF community in agreeing on relevant benchmarks and benchmarking practices.
- enhancing existing benchmarks and creating new and better ones to measure the capabilities of RDF/SPARQL systems in relevant use cases.
Earlier this year, using feedback from various LOD2 partners (e.g. Orri Erling’s blog entries 1677, 1682, 1683, and 1684), the group of Chris Bizer at FU Berlin launched version 3.0 of the well-known Berlin SPARQL Benchmark (BSBM 3.0) and also published results obtained on a 4-core workstation for the 100M and 200M triple dataset sizes. The tested systems were 4store, BigData, BigOwlim, Jena TDB, and Virtuoso.
One new element in BSBM V3.0 was the Business Intelligence (BI) use case. However, when the original results were obtained in February 2011, the grouping and aggregation it uses, which are new SPARQL 1.1 features, were very recent, so it was decided to postpone benchmarking them until later.
This “later” is now. This coincides with a new version 3.1 of the BSBM benchmark from the FU Berlin folks, which only modifies the BI use case, in the following aspects:
- it fixes a number of bugs in the earlier specification, and also contains SQL translations for all relational versions of the benchmark.
- its query driver creates a drill-down access pattern in all queries involving Product Type. In BSBM, a selection on the Product Type hierarchy can be either very coarse or very selective, depending on the level used. This caused large variability in results for such queries, as the running time critically depends on this selectivity. The driver now poses multiple such queries in sequence: the query is first run with a coarse selection and then repeated, each time selecting a Product sub-Type of the previous selection, which makes the result smaller (drill-down).
- it adds a new Power BI Performance Metric, similar to the one used in TPC-H. The Power metric is a performance score that characterizes the system when running queries in isolation. The metric combines all types of queries in such a way that short- and long-running queries count as equally important. This makes the benchmark score more balanced, so that outlier results do not dominate it.
- it adds a new Throughput BI Performance Metric, which measures the average number of queries per time unit that the system can sustain. This may involve a very large number of concurrent queries.
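To make the two new metrics a bit more concrete, here is a minimal sketch in Python. It is not the official BSBM 3.1 formula, which is not spelled out in this post. It assumes that the Power metric is built on a geometric mean of per-query runtimes, as the TPC-H power metric is, and that Throughput is simply completed queries per hour. The function names `power_score` and `throughput_score` and the per-hour scaling are my own illustrative choices.

```python
import math


def power_score(runtimes_s):
    """Queries per hour derived from the geometric mean of runtimes (seconds).

    Assumption: a geometric mean, as in the TPC-H power metric. With it,
    a 2x slowdown of a short query costs exactly as much as a 2x slowdown
    of a long query, so no single outlier dominates the score.
    """
    if not runtimes_s or any(t <= 0 for t in runtimes_s):
        raise ValueError("need at least one runtime, all strictly positive")
    # Average the logarithms rather than multiplying, to avoid overflow.
    log_mean = sum(math.log(t) for t in runtimes_s) / len(runtimes_s)
    return 3600.0 / math.exp(log_mean)


def throughput_score(num_queries, elapsed_s):
    """Average number of queries completed per hour over a sustained run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be strictly positive")
    return num_queries * 3600.0 / elapsed_s


if __name__ == "__main__":
    baseline = [0.1, 1.0, 10.0]        # one short, one medium, one long query
    short_slower = [0.2, 1.0, 10.0]    # short query becomes 2x slower
    long_slower = [0.1, 1.0, 20.0]     # long query becomes 2x slower
    print(power_score(baseline))
    print(power_score(short_slower), power_score(long_slower))
    print(throughput_score(1200, 600.0))
```

With an arithmetic mean, the 10-second query would dominate the score and the 0.1-second query would hardly matter. With the geometric mean assumed here, doubling either one lowers the score by the same factor.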
More BSBM developments will likely follow: a text retrieval BSBM use case is in the works from LOD2 partners and will shortly see the light as well.
We at CWI are currently gearing up for a new BSBM benchmarking experiment that will use this latest version, BSBM 3.1, to test the Explore, Update and BI use cases, and that will also use datasets that go beyond the earlier tested 100M and 200M sizes.
To increase the scale of the experiments, we decided to work with larger hardware, provided by SARA, the National Supercomputing Facility of the Netherlands. SARA hosts the LISA cluster, consisting of 4880 cores and 12TB of memory. Individual cluster nodes are dual-socket Xeon servers with 24GB of RAM and 8 cores at 2.26GHz, interconnected with Infiniband.
CWI is currently testing a number of RDF systems on this supercomputer, in cooperation with the vendors. If your system is not among them, please contact me ASAP.
We will be back shortly with the results, and with much more exciting news about the nascent RDF benchmark community that is arising around these activities. These activities do not focus on BSBM alone but also on other existing and new benchmarks, such as DBPSB, SIB and RDF-H. They are intended to outlive the LOD2 project and to improve the performance and scalability of RDF technologies for good, so that they become thoroughly industry grade.


