Call For Participation: BSBM 3.1 Benchmarks on the LISA Cluster

The LISA Cluster from SARA will be used for large-scale experiments on the new BSBM v3.1

BSBM V3.1 is being run on a large cluster. Contact me if you want your system tested!

Significantly enhancing the performance of RDF systems is a goal of the LOD2 project, and one of the ways to foster this is through benchmarking activities.
In the domain of relational databases, consensus around benchmarks and standard benchmarking practices (such as independent audits) was created through the Transaction Processing Performance Council (TPC). Its benchmarks, such as TPC-C for transaction processing and TPC-H for business intelligence (BI), rapidly improved the state of the art by orders of magnitude in performance/$ over the past two decades. (My personal background is also in the relational database space, and I recently went through all the motions of a professionally audited TPC benchmark submission with my BI database spin-off VectorWise and its recent fastest-per-core TPC-H results.)
We think that opportunities for benchmark-driven improvement exist in the RDF space as well. Performance improvements will help make RDF technology more industry grade, and hence better accepted in IT as a whole. Similarly, having well-defined benchmarks, with consensus on benchmark practices and with results that are transparent and comparable, will make it easier for IT organizations to choose RDF-based over other (e.g. relational) technology.
For this reason, LOD2 benchmarking activities encompass:
  • building a community consisting of (commercial) RDF store vendors.
  • involving the vendors and the academic RDF community to agree on relevant benchmarks and benchmarking practices.
  • enhancing existing benchmarks, or creating new and better ones, to measure the capabilities of RDF/SPARQL systems in relevant use cases.
Earlier this year, using feedback from various LOD2 partners (e.g. Orri Erling’s blog entries 1677, 1682, 1683, and 1684), the group of Chris Bizer at FU Berlin launched version V3.0 of the well-known Berlin SPARQL Benchmark (BSBM 3.0) and also published results obtained on a 4-core workstation for the 100M and 200M triple dataset sizes. The tested systems were 4store, BigData, BigOwlim, Jena TDB, and Virtuoso.
One new element in BSBM V3.0 was the Business Intelligence (BI) use case. However, when the original results were obtained in February 2011, its use of grouping and aggregation, which are new SPARQL 1.1 features, was very recent, and it was decided to postpone benchmarking these until later.
This “later” is now. This coincides with a new version 3.1 of the BSBM benchmark from the FU Berlin folks, which only modifies the BI use case, in the following aspects:
  • it fixes a number of bugs in the earlier specification, and also contains SQL translations for all relational versions of the benchmark.
  • its query driver creates a drill-down access pattern in all queries involving Product Type. In BSBM, a selection on the Product Type hierarchy can be either very coarse or very selective, depending on the level used. This caused a large variability in results for such queries, as the running time critically depends on this selectivity. The driver now poses multiple such queries in a sequence: the query is repeated, starting with a coarse selection and each time selecting a Product sub-Type of the previous selection, making the result smaller (drill-down).
  • it adds a new Power BI Performance Metric, similar to the one used in TPC-H. The Power metric is a performance score that characterizes the performance of the system when running queries in isolation. The metric combines all types of queries in such a way that short- and long-running queries count as equally important. This makes the benchmark score more balanced, such that outlier results do not dominate the score.
  • it adds a new Throughput BI Performance Metric, which measures the average number of queries per time unit that the system can sustain. This may involve a very large number of concurrent queries.
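To make the drill-down pattern and the two metrics above more concrete, here is a minimal Python sketch. It is not the BSBM 3.1 driver or its official formulas, which are not given in this post. The type hierarchy, the function names, the use of a geometric mean for the Power score (in the spirit of TPC-H), and the queries-per-hour scaling are all assumptions made for illustration.

```python
import math
import random


def drill_down_path(children, root, rng):
    """Return product types from coarse (root) to selective (leaf).

    `children` is a hypothetical dict mapping a type to its sub-types.
    Each element of the returned path would parameterize one query.
    """
    path = [root]
    while children.get(path[-1]):
        # Pick a sub-type of the previous selection, narrowing the result.
        path.append(rng.choice(children[path[-1]]))
    return path


def power_score(run_seconds):
    """Queries per hour based on the geometric mean of isolated run times.

    Assumption: a geometric mean, so that short and long queries weigh equally.
    """
    if not run_seconds or min(run_seconds) <= 0:
        raise ValueError("need at least one positive run time")
    mean_log = sum(math.log(t) for t in run_seconds) / len(run_seconds)
    return 3600.0 / math.exp(mean_log)


def throughput_score(completed_queries, elapsed_seconds):
    """Average queries per hour sustained over a (possibly concurrent) run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return 3600.0 * completed_queries / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical three-level Product Type hierarchy.
    hierarchy = {"Type1": ["Type2", "Type3"], "Type2": ["Type4"]}
    print(drill_down_path(hierarchy, "Type1", random.Random(42)))
    print(power_score([1.0, 100.0]))
    print(throughput_score(120, 60.0))
```

With run times of 1 s and 100 s the geometric mean is 10 s, so this Power score comes out at about 360 queries per hour. An arithmetic mean (50.5 s) would instead let the slow query dominate, which is the outlier effect the metric is meant to avoid.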
More BSBM developments will likely follow: a text retrieval BSBM use case is in the works from LOD2 partners and will shortly see the light as well.
We at CWI are currently gearing up for a new BSBM benchmarking experiment that will use this very latest version of BSBM 3.1 to test the Explore, Update and BI use cases, and that will also use datasets that go beyond the earlier tested 100M and 200M sizes.
To increase the scale of the experiments, we decided to work with larger hardware, provided by SARA, the National Supercomputing Facility of the Netherlands. SARA hosts the LISA cluster, consisting of 4880 cores and 12TB of memory. Individual cluster nodes are dual-socket Xeon servers with 24GB of RAM and 8 cores at 2.26GHz, interconnected with Infiniband.
CWI (the national research center for mathematics and computer science in the Netherlands, http://www.cwi.nl/en/aboutCWI) is currently testing a number of RDF systems on this supercomputer, in cooperation with the vendors. If your system is not among them, please contact me ASAP.
Shortly, we will be back with the results, and with much more exciting news around the nascent RDF benchmark community that is arising around these activities. These activities do not focus on BSBM alone but also cover other existing and new benchmarks, such as DBPSB, SIB and RDF-H. They are meant to outlive the LOD2 project and to improve the performance and scalability of RDF technologies for good, so that they become thoroughly industry grade.

Posted in Announcement, Events, WP2 – Storing and Querying

2 Responses to Call For Participation: BSBM 3.1 Benchmarks on the LISA Cluster

  1. Victor Chernov says:

    Dear colleagues,

    Our company develops high-performance database solutions.
    Our NitrosBase In-Memory DB product is the fastest relational DBMS in the world.

    Now our company is developing *NitrosBase RDF Storage*. It is going to be very fast. The preliminary results show unprecedented performance!

    We plan to pass the Berlin test after we release the 1st version of our RDF Storage. That’s going to happen pretty soon.

    How could we participate in BSBM 3.1 benchmarks on the Lisa cluster?

    Victor Chernov

    • Peter Boncz says:

      Hi Victor,

      You would need to provide us with the sources of your system, or a Linux binary, plus some documentation on how to set up and use the system.

      Please contact me via email for further details.

      Peter
