CQAPri Benchmark


Benchmark description

For a precise description of the benchmark, and in particular for statistics on the databases (Table 3.1), see Section 3.3.1 of this thesis. The benchmark consists of:
  • An ontology: We extended the modified LUBM benchmark of Lutz et al., which provides the DL-Lite_R version of the original LUBM ELI TBox. We added negative inclusions stating the disjointness of pairs of concepts or roles that have the same closest super-concept. We excluded a small number of such inclusions when they did not seem to reflect the intended meaning of the concepts or roles.
  • 20 queries
  • 30 databases inconsistent w.r.t. the ontology:
    • naming: uXcY, where X reflects the database size (from 75K to 10M facts) and Y the quantity of conflicts added (from 3% to 46% of the facts are involved in some conflict)
    • inclusions: uXcY ⊂ uXcY' when Y < Y', and uXcY ⊂ uX'cY when X < X'
    • several formats available: either one table per class/property or one big table of RDF triples, with strings either encoded as integers through a dictionary table or left unencoded
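
The naming scheme and the two inclusion rules above can be sketched in a few lines of Python. This is only an illustration: it assumes that X and Y appear in the names as plain integers, the helper names parse_name and stated_subset are hypothetical, and the Y values in the usage example are made up rather than taken from the actual file names.

```python
import re

# Pattern for database names of the form uXcY. X and Y are assumed to be
# written as non-negative integers (an assumption, not stated by the benchmark).
NAME_RE = re.compile(r"^u(\d+)c(\d+)$")


def parse_name(name):
    """Split a database name 'uXcY' into the integer pair (X, Y)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a uXcY database name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def stated_subset(smaller, larger):
    """Return True if the description states that 'smaller' is included in 'larger'.

    Only the two stated cases are covered: same X with Y < Y',
    or same Y with X < X'.
    """
    x1, y1 = parse_name(smaller)
    x2, y2 = parse_name(larger)
    return (x1 == x2 and y1 < y2) or (y1 == y2 and x1 < x2)
```

For instance, with illustrative names, stated_subset("u1c1", "u1c5") and stated_subset("u1c5", "u5c5") both hold, while stated_subset("u5c5", "u1c5") does not.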

Download

  • Ontology file
  • Queries folder
  • Databases (PostgreSQL dump files):
    • u1cY: 6 databases (about 75K-78K facts)
    • u5cY: 6 databases (about 463K-481K facts)
    • u20cY: 6 databases (about 2M facts)
    • u50cY: 6 databases (about 5M facts)
    • u100cY: 6 databases (about 10M facts)

    To load the dump file dump_file.sql into the database db_name, run the command:
      psql -U username db_name < dump_file.sql

    Database schema: each database uXcY contains the dataset in two forms:

    • In one big table named "triples", in the form of (s, p, o) RDF triples:

      (http://www.Department11.University0.edu/GraduateStudent88, http://swat.cse.lehigh.edu/onto/univ-bench.owl#takesCourse, http://www.Department11.University0.edu/GraduateCourse6) is in table triples

      (http://www.Department3.University0.edu/UndergraduateStudent109, http://www.w3.org/1999/02/22-rdf-syntax-ns#type, http://swat.cse.lehigh.edu/onto/univ-bench.owl#Subj11Student) is in table triples

    • In one table per predicate (OWL class or property): the table name is the class or property name (in lowercase, as in the examples below); tables that represent a class (concept) have a single column (s), and tables that represent a property (role) have two columns (s, o)

      (http://www.Department11.University0.edu/GraduateStudent88, http://www.Department11.University0.edu/GraduateCourse6) is in table takescourse

      (http://www.Department3.University0.edu/UndergraduateStudent109) is in table subj11student
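
To make the correspondence between the two forms concrete, here is a minimal Python sketch that rebuilds the per-predicate tables from a "triples" table. It rests on several assumptions: an in-memory SQLite database stands in for PostgreSQL, the naming rule (lowercased local name after the '#') is inferred from the two examples above, class membership is assumed to come from rdf:type triples, and the helpers table_name and split_by_predicate are hypothetical.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
UB = "http://swat.cse.lehigh.edu/onto/univ-bench.owl#"
DEPT11 = "http://www.Department11.University0.edu/"
DEPT3 = "http://www.Department3.University0.edu/"

# The two example facts quoted above, as (s, p, o) triples.
TRIPLES = [
    (DEPT11 + "GraduateStudent88", UB + "takesCourse", DEPT11 + "GraduateCourse6"),
    (DEPT3 + "UndergraduateStudent109", RDF_TYPE, UB + "Subj11Student"),
]


def table_name(iri):
    """Assumed naming rule: lowercased local name after the '#'."""
    return iri.rsplit("#", 1)[-1].lower()


def split_by_predicate(conn):
    """Build one table per class/property from the 'triples' table."""
    rows = conn.execute("SELECT s, p, o FROM triples").fetchall()
    for s, p, o in rows:
        if p == RDF_TYPE:
            # Class membership: single-column table named after the class.
            name = table_name(o)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" (s TEXT)')
            conn.execute(f'INSERT INTO "{name}" VALUES (?)', (s,))
        else:
            # Property assertion: two-column table named after the property.
            name = table_name(p)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{name}" (s TEXT, o TEXT)')
            conn.execute(f'INSERT INTO "{name}" VALUES (?, ?)', (s, o))


# SQLite in memory is only a stand-in for a loaded PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", TRIPLES)
split_by_predicate(conn)

takes = conn.execute("SELECT s, o FROM takescourse").fetchall()
subj11 = conn.execute("SELECT s FROM subj11student").fetchall()
```

After running the sketch, takes holds the single (s, o) pair of the takesCourse example and subj11 holds the single individual of the Subj11Student example, matching the two per-predicate rows shown above.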

  • Encoded databases (PostgreSQL dump files):

    These databases give the datasets uXcY in an alternative format. A table named "dictionary" with attributes (key, value) provides an integer encoding of all classes, properties, and individuals. The dataset itself is given by tables of the form t_idclass with one column (s) or t_idproperty with two columns (s, o), where idclass/idproperty is the key associated with the class/property in the dictionary, and the individuals are encoded as integers.
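
The following Python sketch illustrates how the encoded format can be read back through the dictionary. It is an illustration under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, the integer keys 1, 2, and 3 are made up (the real dictionaries assign their own keys), and the helpers table_for and decode_property are hypothetical.

```python
import sqlite3

UB = "http://swat.cse.lehigh.edu/onto/univ-bench.owl#"
DEPT11 = "http://www.Department11.University0.edu/"

# SQLite in memory is only a stand-in for a loaded PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE dictionary ("key" INTEGER PRIMARY KEY, "value" TEXT)')
# Made-up integer keys; the real dictionaries assign their own.
conn.executemany(
    "INSERT INTO dictionary VALUES (?, ?)",
    [
        (1, UB + "takesCourse"),
        (2, DEPT11 + "GraduateStudent88"),
        (3, DEPT11 + "GraduateCourse6"),
    ],
)
# Property table t_1: the suffix 1 is the dictionary key of takesCourse.
conn.execute("CREATE TABLE t_1 (s INTEGER, o INTEGER)")
conn.execute("INSERT INTO t_1 VALUES (2, 3)")


def table_for(conn, iri):
    """Return the name of the encoded table for a class or property IRI."""
    row = conn.execute(
        'SELECT "key" FROM dictionary WHERE "value" = ?', (iri,)
    ).fetchone()
    if row is None:
        raise KeyError(iri)
    return f"t_{row[0]}"


def decode_property(conn, iri):
    """Return the (s, o) pairs of a property, with integers decoded to strings."""
    table = table_for(conn, iri)
    return conn.execute(
        f'SELECT ds."value", do_."value" FROM {table} AS t '
        'JOIN dictionary AS ds ON ds."key" = t.s '
        'JOIN dictionary AS do_ ON do_."key" = t.o'
    ).fetchall()


decoded = decode_property(conn, UB + "takesCourse")
```

Here decoded contains the takesCourse fact with both individuals mapped back to their original strings. A class table would be decoded the same way, with a single join on the column s.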