Version of LDBC data generator used + some more documentation please. #18

@pawanrawal

We at Dgraph are trying to reproduce the benchmarks reported in https://event.cwi.nl/grades/2017/12-Apaci.pdf and to write a blog post comparing Dgraph against the systems evaluated there. I am specifically interested in the comparison against PostgreSQL, Titan and Neo4j, and I have some questions.

  1. The schema of the data generated by the LDBC data generator (v0.2.6) seems to have changed, and I get errors while importing the data into Postgres. Is there something I am doing wrong here?
psql:load_csv.sql:178: ERROR:  missing data for column "c_creator"
CONTEXT:  COPY comment_f, line 2: "1236950581249|2011-09-17T06:26:59.961+0000|77.240.75.197|Chrome|yes|3"
COPY 2719160
psql:load_csv.sql:180: ERROR:  missing data for column "f_moderator"
CONTEXT:  COPY forum_f, line 2: "0|Wall of Mahinda Perera|2010-03-17T07:32:20.447+0000"
COPY 1629206
COPY 309775
psql:load_csv.sql:183: ERROR:  missing data for column "o_placeid"
CONTEXT:  COPY organisation_f, line 2: "0|company|Kam_Air|http://dbpedia.org/resource/Kam_Air"
psql:load_csv.sql:184: ERROR:  missing data for column "p_placeid"
CONTEXT:  COPY person_f, line 2: "933|Mahinda|Perera|male|1989-12-03|2010-03-17T07:32:10.447+0000|192.248.2.123|Firefox"
COPY 16836
COPY 229166
COPY 180670
COPY 746332
COPY 1470583
COPY 20540
COPY 7949
COPY 21764
psql:load_csv.sql:193: ERROR:  missing data for column "p_ispartof"
CONTEXT:  COPY place_f, line 2: "0|India|http://dbpedia.org/resource/India|country"
psql:load_csv.sql:194: ERROR:  missing data for column "p_creator"
CONTEXT:  COPY post_f, line 2: "1236950581248||2011-09-16T22:05:40.595+0000|192.248.2.123|Firefox|uz|About Augustine of Hippo, ustin..."
COPY 721295
COPY 71
COPY 70
COPY 16080
COPY 16080
INSERT 0 746332
INSERT 0 1470583
INSERT 0 2719160
INSERT 0 721295
INSERT 0 0
INSERT 0 0
INSERT 0 1629206
INSERT 0 309775
INSERT 0 0
INSERT 0 21764
INSERT 0 7949
INSERT 0 0
INSERT 0 16836
INSERT 0 229166
INSERT 0 20540
INSERT 0 180670
INSERT 0 180670
INSERT 0 0
INSERT 0 0
INSERT 0 16080
INSERT 0 70
INSERT 0 71
INSERT 0 16080

It would be great if the load_csv.sql script could be updated, or if you could specify the version of the data generator that was used to generate this data.
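
In case it helps narrow this down, here is a minimal sketch of how the mismatch can be checked outside of Postgres. It counts the pipe-delimited fields in a row and flags rows with fewer fields than the target table expects. The function name `short_rows` is hypothetical, and the expected count of 7 is only an inference from the error above (the row has 6 fields and `c_creator` is the first column reported missing), not the actual column count of `comment_f` in load_csv.sql.

```python
import csv
import io


def short_rows(text, expected, delimiter="|"):
    """Return (line_number, field_count) for rows with fewer fields than expected.

    `expected` is the number of columns the target table needs; the value
    used below is an assumption, not taken from load_csv.sql.
    """
    reader = csv.reader(
        io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE
    )
    return [
        (lineno, len(row))
        for lineno, row in enumerate(reader, start=1)
        if row and len(row) < expected  # skip blank lines
    ]


# Row copied from the "COPY comment_f, line 2" error context above.
sample = "1236950581249|2011-09-17T06:26:59.961+0000|77.240.75.197|Chrome|yes|3\n"

# Assumed minimum: 6 fields present plus the missing c_creator column.
print(short_rows(sample, expected=7))  # [(1, 6)]
```

Running the same check over the first few lines of each generated CSV file, with the column counts taken from the table definitions, should show which files no longer match the schema that load_csv.sql expects.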

  2. The paper reports four types of query latency, but there are 13 LDBC queries in the benchmark. How are the queries grouped? Is there a framework for evaluating read-only query performance for Postgres?

  3. Was the data ingested after the indexes were added, or without them?

In general, some more documentation and steps to reproduce the benchmarks would be very useful.
