Issuing more than 500 performance queries on 500 image dataset fails #10

@psyhtest

I've been using a dataset with 500 images for testing:

$ ck install package --tags=dataset,imagenet,2012,val,min --no_tags=resized

and observed that setting the target QPS parameter to 8 or above, e.g.:

$ export NPROCS=`grep -c processor /proc/cpuinfo`
$ ck run program:mlperf-inference-v0.5 --skip_print_timers --dep_add_tags.dataset=min \
--cmd_key=image-classification --env.CK_LOADGEN_TASK=image-classification \
--env.CK_LOADGEN_SCENARIO=Offline --env.CK_LOADGEN_MODE=Performance  \
--env.CK_OPENVINO_NTHREADS=$NPROCS --env.CK_OPENVINO_NSTREAMS=$NPROCS \
--env.CK_OPENVINO_NIREQ=$NPROCS --env.CK_LOADGEN_TARGET_QPS=8

results in a segmentation fault:

./tmp-EEKtXG.sh: line 38: 123041 Segmentation fault      (core dumped) ./Release/ov_mlperf --scenario ${CK_LOADGEN_SCENARIO} --mode ${CK_LOADGEN_MODE} --mlperf_conf_filename ${CK_LOADGEN_MLPERF_CONF} --user_conf_filename ${CK_LOADGEN_USER_CONF} --total_sample_count ${CK_LOADGEN_DATASET_SIZE} --data_path ${CK_ENV_DATASET_IMAGENET_VAL} --dataset imagenet --device ${CK_OPENVINO_DEVICE} --model_path ${CK_ENV_OPENVINO_MODEL_XML} --model_name ${CK_OPENVINO_MODEL_NAME} --nireq ${CK_OPENVINO_NIREQ} --nstreams ${CK_OPENVINO_NSTREAMS} --nthreads ${CK_OPENVINO_NTHREADS} --nwarmup_iters ${CK_OPENVINO_NWARMUP_ITERS} --batch_size ${CK_BATCH_SIZE} > stdout.log 2> stderr.log

and subsequently in a Python exception caused by corrupted log files (in particular, tmp/mlperf_log_accuracy.json contains only [):

--------------------------------
Traceback (most recent call last):
  File "/home/anton/CK/ck/kernel.py", line 10820, in <module>
    r=access(sys.argv[1:])
  File "/home/anton/CK/ck/kernel.py", line 10776, in access
    rr=perform_action(i)
  File "/home/anton/CK/ck/kernel.py", line 4126, in perform_action
    return a(i)
  File "/home/anton/CK_REPOS/ck-autotuning/module/program/module.py", line 3571, in run
    run_output_dict = process(i)
  File "/home/anton/CK_REPOS/ck-autotuning/module/program/module.py", line 182, in process
    r=process_in_dir(ii)
  File "/home/anton/CK_REPOS/ck-autotuning/module/program/module.py", line 3042, in process_in_dir
    rxx=cs.ck_postprocess(ii)
  File "/home/anton/CK_REPOS/ck-mlperf/script/image-classification/loadgen_postprocess.py", line 34, in ck_postprocess
    mlperf_log_dict['accuracy'] = json.load(accuracy_file)
  File "/usr/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 1 column 1 (char 0)
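
The exception above is a secondary symptom: the postprocessing script parses a log that the crashed benchmark never finished writing. As a minimal sketch of a more defensive load (the helper name `load_accuracy_log` is hypothetical and not part of loadgen_postprocess.py), the script could treat an unparsable file as "no accuracy data":

```python
import json


def load_accuracy_log(path):
    """Return the parsed accuracy log, or None if the file is truncated.

    Hypothetical helper for illustration only; it is not the actual
    code in loadgen_postprocess.py.
    """
    try:
        with open(path) as accuracy_file:
            return json.load(accuracy_file)
    except ValueError:
        # Python 2.7 raises ValueError for malformed JSON; in Python 3,
        # json.JSONDecodeError is a subclass of ValueError.
        return None
```

This would only make the failure easier to read. It does not address the segmentation fault itself.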

With the target QPS set to 7, mlperf_log_summary.txt contains e.g.

================================================
MLPerf Results Summary
================================================
SUT name : SUT^@
Scenario : Offline
Mode     : Performance
Samples per second: 77.7004
Result is : INVALID
  Min duration satisfied : NO
  Min queries satisfied : Yes
Recommendations:
 * Increase expected QPS so the loadgen pre-generates a larger (coalesced) query.

================================================
Additional Stats
================================================
Min latency (ns)                : 5945914094
Max latency (ns)                : 5945914094
Mean latency (ns)               : 5945914094
50.00 percentile latency (ns)   : 5945914094
90.00 percentile latency (ns)   : 5945914094
95.00 percentile latency (ns)   : 5945914094
97.00 percentile latency (ns)   : 5945914094
99.00 percentile latency (ns)   : 5945914094
99.90 percentile latency (ns)   : 5945914094

================================================
Test Parameters Used
================================================
samples_per_query : 462
target_qps : 7
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 0
min_query_count : 1
max_query_count : 0
qsl_rng_seed : 3133965575612453542
sample_index_rng_seed : 665484352860916858
schedule_rng_seed : 3622009729038561421
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
print_timestamps : false
performance_issue_unique : false
performance_issue_same : false
performance_issue_same_index : 0
performance_sample_count : 1024

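samples_per_query is calculated as target_qps * 60 * 1.1, where 60 presumably corresponds to min_duration (60000 ms) expressed in seconds. With target_qps=7, samples_per_query=462, as shown above. With target_qps=8, samples_per_query=528, which exceeds the 500 images in the dataset and explains the segmentation fault.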
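
The arithmetic can be checked with a short sketch. The function and constant names are illustrative, and rounding the floating-point product is an assumption about how the value becomes an integer. Only the formula target_qps * 60 * 1.1 comes from the observation above.

```python
MIN_DURATION_S = 60   # assumed to be min_duration (60000 ms) in seconds
MARGIN = 1.1          # the 1.1 factor from the formula above
DATASET_SIZE = 500    # images in the "min" ImageNet validation subset


def samples_per_query(target_qps):
    # Round to absorb floating-point noise in the product (assumption).
    return int(round(target_qps * MIN_DURATION_S * MARGIN))


def exceeds_dataset(target_qps, dataset_size=DATASET_SIZE):
    # True when a single Offline query asks for more samples than exist.
    return samples_per_query(target_qps) > dataset_size


for qps in (7, 8):
    print(qps, samples_per_query(qps), exceeds_dataset(qps))
```

Under these assumptions, 7 is the largest integer target QPS that keeps samples_per_query within 500.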

However, rather than segfaulting, a better approach would be to load the 500 images, and process some images more than once.
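
One way to process some images more than once is to wrap the requested sample indices around the dataset size. The following is a sketch of the idea only. `wrap_sample_indices` is a hypothetical name, and neither LoadGen nor ov_mlperf is known to implement it this way.

```python
def wrap_sample_indices(requested, dataset_size):
    """Map `requested` sample slots onto `dataset_size` loaded images.

    Hypothetical illustration: indices past the end of the dataset wrap
    around, so some images are processed more than once.
    """
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    return [i % dataset_size for i in range(requested)]


# 528 requested samples on a 500-image dataset: the first 28 images repeat.
indices = wrap_sample_indices(528, 500)
print(len(indices), max(indices), indices[500], indices[527])
```

With this mapping every index stays within the loaded images, so no out-of-range access can occur regardless of the target QPS.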
