Issue: Bug/Performance Issue [Custom Images] - training on a dexnet-compatible dataset results in GQCNN being unable to predict good grasps ('Pred nonzero' is always 0) #128

@aprath1

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Python version: 2.7.12
  • Installed using pip or ROS: pip
  • Camera: default

Describe what you are trying to do
I am trying to train a GQCNN from scratch on a custom dataset, and also to fine-tune a pretrained GQCNN_2.0 on a custom dataset. (The datasets are created using the dex-net API.)

Describe current behavior
Training or fine-tuning on the dataset (also when optimizing the CNN layers) leaves the network unable to predict any good grasps. In the log output, 'Pred nonzero' is always 0, even after 5 to 10 iterations of fine-tuning. Is this normal behavior?

Describe the expected behavior
I expect the network to predict at least a few of the available good grasps. Interestingly, when I keep the layers up to fc3 or fc4 as the base layers and do NOT optimize them, the network seems to fine-tune properly and predicts some good grasps, but the error rate is still high.

Describe the input images
The input dataset is generated from a dexnet-compatible HDF5 database using the dex-net API. Source of the database: https://dougsm.github.io/egad/ (please see the section on dex-net compatible data).

Describe the physical camera setup
Not applicable; the images are generated using the dex-net API.

Other info / logs
A few lines of the training log:

GQCNNTrainerTF INFO     Step took 2.304 sec.
GQCNNTrainerTF INFO     Max 0.23993634
GQCNNTrainerTF INFO     Min 0.14524038
GQCNNTrainerTF INFO     Pred nonzero 0
GQCNNTrainerTF INFO     True nonzero 15
GQCNNTrainerTF INFO     Step 27312 (epoch 1.426), 0.02 s
GQCNNTrainerTF INFO     Minibatch loss: 0.478, learning rate: 0.009025
GQCNNTrainerTF INFO     Minibatch error: 11.719
GQCNNTrainerTF INFO     Step took 2.369 sec.
GQCNNTrainerTF INFO     Max 0.23774128
GQCNNTrainerTF INFO     Min 0.19348052
GQCNNTrainerTF INFO     Pred nonzero 0
GQCNNTrainerTF INFO     True nonzero 80
GQCNNTrainerTF INFO     Step 27313 (epoch 1.426), 0.02 s
GQCNNTrainerTF INFO     Minibatch loss: 1.077, learning rate: 0.009025
GQCNNTrainerTF INFO     Minibatch error: 62.5
GQCNNTrainerTF INFO     Step took 2.158 sec.
GQCNNTrainerTF INFO     Max 0.23704815
GQCNNTrainerTF INFO     Min 0.16592737
GQCNNTrainerTF INFO     Pred nonzero 0
GQCNNTrainerTF INFO     True nonzero 45
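
A quick arithmetic check on the training log above, offered as a sketch rather than a diagnosis. It assumes a batch size of 128 (inferred from the numbers, not stated in the log), that each 'True nonzero' line belongs to the 'Minibatch error' line that follows it, and that 'Max' is the largest class-1 softmax output in the batch. Under those assumptions, the logged error is exactly what a classifier that labels every grasp as negative would score:

```python
# Sketch only: BATCH_SIZE is inferred from the log values, not read from any config.
BATCH_SIZE = 128


def error_if_all_negative(true_nonzero, batch_size=BATCH_SIZE):
    """Percent error of a classifier that predicts class 0 for every sample."""
    return 100.0 * true_nonzero / batch_size


# (True nonzero, Minibatch error on the following step), copied from the training log.
logged_pairs = [(15, 11.719), (80, 62.5)]

for true_nonzero, logged_error in logged_pairs:
    predicted_error = error_if_all_negative(true_nonzero)
    # Compare with a tolerance because the log prints only three decimals.
    matches = abs(predicted_error - logged_error) < 1e-3
    print(true_nonzero, predicted_error, logged_error, matches)

# Assumption: 'Max' is the largest class-1 probability in the batch. With two
# classes, class 1 can only win the argmax if its probability exceeds 0.5.
logged_max = [0.23993634, 0.23774128, 0.23704815]
can_predict_positive = any(p > 0.5 for p in logged_max)
print("any positive prediction possible:", can_predict_positive)
```

If these assumptions hold, the minibatch error in this log simply tracks the share of positive labels in each batch, which would fit 'Pred nonzero' staying at 0.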

A few lines of the fine-tuning log (fc3 set as the base layer, using the old format for layers up to fc3, and also optimizing the base layers):

10-04 11:56:09 GQCNNTrainerTF INFO     Step 191576 (epoch 9.999), 0.08 s
10-04 11:56:09 GQCNNTrainerTF INFO     Minibatch loss: 0.433, learning rate: 0.004633
10-04 11:56:09 GQCNNTrainerTF INFO     Minibatch error: 13.281
10-04 11:56:10 GQCNNTrainerTF INFO     Step took 1.242 sec.
10-04 11:56:10 GQCNNTrainerTF INFO     Max 0.25177836
10-04 11:56:10 GQCNNTrainerTF INFO     Min 0.16215596
10-04 11:56:10 GQCNNTrainerTF INFO     Pred nonzero 0
10-04 11:56:10 GQCNNTrainerTF INFO     True nonzero 34
10-04 11:56:10 GQCNNTrainerTF INFO     Step 191577 (epoch 9.999), 0.07 s
10-04 11:56:10 GQCNNTrainerTF INFO     Minibatch loss: 0.577, learning rate: 0.004633
10-04 11:56:10 GQCNNTrainerTF INFO     Minibatch error: 26.563
10-04 11:56:11 GQCNNTrainerTF INFO     Step took 1.171 sec.
10-04 11:56:11 GQCNNTrainerTF INFO     Max 0.25264603
10-04 11:56:11 GQCNNTrainerTF INFO     Min 0.18788987
10-04 11:56:11 GQCNNTrainerTF INFO     Pred nonzero 0
10-04 11:56:11 GQCNNTrainerTF INFO     True nonzero 49
10-04 11:56:11 GQCNNTrainerTF INFO     Step 191578 (epoch 10.0), 0.06 s
10-04 11:56:11 GQCNNTrainerTF INFO     Minibatch loss: 0.709, learning rate: 0.004633
10-04 11:56:11 GQCNNTrainerTF INFO     Minibatch error: 38.281
10-04 11:56:13 GQCNNTrainerTF INFO     Step took 1.36 sec.
10-04 11:56:13 GQCNNTrainerTF INFO     Max 0.25366336
10-04 11:56:13 GQCNNTrainerTF INFO     Min 0.17693533
10-04 11:56:13 GQCNNTrainerTF INFO     Pred nonzero 0
10-04 11:56:13 GQCNNTrainerTF INFO     True nonzero 16
10-04 11:56:13 GQCNNTrainerTF INFO     Step 191579 (epoch 10.0), 0.07 s
10-04 11:56:13 GQCNNTrainerTF INFO     Minibatch loss: 0.423, learning rate: 0.004633
10-04 11:56:13 GQCNNTrainerTF INFO     Minibatch error: 12.5
10-04 11:56:14 GQCNNTrainerTF INFO     Step took 1.24 sec.
10-04 11:56:14 GQCNNTrainerTF INFO     Max 0.25436333
10-04 11:56:14 GQCNNTrainerTF INFO     Min 0.1827491
10-04 11:56:14 GQCNNTrainerTF INFO     Pred nonzero 0
10-04 11:56:14 GQCNNTrainerTF INFO     True nonzero 10
10-04 11:56:14 GQCNNTrainerTF INFO     Step 191580 (epoch 10.0), 0.07 s
10-04 11:56:14 GQCNNTrainerTF INFO     Minibatch loss: 0.372, learning rate: 0.004633
10-04 11:56:14 GQCNNTrainerTF INFO     Minibatch error: 7.813

Another interesting thing is that the softmax output does not look right: of the 2 outputs, the 1st value is always around 0.7 and the 2nd is around 0.3 (this varies somewhat between training runs due to the random initialization of the weights).
Sample softmax output:

array([[0.7649399 , 0.23506004],
       [0.7651925 , 0.23480749],
       [0.76295185, 0.23704815],
       [0.7643285 , 0.23567156],
       [0.7630225 , 0.23697755],
       [0.7642536 , 0.23574635],
       [0.76532423, 0.23467574],
       [0.76295376, 0.23704618],
       [0.7632632 , 0.23673679],
       [0.76498514, 0.23501493],
       [0.7632064 , 0.2367936 ],
       [0.7959242 , 0.20407586],
       [0.7641547 , 0.23584531],
       [0.76448244, 0.23551749],
       [0.76394135, 0.23605862],
       [0.7647108 , 0.23528923],
       [0.7639811 , 0.23601893],
       [0.7649897 , 0.23501036],
       [0.7647293 , 0.23527072],
       [0.7651613 , 0.23483868],
       [0.76307136, 0.23692863],
       [0.7640458 , 0.23595421],
       [0.76476514, 0.23523483],
       [0.7672727 , 0.23272723],
       [0.7630191 , 0.2369809 ],
       [0.7645683 , 0.23543172],
       [0.7641252 , 0.2358748 ],
       [0.7639672 , 0.23603278],
       [0.7635745 , 0.23642555],
       [0.79914796, 0.20085205],
       [0.7640747 , 0.23592529],
       [0.76295626, 0.23704374],
       [0.7648026 , 0.23519741],
       [0.76468086, 0.23531915],
       [0.79236376, 0.20763627],
       [0.763892  , 0.23610799],
       [0.76452196, 0.23547806],
       [0.76323694, 0.2367631 ],
       [0.76363677, 0.23636323],
       [0.7694154 , 0.23058464],
     .......
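
To make the symptom concrete, here is a small sketch that takes a few rows copied from the sample above and checks what a two-class argmax would do with them. It assumes column 0 is the "bad grasp" class and column 1 the "good grasp" class, and that 'Pred nonzero' counts rows where class 1 wins; neither is confirmed in the logs. The logit gap is just log(p0 / p1), the difference between the two pre-softmax scores.

```python
import math

# A few rows copied from the softmax sample: (p_class0, p_class1).
rows = [
    (0.7649399, 0.23506004),
    (0.7651925, 0.23480749),
    (0.76295185, 0.23704815),
    (0.7959242, 0.20407586),
    (0.79914796, 0.20085205),
    (0.7694154, 0.23058464),
]

# Argmax per row: index of the larger probability.
predicted = [0 if p0 >= p1 else 1 for p0, p1 in rows]
pred_nonzero = sum(predicted)

# Difference between the two logits implied by each softmax row.
logit_gaps = [math.log(p0 / p1) for p0, p1 in rows]

# How much the class-1 probability varies across these inputs.
p1_values = [p1 for _, p1 in rows]
p1_spread = max(p1_values) - min(p1_values)

print("pred nonzero:", pred_nonzero)
print("min logit gap:", min(logit_gaps))
print("class-1 spread:", p1_spread)
```

Every row favours class 0 by a logit gap of more than 1, and the class-1 probability barely moves between inputs. One possible reading, stated as a hypothesis only, is that the output depends very little on the input image and pose.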

Hi @visatish, could you please let me know whether this is normal behavior? Any clue as to what could be the reason for it?
