Thanks for sharing this wonderful work. The paper distinguishes GLIP from GroundingDINO and FIBER: the former is classified as an open-vocabulary object detection model, while the latter two are called bi-functional models (detection and referring expression comprehension). Since GLIP can also be used for DOD (e.g., in the OmniLabel paper), could you please discuss this distinction in more detail?