Can Large Vision-Language Models Correct Grounding Errors and Reason By Themselves?

Publication
CVPR 2025