Excellent work! I am very interested in Multi-Modal Collaborative Editing. I have a question: why do the Mask_edit and Text_edit results differ noticeably in skin tone from the input image, while the Collaborative_edit result has a skin tone very close to the input? I look forward to your response. Thank you!
