Hi, thank you for open-sourcing this excellent work! Diffusion-based semantic scene completion for point clouds is a truly interesting and promising direction. I have a question regarding the model architecture. While exploring the code, I noticed that MinkUNetDiff is instantiated with in_channels set to 3:
self.model = minknet.MinkUNetDiff(in_channels=3, out_channels=self.hparams['model']['out_dim'])
Given that the model performs semantic prediction, I was wondering: does the semantic information serve as an additional attribute/condition during the diffusion process, or is it handled differently in the network architecture? I would appreciate any clarification on how the semantic prediction is integrated into the diffusion framework. Looking forward to your insights!
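To make the question concrete, here is a minimal sketch of the two input layouts I have in mind. It is not taken from the repository: `num_classes`, the array names, and the one-hot encoding are all my own assumptions, used only to show how the feature width would differ if semantic labels were fed in as an extra per-point attribute.

```python
import numpy as np

num_points = 5
num_classes = 20  # hypothetical value, not read from the repository's config

# Layout A: per-point features are only the xyz coordinates.
# This is what in_channels=3 suggests to me.
xyz = np.random.rand(num_points, 3).astype(np.float32)
feats_xyz_only = xyz

# Layout B (assumed alternative): xyz concatenated with a one-hot semantic label.
# If semantics were an input attribute, I would expect in_channels = 3 + num_classes.
labels = np.random.randint(0, num_classes, size=num_points)
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
feats_with_semantics = np.concatenate([xyz, one_hot], axis=1)

print(feats_xyz_only.shape)        # (5, 3)
print(feats_with_semantics.shape)  # (5, 23)
```

Since the code uses in_channels=3, I assume it is closer to layout A, with the semantics coming out of the network through out_channels rather than going in as input, but I would like to confirm this.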