This repository was archived by the owner on Sep 1, 2024. It is now read-only.

About #20

@dengyuanjie

Thank you very much for your excellent work.
One thing I am confused about is the definition of the cross-modal loss and the co-separation loss. In train.py, why is a random number compared against opt.gt_percentage to decide which audio features are used (audio_embedding_A1_pred or audio_embedding_A1_gt)? According to the method described in the paper, shouldn't the predicted features always be used?

```python
def get_coseparation_loss(output, opt, loss_triplet):
    if random.random() > opt.gt_percentage:
        audio_embeddings_A1 = output['audio_embedding_A1_pred']
        audio_embeddings_A2 = output['audio_embedding_A2_pred']
        audio_embeddings_B1 = output['audio_embedding_B1_pred']
        audio_embeddings_B2 = output['audio_embedding_B2_pred']
    else:
        audio_embeddings_A1 = output['audio_embedding_A1_gt']
        audio_embeddings_A2 = output['audio_embedding_A2_gt']
        audio_embeddings_B1 = output['audio_embedding_B_gt']
        audio_embeddings_B2 = output['audio_embedding_B_gt']

    coseparation_loss = (
        loss_triplet(audio_embeddings_A1, audio_embeddings_A2, audio_embeddings_B1)
        + loss_triplet(audio_embeddings_A1, audio_embeddings_A2, audio_embeddings_B2)
    )
    return coseparation_loss


def get_crossmodal_loss(output, opt, loss_triplet):
    identity_feature_A = output['identity_feature_A']
    identity_feature_B = output['identity_feature_B']
    if random.random() > opt.gt_percentage:
        audio_embeddings_A1 = output['audio_embedding_A1_pred']
        audio_embeddings_A2 = output['audio_embedding_A2_pred']
        audio_embeddings_B1 = output['audio_embedding_B1_pred']
        audio_embeddings_B2 = output['audio_embedding_B2_pred']
    else:
        audio_embeddings_A1 = output['audio_embedding_A1_gt']
        audio_embeddings_A2 = output['audio_embedding_A2_gt']
        audio_embeddings_B1 = output['audio_embedding_B_gt']
        audio_embeddings_B2 = output['audio_embedding_B_gt']
    crossmodal_loss = (
        loss_triplet(audio_embeddings_A1, identity_feature_A, identity_feature_B)
        + loss_triplet(audio_embeddings_A2, identity_feature_A, identity_feature_B)
        + loss_triplet(audio_embeddings_B1, identity_feature_B, identity_feature_A)
        + loss_triplet(audio_embeddings_B2, identity_feature_B, identity_feature_A)
    )
    return crossmodal_loss
```
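
To make the question concrete: as far as I can tell, the `random.random() > opt.gt_percentage` check picks the predicted embeddings with probability `1 - opt.gt_percentage` and the ground-truth ones otherwise. Below is a minimal sketch of just that selection, assuming `gt_percentage` is a float in [0, 1]. The helper name `fraction_using_predictions`, the trial count, and the seed are made up for illustration and are not part of train.py.

```python
import random


def fraction_using_predictions(gt_percentage, trials=100_000, seed=0):
    """Estimate how often `random.random() > gt_percentage` is true.

    Hypothetical helper for illustration only; it mirrors the branch
    condition quoted above but is not code from the repository.
    """
    rng = random.Random(seed)  # local generator so the estimate is repeatable
    hits = sum(rng.random() > gt_percentage for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        share = fraction_using_predictions(p)
        print(f"gt_percentage={p}: predicted embeddings used in ~{share:.3f} of calls")
```

If this reading is right, the predicted features would be used on every call only when `opt.gt_percentage` is 0, which is why I am asking about the intended setting.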
