This repository was archived by the owner on Sep 9, 2025. It is now read-only.

Active learning sampling strategy #125

@aenglebert

Hello!

I am trying to use MedCAT with MedCATTrainer in an active learning setup to label a subset of a large set of unannotated French documents.

In the MedCATTrainer paper ( https://arxiv.org/pdf/1907.07322.pdf ), Section 3.2 (Active Learning) describes using selective certainty-based sampling to guide which documents are sampled for annotation.

But the only parameter I found related to active learning in MedCATTrainer is the "train_model_on_submit" parameter in ProjectAnnotateEntities.

train_model_on_submit = models.BooleanField(default=True, help_text='Active learning - configured CDB is trained '

From what I found, this parameter is responsible for a call to the train_medcat function when a document is submitted, but it seems to have no influence on the order/sampling of documents in the project annotation interface.

Is there another option I missed or misunderstood that allows for replicating the certainty-based sampling described in the paper?

Or does this part need to be done outside of MedCATTrainer, by creating a new project at each annotation step that contains only the sampled documents?
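
To make that second option concrete, here is a minimal sketch of the external sampling step I have in mind. It assumes that "certainty-based" means prioritizing the documents the model is least confident about, which is my reading and not something I could confirm from the paper or the code. `score_entities` is a hypothetical callable that returns per-entity confidence values for a text (for example, a wrapper around the model output); it, `document_certainty`, and `sample_least_certain` are illustrative names, not MedCAT or MedCATTrainer APIs.

```python
from typing import Callable, Dict, List, Sequence


def document_certainty(confidences: Sequence[float]) -> float:
    """Aggregate per-entity confidences into one document score.

    The mean is an assumption; min or median would be equally plausible.
    """
    if not confidences:
        # No detected entities: treat the document as maximally uncertain
        # so that it is reviewed early.
        return 0.0
    return sum(confidences) / len(confidences)


def sample_least_certain(
    documents: Dict[str, str],
    score_entities: Callable[[str], Sequence[float]],
    batch_size: int,
) -> List[str]:
    """Return the ids of the `batch_size` documents with the lowest certainty."""
    scored = [
        (document_certainty(score_entities(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    scored.sort()  # ascending certainty; ties are broken by document id
    return [doc_id for _, doc_id in scored[:batch_size]]
```

The returned ids would be the documents uploaded to the next project; after annotating them and retraining, the remaining pool would be rescored and the step repeated.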

By the way, thank you for this amazing tool!
