Skip to content

Learning rate per stream #2028

@sophie-xhonneux

Description

@sophie-xhonneux

Describe the task. It can be a feature, documentation, etc.

We noticed some variables would benefit from different learning rates + modded nano gpt runs also noticed embedding networks benefitting from separately tuned learning rates.

Hedgedoc URL, if you are keeping notes, plots, logs in hedgedoc.

No response

Area

  • datasets, data readers, data preparation and transfer
  • model
  • science
  • infrastructure and engineering
  • evaluation, export and visualization
  • documentation

Metadata

Metadata

Assignees

No one assigned

    Labels

    modelRelated to model training or definition (not generic infra)

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions