I'm opening this issue to keep track of the work needed to port the content of PR #996 to the main branch.
The idea is to split that PR (which is huge and based on a quite old version of the codebase) and, starting from the current state of the main branch, port its main elements through a series of smaller PRs.
I'll keep this issue updated as I work on this.
Many changes are not strictly related to supporting distributed training but may benefit Avalanche in general.
- I'm starting with porting the modernized object detection/segmentation dataset, strategies, and metrics. I'll also port the generalized batch collate functionality.
Changes in Distributed Training PR #996:
Legend:
- 🔲 Not ported
- ⌛ Work in progress
- 💬 PR opened, discussion in progress
- ✔️ Merged into main branch
Base elements

Strategy and plugins
- `supports_distributed` plugin field (Add base elements to support distributed comms. Add supports_distributed plugin flag. #1370)
- `_distributed_check` strategy field and related `_check_distributed_training_compatibility()` check (#1370)
- `wrap_distributed_model` strategy lifecycle method. Called from `..._observation.py`
- `_obtain_common_dataloader_parameters` strategy method (unrelated to distributed training) (#1370)
- Logic in `_backward()`, `_forward()`, ..., while wrapping happens in `backward`, `forward`. Wrapper methods should be final, but Python is not strict on this (flexibility).

Models
- `avalanche_forward`: generalize using `is_multi_task_module` to consider DDP wrapping (#1370)

Detection

Data Loader

Loggers and metrics
- `evaluator` constructor parameter: `evaluator=default_evaluator()` -> `evaluator=default_evaluator` (#1370)
- Change the `evaluator` parameter value to use a factory (#1370)

Unit tests
- 🔲 Called in both environment-update and unit-test actions
Typing
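For context on the `supports_distributed` / `_check_distributed_training_compatibility()` items above, the intended pattern can be sketched roughly as follows. The field and method names come from the checklist; the class layout and the failure behavior are my assumption, not Avalanche's actual implementation:

```python
# Hypothetical sketch of a supports_distributed compatibility check.
# Only the names supports_distributed, _distributed_check, and
# _check_distributed_training_compatibility come from the issue text;
# everything else is illustrative.

class BasePlugin:
    # Plugins must opt in explicitly; unported plugins default to False.
    supports_distributed = False


class SyncedPlugin(BasePlugin):
    supports_distributed = True


class BaseStrategy:
    def __init__(self, plugins, distributed=False):
        self.plugins = plugins
        self._distributed_check = distributed
        if self._distributed_check:
            self._check_distributed_training_compatibility()

    def _check_distributed_training_compatibility(self):
        # Fail fast if any plugin has not been adapted for distributed runs.
        unsupported = [
            type(p).__name__
            for p in self.plugins
            if not getattr(p, "supports_distributed", False)
        ]
        if unsupported:
            raise RuntimeError(
                "Plugins not supporting distributed training: "
                + ", ".join(unsupported)
            )


strategy = BaseStrategy([SyncedPlugin()], distributed=True)  # passes the check
```

The point of the flag is that an unported plugin fails loudly at construction time instead of silently producing divergent state across processes.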
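The `backward`/`_backward` split mentioned under "Strategy and plugins" is a template-method arrangement: the public method is the wrapper and subclasses override only the underscore hook. A minimal sketch of that shape (the wrapper's actual distributed bookkeeping is omitted; method names follow the checklist, the rest is invented for the example):

```python
# Illustrative template-method split, as described in the checklist:
# the public wrapper should be treated as final, subclasses override
# only the underscore hook. Python cannot enforce "final", which is
# the flexibility trade-off the item mentions.

class Strategy:
    def forward(self):
        # Public wrapper: the single place for cross-cutting concerns
        # (in the PR's case, distributed model wrapping/synchronization).
        result = self._forward()
        return result

    def _forward(self):
        # Hook that subclasses are meant to override.
        return "base-forward"


class MyStrategy(Strategy):
    def _forward(self):
        return "custom-forward"


assert MyStrategy().forward() == "custom-forward"
```

Keeping the wrapper untouched means distributed logic added later to `forward` automatically applies to every subclass.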
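The `evaluator=default_evaluator()` -> `evaluator=default_evaluator` change under "Loggers and metrics" swaps a pre-built default instance for a factory. A minimal sketch of why that matters (the `Evaluator`/`Strategy` classes here are hypothetical; only the `default_evaluator` name comes from the checklist): a default evaluated once at function definition would be shared by every strategy, so passing the callable and invoking it per instance avoids hidden shared state.

```python
# Why a factory default beats a shared-instance default.
# Hypothetical minimal classes; not Avalanche's real code.

class Evaluator:
    def __init__(self):
        self.metrics = []


def default_evaluator():
    return Evaluator()


class Strategy:
    def __init__(self, evaluator=default_evaluator):
        # Accept a factory (preferred) or an already-built instance.
        self.evaluator = evaluator() if callable(evaluator) else evaluator


a, b = Strategy(), Strategy()
a.evaluator.metrics.append("acc")
# Each strategy gets its own evaluator; no hidden sharing.
assert b.evaluator.metrics == []
```

With the old `evaluator=default_evaluator()` signature, both `a` and `b` would have received the same `Evaluator` object and the appended metric would have leaked across strategies.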