Add NumPy-style docstrings across all DashAI components by Irozuku · Pull Request #521 · DashAISoftware/DashAI

Irozuku · 2026-04-01T18:12:22Z

Summary

Adds comprehensive docstrings across all major DashAI components—including converters, explorers, models, metrics, generative models, generative tasks, tasks, dataloaders, and explainers. These docstrings follow the NumPy-style format, providing clear, consistent documentation of responsibilities, parameters, return values, and usage.

Additionally, docstrings now include references to original implementations (libraries) and/or research papers where the methods or models were introduced, offering better context and traceability.
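As a rough illustration of the layout described above, here is a minimal sketch of a NumPy-style docstring with Parameters, Returns, Raises, and References sections. The function `min_max_scale` and its arguments are hypothetical and are not taken from the DashAI codebase; only the section layout is the point.

```python
from typing import List, Sequence


def min_max_scale(
    values: Sequence[float], low: float = 0.0, high: float = 1.0
) -> List[float]:
    """Rescale numeric values linearly into the interval [low, high].

    Hypothetical helper used only to illustrate the docstring layout;
    it is not part of DashAI.

    Parameters
    ----------
    values : sequence of float
        Values to rescale. Must contain at least two distinct numbers.
    low : float, default=0.0
        Lower bound of the output interval.
    high : float, default=1.0
        Upper bound of the output interval.

    Returns
    -------
    list of float
        Rescaled values, in the same order as ``values``.

    Raises
    ------
    ValueError
        If all entries of ``values`` are equal, so the range is zero.

    References
    ----------
    - scikit-learn ``sklearn.preprocessing.MinMaxScaler``, which applies
      the same linear rescaling per feature.
    """
    smallest, largest = min(values), max(values)
    span = largest - smallest
    if span == 0:
        raise ValueError("values must contain at least two distinct numbers")
    # Map smallest -> low and largest -> high, linearly in between.
    return [low + (v - smallest) / span * (high - low) for v in values]
```

The References entry uses a leading dash, in line with the bullet style that later commits in this PR apply to reference lists.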

Type of Change

Check all that apply like this [x]:

Changes (by file)

converters/*: Added detailed NumPy-style docstrings and references to original implementations or papers.
explorers/*: Documented core classes and methods, including relevant references where applicable.
models/*: Added NumPy-style docstrings describing model interfaces, parameters, behavior, and source references.
metrics/*: Improved documentation for metric definitions, inputs, outputs, and references.
generative_models/*: Documented architecture, usage, and linked to original implementations/papers.
generative_tasks/*: Clarified task structure, inputs, outputs, and contextual references.
tasks/*: Added descriptions of task responsibilities and execution flow.
dataloaders/*: Documented data loading logic, expected formats, and outputs.
explainers/*: Added docstrings explaining interpretation methods along with relevant references.

Testing (optional)

No testing required; changes are strictly documentation and do not affect runtime behavior.

…tyle Add detailed multi-paragraph class descriptions with References sections for MistralModel (Jiang et al. 2023), MixtralModel (Jiang et al. 2024), QwenModel (Qwen Team 2024), and SmolLMModel (Allal et al. 2024); expand schema docstrings with quantization and variant configuration notes.

Add detailed multi-paragraph descriptions with References sections to all 13 scikit-learn model wrappers covering algorithm theory, strengths, and limitations (LogisticRegression, SVC, RandomForest, GradientBoosting, MLP, KNeighbors, Ridge, Linear/SVR, DecisionTree, DummyClassifier, etc.).

Add full NumPy-style docstrings to the two private llama_utils GPU-check helpers; expand weak docstrings on get_model_params_from_task, DistilBERT/OpusMT SavedModel.__init__, and save/load methods in BagOfWordsTextClassificationModel and SklearnLikeModel.

Add Parameters/Returns/Raises sections to fit, transform, get_output_type, and helper methods across base_converter, hugging_face_wrapper, imbalanced_learn_wrapper, bag_of_words, label_encoder, polynomial_features, tf_idf, the three imbalanced_learn converters, and character_replacer/nan_remover.

Expand one-liner class docstrings for LlamaSchema/Model, StableDiffusionV3 Schema/Model, StableDiffusionXL Schema/Model, TongyiZImage Schema/Model, StableDiffusionXLV1ControlNetSchema, SklearnLikeModel/Classifier/Regressor, and the nested MLP helper class. Audit now reports 0 issues.
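The commit message does not say which audit tool was used. As a minimal sketch of how such a check could work, assuming only the standard-library `ast` module, the following function lists classes and functions that have no docstring. The name `find_missing_docstrings` and the sample source are hypothetical.

```python
import ast
from typing import List


def find_missing_docstrings(source: str) -> List[str]:
    """Return the names of definitions in ``source`` that lack a docstring.

    Parameters
    ----------
    source : str
        Python source code to inspect.

    Returns
    -------
    list of str
        Names of classes and functions without a docstring.
    """
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        # Only classes and (async) functions can carry their own docstring here.
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return missing


# Hypothetical sample: one class and one function are documented, one method is not.
sample = '''
class Documented:
    """Has a docstring."""

    def method(self):
        return 1


def helper():
    """Also documented."""
'''

print(find_missing_docstrings(sample))
```

A real audit would likely also check section completeness (Parameters, Returns), which this sketch does not attempt.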

…locks .. math:: directives emit curly braces that the MDX acorn parser treats as JSX expressions, breaking doc generation. Replace all .. math:: blocks across 13 metric files with indented :: code blocks containing Unicode plain-text formulas, following the pattern used in partial_dependence.py.
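To make the replacement concrete, here is a sketch of a docstring that states its formula in an indented `::` literal block instead of a `.. math::` directive, so the text contains no curly braces. The `rmse` function is a hypothetical stand-in, not the DashAI metric class, and the exact formula notation used in the repository may differ.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences.

    The formula is given as a plain-text literal block::

        RMSE = sqrt( (1/n) * Σ_i (y_i - ŷ_i)^2 )

    Parameters
    ----------
    y_true : sequence of float
        Ground-truth values.
    y_pred : sequence of float
        Predicted values, same length as ``y_true``.

    Returns
    -------
    float
        The root mean squared error (lower is better).

    Raises
    ------
    ValueError
        If the inputs are empty or differ in length.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# A quick guard of the kind a docs pipeline could run (hypothetical check).
assert "{" not in rmse.__doc__ and "}" not in rmse.__doc__
```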

Irozuku added 30 commits March 31, 2026 00:37

docs: convert base model docstrings to NumPy style

588ac65

docs: convert base explorer docstrings to NumPy style and add sub-type class docstrings

6316791

docs: convert base converter docstrings to NumPy style and add category class docstrings

e97d4e1

docs: add NumPy-style method docstrings to relationship explorers

a33d5d7

docs: add NumPy-style method docstrings to distribution explorers

304eaaa

docs: add NumPy-style method docstrings to statistical and preview explorers

177178e

docs: add NumPy-style docstrings to scaling and encoding converters

7715e23

docs: add NumPy-style docstrings to dimensionality reduction converters

d00d2a8

docs: add NumPy-style docstrings to feature selection and imputation converters

fa73de6

docs: add NumPy-style docstrings to model category classes and sklearn models

83cdd2b

docs: add NumPy-style docstrings to dataloader utility gaps

c44de89

docs: add NumPy-style docstrings to kernel samplers and simple converters

5ec2a42

docs: add NumPy-style docstrings to sampling and HuggingFace converters

9af8f13

docs: add NumPy-style docstrings to HuggingFace generative text models

f9c307f

docs: add NumPy-style docstrings to HuggingFace transformer and ControlNet models

840de92

docs: add NumPy-style docstrings to tasks, explainability, and misc models

c323c28

docs: add schema docstring to PolynomialFeatures converter

eb65ada

docs: add missing schema and __init__ docstrings to all converter classes

b1e5f62

docs: add schema and _check_params docstrings to dataloader classes

c90b17d

docs: add missing __init__ and utility function docstrings to model and task files

887b141

docs: expand sklearn converter class and schema docstrings to NumPy style

6f5d586

Add detailed multi-paragraph class descriptions and schema docstrings across all 29 scikit-learn converter wrappers, following the DistilBERT pattern established for the codebase.

docs: expand imbalanced-learn converter docstrings to NumPy style

afe72fd

Add detailed class descriptions with References sections for SMOTE (Chawla et al. 2002), SMOTEENN (Batista et al. 2004), and RandomUnderSampler, plus expanded schema docstrings.

docs: expand explorer class and schema docstrings to NumPy style

82a85a9

Add detailed multi-paragraph descriptions to all 14 explorer classes covering chart types, use cases, and parameter guidance; add schema docstrings describing what each explorer configures.

docs: expand explainability class docstrings to NumPy style

0ea342b

Add detailed descriptions to base explainer classes and all three explainer implementations (KernelSHAP, PartialDependence, PermutationFeatureImportance) with literature References sections.

docs: expand task class docstrings to NumPy style

afcce3c

Add detailed descriptions to all 8 task classes covering input/output types, compatible metrics, and multi-paragraph context for each ML task type (classification, regression, translation, image generation, etc.).

docs: expand dataloader class and schema docstrings to NumPy style

37768b4

Add detailed class descriptions to CSVDataLoader, ExcelDataLoader, and JSONDataLoader covering parsing behaviour, multi-file handling, and split logic; expand schema docstrings with parameter configuration notes.

Irozuku added 24 commits March 31, 2026 14:42

docs: fix remaining weak docstrings found by post-audit sweep

9fb0671

Add NumPy Parameters/Returns sections to DummyTextClassificationModel fit/predict, EmbeddingConverter get_output_type/_process_batch, and TokenizerConverter _process_batch.

fix: apply pre-commit to multiple files

e82c0c2

fix: improve docstring formatting in Nystroem and PartialDependence classes

5b44b0d

docs: expand base metric class docstrings to NumPy style

40f19f4

Add multi-paragraph descriptions to BaseMetric, ClassificationMetric, RegressionMetric, and TranslationMetric covering MAXIMIZE semantics, compatible tasks, and helper function roles.

docs: expand classification metric class docstrings to NumPy style

ac3dc1b

Add multi-paragraph descriptions with formulas, value ranges, use-case guidance, and References sections to Accuracy, CohenKappa, F1, HammingDistance, LogLoss, Precision, Recall, and ROCAUC.

docs: expand regression metric class docstrings to NumPy style

977f0d5

Add multi-paragraph descriptions with formulas, value ranges, outlier sensitivity notes, and References sections to ExplainedVariance, MAE, MedianAbsoluteError, MSE, R2, and RMSE.

docs: expand non-ControlNet generative model docstrings to NumPy style

ee335c9

Expand class, schema, and __init__ docstrings for StableDiffusionV2Model, PixArtSigmaModel, and SDXLTurboModel following the established pattern.

docs: expand ControlNet model schema and class docstrings to NumPy style

cd4a915

Expand SD15DepthControlNetSchema/Model, SD15HEDControlNetSchema/Model, SD15OpenPoseControlNetSchema/Model, and SDXLCannyControlNetSchema/Model class docstrings. All __init__ methods were already documented.

fix: correct BagOfWords schema docstring and __init__ kwargs format

1a47693

Remove stale NumericalWrapperForText reference from schema docstring. Change kwargs : dict to **kwargs : dict in __init__ per NumPy standard.
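For reference, a small sketch of the `**kwargs : dict` entry as it would appear in an `__init__` docstring. The class `ExampleModel` is hypothetical and only illustrates the documented form; it does not reproduce the actual BagOfWords implementation.

```python
from typing import Any


class ExampleModel:
    """Hypothetical model wrapper, used only to show the ``**kwargs`` entry."""

    def __init__(self, **kwargs: Any) -> None:
        """Store the configuration passed by the caller.

        Parameters
        ----------
        **kwargs : dict
            Arbitrary keyword arguments, kept as the model configuration.
        """
        # Copy into a plain dict so later changes by the caller do not leak in.
        self.params = dict(kwargs)


model = ExampleModel(alpha=0.1, max_features=100)
print(model.params)
```

Writing the parameter name with its leading asterisks makes clear that the entry describes the collected keyword arguments rather than a single argument named `kwargs`.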

docs: rewrite BagOfWordsTextClassificationModel class docstring to match repo style

669dc54

docs: update partial dependence docstring to use literal block formatting

31fbd64

fix: apply pre-commit to various files

135220e

fix formatting of references in documentation across various modules to use a consistent bullet point style.

520266e

docs: annotate type hints with numpy arrays in metrics

7d054c2

Update type annotations in classification metric modules to explicitly use numpy arrays for predicted labels and other relevant parameters.

docs: update regression metrics to include numpy type hints

940a209

docs: add numpy type hints for target_sentences in BLEU, CHRF, and TER score methods

f17bbeb

docs: update type hints to use numpy for ndarray in metrics

c6e36a7

fix: replace Unicode multiplication and dash with ASCII equivalents

e8bbe0f

Merge remote-tracking branch 'origin/develop' into docs/components-docstring

b26f017

Merge remote-tracking branch 'origin/develop' into docs/components-docstring

4e37663

docs: enhance docstrings for Hugging Face text classification transformer methods

5f01f90

docs: enhance docstrings for DeBERTa-v3 and ModernBERT transformers

5011ea9

Irozuku added the documentation Improvements or additions to documentation label Apr 1, 2026

Irozuku force-pushed the docs/components-docstring branch from 92365ad to 5011ea9 Compare April 1, 2026 18:17

docs: format references in docstrings across various converters and models adding item list dash

dfab38b

cristian-tamblay approved these changes Apr 6, 2026

cristian-tamblay merged commit 04b7579 into develop Apr 6, 2026
19 checks passed

cristian-tamblay deleted the docs/components-docstring branch April 6, 2026 17:38

cristian-tamblay merged 59 commits into develop from docs/components-docstring
2 participants