
Add NumPy-style docstrings across all DashAI components #521

Merged
cristian-tamblay merged 59 commits into develop from docs/components-docstring
Apr 6, 2026

Irozuku (Collaborator) commented Apr 1, 2026

Summary

Adds comprehensive docstrings across all major DashAI components—including converters, explorers, models, metrics, generative models, generative tasks, tasks, dataloaders, and explainers. These docstrings follow the NumPy-style format, providing clear, consistent documentation of responsibilities, parameters, return values, and usage.

Additionally, docstrings now include references to original implementations (libraries) and/or research papers where the methods or models were introduced, offering better context and traceability.
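
As a rough illustration of the format described above, a NumPy-style docstring with a References section might look like the sketch below. The function `min_max_scale` is hypothetical and is not taken from DashAI; it only shows the section layout (Parameters, Returns, Raises, References) and a reference to a library implementation.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Rescale a sequence of numbers to a target range.

    Hypothetical helper used only to illustrate the docstring layout;
    it is not part of DashAI.

    Parameters
    ----------
    values : list of float
        Input values. Must contain at least one element.
    feature_range : tuple of (float, float), default=(0.0, 1.0)
        Desired ``(min, max)`` range of the rescaled values.

    Returns
    -------
    list of float
        Values mapped linearly so that the smallest input equals
        ``feature_range[0]`` and the largest equals ``feature_range[1]``.
        If all inputs are equal, every output is ``feature_range[0]``.

    Raises
    ------
    ValueError
        If ``values`` is empty.

    References
    ----------
    .. [1] scikit-learn, ``sklearn.preprocessing.MinMaxScaler``.
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Degenerate case: all inputs identical, map everything to the lower bound.
        return [float(low) for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]
```

Tools such as `help(min_max_scale)` or a Sphinx/numpydoc build read these sections directly from the docstring.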


Type of Change

Check all that apply like this [x]:

  • Backend change
  • Frontend change
  • CI / Workflow change
  • Build / Packaging change
  • Bug fix
  • Documentation

Changes (by file)

  • converters/*: Added detailed NumPy-style docstrings and references to original implementations or papers.
  • explorers/*: Documented core classes and methods, including relevant references where applicable.
  • models/*: Added NumPy-style docstrings describing model interfaces, parameters, behavior, and source references.
  • metrics/*: Improved documentation for metric definitions, inputs, outputs, and references.
  • generative_models/*: Documented architecture, usage, and linked to original implementations/papers.
  • generative_tasks/*: Clarified task structure, inputs, outputs, and contextual references.
  • tasks/*: Added descriptions of task responsibilities and execution flow.
  • dataloaders/*: Documented data loading logic, expected formats, and outputs.
  • explainers/*: Added docstrings explaining interpretation methods along with relevant references.

Testing (optional)

No testing required: the changes are strictly documentation and do not affect runtime behavior.

Irozuku added 30 commits March 31, 2026 00:37

…tyle

Add detailed multi-paragraph class descriptions and schema docstrings
across all 29 scikit-learn converter wrappers, following the DistilBERT
pattern established for the codebase.

Add detailed class descriptions with References sections for SMOTE
(Chawla et al. 2002), SMOTEENN (Batista et al. 2004), and
RandomUnderSampler, plus expanded schema docstrings.

Add detailed multi-paragraph descriptions to all 14 explorer classes
covering chart types, use cases, and parameter guidance; add schema
docstrings describing what each explorer configures.

Add detailed descriptions to base explainer classes and all three
explainer implementations (KernelSHAP, PartialDependence,
PermutationFeatureImportance) with literature References sections.

Add detailed descriptions to all 8 task classes covering input/output
types, compatible metrics, and multi-paragraph context for each ML task
type (classification, regression, translation, image generation, etc.).

Add detailed class descriptions to CSVDataLoader, ExcelDataLoader, and
JSONDataLoader covering parsing behaviour, multi-file handling, and
split logic; expand schema docstrings with parameter configuration notes.

…tyle

Add detailed multi-paragraph class descriptions with References sections
for MistralModel (Jiang et al. 2023), MixtralModel (Jiang et al. 2024),
QwenModel (Qwen Team 2024), and SmolLMModel (Allal et al. 2024); expand
schema docstrings with quantization and variant configuration notes.

Add detailed multi-paragraph descriptions with References sections to
all 13 scikit-learn model wrappers covering algorithm theory, strengths,
and limitations (LogisticRegression, SVC, RandomForest, GradientBoosting,
MLP, KNeighbors, Ridge, Linear/SVR, DecisionTree, DummyClassifier, etc.).

Add full NumPy-style docstrings to the two private llama_utils GPU-check
helpers; expand weak docstrings on get_model_params_from_task,
DistilBERT/OpusMT SavedModel.__init__, and save/load methods in
BagOfWordsTextClassificationModel and SklearnLikeModel.

Add Parameters/Returns/Raises sections to fit, transform, get_output_type,
and helper methods across base_converter, hugging_face_wrapper,
imbalanced_learn_wrapper, bag_of_words, label_encoder, polynomial_features,
tf_idf, the three imbalanced_learn converters, and character_replacer/nan_remover.

Irozuku added 24 commits March 31, 2026 14:42

Add NumPy Parameters/Returns sections to DummyTextClassificationModel
fit/predict, EmbeddingConverter get_output_type/_process_batch, and
TokenizerConverter _process_batch.

Add multi-paragraph descriptions to BaseMetric, ClassificationMetric,
RegressionMetric, and TranslationMetric covering MAXIMIZE semantics,
compatible tasks, and helper function roles.
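
This page does not show how `MAXIMIZE` is defined, so the following is only a sketch of one plausible reading: a class-level flag that says whether a larger score is better. `SketchMetric`, `SketchAccuracy`, `SketchMAE`, and `pick_best` are hypothetical names and are not DashAI's API.

```python
class SketchMetric:
    """Hypothetical base class; not DashAI's BaseMetric.

    Attributes
    ----------
    MAXIMIZE : bool
        Assumed meaning: ``True`` when a larger score is better,
        ``False`` when a smaller score is better.
    """

    MAXIMIZE = True


class SketchAccuracy(SketchMetric):
    """Score metric: higher is better, so the inherited flag stays True."""


class SketchMAE(SketchMetric):
    """Error metric: lower is better, so the flag is overridden."""

    MAXIMIZE = False


def pick_best(metric_cls, scores):
    """Return the name of the best run according to the metric direction.

    Parameters
    ----------
    metric_cls : type
        A ``SketchMetric`` subclass whose ``MAXIMIZE`` flag sets the direction.
    scores : dict of str to float
        Mapping from run name to the score obtained with that metric.

    Returns
    -------
    str
        Name of the run with the highest score if ``MAXIMIZE`` is true,
        otherwise the run with the lowest score.
    """
    chooser = max if metric_cls.MAXIMIZE else min
    return chooser(scores, key=scores.get)
```

Under this assumption, code that ranks runs can stay metric-agnostic and only consult the flag.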

Add multi-paragraph descriptions with formulas, value ranges, use-case
guidance, and References sections to Accuracy, CohenKappa, F1,
HammingDistance, LogLoss, Precision, Recall, and ROCAUC.

Add multi-paragraph descriptions with formulas, value ranges, outlier
sensitivity notes, and References sections to ExplainedVariance, MAE,
MedianAbsoluteError, MSE, R2, and RMSE.

Expand class, schema, and __init__ docstrings for StableDiffusionV2Model,
PixArtSigmaModel, and SDXLTurboModel following the established pattern.

Expand SD15DepthControlNetSchema/Model, SD15HEDControlNetSchema/Model,
SD15OpenPoseControlNetSchema/Model, and SDXLCannyControlNetSchema/Model
class docstrings. All __init__ methods were already documented.

Remove stale NumericalWrapperForText reference from schema docstring.
Change kwargs : dict to **kwargs : dict in __init__ per NumPy standard.
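
The `**kwargs : dict` change mentioned above follows the numpydoc convention of keeping the leading stars in the documented parameter name. A minimal sketch, using a hypothetical `SketchModel` class that is not part of DashAI:

```python
class SketchModel:
    """Hypothetical model wrapper used to illustrate ``**kwargs`` docs."""

    def __init__(self, name, **kwargs):
        """Store the model name and any extra configuration.

        Parameters
        ----------
        name : str
            Display name of the model.
        **kwargs : dict
            Additional configuration options, stored unchanged.
        """
        self.name = name
        # Copy so later changes to the caller's mapping do not leak in.
        self.options = dict(kwargs)
```

Writing the entry as `kwargs : dict` would instead describe a regular parameter named `kwargs`, which does not match the signature.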

Expand one-liner class docstrings for LlamaSchema/Model, StableDiffusionV3
Schema/Model, StableDiffusionXL Schema/Model, TongyiZImage Schema/Model,
StableDiffusionXLV1ControlNetSchema, SklearnLikeModel/Classifier/Regressor,
and the nested MLP helper class. Audit now reports 0 issues.

…locks

.. math:: directives emit curly braces that the MDX acorn parser
treats as JSX expressions, breaking doc generation. Replace all
.. math:: blocks across 13 metric files with indented :: code blocks
containing Unicode plain-text formulas, following the pattern used
in partial_dependence.py.
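
To make the replacement described above concrete, here is a sketch of a metric docstring that writes its formula in an indented `::` literal block with Unicode symbols instead of a `.. math::` directive. The function is a hypothetical stand-in, not the DashAI MAE metric, and the exact formula layout used in the PR is not shown on this page.

```python
def mean_absolute_error(y_true, y_pred):
    """Compute the mean absolute error between two sequences.

    The formula is written as a literal block in plain Unicode rather
    than a ``.. math::`` directive, so the docstring contains no curly
    braces::

        MAE = (1 / n) · Σᵢ |yᵢ − ŷᵢ|

    Parameters
    ----------
    y_true : list of float
        Ground-truth values.
    y_pred : list of float
        Predicted values, same length as ``y_true``.

    Returns
    -------
    float
        Mean of the absolute differences. Lower is better; 0 is a
        perfect prediction.

    Raises
    ------
    ValueError
        If the inputs are empty or differ in length.
    """
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return total / len(y_true)
```

A quick way to guard against regressions of this kind is to check that the docstring contains neither `{` nor `}`.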

Update type annotations in classification metric modules to explicitly use NumPy arrays for predicted labels and other relevant parameters.
Irozuku added the documentation label (Improvements or additions to documentation) on Apr 1, 2026
Irozuku force-pushed the docs/components-docstring branch from 92365ad to 5011ea9 on April 1, 2026 18:17
cristian-tamblay merged commit 04b7579 into develop on Apr 6, 2026
19 checks passed
cristian-tamblay deleted the docs/components-docstring branch on April 6, 2026 17:38