Irregular Time Series Processing and Regression Pipeline (NASA ADS / arXiv:2311.04470)
This repository contains the published paper and implementation notebooks for a large-scale time series modeling project available on NASA ADS (arXiv:2311.04470).
The project implements a complete end-to-end analytical pipeline for extracting structured features from irregular observational time series and estimating physical distances using regression modeling.
Author: Kevin Mota da Costa
Portfolio: https://costakevinn.github.io
LinkedIn: https://linkedin.com/in/costakevinnn
The objective of this project was to design a robust time series and regression workflow capable of extracting meaningful patterns from large, noisy, irregularly sampled datasets.
The pipeline integrates:
- Irregular time series processing
- Period detection using Lomb–Scargle (Astropy)
- Phase-folding and feature engineering
- Supervised regression modeling
- Quality filtering and outlier handling
- Validation against external reference datasets (OGLE-IV)
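Period detection is the first step. The project uses Astropy's Lomb–Scargle implementation. The sketch below is only a dependency-light illustration: it implements the classic, unweighted Lomb–Scargle periodogram (Scargle's normalization) directly in NumPy. The function name lomb_scargle_power, the frequency grid, and the synthetic 2.5-day signal are assumptions made for this example and are not taken from the notebooks.

```python
import numpy as np


def lomb_scargle_power(t, y, frequencies):
    """Classic (unweighted) Lomb-Scargle periodogram for irregularly sampled data.

    t: observation times, y: measurements, frequencies: trial frequencies in
    cycles per unit of t. Returns power with Scargle's variance normalization.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y - y.mean()  # the classic form assumes a zero-mean signal
    omega = 2.0 * np.pi * np.asarray(frequencies, dtype=float)[:, None]  # shape (F, 1)
    wt = omega * t  # shape (F, N)
    # Offset omega * tau that makes the sine and cosine terms orthogonal.
    omega_tau = 0.5 * np.arctan2(np.sin(2.0 * wt).sum(axis=1), np.cos(2.0 * wt).sum(axis=1))
    arg = wt - omega_tau[:, None]
    c, s = np.cos(arg), np.sin(arg)
    power = 0.5 * ((c @ y) ** 2 / (c**2).sum(axis=1) + (s @ y) ** 2 / (s**2).sum(axis=1))
    return power / y.var()


# Synthetic, irregularly sampled signal (illustrative values only).
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0.0, 100.0, 300))
y = 1.5 * np.sin(2.0 * np.pi * t / 2.5) + rng.normal(0.0, 0.1, t.size)

freqs = np.linspace(0.05, 2.0, 5000)  # trial periods from 0.5 to 20 time units
power = lomb_scargle_power(t, y, freqs)
best_period = 1.0 / freqs[np.argmax(power)]
```

Astropy's LombScargle class (astropy.timeseries) generalizes this with a floating mean and per-point uncertainties, which matters for heteroscedastic data.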
Although the application is astrophysical, the workflow directly translates to financial time series, sensor monitoring, industrial analytics, and forecasting systems.
The analytical workflow follows a structured lifecycle:
Raw irregular time series → Period detection (Lomb–Scargle) → Frequency transformation → Phase-folding → Feature extraction → Regression modeling → Distance estimation → Benchmark validation
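Once a period is known, phase-folding stacks every cycle onto a single phase axis, and shape features can then be read from the folded curve. The sketch below assumes magnitudes as the measured quantity and uses a hypothetical feature set (log period, mean magnitude, peak-to-peak amplitude of a phase-binned median curve); the notebooks may extract different features. The function names phase_fold and folded_features are illustrative.

```python
import numpy as np


def phase_fold(t, period, t0=0.0):
    """Map observation times onto phase in [0, 1) for a trial period."""
    return np.mod((np.asarray(t, dtype=float) - t0) / period, 1.0)


def folded_features(t, mag, period, n_bins=20):
    """Hypothetical feature set computed from a phase-folded light curve."""
    phase = phase_fold(t, period)
    mag = np.asarray(mag, dtype=float)
    # Assign each point to a phase bin; the median per bin suppresses noise.
    bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    binned = np.array([np.median(mag[bins == b]) for b in range(n_bins) if np.any(bins == b)])
    return {
        "log10_period": float(np.log10(period)),
        "mean_mag": float(np.mean(mag)),
        "amplitude": float(binned.max() - binned.min()),  # peak-to-peak of the binned curve
    }


# Synthetic light curve with a known 2.5-day period (illustrative values only).
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 100.0, 300))
mag = 15.0 + 0.4 * np.sin(2.0 * np.pi * t / 2.5) + rng.normal(0.0, 0.02, t.size)
features = folded_features(t, mag, period=2.5)
```

Binning before measuring amplitude keeps single noisy points from setting the extremes of the folded curve.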
This structure mirrors production-grade time series engineering pipelines.
- 4,700+ real observational samples
- OGLE-IV reference catalog
- Irregular sampling
- Heteroscedastic measurement noise
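The exact quality cuts are not listed here. One common approach for heteroscedastic data, assumed in this sketch, is iterative sigma clipping in which each residual is scaled by that point's own uncertainty rather than by a single global scatter. The function weighted_sigma_clip and the synthetic data are hypothetical.

```python
import numpy as np


def weighted_sigma_clip(values, errors, n_sigma=3.0, max_iter=10):
    """Return a boolean mask of points kept after iterative sigma clipping.

    Residuals from the inverse-variance weighted mean are divided by each
    point's own uncertainty, so per-point (heteroscedastic) errors are respected.
    """
    values = np.asarray(values, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Non-finite measurements and non-positive uncertainties are never usable.
    valid = np.isfinite(values) & np.isfinite(errors) & (errors > 0)
    keep = valid.copy()
    for _ in range(max_iter):
        if not keep.any():
            break
        w = 1.0 / errors[keep] ** 2
        mean = np.sum(w * values[keep]) / np.sum(w)
        # Normalized residual for every valid point, including previously clipped ones.
        z = np.full(values.shape, np.inf)
        z[valid] = np.abs(values[valid] - mean) / errors[valid]
        new_keep = valid & (z < n_sigma)
        if np.array_equal(new_keep, keep):
            break  # converged: the mask no longer changes
        keep = new_keep
    return keep


rng = np.random.default_rng(7)
errors = rng.uniform(0.05, 0.5, 200)
values = 10.0 + rng.normal(0.0, errors)
values[[5, 50, 150]] += 5.0  # inject three gross outliers
mask = weighted_sigma_clip(values, errors)
```

Recomputing residuals for all valid points on each pass lets points that were clipped while the mean was still pulled by outliers re-enter on later passes.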
The project includes statistical validation against benchmark datasets to assess predictive reliability.
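Benchmark validation can be summarized by comparing each estimated distance with the OGLE-IV reference value for the same object. This sketch assumes the two catalogs are already cross-matched into aligned arrays and works with log10 residuals (in dex). It reports a median offset, a MAD-based robust scatter, and the peak of a Gaussian kernel density estimate of the residuals (scipy.stats.gaussian_kde). The function compare_with_reference and the synthetic data are illustrative only.

```python
import numpy as np
from scipy.stats import gaussian_kde


def compare_with_reference(estimated, reference):
    """Summarize agreement between estimated and reference distances.

    Residuals are taken in log10 space (dex), so over- and underestimates by the
    same factor count equally. Inputs must be aligned, cross-matched arrays.
    """
    resid = np.log10(np.asarray(estimated, dtype=float)) - np.log10(np.asarray(reference, dtype=float))
    median = float(np.median(resid))
    robust_scatter = float(1.4826 * np.median(np.abs(resid - median)))  # MAD scaled to a Gaussian sigma
    kde = gaussian_kde(resid)
    grid = np.linspace(resid.min(), resid.max(), 512)
    mode = float(grid[np.argmax(kde(grid))])  # peak of the smoothed residual distribution
    return {"median_offset_dex": median, "robust_scatter_dex": robust_scatter, "kde_mode_dex": mode}


# Synthetic example: estimates biased by +0.02 dex with 0.05 dex scatter.
rng = np.random.default_rng(3)
reference = rng.uniform(1.0, 50.0, 2000)
estimated = reference * 10 ** rng.normal(0.02, 0.05, reference.size)
summary = compare_with_reference(estimated, reference)
```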
- Irregular time series analysis
- Lomb–Scargle periodogram (Astropy)
- Phase-folding of periodic signals
- Feature engineering from temporal structure
- Statistical regression (SciPy curve_fit)
- Kernel Density Estimation
- Outlier filtering and quality control
- Benchmark validation
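The regression step fits a period–luminosity relation with SciPy's curve_fit. A minimal sketch, assuming a linear relation M = a * log10(P) + b and known per-star magnitude uncertainties; the coefficients used to generate the synthetic data are made up and are not the paper's results.

```python
import numpy as np
from scipy.optimize import curve_fit


def period_luminosity(log_period, slope, intercept):
    """Linear period-luminosity relation: M = slope * log10(P) + intercept."""
    return slope * log_period + intercept


# Synthetic sample with heteroscedastic magnitude errors (illustrative values only).
rng = np.random.default_rng(0)
period = 10 ** rng.uniform(-0.5, 1.5, 400)       # periods in days
log_p = np.log10(period)
sigma = rng.uniform(0.05, 0.3, period.size)      # per-star uncertainties (mag)
abs_mag = period_luminosity(log_p, -2.4, -1.5) + rng.normal(0.0, sigma)

# Weighted least squares: each point is weighted by its own uncertainty.
popt, pcov = curve_fit(period_luminosity, log_p, abs_mag, p0=(-2.0, -1.0),
                       sigma=sigma, absolute_sigma=True)
slope, intercept = popt
slope_err, intercept_err = np.sqrt(np.diag(pcov))
```

With absolute_sigma=True, the uncertainties are treated as absolute, so pcov reflects the stated errors instead of being rescaled by the residual scatter.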
Figure: Lomb–Scargle periodogram applied to irregular data
Figure: Phase-folded time series
Figure: Period–luminosity relationship
Figure: Comparison with the OGLE-IV benchmark
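A standard way to turn a period–luminosity fit into distances, assumed in this sketch, is to combine the predicted absolute magnitude M with the observed mean apparent magnitude m through the distance modulus m - M = 5 log10(d / 10 pc) + A, where A is extinction. Whether and how the project corrects for extinction is not stated here, so A defaults to zero below. The function names and the PL coefficients in the usage lines are hypothetical.

```python
import numpy as np


def distance_from_modulus(apparent_mag, absolute_mag, extinction=0.0):
    """Distance in parsecs from m - M = 5 log10(d / 10 pc) + A."""
    mu = np.asarray(apparent_mag, dtype=float) - np.asarray(absolute_mag, dtype=float) - extinction
    return 10.0 ** (mu / 5.0 + 1.0)


def distance_uncertainty(distance_pc, sigma_mu):
    """First-order error propagation: sigma_d = d * ln(10) / 5 * sigma_mu."""
    return distance_pc * np.log(10.0) / 5.0 * sigma_mu


# Hypothetical PL coefficients and measurements, for illustration only.
period_days, mean_apparent_mag = 5.0, 15.0
abs_mag = -2.4 * np.log10(period_days) - 1.5
d_pc = distance_from_modulus(mean_apparent_mag, abs_mag)
sigma_d_pc = distance_uncertainty(d_pc, sigma_mu=0.1)
```

Because distance depends exponentially on the modulus, a fixed magnitude error becomes a fixed fractional distance error (about 4.6% per 0.1 mag).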
- docs/paper.pdf: Published scientific paper (Portuguese)
- notebooks/time_series_distance_pipeline.ipynb: Complete pipeline (loading, period detection, feature engineering, regression, validation)
- notebooks/equation_43_derivation.ipynb: Supporting statistical derivations and modeling details
NASA ADS / arXiv:2311.04470 https://ui.adsabs.harvard.edu/abs/arXiv:2311.04470
Note: The original publication links to an older GitHub account. This repository hosts the updated implementation and paper under my current profile.
Python
- NumPy
- Pandas
- SciPy
- Astropy
- Lomb–Scargle periodogram
- Kernel Density Estimation
- Statistical regression
- Outlier filtering
- Matplotlib
- Large-scale irregular time series processing
- Feature engineering from periodic signals
- Regression modeling on noisy data
- Data validation against benchmark datasets
- Analytical reproducibility
- Structured ML-style pipeline design
This project is part of my Machine Learning portfolio: 👉 https://costakevinn.github.io
MIT License — see LICENSE for details.





