Abineshabee/watcher


The silent data watcher. Decorates your pipeline functions and tells you exactly what happened to your data — row counts, schema drift, null changes, memory usage, join explosions — automatically, with zero config.


The problem

You run a data pipeline. The output is wrong — but the real problem is you don’t know where it went wrong.

import pandas as pd

df = pd.DataFrame({
    "customer_id": range(1, 1000001),
    "status": (["active"] * 700000) + (["inactive"] * 300000),
    "amount": [100] * 1000000
})

orders = pd.DataFrame({
    "customer_id": range(1, 400001),  # 400,000 rows
    "order_value": range(1, 400001)
})

print("Input rows:", len(df))

df = df[df["status"] == "active"]
df = df.merge(orders, on="customer_id", how="inner")
df = df.dropna()

print("Output rows:", len(df))

Output:

Input rows: 1000000
Output rows: 400000

You can see the final number.  
But not the story behind it.

Which step dropped the rows? Was it a filter, a null drop, or a bad join? You have no idea without adding print statements everywhere and re-running the whole thing.
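For comparison, here is roughly what the manual approach looks like: a minimal sketch in plain pandas that records the row count after every step. It uses a scaled-down copy of the data above (1,000 customers instead of 1,000,000), and `trace_rows` is a hypothetical helper written for this illustration, not part of watcher.

```python
import pandas as pd

# Scaled-down version of the pipeline above (assumption: 1,000 rows, same proportions).
customers = pd.DataFrame({
    "customer_id": range(1, 1001),
    "status": ["active"] * 700 + ["inactive"] * 300,
    "amount": [100] * 1000,
})
orders = pd.DataFrame({
    "customer_id": range(1, 401),
    "order_value": range(1, 401),
})


def trace_rows(df, steps):
    """Hypothetical helper: apply each (name, function) step and record the row count after it."""
    counts = [("input", len(df))]
    for name, step in steps:
        df = step(df)
        counts.append((name, len(df)))
    return df, counts


steps = [
    ("filter active", lambda d: d[d["status"] == "active"]),
    ("inner join orders", lambda d: d.merge(orders, on="customer_id", how="inner")),
    ("dropna", lambda d: d.dropna()),
]

result, counts = trace_rows(customers, steps)

# Print each step with the change in rows relative to the previous step.
previous = None
for name, rows in counts:
    delta = "" if previous is None else f" ({rows - previous:+d})"
    print(f"{name:<18} {rows:>5}{delta}")
    previous = rows
```

On this data the filter removes 300 rows, the inner join removes another 300, and `dropna` removes none. Getting that answer meant restructuring the pipeline around the tracing helper, which is exactly the overhead a decorator avoids.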

watcher answers that — automatically.
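The general idea of decorator-based tracing can be sketched in a few lines of plain pandas. This is not watcher's implementation and says nothing about its internals or output format: `trace` and `describe_change` are hypothetical names used only to illustrate how wrapping a function lets you compare the DataFrame going in with the one coming out.

```python
import functools

import pandas as pd


def describe_change(before, after):
    """Summarise row, column, and null-count differences between two DataFrames."""
    return {
        "rows": (len(before), len(after)),
        "added_columns": sorted(set(after.columns) - set(before.columns)),
        "removed_columns": sorted(set(before.columns) - set(after.columns)),
        "nulls": (int(before.isna().sum().sum()), int(after.isna().sum().sum())),
    }


def trace(func):
    """Hypothetical decorator: print how a DataFrame -> DataFrame step changed its input."""
    @functools.wraps(func)
    def wrapper(df, *args, **kwargs):
        out = func(df, *args, **kwargs)
        print(f"{func.__name__}: {describe_change(df, out)}")
        return out
    return wrapper


@trace
def clean(df):
    return df.dropna()


raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "status": ["active", "inactive", "active", None],
})
cleaned = clean(raw)
```

The pipeline function itself stays untouched; all the bookkeeping lives in the wrapper. A real tool has far more to handle (memory, joins, nested calls, rendering), so treat this only as a mental model.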


Install

pip install dfwatcher                 # core only (pandas)
pip install "dfwatcher[rich]"         # + coloured terminal output
pip install "dfwatcher[full]"         # + Rich + psutil memory tracking

Quickstart

import pandas as pd
from watcher import watch, session

raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "status": ["active", "inactive", "active", None]
})

orders = pd.DataFrame({
    "customer_id": [1, 3],
    "amount": [250.0, 150.0]
})

@watch
def clean(df):
    return df.dropna()

@watch
def merge_orders(df):
    return df.merge(orders, on="customer_id", how="left")

@watch
def filter_active(df):
    return df[df["status"] == "active"]

# Run the pipeline inside a session to see the watcher summary.
if __name__ == "__main__":
    with session("nightly ETL") as s:
        df = clean(raw)
        df = merge_orders(df)
        df = filter_active(df)

# =====================================
# For more examples    : examples/
# For syntax and usage : docs/usage.md
# =====================================

Output — automatically, no extra code:


Documentation

For advanced pipeline patterns and debugging workflows, see the full documentation.


💬 Community & Support

Have questions, ideas, or want to share your pipeline results?

  • 💡 Feature requests → GitHub Discussions
  • 🐛 Bug reports → GitHub Issues
  • 📊 Showcase your pipelines → Discussions
  • 🙋 Help & usage → Discussions

👉 Join the conversation: https://github.com/Abineshabee/watcher/discussions


Development

git clone https://github.com/Abineshabee/watcher.git
cd watcher
pip install -e ".[dev]"
pytest tests/ -v --cov=watcher

CI runs on Python 3.10–3.13 across Ubuntu, Windows, and macOS on every push.


Roadmap

  • Polars backend
  • DuckDB backend
  • Notebook / HTML renderer
  • JSON handler for structured logging pipelines
  • watcher.config — global defaults without decorator arguments

License

MIT — see LICENSE.
