A small Python module containing quick utility functions for standard ETL processes.
pip install gluestick
- NumPy
- Pandas
The repo includes a script that prints peak RSS (resident set size) in MiB for several gluestick workloads. It uses the same scenarios as tests/function_tests/test_memory_usage.py, so you can compare numbers before and after a change on your machine. Peak RSS is a rough signal, not a portable “this library always uses X MB” guarantee.
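To make "peak RSS in MiB" concrete, here is a minimal standard-library sketch of how such a figure can be read on Linux or macOS. It is an illustration only: scripts/memory_benchmark.py may measure differently (for example through memory-profiler), and the function name peak_rss_mib is hypothetical. The resource module is Unix-only, so this does not run on Windows.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB (Unix only).

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS,
    so the unit is normalised before converting to MiB.
    """
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return max_rss / (1024 * 1024)  # bytes -> MiB
    return max_rss / 1024  # KiB -> MiB


if __name__ == "__main__":
    before = peak_rss_mib()
    # Hypothetical workload: bytes multiplication writes every byte, so
    # roughly 64 MiB of pages are actually touched and counted in RSS.
    data = b"x" * (64 * 1024 * 1024)
    after = peak_rss_mib()
    # ru_maxrss is a high-water mark: deleting `data` would not lower it.
    print(f"peak RSS before: {before:.1f} MiB, after: {after:.1f} MiB")
```

Because the value is a process-wide high-water mark, it reflects everything the process has done so far, which is one reason peak RSS is only a rough signal.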
Setup (from the repository root):
pip install ".[test]"
This installs gluestick together with pytest and memory-profiler, which the benchmark script and the memory tests require.
Run the benchmark:
python scripts/memory_benchmark.py
For machine-readable output (e.g. to save and diff the results):
python scripts/memory_benchmark.py --json > before.json
# change code, then:
python scripts/memory_benchmark.py --json > after.json
Compare the two JSON objects (or use diff / jq) to see the peak MiB for each scenario before and after the change. Percent change is only meaningful when both runs use the same host under similar load.
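If jq is not at hand, a short Python script can compute the per-scenario percent change. This sketch assumes that each --json output is a flat object mapping scenario name to peak MiB; that layout is an assumption, so check the real output of scripts/memory_benchmark.py and adapt the loading code if it differs. The names percent_change and compare_peaks.py are hypothetical.

```python
import json
import sys


def percent_change(before: dict, after: dict) -> dict:
    """Return {scenario: percent change in peak MiB} for scenarios present in both runs.

    Assumes each dict maps a scenario name to a numeric peak in MiB.
    Scenarios that appear in only one run are skipped.
    """
    changes = {}
    for name in sorted(before.keys() & after.keys()):
        old, new = float(before[name]), float(after[name])
        if old == 0:
            continue  # percent change is undefined for a zero baseline
        changes[name] = (new - old) / old * 100.0
    return changes


def main(before_path: str, after_path: str) -> None:
    with open(before_path) as fh:
        before = json.load(fh)
    with open(after_path) as fh:
        after = json.load(fh)
    for name, pct in percent_change(before, after).items():
        old, new = float(before[name]), float(after[name])
        print(f"{name}: {old:.1f} -> {new:.1f} MiB ({pct:+.1f}%)")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Saved as compare_peaks.py (a hypothetical filename), it would be run as `python compare_peaks.py before.json after.json`.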
Sanity-check with tests:
pytest tests/function_tests/test_memory_usage.py -q
If pytest passes, the same workloads stay within the smoke-test RSS bands used in CI.
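An RSS band is simply a lower and upper bound on a scenario's peak. The sketch below shows the shape of such a check in pytest style. The scenario names, the limits in RSS_BANDS_MIB, and the helper out_of_band are all hypothetical; the real scenarios and bands live in tests/function_tests/test_memory_usage.py and are not reproduced here.

```python
# Hypothetical bands in MiB: (lower bound, upper bound) per scenario.
RSS_BANDS_MIB = {
    "scenario_a": (50.0, 200.0),
    "scenario_b": (50.0, 400.0),
}


def out_of_band(peaks_mib: dict, bands: dict) -> dict:
    """Return {scenario: (peak, (low, high))} for every measured peak outside its band.

    Scenarios without a measurement are ignored here; a stricter test
    might treat a missing measurement as a failure instead.
    """
    failures = {}
    for name, (low, high) in bands.items():
        peak = peaks_mib.get(name)
        if peak is None:
            continue
        if not (low <= peak <= high):
            failures[name] = (peak, (low, high))
    return failures


def test_peaks_within_bands():
    # Hypothetical measurements; a real test would obtain these by running each workload.
    measured = {"scenario_a": 120.0, "scenario_b": 310.0}
    assert out_of_band(measured, RSS_BANDS_MIB) == {}
```

A lower bound as well as an upper bound helps catch a scenario that silently stopped doing its work, not just one that grew.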
This project is maintained by the hotglue team. We welcome contributions from the community via issues and pull requests.
If you wish to chat with our team, feel free to join our Slack!