What's the biggest headache when training or deploying a large language model? OOM (Out of Memory).
This project provides precise memory requirement estimation tools to help you:
- ⚡ Avoid Resource Waste - Accurately estimate VRAM needs, no more over-provisioning
- 🎯 Quick Decision Making - Rapidly determine whether a model can run on existing hardware
- 💰 Cost Optimization - Plan GPU resources properly and reduce cloud service costs
- 📊 Multi-Scenario Support - Covers training, inference, quantization, and more
- Precise Memory Estimation - Accurate calculation based on model architecture, precision, batch size, and other parameters
- Multi-Precision Support - Full coverage of FP32, FP16, BF16, INT8, INT4
- Training & Inference Modes - Separate calculations for different scenarios
- Optimizer State Calculation - Includes additional overhead from optimizers like Adam, AdamW
- Activation Estimation - Considers intermediate activations in forward propagation
- Visual Display - Clear charts showing memory distribution
Backend
- Python 3.10+ (managed with uv)
- Flask / FastAPI
- Scientific computing libraries (NumPy, etc.)
Frontend
- Next.js 14+
- React 18+
- TypeScript
- Tailwind CSS
For rough estimates, use these rules of thumb:
| Scenario | Formula | Example |
|---|---|---|
| Inference | Memory ≈ Parameters × Precision | 7B model × FP16 (2 bytes) ≈ 14 GB |
| Training | Memory ≈ Inference Memory × 4~6 | 14 GB × 5 ≈ 70 GB |
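As a sanity check, the two rules of thumb can be expressed as a tiny Python helper. The function name and the default training factor of 5 are illustrative choices, not part of this project's API:

```python
def rough_memory_gb(params_billion: float, bytes_per_param: float = 2.0,
                    training: bool = False, training_factor: float = 5.0) -> float:
    """Apply the rule-of-thumb formulas from the table above.

    Inference: parameters x bytes per parameter (1e9 params x 2 bytes ~= 2 GB).
    Training: inference memory x ~4-6 (gradients, optimizer state, activations).
    """
    inference_gb = params_billion * bytes_per_param
    return inference_gb * training_factor if training else inference_gb

print(rough_memory_gb(7))                 # 7B @ FP16 -> 14.0 GB
print(rough_memory_gb(7, training=True))  # -> 70.0 GB
```

This reproduces the table's examples: a 7B model at FP16 needs roughly 14 GB for inference and about 70 GB for training.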
The memory calculator considers the following factors:
| Factor | Description | Impact |
|---|---|---|
| Model Parameters | Model size (e.g., 7B, 13B, 70B) | Base memory footprint |
| Data Precision | FP32 (4 bytes) / FP16 (2 bytes) / INT8 (1 byte) | Directly affects weight storage |
| Batch Size | Number of samples processed in parallel | Affects activation size |
| Sequence Length | Maximum length of input/output text | Affects KV Cache and activations |
| Optimizer State | Extra state from optimizers like Adam | Typically 2-3× parameters during training |
| Gradients | Gradient storage for backpropagation | Equal to parameter size |
| Activations | Intermediate results in forward pass | Related to layer count and batch size |
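To see how these factors combine, here is a rough per-component sketch in Python. The constants are illustrative assumptions (Adam keeping two FP32 moments per parameter, a flat activation multiplier of 16), not the calculator's actual formulas:

```python
GB = 1024 ** 3

def training_memory_breakdown(n_params: float, bytes_per_param: int = 2,
                              batch: int = 1, seq_len: int = 2048,
                              hidden: int = 4096, layers: int = 32) -> dict:
    """Rough per-factor training memory breakdown, in GB."""
    weights = n_params * bytes_per_param
    gradients = n_params * bytes_per_param   # equal to parameter size
    optimizer = n_params * 8                 # Adam: two FP32 moments per param
    # Crude activation term: grows with batch, sequence length, width, depth.
    activations = 16 * batch * seq_len * hidden * layers * bytes_per_param
    parts = {"weights": weights, "gradients": gradients,
             "optimizer": optimizer, "activations": activations}
    parts["total"] = sum(parts.values())
    return {k: round(v / GB, 2) for k, v in parts.items()}

print(training_memory_breakdown(7e9))  # optimizer state dominates at batch 1
```

Even this crude sketch shows why training costs 4-6× inference: gradients match the weights, and Adam's FP32 state alone is ~4× the FP16 weights.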
Recommended: Memory Requirements for LLM Training and Inference
No installation needed: visit the Live Demo to start using it immediately.
Prerequisites
- Git
- Python 3.10+
- Node.js 18+ (LTS)
- uv (Python package manager)
One-Click Setup
```shell
# 1. Clone the repository
git clone https://github.com/amazingchow/LLMToolset.git
cd LLMToolset

# 2. Start backend (Terminal 1)
cd backend
uv venv && uv sync  # Install dependencies
make dev            # Start service (http://127.0.0.1:15050)

# 3. Start frontend (Terminal 2)
cd frontend
npm install         # Install dependencies
npm run dev         # Start service (http://localhost:13031)
```

Access the Application
Open your browser and visit http://localhost:13031
- Select a model (e.g., Qwen3-8B-Base)
- Choose precision (FP32 / FP16 / BF16 / INT8 / INT4)
- Set batch size and sequence length
- Select mode (Inference / Training)
- Click "Calculate" to view results
- Memory Calculator - Training & inference memory estimation
- Model Quantization Tool - INT8/INT4 quantization support
- Performance Benchmarking - Multi-hardware performance comparison
- Cost Estimator - Cloud service cost prediction
- Visualization Dashboard - Real-time resource monitoring
- Docker Deployment - Simplified deployment process
- API Endpoints - RESTful API support
All contributions are welcome!
Ways to Participate
- 🐛 Report bugs
- 💡 Suggest new features
- 📝 Improve documentation
- 🔧 Submit pull requests
Development Workflow
- Fork this repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Q: Why does the calculation differ from actual usage?
A: Memory estimation is affected by multiple factors:
- Framework overhead (PyTorch/TensorFlow, etc.)
- Model implementation details
- Compilation optimizations
- System caching
Use the estimates as a reference and test before actual deployment.
Q: Which model architectures are supported?
A: Currently mainly supports Transformer-based models, including:
- GPT series
- BERT series
- LLaMA / Qwen / Mistral, etc.
- T5 / BART, etc.
Support for other architectures (like Mamba) is under development.
Q: How do I contribute new model configurations?
A: Add a JSON configuration file in the backend/models/ directory with model parameters, layer count, etc., then submit a PR.
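For illustration, a minimal configuration might look like the sketch below; the field names here are hypothetical, so check an existing file in backend/models/ for the actual schema:

```json
{
  "name": "Example-7B",
  "num_parameters": 7000000000,
  "num_layers": 32,
  "hidden_size": 4096,
  "num_attention_heads": 32,
  "vocab_size": 32000,
  "max_sequence_length": 4096
}
```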
This tool provides estimates, not precise measurements. Actual memory usage is affected by hardware, software, configuration, and other factors. Always conduct actual testing and performance analysis before production deployment.
This project is open-sourced under the MIT License.
Made with ❤️ by @amazingchow
