What's the biggest headache when training or deploying a large language model? OOM (Out of Memory).
This project provides precise memory requirement estimation tools to help you:
- ⚡ Avoid Resource Waste - Accurately estimate VRAM needs, no more over-provisioning
- 🎯 Quick Decision Making - Rapidly determine whether a model can run on existing hardware
- 💰 Cost Optimization - Plan GPU resources properly and reduce cloud service costs
- 📊 Multi-Scenario Support - Covers training, inference, quantization, and more
- Precise Memory Estimation - Accurate calculation based on model architecture, precision, batch size, and other parameters
- Multi-Precision Support - Full coverage of FP32, FP16, BF16, INT8, INT4
- Training & Inference Modes - Separate calculations for different scenarios
- Optimizer State Calculation - Includes additional overhead from optimizers like Adam, AdamW
- Activation Estimation - Considers intermediate activations in forward propagation
- Visual Display - Clear charts showing memory distribution
Backend
- Python 3.10+ (managed with uv)
- Flask / FastAPI
- Scientific computing libraries (NumPy, etc.)
Frontend
- Next.js 14+
- React 18+
- TypeScript
- Tailwind CSS
For rough estimates, use these rules of thumb:
| Scenario | Formula | Example |
|---|---|---|
| Inference | Memory ≈ Parameters × Precision | 7B model × FP16 (2 bytes) ≈ 14 GB |
| Training | Memory ≈ Inference Memory × 4~6 | 14 GB × 5 ≈ 70 GB |
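As a sanity check, the two rules of thumb can be expressed as a tiny Python helper. The function name and the default training factor of 5 are illustrative choices, not part of this project's API:

```python
def rough_memory_gb(params_billion: float, bytes_per_param: float = 2.0,
                    training: bool = False, training_factor: float = 5.0) -> float:
    """Apply the rule-of-thumb formulas from the table above.

    Inference: parameters x bytes per parameter (1e9 params x 2 bytes ~= 2 GB).
    Training: inference memory x ~4-6 (gradients, optimizer state, activations).
    """
    inference_gb = params_billion * bytes_per_param
    return inference_gb * training_factor if training else inference_gb

print(rough_memory_gb(7))                 # 7B @ FP16 -> 14.0 GB
print(rough_memory_gb(7, training=True))  # -> 70.0 GB
```

This reproduces the table's examples: a 7B model at FP16 needs roughly 14 GB for inference and about 70 GB for training.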
The memory calculator considers the following factors:
| Factor | Description | Impact |
|---|---|---|
| Model Parameters | Model size (e.g., 7B, 13B, 70B) | Base memory footprint |
| Data Precision | FP32 (4 bytes) / FP16 (2 bytes) / INT8 (1 byte) | Directly affects weight storage |
| Batch Size | Number of samples processed in parallel | Affects activation size |
| Sequence Length | Maximum length of input/output text | Affects KV Cache and activations |
| Optimizer State | Extra state from optimizers like Adam | Typically 2-3× parameters during training |
| Gradients | Gradient storage for backpropagation | Equal to parameter size |
| Activations | Intermediate results in forward pass | Related to layer count and batch size |
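To see how these factors combine, here is a rough per-component sketch in Python. The constants are illustrative assumptions (Adam keeping two FP32 moments per parameter, a flat activation multiplier of 16), not the calculator's actual formulas:

```python
GB = 1024 ** 3

def training_memory_breakdown(n_params: float, bytes_per_param: int = 2,
                              batch: int = 1, seq_len: int = 2048,
                              hidden: int = 4096, layers: int = 32) -> dict:
    """Rough per-factor training memory breakdown, in GB."""
    weights = n_params * bytes_per_param
    gradients = n_params * bytes_per_param   # equal to parameter size
    optimizer = n_params * 8                 # Adam: two FP32 moments per param
    # Crude activation term: grows with batch, sequence length, width, depth.
    activations = 16 * batch * seq_len * hidden * layers * bytes_per_param
    parts = {"weights": weights, "gradients": gradients,
             "optimizer": optimizer, "activations": activations}
    parts["total"] = sum(parts.values())
    return {k: round(v / GB, 2) for k, v in parts.items()}

print(training_memory_breakdown(7e9))  # optimizer state dominates at batch 1
```

Even this crude sketch shows why training costs 4-6× inference: gradients match the weights, and Adam's FP32 state alone is ~4× the FP16 weights.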
Recommended: Memory Requirements for LLM Training and Inference
No installation needed: visit the Live Demo to start using it immediately.
Prerequisites
- Git
- Python 3.10+
- Node.js 18+ (LTS)
- uv (Python package manager)
One-Click Setup
```shell
# 1. Clone the repository
git clone https://github.com/amazingchow/LLMToolset.git
cd LLMToolset

# 2. Start backend (Terminal 1)
cd backend
uv venv && uv sync  # Install dependencies
make dev            # Start service (http://127.0.0.1:15050)

# 3. Start frontend (Terminal 2)
cd frontend
npm install         # Install dependencies
npm run dev         # Start service (http://localhost:13031)
```

Access the Application
Open your browser and visit http://localhost:13031
- Select a model (e.g., Qwen3-8B-Base)
- Choose precision (FP32 / FP16 / BF16 / INT8 / INT4)
- Set batch size and sequence length
- Select mode (Inference / Training)
- Click "Calculate" to view results
- Memory Calculator - Training & inference memory estimation
- Model Quantization Tool - INT8/INT4 quantization support
- Performance Benchmarking - Multi-hardware performance comparison
- Cost Estimator - Cloud service cost prediction
- Visualization Dashboard - Real-time resource monitoring
- Docker Deployment - Simplified deployment process
- API Endpoints - RESTful API support
All contributions are welcome!
Ways to Participate
- 🐛 Report bugs
- 💡 Suggest new features
- 📝 Improve documentation
- 🔧 Submit pull requests
Development Workflow
- Fork this repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Q: Why does the calculation differ from actual usage?
A: Memory estimation is affected by multiple factors:
- Framework overhead (PyTorch/TensorFlow, etc.)
- Model implementation details
- Compilation optimizations
- System caching
Use the estimates as a reference and test before actual deployment.
Q: Which model architectures are supported?
A: Currently mainly supports Transformer-based models, including:
- GPT series
- BERT series
- LLaMA / Qwen / Mistral, etc.
- T5 / BART, etc.
Support for other architectures (like Mamba) is under development.
Q: How do I contribute new model configurations?
A: Add a JSON configuration file in the backend/models/ directory with model parameters, layer count, etc., then submit a PR.
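For illustration, a minimal configuration might look like the sketch below; the field names here are hypothetical, so check an existing file in backend/models/ for the actual schema:

```json
{
  "name": "Example-7B",
  "num_parameters": 7000000000,
  "num_layers": 32,
  "hidden_size": 4096,
  "num_attention_heads": 32,
  "vocab_size": 32000,
  "max_sequence_length": 4096
}
```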
This tool provides estimates, not precise measurements. Actual memory usage is affected by hardware, software, configuration, and other factors. Always conduct actual testing and performance analysis before production deployment.
This project is open-sourced under the MIT License.
Made with ❤️ by @amazingchow
