chesterit21/SFCoreServerProviderOnnxRuntime


SFCore.OnnxRuntimeProvider.Api


Full documentation is available on Zread.



🇺🇸 English

An ASP.NET Core-based API Provider specifically designed to run AI models in ONNX format using ONNX Runtime, focusing on high performance and reliable hardware optimization.

🚀 Key Features

  • High Performance: Machine learning model inference optimized directly by ONNX Runtime.
  • Ready-to-use REST API: Seamless integration with external applications via HTTP/JSON endpoints.
  • Extensible Architecture: Designed to handle various ONNX model series (specifically tailored for Qwen).
  • Auto Hardware Detection: Intelligent optimization based on the user's hardware specifications.

🛠️ Prerequisites

  • .NET 10.0 SDK (or a later compatible version).
  • Python (Optional): For running additional testing scripts like debug_request.py.

📦 ONNX AI Model Setup

IMPORTANT: Downloading models from the ONNX Community (onnx-community) repository is highly recommended, because the configuration and tokenizer handling in the codebase are already matched to those models.

Recommended Model: onnx-community/Qwen3.5-4B-ONNX (Versions: q4f16, q4, or int8).

Required Files:

  • decoder_model_merged_q4f16.onnx (& .onnx_data if > 2GB)
  • encoder_model_merged_q4f16.onnx (& .onnx_data if > 2GB)
  • tokenizer.json, tokenizer_config.json, vocab.json, merges.txt, special_tokens_map.json
  • Optional: vision_encoder_q4f16.onnx (for Vision/Image support).
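Before starting the API, it can help to confirm that the model folder actually contains the files listed above. The following is a minimal sketch, not part of this repository: the function name `missing_model_files` is hypothetical, and it assumes every file sits directly in one folder (adjust the names if your download uses a subfolder or a different quantization suffix such as q4 or int8).

```python
from pathlib import Path

# File names copied from the "Required Files" list (q4f16 variant).
REQUIRED_FILES = [
    "decoder_model_merged_q4f16.onnx",
    "encoder_model_merged_q4f16.onnx",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "special_tokens_map.json",
]

# Only needed for Vision/Image support.
OPTIONAL_FILES = ["vision_encoder_q4f16.onnx"]


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the names from `required` that are not regular files in model_dir.

    Assumption: a flat layout, with all files directly inside model_dir.
    """
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_model_files(folder)
    if missing:
        print("Missing required files:")
        for name in missing:
            print("  " + name)
    else:
        print("All required files found.")
    # Report optional files separately so they never count as an error.
    for name in missing_model_files(folder, OPTIONAL_FILES):
        print("Optional file not present: " + name)
```

The sketch does not check for the `.onnx_data` companions, since those exist only when a model exceeds 2GB.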

Important

Remember to update the model folder root path in appsettings.json.
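To double-check the path after editing, a small script can read the value back and confirm that the folder exists. This is a sketch under stated assumptions: the README does not name the setting, so `ModelSettings:RootPath` below is a hypothetical key that must be replaced with the real one from appsettings.json. The colon-separated path mirrors how ASP.NET Core addresses nested configuration keys, although this lookup is case-sensitive and the file must be plain JSON without comments.

```python
import json


def read_setting(settings_text, key_path):
    """Look up a colon-separated key path in appsettings-style JSON text.

    Returns None when any segment of the path is missing.
    """
    node = json.loads(settings_text)
    for segment in key_path.split(":"):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


if __name__ == "__main__":
    from pathlib import Path

    # Hypothetical key name; look up the real one in appsettings.json.
    key = "ModelSettings:RootPath"
    text = Path("appsettings.json").read_text(encoding="utf-8")
    root = read_setting(text, key)
    if not isinstance(root, str):
        print("Setting not found: " + key)
    elif Path(root).is_dir():
        print("Model folder exists: " + root)
    else:
        print("Model folder does not exist: " + root)
```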

💡 Optimization & Customization

To achieve maximum performance:

  1. Share the HardwareDetector.cs, Program.cs, and Qwen35inferenceengine.cs files with an AI (Claude/Gemini/ZAI).
  2. Provide your hardware specifications (GPU/CPU/RAM).
  3. Ask the AI to optimize the initialization parameters to match your specific hardware for peak efficiency.
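For step 2, part of the hardware description can be collected automatically. The sketch below uses only the Python standard library and is not part of this repository; the function names are hypothetical. It reports OS, architecture, and logical CPU count only, so RAM size and GPU model still have to be added by hand.

```python
import os
import platform


def hardware_summary():
    """Collect basic host details available from the standard library."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        # platform.processor() can return an empty string on some systems.
        "processor": platform.processor() or "unknown",
        "logical_cpus": os.cpu_count(),
    }


def format_summary(summary):
    """Render the summary as 'key: value' lines that are easy to paste."""
    return "\n".join(f"{key}: {value}" for key, value in summary.items())


if __name__ == "__main__":
    print(format_summary(hardware_summary()))
```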

📊 Benchmarks & Performance

This project has been extensively tested on server-grade hardware (Xeon v4) and stably handles a 64,000-token context window.

  • Long Context: Supports up to 64K tokens with minimal performance degradation.
  • Efficiency: The hybrid architecture ensures consistent prefill speeds even at large context scales.
  • Beyond Industry Standards: Unlike llama.cpp or Ollama, which often become unstable or very slow on CPU for contexts above 8K, this project stably processes a full 64K context on hardware from 2016.

Tip

Read the full performance report and benchmark statistics at: BENCHMARK.md


⚙️ Quick Start

  1. Clone Repository:

    git clone https://github.com/USERNAME/SFCoreServerProviderOnnxRuntime.git
    cd SFCoreServerProviderOnnxRuntime
  2. Restore Dependencies:

    dotnet restore SFCore.OnnxRuntimeProvider.Api
  3. Run Application:

    dotnet run --project SFCore.OnnxRuntimeProvider.Api

Note

Default URL: http://localhost:5034 (or as configured in appsettings.json).

📖 API Documentation (Swagger)

Once running, access the Swagger UI at: http://localhost:<PORT>/swagger
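A quick way to confirm the service is up is to request the Swagger page from a script. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the default port 5034 is taken from the note above, so pass a different port if appsettings.json overrides it.

```python
from urllib.error import URLError
from urllib.request import urlopen


def swagger_url(port=5034, host="localhost"):
    """Build the Swagger UI address for a locally running instance."""
    return f"http://{host}:{port}/swagger"


def is_up(url, timeout=3.0):
    """Return True if the URL answers with a success or redirect status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    url = swagger_url()
    state = "reachable" if is_up(url) else "not reachable"
    print(url + " is " + state)
```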

🤝 Contributing

We welcome contributions! Please check CONTRIBUTING.md for guidelines.

📄 License

Distributed under the MIT License.
