Serious AI infrastructure. Packaged simply.
Warning
Sipp is under active development. Breaking changes are expected as we optimize the runtime layers. It might not be suitable for mission-critical production environments yet. If you find issues, bugs, or missing features, please open a GitHub issue.
Sipp is an all-in-one, high-performance AI framework for building web, desktop, and edge applications. It ships as a cohesive SDK with a unified, symmetric API for local, provider, and cloud gateway inference.
At its core is Sipp Engine, a blazing-fast runtime built to run anywhere: in the browser, on the desktop, or on bare-metal cloud infrastructure, that delivers low startup times and a minimal memory footprint.
import { SippClient } from '@sipphq/sipp';
const blender = new SippClient();
// 1. Initialize high-speed, local WebGPU or CUDA inference
const juice = await blender.add('edge', { kind: 'local', source: '/models/llama3.gguf' });
// 2. Or connect to a secure cloud proxy using the exact same interface
const ice = await blender.add('cloud', { kind: 'gateway', baseUrl: 'https://gateway.example.com/v1/' });
// Run inference on either endpoint seamlessly with a symmetric API
const [smoothie, snowcone] = await Promise.all([
blender.chat([{ role: 'user', content: 'Explain Sipp.' }], { endpoint: juice }),
blender.chat([{ role: 'user', content: 'Create a Sipp app.' }], { endpoint: ice })
]);The unified SDK lets you dynamically partition and optimize complex application logic between local and cloud compute. Instead of wrestling with fragmented web runtimes, disconnected native wrappers for desktop, or custom middleware to protect API keys, you only need Sipp.
It packages a high-performance WebGPU engine, with a secure container gateway proxy into a single, neat toolkit. Future releases will focus on embedded vector memory, on-device PII masking, and automated smart routing. See Roadmap.
sipp build wasm # Compile high-performance WebGPU assets
sipp run demos serve chat # Launch a local, hardware-accelerated test canvas
Run them yourself here: benchmark.sipp.sh/benchmark
| Runtime / Framework | TTFT (ms) ↓ | Decode (tok/s) ↑ | E2E Latency (ms) ↓ |
|---|---|---|---|
| Sipp | 24.3 (Best) | 77.07 (Best) | 6,655 (Best) |
| WebLLM | 160.0 (6.55x) | 25.80 (2.99x) | 19,930 (2.99x) |
| Transformers.js | 301.0 (12.38x) | 33.25 (2.32x) | 15,670 (2.35x) |
Disclaimer & Metric Notes:
- TTFT (Time to First Token): Measured in milliseconds (ms). Lower is better.
- Decode: Measured in tokens per second (tok/s). Higher is better.
- E2E Latency (End-to-End Latency): Measured in milliseconds (ms). Lower is better.
- Performed on a Nvidia GTX 3080, 1 warm up, 3 measured runs. Results avg. of all measured runs.
Sipp supports web browsers, desktop application wrappers, server environments, and native runtimes. Install the specific implementation layer for your surface environment:
# For Web Browsers, Next.js, and TanStack applications
npm install @sipphq/sipp
# For Node.js backend deployments (with native CUDA/Metal compilation)
npm install @sipphq/sipp-server
# For native systems development and application embedding
cargo add sipp-rs
# For Python automation and data engineering pipelines
# (sippy wheels ship from GitHub Releases today; full PyPI build matrix in progress)
# pip install sipppy
# Deploy the secure cloud gateway server instance via Docker
# (cloud gateway will be available in the future, currently building from source)
# docker pull noumena/sipp-gateway
Most developers should start with our pre-built, published packages rather than compiling directly from the monorepo source.
| Surface | Module | Install | Docs |
|---|---|---|---|
| Browser | Sipp Edge | npm install @sipphq/sipp |
Browser package |
| Node.js | Sipp Core | npm install @sipphq/sipp-server |
Node.js package |
| Rust | Sipp Core | cargo add sipp-rs |
Rust package |
| Python | Sipp Core | Wheels available on release page | Python package |
| Gateway Server | Sipp Cloud | Source-built | Gateway Server |
| Gateway Toolkit | Sipp Cloud | Source-built | Gateway toolkit |
Initialize the local engine client to execute model weights directly on the client machine's shader cores using WebGPU.
npm install @sipphq/sipp
import { SippClient } from '@sipphq/sipp';
const messages = [
{ role: 'system', content: 'Answer concisely.' },
{ role: 'user', content: 'Explain Sipp in one sentence.' },
];
const client = new SippClient();
const endpoint = await client.add('default', {
kind: 'local',
source: '/models/model.gguf',
});
const run = client.chat(messages, {
endpoint,
maxTokens: 64,
});
console.log((await run.response).text);
await client.close();Cloud gateway clients use the exact same SippClient API layout. The gateway owns model paths, provider credentials, access policies, and centralized metrics tracking; your client application code only needs the gateway routing target URL.
import { SippClient } from '@sipphq/sipp';
const client = new SippClient();
const endpoint = await client.add('gateway', {
kind: 'gateway',
target: 'upstream-cluster',
baseUrl: 'https://gateway.example.com/v1/',
authentication: { kind: 'bearer', value: await getGatewayToken() },
});
const run = client.query('Explain gateway inference.', {
endpoint,
maxTokens: 64,
});
console.log((await run.response).text);
await client.close();Sipp includes native integration blueprints to handle Server-Sent Events (SSE) streaming, serverless route orchestration, and client hydration patterns out of the box.
- Next.js: App Router route handlers, Client Components, gateway proxies, and streaming.
- TanStack: TanStack Start server functions and TanStack Query patterns.
- React And Vite: Browser package setup, WASM assets, OPFS model loading, and gateway examples.
The full documentation lives in docs/en. From a source checkout, use the sipp docs CLI tool utility to build or serve the book resource:
sipp docs build
sipp docs serve
sipp docs automatically evaluates and installs required mdBook tooling when missing and configures the Mermaid compilation assets used by the technical book layout.
Our core development trajectory is oriented around expanding the edge-cloud infrastructure for running hybrid systems, where local and cloud resources are orchestrated seamlessly.
For a detailed structural breakdown of milestones, memory architectures, and long-term research initiatives, see the full Sipp Technical Roadmap.
To bootstrap the workspace workspace environment, initialize cross-platform profiles, and run structural unit assertions, utilize the integrated CLI environment scripts:
source ./setup.sh
sipp doctor
sipp test list
(On Windows platforms, execute .\setup.ps1 inside PowerShell or setup.cmd via classic CMD if not using Git Bash or WSL).
sipp build wasm && sipp run examples serve browser
sipp build node --backend cpu && node examples/node/query.mjs <model.gguf> "Explain Sipp."
sipp build python --backend cpu && python examples/python/query.py <model.gguf> "Explain Sipp."
sipp run demos serve chat
For thorough verification steps, consult the Source Builds Documentation and our full Testing Framework Suite.
- crates: The published core
sipp-rsand low-level backendsipp-sysRust crates. - lib: High-level language package surfaces and gateway proxy toolkit.
- bindings: Native Node.js bindings, Python extensions, and browser-compiled WASM targets.
- apps: First-party user interfaces and monitoring implementations.
- examples: Small, runable framework integration blueprints.
- demos: Advanced browser sandboxes running on public package surfaces.
- tools/playground: Live browser-runtime profiling and hardware execution diagnostics.
xtask/: Internal cargo automation engine driving build, test, and package deployment pipelines.
Sipp is licensed under the Apache-2.0 License. Vendored third-party dependencies preserve their respective upstream open-source licensing constraints and documentation requirements; see the third-party notices.