Cobble is a high-performance LSM-based key-value storage engine designed for both embedded and distributed systems. It provides a flexible and efficient storage solution for workloads ranging from small embedded applications to large distributed services. Compared with other embedded key-value stores such as RocksDB, it offers multiple file formats (SSTable and Parquet), distributed storage support, distributed snapshots, online rescaling between nodes, remote compaction, and more, making it a good fit for modern distributed systems that need a versatile, scalable storage engine.
Some of Cobble's key features, either already implemented or planned for future releases:
- Hybrid Storage: Local disk and remote object storage (S3, OSS, etc.) can be used individually or together; supports multi-volume distributed I/O scheduling.
- Schema Support & Evolution: User-defined column schemas with incremental evolution.
- Multiple File Formats: SST and Parquet for both point lookup and analytical queries.
- Distributed Snapshots: Global consistent snapshots across multiple shards and machines, with local shard snapshots as building blocks.
- One writer, multiple readers for one shard: A single writer for consistency, with concurrent readers across processes or machines.
- Remote Compaction: Compaction can run on remote object storage to reduce local resource usage.
- Multi-version Snapshots: Read historical data states via versioned snapshots.
- Key-value Separation: Separates keys and values to optimize large-value, low-access patterns.
- Time-to-live (TTL): Expire and clean up data automatically.
- Hot/Cold Separation: Optimize storage and access efficiency with multiple strategies.
- Merge Operators: User-defined merge operations on values, so updates can be applied efficiently without reading the existing value.
- Multi-language Bindings: Java bindings are available today; C, C++, Python, and Go bindings are planned.
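To make the merge-operator feature concrete: a merge operator folds queued operands into the current value at read or compaction time, so writers never have to read the existing value first. The sketch below is plain, self-contained Rust that models only the concept; it is not Cobble's actual merge-operator API.

```rust
// Illustrative counter-style merge: fold little-endian u64 increments into a
// value without the writer ever reading it. Conceptual only; Cobble's real
// merge-operator interface may differ.
fn merge_counter(existing: Option<&[u8]>, operands: &[&[u8]]) -> Vec<u8> {
    let mut total = existing
        .map(|v| u64::from_le_bytes(v.try_into().expect("8-byte counter")))
        .unwrap_or(0);
    for op in operands {
        total += u64::from_le_bytes((*op).try_into().expect("8-byte operand"));
    }
    total.to_le_bytes().to_vec()
}

fn main() {
    // Two queued "+1" operands merged onto a counter currently at 40.
    let existing = 40u64.to_le_bytes();
    let ops = [1u64.to_le_bytes(), 1u64.to_le_bytes()];
    let merged = merge_counter(Some(existing.as_slice()), &[ops[0].as_slice(), ops[1].as_slice()]);
    assert_eq!(u64::from_le_bytes(merged.try_into().unwrap()), 42);
    println!("merged counter = 42");
}
```

The key property: `merge_counter` only sees the operands and (lazily) the base value, which is what lets the engine defer the fold to compaction.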
For more details on features and design, see the docs: https://cobble-project.github.io/cobble/latest/
Add cobble to your Cargo.toml:
```toml
[dependencies]
cobble = "0.1.0"
```

Cobble uses Apache OpenDAL for volume backends.
The local file:// backend is always enabled by default and does not require any Cargo feature.
Optional remote/storage-service features exposed by Cobble are:
- `storage-alluxio`
- `storage-cos`
- `storage-oss`
- `storage-s3`
- `storage-ftp`
- `storage-hdfs`
- `storage-sftp`
On Windows, `storage-hdfs` and `storage-sftp` are currently not supported.
```toml
[dependencies]
cobble = { version = "0.1.0", default-features = false, features = ["storage-s3"] }
```

- Enable all optional remote/storage-service backends: `storage-all`
- Crates that depend on `cobble` in this workspace (for example `cobble-cli`, `cobble-web-monitor`, `cobble-cluster`, `cobble-bench`, `cobble-data-structure`, `cobble-java`) also re-expose the same `storage-*` feature names and forward them to `cobble`.
Cobble usage can be viewed as a step 0 (configuration) plus five usage patterns.
For complete guides and more examples, see docs:
- https://cobble-project.github.io/cobble/latest/
- https://cobble-project.github.io/cobble/latest/getting-started/
Before any API flow, define Config and volume layout.
Volume categories (`VolumeUsageKind`) and their roles:

- `PrimaryDataPriorityHigh/Medium/Low`: main data files (SST/Parquet/VLOG) with priority-aware placement.
- `Meta`: metadata (manifests, pointers, schema files).
- `Snapshot`: snapshot materialization target when separated from primary.
- `Cache`: block cache disk tier (when hybrid cache is enabled).
- `Readonly`: read-only source volumes for loading historical files.
Minimal practical setup: one local path via `VolumeDescriptor::single_volume(...)`.
This is the simplest single-path deployment and is enough for local development.
```rust
use cobble::{Config, VolumeDescriptor};

let mut config = Config::default();
config.volumes = VolumeDescriptor::single_volume("file:///tmp/cobble");
```

> **Important**
> For any restore/resume flow, the runtime must still be able to access all files referenced by that snapshot (snapshot manifests, schema files, and data/VLOG files). If any referenced file is missing or inaccessible, the restore can fail.
This is the simplest mode. You run one embedded process with local write/read and single-node global snapshots.
Create + write:
```rust
use cobble::{Config, SingleDb, VolumeDescriptor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = Config::default();
    config.num_columns = 2;
    config.total_buckets = 1;
    config.volumes = VolumeDescriptor::single_volume("file:///tmp/cobble-single");

    let db = SingleDb::open(config)?;
    db.put(0, b"user:1", 0, b"Alice")?;
    db.put(0, b"user:1", 1, b"premium")?;

    let global_snapshot_id = db.snapshot()?;
    println!("snapshot id = {}", global_snapshot_id);
    Ok(())
}
```

Recover/read flow:

- Resume directly via `SingleDb::resume(config, global_snapshot_id)`.
- Continue normal read/write on the resumed embedded instance.
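Putting the resume flow together, a sketch reusing only the `Config` fields and `SingleDb` calls shown above (`load_snapshot_id` is a hypothetical placeholder for however your application persisted the id returned by `db.snapshot()`; its exact type is an assumption):

```rust
use cobble::{Config, SingleDb, VolumeDescriptor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Same configuration the instance was originally opened with.
    let mut config = Config::default();
    config.num_columns = 2;
    config.total_buckets = 1;
    config.volumes = VolumeDescriptor::single_volume("file:///tmp/cobble-single");

    // Hypothetical helper: retrieve the id saved after an earlier `db.snapshot()`.
    let global_snapshot_id = load_snapshot_id();

    // Resume from that snapshot, then continue normal read/write.
    let db = SingleDb::resume(config, global_snapshot_id)?;
    db.put(0, b"user:2", 0, b"Bob")?;
    Ok(())
}
```

Note the restore caveat above: every file referenced by the snapshot must still be reachable through the configured volumes for `resume` to succeed.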
```rust
use cobble::{Config, CoordinatorConfig, Db, DbCoordinator};

// Shard writers
let db1 = Db::open(config1, vec![0..=499])?;
let db2 = Db::open(config2, vec![500..=999])?;

// Coordinator
let coord = DbCoordinator::open(CoordinatorConfig {
    volumes: coordinator_volumes,
    snapshot_retention: Some(5),
})?;

// Write
db1.put(100, b"user:1", 0, b"Alice")?;
db2.put(700, b"order:9", 0, b"paid")?;

// Global snapshot
let s1 = db1.snapshot()?;
let s2 = db2.snapshot()?;
let i1 = db1.shard_snapshot_input(s1)?;
let i2 = db2.shard_snapshot_input(s2)?;
let manifest = coord.take_global_snapshot(1000, vec![i1, i2])?;
coord.materialize_global_snapshot(&manifest)?;
```

Remote compaction example:

```rust
let mut config = Config::default();
config.compaction_remote_addr = Some("127.0.0.1:18888".to_string());
```

See full distributed setup and restore examples: https://cobble-project.github.io/cobble/latest/getting-started/distributed/
```rust
use cobble::{Reader, ReaderConfig, VolumeDescriptor};

let read_config = ReaderConfig {
    volumes: VolumeDescriptor::single_volume("file:///tmp/cobble"),
    total_buckets: 1024,
    ..ReaderConfig::default()
};
let mut reader = Reader::open_current(read_config)?;
let v = reader.get(0, b"user:1")?;
reader.refresh()?; // pull newer materialized snapshot
```

More Reader details: https://cobble-project.github.io/cobble/latest/getting-started/reader-and-scan/
```rust
use cobble::{ScanOptions, ScanPlan};

let plan = ScanPlan::new(global_manifest);
for split in plan.splits() {
    let scanner = split.create_scanner(config.clone(), &ScanOptions::default())?;
    for row in scanner {
        let (key, columns) = row?;
        // process row...
    }
}
```

cobble-data-structure provides typed wrappers for all of the flows above:

- Single-machine embedded: `StructuredSingleDb`
- Distributed write shards: `StructuredDb`
- Real-time read: `StructuredReader`
- Snapshot-pinned read: `StructuredReadOnlyDb`
- Distributed scan: `StructuredScanPlan` / `StructuredScanSplit`

All snapshot/read/scan patterns are the same as in core cobble, but values are encoded/decoded as structured typed columns (Bytes/List).
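To make the Bytes/List column model concrete, here is a self-contained plain-Rust sketch of a structured column value and a `Last`-retain list cap, mirroring the `ListConfig { max_elements, retain_mode }` fields used in the example below. These are illustrative types, not cobble-data-structure's actual implementation:

```rust
// Illustrative model of a structured column value: either raw bytes or a
// bounded list of byte elements. Not the crate's real types.
#[derive(Debug, PartialEq)]
enum ColumnValue {
    Bytes(Vec<u8>),
    List(Vec<Vec<u8>>),
}

// Append an element, keeping at most `max_elements` and retaining the most
// recent ones (the "Last" retain mode).
fn push_retain_last(list: &mut Vec<Vec<u8>>, element: Vec<u8>, max_elements: usize) {
    list.push(element);
    if list.len() > max_elements {
        let overflow = list.len() - max_elements;
        list.drain(..overflow);
    }
}

fn main() {
    let mut events: Vec<Vec<u8>> = Vec::new();
    for i in 0..5u8 {
        push_retain_last(&mut events, vec![i], 3);
    }
    // With max_elements = 3 and Last retention, only the newest 3 survive.
    assert_eq!(events, vec![vec![2u8], vec![3], vec![4]]);
    let value = ColumnValue::List(events);
    println!("{value:?}");
}
```

The `ListRetainMode::Last` + `max_elements` combination in the real API expresses the same policy declaratively: the engine trims old elements for you.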
```rust
use bytes::Bytes;
use cobble::{Config, VolumeDescriptor};
use cobble_data_structure::{ListConfig, ListRetainMode, StructuredColumnValue, StructuredSingleDb};

let mut config = Config::default();
config.num_columns = 2;
config.total_buckets = 1;
config.volumes = VolumeDescriptor::single_volume("file:///tmp/cobble-structured");

let mut db = StructuredSingleDb::open(config)?;
db.update_schema()
    .add_list_column(1, ListConfig {
        max_elements: Some(100),
        retain_mode: ListRetainMode::Last,
        preserve_element_ttl: false,
    })
    .commit()?;
db.put(0, b"k1", 0, StructuredColumnValue::Bytes(Bytes::from_static(b"v0")))?;
```

More scan examples: https://cobble-project.github.io/cobble/latest/getting-started/reader-and-scan/
More structured examples: https://cobble-project.github.io/cobble/latest/getting-started/structured-db/
- Format: `cargo fmt --all`
- Lint: `cargo clippy -- -D warnings`
- Test: `cargo test`
We welcome contributions from the community! Please refer to the CONTRIBUTING.md file for guidelines on how to contribute to the project.
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
- Zakelly - Project Founder & Main Developer
