BURST is software and a zip-based archive format that offers an optimized integration between Amazon S3 and the BTRFS Linux filesystem. It is probably the fastest way to load large numbers of files onto an EC2 instance from S3.
BURST can also be used without BTRFS, with different but generally good performance characteristics.
Download the latest release for your platform from GitHub Releases.
# Example for Linux x86_64 (replace v1.0.0 with the latest version)
VERSION=v1.0.0
curl -LO https://github.com/posit-dev/burst/releases/download/${VERSION}/burst-${VERSION}-linux-x86_64.tar.gz
tar -xzf burst-${VERSION}-linux-x86_64.tar.gz
sudo mv burst-${VERSION}-linux-x86_64/burst-writer burst-${VERSION}-linux-x86_64/burst-downloader /usr/local/bin/
Verify the download (optional):
curl -LO https://github.com/posit-dev/burst/releases/download/${VERSION}/checksums.txt
sha256sum -c checksums.txt --ignore-missing
To use BURST, first create a BURST archive of the files you wish to save in S3:
burst-writer -o name-of-archive.zip /path/to/directory
This will create an archive file containing all the files and folders under /path/to/directory.
Direct upload to S3 is not currently implemented; you'll then need to upload this file to S3 using another tool.
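For example, the archive could be uploaded with the AWS CLI (the bucket and key placeholders below match the downloader example later in this document; any S3-capable upload tool works):

```shell
# Hypothetical upload step using the AWS CLI; substitute your own bucket and
# key names. Fall back to a stub for offline illustration when a real upload
# is not possible:
{ command -v aws >/dev/null 2>&1 && [ -f name-of-archive.zip ]; } \
  || aws() { echo "would run: aws $*"; }

aws s3 cp name-of-archive.zip s3://name-of-bucket/archive-name-in-S3
```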
One of BURST's optimization strategies involves direct (ioctl) interaction with the BTRFS filesystem. This interaction requires
that the downloader run as root (or in any other way that grants CAP_SYS_ADMIN). BURST also preserves file ownership, which
requires these permissions as well.
sudo ./burst-downloader -b name-of-bucket -k archive-name-in-S3 -r aws-region -o /path/to/restore/to
This will download the archive from S3 and recreate the data at /path/to/restore/to.
It is also possible to run the downloader without elevated permissions. In this mode, the data must be decompressed
immediately as it is downloaded and written to disk using conventional write() calls. This approach has higher disk-throughput
requirements, higher CPU utilization, and lower disk-space efficiency.
It is a design priority that data captured into BURST formatted S3 objects can be recovered in future using generally available tools, without necessarily relying on BURST software or running on systems BURST supports.
BURST archives are compliant with the ZIP specification, and zip extractors unaffiliated with the BURST project exist that are capable of extracting BURST zip archives -- albeit without the performance optimizations. Specifically, the zip writer logic is rigorously tested against 7-Zip. Any archives generated by BURST that 7-Zip cannot extract correctly would be considered a bug. More extractors may be added to the test matrix in future.
Note that many popular zip extractor implementations cannot process BURST archives because they do not support the Zstandard compression algorithm that BURST uses.
At a high level, the most notable difference between a BURST zip archive and a regular ZIP archive is that certain binary structures are guaranteed to occur at every 8 MiB boundary of the overall object.
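As a sketch of what that alignment means in practice, the bytes at each 8 MiB offset can be inspected with standard tools. A synthetic file stands in for a real BURST archive below, and the ZIP local-file-header magic (`PK\003\004`) placed at the boundaries is an illustrative assumption, not a statement of what BURST actually emits there:

```shell
# Inspect the 4 bytes at each 8 MiB boundary of a file. The synthetic file and
# the magic bytes written into it are assumptions for illustration only.
MIB8=$((8 * 1024 * 1024))
ARCHIVE=synthetic.bin

printf 'PK\003\004' > "$ARCHIVE"                          # magic at offset 0
dd if=/dev/zero bs=$((MIB8 - 4)) count=1 >> "$ARCHIVE" 2>/dev/null
printf 'PK\003\004' >> "$ARCHIVE"                         # magic at 8 MiB

SIZE=$(($(wc -c < "$ARCHIVE")))
offset=0
while [ "$offset" -lt "$SIZE" ]; do
  magic=$(od -An -tx1 -j "$offset" -N 4 "$ARCHIVE" | tr -d ' \n')
  echo "offset ${offset}: ${magic}"
  offset=$((offset + MIB8))
done
```

For this synthetic file, the loop prints `offset 0: 504b0304` and `offset 8388608: 504b0304`.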
You should not run the BURST downloader on archives that could have been created by untrusted writers, especially when running it as root. There are several reasons for this:
- BURST preserves file ownership, and so would enable an untrusted source to craft a zip stream that creates executable files owned by privileged users or even potentially overwrites files critical to system integrity.
- At present, no fuzzing of the downloader against intentionally malicious zip streams has been done. It is probable that security vulnerabilities exist when it is presented with intentionally malformed input.
The downloader is currently capable of restoring a dataset comprising a comprehensive Ubuntu-based system of around 250,000 inodes spanning 8.7 GiB in 5.9 - 6.2 seconds on an i7ie.3xlarge EC2 instance.
More performance benchmarks to come.
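As a rough sanity check, those figures imply the following restore rates (illustrative arithmetic only, not a published benchmark):

```shell
# Implied restore rates from the figures above (~8.7 GiB and ~250,000 inodes
# in roughly 6 seconds); purely illustrative arithmetic.
awk 'BEGIN { printf "%.2f GiB/s\n", 8.7 / 6.0 }'
awk 'BEGIN { printf "%.0f inodes/s\n", 250000 / 6.0 }'
```

That is, roughly 1.45 GiB/s of restored data and about 41,700 files created per second.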
The fundamental techniques leveraged are:
- Adheres to published patterns for high-performance S3 utilization, including:
  - Many concurrent TCP streams utilizing byte-range fetches, with download requests typically aligned to the part boundaries used during multipart upload (this can be overridden).
  - Does not issue requests that yield small amounts of response data, e.g. for small objects. S3 offers high sequential read bandwidth but also high roundtrip latencies, so the ability to create many small files from a single S3 request greatly improves small-file restoration performance.
  - Internally utilizes aws-c-s3, a library that obsesses over maximizing the performance of S3 in the EC2 environment. This enables us to inherit many micro-optimizations such as DNS load balancing across the S3 fleet.
- After the zip central directory is fetched (typically one roundtrip), response data from all concurrent multipart downloads is immediately streamed to the final physical disk blocks where that data needs to reside. The downloader knows what part of what file any given byte-range download relates to. There is no need to download to a temporary location first or wait for other concurrent downloads to complete.
- Passthrough Zstandard compression, meaning the compression in S3 is the same as the compression on disk after restoration. BTRFS only decompresses on read. This is important because it reduces the total data that must be written out, which is usually the bottleneck: even direct-attached NVMe instance storage in EC2 typically has much lower write throughput than the instance's network bandwidth to S3. Leaving the data compressed also reduces CPU utilization during extraction and enables more total information to be placed on valuable instance block storage.
The resulting decompress-on-read does increase CPU overhead for reads that miss the page cache, but given Zstandard's exceptional decompression performance, the disk savings and improved extraction performance are probably desirable for most applications.
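The part-aligned byte-range strategy described in the list above can be sketched as simple range arithmetic. The 8 MiB part size and 20 MiB object size below are illustrative assumptions, not values taken from BURST:

```shell
# Emit HTTP Range headers aligned to multipart part boundaries for an object.
PART_SIZE=$((8 * 1024 * 1024))       # assumed part size used at upload time
OBJECT_SIZE=$((20 * 1024 * 1024))    # hypothetical object size

start=0
while [ "$start" -lt "$OBJECT_SIZE" ]; do
  end=$((start + PART_SIZE - 1))
  [ "$end" -ge "$OBJECT_SIZE" ] && end=$((OBJECT_SIZE - 1))
  echo "Range: bytes=${start}-${end}"
  start=$((start + PART_SIZE))
done
```

For this object size, the loop emits `bytes=0-8388607`, `bytes=8388608-16777215`, and `bytes=16777216-20971519`: every request except the last covers exactly one upload part.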
See the docs/ folder for an overview of the BURST format and principles of its efficient use.
sudo apt-get install -y ruby cmake libzstd-dev zlib1g-dev
mkdir build
cd build
cmake ..
make
To run the Zstandard compression tests, you need 7-Zip with Zstandard codec support.
p7zip-full (7-Zip 23.01+dfsg) strips Zstandard codec support for DFSG (Debian Free Software Guidelines) compliance. You must install the official 7-Zip from 7-zip.org instead.
sudo apt-get install -y unzip
# Via Makefile (recommended)
make test # All tests
make test-unit # Unit tests (< 5s)
make test-integration # Fast integration (~1min with 4 parallel jobs)
make test-slow # Slow E2E tests (~5min with 4 parallel jobs)
# Control parallelism
CTEST_PARALLEL_LEVEL=8 make test-integration # Use 8 parallel jobs
CTEST_PARALLEL_LEVEL=1 make test-integration # Disable parallelism
# Via CTest (from build directory)
ctest # Run all tests
ctest -V # Verbose output
ctest -R test_alignment # Run specific test
ctest -L slow --parallel 4 # Slow tests with 4 jobs