
eniac/flamingo


Flamingo

Flamingo is a system for privacy-preserving federated learning, in which clients' individual training weights are combined using secure aggregation. This implementation accompanies our paper by Yiping Ma, Jess Woods, Sebastian Angel, Antigoni Polychroniadou, and Tal Rabin at IEEE S&P (Oakland) 2023.

WARNING: This is an academic proof-of-concept prototype and is not production-ready.

Overview

We integrate our code into ABIDES, an open-source high-fidelity simulator designed for AI research in financial markets (e.g., stock exchanges). The simulator supports tens of thousands of clients interacting with a server to facilitate transactions (in our case, to compute sums). It also supports configurable pairwise network latencies.

The Flamingo protocol proceeds in steps (i.e., round trips). Each step consists of waiting for messages and then processing them. The waiting time is set according to the network latency distribution and a target dropout rate. See Section 8 of our paper for details.

The main branch contains the code for the private-sum protocol; the fedlearn branch contains the code for privately training machine learning models.

Installation Instructions

Requires Python 3.9+. You can use pip directly or set up a virtual environment.

Option A: pip (simplest)

pip install -r requirements.txt

Option B: Conda

conda create --name flamingo-v0 python=3.9.12
conda activate flamingo-v0
pip install -r requirements.txt

Private Sum

The code is in branch main.

First, enter the pki_files folder and run

python setup_pki.py

Command-Line Options

-c [protocol name]                  flamingo or google_malicious
-n [number of clients]              power of 2, minimum 128 (e.g., 128, 256, 512)
-i [number of iterations]           number of protocol iterations
-p [parallel mode]                  1=on, 0=off
-o [neighborhood size]              multiplicative factor of log(n)
-s [random seed]                    for reproducibility (optional)
-e [vector length]                  override input vector length (optional)
-w [wait mode]                      fixed or adaptive (default: fixed)
--wait_threshold [fraction]         threshold for adaptive mode (default: 0.9)
-d [debug mode]                     1=on, 0=off

Example commands:

python abides.py -c flamingo -n 128 -i 1 -p 1
python abides.py -c flamingo -n 256 -i 1 -w adaptive --wait_threshold 0.9
python abides.py -c google_malicious -n 256 -i 1 -w adaptive -e 10000

To print information about every agent, add -d 1 to any of the commands above.

NOTE: For ease of benchmarking, we separate the setup phase (folder dkg) from the private sum phase (folder flamingo). You can execute the commands above directly because we provide the shares of the secret key to the decryptors (a small random subset of clients) before the summation begins. To benchmark the setup independently, run python abides.py -c dkg -n [number of decryptors].
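As background for how decryptors can hold shares of a secret key, here is a minimal (t, n) Shamir secret-sharing sketch over a prime field. This is illustrative only, not the repo's implementation; all names and the choice of field are assumptions.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; the actual protocol uses its own field

def share_secret(secret, t, n):
    """Split `secret` into n shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse of den
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total
```

Any t of the n shares suffice; fewer than t reveal nothing about the secret.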

Server Waiting Modes: Fixed vs. Adaptive

The server supports two waiting modes that control when it proceeds from one round to the next.

  • Fixed mode (-w fixed): The server waits for a preconfigured timeout in each round (set in util/param.py), regardless of how many messages have arrived. Simple and predictable, but wastes time if messages arrive early.

  • Adaptive mode (-w adaptive): The server proceeds as soon as enough messages arrive, based on per-round trigger conditions. A 60-second safety timeout prevents indefinite waiting. This significantly reduces end-to-end latency (typically 29-40% faster) with only a marginal decrease in participation.

The two protocols have different round structures, so adaptive mode behaves differently for each.
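The core of adaptive mode described above can be sketched as a per-round predicate. This is an illustrative sketch, not the repo's code; the function name and signature are assumptions.

```python
SAFETY_TIMEOUT = 60.0  # seconds; prevents indefinite waiting

def should_proceed(received, expected, threshold, elapsed):
    """Return True when the server may advance to the next round.

    received:  messages received so far this round
    expected:  maximum number of messages the server could receive
    threshold: fraction of `expected` that suffices (1.0 = wait for all)
    elapsed:   seconds since the round started
    """
    if elapsed >= SAFETY_TIMEOUT:
        return True  # safety timeout fired; proceed regardless
    return received >= threshold * expected
```

With `threshold=1.0` this degenerates to "wait for everyone, but never longer than the safety timeout," which is how the all-message rounds behave.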

Flamingo Protocol

Flamingo has 3 rounds per iteration (after initialization). In adaptive mode, all 3 rounds are threshold-based — the server proceeds once it receives a sufficient fraction of expected messages.

Round  Name            Fixed Timeout  Adaptive Trigger
1      report          10s            wait_threshold of num_clients vectors received (default 90%)
2      crosscheck      3s             2/3 of committee signatures received
3      reconstruction  3s             committee_threshold decryption shares received (determined by protocol parameters)

Round 3 uses a hard threshold (committee_threshold) derived from the committee size and secret-sharing fraction, not the configurable wait_threshold.
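One plausible way to derive such a hard threshold is shown below. This is a hedged sketch: the repo's exact formula lives in its parameter file and may differ, and `committee_threshold` here is an illustrative name.

```python
import math

def committee_threshold(committee_size, sharing_fraction):
    """Minimum number of decryption shares needed to reconstruct.

    Illustrative assumption: with a (t, n) secret sharing over the
    committee where t = ceil(sharing_fraction * n), any t shares suffice.
    """
    return math.ceil(sharing_fraction * committee_size)
```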

Google Malicious Protocol

Google Malicious has 6 rounds per iteration: 3 setup rounds followed by 3 aggregation rounds. In adaptive mode, only 1 round uses the configurable threshold; the rest wait for all expected messages.

Round  Name             Fixed Timeout  Adaptive Trigger
1      advertise_keys   10s            All num_clients pubkeys received
2      establish_graph  10s            All graph choices received
3      forward_shares   30s            All backup shares received
4      collection       10s            wait_threshold of vectors received (default 90%)
5      check_alive      3s             All ACKs from online clients
6      reconstruction   2s             All reconstruction shares from online clients

Rounds 1-3 are setup phases where every client's data is needed for correct pairwise mask construction, so they require 100% participation. Only Round 4 (collection) applies the configurable wait_threshold. Rounds 5-6 wait for all online clients — here "online" refers to the set of clients that successfully submitted their masked vector in Round 4 (collection). Any client that did not respond in Round 4 is considered offline/dropped out, and Rounds 5-6 only expect messages from the remaining online set.

Note on Rounds 5-6 threshold: The BBGLR protocol theoretically supports waiting for only a subset of clients in Rounds 5 and 6. However, doing so requires the server to track every client's neighbor list and maintain per-client state about which of their neighbors are online versus offline. This adds significant bookkeeping complexity to the implementation. In practice, waiting for all online clients in these rounds is a much simpler approach and is what we implement here.

Despite most rounds waiting for all messages, adaptive mode still provides large speedups because the server proceeds immediately when all messages arrive rather than waiting out the remaining fixed timeout.
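The online-set bookkeeping for Rounds 5-6 amounts to simple set tracking, sketched below. This is illustrative, not the repo's code.

```python
def online_after_collection(all_clients, round4_responders):
    """Clients considered online after Round 4: exactly those that
    submitted a masked vector; everyone else is treated as dropped."""
    return set(all_clients) & set(round4_responders)

def round_complete(online, responders):
    """Rounds 5-6 finish only once every online client has responded."""
    return set(online) <= set(responders)
```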

Machine Learning Applications

The code is in branch fedlearn. The machine learning model we use in this repository is a multi-layer perceptron classifier (MLPClassifier in sklearn), which can pull a variety of datasets from the PMLB website. Users may implement more complex models themselves.

Beyond the options above, we provide the following machine learning training options.

-t [dataset name]
-s [random seed (optional)]
-e [input vector length]
-x [float-as-int encoding constant (optional)]
-y [float-as-int multiplier (optional)]

Example command:

python abides.py -c flamingo -n 128 -i 5 -p 1 -t mnist

Benchmark Suite

The benchmark suite (benchmark_suite.py) sweeps over different parameter combinations and collects performance data into a CSV file with a printed summary table.

python benchmark_suite.py                     # Run default sweep
python benchmark_suite.py --quick             # Quick smoke test (128 clients, 1 iteration, both modes)
python benchmark_suite.py --help              # Show all options

Benchmark Options

--protocols [names]       Protocols to test: flamingo, google_malicious (default: flamingo)
--clients [counts]        Client counts to sweep (default: 128 256 512)
--vector-lens [lengths]   Vector lengths to sweep (default: use param.vector_len)
--iterations [counts]     Iteration counts to sweep (default: 1 3)
--modes [modes]           Wait modes to test: fixed, adaptive (default: fixed adaptive)
--thresholds [fractions]  Thresholds for adaptive mode (default: 0.9)
--seed [int]              Random seed for reproducibility (default: 42)
--output [file]           Output CSV file (default: benchmark_results.csv)
--quick                   Quick smoke test: 128 clients, 1 iteration, both modes

Examples

Sweep over multiple client counts with both protocols:

python benchmark_suite.py --protocols flamingo google_malicious --clients 128 256 512

Test adaptive mode with different thresholds:

python benchmark_suite.py --clients 256 --modes adaptive --thresholds 0.8 0.9 0.95

The suite automatically validates correctness by checking that aggregated sums are consistent across vector elements. If a bug is detected (e.g., incorrect mask cancellation), the benchmark stops immediately and reports the issue. Aggregation failures due to insufficient shares (from client dropout) are reported separately and are not treated as bugs.
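The flavor of this consistency check can be sketched as follows, under the simplifying assumption that every client contributes an all-ones vector, so correct mask cancellation yields the same count in every element. The function name is illustrative; the suite's actual check may differ.

```python
def check_sum_consistency(agg_vector):
    """If the pairwise masks cancelled correctly and each client sent an
    all-ones vector, every element of the aggregate equals the number of
    contributing clients -- so all elements must be identical."""
    return len(set(agg_vector)) == 1
```

A mismatched element (e.g., `[115, 114, 115]`) would indicate mask residue and be flagged as a bug, whereas a uniformly smaller count simply reflects dropout.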

Output

  • benchmark_results.csv — Raw results for all runs, including server/client timings, communication costs, and online client counts.
  • Summary table — Printed to stdout with columns for protocol, client count, vector length, iterations, wait mode, wall-clock time, server/client step timings, communication bytes, and correctness.

Timeline Analysis

Each experiment run produces a timeline_<protocol>.csv file recording per-iteration events with timestamps. The analyze_timeline.py script parses these CSVs and prints per-round wait times, server computation time, and total iteration time.

Usage

# Analyze a single file
python analyze_timeline.py timeline_flamingo.csv

# Compare two protocols side-by-side
python analyze_timeline.py timeline_flamingo.csv timeline_google_malicious.csv

# Compare across client counts
python analyze_timeline.py timeline_flamingo_128.csv timeline_flamingo_256.csv timeline_flamingo_512.csv

Example Output (n=512, adaptive mode, threshold=0.9)

Flamingo (3 rounds per iteration):

Iter  Report Wait  CC Wait  Recon Wait  Srv Comp  Total
1     4.951s       0.213s   0.203s      5.054s    10.42s
2     5.984s       0.397s   0.155s      4.949s    11.49s
3     5.055s       0.311s   0.184s      4.784s    10.33s
4     5.586s       0.766s   0.207s      4.729s    11.29s
AVG   5.394s       0.422s   0.187s      4.879s    10.88s

Google Malicious (6 rounds per iteration):

Iter  R1 adkey  R2 graph  R3 share  R4 coll  R5 alive  R6 recon  Srv Comp  Total
1     3.000s    10.000s   29.982s   1.626s   2.998s    1.996s    2.409s    52.01s
2     10.000s   10.000s   14.234s   1.731s   2.997s    1.995s    2.475s    43.43s
3     10.000s   10.000s   21.608s   1.721s   2.998s    1.995s    2.661s    50.98s
4     10.000s   10.000s   16.333s   1.595s   2.998s    1.995s    2.608s    45.53s
AVG   8.250s    10.000s   20.539s   1.668s   2.998s    1.995s    2.538s    47.99s

(Benchmarked on a laptop; server compute time is much slower than in a typical deployment.)

Speedup: 4.4x (Google 47.99s vs Flamingo 10.88s)

Column Definitions

  • Report/Round Wait — Time from round start until the server begins processing (i.e., network wait for sufficient messages).
  • Srv Comp — Total server-side computation across all rounds (processing received messages).
  • Total — End-to-end time per iteration (sum of all waits + server computation).

Using AI Agents for Analysis

The timeline CSV files contain detailed per-event timestamps that are well-suited for deeper analysis with AI coding agents (e.g., Claude Code, Cursor, GitHub Copilot). Example prompts:

  • "Read timeline_flamingo_512.csv and explain why iteration 2 was slower than iteration 3."
  • "Compare the wait time distributions across these timeline CSVs and plot a chart."
  • "What is the 90th percentile network latency implied by the report wait times?"

The CSV schema is simple: iteration, simtime, offset_s, event — any tool that reads CSV can process it.
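For example, grouping events by iteration takes nothing beyond the standard library (the column names below come from the schema above; the function name and sample data are illustrative).

```python
import csv
import io
from collections import defaultdict

def offsets_by_iteration(csv_text):
    """Group (event, offset_s) pairs by iteration from a timeline CSV."""
    out = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        out[int(row["iteration"])].append((row["event"], float(row["offset_s"])))
    return dict(out)

# A hypothetical two-iteration timeline in the documented schema.
sample = """iteration,simtime,offset_s,event
1,09:00:00,0.0,round_start
1,09:00:05,4.951,report_done
2,09:00:11,0.0,round_start
"""
per_iter = offsets_by_iteration(sample)
```

`per_iter[1]` then holds the (event, offset) pairs for iteration 1, ready for per-round wait analysis or plotting.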

Additional Information

The server waiting time is set in util/param.py according to a target dropout rate (1%). Specifically, for a given target dropout rate, we set the waiting time according to the network latency distribution (see model/LatencyModel.py). For each iteration, server total time = server waiting time + server computation time.
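The relationship between the target dropout rate and the waiting time can be sketched as a percentile over sampled round-trip latencies: wait long enough that at most the target fraction of messages arrive after the cutoff. This is an illustrative sketch, not the repo's derivation in util/param.py.

```python
def waiting_time(latencies, target_dropout=0.01):
    """Pick a wait such that at most `target_dropout` of the sampled
    latencies exceed it (i.e., the (1 - target_dropout) percentile)."""
    ordered = sorted(latencies)
    k = min(len(ordered) - 1, int((1 - target_dropout) * len(ordered)))
    return ordered[k]
```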

Acknowledgement

We thank the authors of MicroFedML for providing an example template for the ABIDES framework.

About

A secure aggregation system for private federated learning
