Reordering Algorithms

GraphBrew implements 17 algorithm IDs (-o 0 through -o 16). Two are baselines (no useful reordering), one is a runtime selector (AdaptiveOrder, research-only), one loads a precomputed permutation from disk (MAP). The remaining 13 produce orderings you can benchmark.

Why reorder

Power-law and clustered graphs have neighbour-access patterns that miss in cache when vertices are placed randomly. Renumbering vertices so that frequently co-accessed ones land on nearby cache lines turns unpredictable misses into hits. Reordering does not change the graph's topology — only the integer labels.

Three orthogonal locality dimensions matter:

Dimension	Captured by	Example algorithms
Spatial (community structure)	Leiden, Rabbit Order	`12:leiden`, `12:rabbit`, RABBIT (8)
Temporal (degree skew)	hub grouping	HUBCLUSTER (4), DBG (5)
Convergence (forward-edge fraction)	edge-direction optimisation	GoGraph (16), chained `12:leiden → 16`

GraphBrew (-o 12) is the framework that composes these dimensions; the other IDs are individual primitives or baselines.

Quick reference

ID	Flag	Algorithm	Complexity	Notes
0	`-o 0`	ORIGINAL	O(1)	input ordering, baseline
1	`-o 1`	RANDOM	O(n)	random permutation, worst-case baseline
2	`-o 2`	SORT	O(n log n)	degree-descending sort
3	`-o 3`	HUBSORT	O(n log n)	hubs first (full degree sort)
4	`-o 4`	HUBCLUSTER	O(n)	hubs first, non-hubs keep original order
5	`-o 5`	DBG	O(n)	logarithmic degree buckets
6	`-o 6`	HUBSORTDBG	O(n log n)	DBG buckets + degree sort within
7	`-o 7`	HUBCLUSTERDBG	O(n)	DBG buckets + hub-cluster within
8	`-o 8`	RABBITORDER	O(n log n + m)	Louvain + dendrogram DFS; variants `csr` (default), `boost`
9	`-o 9`	GORDER	O(n·w + m)	sliding-window greedy, w=5; high cache quality, slow reorder
10	`-o 10`	CORDER	O(n log n)	cache-aware bandwidth reduction
11	`-o 11`	RCM	O(n log n + m)	Reverse Cuthill–McKee; variants `default`, `bnf`
12	`-o 12`	GraphBrewOrder	O(n log n + m)	composable pipeline — see GraphBrewOrder
13	`-o 13:<file>`	MAP	O(n)	load permutation from `.lo` / `.so` file
14	`-o 14`	AdaptiveOrder	varies	ML selector; research-only, see AdaptiveOrder-ML
15	`-o 15`	LeidenOrder	O(n log n + m)	native GVE-Leiden, no post-ordering layer
16	`-o 16`	GoGraphOrder	O(m log d + n log n)	maximises forward-edge fraction

When to use what

Workload	First try	Why
Power-law social graph + PR	`12:hrab` or `12:leiden`	best cache quality on community-strong graphs
Mesh / road network	`12:rcm` or `11:bnf`	RCM reduces bandwidth on sparse near-planar graphs
Iterative PR / SpMV with many trials	`12:hrab`, `12:leiden`	high quality amortises preprocessing cost
Single-pass BFS / SSSP from one source	HUBCLUSTERDBG (7) or `12:rabbit`	cheap reorder, hub-cached frontier
Dynamic / frequently-updated graph	`12:streaming` or RABBITORDER (8)	low memory + low reorder time
Quick experimentation, any graph	HUBCLUSTERDBG (7)	O(n), often within 10% of the best
Comparing against Gorder	GORDER (9)	the heavyweight cache-quality baseline

Algorithm details

Baselines (0, 1)

ORIGINAL (-o 0) keeps the input ordering. Always run this first to know what you're improving over.

RANDOM (-o 1) permutes vertices uniformly. Useful as a worst-case reference and as a starting state for evaluating deterministic reorderings on top of it.

Degree-based (2-7)

All cheap (O(n) or O(n log n)) and effective on power-law graphs where a small set of hubs dominates access frequency.

SORT (-o 2): sort all vertices by degree, descending.
HUBSORT (-o 3): same as SORT but framed around the hub idea.
HUBCLUSTER (-o 4): split into hubs (high-degree) and non-hubs, reorder only the hubs, leave the rest in input order. Preserves non-hub spatial structure.
DBG (-o 5): partition vertices into logarithmic degree buckets, place buckets contiguously. Hub bucket goes first.
HUBSORTDBG (-o 6): DBG bucketing + sort by degree inside each bucket.
HUBCLUSTERDBG (-o 7): DBG bucketing + hub-cluster inside each bucket. Reasonable default for power-law graphs when you want a cheap, predictable speedup.

Community-based (8)

RABBITORDER (-o 8) — single-pass parallel Louvain that builds a dendrogram of community merges, then orders vertices by DFS of that dendrogram. Fast (~2-10× slower than degree-based, much faster than Gorder) and produces high-quality cache locality on graphs with clear community structure.

Variants:

Flag	Implementation
`-o 8` or `-o 8:csr`	native CSR implementation (default)
`-o 8:boost`	original Boost-based implementation; requires Boost 1.58

The CSR variant has no Boost / numa / tcmalloc dependency and benchmarks slightly faster.

Heavyweight (9, 10)

GORDER (-o 9) — Wei et al. (2016). Sliding window of width 5 greedy vertex placement maximising a local cache-locality score (Gscore). Produces best-in-class cache hits but is serial and NP-hard in the limit; reorder time is typically 10-100× a community method on the same graph. Use for paper comparisons.

CORDER (-o 10) — cache-aware bandwidth reduction. Less common; included for completeness.

Bandwidth-based (11)

RCM (-o 11) — Reverse Cuthill–McKee. BFS from a peripheral vertex with neighbours visited in ascending-degree order, then reverse the result. Bandwidth reduction translates directly to sequential cache access on sparse, near-planar graphs (road networks, finite-element meshes).

Variants:

Flag	Description
`-o 11`	GoGraph-baseline RCM
`-o 11:bnf`	CSR-native George–Liu pseudoperipheral BFS

Composable (12 — the GraphBrew framework)

-o 12 is the framework that combines community detection, intra-community ordering, and inter-community arrangement into ten variants. Variant selection is by flag suffix:

Flag	Variant	One-line summary
`-o 12:leiden` (default)	Leiden	Leiden + BFS within community + hierarchical sort
`-o 12:rabbit`	Rabbit	single-pass Rabbit + dendrogram DFS
`-o 12:hrab`	HRAB	Leiden + RabbitOrder on super-graph + adaptive intra
`-o 12:tqr`	TQR	cache-line tile + RabbitOrder on tile adjacency
`-o 12:hcache`	HCache	uses every Leiden level as a cache-tier mapping
`-o 12:streaming`	Streaming	lazy-aggregation Leiden; lower memory
`-o 12:rcm`	RCM	Leiden + RCM within community + BNF-RCM super-graph
`-o 12:hubcluster`	HubCluster	Leiden + hub-first intra-community
`-o 12:rabbit:dbg`	Rabbit-DBG	single-pass Rabbit + DBG within community
`-o 12:rabbit:hubcluster`	Rabbit-HubCluster	single-pass Rabbit + hub-first intra

Modifier tokens that compose with the variants:

Token	Effect
`:sgres0.10`	super-graph modularity resolution γ (default 0.10)
`:gamma0.10`	alias for `:sgres`
`:rcm_intra`	force RCM within communities (HRAB default)
`:bfs_intra`	force BFS within communities
`:rcm_super`	RCM on super-graph instead of dendrogram DFS
`:hubx`	extract top-1% hubs first, place adjacent to their best block
`:gord`	Gorder-greedy intra-community (with UnitHeap)

Example: -o 12:hrab:sgres0.25:hubx. See GraphBrewOrder for the pipeline and how variants compose.

Meta (13, 14)

MAP (-o 13:<file>) loads a vertex permutation from disk (.lo or .so file). Used by the benchmark pipeline to apply a pregenerated reordering without redoing the work.

AdaptiveOrder (-o 14) — runtime ML selector. Not part of the VLDB submission; see AdaptiveOrder-ML.

Reference Leiden (15)

LeidenOrder (-o 15) — direct GVE-Leiden ordering with no GraphBrew post-processing layer. Use as a community-detection baseline distinct from RabbitOrder's single-pass Louvain.

Forward-edge maximisation (16)

GoGraphOrder (-o 16) — Zhou et al. (2024). Hub-aware BFS followed by greedy vertex insertion that maximises the fraction of edges where src < dst in the ordering. Specifically helps Gauss-Seidel iterations (standard PR formulation). Has no effect on double-buffered Jacobi algorithms like PR-SpMV.

Variants:

Flag	Implementation
`-o 16`	default
`-o 16:fast`	parallel approximation
`-o 16:naive`	naive reference

Chained orderings

Multiple reorderings can be applied in sequence. Order matters: community detection should precede degree refinement because the degree-based methods preserve relative position within their buckets, so they refine the existing community layout instead of destroying it.

# Leiden then DBG: community spatial layout + hub temporal locality
./bench/bin/pr -f g.el -s -o 12:leiden -o 5 -n 5

# HRAB then DBG
./bench/bin/pr -f g.el -s -o 12:hrab -o 5 -n 5

# Leiden then GoGraph: cache locality + convergence speed (PR only, not PR-SpMV)
./bench/bin/pr -f g.el -s -o 12:leiden -o 16 -n 5

# Rabbit then DBG (the lightweight chain)
./bench/bin/pr -f g.el -s -o 8 -o 5 -n 5

Five chains are formally evaluated in the VLDB paper §4.5.2.

Algorithm selection cheatsheet

Is your graph mesh-like (road, FE)?
└── yes → 12:rcm   (or 11:bnf for standalone RCM)
└── no  → Does it have strong community structure?
          ├── yes (social, collaboration, web) → 12:hrab or 12:leiden
          └── no  (citation, sparse) → HUBCLUSTERDBG (7) or 12:rabbit

When in doubt, run HUBCLUSTERDBG (cheap baseline) and 12:hrab (usually best) and compare on your actual workload.

Reordering Algorithms

Reordering Algorithms

Why reorder

Quick reference

When to use what

Algorithm details

Baselines (0, 1)

Degree-based (2-7)

Community-based (8)

Heavyweight (9, 10)

Bandwidth-based (11)

Composable (12 — the GraphBrew framework)

Meta (13, 14)

Reference Leiden (15)

Forward-edge maximisation (16)

Chained orderings

Algorithm selection cheatsheet

Further reading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally