Post-quantum relay-transparent authentication for ROS2/DDS — protocol proposal (CLEITONQ_AUTH_SAMPLE) #392

@cleitonaugusto

Summary

Any post-quantum (PQC) authentication scheme that appends bytes after a CDR-serialized
message is silently defeated by DDS middleware that re-serializes to a typed schema.
This is not a bug in CycloneDDS or FastDDS — it is a structural consequence of how DDS
type-safety works. The same root cause is documented in the MAVLink ecosystem
(mavlink/mavlink#2527, PX4/PX4-Autopilot#27704).

This issue proposes a concrete protocol — CLEITONQ_AUTH_SAMPLE — that resolves the
stripping problem for ROS2/DDS by encoding authentication material as a first-class,
typed DDS sample on a parallel topic, rather than appending bytes after a typed message.


1 — The problem: CDR schema boundary strips authentication bytes

DDS middleware deserializes an incoming sample according to its IDL-declared schema and
discards any bytes beyond that schema's CDR encoding. A middleware that re-serializes the
message for forwarding (bridge, domain relay, security plugin) reconstructs the CDR from
the typed fields — bytes appended after the schema boundary are gone.
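
The stripping effect can be reproduced in a few lines of standard-library Python. The sketch below is not the diagnostic tool referenced in this issue. It assumes a little-endian CDR encapsulation header followed by the six float64 fields of geometry_msgs/msg/Twist, and the helper names (serialize_twist, deserialize_twist, relay) are made up for illustration.

```python
import struct

# Assumption: 4-byte encapsulation header for little-endian CDR, no options.
CDR_LE_HEADER = b"\x00\x01\x00\x00"


def serialize_twist(linear, angular):
    """Encode the six float64 fields of a Twist after the encapsulation header."""
    return CDR_LE_HEADER + struct.pack("<6d", *linear, *angular)


def deserialize_twist(buf):
    """Read only the bytes the schema declares; anything after them is ignored."""
    values = struct.unpack_from("<6d", buf, len(CDR_LE_HEADER))
    return values[:3], values[3:]


def relay(buf):
    """A re-serializing hop: decode to typed fields, then encode those fields again."""
    linear, angular = deserialize_twist(buf)
    return serialize_twist(linear, angular)


cdr = serialize_twist((0.0, 0.0, 1.0), (0.0, 0.0, 0.0))
auth_tag = bytes(32)            # stand-in for a 32-byte HMAC-SHA3-256 tag
sent = cdr + auth_tag           # tag appended after the schema boundary
received = relay(sent)

print(len(sent), len(received), len(sent) - len(received))  # 84 52 32
```

The relay never sees the appended tag as data: it decodes 48 bytes of fields and re-encodes exactly those, which matches the first row of the table in the next subsection.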

Measured with real DDS stacks

Diagnostic tool (single-file, --simulate mode requires no dependencies):
tools/ros2_bridge_strip_poc.py

python3 tools/ros2_bridge_strip_poc.py --simulate

Output (geometry_msgs/msg/Twist as the authenticated message, CDR = 52 bytes):

  Auth scheme                    Sent  Received  Stripped  Result
  ─────────────────────────────  ────  ────────  ────────  ────────────────
  HMAC-SHA3-256 (32 B)             84        52        32  FAIL — auth gone
  Ed25519 sig  (64 B)             116        52        64  FAIL — auth gone
  ML-DSA-87 sig (4627 B)         4679        52      4627  FAIL — auth gone

The --simulate flag runs a pure-Python CDR implementation; no ROS2 installation
required to reproduce. The real CycloneDDS endpoint test (run_real() mode, two separate
processes on domain 0) produces the same stripped result.

Two distinct failure modes

  Middleware                  Behaviour                                                     Impact
  ──────────────────────────  ────────────────────────────────────────────────────────────  ────────────────────────
  CycloneDDS                  Deserializes to a Twist struct (schema bytes only); auth      Silent auth bypass
                              material is silently absent from the reconstructed struct.
  FastDDS (rmw_fastrtps_cpp)  Pre-allocates a reader history buffer sized to the type's     Denial of service: rclpy
                              maximum CDR length (55 B for Twist). Larger payloads          callback never fires
                              trigger "[RTPS_READER_HISTORY Error] Change payload size
                              of 'N' bytes is larger than the history payload size of
                              '55' bytes", and the sample is dropped entirely.

Both failure modes produce the same outcome: authentication material never reaches the verifier.

Why this cannot be fixed in the middleware

The CDR schema boundary is intentional — it is what enables type-safe, version-tolerant
message passing. A DDS implementation that forwarded arbitrary appended bytes would break
backward compatibility and type-safety guarantees. The issue is not in the implementation;
it is in assuming that appended bytes survive a DDS hop.


2 — Current SROS2 / DDS-Security gap

SROS2 implements DDS-Security (OMG specification), which provides:

  • Authentication: X.509 certificates with RSA-2048 or ECDSA-P256
  • Key exchange: ECDH over P-256
  • Message integrity: AES-GCM-128 or AES-GCM-256 (symmetric, per-session)

NIST finalised post-quantum standards in August 2024:

  • FIPS 203 — ML-KEM-1024 (key encapsulation, replaces ECDH)
  • FIPS 204 — ML-DSA-87 (signatures, replaces ECDSA)

Neither is supported in the current DDS-Security specification or in any ROS2
security plugin. Defence and critical-infrastructure integrators are already
required to begin PQC migration for 2026–2027.

Additionally, the DDS-Security model authenticates at the participant level:
individual published samples are integrity-protected by AES-GCM with a session key,
but there is no mechanism for non-repudiation of individual commands (proof that a
specific sample came from a specific authorised node, not just from a node that passed
the handshake). For safety-critical commands (arm/disarm, emergency stop, waypoint),
non-repudiation at the sample level is required.


3 — Proposed protocol: CLEITONQ_AUTH_SAMPLE

Core design principle

Authentication material is a first-class typed DDS sample, not appended bytes.

Instead of appending a signature after a Twist CDR buffer (which gets stripped),
the authenticator publishes an accompanying typed sample on a parallel topic.
The DDS middleware treats this sample identically to any other typed message — no
stripping, no pre-allocation failure, full QoS propagation.

Message definition

# cleitonq_msgs/msg/AuthenticatedSample.msg
# Accompanies a typed command on a parallel topic.
# Receiver correlates by (publisher_id, nonce).

uint64      nonce            # monotonically increasing per-publisher counter (anti-replay)
uint8[32]   msg_digest       # SHA3-256(CDR(original_sample) || nonce_le_8)
uint8[4627] signature        # ML-DSA-87 signature over msg_digest (FIPS 204)
uint8[32]   vk_fingerprint   # SHA3-256(verifying_key), the full 32-byte digest, for key selection

# cleitonq_msgs/msg/SessionInit.msg
# Carries ML-KEM-1024 ciphertext for forward-secret session establishment.

uint8       initiator_id     # DDS participant ID of the initiating node
uint64      timestamp        # microseconds since epoch (handshake anti-replay)
uint8[1568] kem_ciphertext   # ML-KEM-1024 ciphertext (FIPS 203)
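
Because every field above has a fixed size, the serialized length of each sample is a constant, which matters for middleware that pre-allocates buffers from the type's maximum CDR length. The sketch below estimates those lengths. It assumes classic CDR alignment rules (a uint64 aligned to 8 bytes from the start of the payload, fixed arrays with no length prefix) and a 4-byte encapsulation header; other encodings, such as XCDR2, would give slightly different padding.

```python
import struct

ENCAPSULATION_HEADER = 4  # assumption: 4-byte CDR encapsulation header

# "<" disables native padding, so CDR alignment is modelled with explicit pad bytes.
# AuthenticatedSample: uint64 nonce, uint8[32], uint8[4627], uint8[32]
AUTH_SAMPLE = struct.Struct("<Q32s4627s32s")
# SessionInit: uint8 initiator_id, 7 pad bytes to align the uint64, uint8[1568]
SESSION_INIT = struct.Struct("<B7xQ1568s")

auth_size = ENCAPSULATION_HEADER + AUTH_SAMPLE.size
init_size = ENCAPSULATION_HEADER + SESSION_INIT.size

print(auth_size, init_size)  # 4703 1588
```

Under these assumptions, the 7 bytes of padding in SessionInit come only from placing the uint8 field ahead of the uint64 field.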

Protocol flow

Session establishment (one-time per connection):

  1. Robot pre-loads the GCS ML-DSA-87 verifying key out-of-band (pre-flight).
  2. GCS publishes SessionInit on /robot_N/cleitonq/session_init.
  3. Robot decapsulates → shared secret → derives channel keys via SHA3-256.
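
Step 3 does not spell out how the channel keys are derived. One possible shape is sketched below, as an assumption rather than the reference implementation's method: each key is the SHA3-256 hash of a length-prefixed purpose label and the shared secret. The function name and the labels are hypothetical, and a random 32-byte value stands in for the ML-KEM-1024 shared secret.

```python
import hashlib
import secrets


def derive_channel_key(shared_secret: bytes, label: bytes) -> bytes:
    """Derive one 32-byte key per purpose from the KEM shared secret."""
    # Length-prefix the label so different (label, secret) pairs cannot
    # concatenate to the same hash input.
    prefix = len(label).to_bytes(2, "little")
    return hashlib.sha3_256(prefix + label + shared_secret).digest()


# Stand-in for the 32-byte secret both sides hold after decapsulation.
shared_secret = secrets.token_bytes(32)

gcs_to_robot_key = derive_channel_key(shared_secret, b"gcs->robot")  # hypothetical label
robot_to_gcs_key = derive_channel_key(shared_secret, b"robot->gcs")  # hypothetical label
```

Using a separate label per direction keeps a message authenticated in one direction from being reflected back as valid in the other.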

Authenticated command (per safety-critical sample):

Publisher (GCS):
  cdr        = CDR(Twist(linear.z=1.0))
  nonce      = AtomicNonce::next()               # monotonically increasing
  msg_digest = SHA3-256(cdr || nonce_le_8)
  signature  = ML-DSA-87.Sign(sk, msg_digest)    # FIPS 204
  publish Twist on               /cmd_vel
  publish AuthenticatedSample on /cmd_vel/cleitonq_auth

Subscriber (robot):
  receive Twist from              /cmd_vel
  receive AuthenticatedSample from /cmd_vel/cleitonq_auth
  recompute = SHA3-256(CDR(Twist) || auth.nonce_le_8)
  assert recompute == auth.msg_digest
  assert ML-DSA-87.Verify(vk, auth.msg_digest, auth.signature)   # FIPS 204
  assert auth.nonce > last_accepted_nonce         # anti-replay
  last_accepted_nonce = auth.nonce                # update only after all checks pass
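
The flow above can be sketched in runnable form with the signature scheme injected as a pair of callables. The Python standard library has no ML-DSA implementation, so the demo wires in HMAC-SHA3-256 purely as a stand-in; it exercises the digest binding and the replay check, not the post-quantum signature. The class names, function names and the demo key are hypothetical.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass


def msg_digest(cdr: bytes, nonce: int) -> bytes:
    """SHA3-256(CDR || nonce as 8 little-endian bytes)."""
    return hashlib.sha3_256(cdr + struct.pack("<Q", nonce)).digest()


@dataclass
class AuthSample:
    nonce: int
    msg_digest: bytes
    signature: bytes


class Publisher:
    def __init__(self, sign):
        self._sign = sign
        self._nonce = 0

    def authenticate(self, cdr: bytes) -> AuthSample:
        self._nonce += 1  # monotonically increasing per publisher
        digest = msg_digest(cdr, self._nonce)
        return AuthSample(self._nonce, digest, self._sign(digest))


class Subscriber:
    def __init__(self, verify):
        self._verify = verify
        self._last_nonce = 0

    def accept(self, cdr: bytes, auth: AuthSample) -> bool:
        if auth.nonce <= self._last_nonce:  # replayed or stale sample
            return False
        if not hmac.compare_digest(msg_digest(cdr, auth.nonce), auth.msg_digest):
            return False  # command bytes do not match the signed digest
        if not self._verify(auth.msg_digest, auth.signature):
            return False
        self._last_nonce = auth.nonce  # advance only after every check passes
        return True


# Stand-in for ML-DSA-87 sign/verify: NOT a signature scheme, demo only.
DEMO_KEY = b"demo-key-not-for-production"


def demo_sign(digest: bytes) -> bytes:
    return hmac.new(DEMO_KEY, digest, hashlib.sha3_256).digest()


def demo_verify(digest: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(digest), signature)


publisher, subscriber = Publisher(demo_sign), Subscriber(demo_verify)
command = b"\x00\x01\x00\x00" + struct.pack("<6d", 0, 0, 1.0, 0, 0, 0)
auth = publisher.authenticate(command)
print(subscriber.accept(command, auth))  # True
print(subscriber.accept(command, auth))  # False: same nonce replayed
```

The sketch runs the cheap nonce comparison before the digest and signature checks; all three checks must pass either way, so the outcome is the same as in the flow above.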

Security properties

  Property                 Mechanism                           Quantum-safe
  ───────────────────────  ──────────────────────────────────  ─────────────────────────
  Command non-repudiation  ML-DSA-87 (FIPS 204)                Yes
  Forward secrecy          ML-KEM-1024 (FIPS 203)              Yes
  Anti-replay              Monotonic nonce per publisher       N/A
  DDS relay transparency   Auth material is a typed sample     N/A
  Sample integrity         SHA3-256 digest binds CDR to nonce  Partial (Grover: 128-bit)

Performance (ARM64 Neoverse-N2, release build)

  Operation                  Latency  Suitable for
  ─────────────────────────  ───────  ──────────────────────────────
  ML-KEM-1024 session setup  241 µs   One-time per mission
  ML-DSA-87 sign             509 µs   Per arm/disarm, emergency stop
  SHA3-256 digest            < 1 µs   Every sample

For high-rate telemetry (100 Hz+), HMAC-SHA3-256 (1.1 µs per packet) is the
appropriate mechanism.
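
A per-packet tag of this kind might look like the sketch below. It assumes the tag covers the CDR bytes plus an 8-byte little-endian counter and that a 32-byte channel key is already shared; the function names are hypothetical and a random key stands in for the derived channel key. As with signatures, the tag would have to travel in a typed field or sample rather than as appended bytes.

```python
import hashlib
import hmac
import secrets
import struct


def telemetry_tag(key: bytes, cdr: bytes, counter: int) -> bytes:
    """HMAC-SHA3-256 over the CDR bytes and a little-endian packet counter."""
    return hmac.new(key, cdr + struct.pack("<Q", counter), hashlib.sha3_256).digest()


def verify_telemetry(key: bytes, cdr: bytes, counter: int, tag: bytes) -> bool:
    """Constant-time comparison of the expected and received tags."""
    return hmac.compare_digest(telemetry_tag(key, cdr, counter), tag)


channel_key = secrets.token_bytes(32)       # stand-in for a derived channel key
packet = b"\x00\x01\x00\x00" + bytes(48)    # a 52-byte Twist-sized CDR payload
tag = telemetry_tag(channel_key, packet, 1)

print(len(tag))                                        # 32
print(verify_telemetry(channel_key, packet, 1, tag))   # True
print(verify_telemetry(channel_key, packet, 2, tag))   # False: counter mismatch
```

Using the figures above, signing every sample at 100 Hz would cost 100 × 509 µs = 50.9 ms of CPU per second, about 5% of one core, against roughly 0.01% for a 1.1 µs HMAC. The trade-off is that HMAC is symmetric, so it gives integrity and origin authentication between key holders but not non-repudiation.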


4 — MAVLink precedent: validated approach

The identical structural problem (relay stripping appended auth bytes) has been solved
in the MAVLink ecosystem via CLEITONQ_CHUNK (mavlink/mavlink#2527).
The solution encodes all PQC material as first-class MAVLink messages (msg_id=50000).
Relays that do not know the CleitonQ dialect forward the frames as opaque but valid
frames; relays that do know the dialect validate them fully.

The ROS2 CLEITONQ_AUTH_SAMPLE proposal is the direct DDS analogue:

  Context   Problem                                  Solution
  ────────  ───────────────────────────────────────  ──────────────────────────────────────────────
  MAVLink   Relay strips appended bytes              CLEITONQ_CHUNK: auth as valid MAVLink frame
  ROS2/DDS  Middleware re-serializes typed messages  CLEITONQ_AUTH_SAMPLE: auth as typed DDS sample

Both solutions share the same principle: authentication material must be a
first-class object in the protocol, not an afterthought appended outside the
protocol's framing.


5 — Reference implementation

  Resource                                         Link
  ───────────────────────────────────────────────  ───────────────────────────────────────────
  Reference implementation (Rust, MIT/Apache-2.0)  https://github.com/cleitonaugusto/CleitonQ
  Technical paper (Zenodo)                         https://doi.org/10.5281/zenodo.20776349
  MAVLink RFC                                      mavlink/mavlink#2527
  Diagnostic tool                                  tools/ros2_bridge_strip_poc.py (repo above)

The reference implementation provides ML-DSA-87 sign/verify, ML-KEM-1024 session
establishment, HMAC-SHA3-256 per-packet authentication, and anti-replay nonce
management — all tested with unit, integration, and fuzz coverage.


6 — Questions for the working group

  1. SROS2 plugin path — Should CLEITONQ_AUTH_SAMPLE be a standalone cleitonq_msgs
    package with a documented integration pattern, or a security plugin alongside DDS-Security?

  2. Correlation model — Is reception-timestamp-based correlation sufficient, or should
    the protocol mandate an explicit correlation_id field in both topics?

  3. DDS-Security hybrid — Parallel operation of DDS-Security (classical) and
    CLEITONQ_AUTH_SAMPLE (PQC) during a transition period, or a clean cutover?

  4. cleitonq_msgs placement — New package, std_srvs, or vendor extension?

  5. Embedded benchmarks — Current numbers are from ARM64 Neoverse-N2 (server-class).
    Cortex-A76 (Raspberry Pi 5) numbers are pending. Would the working group require
    embedded-class numbers before adoption?


Related: mavlink/mavlink#2527, PX4/PX4-Autopilot#27704
