Molecular Chain-of-Thought Agent

An implementation of structured reasoning based on "The Molecular Structure of Thought" (Chen et al., 2026).

Why This Exists

Standard chain-of-thought prompting tells a model to "think step by step" — but it doesn't control how the model thinks. The result is often a single linear argument that anchors on its first framing, never questions its own assumptions, and produces a hedged conclusion full of bullet points.

The paper by Chen et al. studied the reasoning traces of strong models (R1-class) and found that their thinking isn't free-form — it follows a molecular structure with distinct, recurring reasoning behaviors. These behaviors form predictable transition patterns that can be modeled as a Markov chain.

This agent makes that structure explicit. Instead of hoping the model reasons well, it tells the model what type of reasoning to perform at each step, guided by transition probabilities extracted from real reasoning traces.

The Four Reasoning Bonds

The paper identifies four fundamental reasoning behaviors ("bonds") that strong reasoners alternate between:

| Bond | What It Does | Why It Matters |
|---|---|---|
| EXPLORE | Generates 2-3 structurally different framings of the problem | Prevents anchoring on the first interpretation |
| DEEP | Extends the current argument beyond the obvious, surfacing hidden assumptions and causal links | Prevents shallow reasoning that stops at the first plausible answer |
| REFLECT | Returns to a specific prior step and surgically interrogates it | Catches flawed assumptions before they propagate into the conclusion |
| NORMAL | Applies established logic directly: calculates, executes, moves forward | Not every step needs exploration; sometimes you just need to do the math |

How It Differs From Standard CoT

Standard CoT:    "Think step by step"  -->  Linear argument  -->  Hedged answer

Molecular CoT:   EXPLORE (frame the problem 3 ways)
                  --> DEEP (push the strongest framing further)
                  --> REFLECT (go back and stress-test step 2)
                  --> DEEP (extend with the corrected reasoning)
                  --> NORMAL (apply and calculate)
                  --> Conclusion (synthesize the trajectory)

The key insight from the paper: the sequence of bond types matters as much as the content. Models that explore before committing, go deep before reflecting, and reflect before concluding produce structurally stronger arguments — even when the raw token content is similar.

Transition Graph

The agent uses a Markov transition matrix (from paper Figure 5) to probabilistically select the next bond type based on the current one. Rows are the current bond, columns are the next bond, and each row sums to 1:

         NORMAL  DEEP  REFLECT  EXPLORE
NORMAL  [  0.74  0.10    0.05     0.11 ]
DEEP    [  0.32  0.21    0.10     0.37 ]
REFLECT [  0.35  0.10    0.17     0.38 ]
EXPLORE [  0.31  0.11    0.10     0.48 ]

This means, for example, after an EXPLORE step there's a 48% chance of another EXPLORE, 31% chance of NORMAL, 11% of DEEP, and 10% of REFLECT — matching the patterns observed in strong reasoning models. You can replace this with your own matrix estimated from your own traces.
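
As a rough illustration of how a matrix like this can drive bond selection, here is a minimal sketch using only the standard library. The names `TRANSITIONS`, `next_bond`, and `sample_sequence` are hypothetical and are not taken from `agent.py`; the probabilities are the ones shown above.

```python
import random

BONDS = ["NORMAL", "DEEP", "REFLECT", "EXPLORE"]

# Rows are the current bond, columns follow the order in BONDS.
TRANSITIONS = {
    "NORMAL":  [0.74, 0.10, 0.05, 0.11],
    "DEEP":    [0.32, 0.21, 0.10, 0.37],
    "REFLECT": [0.35, 0.10, 0.17, 0.38],
    "EXPLORE": [0.31, 0.11, 0.10, 0.48],
}


def next_bond(current: str, rng: random.Random) -> str:
    """Draw the next bond type from the row of the current bond."""
    return rng.choices(BONDS, weights=TRANSITIONS[current], k=1)[0]


def sample_sequence(length: int, seed: int = 0) -> list[str]:
    """Sample a bond sequence that opens with EXPLORE (hypothetical helper)."""
    rng = random.Random(seed)
    sequence = ["EXPLORE"]
    while len(sequence) < length:
        sequence.append(next_bond(sequence[-1], rng))
    return sequence
```

Calling `sample_sequence(6)` returns one possible six-step plan; passing a different `seed` gives a different draw from the same chain.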

Structural Warnings

After reasoning completes, the agent checks the bond distribution and warns about imbalances:

Low Self-Reflection (< 10%) — conclusions may rest on unchecked assumptions
Low Self-Exploration (< 10%) — may have anchored on first framing
Low Deep Reasoning (< 15%) — argument may be shallow

These thresholds come from the paper's analysis of what separates strong reasoning traces from weak ones.
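
The check itself is simple arithmetic over the bond counts. The sketch below shows one way it could be written, assuming the thresholds listed above; `bond_distribution` and `structural_warnings` are illustrative names, and the real implementation in `agent.py` may differ.

```python
from collections import Counter

BONDS = ["EXPLORE", "DEEP", "REFLECT", "NORMAL"]

# (bond, minimum share of steps, warning text), using the thresholds listed above.
THRESHOLDS = [
    ("REFLECT", 0.10, "Low Self-Reflection: conclusions may rest on unchecked assumptions"),
    ("EXPLORE", 0.10, "Low Self-Exploration: may have anchored on first framing"),
    ("DEEP", 0.15, "Low Deep Reasoning: argument may be shallow"),
]


def bond_distribution(bonds: list[str]) -> dict[str, float]:
    """Return the share of steps spent on each bond type (all zeros if empty)."""
    counts = Counter(bonds)
    total = len(bonds)
    if total == 0:
        return {bond: 0.0 for bond in BONDS}
    return {bond: counts[bond] / total for bond in BONDS}


def structural_warnings(bonds: list[str]) -> list[str]:
    """Return a warning for every bond whose share falls below its threshold."""
    dist = bond_distribution(bonds)
    return [text for bond, minimum, text in THRESHOLDS if dist[bond] < minimum]
```

For example, a trajectory with no REFLECT step at all has a REFLECT share of 0, so it triggers the first warning regardless of how the other bonds are distributed.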

CoT vs Direct: Why Both Modes?

The agent supports two modes so you can empirically test whether structured reasoning improves answers for your use case:

--mode cot (default) — runs the full molecular reasoning pipeline, then synthesizes a conclusion
--mode direct — sends the question straight to the LLM with no scaffolding

In our testing, CoT mode produces more opinionated, decisive answers that name specific uncertainties. Direct mode tends to produce longer, more generic responses that hedge with lists and frameworks instead of committing to a position.

Quick Start

git clone https://github.com/YOUR_USERNAME/molecular-cot.git
cd molecular-cot
pip install -r requirements.txt
cp .env.example .env   # add your API key(s)

Run with any provider:

# Molecular CoT (structured reasoning)
python agent.py "Should a startup with 12 months runway cut costs or raise?" --provider openrouter

# Direct mode (single-shot, no CoT)
python agent.py "Should a startup with 12 months runway cut costs or raise?" --provider openrouter --mode direct

Supported Providers

| Provider | Key needed | Default model |
|---|---|---|
| `anthropic` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-6` |
| `openai` | `OPENAI_API_KEY` | `gpt-4o` |
| `openrouter` | `OPENROUTER_API_KEY` | `qwen/qwen3-30b-a3b` |
| `gemini` | `GOOGLE_API_KEY` | `gemini-2.0-flash` |
| `ollama` | None (local) | `llama3.1` |

Set your key(s) in .env or export them:

export OPENROUTER_API_KEY=sk-or-v1-...

Usage

CLI

# Choose provider and model
python agent.py "Your question" --provider openrouter
python agent.py "Your question" --provider openai --model gpt-4o-mini

# Control reasoning depth
python agent.py "Your question" --provider openrouter --steps 6

# Compare CoT vs direct
python agent.py "Your question" --provider openrouter --mode cot
python agent.py "Your question" --provider openrouter --mode direct

# Compare providers side by side
python agent.py "Your question" --compare anthropic openai --output results.json

# Save output to JSON
python agent.py "Your question" --provider openrouter --output result.json

Python

from agent import MolecularCoTAgent, create_backend

backend = create_backend("openrouter")  # or "anthropic", "openai", etc.
agent = MolecularCoTAgent(backend)

# Structured reasoning
result = agent.run("Your question here", max_steps=8)
print(result["trajectory"]["conclusion"])
print(result["trajectory"]["bond_distribution"])
print(result["warnings"])

# Direct (no CoT) for comparison
direct = agent.run_direct("Your question here")
print(direct["answer"])

Custom Transition Graph

Replace the default Markov chain with one estimated from your own traces:

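from agent import BondType, MolecularCoTAgent, TransitionGraph, create_backend

# Each trace is the ordered list of bond types observed in one reasoning run.
my_traces = [
    [BondType.EXPLORE, BondType.DEEP, BondType.REFLECT, BondType.DEEP],
    [BondType.EXPLORE, BondType.EXPLORE, BondType.DEEP, BondType.REFLECT],
]

# Estimate transition probabilities from the observed bond-to-bond transitions.
graph = TransitionGraph.estimate_from_traces(my_traces)

backend = create_backend("openrouter")  # or "anthropic", "openai", etc.
agent = MolecularCoTAgent(backend, graph=graph)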
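
For intuition, estimating a transition matrix from traces usually means counting each current-to-next pair and normalizing every row. The sketch below does this with plain strings instead of `BondType`. It is an assumption about the general technique, not a copy of `TransitionGraph.estimate_from_traces`, which may handle unseen bonds or smoothing differently.

```python
from collections import Counter, defaultdict

BONDS = ["NORMAL", "DEEP", "REFLECT", "EXPLORE"]


def estimate_transitions(traces: list[list[str]]) -> dict[str, dict[str, float]]:
    """Count each current -> next pair, then normalize the counts per row."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1

    matrix = {}
    for current in BONDS:
        total = sum(counts[current].values())
        if total == 0:
            # Bond never seen as a source: fall back to a uniform row (an assumption).
            matrix[current] = {bond: 1 / len(BONDS) for bond in BONDS}
        else:
            matrix[current] = {bond: counts[current][bond] / total for bond in BONDS}
    return matrix
```

With only a handful of short traces the estimate is degenerate: in the two example traces, DEEP is always followed by REFLECT, so that row collapses to a single certain transition. A larger trace set gives a more useful graph.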

Output Format

{
  "task": "...",
  "backend": "openrouter/qwen/qwen3-30b-a3b",
  "trajectory": {
    "steps": [
      {"step": 1, "bond": "EXPLORE", "reflects_on": null, "content": "..."},
      {"step": 2, "bond": "DEEP",    "reflects_on": null, "content": "..."},
      {"step": 3, "bond": "REFLECT", "reflects_on": 2,    "content": "..."}
    ],
    "conclusion": "...",
    "bond_distribution": {"EXPLORE": 0.2, "DEEP": 0.5, "REFLECT": 0.2, "NORMAL": 0.1}
  },
  "warnings": []
}

Warnings fire when the trajectory is structurally imbalanced:

Low Self-Reflection (< 10%) — conclusions may rest on unchecked assumptions
Low Self-Exploration (< 10%) — may have anchored on first framing
Low Deep Reasoning (< 15%) — argument may be shallow
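
Because the output is plain JSON, it can be post-processed with the standard library. The sketch below parses an abbreviated sample in the format shown above; `reflection_links` and `summarize` are hypothetical helpers, not part of `agent.py`.

```python
import json

# Abbreviated sample in the output format shown above.
SAMPLE = """
{
  "task": "...",
  "backend": "openrouter/qwen/qwen3-30b-a3b",
  "trajectory": {
    "steps": [
      {"step": 1, "bond": "EXPLORE", "reflects_on": null, "content": "..."},
      {"step": 2, "bond": "DEEP", "reflects_on": null, "content": "..."},
      {"step": 3, "bond": "REFLECT", "reflects_on": 2, "content": "..."}
    ],
    "conclusion": "...",
    "bond_distribution": {"EXPLORE": 0.2, "DEEP": 0.5, "REFLECT": 0.2, "NORMAL": 0.1}
  },
  "warnings": []
}
"""


def reflection_links(result: dict) -> list[tuple[int, int]]:
    """Return (reflecting step, step it revisits) pairs."""
    return [
        (step["step"], step["reflects_on"])
        for step in result["trajectory"]["steps"]
        if step["bond"] == "REFLECT" and step["reflects_on"] is not None
    ]


def summarize(result: dict) -> str:
    """One-line summary: the bond sequence plus the number of warnings."""
    sequence = " -> ".join(step["bond"] for step in result["trajectory"]["steps"])
    return f"{sequence} ({len(result['warnings'])} warnings)"


result = json.loads(SAMPLE)
```

The same helpers should work on a file written with `--output`, loaded via `json.load`, as long as the file follows the format above.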

Tests

# Unit tests (no API calls, fast)
pytest test_agent.py -v -m "not integration"

# Integration tests (real API calls)
pytest test_agent.py -v -m "integration"

# All tests
pytest test_agent.py -v

Project Structure

.
├── agent.py           # Core agent, backends, CLI
├── test_agent.py      # Unit + integration tests
├── requirements.txt   # Dependencies
├── .env.example       # Template for API keys
└── .gitignore

How It Works (Step by Step)

1. Start with EXPLORE: the agent always opens by generating multiple framings of the problem
2. Transition: the Markov graph probabilistically picks the next bond type based on the current one
3. Prompt per bond: each step uses a bond-specific prompt that constrains the LLM to only perform that type of reasoning (no summarizing, no concluding early)
4. Safety net: if no REFLECT step has occurred by step max_steps - 2, one is forced to prevent unchecked conclusions
5. Early exit: if the model produces a convergence signal (FINAL ANSWER:, \boxed{}), reasoning stops early
6. Conclude: a separate conclusion prompt synthesizes the full trajectory into a direct answer
7. Validate: the bond distribution is checked against paper thresholds and structural warnings are emitted
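
The control flow described above can be sketched as a short loop. This is a simplified illustration under stated assumptions, not the code in `agent.py`: `run_loop`, the `llm` callable, and the `CONVERGED` pattern are hypothetical, the safety net is read as "force a REFLECT once the step number reaches max_steps - 2", and the conclusion and validation stages are omitted.

```python
import random
import re
from typing import Callable

BONDS = ["NORMAL", "DEEP", "REFLECT", "EXPLORE"]
TRANSITIONS = {
    "NORMAL":  [0.74, 0.10, 0.05, 0.11],
    "DEEP":    [0.32, 0.21, 0.10, 0.37],
    "REFLECT": [0.35, 0.10, 0.17, 0.38],
    "EXPLORE": [0.31, 0.11, 0.10, 0.48],
}

# Convergence signals named above; the exact patterns are an assumption.
CONVERGED = re.compile(r"FINAL ANSWER:|\\boxed\{")


def run_loop(
    task: str,
    llm: Callable[[str, str, list], str],
    max_steps: int = 8,
    seed: int = 0,
) -> list[dict]:
    """Run the bond loop; llm(bond, task, prior_steps) returns the step text."""
    rng = random.Random(seed)
    steps: list[dict] = []
    bond = "EXPLORE"  # the first step is always EXPLORE
    for step in range(1, max_steps + 1):
        reflected = any(s["bond"] == "REFLECT" for s in steps)
        # Safety net: force a REFLECT if none has happened by step max_steps - 2.
        if step > 1 and step >= max_steps - 2 and not reflected:
            bond = "REFLECT"
        content = llm(bond, task, steps)
        steps.append({"step": step, "bond": bond, "content": content})
        # Early exit: stop as soon as the model signals a final answer.
        if CONVERGED.search(content):
            break
        # Transition: sample the next bond from the row of the current one.
        bond = rng.choices(BONDS, weights=TRANSITIONS[bond], k=1)[0]
    return steps
```

A stub such as `lambda bond, task, prior: f"{bond} reasoning"` is enough to exercise the loop without any API calls; in a real run the callable would build the bond-specific prompt and send it to the backend.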

Structural Compatibility (Multi-Agent)

The paper (Section 5.2) found that merging outputs from two agents with incompatible reasoning structures causes performance collapse, even when the token-level content is similar (Pearson correlation above 0.9 on tokens, but below 0.8 on bond distributions).

The compare_backends function checks this automatically:

from agent import compare_backends

result = compare_backends(
    task="...",
    providers=["anthropic", "openai"],
)
# result["structural_compatibility"] = {"anthropic vs openai": True/False}

If two backends are structurally incompatible, don't merge their outputs — pick the better one.
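
To make the idea concrete, here is a minimal sketch of a compatibility check based on the Pearson correlation between two bond distributions. How `compare_backends` actually computes compatibility is not shown in this README, so the function names and the reuse of 0.8 as a cutoff are assumptions borrowed from the figure quoted above.

```python
import math

BONDS = ["NORMAL", "DEEP", "REFLECT", "EXPLORE"]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant vector; treat it as no correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def structurally_compatible(
    dist_a: dict[str, float],
    dist_b: dict[str, float],
    threshold: float = 0.8,  # assumed cutoff, taken from the figure quoted above
) -> bool:
    """Compare two bond distributions in a fixed bond order."""
    a = [dist_a.get(bond, 0.0) for bond in BONDS]
    b = [dist_b.get(bond, 0.0) for bond in BONDS]
    return pearson(a, b) >= threshold
```

With only four bond types the correlation is computed over four points, so it is a coarse signal; comparing full transition matrices would give more data points per pair of backends.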

References

Chen et al., "The Molecular Structure of Thought", 2026 — the paper this implementation is based on
Section 3.2: Bond type definitions and transition analysis
Section 5.2: Structural compatibility and performance collapse
Figure 5: Transition probability matrix used as the default graph

License

MIT
