1 change: 1 addition & 0 deletions .gitignore
@@ -1 +1,2 @@
*.swp
.ipynb_checkpoints/
4 changes: 4 additions & 0 deletions di/simtick/.gitignore
@@ -0,0 +1,4 @@
.venv/
notebooks/
testing*.q
*.ipynb
87 changes: 66 additions & 21 deletions di/simtick/README.md
@@ -24,14 +24,25 @@ This flexibility allows the same module to serve quick prototypes and sophistica
- **Intraday seasonality** — trading activity is high at open and close, low at midday. Configurable U-shape or J-shape patterns.
- **Price dynamics** — GBM with optional jump-diffusion captures continuous price movement and occasional discontinuities.
- **Microstructure** — bid-ask spreads that widen at open/close, quote updates between trades.
- **Realistic pricing** — trade prices and quote bid/ask rounded to the nearest cent (US equity tick size).
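The nearest-cent rounding can be checked in isolation. One q detail matters here: casting a float to long already rounds to the nearest integer rather than truncating, so `` `long$0.5+x `` rounds up instead of to nearest, while `floor 0.5+x` gives round-half-up. A minimal sketch; `roundcent` is a hypothetical helper, not part of `di.simtick`:

```q
/ q's float-to-long cast rounds to nearest, it does not truncate
`long$18190.4          / 18190
`long$0.5+18190.4      / 18191: the extra 0.5 pushes the result up
floor 0.5+18190.4      / 18190: floor after +0.5 is round-half-up

/ hypothetical helper: round dollar prices to the nearest cent
roundcent:{0.01*floor 0.5+x%0.01}
roundcent 181.904 181.906   / 181.9 181.91
```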

### Market Focus

The default presets and parameter examples are calibrated for **US equity markets** (NVDA on NASDAQ). Key characteristics:

- High liquidity at open and close, quiet midday (J-shape or U-shape)
- Spreads wider at open/close, tighter at midday
- Arrival rates and volatility consistent with large-cap tech stocks

**Futures markets** have different microstructure — most liquid in the last 5-10 minutes before close with the tightest spreads, and wider spreads at midday. The parameter system is flexible enough to approximate futures behavior by tuning `openmult`, `midmult`, `closemult`, `spreadopenmult`, `spreadmidmult`, `spreadclosemult`. However, the sharp pre-close liquidity spike typical of futures cannot be fully captured with the current cosine interpolation — the shape function smooths transitions gradually rather than modeling sudden discontinuities.
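The smoothing limitation follows from how a cosine blend behaves. The sketch below is hypothetical, since the diff does not show the body of `shape`: it assumes intensity multipliers are blended from `openmult` at the open to `midmult` at `transitionpoint`, then to `closemult` at the close, with `progress` as the elapsed fraction of the session. A half-cosine blend has zero slope at both ends, so it cannot produce a step-like jump in the final minutes:

```q
/ half-cosine blend: returns a at u=0 and b at u=1, with zero slope at both ends
cosblend:{[a;b;u] b+(a-b)*0.5*1+cos u*acos -1}

/ hypothetical shape sketch for an atomic progress in [0,1]; not di.simtick's shape[]
shapesketch:{[cfg;progress]
  tp:cfg`transitionpoint;
  $[progress<tp;
    cosblend[cfg`openmult;cfg`midmult;progress%tp];
    cosblend[cfg`midmult;cfg`closemult;(progress-tp)%1-tp]]}

/ illustrative values; closemult here is an assumption
cfg:`openmult`midmult`closemult`transitionpoint!1.5 0.5 2.0 0.3
r:shapesketch[cfg] each 0 0.3 0.65 1
```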

### Use Cases

**Stress testing and scenario analysis** — Generate data under severe but plausible conditions. Simulate liquidity shocks by lowering `baseintensity`, gap moves using the jump-diffusion model (`pricemodel:jump`), or extreme volatility regimes by increasing `vol`. Test how your systems behave when markets break from normal patterns.

**Sensitivity and robustness testing** — Vary parameters systematically to understand how strategies respond to changes in volatility, trade frequency, or spread dynamics. Identify breaking points before they occur in production.

**System development** — Stress-test data ingestion pipelines by adjusting trade arrival rates. Increase `baseintensity` (e.g., from 1.0 to 50) and `alpha` to simulate high-frequency bursts. This lets you verify that your database, message queues, and processing logic handle peak loads without data loss or latency spikes.
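When picking stress-test values, it helps to estimate the average rate a Hawkes configuration produces. For an exponential kernel where each trade adds `alpha` to the excitation and it decays at rate `beta` (as in `hawkes.step`), each trade triggers on average `alpha%beta` further trades, so the long-run rate is `baseintensity%1-alpha%beta`; this is also why `alpha` must stay below `beta`. A minimal sketch that ignores the intraday shape multipliers the module also applies; `meanrate` is a hypothetical helper:

```q
/ stationary mean rate of an exponential-kernel Hawkes process (trades/sec)
/ branching ratio alpha%beta must be < 1, matching validate's alpha < beta check
meanrate:{[mu;alpha;beta] mu%1-alpha%beta}

meanrate[1.0;0.3;1.0]    / about 1.43 trades/sec with the example table values
meanrate[50;0.8;1.0]     / about 250 trades/sec: a burst-heavy stress setting
```

Assuming a 09:30 to 16:00 session (23,400 seconds), the second setting implies roughly 5.85 million trades before shape effects, which is a useful order of magnitude when sizing ingestion tests.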

**Real-time demos** — Feed simulated data to dashboards, visualization tools, or trading interfaces. Useful for demos, training sessions, or testing UI responsiveness without connecting to live markets.

@@ -49,9 +60,41 @@ For these advanced use cases, a full limit order book simulator with queue dynam

### Next Steps

Two future modules will extend this simulator using the KDB-X module framework's **sibling architecture**. Each module lives at the same level under `di/` and declares a dependency on its predecessor; see the note below on why absolute module paths are used instead of relative references.

**Module hierarchy:**

```
di/
├── simtick/ # 1 instrument, 1 day (atomic unit)
├── simcalendar/ # 1 instrument, N days (uses di.simtick)
└── simbasket/ # M instruments, N days (uses di.simcalendar)
```

**Dependency chain:**

```
simtick ← simcalendar ← simbasket
```

Each module builds on its predecessor. This design allows users to load only what they need while keeping each module focused on a single responsibility.

**Note:** We use absolute module paths (`` use`di.simtick ``) rather than relative sibling references (`` use`..simtick ``). The sibling syntax did not work in our testing with KDB-X Community Edition; further investigation is needed.

---

**`di.simcalendar`** — Single instrument over multiple trading days

- Accepts a list of trading dates (e.g., NYSE calendar)
- Orchestrates `di.simtick` for each day
- Carries forward closing price as next day's opening price (no overnight gap modeling)
- Optional disk persistence to date-partitioned kdb+ database
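The carry-forward rule can be sketched with a q scan over dates. This is a hypothetical sketch, not the planned module's API: `rundays` and the stand-in generator `gen` are illustrative names, and in practice `gen` would be `simtick.run` (assuming `generatequotes` is off, so it returns a trade table). Each day's `startprice` is the previous day's last trade price:

```q
/ hypothetical: run one simulated day per date, seeding each day's
/ startprice with the previous day's last trade price (no overnight gap)
/ assumes every day produces at least one trade
rundays:{[gen;cfg;dates]
  day:{[gen;cfg;prev;d] gen cfg,`tradingdate`startprice!(d;last prev`price)}[gen;cfg];
  raze day\[([]price:enlist cfg`startprice);dates]}

/ stand-in for simtick.run so the sketch is self-contained
gen:{[cfg] ([]sym:3#cfg`sym;date:3#cfg`tradingdate;price:cfg[`startprice]+0.01*1 2 3)}

cfg:`sym`tradingdate`startprice!(`NVDA;2026.01.20;181.9)
r:rundays[gen;cfg;2026.01.20 2026.01.21]
```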

---

**`di.simbasket`** — Multiple correlated instruments over multiple trading days

- Correlated price processes across instruments
- Configurable correlation matrices
- Synchronized or independent arrival processes
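Correlated price paths are commonly driven by normal shocks built from a Cholesky factor of the correlation matrix; for two instruments with correlation `rho` the factor has a closed form. The sketch below is illustrative only, since this diff does not specify how `di.simbasket` will correlate instruments; `normal` and `corrpair` are hypothetical helpers, and Box-Muller is used for the standard normals:

```q
pi:acos -1
/ Box-Muller standard normals; 1-n?1.0 lies in (0,1] so log never sees 0
normal:{[n] sqrt[-2*log 1-n?1.0]*cos 2*pi*n?1.0}

/ 2x2 Cholesky: z2 = rho*z1 + sqrt(1-rho^2)*e gives corr(z1;z2) = rho
corrpair:{[n;rho] z1:normal n; e:normal n; (z1;(rho*z1)+sqrt[1-rho*rho]*e)}

system "S 42"
p:corrpair[100000;0.6]
cor . p    / sample correlation, close to 0.6
```

For M instruments the same idea uses the full lower-triangular Cholesky factor of the M×M correlation matrix.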

@@ -66,10 +109,10 @@ Correlated price paths across assets are essential for:

Simulations are driven by a configuration dictionary containing all model parameters (arrival rates, volatility, spread settings, etc.). Rather than building these manually, the module reads configurations from a **CSV file**.

A ready-to-use file `presets.csv` is included with three market scenarios calibrated for NVDA (default, volatile, jumpy). You can:

- Use presets directly: `` cfg:cfgs`default ``
- Modify values for specific runs: `` cfg[`vol]:0.65 ``
- Add new rows to define custom scenarios
- Create your own CSV following the same schema
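`loadconfig` itself is not part of this diff, so the following is only an assumption about the general approach: q's `0:` parses CSV text against a type string using the same codes as the module's `schema` (S, D, U, F, J, B), and keying the table on `name` lets `` cfgs`default `` return a configuration dictionary. The column set and values are an illustrative slice:

```q
/ hypothetical three-column slice of a presets file, held in memory
txt:("name,sym,vol";"default,NVDA,0.45";"volatile,NVDA,0.65")

/ parse with type codes (S=symbol, F=float) and key on the preset name
cfgs:1!("SSF";enlist ",") 0: txt

cfg:cfgs`default        / dictionary of the non-key columns for that row
cfg[`vol]:0.65          / override one value for a single run
```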

@@ -109,11 +152,11 @@ q)simtick:use`di.simtick
q)cfgs:simtick.loadconfig`:di/simtick/presets.csv
q)cfg:cfgs`default
q)simtick.run[cfg]
sym time price qty
-----------------------------------------------
NVDA 2026.01.20D09:30:02.487640474 181.90 43
NVDA 2026.01.20D09:30:03.846514899 182.01 32
NVDA 2026.01.20D09:30:04.444929571 182.05 78
...
```

@@ -137,28 +180,30 @@ q)result`quote

## Presets

Presets are calibrated for NVDA (NASDAQ large-cap tech):

| Preset | Description |
|--------|-------------|
| `default` | Baseline NVDA trading day |
| `volatile` | Higher volatility regime (earnings, macro events) |
| `jumpy` | Jump-diffusion model (sudden news, guidance) |

## Configuration Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| `sym` | Ticker symbol | `` `NVDA `` |
| `baseintensity` | Base arrival rate (trades/sec) | 1.0 |
| `alpha` | Hawkes excitation (0 = Poisson) | 0.3 |
| `beta` | Hawkes decay (must be > alpha) | 1.0 |
| `vol` | Annualized volatility | 0.45 |
| `drift` | Annualized drift | 0.05 |
| `transitionpoint` | Intraday shape (0.3=J, 0.5=U) | 0.3 |
| `pricemodel` | `gbm` or `jump` | `gbm` |
| `qtymodel` | `lognormal` or `constant` | `lognormal` |
| `avgqty` | Average trade size | 100 |
| `seed` | Random seed (`0N` = no seed) | `42` |
| `basespread` | Base bid-ask spread (fraction) | 0.0001 |
| `generatequotes` | Generate quotes flag | 0b |
| `openmult` | Opening intensity multiplier | 1.5 |
| `midmult` | Midday intensity multiplier | 0.5 |
@@ -167,15 +212,15 @@ q)result`quote
## Testing

```q
q)k4unit:use`local.k4unit
q)k4unit.moduletest`di.simtick
```

### Test Coverage

| Group | Tests | Description |
|-------|-------|-------------|
| Validation | 7 | Bad configs throw correct errors (alpha >= beta, negative intensity, zero multipliers, zero/negative vol, zero/negative startprice) |
| Arrivals | 5 | Output properties: non-empty, sorted, positive, within duration, correct type |
| Shape | 3 | Intraday pattern: open > mid, close > mid, J-shape verification |
| Price | 6 | Positive prices, startprice correct, realized vol within tolerance, jump model works |
@@ -185,7 +230,7 @@ q)k4unit.moduletest`di.simtick
| Describe | 3 | Returns table, correct columns, correct parameter count |
| Constant Qty | 2 | All quantities equal, quantity equals avgqty |
| Reproducibility | 1 | Same seed produces same output |
| **Total** | **50** | |
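The Reproducibility test depends on q's `\S` system command, which seeds the generator behind `?`. The test body is not part of this diff; this is a minimal standalone check of the property it relies on, plus the null-seed rule used in `run`:

```q
/ same seed, same draws: this is what cfg`seed provides
system "S 42"; a:5?1.0
system "S 42"; b:5?1.0
a~b              / 1b

/ a null seed skips seeding, as in run[]
not null 0N      / 0b
```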

## Documentation

75 changes: 48 additions & 27 deletions di/simtick/init.q
@@ -92,8 +92,12 @@ shape:{[cfg;progress]
hawkes.step:{[params;state]
/ single step of Ogata thinning algorithm
/ params: dict with `duration`lambdamax`baseintensity`alpha`beta`cfg
/ state: dict with `t`excitation`accept`done
/ returns: updated state dict
/
/ `accept` records whether this candidate was accepted (1b) or rejected (0b).
/ arrivals[] uses scan (\) to collect all states, then filters t where accept.
/ this avoids O(n^2) list copies from appending to a growing times list each step.
duration:params`duration;
lambdamax:params`lambdamax;
baseintensity:params`baseintensity;
@@ -105,8 +109,8 @@ hawkes.step:{[params;state]
wait:neg log[first 1?1.0]%lambdamax;
t:state[`t]+wait;

/ check if past duration - mark accept:0b so scan filter excludes this state
if[t>=duration; :`t`excitation`accept`done!(t;state`excitation;0b;1b)];

/ decay excitation
excitation:state[`excitation]*exp neg beta*wait;
@@ -118,10 +122,9 @@ hawkes.step:{[params;state]

/ accept/reject
accept:(first 1?1.0)<lambda%lambdamax;
excitation:$[accept; excitation+alpha; excitation];

`t`excitation`accept`done!(t;excitation;accept;0b)
};

arrivals:{[cfg]
@@ -158,12 +161,13 @@ arrivals:{[cfg]
duration;lambdamax;baseintensity;alpha;beta;cfg);

/ initial state
init:`t`excitation`accept`done!(0f;0f;0b;0b);

/ scan all candidate steps (\ returns every intermediate state as a table)
/ then extract t values where the candidate was accepted
/ this avoids the O(n^2) list append of the previous fold-with-accumulator approach
states:.z.m.hawkes.step[params]\[{not x`done};init];
states[`t] where states[`accept]
};
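The scan-then-filter pattern used in `arrivals` can be tried on its own. The sketch below is a simplified, standalone Ogata thinning loop for an exponential-kernel Hawkes process, not the module's `hawkes.step`: it drops the intraday `shape` multiplier and, instead of a fixed `lambdamax`, uses the current intensity as the upper bound (valid because the intensity only decays between events). The keys of `p` are illustrative:

```q
/ one thinning step; state: t (candidate time), ex (excitation), accept, done
step:{[p;s]
  lmax:p[`mu]+s`ex;                        / bound holds until the next event
  w:neg log[first 1?1.0]%lmax;             / exponential candidate wait
  t:s[`t]+w;
  if[t>=p`T; :`t`ex`accept`done!(t;s`ex;0b;1b)];
  ex:s[`ex]*exp neg p[`beta]*w;            / decay excitation to time t
  acc:(first 1?1.0)<(p[`mu]+ex)%lmax;      / accept with prob lambda(t)%lmax
  `t`ex`accept`done!(t;ex+p[`alpha]*acc;acc;0b)}

p:`mu`alpha`beta`T!1.0 0.3 1.0 60.0        / T = duration in seconds
init:`t`ex`accept`done!(0f;0f;0b;0b)

/ scan (\) keeps every state as a table row; keep accepted candidate times
states:step[p]\[{not x`done};init]
times:states[`t] where states`accept
```

Because scan retains every state, memory grows with the number of candidates (accepted plus rejected); that is the trade-off for avoiding a growing append on each step.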

gbm:{[s;r;eps;t]
@@ -297,14 +301,16 @@ quote.generate:{[cfg;trades]
pretimes:tradetimes-`timespan$`long$(pretradeoffset+randoffsets)*nsperms;

/ spreads based on time of day (vectorized)
/ use pretimes (actual quote timestamps) not tradetimes - spread is evaluated
/ when the quote is posted, which is pretradeoffset ms before the trade
prespreadmults:.z.m.quote.spreadmults[cfg;pretimes];
prespreads:basespread*tradeprices*prespreadmults;
prebids:tradeprices-prespreads%2;
preasks:tradeprices+prespreads%2;

/ sizes (vectorized)
prebidsizes:avgquotesize+`long$100*.z.m.rng.normal[n;cfg];
preasksizes:avgquotesize+`long$100*.z.m.rng.normal[n;cfg];

/ === 3. intermediate quotes (vectorized) ===
/ only if we have at least 2 trades
@@ -321,8 +327,13 @@ quote.generate:{[cfg;trades]
allasksizes:(enlist avgquotesize),intresult[`asksizes],preasksizes;

/ build table, enforce minimum size of 1, sort by time
/ round bid/ask to nearest cent consistent with trade price rounding
/ enforce bid < ask after rounding - tight spreads can collapse to bid=ask
quotes:([]time:alltimes;bid:allbids;ask:allasks;bidsize:allbidsizes;asksize:allasksizes);
/ floor 0.5+x is round-half-up; `long$ already rounds in q, so `long$0.5+x would round up instead of to nearest
quotes:update bidsize:1|bidsize,asksize:1|asksize,
bid:0.01*floor 0.5+bid%0.01,
ask:0.01*floor 0.5+ask%0.01 from quotes;
quotes:update ask:bid+0.01 from quotes where bid>=ask;
`time xasc quotes
};
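With `basespread` at 0.0001, a sub-cent spread can round to `bid=ask`, which is why `quote.generate` bumps `ask` to `bid+0.01` after rounding. A worked example; the spread multiplier of 0.5 is an assumed value, and `roundcent` is a hypothetical round-half-up helper:

```q
roundcent:{0.01*floor 0.5+x%0.01}

px:181.9; basespread:0.0001; mult:0.5    / mult is an assumed spread multiplier
sp:basespread*px*mult                      / spread in dollars, about 0.0091
bid:roundcent px-sp%2                      / about 181.8955, rounds to 181.9
ask:roundcent px+sp%2                      / about 181.9045, also rounds to 181.9
if[bid>=ask; ask:bid+0.01]                 / enforce a one-cent minimum spread
(bid;ask)
```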

@@ -371,19 +382,19 @@ quote.intermediates:{[cfg;tradetimes;tradeprices;basespread;pretradeoffset;quote

/ prices: interpolate from prev toward next trade price, plus noise
midprices:gapprevprices+fractions*(gapnextprices-gapprevprices);
noise:quoteticksize*midprices*.z.m.rng.normal[totint;cfg];
midprices+:noise;

/ spreads (vectorized across all intermediate quotes)
intspreadmults:.z.m.quote.spreadmults[cfg;inttimes];
spreadvar:1+0.1*abs .z.m.rng.normal[totint;cfg];
intspreads:basespread*midprices*intspreadmults*spreadvar;
intbids:midprices-intspreads%2;
intasks:midprices+intspreads%2;

/ sizes
intbidsizes:avgquotesize+`long$100*.z.m.rng.normal[totint;cfg];
intasksizes:avgquotesize+`long$100*.z.m.rng.normal[totint;cfg];

`times`bids`asks`bidsizes`asksizes!(inttimes;intbids;intasks;intbidsizes;intasksizes)
};
@@ -421,6 +432,8 @@ validate:{[cfg]
/ - Positive multipliers: openmult, midmult, closemult > 0
/ - Positive base intensity
/ - Transitionpoint in valid range (prevents division by zero)
/ - Positive volatility (zero vol produces degenerate flat price path)
/ - Positive start price (negative/zero price is economically invalid)

/ check Hawkes stability condition
if[cfg[`alpha]>=cfg`beta; '"validate: Hawkes unstable - alpha must be < beta"];
@@ -431,6 +444,10 @@ validate:{[cfg]
/ check transitionpoint bounds (prevents division by zero in shape function)
if[not cfg[`transitionpoint] within 0.01 0.99;
'"validate: transitionpoint must be between 0.01 and 0.99"];
/ check vol positive (zero produces NaN in log, flat path with no signal)
if[0>=cfg`vol; '"validate: vol must be positive"];
/ check startprice positive (GBM/jump models require positive initial price)
if[0>=cfg`startprice; '"validate: startprice must be positive"];
cfg
};
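The two new checks can be exercised with protected evaluation, which is a common way to assert on error text (the Validation test bodies are not in this diff). `checkpositive` is a hypothetical stand-in that mirrors only the `vol` and `startprice` checks from `validate`:

```q
/ hypothetical stand-in mirroring the new vol and startprice checks
checkpositive:{[cfg]
  if[0>=cfg`vol; '"validate: vol must be positive"];
  if[0>=cfg`startprice; '"validate: startprice must be positive"];
  cfg}

/ @[f;x;handler] traps the signal and passes the message to the handler
@[checkpositive;`vol`startprice!0.0 181.9;{x}]    / "validate: vol must be positive"
@[checkpositive;`vol`startprice!0.45 -1.0;{x}]    / "validate: startprice must be positive"
```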

@@ -447,47 +464,51 @@ run:{[cfg]
/ result:run[cfg] / result`trade, result`quote
cfg:.z.m.validate[cfg];

/ set seed for reproducibility (0N = no seed)
if[not null cfg`seed; system "S ",string cfg`seed];

/ generate arrival times (seconds from open)
arrs:.z.m.arrivals[cfg];
n:count arrs;

if[n=0;
trades:([]sym:`symbol$();time:`timestamp$();price:`float$();qty:`long$());
:$[cfg`generatequotes;
`trade`quote!(trades;([]sym:`symbol$();time:`timestamp$();bid:`float$();ask:`float$();bidsize:`long$();asksize:`long$()));
trades]
];

/ convert to timestamps
basetime:cfg[`tradingdate]+`timespan$cfg`openingtime;
times:basetime+`timespan$`long$arrs*nspersec;

/ generate prices and round to nearest cent (US equity tick size)
/ floor 0.5+x is round-half-up; `long$ already rounds in q, so `long$0.5+x would round up instead of to nearest
prices:.z.m.price[cfg;arrs];
prices:0.01*floor 0.5+prices%0.01;

/ generate quantities
qtys:.z.m.qty.gen[n;cfg];

trades:([]time:times;price:prices;qty:qtys);
trades:`sym`time xcols update sym:cfg`sym from trades;
trades:update `p#sym from trades;

/ return trades only or dictionary with quotes
$[cfg`generatequotes;
`trade`quote!(trades;
{[s;t] update `p#sym from `sym`time xcols update sym:s from t}[cfg`sym;.z.m.quote.generate[cfg;trades]]);
trades]
};
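The `sym` handling in `run` follows a common kdb+ layout: `sym` first, then `time`, with the parted attribute on `sym`. A single-instrument day has one symbol, so every row sits in one contiguous run, which `` `p# `` requires. A standalone sketch of that pattern, using two rows taken from the README output as illustrative data:

```q
/ illustrative trade rows; in run[] these come from the simulator
t:([]time:2026.01.20D09:30:02.487640474 2026.01.20D09:30:03.846514899;price:181.9 182.01;qty:43 32)

/ same pattern as run[]: add sym, move sym and time to the front, mark sym parted
t:`sym`time xcols update sym:`NVDA from t
t:update `p#sym from t
cols t           / `sym`time`price`qty
attr t`sym       / `p
```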

/ configuration schema: column name -> (type; description)
/ type codes: S=symbol, D=date, U=minute, F=float, J=long, B=boolean
schema:()!()
schema[`name]:("S";"preset name (key)")
schema[`sym]:("S";"ticker symbol")
schema[`tradingdate]:("D";"simulation date")
schema[`openingtime]:("U";"market open time")
schema[`closingtime]:("U";"market close time")
schema[`startprice]:("F";"initial price")
schema[`seed]:("J";"random seed (0N = no seed, use null long)")
schema[`rngmodel]:("S";"RNG model (`pseudo)")
schema[`drift]:("F";"annualized drift")
schema[`vol]:("F";"annualized volatility")