Inside voltage-kalshi: Building a Kalshi BTC Volatility Bot

Kalshi is a regulated prediction market. You're not trading BTC itself — you're trading binary contracts on whether BTC closes above or below a specified level within a given window. Most participants treat these like sports bets. I treat them like a structured volatility product with extractable edge.

The key insight: forecasting volatility is a different problem than forecasting price. A directional model asks "will BTC go up or down?" A volatility model asks "will BTC move more than X% in the next N hours?" The second question has more stable signal. Markets are less efficient at pricing volatility events than directional bets, and the asymmetry in contract payoffs creates exploitable opportunities.

This is what voltage-kalshi is built around.

The Data Pipeline

You need two data streams running in parallel: Kalshi's orderbook and Kraken's OHLCV feed.

Kalshi exposes a WebSocket API for real-time orderbook updates. Each tick carries best bid, best ask, last price, and volume on both sides. The relevant signal here is order flow imbalance: when the buy side consistently dominates at the ask, it can mean someone informed is positioning.
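A minimal sketch of one common way to quantify order flow imbalance, assuming each tick has already been reduced to a bid-side and an ask-side volume. The names `order_flow_imbalance` and `ImbalanceTracker` and the 20-tick window are illustrative assumptions, not part of Kalshi's API or of voltage-kalshi itself.

```python
from collections import deque


def order_flow_imbalance(bid_volume, ask_volume):
    """Normalized imbalance in [-1, 1]; positive means the bid side is heavier."""
    total = bid_volume + ask_volume
    if total == 0:
        return 0.0  # empty book: treat as balanced
    return (bid_volume - ask_volume) / total


class ImbalanceTracker:
    """Rolling mean of per-tick imbalance over the last `window` ticks."""

    def __init__(self, window=20):
        self.values = deque(maxlen=window)

    def update(self, bid_volume, ask_volume):
        self.values.append(order_flow_imbalance(bid_volume, ask_volume))
        return sum(self.values) / len(self.values)
```

A rolling mean that stays well above zero is one way to turn "consistently dominates" into a number a model can consume.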

Kraken's WebSocket streams OHLCV candles for BTC/USD (listed as XBT/USD), updating the current candle as trades arrive. I subscribe to both the 1-minute and 5-minute channels. The subscription message for the 1-minute channel looks like this:

    // Kraken WS subscription
    {
      "event": "subscribe",
      "pair": ["XBT/USD"],
      "subscription": {
        "name": "ohlc",
        "interval": 1 // 1-min candles
      }
    }

The first engineering problem: WebSocket connections drop. Kraken will silently disconnect under load. The fix is a reconnection manager that tracks last heartbeat timestamp and reconnects with exponential backoff. If you miss candles during reconnection, you backfill via the REST API before resuming live subscriptions. Never assume the stream is healthy.
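The pieces of that reconnection logic can be sketched as three small pure functions. This is an illustration under stated assumptions, not the bot's actual code: the function names, the 10-second heartbeat timeout, and the 60-second backoff cap are all hypothetical, and the socket and REST calls themselves are left out.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=0.0):
    """Seconds to wait before reconnect attempt `attempt` (0-based).

    Doubles each attempt up to `cap`; `jitter` adds up to that fraction
    of the delay at random so many clients do not reconnect in lockstep.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, jitter * delay)


def is_stale(last_heartbeat, now, timeout=10.0):
    """True when no heartbeat has arrived within `timeout` seconds."""
    return (now - last_heartbeat) > timeout


def missing_candles(last_seen_ts, resume_ts, interval=60):
    """Open timestamps of candles to backfill over REST.

    `last_seen_ts` is the open time of the last candle received before the
    drop; `resume_ts` is the open time of the first candle received after it.
    """
    return list(range(last_seen_ts + interval, resume_ts, interval))
```

Keeping these free of I/O makes the failure path testable without a live connection, which is where most reconnection bugs hide.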

Feature Engineering for LightGBM

LightGBM needs tabular features. I compute these in a rolling window over the incoming OHLCV stream:

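Why Parkinson vol? Close-to-close volatility wastes information. The Parkinson estimator uses (ln(H/L))² / (4 ln 2) as the per-candle variance estimate, so it incorporates the high-low range. Under a driftless geometric Brownian motion it is roughly 5x more efficient than the close-to-close estimator, meaning it needs about a fifth as many candles for the same estimation variance. For short windows on crypto, this matters.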
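As a rough sketch, the estimator over a rolling window is just the mean of the per-candle terms. The function names are illustrative, and the result is per-candle volatility (not annualized).

```python
import math


def parkinson_variance(highs, lows):
    """Mean Parkinson variance per candle over a window of highs and lows."""
    if len(highs) != len(lows) or not highs:
        raise ValueError("highs and lows must be non-empty and equal length")
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return total / (4.0 * math.log(2.0) * len(highs))


def parkinson_vol(highs, lows):
    """Per-candle Parkinson volatility (square root of the variance)."""
    return math.sqrt(parkinson_variance(highs, lows))
```

Because it only needs each candle's high and low, it can be updated as soon as a candle closes.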

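LightGBM trains in minutes, handles missing values natively, and outputs probabilities directly with objective: binary. I also set is_unbalance: true to handle class imbalance. One caveat: that reweighting skews the raw probabilities, so they need recalibrating on held-out data before they feed into position sizing. It's the right tool for this feature set.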

PatchTST: The Transformer Component

LightGBM is good at cross-sectional patterns but captures temporal structure only indirectly through lagged features. PatchTST — Patch Time Series Transformer — addresses this directly.

The key idea: instead of feeding the transformer one timestep at a time, you divide the sequence into fixed-length patches (as ViT does for images) and apply attention over those patches. Patches overlap when the stride is shorter than the patch length, as with patch_len=16 and stride=8 in the snippet below. This dramatically reduces the effective sequence length and lets the model capture long-range dependencies that lagged features would miss.

    # Patch sequence construction
    import torch

    def patchify(x, patch_len=16, stride=8):
        # x: [batch, seq_len, features]
        n_patches = (x.shape[1] - patch_len) // stride + 1
        # stacked: [batch, n_patches, patch_len, features]
        patches = torch.stack([
            x[:, i * stride : i * stride + patch_len, :]
            for i in range(n_patches)
        ], dim=1)
        # returned: [batch, n_patches, patch_len * features]
        return patches.flatten(2)

I train PatchTST on 60-candle sequences (1 hour of 1-min data) predicting a binary volatility event label. The model outputs a probability independently for each channel (price, volume, order flow) and they're averaged in the output head.
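To see what those patch settings do to a 60-candle sequence, here is a small sketch that lists the index range each patch covers. `patch_bounds` is a hypothetical helper for illustration; it uses the same patch-count formula as `patchify` with patch_len=16 and stride=8.

```python
def patch_bounds(seq_len, patch_len=16, stride=8):
    """(start, end) index pairs, end exclusive, for each patch."""
    if seq_len < patch_len:
        raise ValueError("sequence is shorter than one patch")
    n_patches = (seq_len - patch_len) // stride + 1
    return [(i * stride, i * stride + patch_len) for i in range(n_patches)]


bounds = patch_bounds(60)
print(len(bounds), bounds[-1])
```

With these settings the transformer attends over 6 tokens instead of 60, and the last patch ends at index 56, so the four most recent candles never enter a patch. If that matters, padding the end of the sequence or using a length of the form 16 + 8k (64, for example) avoids the truncation.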

Ensemble and Position Sizing

At inference time, LightGBM and PatchTST each produce a probability estimate for the volatility event. I combine them with a learned weighted average: the weights are fit on a held-out validation set, and isotonic regression calibrates the probabilities on that same set. Typically the weights end up around 0.55 LightGBM / 0.45 PatchTST, though this shifts during periods of rapid regime change.
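One simple way to learn a blend weight is a grid search that minimizes log loss on the validation set. The sketch below assumes that approach; it is not necessarily how voltage-kalshi fits its weights, it leaves out the isotonic calibration step, and the function names are illustrative.

```python
import math


def log_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def blend(p_a, p_b, w):
    """Weighted average of two probability lists; w is the weight on p_a."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]


def fit_weight(p_a, p_b, labels, steps=100):
    """Weight on p_a in [0, 1] that minimizes validation log loss."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: log_loss(blend(p_a, p_b, w), labels))
```

Refitting the weight on a rolling validation window is one way to let the mix drift as regimes change.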

Position sizing uses the Kelly criterion:

    // Kelly fraction
    // b = net odds (contract payout ratio)
    // p = model's estimated P(win)
    // q = 1 - p
    const q = 1 - p;
    const kelly = Math.max(0, (b * p - q) / b); // no edge, no position
    // Use fractional Kelly (0.25x) to account for model uncertainty
    const position = bankroll * kelly * 0.25;

Full Kelly is theoretically optimal but practically suicidal when your model is wrong. 0.25x Kelly gives up some expected growth in exchange for dramatically lower drawdowns. With a real capital account, surviving a bad week matters more than maximising a good one.
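For a binary contract the net odds follow from the price. A minimal sketch, assuming a contract that costs `price` (as a fraction of a dollar) and pays 1 if it wins, and ignoring fees and slippage; the function names are illustrative.

```python
def kelly_fraction(p_win, price):
    """Full-Kelly fraction of bankroll for a binary contract, floored at zero."""
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    b = (1.0 - price) / price  # net odds: profit per unit staked on a win
    q = 1.0 - p_win
    return max(0.0, (b * p_win - q) / b)


def position_size(bankroll, p_win, price, kelly_multiplier=0.25):
    """Amount to stake under fractional Kelly."""
    return bankroll * kelly_fraction(p_win, price) * kelly_multiplier
```

At a contract price of 0.40 and a model probability of 0.55, full Kelly is 25% of bankroll and quarter Kelly is 6.25%. When the model probability is at or below the price there is no edge and the size is zero.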

What Breaks in Production

The gap between a backtest and a live system is the gap between theory and grief. Here's what I encountered that no test suite caught:

The most important lesson: live trading is a distributed systems problem disguised as a machine learning problem. The model is 30% of the work. The operational infrastructure is 70%.