Adaptive Inference at the Edge: Speculative Decoding and KV-Cache Compression
The bottleneck for edge LLM workloads is usually the runtime, not just the model. air-runtime packages smart routing, speculative decoding, and KV-cache compression for constrained hardware.
Writing · Edge AI · 10 min read
air-runtime
An inference runtime for constrained hardware that combines speculative decoding, smart routing, and KV-cache compression to make smaller devices more useful.
Research · Inference · Edge AI · KV cache
Inside voltage-kalshi: Building a Kalshi BTC Volatility Bot
Kalshi is a regulated prediction market, and the edge there is forecasting volatility, not price direction. This build note covers a LightGBM and PatchTST ensemble that runs on live Kraken WebSocket data and is deployed with real capital.
Writing · Trading · 9 min read
