Search — Daniel Silva Perez

About 3 results for “edge inference”

Adaptive Inference at the Edge: Speculative Decoding and KV-Cache Compression

The bottleneck for edge LLM workloads is often the runtime, not only the model. air-runtime packages smart routing, speculative decoding, and KV-cache compression for constrained hardware.

Writing | Edge AI | 10 min read

An inference runtime for constrained hardware that combines speculative decoding, smart routing, and KV-cache compression to make smaller devices more useful.

Research | Inference | Edge AI | KV cache

Inside voltage-kalshi: Building a Kalshi BTC Volatility Bot

Kalshi is a regulated prediction market: the edge is forecasting volatility, not price direction. This build note covers a LightGBM and PatchTST ensemble running on live Kraken WebSocket data and deployed with real capital.

Writing | Trading | 9 min read