Things With AI — Notes on AI

The Shift from Large Models to Cost-Aware Routing in 2026

The Shift Toward Inference, Part 2: From Bigger Models to Smarter Systems

The AI software stack is evolving from simple model serving into a cost-aware system of routers, caches, runtimes, and reasoning policies. Knowing when to spend compute, when to save it, and how to make every token cheaper is becoming increasingly important.

June 17, 2026

Shift Towards AI Inference

Part 1

The Shift Toward Inference, Part 1: From Training Clusters to Inference Infrastructure

How AI’s center of gravity is moving from model training to model serving. Why GPUs, memory, power, and data centers are becoming the real bottlenecks. And why the next AI race is about running intelligence at scale.

5 mins

Part 2

The Shift Toward Inference, Part 2: From Bigger Models to Smarter Systems

The AI software stack is evolving from simple model serving into a cost-aware system of routers, caches, runtimes, and reasoning policies. Knowing when to spend compute, when to save it, and how to make every token cheaper is becoming increasingly important.

7 mins

DeepSeek Efficiency

Part 1

How to Train a 1.6T-Parameter MoE on a Budget: Inside DeepSeek-V4's Pre-Training Stack

The Math That Beat the Export Controls: DeepSeek-V4's Radical Training Efficiency

10 mins

The AI Knowledge Series

Part 1

RAG vs. Fine-Tuning — The Question Every AI Builder Gets Wrong

AI models don't know your private data. Two approaches have been the standard answer. In 2026, a third matters just as much.

5 min read

Part 2

Inside RAG — How It Really Works (And Why Most Projects Stall Before They Ship)

The gap between the RAG concept and production reality is where most projects quietly fail. Here's what the explanations usually skip.

6 min read

Part 3

Fine-Tuning in 2026 — Cheaper Than You Think, Harder Than You Think

Bloomberg spent roughly $10 million on it. You can now get started for under $100. What changed, and when does fine-tuning actually make sense?

6 min read

Part 4

Agentic RAG — The Architecture That Made the Debate Beside the Point

Once you see RAG and fine-tuning operating inside a reasoning loop, the question of which one to choose starts to feel like asking whether a workshop needs better hammers or better saws.

6 min read