Engineering Log
Deep dives into system design, ML infrastructure, and production engineering.
Diffusion Models Beyond Generation: Supervised Denoising for Medical Imaging
Why I used generative model architectures for a supervised task. Adapting diffusion models for MRI denoising with paired data, custom noise schedules, and single-step inference.
Federated Learning for LLMs: Training Without Centralizing Data
Building a production federated learning system using Flower and LoRA adapters. How to coordinate distributed training across edge devices while keeping data local and secure.
Building a Production RAG System: The Engineering Beyond Embeddings
RAG tutorials skip the hard parts. Here's what it takes to build a real system: PII masking, hallucination prevention, intent-driven retrieval, and automated evaluation frameworks.
Speculative Decoding: Making LLMs 2-3x Faster Without Losing Quality
How draft-verify architectures and raw TCP sockets cut LLM inference latency by half or more. Lessons from building a production speculative sampling system across heterogeneous hardware.
Split Inference: Running 70B Models on Consumer Hardware
How to run models that don't fit in your GPU's memory by slicing them across devices. Trading network bandwidth for VRAM using torch.distributed.rpc and careful tensor serialization.