Biao's Blog

ML Systems

Notes on LLM systems, training infrastructure, and the machinery underneath.


Latest blog

How JAX Allocates Memory

A deep dive into how JAX allocates GPU memory, including BFCAllocator preallocation, and how it differs from PyTorch's caching allocator.

Jun 11, 2026

Blogs

Technical notes and implementation writeups.

2025

Aug 27, 2025

Efficient RL Training - Optimizing Weight Sync in slime

A look at how slime synchronizes weights between training and rollout engines, and where the main performance wins come from.

Jun 21, 2025

Efficient RL Training - Optimizing Memory Usage in verl

A systems deep dive into memory pressure in RL training and the techniques that make larger policy rollouts feasible.

Apr 26, 2025

Implement Flash Attention Backend in SGLang - Basics and KV Cache

Notes on SGLang attention backend internals, from Flash Attention basics to KV cache layout and execution flow.

Apr 3, 2025

What is Flash Attention?

A visual explanation of Flash Attention and how IO-aware tiling reduces memory traffic for modern attention kernels.

2024

Jun 2, 2024

How to Calculate LLM Model Parameter Size - MoE Model

A worked guide to counting parameters in MoE language models, with Qwen-style expert layers as the running example.

Jun 1, 2024

How to Calculate LLM Model Parameter Size - Dense Model

A practical walkthrough for estimating dense LLM parameter counts from architecture details and model code.

2023

May 12, 2023

Template for a blog post

A compact typography and Markdown sample for checking how the blog theme renders common writing elements.

Jan 16, 2023

Model Distillation using TensorFlow, PyTorch and Google JAX

An introduction to model distillation, where a smaller student network learns from a larger teacher model.