
Illia Kasian
CTO, American Compute
Illia leads engineering at American Compute. Previously a founding engineer at a YC-backed insurance carrier and an ML engineer building fraud detection systems at scale. His background includes machine learning, full-stack development, and infrastructure engineering across insurance and defense technology.
LinkedIn →

Articles
Disaggregated Inference: How NVIDIA, AWS, and Cerebras Are Rethinking LLM Inference
Disaggregated inference began as a software technique for splitting prefill and decode onto separate GPU pools. NVIDIA, Groq, Cerebras, and AWS are now taking it further with chips purpose-built for each phase.
Memory for AI Accelerators
HBM, GDDR, and SRAM compared: how memory hierarchy, bandwidth, and capacity determine AI accelerator performance, cost, and which workloads each chip can serve.
NVIDIA Software Ecosystem for AI
NVIDIA’s software stack, from CUDA through cuDNN, TensorRT, and NCCL to Dynamo and NIM, is the reason 90% of cloud AI workloads run on NVIDIA GPUs. What each layer does, how it connects to the hardware, and why switching is hard.
GPU Cluster Networking 101
How GPU clusters are networked: NVLink within servers, InfiniBand or Ethernet between them, switches, topology, optics, and real costs from 16-GPU to 24,576-GPU scale.
NVIDIA AI GPU Differences from Volta to Blackwell
NVIDIA’s six flagship data center GPUs compared: V100, A100, H100, H200, B200, and B300. Specs, generation-over-generation architecture changes, and which GPU to buy in 2026.
NICs and DPUs for GPU Servers
A NIC connects your server to the network. A DPU is a NIC with its own CPU. Which one you need depends on what your cluster is doing besides training.
SXM vs PCIe for GPU Servers
SXM and PCIe GPUs use the same silicon. The difference is the connector, and it determines bandwidth, power, cost, and flexibility. Here is how to choose.
Every GPU Infrastructure Term You Need to Know
Every term you'll encounter when buying, building, or operating a GPU cluster, defined in plain English. From GPUs and NVLink to colocation and TCO.