If you’re building AI systems in production — or just getting started — these repos are worth bookmarking.

LLM Serving & Inference

vLLM (66k+ stars) — The industry standard for high-throughput LLM serving, built on continuous batching and PagedAttention to keep GPU utilization high. If you’re serving LLMs in production, this is probably what you should be using.

Ollama (162k+ stars) — The easiest way to run LLMs locally. Great for fast experimentation before you commit to a cloud setup.

LiteLLM (20k+ stars) — One interface for 100+ LLM providers. Swap providers without changing code. Useful if you want to avoid vendor lock-in.

exo (39k+ stars) — Run your own AI cluster at home with distributed inference across multiple devices. Interesting for hobbyists and edge deployments.

Fine-Tuning & Training

PyTorch (96k+ stars) — The core deep learning framework. Essential when you need custom optimization and low-level control.
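“Low-level control” means you can drop to a raw training loop whenever a higher-level trainer gets in the way. A toy sketch, fitting y = 2x with plain gradient descent:

```python
import torch

# Fit y = 2x by minimizing mean squared error with SGD.
x = torch.linspace(0, 1, 32).unsqueeze(1)
y = 2.0 * x

w = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.5)

for _ in range(200):
    loss = torch.mean((x * w - y) ** 2)
    opt.zero_grad()   # clear accumulated gradients
    loss.backward()   # autograd computes d(loss)/dw
    opt.step()        # one gradient-descent update

print(round(w.item(), 2))  # converges to ~2.0
```

Every custom optimization trick in the tools below ultimately bottoms out in a loop like this.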

Unsloth (51k+ stars) — Fine-tune LLMs 2x faster with 70% less VRAM. If you’re doing any fine-tuning on consumer hardware, check this out.

Flash Attention (21k+ stars) — Fast, memory-efficient attention mechanism. Used under the hood by most of the serving and fine-tuning tools on this list.

RAG & Embeddings

FAISS (33k+ stars) — Meta’s similarity search library. Handles millions of embeddings efficiently.

Sentence Transformers (16k+ stars) — Powers most RAG and semantic search pipelines in production.

API & Tooling

FastAPI (83k+ stars) — The default for serving ML models via API. If you’re not using it, you’re probably overcomplicating things.

Pydantic (23k+ stars) — The backbone of reliable AI pipelines. Config, validation, structured outputs. Works beautifully with FastAPI.
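“Structured outputs” in practice means validating an LLM’s JSON against a schema instead of trusting it blindly. A small sketch:

```python
from pydantic import BaseModel, ValidationError

class Extraction(BaseModel):
    name: str
    age: int

# Age arrives as a string, as LLM output often does; Pydantic coerces it.
record = Extraction.model_validate_json('{"name": "Ada", "age": "36"}')
print(record.age + 1)  # → 37

# Missing fields fail loudly instead of propagating bad data downstream.
try:
    Extraction.model_validate_json('{"name": "Ada"}')
except ValidationError as e:
    print("rejected:", e.error_count(), "error")
```

The same models double as FastAPI request/response schemas, which is why the two pair so well.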

FastMCP (15k+ stars) — The fast, Pythonic way to build MCP servers. Connect your LLMs to any tool or data source.

Python Developer Experience

uv (55k+ stars) — Replaces pip, pip-tools, and virtualenv. Written in Rust, incredibly fast. Once you switch, you won’t go back.

Ruff (40k+ stars) — 10-100x faster than flake8 + black combined. Also written in Rust. Makes CI feel instant.

My Take

The combo of uv + Ruff + Pydantic is becoming the holy trinity for any Python AI project. And for serving, vLLM for production and Ollama for local dev is a solid split.

What I find interesting is how much the AI tooling ecosystem has matured. A year ago, half of these either didn’t exist or were too rough for production use. Now they’re becoming standard infrastructure.