If you’re building AI systems in production — or just getting started — these repos are worth bookmarking.
LLM Serving & Inference
vLLM (66k+ stars) — The industry standard for high-throughput LLM serving. Continuous batching and PagedAttention keep GPU utilization high. If you’re serving LLMs in production, this is probably what you should be using.
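To give a feel for the workflow, here is a minimal sketch of vLLM's OpenAI-compatible server (the model name is just an example; assumes a CUDA GPU and `pip install vllm`):

```shell
# Serve a Hugging Face model behind an OpenAI-compatible API.
vllm serve Qwen/Qwen2.5-0.5B-Instruct --port 8000

# Any OpenAI client can now talk to it:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-0.5B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```

Because the endpoint mimics OpenAI's API, swapping a hosted model for a self-hosted one is often just a base-URL change.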
Ollama (162k+ stars) — The easiest way to run LLMs locally. Great for fast experimentation before you commit to a cloud setup.
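The local loop is about as simple as it gets (assumes Ollama is installed; the model tag is an example):

```shell
# Pull and chat with a model interactively.
ollama run llama3.2

# Ollama also exposes a local HTTP API on port 11434:
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false}'
```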
LiteLLM (20k+ stars) — One interface for 100+ LLM providers. Swap providers without changing code. Useful if you want to avoid vendor lock-in.
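A sketch of what "one interface" means in practice (assumes `pip install litellm` and the relevant provider API keys in your environment; model strings are examples):

```python
# LiteLLM: same call shape across providers -- only the model string changes.
from litellm import completion

messages = [{"role": "user", "content": "Say hello in one word."}]

openai_reply = completion(model="gpt-4o-mini", messages=messages)
claude_reply = completion(model="anthropic/claude-3-5-sonnet-20240620", messages=messages)
local_reply = completion(model="ollama/llama3.2", messages=messages)

# Responses follow the OpenAI schema regardless of provider:
print(openai_reply.choices[0].message.content)
```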
exo (39k+ stars) — Run your own AI cluster at home with distributed inference across multiple devices. Interesting for hobbyists and edge deployments.
Fine-Tuning & Training
PyTorch (96k+ stars) — The core deep learning framework. Essential when you need custom optimization and low-level control.
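"Low-level control" here means you own the training loop. A minimal sketch, fitting y = 2x + 1 on CPU (assumes `pip install torch`):

```python
# Raw PyTorch training loop: forward, loss, backward, step.
import torch

x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1  # ground-truth line we want to recover

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(200):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"w={model.weight.item():.2f}, b={model.bias.item():.2f}")
# weights approach w=2, b=1
```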
Unsloth (51k+ stars) — Fine-tune LLMs 2x faster with 70% less VRAM. If you’re doing any fine-tuning on consumer hardware, check this out.
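A rough sketch of Unsloth's QLoRA setup (assumes `pip install unsloth` and a CUDA GPU; the model name and hyperparameters are illustrative, not a recommendation):

```python
# Unsloth: load a 4-bit model, attach LoRA adapters, then train as usual.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit weights
    max_seq_length=2048,
    load_in_4bit=True,
)

# Only these small adapter matrices get trained -- hence the VRAM savings.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
# From here, hand `model` and `tokenizer` to a standard SFT training loop.
```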
Flash Attention (21k+ stars) — Fast, memory-efficient exact attention. Used under the hood by most modern LLM serving and training stacks.
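You rarely call it directly: PyTorch exposes it through `scaled_dot_product_attention`, which dispatches to a FlashAttention kernel on supported GPUs and falls back to a math kernel on CPU (assumes `pip install torch`):

```python
# Fused attention via PyTorch's SDPA entry point.
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 128, 64
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, heads, seq, dim)
v = torch.randn(batch, heads, seq, dim)

# On FlashAttention backends, the full seq x seq score matrix is never
# materialized -- that's where the memory savings come from.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```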
RAG & Embeddings
FAISS (33k+ stars) — Meta’s similarity search library. Handles millions of embeddings efficiently.
Sentence Transformers (16k+ stars) — The standard library for computing text embeddings; it powers many RAG and semantic search pipelines in production.
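The typical embedding step looks like this (assumes `pip install sentence-transformers`; the model downloads on first run, and the example texts are made up):

```python
# Embed documents and a query, then rank by cosine similarity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["How do I reset my password?", "The weather is nice today."]
query = "password recovery"

doc_emb = model.encode(docs)
query_emb = model.encode(query)

# The password doc should score higher than the weather doc.
scores = model.similarity(query_emb, doc_emb)
print(scores)
```

In a full RAG pipeline, those embeddings would go into an index like FAISS above.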
API & Tooling
FastAPI (83k+ stars) — The default for serving ML models via API. If you’re not using it, you’re probably overcomplicating things.
Pydantic (23k+ stars) — The backbone of reliable AI pipelines. Config, validation, structured outputs. Works beautifully with FastAPI.
FastMCP (15k+ stars) — The fast, Pythonic way to build MCP servers. Connect your LLMs to any tool or data source.
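A sketch of a FastMCP server exposing one tool (assumes `pip install fastmcp`; the server name and tool are made-up examples):

```python
# FastMCP: decorate a plain function and it becomes an MCP tool.
from fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio by default
```

The type hints and docstring become the tool schema the LLM sees, so there is no separate spec to maintain.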
Python Developer Experience
uv (55k+ stars) — Replaces pip, pip-tools, and virtualenv. Written in Rust, incredibly fast. Once you switch, you won’t go back.
Ruff (40k+ stars) — 10-100x faster than flake8 + black combined. Also written in Rust. Makes CI feel instant.
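The day-to-day workflow for both tools, in a nutshell (assumes uv and Ruff are installed; project and package names are illustrative):

```shell
# uv: project scaffolding, dependency management, and execution.
uv init myproject && cd myproject   # creates pyproject.toml
uv add fastapi pydantic             # resolve + install into a managed venv
uv run python main.py               # run inside that environment

# Ruff: linter and formatter in one tool.
ruff check --fix .                  # lint, auto-fixing what it can
ruff format .                       # drop-in black-style formatting
```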
My Take
The combo of uv + Ruff + Pydantic is becoming the holy trinity for any Python AI project. And for serving, vLLM for production and Ollama for local dev is a solid split.
What I find interesting is how much the AI tooling ecosystem has matured. A year ago, half of these either didn’t exist or were too rough for production use. Now they’re becoming standard infrastructure.