LLM
Specs
- A2A
- Agent Skills
- agents.md - A simple, open format for guiding coding agents, used by over 20k open-source
Tools
Frameworks
Serving
- LMCache - Supercharge Your LLM with the Fastest KV Cache Layer
- ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models
- sglang - SGLang is a high-performance serving framework for large language models and multimodal models.
- vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
Guardrails
- Guardrails - NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
API Gateways
- bifrost - Fastest LLM gateway (50x faster than LiteLLM) with adaptive load
- litellm - Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]
Serialization
- toon - 🎒 Token-Oriented Object Notation (TOON) – Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.
Utilities
- TokenCost - Easy token price estimates for 400+ LLMs. TokenOps projects.
- opencommit - top #1 and most feature rich GPT wrapper for git — generate commit messages with an LLM in 1 sec — works with Claude, GPT and every other provider, supports local Ollama models too
Models
| Creator | Name | Hugging Face |
|---|---|---|
| Alibaba | Qwen3-ASR | HF |
| Alibaba | Qwen3-VL | |
| Alibaba | Qwen3.5 | HF |
| BAAI | bge-m3 | HF |
| Google | TranslateGemma | HF |
| Google | Gemma 4 | HF |
| PrismML | Bonsai | HF |
TTS
- Kokoro - Available voices: https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX
- pocket-tts - A TTS that fits in your CPU (and pocket)
STT
API
```bash
# Authenticate with either basic auth or an API key, not both.
# For basic auth, swap the Authorization header line for: -u "username:password"
curl -X POST http://example.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-3.2-1B-Instruct-Hybrid",
    "input": "What is the population of Paris?",
    "stream": false
  }'
```
Basic auth can also be provided as a request header:
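```bash
# Encode the credentials, then use the output in place of xxxx.
echo -n "username:password" | base64
curl -H "Authorization: Basic xxxx" http://example.com/v1/responses
```
Security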
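The `Authorization: Basic` header from the API section is base64-encoded, not encrypted, so anyone who can read the header can recover the credentials. The following is a minimal sketch of that round trip, assuming a shell with `printf`, `tr`, and a `base64` that accepts `--decode`; the function name `basic_auth_header` is hypothetical and the credentials are the placeholder values used above.

```shell
# Hypothetical helper (not part of any tool listed here):
# build the value of a Basic auth header from a username and password.
basic_auth_header() {
  # printf adds no trailing newline; tr strips the line wrapping
  # that some base64 implementations insert for long inputs.
  printf 'Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

header="$(basic_auth_header username password)"
echo "$header"

# Decoding the token gives the credentials back in plain text.
decoded="$(printf '%s' "${header#Basic }" | base64 --decode)"
echo "$decoded"
```

Because the token is reversible, it is safer to send it only over HTTPS and to keep it out of shell history and logs.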
Misc
Hardware
Setting up NVIDIA DGX Spark with ggml
```bash
bash <(curl -s https://ggml.ai/dgx-spark.sh)
```
Vendors
Google
Apps
- gallery - A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
Resources
- 12-factor-agents - What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
- Agentic UX Patterns
- Artificial Analysis - AI Model & API Providers Analysis
- Awesome Agentic Patterns
- How LLMs Work — A Visual Deep Dive
- Inference Hardware Leaderboard
- Killed by LLM
- LLM Explorer
- LLM Politeness Study
- LLM Pricing
- LLMRequirements.com — Hardware for Local LLMs in 2026
- MakingMCP
- The Ultra-Scale Playbook: Training LLMs on GPU Clusters