LLM

Specs

Tools

Frameworks

Serving

  • LMCache - Supercharge Your LLM with the Fastest KV Cache Layer
  • ollama - Get up and running with Llama 3, Mistral, Gemma, and other large language models
  • sglang - SGLang is a high-performance serving framework for large language models and multimodal models.
  • vllm - A high-throughput and memory-efficient inference and serving engine for LLMs

Guardrails

  • Guardrails - NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

API Gateways

  • bifrost - Fastest LLM gateway (50x faster than LiteLLM) with adaptive load
  • litellm - Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]

Serialization

  • toon - 🎒 Token-Oriented Object Notation (TOON) – Compact, human-readable, schema-aware JSON for LLM prompts. Spec, benchmarks, TypeScript SDK.

Utilities

  • TokenCost - Easy token price estimates for 400+ LLMs. TokenOps projects.
  • opencommit - top #1 and most feature rich GPT wrapper for git — generate commit messages with an LLM in 1 sec — works with Claude, GPT and every other provider, supports local Ollama models too

Models

Creator | Name | Hugging Face
Alibaba | Qwen3-ASR | HF
Alibaba | Qwen3-VL |
Alibaba | Qwen3.5 | HF
BAAI | bge-m3 | HF
Google | TranslateGemma | HF
Google | Gemma 4 | HF
PrismML | Bonsai | HF

TTS

STT

API

bash
# Use one authentication method per request: the API key header below,
# or basic auth by replacing that header with: -u "username:password"
curl -X POST http://example.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Llama-3.2-1B-Instruct-Hybrid",
        "input": "What is the population of Paris?",
        "stream": false
      }'
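The request above lists two authentication options, basic auth and an API key. A minimal sketch of a wrapper that picks one of them from the environment could look like the following. `API_USER` and `API_PASS` are hypothetical variable names introduced here for illustration; only `OPENAI_API_KEY` and the URL come from the example.

```shell
# Sketch only: choose an auth method based on which variables are set.
# API_USER / API_PASS are hypothetical names, not part of any API.
auth_mode() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "bearer"
  elif [ -n "${API_USER:-}" ]; then
    echo "basic"
  else
    echo "none"
  fi
}

# call_api URL JSON_BODY: POST the body with the selected auth arguments.
call_api() {
  case "$(auth_mode)" in
    bearer) set -- "$@" -H "Authorization: Bearer $OPENAI_API_KEY" ;;
    basic)  set -- "$@" -u "$API_USER:${API_PASS:-}" ;;
  esac
  url=$1
  body=$2
  shift 2  # what remains in "$@" are the auth arguments, if any
  curl -sS -X POST "$url" -H "Content-Type: application/json" -d "$body" "$@"
}
```

It might be called as `call_api http://example.com/v1/responses '{"model": "Llama-3.2-1B-Instruct-Hybrid", "input": "Hello", "stream": false}'`. Preferring the API key when both are set is an arbitrary choice in this sketch.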

Basic auth can also be provided as a request header:

bash
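# Base64-encode the credentials, then pass them in the Authorization header
AUTH=$(echo -n "username:password" | base64)
curl -H "Authorization: Basic $AUTH" http://example.com/v1/responses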
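The header construction can be wrapped in a small helper. This is a sketch, and `basic_auth_header` is a hypothetical function name. It uses `printf` instead of `echo -n` because `echo -n` is not portable across shells, and it strips newlines in case `base64` wraps long output.

```shell
# Sketch: build a Basic auth header from a username ($1) and password ($2).
basic_auth_header() {
  # printf adds no trailing newline, so only "user:pass" is encoded
  token=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s' "$token"
}
```

Usage: `curl -H "$(basic_auth_header username password)" http://example.com/v1/responses`. Note that Base64 is an encoding, not encryption, so the header should only be sent over HTTPS in practice.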

Security

Misc

Hardware

Setting up NVIDIA DGX Spark with ggml

bash
bash <(curl -s https://ggml.ai/dgx-spark.sh)

Vendors

Google

Apps

  • gallery - A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.

Resources