Cache your knowledge. Channel the Akashic.
Compile your documents into an embedded GraphRAG brain. Ship AI agents as Docker images.
Agent as a Service · Knowledge as a Service
Kash /kæʃ/ — A double entendre by design. Like a cache, it compiles your heavy knowledge into a fast, local binary you can ship anywhere. Like the Akashic records, it holds a complete, queryable record of everything you've fed it. One word. Two meanings. Zero infrastructure.
Kash is a Go CLI that turns your raw documents (PDFs, Markdown, text files) into a self-contained AI agent packaged in a lightweight Docker container (~50MB).
No Python runtime. No external vector databases. No infrastructure headaches.
Your Documents → kash build → Docker Image → Ship Anywhere 🚀
Think of it like a static site generator, but for AI brains. You compile knowledge at build time, and the runtime only serves queries — fast, lightweight, and portable.
| Traditional RAG Stack | Kash |
|---|---|
| Python app + Pinecone + Redis + FastAPI | Single Go binary + lightweight Docker image |
| Runtime document ingestion | Build-time compilation |
| External vector DB dependency | Embedded pure-Go vector store |
| Complex deployment | docker run and done |
| $$$ infrastructure costs | Runs on a Raspberry Pi |
- **Company knowledge base.** Feed your company docs, runbooks, or research papers. Get an AI that actually knows your stuff and cites sources. Example: internal engineering wiki → Docker image → every dev has a domain expert on tap.
- **Personal tutor.** Compile textbooks and notes into a Socratic tutor that quizzes you, explains concepts, and never makes things up. Example: UPSC prep material → AI tutor → study from anywhere.
- **Support bot.** Turn your API docs, changelogs, and FAQs into a support bot that plugs into any chat UI or IDE. Example: docs + release notes → Docker image → mount in Open WebUI.
- **Multi-agent teams.** Spin up multiple specialized agents (legal, finance, engineering) and wire them together via the A2A protocol. Example: three domain agents → CrewAI orchestration → one smart team.
Note: Scanned (image-only) PDF support (OCR) is not yet available. Kash currently extracts embedded/selectable text only.
# 1. Install Kash (build from source — see "Building from Source" below)
go install github.com/akashicode/kash/cmd/kash@latest
# 2. Configure your API providers
mkdir -p ~/.kash
cat > ~/.kash/config.yaml << 'EOF'
build_providers:
  llm:
    base_url: "https://api.openai.com/v1"
    api_key: "sk-..."
    model: "gpt-4o"
  embedder:
    base_url: "https://api.voyageai.com/v1"
    api_key: "pa-..."
    model: "voyage-3"  # dimensions must match the embedder dimensions in agent.yaml
EOF
# 3. Scaffold a new agent
kash init my-expert
# 4. Add your knowledge
cp ~/docs/*.pdf my-expert/data/
cp ~/notes/*.md my-expert/data/
# 5. Compile the knowledge base
cd my-expert
kash build

# 6. Serve locally (no Docker needed!)
kash serve

Your agent is now live at http://localhost:8000 with three interfaces ready to go.
flowchart TB
subgraph BUILD["🔨 Build Time"]
direction LR
D["📄 Documents\nPDF / MD / TXT"] --> CK["Chunker"]
CK --> EMB["Embedder API"]
CK --> LLM1["LLM API\ntriple extraction"]
EMB --> VDB["Vector DB\ndata/memory.chromem"]
LLM1 --> GDB["Graph DB\ndata/knowledge.cayley"]
end
BUILD -- "docker build" --> RUNTIME
subgraph RUNTIME["⚡ Runtime — port 8000"]
direction LR
Q["Query"] --> HS["Hybrid Search\nVector + Graph"]
HS --> RR["Rerank\noptional"]
RR --> LLM2["LLM"]
LLM2 --> REST["REST API\nPOST /v1/chat/completions"]
LLM2 --> MCP["MCP Server\nGET /mcp"]
LLM2 --> A2A["A2A Protocol\nPOST /rpc/agent"]
end
| Component | Technology | Purpose |
|---|---|---|
| CLI Framework | spf13/cobra | Developer interface (`init`, `build`, `serve`) |
| Vector Memory | philippgille/chromem-go | Pure-Go embedded vector store |
| Graph Memory | cayleygraph/cayley | Embedded knowledge graph (triples) |
| LLM Client | sashabaranov/go-openai | Build-time extraction & runtime queries |
| MCP Protocol | Model Context Protocol | Tool exposure for Cursor / Windsurf / IDEs |
| A2A Protocol | JSON-RPC | Multi-agent orchestration (AutoGen, CrewAI) |
Every query (REST, MCP, A2A) runs through the same pipeline:
flowchart LR
Q["Query"] --> E["Embed"]
Q --> K["Keywords"]
E --> VS["Vector Search\nchromem-go"]
K --> GT["Graph Traversal\ncayley"]
VS --> M["Merge"]
GT --> M
M --> R["Rerank\noptional"]
R --> C["Context → LLM"]
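The merge step above can be sketched as follows. This is a simplified illustration with made-up chunk IDs and scores, not Kash's actual internals: the best score per chunk is kept across both result sets, then the survivors are sorted.

```python
def merge_results(vector_hits, graph_hits, top_k=5):
    """Merge vector-search and graph-traversal hits, keeping the best score per chunk."""
    best = {}
    for hit in vector_hits + graph_hits:
        chunk_id = hit["id"]
        if chunk_id not in best or hit["score"] > best[chunk_id]["score"]:
            best[chunk_id] = hit
    # Highest-scoring chunks first; a reranker could reorder this list afterwards.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]

vector_hits = [{"id": "c1", "score": 0.91}, {"id": "c2", "score": 0.78}]
graph_hits = [{"id": "c2", "score": 0.85}, {"id": "c3", "score": 0.60}]
merged = merge_results(vector_hits, graph_hits, top_k=2)
print([h["id"] for h in merged])  # ['c1', 'c2']
```

Chunks surfaced by both searches (like `c2` here) are deduplicated rather than double-counted.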
Scaffolds a new agent project.
kash init my-agent

Creates:

my-agent/
├── data/               # Drop your PDFs, Markdown, TXT here
├── agent.yaml          # Agent persona + config
├── Dockerfile          # Ready for docker build
├── docker-compose.yml  # One-command local deployment
├── .env.example        # Runtime env var template
├── .dockerignore       # Keeps images clean
└── README.md           # Auto-generated docs
Compiles documents into vector + graph databases.
kash build # in current directory
kash build --dir ./my-agent   # specify project dir

| Flag | Short | Default | Description |
|---|---|---|---|
| `--dir` | `-d` | `.` | Project directory to build |
Pipeline:

1. Load documents from `data/`
2. Chunk text into passages
3. Generate vector embeddings → `data/memory.chromem/`
4. Extract knowledge graph triples → `data/knowledge.cayley/`
5. Auto-generate MCP tool descriptions → `agent.yaml`
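The chunking step can be pictured as a sliding window over the text. A toy sketch (Kash's real chunker may use different sizes, overlap, and boundaries):

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into overlapping character windows (simplified illustration)."""
    chunks = []
    step = size - overlap  # advance less than a full window so chunks share context
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "a" * 500                                   # a 500-character toy document
chunks = chunk_text(doc, size=200, overlap=40)
print(len(chunks), len(chunks[0]))                # 4 chunks, first one 200 chars
```

The overlap means a fact that straddles a chunk boundary still appears whole in at least one chunk.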
Starts the runtime HTTP server.
kash serve # default: port 8000, ./agent.yaml
kash serve --port 9000 # custom port
kash serve --dir ./my-agent # serve from specific directory
kash serve --agent custom.yaml   # custom agent config path

| Flag | Short | Default | Description |
|---|---|---|---|
| `--port` | `-p` | `8000` | Listen port (overridden by `PORT` env var) |
| `--agent` | `-a` | `agent.yaml` | Path to agent configuration |
| `--dir` | `-d` | `.` | Project directory |
kash version
# kash v1.0.0
# commit: a3f9c12
# built: 2026-02-27T10:00:00Z
# go version: go1.25.0
# os/arch: linux/amd64

All three interfaces serve concurrently on a single port.
Drop-in replacement for the OpenAI API. Intercepts requests, runs hybrid RAG, injects context, proxies to your LLM.
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Explain the key concepts"}]
}'

Works with LibreChat, Open WebUI, AnythingLLM, and any OpenAI-compatible client.
Model Context Protocol over HTTP SSE. Exposes your knowledge base as tools to IDEs.
{
  "mcpServers": {
    "my-agent": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

Tested and working with Cursor and Windsurf.
JSON-RPC for multi-agent frameworks.
# Agent info
curl http://localhost:8000/rpc/agent \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"agent.info"}'
# Query knowledge
curl http://localhost:8000/rpc/agent \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":2,"method":"agent.query","params":{"query":"your question"}}'

🧪 A2A protocol implementation is complete. Integration testing with AutoGen/CrewAI is in progress.
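The same calls can be made from Python using only the standard library. A sketch under stated assumptions: `build_rpc` and `a2a_call` are hypothetical helper names, and a Kash agent is assumed to be running on localhost:8000.

```python
import json
import urllib.request

def build_rpc(method, params=None, rpc_id=1):
    """Build a JSON-RPC 2.0 request body for the A2A endpoint."""
    payload = {"jsonrpc": "2.0", "id": rpc_id, "method": method}
    if params is not None:
        payload["params"] = params
    return payload

def a2a_call(method, params=None, url="http://localhost:8000/rpc/agent", api_key=None):
    """POST a JSON-RPC request to a running Kash agent and return the parsed reply."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when AGENT_API_KEY is set on the server
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        url, data=json.dumps(build_rpc(method, params)).encode(), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (requires a running agent):
# info = a2a_call("agent.info")
# answer = a2a_call("agent.query", {"query": "your question"})
```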
By default all endpoints are open (ideal for local dev). Set AGENT_API_KEY to enable authentication on all endpoints except /health.
export AGENT_API_KEY="my-secret-key"
kash serve

The key is passed as a standard Bearer token — compatible with all three interfaces:
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer my-secret-key" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
    api_key="my-secret-key",  # ← AGENT_API_KEY goes here
)

import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:8000/v1',
apiKey: 'my-secret-key',
});

{
  "mcpServers": {
    "my-agent": {
      "url": "http://localhost:8000/mcp",
      "env": {
        "API_KEY": "my-secret-key"
      }
    }
  }
}

curl http://localhost:8000/rpc/agent \
-H "Authorization: Bearer my-secret-key" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"agent.info"}'

When AGENT_API_KEY is not set, everything works without any header (open access).
curl http://localhost:8000/health

{
  "status": "ok",
  "agent": "my-expert",
  "version": "1.0.0",
  "vectors": 892,
  "triples": 1423,
  "mcp_tools": 1,
  "embed_dimensions": 1024,
  "llm_model": "gpt-4o",
  "embed_model": "voyage-3",
  "reranker_enabled": false,
  "auth_enabled": true,
  "time": "2026-02-27T10:00:00Z"
}
`/health` is always public — no auth required even when `AGENT_API_KEY` is set.
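For deployments, a readiness probe can gate on the `/health` body. A minimal sketch: the `min_vectors` threshold is illustrative, and the JSON shape follows the example above.

```python
import json

def is_ready(body, min_vectors=1):
    """Decide readiness from a Kash /health response body (illustrative thresholds)."""
    health = json.loads(body)
    return health.get("status") == "ok" and health.get("vectors", 0) >= min_vectors

sample = '{"status": "ok", "agent": "my-expert", "vectors": 892, "triples": 1423}'
print(is_ready(sample))  # True
```

Requiring at least one vector catches the case where the container started but the knowledge base was never built.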
Perfect for development and testing. Just build and serve directly:
# Set up providers
export LLM_BASE_URL="https://api.openai.com/v1"
export LLM_API_KEY="sk-..."
export LLM_MODEL="gpt-4o"
export EMBED_BASE_URL="https://api.voyageai.com/v1"
export EMBED_API_KEY="pa-..."
# Build the knowledge base
kash build
# Serve it
kash serve

That's it. Hit http://localhost:8000 and start chatting.
One command to build and run:
# Fill in your keys
cp .env.example .env
# edit .env with your API keys
# Build the knowledge base first
kash build
# Build image + run
docker compose up --build

# Build the image
docker build -t my-agent:latest .
# Run with env vars
docker run -p 8000:8000 \
-e LLM_BASE_URL="https://api.openai.com/v1" \
-e LLM_API_KEY="sk-..." \
-e LLM_MODEL="gpt-4o" \
-e EMBED_BASE_URL="https://api.voyageai.com/v1" \
-e EMBED_API_KEY="pa-..." \
-e AGENT_API_KEY="my-secret-key" \
my-agent:latest

Build a multi-arch image and push to any registry:
# Build for both x86 and ARM (runs on servers + Raspberry Pi)
docker buildx build --platform linux/amd64,linux/arm64 \
-t ghcr.io/you/my-agent:v1 --push .
# Anyone can now run your agent with one command:
docker run -p 8000:8000 --env-file .env ghcr.io/you/my-agent:v1

Your agent is now a portable Docker image that anyone can pull and run. They just bring their own API keys.
Used by kash build to call LLM and embedding APIs.
build_providers:
  llm:
    base_url: "https://api.openai.com/v1"   # or any OpenAI-compatible endpoint
    api_key: "sk-..."
    model: "gpt-4o"
  embedder:
    base_url: "https://api.voyageai.com/v1"
    api_key: "pa-..."
    model: "voyage-3"                       # optional if using a router
  # reranker:                               # optional — must be Cohere-compatible (/rerank endpoint)
  #   base_url: "https://api.cohere.ai/v1"  # Cohere, Jina, Voyage, or a LiteLLM proxy
  #   api_key: "..."
  #   model: "rerank-english-v3.0"          # or jina-reranker-v2-base-en, rerank-1, etc.

Provider agnostic — works with any OpenAI-compatible endpoint. Use LiteLLM, Ollama, or TrueFoundry as a proxy.
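For example, a fully local build through Ollama's OpenAI-compatible endpoint might look like this. The model names are illustrative — use whatever you have pulled locally:

```yaml
build_providers:
  llm:
    base_url: "http://localhost:11434/v1"   # Ollama's OpenAI-compatible endpoint
    api_key: "ollama"                       # Ollama ignores the key, but the field is required
    model: "llama3.1"                       # illustrative — any local chat model
  embedder:
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
    model: "nomic-embed-text"               # illustrative — any local embedding model
```

Remember to set the matching `dimensions` in `agent.yaml` for whichever embedding model you choose.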
Used by kash serve and Docker containers.
| Variable | Required | Description |
|---|---|---|
| `LLM_BASE_URL` | ✅ | OpenAI-compatible LLM endpoint |
| `LLM_API_KEY` | ✅ | LLM API key |
| `LLM_MODEL` | ✅ | Model name (e.g. `gpt-4o`) |
| `EMBED_BASE_URL` | ✅ | Embedding API endpoint |
| `EMBED_API_KEY` | ✅ | Embedding API key |
| `EMBED_MODEL` | ❌ | Embedding model (optional if using a router) |
| `RERANK_BASE_URL` | ❌ | Reranker base URL — must expose a Cohere-compatible `/rerank` endpoint |
| `RERANK_API_KEY` | ❌ | Reranker API key |
| `RERANK_MODEL` | ❌ | Reranker model name (e.g. `rerank-english-v3.0`) |
| `RERANK_ENDPOINT` | ❌ | Full rerank URL override (e.g. `https://gateway.example.com/v1/rerank`) — takes priority over `RERANK_BASE_URL` |
| `AGENT_API_KEY` | ❌ | Enable auth — all endpoints (except `/health`) require `Authorization: Bearer <key>` |
| `PORT` | ❌ | Override listen port (default: 8000) |
Each project has an agent.yaml that defines persona, embedding dimensions, and MCP tools:
agent:
  name: "my-expert"
  version: "1.0.0"
  description: "An expert AI agent powered by Kash"
  system_prompt: |
    You are a highly knowledgeable expert assistant...

runtime:
  embedder:
    dimensions: 1024  # must match build AND serve time

mcp:
  tools:
    - name: "search_my_expert_knowledge"
      description: "Auto-generated by kash build"

server:
  port: 8000
  cors_origins: ["*"]

Important: The `dimensions` value is NOT sent to the embedding API — some providers don't support it. Kash handles truncation locally.
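The local truncation that note describes can be pictured like this. An illustrative sketch, not Kash's code: the raw embedding is cut to the configured dimensions and re-normalized so cosine similarity still behaves.

```python
import math

def truncate_embedding(vec, dims):
    """Cut an embedding to `dims` dimensions and L2-normalize the result."""
    cut = vec[:dims]
    norm = math.sqrt(sum(x * x for x in cut)) or 1.0  # guard against a zero vector
    return [x / norm for x in cut]

raw = [0.6, 0.8, 0.1, 0.3]         # pretend the provider returned 4 dims
print(truncate_embedding(raw, 2))  # ≈ [0.6, 0.8] — already unit length
```

This is why `dimensions` must agree between build and serve time: vectors truncated to different lengths cannot be compared.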
- Go 1.25+ — Install Go
- Git
git clone https://github.com/akashicode/kash.git
cd kash
# Build for your platform
go build -o bin/kash ./cmd/kash
# Or use Make
make build# Linux
GOOS=linux GOARCH=amd64 go build -o bin/kash-linux ./cmd/Kash
# macOS (Apple Silicon)
GOOS=darwin GOARCH=arm64 go build -o bin/kash-darwin ./cmd/kash
# Windows
GOOS=windows GOARCH=amd64 go build -o bin/kash.exe ./cmd/kash
# All platforms at once
make build-all

# Linux / macOS
sudo make install
# → installs to /usr/local/bin/kash
# Windows (PowerShell as Admin)
Copy-Item bin\kash.exe C:\Windows\System32\kash.exe

make test       # Run all tests
make test-v # Verbose output
make coverage # Generate HTML coverage report
make fmt # Format code
make vet # Static analysis
make lint # golangci-lint (install first)
make tidy # go mod tidy
make clean      # Remove build artifacts

Kash/
├── cmd/                # CLI commands (Cobra)
│   ├── kash/main.go    # Entry point
│   ├── root.go         # Root command + Viper config
│   ├── init.go         # kash init
│   ├── build.go        # kash build
│   ├── serve.go        # kash serve
│   └── version.go      # kash version
├── internal/
│   ├── config/         # Unified config (env + YAML)
│   ├── display/        # Colorful CLI output + banners
│   ├── chunker/        # Text chunking
│   ├── reader/         # Document loading (PDF, MD, TXT)
│   ├── llm/            # LLM client, embedder, reranker
│   ├── vector/         # chromem-go vector store
│   ├── graph/          # cayley knowledge graph
│   └── server/         # HTTP server (REST, MCP, A2A)
├── Makefile
├── Dockerfile          # Base image (multi-arch)
└── go.mod
| Feature | Status | Notes |
|---|---|---|
| `kash init` | ✅ Stable | Full project scaffolding |
| `kash build` | ✅ Stable | PDF, Markdown, TXT ingestion |
| `kash serve` | ✅ Stable | All three interfaces |
| REST API | ✅ Tested | Drop-in OpenAI replacement |
| MCP Server | ✅ Tested | Works with Cursor & Windsurf |
| A2A Protocol | 🧪 In Progress | Implementation done, testing pending |
| Hybrid RAG | ✅ Stable | Vector + Graph search |
| Reranker | ✅ Optional | Cohere-compatible rerank API (`/rerank` endpoint) |
| Multi-arch Docker | ✅ Stable | amd64 + arm64 |
| Streaming responses | ✅ Stable | SSE streaming for REST API |
- **Zero Infrastructure** — No Pinecone, no Redis, no PostgreSQL. Everything is embedded in a single binary.
- **Ship as Docker** — Your agent is a lightweight image. Push to a registry and anyone can run it with `docker run`.
- **BYOM (Bring Your Own Model)** — Works with OpenAI, Anthropic (via proxy), Ollama, LiteLLM, TrueFoundry — any OpenAI-compatible endpoint.
- **Fast** — Go binary starts in <50ms. No Python cold starts. No dependency hell.
- **Hybrid RAG** — Vector similarity + knowledge graph traversal. Better context than vector-only retrieval.
- **Three Interfaces** — REST (any chat UI), MCP (IDEs), A2A (multi-agent). One build, three ways to connect.
MIT β do whatever you want with it.
⚡ Kash
Cache your knowledge. Channel the Akashic. No infrastructure required.