Cache your knowledge. Channel the Akashic.
Compile your documents into an embedded GraphRAG brain. Ship AI agents as Docker images.
Agent as a Service · Knowledge as a Service
Kash /kæʃ/ — A double entendre by design. Like a cache, it compiles your heavy knowledge into a fast, local binary you can ship anywhere. Like the Akashic records, it holds a complete, queryable record of everything you've fed it. One word. Two meanings. Zero infrastructure.
Kash is a Go CLI that turns your raw documents (PDFs, Markdown, text files) into a self-contained AI agent packaged in a lightweight Docker container (~50MB).
No Python runtime. No external vector databases. No infrastructure headaches.
Your Documents → kash build → Docker Image → Ship Anywhere 🚀
Think of it like a static site generator, but for AI brains. You compile knowledge at build time, and the runtime only serves queries — fast, lightweight, and portable.
| Traditional RAG Stack | Kash |
|---|---|
| Python app + Pinecone + Redis + FastAPI | Single Go binary + lightweight Docker image |
| Runtime document ingestion | Build-time compilation |
| External vector DB dependency | Embedded pure-Go vector store |
| Complex deployment | docker run and done |
| $$$ infrastructure costs | Runs on a Raspberry Pi |
- **Company knowledge base.** Feed your company docs, runbooks, or research papers. Get an AI that actually knows your stuff and cites sources. Example: internal engineering wiki → Docker image → every dev has a domain expert on tap.
- **Personal tutor.** Compile textbooks and notes into a Socratic tutor that quizzes you, explains concepts, and never makes things up. Example: UPSC prep material → AI tutor → study from anywhere.
- **Support bot.** Turn your API docs, changelogs, and FAQs into a support bot that plugs into any chat UI or IDE. Example: docs + release notes → Docker image → mount in Open WebUI.
- **Multi-agent teams.** Spin up multiple specialized agents (legal, finance, engineering) and wire them together via the A2A protocol. Example: three domain agents → CrewAI orchestration → one smart team.
Note: Scanned (image-only) PDF support (OCR) is not yet available. Kash currently extracts embedded/selectable text only.
# 1. Install Kash (build from source — see "Building from Source" below)
go install github.com/akashicode/kash/cmd/kash@latest
# 2. Configure your API providers
mkdir -p ~/.kash
cat > ~/.kash/config.yaml << 'EOF'
build_providers:
  llm:
    base_url: "https://api.openai.com/v1"
    api_key: "sk-..."
    model: "gpt-4o"
  embedder:
    base_url: "https://api.voyageai.com/v1"
    api_key: "pa-..."
    model: "voyage-3"  # dimensions must match the embedder dimensions in agent.yaml
EOF
# 3. Scaffold a new agent
kash init my-expert
# 4. Add your knowledge
cp ~/docs/*.pdf my-expert/data/
cp ~/notes/*.md my-expert/data/
# 5. Compile the knowledge base
cd my-expert
kash build

# 6. Serve locally (no Docker needed!)
kash serve

Your agent is now live at http://localhost:8000 with three interfaces ready to go.
flowchart TB
subgraph BUILD["🔨 Build Time"]
direction LR
D["📄 Documents\nPDF / MD / TXT"] --> CK["Chunker"]
CK --> EMB["Embedder API"]
CK --> LLM1["LLM API\ntriple extraction"]
EMB --> VDB["Vector DB\ndata/memory.chromem"]
LLM1 --> GDB["Graph DB\ndata/knowledge.cayley"]
end
BUILD -- "docker build" --> RUNTIME
subgraph RUNTIME["⚡ Runtime — port 8000"]
direction LR
Q["Query"] --> HS["Hybrid Search\nVector + Graph"]
HS --> RR["Rerank\noptional"]
RR --> LLM2["LLM"]
LLM2 --> REST["REST API\nPOST /v1/chat/completions"]
LLM2 --> MCP["MCP Server\nGET /mcp"]
LLM2 --> A2A["A2A Protocol\nPOST /rpc/agent"]
end
| Component | Technology | Purpose |
|---|---|---|
| CLI Framework | spf13/cobra | Developer interface (`init`, `build`, `serve`) |
| Vector Memory | philippgille/chromem-go | Pure-Go embedded vector store |
| Graph Memory | cayleygraph/cayley | Embedded knowledge graph (triples) |
| LLM Client | sashabaranov/go-openai | Build-time extraction & runtime queries |
| MCP Protocol | Model Context Protocol | Tool exposure for Cursor / Windsurf / IDEs |
| A2A Protocol | JSON-RPC | Multi-agent orchestration (AutoGen, CrewAI) |
Every query (REST, MCP, A2A) runs through the same pipeline:
flowchart LR
Q["Query"] --> E["Embed"]
Q --> K["Keywords"]
E --> VS["Vector Search\nchromem-go"]
K --> GT["Graph Traversal\ncayley"]
VS --> M["Merge"]
GT --> M
M --> R["Rerank\noptional"]
R --> C["Context → LLM"]
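The merge step above can be sketched as follows. This is a simplified illustration with made-up chunk IDs and scores, not Kash's actual internals: the best score per chunk is kept across both result sets, then the survivors are sorted.

```python
def merge_results(vector_hits, graph_hits, top_k=5):
    """Merge vector-search and graph-traversal hits, keeping the best score per chunk."""
    best = {}
    for hit in vector_hits + graph_hits:
        chunk_id = hit["id"]
        if chunk_id not in best or hit["score"] > best[chunk_id]["score"]:
            best[chunk_id] = hit
    # Highest-scoring chunks first; a reranker could reorder this list afterwards.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)[:top_k]

vector_hits = [{"id": "c1", "score": 0.91}, {"id": "c2", "score": 0.78}]
graph_hits = [{"id": "c2", "score": 0.85}, {"id": "c3", "score": 0.60}]
merged = merge_results(vector_hits, graph_hits, top_k=2)
print([h["id"] for h in merged])  # ['c1', 'c2']
```

Chunks surfaced by both searches (like `c2` here) are deduplicated rather than double-counted.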
Scaffolds a new agent project.
kash init my-agent

Creates:

my-agent/
├── data/               # Drop your PDFs, Markdown, TXT here
├── agent.yaml          # Agent persona + config
├── Dockerfile          # Ready for docker build
├── docker-compose.yml  # One-command local deployment
├── .env.example        # Runtime env var template
├── .dockerignore       # Keeps images clean
└── README.md           # Auto-generated docs
Compiles documents into vector + graph databases.
kash build # in current directory
kash build --dir ./my-agent   # specify project dir

| Flag | Short | Default | Description |
|---|---|---|---|
| `--dir` | `-d` | `.` | Project directory to build |
Pipeline:

1. Load documents from `data/`
2. Chunk text into passages
3. Generate vector embeddings → `data/memory.chromem/`
4. Extract knowledge graph triples → `data/knowledge.cayley/`
5. Auto-generate MCP tool descriptions → `agent.yaml`
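The chunking step can be pictured as a sliding window over the text. A toy sketch (Kash's real chunker may use different sizes, overlap, and boundaries):

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into overlapping character windows (simplified illustration)."""
    chunks = []
    step = size - overlap  # advance less than a full window so chunks share context
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "a" * 500                                   # a 500-character toy document
chunks = chunk_text(doc, size=200, overlap=40)
print(len(chunks), len(chunks[0]))                # 4 chunks, first one 200 chars
```

The overlap means a fact that straddles a chunk boundary still appears whole in at least one chunk.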
Starts the runtime HTTP server.
kash serve # default: port 8000, ./agent.yaml
kash serve --port 9000 # custom port
kash serve --dir ./my-agent # serve from specific directory
kash serve --agent custom.yaml   # custom agent config path

| Flag | Short | Default | Description |
|---|---|---|---|
| `--port` | `-p` | `8000` | Listen port (overridden by `PORT` env var) |
| `--agent` | `-a` | `agent.yaml` | Path to agent configuration |
| `--dir` | `-d` | `.` | Project directory |
kash version
# kash v1.0.0
# commit: a3f9c12
# built: 2026-02-27T10:00:00Z
# go version: go1.25.0
# os/arch: linux/amd64

All three interfaces serve concurrently on a single port.
Drop-in replacement for the OpenAI API. Intercepts requests, runs hybrid RAG, injects context, proxies to your LLM.
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Explain the key concepts"}]
}'

Works with LibreChat, Open WebUI, AnythingLLM, and any OpenAI-compatible client.
Model Context Protocol over HTTP SSE. Exposes your knowledge base as tools to IDEs.
{
  "mcpServers": {
    "my-agent": {
      "url": "http://localhost:8000/mcp"
    }
  }
}

Tested and working with Cursor and Windsurf.
JSON-RPC for multi-agent frameworks.
# Agent info
curl http://localhost:8000/rpc/agent \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"agent.info"}'
# Query knowledge
curl http://localhost:8000/rpc/agent \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":2,"method":"agent.query","params":{"query":"your question"}}'

🧪 A2A protocol implementation is complete. Integration testing with AutoGen/CrewAI is in progress.
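The same calls can be made from Python using only the standard library. A sketch under stated assumptions: `build_rpc` and `a2a_call` are hypothetical helper names, and a Kash agent is assumed to be running on localhost:8000.

```python
import json
import urllib.request

def build_rpc(method, params=None, rpc_id=1):
    """Build a JSON-RPC 2.0 request body for the A2A endpoint."""
    payload = {"jsonrpc": "2.0", "id": rpc_id, "method": method}
    if params is not None:
        payload["params"] = params
    return payload

def a2a_call(method, params=None, url="http://localhost:8000/rpc/agent", api_key=None):
    """POST a JSON-RPC request to a running Kash agent and return the parsed reply."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when AGENT_API_KEY is set on the server
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(
        url, data=json.dumps(build_rpc(method, params)).encode(), headers=headers
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Usage (requires a running agent):
# info = a2a_call("agent.info")
# answer = a2a_call("agent.query", {"query": "your question"})
```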
By default all endpoints are open (ideal for local dev). Set AGENT_API_KEY to enable authentication on all endpoints except /health.
export AGENT_API_KEY="my-secret-key"
kash serve

The key is passed as a standard Bearer token — compatible with all three interfaces:
curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer my-secret-key" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'

from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
    api_key="my-secret-key",  # ← AGENT_API_KEY goes here
)

import OpenAI from 'openai';
const client = new OpenAI({
baseURL: 'http://localhost:8000/v1',
apiKey: 'my-secret-key',
});

{
  "mcpServers": {
    "my-agent": {
      "url": "http://localhost:8000/mcp",
      "env": {
        "API_KEY": "my-secret-key"
      }
    }
  }
}

curl http://localhost:8000/rpc/agent \
-H "Authorization: Bearer my-secret-key" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"agent.info"}'

When AGENT_API_KEY is not set, everything works without any header (open access).
curl http://localhost:8000/health

{
  "status": "ok",
  "agent": "my-expert",
  "version": "1.0.0",
  "vectors": 892,
  "triples": 1423,
  "mcp_tools": 1,
  "embed_dimensions": 1024,
  "llm_model": "gpt-4o",
  "embed_model": "voyage-3",
  "reranker_enabled": false,
  "auth_enabled": true,
  "time": "2026-02-27T10:00:00Z"
}
`/health` is always public — no auth required even when `AGENT_API_KEY` is set.
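For deployments, a readiness probe can gate on the `/health` body. A minimal sketch: the `min_vectors` threshold is illustrative, and the JSON shape follows the example above.

```python
import json

def is_ready(body, min_vectors=1):
    """Decide readiness from a Kash /health response body (illustrative thresholds)."""
    health = json.loads(body)
    return health.get("status") == "ok" and health.get("vectors", 0) >= min_vectors

sample = '{"status": "ok", "agent": "my-expert", "vectors": 892, "triples": 1423}'
print(is_ready(sample))  # True
```

Requiring at least one vector catches the case where the container started but the knowledge base was never built.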
Perfect for development and testing. Just build and serve directly:
# Set up providers
export LLM_BASE_URL="https://api.openai.com/v1"
export LLM_API_KEY="sk-..."
export LLM_MODEL="gpt-4o"
export EMBED_BASE_URL="https://api.voyageai.com/v1"
export EMBED_API_KEY="pa-..."
# Build the knowledge base
kash build
# Serve it
kash serve

That's it. Hit http://localhost:8000 and start chatting.
One command to build and run:
# Fill in your keys
cp .env.example .env
# edit .env with your API keys
# Build the knowledge base first
kash build
# Build image + run
docker compose up --build

# Build the image
docker build -t my-agent:latest .
# Run with env vars
docker run -p 8000:8000 \
-e LLM_BASE_URL="https://api.openai.com/v1" \
-e LLM_API_KEY="sk-..." \
-e LLM_MODEL="gpt-4o" \
-e EMBED_BASE_URL="https://api.voyageai.com/v1" \
-e EMBED_API_KEY="pa-..." \
-e AGENT_API_KEY="my-secret-key" \
my-agent:latest

Build a multi-arch image and push to any registry:
# Build for both x86 and ARM (runs on servers + Raspberry Pi)
docker buildx build --platform linux/amd64,linux/arm64 \
-t ghcr.io/you/my-agent:v1 --push .
# Anyone can now run your agent with one command:
docker run -p 8000:8000 --env-file .env ghcr.io/you/my-agent:v1

Your agent is now a portable Docker image that anyone can pull and run. They just bring their own API keys.
Used by kash build to call LLM and embedding APIs.
build_providers:
  llm:
    base_url: "https://api.openai.com/v1"   # or any OpenAI-compatible endpoint
    api_key: "sk-..."
    model: "gpt-4o"
  embedder:
    base_url: "https://api.voyageai.com/v1"
    api_key: "pa-..."
    model: "voyage-3"                       # optional if using a router
  # reranker:                               # optional — must be Cohere-compatible (/rerank endpoint)
  #   base_url: "https://api.cohere.ai/v1"  # Cohere, Jina, Voyage, or a LiteLLM proxy
  #   api_key: "..."
  #   model: "rerank-english-v3.0"          # or jina-reranker-v2-base-en, rerank-1, etc.

Provider agnostic — works with any OpenAI-compatible endpoint. Use LiteLLM, Ollama, or TrueFoundry as a proxy.
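For example, a fully local build through Ollama's OpenAI-compatible endpoint might look like this. The model names are illustrative — use whatever you have pulled locally:

```yaml
build_providers:
  llm:
    base_url: "http://localhost:11434/v1"   # Ollama's OpenAI-compatible endpoint
    api_key: "ollama"                       # Ollama ignores the key, but the field is required
    model: "llama3.1"                       # illustrative — any local chat model
  embedder:
    base_url: "http://localhost:11434/v1"
    api_key: "ollama"
    model: "nomic-embed-text"               # illustrative — any local embedding model
```

Remember to set the matching `dimensions` in `agent.yaml` for whichever embedding model you choose.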
Used by kash serve and Docker containers.
| Variable | Required | Description |
|---|---|---|
| `LLM_BASE_URL` | ✅ | OpenAI-compatible LLM endpoint |
| `LLM_API_KEY` | ✅ | LLM API key |
| `LLM_MODEL` | ✅ | Model name (e.g. `gpt-4o`) |
| `EMBED_BASE_URL` | ✅ | Embedding API endpoint |
| `EMBED_API_KEY` | ✅ | Embedding API key |
| `EMBED_MODEL` | ❌ | Embedding model (optional if using a router) |
| `RERANK_BASE_URL` | ❌ | Reranker base URL — must expose a Cohere-compatible `/rerank` endpoint |
| `RERANK_API_KEY` | ❌ | Reranker API key |
| `RERANK_MODEL` | ❌ | Reranker model name (e.g. `rerank-english-v3.0`) |
| `RERANK_ENDPOINT` | ❌ | Full rerank URL override (e.g. `https://gateway.example.com/v1/rerank`) — takes priority over `RERANK_BASE_URL` |
| `AGENT_API_KEY` | ❌ | Enable auth — all endpoints (except `/health`) require `Authorization: Bearer <key>` |
| `PORT` | ❌ | Override listen port (default: 8000) |
Each project has an agent.yaml that defines persona, embedding dimensions, and MCP tools:
agent:
  name: "my-expert"
  version: "1.0.0"
  description: "An expert AI agent powered by Kash"
  system_prompt: |
    You are a highly knowledgeable expert assistant...

runtime:
  embedder:
    dimensions: 1024  # must match build AND serve time

mcp:
  tools:
    - name: "search_my_expert_knowledge"
      description: "Auto-generated by kash build"

server:
  port: 8000
  cors_origins: ["*"]

Important: The `dimensions` value is NOT sent to the embedding API — some providers don't support it. Kash handles truncation locally.
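The local truncation that note describes can be pictured like this. An illustrative sketch, not Kash's code: the raw embedding is cut to the configured dimensions and re-normalized so cosine similarity still behaves.

```python
import math

def truncate_embedding(vec, dims):
    """Cut an embedding to `dims` dimensions and L2-normalize the result."""
    cut = vec[:dims]
    norm = math.sqrt(sum(x * x for x in cut)) or 1.0  # guard against a zero vector
    return [x / norm for x in cut]

raw = [0.6, 0.8, 0.1, 0.3]         # pretend the provider returned 4 dims
print(truncate_embedding(raw, 2))  # ≈ [0.6, 0.8] — already unit length
```

This is why `dimensions` must agree between build and serve time: vectors truncated to different lengths cannot be compared.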
- Go 1.25+ — Install Go
- Git
git clone https://github.com/akashicode/kash.git
cd kash
# Build for your platform
go build -o bin/kash ./cmd/kash
# Or use Make
make build# Linux
GOOS=linux GOARCH=amd64 go build -o bin/kash-linux ./cmd/Kash
# macOS (Apple Silicon)
GOOS=darwin GOARCH=arm64 go build -o bin/kash-darwin ./cmd/kash
# Windows
GOOS=windows GOARCH=amd64 go build -o bin/kash.exe ./cmd/kash
# All platforms at once
make build-all

# Linux / macOS
sudo make install
# → installs to /usr/local/bin/kash
# Windows (PowerShell as Admin)
Copy-Item bin\kash.exe C:\Windows\System32\kash.exe

make test       # Run all tests
make test-v # Verbose output
make coverage # Generate HTML coverage report
make fmt # Format code
make vet # Static analysis
make lint # golangci-lint (install first)
make tidy # go mod tidy
make clean      # Remove build artifacts

Kash/
├── cmd/                # CLI commands (Cobra)
│   ├── kash/main.go    # Entry point
│   ├── root.go         # Root command + Viper config
│   ├── init.go         # kash init
│   ├── build.go        # kash build
│   ├── serve.go        # kash serve
│   └── version.go      # kash version
├── internal/
│   ├── config/         # Unified config (env + YAML)
│   ├── display/        # Colorful CLI output + banners
│   ├── chunker/        # Text chunking
│   ├── reader/         # Document loading (PDF, MD, TXT)
│   ├── llm/            # LLM client, embedder, reranker
│   ├── vector/         # chromem-go vector store
│   ├── graph/          # cayley knowledge graph
│   └── server/         # HTTP server (REST, MCP, A2A)
├── Makefile
├── Dockerfile          # Base image (multi-arch)
└── go.mod
| Feature | Status | Notes |
|---|---|---|
| `kash init` | ✅ Stable | Full project scaffolding |
| `kash build` | ✅ Stable | PDF, Markdown, TXT ingestion |
| `kash serve` | ✅ Stable | All three interfaces |
| REST API | ✅ Tested | Drop-in OpenAI replacement |
| MCP Server | ✅ Tested | Works with Cursor & Windsurf |
| A2A Protocol | 🧪 In Progress | Implementation done, testing pending |
| Hybrid RAG | ✅ Stable | Vector + Graph search |
| Reranker | ✅ Optional | Cohere-compatible rerank API (`/rerank` endpoint) |
| Multi-arch Docker | ✅ Stable | amd64 + arm64 |
| Streaming responses | ✅ Stable | SSE streaming for REST API |
- **Zero Infrastructure** — No Pinecone, no Redis, no PostgreSQL. Everything is embedded in a single binary.
- **Ship as Docker** — Your agent is a lightweight image. Push to a registry and anyone can run it with `docker run`.
- **BYOM (Bring Your Own Model)** — Works with OpenAI, Anthropic (via proxy), Ollama, LiteLLM, TrueFoundry — any OpenAI-compatible endpoint.
- **Fast** — Go binary starts in <50ms. No Python cold starts. No dependency hell.
- **Hybrid RAG** — Vector similarity + knowledge graph traversal. Better context than vector-only retrieval.
- **Three Interfaces** — REST (any chat UI), MCP (IDEs), A2A (multi-agent). One build, three ways to connect.
MIT β do whatever you want with it.
⚡ Kash
Cache your knowledge. Channel the Akashic. No infrastructure required.