Prerequisites
Rust & Cargo — Install from rustup.rs
clang C/C++ compiler (for building llama.cpp)
cmake build system
pkg-config package for your OS
libssl-dev (OpenSSL development headers), or the equivalent package for your OS
libomp-dev (OpenMP runtime and headers), or the equivalent package for your OS
(Optional) CUDA or Metal for GPU acceleration
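Before building, it can help to confirm that the command-line tools listed above are actually on your PATH. The following is a minimal sketch, not part of Akio: check_tools is a hypothetical helper name, and it only covers executables (cargo, clang, cmake, pkg-config), not the libssl-dev and libomp-dev libraries.

```shell
# Hypothetical helper (not shipped with Akio): report which tools are missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns non-zero if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it as: check_tools cargo clang cmake pkg-config. No output means every listed tool was found.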
Step 1: Install Akio
The recommended way to install Akio is directly from the GitHub repository using Cargo:
cargo install --git https://github.com/Fastiraz/akio.git
With CUDA
cargo install --git https://github.com/Fastiraz/akio.git --features cuda
Requires a CUDA-capable NVIDIA GPU and the CUDA Toolkit installed.
With Metal
cargo install --git https://github.com/Fastiraz/akio.git --features metal
macOS only. Requires Apple Silicon or an AMD/Intel GPU with Metal support.
Each of these commands compiles Akio with llama.cpp statically linked, so no external runtime or model server is needed.
The first build takes a few minutes since it compiles llama.cpp from source. Subsequent builds are incremental.
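If the shell cannot find akio after the build finishes, the usual cause is that Cargo's binary directory is not on your PATH. Cargo installs binaries into $CARGO_HOME/bin, which defaults to ~/.cargo/bin. The sketch below checks for that; path_contains is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: succeeds if directory $1 is an entry in the colon-separated list $2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Cargo's default install location for binaries.
cargo_bin="${CARGO_HOME:-$HOME/.cargo}/bin"

if path_contains "$cargo_bin" "$PATH"; then
  echo "$cargo_bin is on PATH"
else
  echo "$cargo_bin is not on PATH; add it so the shell can find akio"
fi
```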
Step 2: Pull a model
Akio downloads GGUF models directly from Hugging Face:
akio pull Fastiraz/Qwen3.5-9B-GGUF
This downloads the repository’s GGUF files into Akio’s local model store. You can browse available GGUF models at huggingface.co.
Start with a smaller model like ggml-org/Qwen3-0.6B-GGUF for faster testing on CPU.
Step 3: Run the agent
akio run -m Fastiraz/Qwen3.5-9B-GGUF
This starts an interactive chat session. Akio will load the model and give you a prompt where you can type tasks. The agent has access to its built-in tools (shell, read, write, glob, websearch) and will use them autonomously to complete your requests.
With GPU acceleration
akio run -m Fastiraz/Qwen3.5-9B-GGUF --ngl 99
--ngl specifies how many transformer layers to offload to the GPU. Use 99 to offload all layers.
With a custom context window
akio run -m Fastiraz/Qwen3.5-9B-GGUF -c 16384
The default context size is 8192 tokens.
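The two flags above are shown separately. The sketch below assumes they can be combined in one invocation, which this page does not confirm. akio_run is a hypothetical wrapper, and the AKIO override variable is an assumption added so the command can be previewed without running it.

```shell
# Hypothetical wrapper (not part of Akio): build an `akio run` command from the
# flags documented above. Assumes --ngl and -c may be passed together.
# Usage: akio_run <model> [gpu_layers] [context_size]
akio_run() {
  model=$1
  ngl=${2:-0}       # 0 means "do not pass --ngl"
  ctx=${3:-8192}    # 8192 is the documented default, so -c is omitted for it
  set -- run -m "$model"
  if [ "$ngl" -gt 0 ]; then set -- "$@" --ngl "$ngl"; fi
  if [ "$ctx" -ne 8192 ]; then set -- "$@" -c "$ctx"; fi
  # Set AKIO=echo to print the arguments instead of launching the agent.
  "${AKIO:-akio}" "$@"
}
```

For example, AKIO=echo akio_run Fastiraz/Qwen3.5-9B-GGUF 99 16384 prints the arguments that would be passed, which is a cheap way to check the command before a long model load.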
Step 4: Generate embeddings
Akio can run embedding models locally to produce vector representations of text — useful for semantic search, RAG pipelines, or similarity comparisons.
akio embedding -m Qwen3-Embedding-0.6B-Q8_0.gguf "Hello, world!" "Another sentence"
This outputs a JSON array of L2-normalized float vectors, one per input. You can also use the embed alias:
akio embed -m Qwen3-Embedding-0.6B-Q8_0.gguf "Hello, world!"
Pull a dedicated embedding model first:
akio pull Fastiraz/Qwen3-Embedding-0.6B-GGUF
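Because the vectors are L2-normalized, the cosine similarity of two inputs is simply their dot product. The sketch below computes it for the first two vectors using awk. It assumes the command writes only a bare nested array of numbers to standard output, for example [[0.6,0.8],[0.8,0.6]]; the exact output formatting is not specified on this page, and cosine_first_two is a hypothetical helper name.

```shell
# Hypothetical helper: read a JSON array of vectors on stdin and print the
# dot product of the first two. For L2-normalized vectors this equals their
# cosine similarity. Assumes a plain nested array of numbers and nothing else.
cosine_first_two() {
  awk '
    { gsub(/[ \t\r]/, ""); buf = buf $0 }   # join all lines, dropping whitespace
    END {
      sub(/^\[\[/, "", buf)                 # strip the leading "[["
      sub(/\]\]$/, "", buf)                 # strip the trailing "]]"
      split(buf, rows, /\],\[/)             # one element per vector
      n = split(rows[1], a, ",")
      split(rows[2], b, ",")
      for (i = 1; i <= n; i++) dot += a[i] * b[i]
      printf "%.4f\n", dot
    }'
}
```

A possible usage, under the same assumptions: akio embed -m Qwen3-Embedding-0.6B-Q8_0.gguf "Hello, world!" "Another sentence" | cosine_first_two. Values close to 1 indicate similar inputs. For anything beyond a quick check, a real JSON parser is a safer choice than this text-based approach.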
Step 5: List your models
akio list # Show downloaded repositories
akio list --all # Show whitelisted repositories and individual GGUF files
Next steps
CLI Reference: All commands and flags documented.
Built-in Tools: What tools the agent can use out of the box.
MCP Servers: Extend Akio with external tool servers.
Roadmap: What’s coming next for Akio.