Quickstart - 0xcat Akio

Prerequisites

Rust & Cargo: install from rustup.rs
clang: C/C++ compiler (for building llama.cpp)
cmake: build system
pkg-config: install the package for your OS
libssl-dev: OpenSSL development files (the package name varies by OS)
libomp-dev: OpenMP development files (the package name varies by OS)
(Optional) CUDA or Metal for GPU acceleration
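
Before installing, it can help to confirm that the build tools from the list above are on your PATH. The following is a minimal Python sketch, not part of Akio: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the check assumes each tool is installed under its usual executable name.

```python
import shutil

# Executables named in the prerequisites list. libssl-dev and libomp-dev are
# libraries rather than commands, so they are not checked here.
REQUIRED_TOOLS = ["cargo", "clang", "cmake", "pkg-config"]


def missing_tools(tools):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All build tools found.")
```

An empty result only shows that the commands exist. It does not verify their versions or the presence of the library packages.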

Step 1: Install Akio

The recommended way to install Akio is directly from the GitHub repository using Cargo:

CPU

cargo install --git https://github.com/Fastiraz/akio.git

CUDA

cargo install --git https://github.com/Fastiraz/akio.git --features cuda

Requires a CUDA-capable NVIDIA GPU and the CUDA Toolkit installed.

Metal

cargo install --git https://github.com/Fastiraz/akio.git --features metal

macOS only. Requires Apple Silicon or an AMD/Intel GPU with Metal support.

Each command compiles Akio with llama.cpp statically linked, so no external runtime or model server is needed.

The first build takes a few minutes since it compiles llama.cpp from source. Subsequent builds are incremental.

Step 2: Pull a model

Akio downloads GGUF models directly from Hugging Face:

akio pull Fastiraz/Qwen3.5-9B-GGUF

This downloads the repository’s GGUF files into Akio’s local model store. You can browse available GGUF models at huggingface.co.

Start with a smaller model like ggml-org/Qwen3-0.6B-GGUF for faster testing on CPU.

Step 3: Run the agent

akio run -m Fastiraz/Qwen3.5-9B-GGUF

This starts an interactive chat session. Akio will load the model and give you a prompt where you can type tasks. The agent has access to its built-in tools (shell, read, write, glob, websearch) and will use them autonomously to complete your requests.

With GPU acceleration

akio run -m Fastiraz/Qwen3.5-9B-GGUF --ngl 99

--ngl sets how many transformer layers to offload to the GPU. A value at or above the model's layer count, such as 99, offloads all layers.

With a custom context window

akio run -m Fastiraz/Qwen3.5-9B-GGUF -c 16384

-c sets the context window size in tokens. The default is 8192 tokens.
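
If you launch Akio from a script, the GPU and context-size flags from this step can be assembled programmatically. The sketch below is a minimal Python example under stated assumptions: `build_run_command` is a hypothetical helper, not part of Akio, and it assumes that --ngl and -c can be passed together in a single invocation.

```python
import shlex


def build_run_command(model, gpu_layers=None, context_size=None):
    """Assemble an `akio run` argument list from the flags shown in this step."""
    args = ["akio", "run", "-m", model]
    if gpu_layers is not None:
        # --ngl: number of transformer layers to offload to the GPU
        args += ["--ngl", str(gpu_layers)]
    if context_size is not None:
        # -c: context window size in tokens
        args += ["-c", str(context_size)]
    return args


command = build_run_command(
    "Fastiraz/Qwen3.5-9B-GGUF", gpu_layers=99, context_size=16384
)
# Print the command as a single shell-safe string for inspection.
print(shlex.join(command))
```

The script only prints the command. To start the session from Python, the same list could be passed to subprocess.run(command).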

Step 4: Generate embeddings

Akio can run embedding models locally to produce vector representations of text — useful for semantic search, RAG pipelines, or similarity comparisons.

akio embedding -m Qwen3-Embedding-0.6B-Q8_0.gguf "Hello, world!" "Another sentence"

This outputs a JSON array of L2-normalized float vectors, one per input. You can also use the embed alias:

akio embed -m Qwen3-Embedding-0.6B-Q8_0.gguf "Hello, world!"

Pull a dedicated embedding model first: akio pull Fastiraz/Qwen3-Embedding-0.6B-GGUF
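
Because the vectors are L2-normalized, the cosine similarity of two embeddings is simply their dot product. The following is a minimal Python sketch of that comparison. The function names are hypothetical, and the sample input is a hand-written pair of toy 2-D unit vectors in the shape described above (a JSON array with one vector per input), not real model output.

```python
import json


def parse_embeddings(json_text):
    """Parse a JSON array of vectors into a list of lists of floats."""
    return [[float(x) for x in vector] for vector in json.loads(json_text)]


def cosine_similarity_normalized(a, b):
    """Cosine similarity of two L2-normalized vectors, i.e. their dot product."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Toy unit vectors standing in for real output; actual embeddings have
# many more dimensions.
sample_output = "[[0.6, 0.8], [1.0, 0.0]]"
first, second = parse_embeddings(sample_output)
print(cosine_similarity_normalized(first, second))
```

To compare real embeddings, replace sample_output with the text produced by the embedding command, for example by capturing it with subprocess.run(..., capture_output=True, text=True). This assumes the JSON is written to standard output.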

Step 5: List your models

akio list         # Show downloaded repositories
akio list --all   # Show whitelisted repositories and individual GGUF files

Next steps

CLI Reference: All commands and flags documented.
Built-in Tools: What tools the agent can use out of the box.
MCP Servers: Extend Akio with external tool servers.
Roadmap: What’s coming next for Akio.
