Ollama Fork Strategy for Nexus
Date: 2025-12-27
Correction: This replaces the Llama fork research. User meant Ollama (the runtime), not Llama (the model).
What is Ollama?
Ollama is the local LLM runtime that powers LARS. It's what actually runs the AI models on your hardware. Think of it as:
- Ollama = the engine (runtime)
- Qwen/Llama/etc. = the fuel (models)
License: MIT (Fully Open Source!)
Unlike Llama's restrictive license, Ollama uses the MIT License, which allows:
- ✅ Commercial use
- ✅ Modification
- ✅ Distribution
- ✅ Private use
- ✅ No attribution required in the product name or UI (the only obligation is shipping the original copyright and MIT license notice with the code)
- ✅ No naming requirements
You can fork it, call it whatever you want, and sell it.
Why Fork Ollama?
1. Custom Features for Nexus
- Built-in Gateway integration
- Native MCP tool support
- Nexus-specific optimizations
- Custom model management
2. Branding
- "Nexus AI Runtime" instead of "Ollama"
- Integrated with Nexus ecosystem
- Client-ready deployment
3. Control
- No dependency on upstream changes
- Custom API extensions
- Tailored for your hardware (dual 3090s)
Ollama Architecture
Language: Go (Golang)
Key Components:
- CLI for model management
- REST API server (port 11434)
- Model storage/caching
- GGUF/Safetensors import
- Streaming response handling
API Endpoints (a sample call is sketched after the list):
GET /api/tags - List models
POST /api/generate - Generate text
POST /api/chat - Chat with history
POST /api/embeddings - Generate embeddings
POST /api/pull - Download model
POST /api/create - Create custom model
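Any Nexus component can already hit these endpoints over plain HTTP, no fork required. Below is a minimal Go sketch against /api/generate; the localhost URL and the qwen2.5 model name are placeholders, point them at the actual LARS instance and whatever model is loaded there.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// generateRequest / generateResponse mirror the documented
// non-streaming /api/generate JSON shapes.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

func main() {
	// Placeholder model; swap in whatever LARS actually runs.
	reqBody, _ := json.Marshal(generateRequest{
		Model:  "qwen2.5",
		Prompt: "Summarize what the Nexus Gateway does.",
		Stream: false, // single JSON object instead of a stream
	})

	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(reqBody))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.Response)
}
```

The same pattern works for /api/chat, just with a messages array instead of a single prompt.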
How to Fork
# Clone the repo into the fork's directory name
git clone https://github.com/ollama/ollama.git nexus-ai-runtime
cd nexus-ai-runtime
# Build from source (requires Go)
go build .
# Or use Docker
docker build -t nexus-ai-runtime .
What We Could Customize
API Extensions
- Add /api/tools endpoint for Gateway integration (sketched below)
- Add /api/voice endpoint for direct TTS
- Add /api/nexus endpoint for system integration
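How these endpoints get wired in depends on the server code at fork time, so treat the following as a standalone sketch only: a hypothetical /api/tools handler that advertises Gateway/MCP tools as JSON. The toolDescriptor shape, the tool names, and the side port 11435 are made up for illustration; in the real fork the handler would be registered on the existing API server alongside /api/generate.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// toolDescriptor is a hypothetical shape for advertising
// Gateway/MCP tools to clients of the fork.
type toolDescriptor struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	Endpoint    string `json:"endpoint"`
}

func toolsHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// Illustrative tool list; in practice this would come from the Gateway.
	tools := []toolDescriptor{
		{Name: "gateway.search", Description: "Nexus Gateway search", Endpoint: "/api/tools/gateway.search"},
		{Name: "voice.speak", Description: "Direct TTS", Endpoint: "/api/voice"},
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(tools)
}

func main() {
	// Standalone for illustration only; the fork would mount this
	// on the existing API server instead of a separate port.
	http.HandleFunc("/api/tools", toolsHandler)
	log.Fatal(http.ListenAndServe(":11435", nil))
}
```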
Model Management
- Auto-pull models on first use (see the sketch after this list)
- Custom model registry (not just ollama.com)
- Pre-configured models for clients
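Auto-pull doesn't strictly need fork-level changes; it can also live in a thin wrapper that uses the documented /api/tags and /api/pull endpoints. Here is a sketch of that wrapper approach in Go. The host, the model name, and the prefix check on tag names are assumptions; verify the pull request fields against the fork's API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

const baseURL = "http://localhost:11434" // assumed; point at the LARS host in practice

// tagsResponse matches the /api/tags list shape (only the name is needed here).
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// ensureModel pulls a model via /api/pull only if /api/tags
// does not already list it.
func ensureModel(model string) error {
	resp, err := http.Get(baseURL + "/api/tags")
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	var tags tagsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tags); err != nil {
		return err
	}
	for _, m := range tags.Models {
		if m.Name == model || strings.HasPrefix(m.Name, model+":") {
			return nil // already present locally
		}
	}

	// Not found locally: ask the runtime to download it.
	body, _ := json.Marshal(map[string]any{"name": model, "stream": false})
	pullResp, err := http.Post(baseURL+"/api/pull", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer pullResp.Body.Close()
	io.Copy(io.Discard, pullResp.Body) // drain the status output
	return nil
}

func main() {
	if err := ensureModel("qwen2.5"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("model ready")
}
```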
Performance
- Optimized for RTX 3090 architecture
- Custom CUDA kernels if needed
- Better multi-GPU support
Recommended Approach
Phase 1: Soft Fork
- Fork the repo
- Add custom API endpoints
- Keep syncing with upstream
- Deploy as "Ollama (Nexus Edition)"
Phase 2: Hard Fork (If Needed)
- Diverge significantly
- Rename to "Nexus AI Runtime"
- Maintain independently
- Add major custom features
Phase 3: Client Deployment
- Package for easy installation
- Include pre-trained LARS model
- One-click setup for clients
- Managed updates
Resources
- GitHub: https://github.com/ollama/ollama
- License: MIT (fully permissive)
- Language: Go
- Current LARS Setup: http://100.89.34.86:11434
Key Takeaway
Ollama's MIT license means we can:
1. Fork it completely
2. Call it whatever we want
3. Sell it to clients
4. Modify it however we need
5. Skip attribution in the product itself, as long as the original copyright and MIT license notice ship with the code
This is MUCH better than the Llama model license situation.