Ollama Fork Strategy for Nexus
Date: 2025-12-27
Correction: This replaces the Llama fork research. User meant Ollama (the runtime), not Llama (the model).
What is Ollama?
Ollama is the local LLM runtime that powers LARS. It's what actually runs the AI models on your hardware. Think of it as:
- Ollama = the engine (runtime)
- Qwen/Llama/etc. = the fuel (models)
License: MIT (Fully Open Source!)
Unlike Llama's restrictive license, Ollama uses the MIT License, which allows:
- ✅ Commercial use
- ✅ Modification
- ✅ Distribution
- ✅ Private use
- ✅ No attribution required in the product name or UI (the only obligation is shipping the original copyright and MIT license notice with the code)
- ✅ No naming requirements
You can fork it, call it whatever you want, and sell it.
Why Fork Ollama?
1. Custom Features for Nexus
- Built-in Gateway integration
- Native MCP tool support
- Nexus-specific optimizations
- Custom model management
2. Branding
- "Nexus AI Runtime" instead of "Ollama"
- Integrated with Nexus ecosystem
- Client-ready deployment
3. Control
- No dependency on upstream changes
- Custom API extensions
- Tailored for your hardware (dual 3090s)
Ollama Architecture
Language: Go (Golang)
Key Components:
- CLI for model management
- REST API server (port 11434)
- Model storage/caching
- GGUF/Safetensors import
- Streaming response handling
API Endpoints (a sample call is sketched after the list):
GET /api/tags - List models
POST /api/generate - Generate text
POST /api/chat - Chat with history
POST /api/embeddings - Generate embeddings
POST /api/pull - Download model
POST /api/create - Create custom model
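Any Nexus component can already hit these endpoints over plain HTTP, no fork required. Below is a minimal Go sketch against /api/generate; the localhost URL and the qwen2.5 model name are placeholders, point them at the actual LARS instance and whatever model is loaded there.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// generateRequest / generateResponse mirror the documented
// non-streaming /api/generate JSON shapes.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type generateResponse struct {
	Response string `json:"response"`
}

func main() {
	// Placeholder model; swap in whatever LARS actually runs.
	reqBody, _ := json.Marshal(generateRequest{
		Model:  "qwen2.5",
		Prompt: "Summarize what the Nexus Gateway does.",
		Stream: false, // single JSON object instead of a stream
	})

	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(reqBody))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var out generateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		log.Fatal(err)
	}
	fmt.Println(out.Response)
}
```

The same pattern works for /api/chat, just with a messages array instead of a single prompt.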
How to Fork
# Clone the repo into the fork's directory name
git clone https://github.com/ollama/ollama.git nexus-ai-runtime
cd nexus-ai-runtime
# Build from source (requires Go)
go build .
# Or use Docker
docker build -t nexus-ai-runtime .
What We Could Customize
API Extensions
- Add /api/tools endpoint for Gateway integration (sketched below)
- Add /api/voice endpoint for direct TTS
- Add /api/nexus endpoint for system integration
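How these endpoints get wired in depends on the server code at fork time, so treat the following as a standalone sketch only: a hypothetical /api/tools handler that advertises Gateway/MCP tools as JSON. The toolDescriptor shape, the tool names, and the side port 11435 are made up for illustration; in the real fork the handler would be registered on the existing API server alongside /api/generate.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// toolDescriptor is a hypothetical shape for advertising
// Gateway/MCP tools to clients of the fork.
type toolDescriptor struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	Endpoint    string `json:"endpoint"`
}

func toolsHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// Illustrative tool list; in practice this would come from the Gateway.
	tools := []toolDescriptor{
		{Name: "gateway.search", Description: "Nexus Gateway search", Endpoint: "/api/tools/gateway.search"},
		{Name: "voice.speak", Description: "Direct TTS", Endpoint: "/api/voice"},
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(tools)
}

func main() {
	// Standalone for illustration only; the fork would mount this
	// on the existing API server instead of a separate port.
	http.HandleFunc("/api/tools", toolsHandler)
	log.Fatal(http.ListenAndServe(":11435", nil))
}
```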
Model Management
- Auto-pull models on first use (see the sketch after this list)
- Custom model registry (not just ollama.com)
- Pre-configured models for clients
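Auto-pull doesn't strictly need fork-level changes; it can also live in a thin wrapper that uses the documented /api/tags and /api/pull endpoints. Here is a sketch of that wrapper approach in Go. The host, the model name, and the prefix check on tag names are assumptions; verify the pull request fields against the fork's API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

const baseURL = "http://localhost:11434" // assumed; point at the LARS host in practice

// tagsResponse matches the /api/tags list shape (only the name is needed here).
type tagsResponse struct {
	Models []struct {
		Name string `json:"name"`
	} `json:"models"`
}

// ensureModel pulls a model via /api/pull only if /api/tags
// does not already list it.
func ensureModel(model string) error {
	resp, err := http.Get(baseURL + "/api/tags")
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	var tags tagsResponse
	if err := json.NewDecoder(resp.Body).Decode(&tags); err != nil {
		return err
	}
	for _, m := range tags.Models {
		if m.Name == model || strings.HasPrefix(m.Name, model+":") {
			return nil // already present locally
		}
	}

	// Not found locally: ask the runtime to download it.
	body, _ := json.Marshal(map[string]any{"name": model, "stream": false})
	pullResp, err := http.Post(baseURL+"/api/pull", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer pullResp.Body.Close()
	io.Copy(io.Discard, pullResp.Body) // drain the status output
	return nil
}

func main() {
	if err := ensureModel("qwen2.5"); err != nil {
		log.Fatal(err)
	}
	fmt.Println("model ready")
}
```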
Performance
- Optimized for RTX 3090 architecture
- Custom CUDA kernels if needed
- Better multi-GPU support
Recommended Approach
Phase 1: Soft Fork
- Fork the repo
- Add custom API endpoints
- Keep syncing with upstream
- Deploy as "Ollama (Nexus Edition)"
Phase 2: Hard Fork (If Needed)
- Diverge significantly
- Rename to "Nexus AI Runtime"
- Maintain independently
- Add major custom features
Phase 3: Client Deployment
- Package for easy installation
- Include pre-trained LARS model
- One-click setup for clients
- Managed updates
Resources
- GitHub: https://github.com/ollama/ollama
- License: MIT (fully permissive)
- Language: Go
- Current LARS Setup: http://100.89.34.86:11434
Key Takeaway
Ollama's MIT license means we can:
1. Fork it completely
2. Call it whatever we want
3. Sell it to clients
4. Modify it however we need
5. Skip attribution in the product itself, as long as the original copyright and MIT license notice ship with the code
This is MUCH better than the Llama model license situation.