Experiment 004 - LARS 3D Task Reasoning v1
Date: 2025-12-29
Status: Completed - SUCCESS
Configuration
- Base Model: qwen2.5-7b-abliterated
- Dataset: lars_3d_combined.json (20 examples: 12 identity + 8 tasks)
- Format: 3D with [...] and [...] tags
- Epochs: 10
- Learning Rate: 3e-4
- Max Length: 1024 tokens
- Training Time: ~5 minutes
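For reference, the settings above can be captured as a small config object. This is only a sketch: the class and field names are hypothetical and do not come from the training script, and only the values are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exp004Config:
    # Values copied from the Configuration list; field names are hypothetical.
    base_model: str = "qwen2.5-7b-abliterated"
    dataset: str = "lars_3d_combined.json"
    identity_examples: int = 12
    task_examples: int = 8
    epochs: int = 10
    learning_rate: float = 3e-4
    max_length: int = 1024  # tokens

    @property
    def total_examples(self) -> int:
        # 12 identity + 8 task examples
        return self.identity_examples + self.task_examples

    @property
    def example_passes(self) -> int:
        # How many times an example is shown in total: examples x epochs
        return self.total_examples * self.epochs


config = Exp004Config()
print(config.total_examples)  # 20
print(config.example_passes)  # 200
```

At 20 examples and 10 epochs the model sees only 200 example passes in total, which is consistent with the ~5 minute training time.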
Results
- Loss: 2.96 → 0.026 (excellent convergence)
- Task Reasoning Test: 4/4 novel tasks handled with 3D format
- Multi-step Planning: WORKING
- Storage Decision: WORKING
- Ambiguity Handling: WORKING
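To put the loss figures in proportion, the drop can be expressed as a factor and as a percentage. A minimal sketch using only the two reported values (variable names are illustrative):

```python
initial_loss = 2.96
final_loss = 0.026

# How many times smaller the final loss is than the initial loss
reduction_factor = initial_loss / final_loss

# Fraction of the initial loss that was eliminated
relative_drop = (initial_loss - final_loss) / initial_loss

print(f"{reduction_factor:.1f}x lower")   # 113.8x lower
print(f"{relative_drop:.1%} reduction")   # 99.1% reduction
```

With a dataset this small, a low training loss on its own may partly reflect memorization, so the novel task test is likely the more meaningful signal.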
Novel Task Test Results
- 'Find that thing saved last week' → Reasoned about Context/KB search
- 'Check website redesign project' → Planned to search Track, create if missing
- 'Document this conversation' → Listed storage options, asked for clarification
- 'Which experiment worked best' → Planned to compare KPIs, identify metrics
Key Achievement
LARS now reasons through multi-step tasks, not just identity questions. The thinking process includes:
- Breaking down vague requests
- Identifying storage locations
- Planning search operations
- Handling ambiguity
Issues Found
- Some hallucination (made-up numbers such as '18 experiments')
- Base model knowledge mixing with trained knowledge
- But: REASONING PATTERN is correct
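One cheap way to surface made-up numbers like the one above is to flag any number in a response that never appears in the known facts. This is a rough heuristic sketch, not part of the experiment: the function name, the `known_facts` string, and the sample output are all hypothetical.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(output: str, known_facts: str) -> list[str]:
    """Return numbers in `output` that never appear in `known_facts`."""
    known = set(NUMBER.findall(known_facts))
    return [n for n in NUMBER.findall(output) if n not in known]


# Hypothetical reference text and model output, for illustration only.
known_facts = "EXP-003 used 12 examples. EXP-004 used 20 examples."
output = "I have run 18 experiments; EXP-004 used 20 examples."

print(unsupported_numbers(output, known_facts))  # ['18']
```

This compares numbers as plain strings, so it cannot tell a correct number used in the wrong context from a supported one. It is only a first-pass filter for review.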
Comparison to EXP-003
- EXP-003: 12 examples, identity only → Identity works
- EXP-004: 20 examples, identity + tasks → Tasks work too
- Combined training preserves identity while adding capabilities
Output
- Path: ~/corlera-training/outputs/lars-3d-v2-tasks