Data Flow Architecture

Web Intelligence: AI's Eyes on the Web

Web Intelligence is a standalone environment that serves as the AI's primary interface for web data collection. It is NOT limited to contact enrichment - it enables ANY task requiring web data.

Core Pattern

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Web Intelligence│───▢│    Staging      │───▢│     Target      β”‚
β”‚  (6670/6671)    β”‚    β”‚  (6680/6681)    β”‚    β”‚   Environment   β”‚
β”‚  crawl/scrape   β”‚    β”‚ temporary hold  β”‚    β”‚final destinationβ”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
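
In code, the pattern reduces to collect, stage, commit. The sketch below is illustrative only: the StagingStore class and the flow() wiring are hypothetical stand-ins, not the real environment APIs behind the ports shown above.

  import uuid

  class StagingStore:
      """Hypothetical stand-in for the Staging environment (6680/6681)."""
      def __init__(self):
          self._items = {}

      def store(self, payload) -> str:
          staging_id = str(uuid.uuid4())   # handle for later reprocessing
          self._items[staging_id] = payload
          return staging_id

      def fetch(self, staging_id: str):
          return self._items[staging_id]

  def flow(collect, staging, commit):
      """Collect from the web, hold in Staging, then write to a target."""
      raw = collect()                      # Web Intelligence: crawl/scrape
      staging_id = staging.store(raw)      # Staging: temporary hold
      commit(staging.fetch(staging_id))    # Target environment: final write
      return staging_id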

Why Staging?

The Staging Environment is a multi-tiered intermediary that:

- Holds raw web data temporarily during processing
- Allows transformation and enrichment before routing
- Decouples web collection from final storage
- Enables retry/reprocessing without re-crawling (see the sketch after this list)
- Supports audit trails and processing logs
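
A hedged sketch of that retry point, reusing the hypothetical StagingStore above: because the raw payload persists in Staging, a failed transform is retried from the staged copy rather than by re-crawling.

  def process_with_retry(staging, staging_id, transform, attempts=3):
      """Retry a transform against staged data; no re-crawl on failure."""
      last_error = None
      for _ in range(attempts):
          try:
              return transform(staging.fetch(staging_id))
          except Exception as err:   # broad catch, for illustration only
              last_error = err
      raise last_error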

Downstream Environments

Web Intelligence data can flow to ANY Nexus environment:

Environment   Port        Use Case
KB            6625/6626   Documentation, knowledge articles
Documents     TBD         Reports, scraped content
Contacts      6630/6631   Contact enrichment
Corpus        TBD         Training data, transcripts
Links         6635/6636   URL bookmarks, discovered links
Track         6640/6641   Project research data
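
Expressed as a hypothetical routing table (ports copied from above; TBD entries stay unassigned):

  DOWNSTREAM_PORTS = {
      "KB":        (6625, 6626),
      "Documents": None,           # TBD in the source table
      "Contacts":  (6630, 6631),
      "Corpus":    None,           # TBD in the source table
      "Links":     (6635, 6636),
      "Track":     (6640, 6641),
  }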

Example Data Flows

1. Documentation Scraping

web.scrape(docs_url)
  ↓
staging.store(raw_html)
  ↓
web.to_markdown()
  ↓
kb.create(title, markdown_content)

Use case: Scraping API documentation for LARS reference.
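
A minimal sketch of this flow, assuming hypothetical web, staging, and kb client objects with the method names shown above (how to_markdown() receives the staged HTML is an assumption):

  def scrape_docs_to_kb(web, staging, kb, docs_url: str, title: str):
      raw_html = web.scrape(docs_url)            # collect
      staging_id = staging.store(raw_html)       # temporary hold
      markdown = web.to_markdown(staging.fetch(staging_id))
      return kb.create(title, markdown)          # final destination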

2. Competitive Research

web.research(competitor, depth='deep')
  ↓
staging.store(research_data)
  ↓
Process and structure
  ↓
documents.create(research_report)

Use case: Building competitive analysis reports.
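
One possible shape for the "process and structure" step, assuming findings arrive from Staging as dicts with a topic field (an assumed schema, not the real one):

  def structure_report(research_data: list) -> dict:
      """Bucket raw research findings into report sections by topic."""
      report = {}
      for finding in research_data:
          section = finding.get("topic", "other")
          report.setdefault(section, []).append(finding)
      return report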

3. Contact Enrichment

web.research(person_company, depth='standard')
  ↓
staging.store(enrichment_data)
  ↓
Extract entities and verify
  ↓
contact.update(contact_id, enriched_fields)

Use case: Adding business intelligence to CRM contacts.
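
A sketch of the "extract entities and verify" step, assuming each staged field carries a value/confidence pair (an assumed payload shape, not the real Staging schema): only fields that clear the threshold reach contact.update().

  def verified_fields(enrichment_data: dict, threshold: float = 0.8) -> dict:
      return {
          field: item["value"]
          for field, item in enrichment_data.items()
          if item.get("confidence", 0.0) >= threshold
      }

  sample = {"title": {"value": "CTO", "confidence": 0.95},
            "phone": {"value": "555-0100", "confidence": 0.4}}
  print(verified_fields(sample))   # {'title': 'CTO'}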

4. YouTube Transcripts

web.fetch_transcript(video_url)
  ↓
staging.store(raw_transcript)
  ↓
Process and clean
  ↓
corpus.add(transcript_document)

Use case: Building training data from educational videos.
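
A possible "process and clean" step, assuming [mm:ss] timestamp markers in the raw transcript (the actual format depends on the source):

  import re

  def clean_transcript(raw: str) -> str:
      """Strip timestamp markers and collapse whitespace."""
      no_stamps = re.sub(r"\[\d{1,2}:\d{2}(?::\d{2})?\]", " ", raw)
      return re.sub(r"\s+", " ", no_stamps).strip()

  print(clean_transcript("[00:01] Hello [00:04] world"))   # -> Hello world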

5. Link Discovery

web.crawl(site_url, depth=2)
  ↓
web.discover_links()
  ↓
web.filter_links(criteria)
  ↓
links.save_batch(discovered_urls)

Use case: Building curated bookmark collections.
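
The criteria accepted by web.filter_links() are not specified here, so this sketch shows one plausible filter as a plain function: keep only HTTPS links on an allowed domain.

  from urllib.parse import urlparse

  def keep_link(url: str, allowed_domain: str) -> bool:
      parts = urlparse(url)
      return parts.scheme == "https" and parts.netloc.endswith(allowed_domain)

  urls = ["https://docs.example.com/a", "http://ads.example.net/b"]
  print([u for u in urls if keep_link(u, "example.com")])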

LARS Integration

LARS (Local AI Runtime System) uses Web Intelligence as its primary tool for gathering external information:

  1. Research tasks β†’ web.research()
  2. Documentation β†’ web.scrape() + web.to_markdown()
  3. Monitoring β†’ web.crawl() for changes
  4. Enrichment β†’ web.research() with entity extraction

All results flow through Staging, allowing LARS to process data incrementally and route to appropriate destinations.
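
A hypothetical dispatch table mirroring the four task types above; the wiring (including composing to_markdown() over scrape() output) is an assumption:

  def dispatch(task_type: str, web, **kwargs):
      """Route a LARS task type to the matching web.* call."""
      if task_type == "research":
          return web.research(kwargs["query"], depth=kwargs.get("depth", "standard"))
      if task_type == "documentation":
          return web.to_markdown(web.scrape(kwargs["url"]))
      if task_type == "monitoring":
          return web.crawl(kwargs["url"], depth=kwargs.get("depth", 1))
      if task_type == "enrichment":
          return web.research(kwargs["query"], depth="standard")
      raise ValueError(f"unknown task type: {task_type}")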

Key Principles

  1. Separation of Concerns: Web Intelligence collects; Staging holds; Target environments store
  2. Flexibility: Same crawl data can route to multiple destinations
  3. Idempotency: Staging enables safe reprocessing (sketched after this list)
  4. Audit Trail: Processing steps tracked in Staging
  5. Universal Interface: One web tool serves all AI tasks
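
Principle 3 can be made concrete with content-addressed staging, sketched here under the same hypothetical in-memory store as above: storing the same crawl result twice is a no-op, so reprocessing is always safe to repeat.

  import hashlib, json

  class IdempotentStaging:
      def __init__(self):
          self._items = {}

      def store(self, payload: dict) -> str:
          """Key by content hash: re-storing identical data is a no-op."""
          key = hashlib.sha256(
              json.dumps(payload, sort_keys=True).encode()
          ).hexdigest()
          self._items.setdefault(key, payload)
          return key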