# Architecture
SciAgent follows a Think → Act → Observe cycle. This page explains the internal components.
## Components

```
┌──────────────────────────────────────────────┐
│                  AgentLoop                   │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐  │
│  │ Context  │   │   LLM    │   │   Tool   │  │
│  │  Window  │   │  Client  │   │ Registry │  │
│  └──────────┘   └──────────┘   └──────────┘  │
│                      │                       │
│              ┌───────┴───────┐               │
│              │    Skills     │               │
│              └───────────────┘               │
│                      │                       │
│         ┌────────────┴────────────┐          │
│         │ Sub-Agent Orchestrator  │          │
│         └─────────────────────────┘          │
└──────────────────────────────────────────────┘
```
## Agent Loop

The core loop in `sciagent.agent.AgentLoop`:

- Context building - Compile messages: system prompt, task, history, tool results
- LLM invocation - Pass to `LLMClient.chat()`, receive text and/or tool calls
- Tool execution - Execute tools, append results to context
- Observation - Check for completion or errors
- Iteration control - Track iterations/tokens, summarize if needed

Sessions auto-save to `.agent_states` for resumption.
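The steps above can be sketched as a plain loop. This is illustrative only: `llm` and `tools` are hypothetical stand-ins for `LLMClient` and `ToolRegistry`, whose real interfaces are richer.

```python
def run_loop(llm, tools, task, max_iterations=10):
    """Minimal Think -> Act -> Observe cycle (sketch, not the real AgentLoop)."""
    messages = [
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_iterations):
        reply = llm(messages)                       # Think: invoke the model
        if not reply.get("tool_calls"):             # No tool requested: done
            return reply["content"]
        for call in reply["tool_calls"]:            # Act: execute each tool
            result = tools(call["name"], **call["args"])
            messages.append({"role": "assistant",   # Observe: feed result back
                             "tool_result": result})
    return None  # Iteration budget exhausted
```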
## Context Window

`ContextWindow` manages conversation history with three roles: system, user, and assistant. Tool results are inserted as assistant messages with `tool_result` fields.

When approaching token limits, older messages are summarized while preserving tool-use integrity:

```python
def _find_safe_cut_point(self, start, forward=True):
    """Find cut points that don't orphan tool_use/tool_result pairs."""
```
## LLM Client

`sciagent.llm.LLMClient` wraps litellm for multi-provider support:

- `chat(messages, tools)` - Send messages with tool schemas
- `chat_stream()` - Streaming variant
- `configure_cache(backend)` - Enable caching (local, redis, disabled)
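The caching behaviour can be illustrated with a small stand-in. This is not the real `LLMClient`; the `transport` callable is a hypothetical hook replacing the litellm call.

```python
class LLMClientSketch:
    """Illustrative stand-in for sciagent.llm.LLMClient (not the real API)."""

    def __init__(self, transport, cache_backend="disabled"):
        self.transport = transport          # Callable: (messages, tools) -> dict
        self.cache_backend = cache_backend  # "local", "redis", or "disabled"
        self._cache = {}

    def configure_cache(self, backend):
        self.cache_backend = backend

    def chat(self, messages, tools=None):
        key = repr((messages, tools))
        if self.cache_backend != "disabled" and key in self._cache:
            return self._cache[key]         # Cache hit: skip the provider call
        response = self.transport(messages, tools)
        if self.cache_backend != "disabled":
            self._cache[key] = response
        return response
```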
## Tool System

Tools extend `BaseTool` with `name`, `description`, `parameters` (a JSON schema), and `execute()`.
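A minimal example of the tool shape. The tool itself is hypothetical, and inheritance from `BaseTool` is omitted to keep the sketch self-contained:

```python
class WordCountTool:
    """Hypothetical tool; the real base class lives in sciagent.tools."""

    name = "word_count"
    description = "Count words in a text string."
    parameters = {                      # JSON schema for the tool's arguments
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }

    def execute(self, text):
        return {"words": len(text.split())}
```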
### Atomic Tools

Full-featured tools in `sciagent.tools.atomic`:

- `bash` - Shell execution with timeouts
- `file_ops` - Read/write/replace/list
- `search` - Glob and grep
- `web` - Search and fetch
- `todo` - Task graph management
- `skill` - Load workflow instructions
- `ask_user` - User interaction
### Tool Registry

`ToolRegistry` handles registration, lookup, and execution:

```python
registry = create_default_registry(working_dir="./project")
registry.register(my_tool)
registry.execute("bash", command="ls")
```
## Skills

Skills are loadable workflows in `src/sciagent/skills/*/SKILL.md`:

```markdown
---
name: sci-compute
triggers:
  - "simulat(e|ion)"
  - "run.*(meep|gromacs)"
---
# Workflow instructions...
```
When user input matches a trigger, the skill's instructions are injected into the context.
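Trigger matching can be sketched as a regex scan over the user input. This is an assumption about the mechanism; the real matcher may normalize or anchor patterns differently.

```python
import re

def matches_skill(user_input, triggers):
    """Return True if any trigger regex matches the input (sketch)."""
    return any(re.search(t, user_input, re.IGNORECASE) for t in triggers)
```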
Built-in skills:

- `sci-compute` - Scientific simulations with a research-first workflow
- `build-service` - Docker service building
- `code-review` - Comprehensive code review
## Sub-agents

Sub-agents are isolated agents with their own context and tool set. Each uses a cost-optimised model tier defined in `src/sciagent/defaults.py`:

- Scientific (`SCIENTIFIC_MODEL`): Best quality for scientific code and deep reasoning
- Coding (`CODING_MODEL`): Good for implementation, debugging, research
- Fast (`FAST_MODEL`): Quick/cheap for exploration and extraction
Each is defined by a `SubAgentConfig`:

```python
SubAgentConfig(
    name="explore",
    description="Fast codebase exploration",
    system_prompt="...",
    model=FAST_MODEL,  # Uses tiered model
    max_iterations=15,
    allowed_tools=["file_ops", "search", "bash"],
)
```
Built-in sub-agents:

| Name | Model Tier | Purpose | Tools |
|------|------------|---------|-------|
| explore | Fast | Quick codebase searches | file_ops, search, bash |
| debug | Coding | Error investigation | file_ops, search, bash, web, skill |
| research | Coding | Web/doc research | web, file_ops, search |
| plan | Scientific | Break down problems | file_ops, search, bash, web, skill, todo |
| general | Coding | Multi-step tasks | all |
| verifier | Verification | Independent validation | file_ops, search, bash |
### Orchestration

`SubAgentOrchestrator` manages spawning and parallel execution:

```python
orch = SubAgentOrchestrator(tools=registry, working_dir=".")

# Single sub-agent
result = orch.spawn("explore", "Find API endpoints")

# Parallel sub-agents
results = orch.spawn_parallel([
    {"agent_name": "research", "task": "Find documentation for S4 library"},
    {"agent_name": "debug", "task": "Investigate build error in logs"},
])
```
## Verification System

SciAgent implements a three-tier verification architecture to prevent fabricated data and ensure scientific integrity.

### Verification Gates
```
┌────────────────────────────────────────────────────────┐
│                     Task Execution                     │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                   GATE 1: DATA GATE                    │
│  • Verify HTTP fetches succeeded (status 200)          │
│  • Detect HTML/error pages in data files               │
│  • Validate CSV structure and row counts               │
│  • Prevents analysis on fabricated/invalid data        │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                   GATE 2: EXEC GATE                    │
│  • Verify commands actually ran                        │
│  • Check exit codes (success = 0)                      │
│  • Ensure verification tasks completed                 │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│                GATE 3: LLM VERIFICATION                │
│  • Independent verifier subagent (fresh context)       │
│  • Skeptical auditor with no conversation history      │
│  • Returns verdict: verified | refuted | insufficient  │
│  • Detects fabrication indicators                      │
└────────────────────────────────────────────────────────┘
```
### Verifier Subagent

The LLM verification gate spawns an independent verifier subagent:

```python
SubAgentConfig(
    name="verifier",
    description="Independent verification of claims",
    model=VERIFICATION_MODEL,  # Sonnet by default
    temperature=0.0,           # Deterministic
    allowed_tools=["file_ops", "search", "bash"],
)
```
Key properties:

- Fresh context: no conversation history (prevents bias)
- Adversarial: defaults to an `insufficient` verdict
- Read-only: can read files and run verification commands, but cannot modify anything
### Configuration

Configure gates in `OrchestratorConfig`:

```python
OrchestratorConfig(
    enable_data_gate=True,       # Verify data provenance
    data_gate_strict=True,       # Block on failure (vs. warn)
    enable_exec_gate=True,       # Verify execution
    exec_gate_strict=True,
    enable_verification=True,    # LLM verification
    verification_strict=True,
    verification_threshold=0.7,  # Confidence threshold
)
```
### Content Validation

`ContentValidator` in `tools/atomic/todo.py` detects fabrication patterns:

- HTML in data files (a downloaded error page instead of data)
- Placeholder values (suspiciously round numbers)
- Error messages in output (404, access denied, stack traces)
- Invalid CSV structure
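An illustrative subset of such checks, as a hypothetical function rather than `ContentValidator`'s actual API:

```python
def looks_fabricated(file_text):
    """Flag common fabrication indicators in a data file (illustrative only)."""
    indicators = []
    # An HTML document where data was expected usually means a saved error page.
    if "<html" in file_text.lower() or "<!doctype" in file_text.lower():
        indicators.append("html_in_data")
    # Error text embedded in output suggests a failed fetch or crashed run.
    for marker in ("404", "access denied", "Traceback"):
        if marker in file_text:
            indicators.append(f"error_text:{marker}")
    return indicators
```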
### TodoItem Verification

Tasks can request verification via the `verify` flag:

```python
TodoItem(
    content="Analyze protein fitness data",
    produces="file:results.csv:csv:100",  # Expect CSV with 100 rows
    verify=True,                          # Run LLM verification on completion
)
```
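The `produces` string appears to follow a `kind:path:format:rows` layout; that layout is inferred from the example above, not from the `TodoItem` source. A sketch parser:

```python
def parse_produces(spec):
    """Split a produces spec like "file:results.csv:csv:100" into fields.

    Field names here are assumptions inferred from the documented example.
    """
    kind, path, fmt, count = spec.split(":")
    return {"kind": kind, "path": path, "format": fmt, "rows": int(count)}
```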
## Service Registry

Scientific services are declared in `src/sciagent/services/registry.yaml`:

```yaml
- name: rcwa
  image: ghcr.io/sciagent-ai/rcwa
  capabilities: ["RCWA simulation", "photonic crystals"]
  timeout: 300
```
Resolution order: local image → pull from GHCR → build from Dockerfile
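The resolution order can be sketched with injected predicates standing in for the Docker checks (a hypothetical helper; the real resolver shells out to the docker CLI):

```python
def resolve_image(image, have_local, can_pull, can_build):
    """Return how a service image is resolved, in the documented order:
    local image -> pull from GHCR -> build from Dockerfile (sketch)."""
    if have_local(image):       # e.g. `docker image inspect` succeeds
        return "local"
    if can_pull(image):         # e.g. `docker pull` succeeds
        return "pulled"
    if can_build(image):        # e.g. `docker build` from the Dockerfile
        return "built"
    raise RuntimeError(f"cannot resolve image {image}")
```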
Services run in Docker with the workspace mounted:

```shell
docker run --rm -v "$(pwd)":/workspace -w /workspace \
  ghcr.io/sciagent-ai/rcwa python3 script.py
```