How SciAgent Fits the AI Agent Landscape
The AI agent landscape in 2026 spans three categories: general-purpose coding agents, multi-agent orchestration frameworks, and domain-specific scientific systems. SciAgent bridges these — combining software engineering capabilities with containerized scientific computing, cloud compute orchestration, durable provenance, and a fresh-context independent verifier.
This page positions SciAgent against the field through three lenses:
- Levels of Scientific Automation — where SciAgent sits in the Self-Driving Laboratory autonomy hierarchy.
- Capability Axes — the capability questions that differentiate systems in this space.
- Feature Comparison by Category — concrete tool-by-tool tables for the three categories.
It closes with a use-case decision table.
Levels of Scientific Automation
The chemistry and materials community has formalised a Self-Driving Laboratory (SDL) autonomy framework analogous to SAE’s autonomy levels for self-driving cars [14][15]. Systems are scored on two independent axes, software autonomy (planning, decisions, analysis) and hardware autonomy (the physical execution substrate), each from category 0 (manual) to category 3 (fully unattended, diverse experiments). Levels 2-5 are derived from combinations of the two axes. Most demonstrated systems sit at Levels 2-3 today, and a true Level 5 (category 3 on both axes) remains unattained in the field.
Where SciAgent fits:
| Axis | Cat 0 | Cat 1 | Cat 2 | Cat 3 | SciAgent |
|---|---|---|---|---|---|
| Software — planning, dispatch, execution, analysis, verification | Human ideation | One-shot AI suggestion | AI plans + iterates | AI plans, executes, analyzes, verifies independently | Cat 2-3 |
| Compute substrate — provisioning, cluster lifecycle, workspace, fault tolerance | Manual | Single-task script | Workflow config | Diverse jobs, unattended | Cat 2-3 |
SciAgent automates the design, computation, and optimization half of scientific work: simulations, numerical experiments, data analysis, model fitting, design-space exploration. It sits in the AI-for-scientific-computing space alongside emerging tools such as Dyad (JuliaHub), which connects simulation and CAE; SciAgent approaches that ecosystem from the AI side, automating and connecting it. On its substrate (compute), it covers the same end-to-end loop the SDL framework describes: plan → dispatch → run → observe → derive → verify.
On the software-autonomy axis, SciAgent adds a closed audit loop: durable provenance plus an independent fresh-context verifier (see § Closed audit loop below). The verifier reads the log and the on-disk artifacts only, so a different LLM in a different process can re-audit a session it didn’t run [16].
Capability Axes
Eight capability axes that differentiate systems in this space:
| Axis | Question |
|---|---|
| Cloud compute | Can the agent provision and tear down cloud clusters as a normal tool, or does the user wire that up out-of-band? |
| Workspace persistence | Do outputs survive cluster teardown via cloud-backed storage, or are they lost when the cluster goes away? |
| Containerized scientific services | Are domain-specific environments (CFD, MD, photonics, EDA) bundled and registry-resolvable, or does the user wire up their own Docker images? |
| Durable provenance | Is there an append-only audit trail of every tool call, job, artifact, and verification — or do you just have agent transcripts? |
| Independent verification | Does an external/fresh-context verifier validate claims against the artifacts and the log, or is “verification” self-attestation by the same model? |
| Cross-LLM auditability | Can a different model in a different process re-audit a completed session? |
| Background work + checkpointing | Can long-running jobs survive crashes (per-iteration checkpoints, 3-way resume), or is failure terminal? |
| Software engineering | Beyond running simulations, can the agent navigate code, debug, do git ops, refactor? |
The matrix below scores broad categories. Individual tools within a category vary; a “✓” marks the typical case for the category. The SciAgent column reflects v2.0 as released.
| Axis | Coding agents | Multi-agent frameworks | Scientific agents | SciAgent v2.0 |
|---|---|---|---|---|
| Cloud compute | ✗ | Build-it-yourself | ✗ | ✓ via SkyPilot |
| Workspace persistence | ✗ | Build-it-yourself | ✗ | ✓ per-session bucket |
| Containerized scientific services | ✗ | Build-it-yourself | Domain-specific | ✓ 25+ services |
| Durable provenance | ✗ | Build-it-yourself | ✗ | ✓ JSONL v1 |
| Independent verification | ✗ | Build-it-yourself | Varies | ✓ fresh context |
| Cross-LLM auditability | ✗ | ✗ | ✗ | ✓ |
| Background work + checkpointing | Partial | Build-it-yourself | ✗ | ✓ |
| Software engineering | ✓ | ✓ | ✗ | ✓ |
“Build-it-yourself” in the multi-agent column means the framework exposes primitives that can implement the axis; the user supplies the integration. SciAgent ships these axes pre-built.
Feature Comparison by Category
Coding Agents
Tools focused on software engineering tasks: code generation, debugging, refactoring, and repository management.
| Feature | Coding Agents | SciAgent |
|---|---|---|
| Code generation & editing | ✓ All tools | ✓ |
| Repository navigation | ✓ All tools | ✓ |
| Git operations | ✓ All tools | ✓ |
| Autonomous execution | Varies (high in OpenHands, Devin; lower in Cursor) | ✓ |
| Scientific computing | ✗ None | ✓ 25+ containers |
| Cloud compute orchestration | ✗ None | ✓ via SkyPilot (managed jobs + cluster mode) |
| Durable provenance log | ✗ None | ✓ JSONL v1, append-only |
| Fresh-context independent verifier | ✗ None | ✓ reads log + artifacts |
Representative tools: Claude Code [1], Cursor [2], Aider [3], OpenHands [4], SWE-Agent [5], Devin [6]
Coding agents handle code generation, navigation, and git operations. SciAgent does the same and adds containerized scientific environments, cloud compute orchestration, and a durable provenance log.
Multi-Agent Frameworks
Frameworks for building and orchestrating multiple AI agents working together.
| Feature | Multi-Agent Frameworks | SciAgent |
|---|---|---|
| Agent orchestration | ✓ Core capability | ✓ Compute, analyze, verifier, plan, debug, research, explore subagents |
| Custom agent design | ✓ Flexible | Focused, opinionated |
| Provider-agnostic | ✓ Most tools | ✓ Via LiteLLM |
| Scientific computing | ✗ Requires custom setup | ✓ Built-in |
| Pre-built scientific services | ✗ None | ✓ 25+ services |
| Cloud compute orchestration | ✗ Requires custom setup | ✓ via SkyPilot |
| Durable provenance log | ✗ Requires custom setup | ✓ JSONL v1 |
| Background subagents + checkpoint/resume | Varies | ✓ task_index registry, 3-way resume |
Representative tools: AG2 [7], Microsoft AutoGen/Semantic Kernel [8], LangChain/LangGraph [9]
Multi-agent frameworks expose orchestration primitives the user assembles into a runtime. SciAgent ships such a runtime preconfigured for scientific computing, with the cloud, registry, and provenance layers already wired up.
Scientific AI Agents
Domain-specific agents designed for scientific research and discovery.
| Feature | Scientific Agents | SciAgent |
|---|---|---|
| Domain expertise | Typically single domain (chemistry, materials) | Cross-domain (11 areas) |
| Tool count | 5-18 tools | 25+ containerized services |
| Cross-domain pipelines | ✗ Limited | ✓ Full support |
| Software engineering | ✗ Minimal | ✓ Full SWE agent |
| Cloud compute orchestration | ✗ Mostly local or institutional HPC scripts | ✓ via SkyPilot, multi-cloud |
| Durable provenance log | ✗ Mostly transcripts only | ✓ JSONL v1, cross-LLM verifiable |
| Independent fresh-context verifier | Varies; often shares context with executor | ✓ Reads log + artifacts only |
| Workflow scope | Wet-lab synthesis + analysis (Coscientist) | Design, computation, optimization |
Representative tools: ChemCrow [10], Coscientist [11], FORUM-AI [12], Google AI Co-Scientist [13]
Domain-specific scientific agents typically focus on a single domain (chemistry, materials, biology) and ship deep tooling for it. SciAgent covers multiple computational domains, orchestrates cloud compute, and persists a durable record. Its workflow scope is design, computation, and optimization rather than wet-lab synthesis.
Key Differentiators
1. Closed audit loop: durable provenance + fresh-context verifier
LLM-driven scientific work can produce plausible-looking but fabricated results. SciAgent’s closed loop guards against this: every relevant event is appended to a durable per-session JSONL log, and an independent verifier with fresh context reads the log and artifacts to validate the claims.
```text
Task Execution
      │
      ▼
DATA GATE   → Verify HTTP fetches, detect HTML/error pages, validate CSV structure
      │
      ▼
EXEC GATE   → Verify commands ran, check exit codes
      │
      ▼
LLM VERIFY  → Independent verifier subagent
              · fresh context (no prior reasoning)
              · reads provenance log (JSONL v1)
              · cross-LLM friendly (audit a session you didn't run)
              · adversarial default verdict: "insufficient"
      │
      ▼
[ Provenance log: append-only JSONL ]
  tool_call · tool_result · compute_job_launched · compute_job_status_changed
  artifact_produced · verification_result · correction
```
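As a minimal sketch of the gate ordering in the diagram, the Python below puts a data gate and an exec gate in front of an adversarial final verdict. Every name here (`data_gate`, `exec_gate`, `final_verdict`, the `"supported"` verdict string) is hypothetical, and the LLM verifier is reduced to a boolean because the real verifier is a separate fresh-context subagent.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch of the gate ordering; not SciAgent's actual implementation.

@dataclass
class GateResult:
    ok: bool
    reason: str

# Markers suggesting that a "CSV download" actually returned an HTML page.
HTML_MARKERS = ("<!doctype html", "<html", "<head", "<body")

def data_gate(body: str, min_columns: int = 2) -> GateResult:
    """Reject fetched CSV text that is really an HTML/error page or is ragged."""
    head = body.lstrip()[:512].lower()
    if any(marker in head for marker in HTML_MARKERS):
        return GateResult(False, "response looks like an HTML page, not CSV")
    rows = [row for row in csv.reader(io.StringIO(body)) if row]  # skip blank lines
    if len(rows) < 2:
        return GateResult(False, "CSV has fewer than two rows (header + data)")
    width = len(rows[0])
    if width < min_columns:
        return GateResult(False, f"header has {width} column(s), expected >= {min_columns}")
    for i, row in enumerate(rows[1:], start=1):
        if len(row) != width:
            return GateResult(False, f"data row {i} has {len(row)} fields, header has {width}")
    return GateResult(True, "ok")

def exec_gate(exit_code: Optional[int]) -> GateResult:
    """A command passes only if it demonstrably ran and exited with status 0."""
    if exit_code is None:
        return GateResult(False, "no exit code recorded; the command may never have run")
    if exit_code != 0:
        return GateResult(False, f"command exited with status {exit_code}")
    return GateResult(True, "ok")

def final_verdict(gates: List[GateResult], verifier_affirms: bool = False) -> str:
    """Adversarial default: 'insufficient' unless every gate passed AND the
    fresh-context verifier positively affirmed the claim."""
    if gates and all(g.ok for g in gates) and verifier_affirms:
        return "supported"
    return "insufficient"
```

The design choice worth noting is the default: a missing signal (no gates, no verifier affirmation) yields `"insufficient"`, never a pass.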
Two properties make the loop work:
- Durable. The log is an append-only event stream you can replay. A different model in a different process can read it and reach the same verdict. Each line is capped at 16 KB and each field at 4 KB; writes are made thread-safe with `fcntl.flock`.
- Cross-LLM. The verifier reads only the log and artifacts, so the executor can run on Claude Sonnet and the verifier on GPT-4 (or vice versa). Verification doesn’t share priors with execution.
See Provenance Log Schema for the v1 schema; Cloud Compute and Task Orchestration for what gets logged.
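The durability properties described above can be illustrated with a small append-only JSONL writer that applies the 16 KB per-line and 4 KB per-field caps and serializes writers with `fcntl.flock` (Unix only). This is a sketch under assumptions: the record fields `v`, `ts`, and `event`, the truncation marker, and the stub-on-overflow behavior are invented for illustration and are not the v1 schema.

```python
import fcntl
import json
import os
import tempfile
import time

# Caps from the text; the field names "v", "ts", "event" are hypothetical,
# NOT the real v1 schema.
MAX_LINE_BYTES = 16 * 1024
MAX_FIELD_BYTES = 4 * 1024
TRUNC_MARK = "...[truncated]"

def _cap_field(value):
    """Truncate top-level string fields to MAX_FIELD_BYTES of UTF-8."""
    if isinstance(value, str):
        raw = value.encode("utf-8")
        if len(raw) > MAX_FIELD_BYTES:
            keep = raw[: MAX_FIELD_BYTES - len(TRUNC_MARK.encode("utf-8"))]
            # errors="ignore" drops a multi-byte character split by the cut
            return keep.decode("utf-8", errors="ignore") + TRUNC_MARK
    return value

def append_event(path: str, event: str, **fields) -> dict:
    """Append one JSON object per line; earlier lines are never rewritten."""
    record = {"v": 1, "ts": time.time(), "event": event}
    record.update({k: _cap_field(v) for k, v in fields.items()})
    line = json.dumps(record, ensure_ascii=False)
    if len(line.encode("utf-8")) + 1 > MAX_LINE_BYTES:  # +1 for the newline
        # Over the per-line cap: keep a stub so the event itself is not lost.
        record = {"v": 1, "ts": record["ts"], "event": event, "truncated": True}
        line = json.dumps(record)
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # serialize concurrent writers
        try:
            fh.write(line + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
    return record

def replay(path: str) -> list:
    """Read the event stream back in order, e.g. for a fresh-context verifier."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "provenance.jsonl")
    append_event(log, "tool_call", tool="bash", command="ls")
    append_event(log, "tool_result", output="x" * 10_000)
    print([e["event"] for e in replay(log)])
```

Because the verifier only needs `replay`, any process (and any model) with read access to the file can audit the session.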
2. Cloud-native compute via SkyPilot
`compute_run` provisions a cluster, runs the job, persists outputs to a per-session cloud bucket (`<cloud>://sciagent-workspace-<sid>/`), and cleans up. SkyPilot provides the multi-cloud reach (AWS, GCP, Azure, Lambda Labs, etc.). The agent has tools for the full lifecycle:
- `compute_run(mode="job"|"cluster", backend="skypilot")`: managed jobs (one-shot) or persistent clusters (iterative)
- `compute_exec(cluster_name=...)`: follow-up commands on a warm cluster
- `compute_cluster(action="status"|"stop"|"start"|"down"|"autostop"|"refresh_mounts"|"wait_for_job"|...)`: full lifecycle
- `materialize`, `materialize_workspace`: pull outputs back to local
Defaults:
- Stop, not down. The end-of-task action is `stop` (preserves the disk for a fast restart, in seconds), not `down` (destroys the cluster). The agent’s prompt enforces this.
- Cost gate at $5. When the optimizer’s estimated total exceeds $5, the tool prompts the user with the Sky-optimizer menu before launching. The gate lives in the tool layer, so the LLM cannot bypass it. Override it via `CloudConfig(commit_threshold_usd=...)`, the environment variable `SCIAGENT_COMPUTE_COMMIT_THRESHOLD_USD`, or `~/.sciagent/config.yaml` (`compute.commit_threshold_usd`).
- 1-hour wall-clock cap per job. The default is `timeout_sec=3600`; the reaper kills clusters whose runtime exceeds it. A per-call `compute_run(timeout_sec=...)` overrides this, and the agent-level default is `CloudConfig.default_timeout_sec`.
- Workspace persistence. The per-session bucket auto-mounts at `/workspace/` on every cluster job, so outputs survive cluster teardown.
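The cost gate and the wall-clock cap reduce to two small checks. The sketch below assumes a precedence order for the three override sources (explicit `CloudConfig` value, then the environment variable, then `config.yaml`, then the $5 default) that the text does not specify. `resolve_commit_threshold`, `RunningCluster`, and `clusters_to_reap` are hypothetical names, and the YAML file is passed in already parsed to avoid a third-party dependency.

```python
import os
import time
from dataclasses import dataclass
from typing import Iterable, List, Mapping, Optional

DEFAULT_COMMIT_THRESHOLD_USD = 5.0
DEFAULT_TIMEOUT_SEC = 3600
ENV_VAR = "SCIAGENT_COMPUTE_COMMIT_THRESHOLD_USD"

def resolve_commit_threshold(explicit: Optional[float], file_config: Mapping,
                             env: Mapping = os.environ) -> float:
    """ASSUMED precedence: explicit CloudConfig value > env var > config.yaml > $5."""
    if explicit is not None:
        return float(explicit)
    if env.get(ENV_VAR):
        return float(env[ENV_VAR])
    value = (file_config.get("compute") or {}).get("commit_threshold_usd")
    if value is not None:
        return float(value)
    return DEFAULT_COMMIT_THRESHOLD_USD

def needs_user_confirmation(estimated_total_usd: float, threshold_usd: float) -> bool:
    """Tool-layer gate: prompt only when the estimate EXCEEDS the threshold."""
    return estimated_total_usd > threshold_usd

@dataclass
class RunningCluster:
    name: str
    started_at: float  # epoch seconds
    timeout_sec: int = DEFAULT_TIMEOUT_SEC  # stands in for a per-call timeout_sec override

def clusters_to_reap(clusters: Iterable[RunningCluster],
                     now: Optional[float] = None) -> List[str]:
    """Names of clusters whose wall-clock runtime has passed their timeout."""
    now = time.time() if now is None else now
    return [c.name for c in clusters if now - c.started_at > c.timeout_sec]
```

Keeping the comparison in plain code, outside the model’s reach, is what makes the gate non-bypassable: the LLM can only request a launch, not decide whether the threshold applies.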
See Cloud Compute for the full guide and the Datacenter CFD case study for an end-to-end example.
3. Task orchestration with checkpoint & resume
A unified registry (`task_index`) tracks long-running work, cloud jobs and background subagents alike, at `~/.sciagent/tasks/<task_id>.json`. Two kinds exist today (`compute_job`, `subagent`); future kinds (`watch`, `scheduled`) are planned as additive extensions that leave existing kinds unchanged. The state machine:
```text
pending → running → {completed | failed | cancelled | blocked_produce_missing}
                  → {crashed | blocked_resume}   ← resumable, subagent-only
```
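The state machine can be encoded as a transition table. This hypothetical sketch admits only the edges drawn above and rejects the resumable states for non-subagent tasks; the real registry may permit edges the diagram omits (for example, cancelling a pending task).

```python
from enum import Enum

class TaskState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
    BLOCKED_PRODUCE_MISSING = "blocked_produce_missing"
    CRASHED = "crashed"
    BLOCKED_RESUME = "blocked_resume"

TERMINAL = frozenset({TaskState.COMPLETED, TaskState.FAILED,
                      TaskState.CANCELLED, TaskState.BLOCKED_PRODUCE_MISSING})
RESUMABLE = frozenset({TaskState.CRASHED, TaskState.BLOCKED_RESUME})

# Only the edges in the diagram. Terminal and resumable states have no outgoing
# edges here: resuming happens through a fresh spawn, not by re-entering RUNNING.
TRANSITIONS = {
    TaskState.PENDING: frozenset({TaskState.RUNNING}),
    TaskState.RUNNING: TERMINAL | RESUMABLE,
}

def transition(kind: str, current: TaskState, target: TaskState) -> TaskState:
    """Validate one state change for a task of the given kind."""
    if target not in TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    if target in RESUMABLE and kind != "subagent":
        raise ValueError(f"{target.value} is subagent-only; {kind} tasks cannot enter it")
    return target

def is_resumable(kind: str, state: TaskState) -> bool:
    return kind == "subagent" and state in RESUMABLE
```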
Per-iteration checkpoints persist agent state at `~/.sciagent/sessions/<id>/subagents/<task_id>/checkpoint.jsonl`. If a subagent crashes before reaching a terminal state, a fresh spawn matched by description hash offers the parent a 3-way resume (`skip` · `use_prior` · `retry`), surfaced as an explicit `ask_user` prompt so the user sees what crashed and decides.
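The checkpoint-and-resume flow might look roughly like this sketch. The hashing scheme (whitespace- and case-normalized SHA-256, truncated to 16 hex characters), the checkpoint record fields, and all function names are assumptions; only the three choices `skip`, `use_prior`, and `retry` come from the text. The returned question dict stands in for the `ask_user` prompt and is not its real signature.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import List, Optional

RESUME_CHOICES = ("skip", "use_prior", "retry")

def description_hash(description: str) -> str:
    """ASSUMPTION: normalized SHA-256, truncated to 16 hex characters."""
    normalized = " ".join(description.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

def append_checkpoint(path: Path, iteration: int, state: dict) -> None:
    """One JSON line per iteration, so a crash loses at most the current iteration."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"iteration": iteration, "state": state}) + "\n")

def last_checkpoint(path: Path) -> Optional[dict]:
    """Return the last complete checkpoint, ignoring a torn final line."""
    if not path.exists():
        return None
    last = None
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            try:
                last = json.loads(line)
            except json.JSONDecodeError:
                break  # a crash mid-write can leave a partial last line
    return last

def find_resumable(tasks: List[dict], description: str) -> Optional[dict]:
    """Match a fresh spawn to a crashed/blocked subagent by description hash."""
    wanted = description_hash(description)
    for task in tasks:
        if (task["kind"] == "subagent"
                and task["state"] in ("crashed", "blocked_resume")
                and task["description_hash"] == wanted):
            return task
    return None

def build_resume_question(task: dict, checkpoint: Optional[dict]) -> dict:
    """Payload for an explicit user prompt; the parent never resumes silently."""
    where = f"iteration {checkpoint['iteration']}" if checkpoint else "before its first checkpoint"
    return {
        "question": f"Subagent {task['task_id']} ended in state {task['state']!r} at {where}. How should we proceed?",
        "choices": list(RESUME_CHOICES),
    }

if __name__ == "__main__":
    cp = Path(tempfile.mkdtemp()) / "checkpoint.jsonl"
    append_checkpoint(cp, 1, {"step": "equilibrate"})
    task = {"task_id": "t1", "kind": "subagent", "state": "crashed",
            "description_hash": description_hash("GROMACS trajectory analysis")}
    match = find_resumable([task], "gromacs  trajectory analysis")
    print(build_resume_question(match, last_checkpoint(cp)))
```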
Long-running scientific workflows (CFD reproducing a paper, GROMACS trajectory analysis, design-space exploration) survive transient failures (server disconnect, network drop, LLM hiccup) without restarting from zero. See Task Orchestration.
4. Cross-domain containerized services
25+ isolated Docker environments are registered in `services/registry.yaml`, spanning eleven scientific areas; the table lists representative services per area:
| Domain | Services |
|---|---|
| Math & Optimization | scipy-base, sympy, cvxpy, optuna |
| Chemistry & Materials | rdkit, ase, dwsim |
| Molecular Dynamics | gromacs |
| Photonics & Optics | rcwa, meep, pyoptools |
| CFD & FEM | openfoam, gmsh, elmer |
| Post-processing & Visualisation | paraview |
| Circuits & EDA | ngspice, openroad, iic-osic-tools |
| Quantum Computing | qiskit |
| Bioinformatics | biopython, blast |
| Network Analysis | networkx |
| Scientific ML | sciml-julia |
Cross-domain pipelines work directly, e.g. RDKit → GROMACS → SciPy for molecular design → simulation → analysis. Service inheritance is resolved through the registry (an `extends:` chain), so adding a new domain means adding a registry entry while subagent kinds stay generic. See Architecture → Service Registry.
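Registry-resolved inheritance can be sketched as walking the `extends:` chain to its root and merging entries base-first. The registry is shown as an already-parsed dict; the field names (`image`, `env`, `packages`), the image tags, the merge rules (dicts merged, lists concatenated, scalars replaced), and the choice of `optuna` extending `scipy-base` are all illustrative assumptions, not the actual contents of `services/registry.yaml`.

```python
from typing import Dict, List

# Hypothetical parsed registry entries (illustrative fields and tags only).
REGISTRY: Dict[str, dict] = {
    "scipy-base": {"image": "example/scipy-base:latest",
                   "env": {"OMP_NUM_THREADS": "1"},
                   "packages": ["numpy", "scipy"]},
    "optuna": {"extends": "scipy-base",
               "image": "example/optuna:latest",
               "packages": ["optuna"]},
}

def resolve_service(registry: Dict[str, dict], name: str) -> dict:
    """Walk the `extends:` chain to its root, then merge base-first so children override."""
    chain: List[str] = []
    current = name
    while current is not None:
        if current in chain:
            raise ValueError("cycle in extends chain: " + " -> ".join(chain + [current]))
        if current not in registry:
            raise KeyError(f"unknown service {current!r}")
        chain.append(current)
        current = registry[current].get("extends")
    merged: dict = {}
    for svc in reversed(chain):  # root first
        for key, value in registry[svc].items():
            if key == "extends":
                continue
            old = merged.get(key)
            if isinstance(value, dict) and isinstance(old, dict):
                merged[key] = {**old, **value}  # child keys win
            elif isinstance(value, list) and isinstance(old, list):
                merged[key] = old + value       # child extends the base list
            else:
                merged[key] = value             # scalars: child replaces
    merged["resolved_chain"] = chain
    return merged
```

Resolving at lookup time keeps each entry small: a new service only states what differs from its parent.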
5. Research-first workflow
The use-service skill enforces documentation research before code generation:
- Discovery – Find the right service in the registry (`service_search`)
- Research – Read official docs and examples (`web`, `service_detail`)
- Code – Write using verified API patterns (`file_ops`)
- Execute – Run in an isolated container (`compute_run`)
- Debug – Search for error solutions if needed (`web`, `bash`)
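The ordering rule can be expressed as a small guard. This is a hypothetical illustration, not how the `use-service` skill is implemented: a phase may start only after every earlier phase has been visited at least once, and each phase exposes the tools listed above. `Phase` and `ResearchFirstGuard` are invented names.

```python
from enum import IntEnum
from typing import Dict, FrozenSet, Set

class Phase(IntEnum):
    DISCOVERY = 1
    RESEARCH = 2
    CODE = 3
    EXECUTE = 4
    DEBUG = 5

# Tool names as listed in the workflow above.
PHASE_TOOLS: Dict[Phase, FrozenSet[str]] = {
    Phase.DISCOVERY: frozenset({"service_search"}),
    Phase.RESEARCH: frozenset({"web", "service_detail"}),
    Phase.CODE: frozenset({"file_ops"}),
    Phase.EXECUTE: frozenset({"compute_run"}),
    Phase.DEBUG: frozenset({"web", "bash"}),
}

class ResearchFirstGuard:
    """Refuse a phase until every earlier phase has been visited at least once.
    Looping back (e.g. DEBUG -> CODE -> EXECUTE) is allowed afterwards."""

    def __init__(self) -> None:
        self.visited: Set[Phase] = set()

    def enter(self, phase: Phase) -> FrozenSet[str]:
        """Record the phase and return the tools it may use."""
        missing = [p.name for p in Phase if p < phase and p not in self.visited]
        if missing:
            raise RuntimeError(f"cannot start {phase.name} before {', '.join(missing)}")
        self.visited.add(phase)
        return PHASE_TOOLS[phase]
```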
The same workflow applies across every registered scientific domain. Coscientist [11] uses an analogous research-first pattern, scoped to chemistry.
6. SWE + science combined
| Capability | Pure coding agents | Pure scientific agents | SciAgent |
|---|---|---|---|
| Navigate codebases | ✓ | ✗ | ✓ |
| Debug complex issues | ✓ | ✗ | ✓ |
| Git operations | ✓ | ✗ | ✓ |
| Run simulations | ✗ | ✓ | ✓ |
| Cloud compute lifecycle | ✗ | ✗ | ✓ |
| Cross-domain compute | ✗ | ✗ | ✓ |
| Validate results | ✗ | Varies | ✓ |
| Durable audit trail | ✗ | ✗ | ✓ |
When to Use Each Approach
| Use Case | Recommended Approach |
|---|---|
| Pure software engineering (no scientific computing) | Coding agents (Claude Code, Cursor, Aider, etc.) |
| Custom multi-agent architectures, bespoke topology | Orchestration frameworks (AG2, LangChain) |
| Chemistry with wet-lab synthesis (real robots) | ChemCrow, Coscientist |
| Materials science with institutional HPC clusters | FORUM-AI (institutional) |
| Scientific computing + software engineering | SciAgent |
| Cross-domain scientific pipelines (e.g. design → simulate → analyze) | SciAgent |
| Cloud-scale simulations with auditable provenance | SciAgent |
| Long-running runs that must survive transient failures | SciAgent |
| Cross-LLM verification of computational claims | SciAgent |
References
Coding Agents
[1] Anthropic. “Claude Code.” https://claude.ai/code
[2] Cursor. “The AI Code Editor.” https://cursor.sh
[3] P. Gauthier. “Aider: AI pair programming in your terminal.” https://github.com/paul-gauthier/aider
[4] All-Hands-AI. “OpenHands: Platform for AI software developers.” https://github.com/All-Hands-AI/OpenHands
[5] J. Yang et al. “SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering.” arXiv preprint arXiv:2405.15793, 2024. https://github.com/SWE-agent/SWE-agent
[6] Cognition AI. “Devin: The first AI software engineer.” https://devin.ai
Multi-Agent Frameworks
[7] C. Wang et al. “AG2: Community-driven AutoGen fork.” https://github.com/ag2ai/ag2
[8] Microsoft. “AutoGen: Multi-agent conversation framework.” https://github.com/microsoft/autogen
[9] LangChain. “LangGraph: Build stateful, multi-actor applications.” https://github.com/langchain-ai/langgraph
Scientific AI Agents
[10] A. M. Bran et al. “ChemCrow: Augmenting large language models with chemistry tools.” Nature Machine Intelligence, 6, 525–535, 2024. https://doi.org/10.1038/s42256-024-00832-8
[11] D. A. Boiko et al. “Autonomous chemical research with large language models.” Nature, 624, 570–578, 2023. https://doi.org/10.1038/s41586-023-06792-0
[12] Berkeley Lab. “Berkeley Lab Leads Effort to Build AI Assistant for Energy Materials Discovery (FORUM-AI).” Berkeley Lab News Center, 2026. https://newscenter.lbl.gov/2026/02/03/berkeley-lab-leads-effort-to-build-ai-assistant-for-energy-materials-discovery/
[13] Google Research. “AI Co-Scientist: Accelerating scientific discovery.” 2025.
Levels of Scientific Automation
[14] “Self-Driving Laboratories for Chemistry and Materials Science.” Chemical Reviews, 2024. https://pubs.acs.org/doi/10.1021/acs.chemrev.4c00055
[15] “Autonomous ‘self-driving’ laboratories: a review of technology and policy implications.” Royal Society Open Science, 2025. https://royalsocietypublishing.org/rsos/article/12/7/250646/235354/Autonomous-self-driving-laboratories-a-review-of
[16] “Steering towards safe self-driving laboratories.” Nature Reviews Chemistry, 2025. https://www.nature.com/articles/s41570-025-00747-x
[17] “Performance metrics to unleash the power of self-driving labs in chemistry and materials science.” Nature Communications, 2024. https://www.nature.com/articles/s41467-024-45569-5
[18] Argonne National Laboratory. “Autonomous Discovery.” https://www.anl.gov/autonomous-discovery
Additional Resources
[19] J. M. Zhang et al. “Awesome AI for Science.” https://github.com/ai-boost/awesome-ai-for-science