Tools

SciAgent uses tools to interact with files, run commands, query the web, manage long-running work, and orchestrate cloud compute. The agent automatically selects the right tool for each task.

The full tool registry is created in src/sciagent/tools/registry.py:create_atomic_registry. The lists below group tools by purpose.


Core

file_ops

Read/write/edit files. Supports image input (PNG, JPG, GIF, WebP) — the image is passed to the LLM for visual analysis.

file_ops(command="view", path="src/main.py")
file_ops(command="view", path="src/main.py", start_line=10, end_line=50)
file_ops(command="write", path="hello.py", content="print('Hello!')")
file_ops(command="str_replace", path="config.py", old_str="DEBUG = False", new_str="DEBUG = True")
file_ops(command="view", path="./plots/results.png")  # Image analysis

bash

Execute shell commands with timeout handling. Long-running commands can be backgrounded with background=True.

bash(command="ls -la")
bash(command="npm install", timeout=180)
bash(command="python long_script.py", background=True)  # Returns a job_id

Find files (glob) or content (grep).

search(command="glob", pattern="**/*.py")
search(command="grep", pattern="def main", include="*.py")
search(command="grep", pattern="TODO", path="./src")

web

Search the web (Brave / DuckDuckGo) or fetch page content.

web(command="search", query="Python FastAPI tutorial 2024")
web(command="fetch", url="https://fastapi.tiangolo.com/tutorial/")

todo

Track tasks with status and DAG dependencies. Supports artifact-tracking via produces= and verification via verify=True.

todo(command="add", content="Implement authentication")
todo(command="update", task_id="task_1", status="in_progress")
todo(command="list")

ask_user

Request user input for decisions or confirmations.

ask_user(
    question="Which solver should I use?",
    options=["MEEP (FDTD)", "RCWA (faster for periodic)"],
    context="MEEP is general but slower."
)

skill

Load specialized workflow definitions (SKILL.md). Auto-triggers based on task description.

skill(name="use-service")
skill(name="build-service")
skill(name="code-review")

Compute

Cloud and local container job orchestration. See Cloud Compute for the full guide.

compute_run

Launch a containerized compute job. Background by default; the agent picks SkyPilot or local Docker based on requirements.

compute_run(
    service="openfoam",
    command="bash Allrun",
    mode="cluster",
    backend="skypilot",
    cluster_name="cfd-run-1",
    cpus=4,
    memory_gb=32,
)

Modes: mode="job" (managed job — Sky tears the cluster down) or mode="cluster" (persistent cluster you can iterate against).

Cost gate: when the optimizer’s estimated total exceeds $5.00, the tool prompts the user before launching. Override via CloudConfig(commit_threshold_usd=...), env SCIAGENT_COMPUTE_COMMIT_THRESHOLD_USD, or ~/.sciagent/config.yaml. See Configuration → CloudConfig.

compute_exec

Run a follow-up command on a warm cluster.

compute_exec(cluster_name="cfd-run-1", command="postProcess -func writeCellVolumes")

compute_cluster

Cluster lifecycle, action-dispatched. Default end-of-task is stop (preserves the disk for fast restart), not down (destroys the cluster).

compute_cluster(action="status", cluster_name="cfd-run-1")
compute_cluster(action="wait_until_up", cluster_name="cfd-run-1", timeout=300)
compute_cluster(action="wait_for_job", cluster_name="cfd-run-1", cluster_job_id=2, timeout=1800)
compute_cluster(action="logs", cluster_name="cfd-run-1", cluster_job_id=2, tail_lines=200)
compute_cluster(action="stop", cluster_name="cfd-run-1")
compute_cluster(action="start", cluster_name="cfd-run-1")
compute_cluster(action="autostop", cluster_name="cfd-run-1", idle_minutes=10)
compute_cluster(action="refresh_mounts", cluster_name="cfd-run-1")
compute_cluster(action="down", cluster_name="cfd-run-1")  # Destructive

materialize

Pull cloud outputs to local. Cloud-agnostic (s3://, gs://, az://, r2://, oci://).

materialize(uri="s3://sciagent-workspace-abc/run-001/fields/", dest="./_outputs/fields/")
materialize(uri="s3://sciagent-workspace-abc/run-001/", list_only=True)

materialize_workspace

Pull (a slice of) the per-session workspace bucket to local. Pairs with the /workspace/ auto-mount on cluster jobs.

materialize_workspace(subpath="run-001/derived/", dest="./_outputs/derived/")
materialize_workspace(list_only=True)

Task & background management

The task_* tools view the cross-kind in-flight registry. The bg_* tools own the per-cloud-job runtime surface (Sky-side status, log streaming, kill). See Task Orchestration.

task_list

Enumerate tracked tasks across kinds (compute_job, subagent) and states.

task_list()                                         # everything
task_list(kind="compute_job", state="running")
task_list(kind="subagent", session_id="abc12345")

task_get

Inspect a single task’s full manifest.

task_get("sciagent-abc123")

task_wait

Block until a task reaches a terminal state. Kind-agnostic.

task_wait("sciagent-abc123", timeout=1800, poll_interval=5)

bg_status

Sky-side cloud-job status joined with sciagent’s local manifest.

bg_status()                          # all jobs
bg_status(job_id="sciagent-abc123")  # one job

bg_output

Stream output from a cloud job.

bg_output(job_id="sciagent-abc123", tail_lines=200)

bg_wait

Block until a cloud job reaches a terminal state.

bg_wait(job_id="sciagent-abc123", timeout=1800)

bg_kill

Cancel a running cloud job.

bg_kill(job_id="sciagent-abc123")

Service discovery

Case-insensitive keyword search across the service registry (src/sciagent/services/registry.yaml). Cheaper than reading the registry file directly.

service_search(keyword="openfoam")
service_search(keyword="bioinformatics")

service_detail

Full details for a service — Dockerfile path, example, extends-chain.

service_detail(service="openfoam")

Monitoring

monitor

Spawn a watcher on a long-running subprocess. Each stdout line becomes an event delivered as a <system-reminder> on the next agent turn — no LLM round-trip per event.

monitor(command="tail -f solver.log", description="OpenFOAM solver progress")

monitor_stop

Terminate a watcher.

monitor_stop(watcher_id="...")

Verification

verify_session

Snapshot read of a session’s durable provenance log; produces a structured verification report. Cross-LLM friendly — any provider can audit a session it didn’t run. See Provenance Log Schema.

verify_session(session_id="abc12345")

One-shot, non-blocking, snapshot semantics. There is no wait= or polling — invoke again to see a fresh snapshot.


Creating custom tools

See Configuration → Custom Tools for adding your own tools to the registry.