Runtime Dashboard

The context_visualizer package provides a lightweight Flask web application that lets you inspect and manage a live Chimaera runtime cluster from your browser. It connects to the runtime using the same client API used by application code and surfaces cluster topology, per-node worker statistics, system resource utilization, block device stats, pool configuration, and the active YAML config.

Prerequisites

  • IOWarp installed with Python support (WRP_CORE_ENABLE_PYTHON=ON)
  • A running Chimaera runtime (chimaera runtime start)
  • Python dependencies: flask, pyyaml, msgpack

Install the Python dependencies with any of:

pip install flask pyyaml msgpack
# or
pip install iowarp-core[visualizer]
# or (conda)
conda install flask pyyaml python-msgpack

Starting the Dashboard

python -m context_visualizer

Then open http://127.0.0.1:5000 in your browser.

CLI Options

| Flag | Default | Description |
|---------|-------------|-------------|
| `--host` | `127.0.0.1` | Bind address. Use `0.0.0.0` to expose on all interfaces. |
| `--port` | `5000` | Listen port. |
| `--debug` | (off) | Enable Flask debug mode (auto-reload, verbose errors). |

# Expose on all interfaces, non-default port
python -m context_visualizer --host 0.0.0.0 --port 8080

# Debug mode (development only)
python -m context_visualizer --debug

Pages

Topology (/)

The landing page shows a live grid of all nodes in the cluster. Each node card displays:

  • Hostname and IP address
  • Status badge (alive)
  • CPU, RAM, and GPU utilization bars (GPU shown only when GPUs are present)
  • Restart and Shutdown action buttons

The search bar filters by node ID (a single ID such as `3`, a range such as `1-20`, or a comma-separated list such as `1,3,5`) or by hostname/IP substring.
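The filter syntax above can be sketched as a small parser. This is a hypothetical helper for illustration, not the dashboard's actual code:

```python
def parse_node_filter(query: str):
    """Parse a node-ID filter like "3", "1-20", or "1,3,5" into a set of IDs.

    Returns None when the query is not a pure ID expression, in which case
    the dashboard falls back to hostname/IP substring matching.
    """
    ids = set()
    for part in query.split(","):
        part = part.strip()
        if part.isdigit():
            ids.add(int(part))
        elif "-" in part:
            lo, _, hi = part.partition("-")
            if lo.strip().isdigit() and hi.strip().isdigit():
                ids.update(range(int(lo.strip()), int(hi.strip()) + 1))
            else:
                return None  # e.g. "node-a" -> substring match instead
        else:
            return None
    return ids
```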

Clicking a node card navigates to the per-node detail page.

Node Detail (/node/<id>)

A per-node drilldown page showing:

  • Worker statistics — per-worker queue depth, blocked tasks, processed count, and more
  • System stats — time-series CPU, RAM, GPU, and HBM utilization
  • Block device stats — per-bdev pool throughput and capacity

Pools (/pools)

Lists all pools defined in the compose section of the active configuration file:

| Column | Description |
|-----------|-------------|
| Module | ChiMod shared-library name (`mod_name`) |
| Pool Name | User-defined pool name |
| Pool ID | Unique pool identifier |
| Query | Routing policy (`local`, `dynamic`, `broadcast`) |

Config (/config)

Displays the full contents of the active YAML configuration file as formatted JSON, for quick inspection without opening a terminal.

REST API

All pages are backed by a JSON API. You can query these endpoints directly for scripting or integration with other monitoring tools.

Cluster-wide

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/topology` | GET | List all nodes with hostname, IP, and CPU/RAM/GPU utilization |
| `/api/system` | GET | High-level system overview (connected, worker/queue/blocked/processed counts) |
| `/api/workers` | GET | Per-worker stats plus a fleet summary (local node) |
| `/api/pools` | GET | Pool list from the `compose` section of the config |
| `/api/config` | GET | Full active configuration as JSON |

Per-node

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/node/<id>/workers` | GET | Worker stats for a specific node |
| `/api/node/<id>/system_stats` | GET | System resource utilization entries for a specific node |
| `/api/node/<id>/bdev_stats` | GET | Block device stats for a specific node |

Node Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/topology/node/<id>/shutdown` | POST | Gracefully shut down a node via SSH |
| `/api/topology/node/<id>/restart` | POST | Restart a node via SSH |

Shutdown and restart are performed by SSHing from the dashboard host to the target node and running `chimaera runtime stop` or `chimaera runtime restart`. This avoids the problem of a node killing itself mid-RPC. The SSH connection uses `StrictHostKeyChecking=no` and `ConnectTimeout=5`.
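The SSH invocation described above might be assembled like this. The exact command layout is an assumption for illustration; the dashboard's real code may differ:

```python
def build_node_command(host: str, action: str):
    """Build the ssh command used to stop or restart a remote runtime node."""
    if action not in ("stop", "restart"):
        raise ValueError(f"unknown action: {action}")
    return [
        "ssh",
        "-o", "StrictHostKeyChecking=no",  # skip host-key prompts (as documented)
        "-o", "ConnectTimeout=5",          # fail fast on unreachable nodes
        host,
        f"chimaera runtime {action}",
    ]
```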

Shutdown response:

{
  "success": true,
  "returncode": 0,
  "stdout": "",
  "stderr": ""
}

Exit codes 0 and 134 (128 + SIGABRT, raised by `std::abort()` in `InitiateShutdown`) are both treated as success.

Restart uses `nohup` so the SSH session returns immediately while the node restarts in the background.
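The mapping from exit status to the JSON response shape can be sketched as follows. `run_remote` is a hypothetical helper, not the dashboard's actual function:

```python
import subprocess

# Exit codes treated as success: 0 (clean exit) and 134 (128 + SIGABRT,
# from std::abort() during InitiateShutdown).
SUCCESS_CODES = {0, 134}

def run_remote(cmd):
    """Run a command and map its exit status to the documented response shape."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "success": proc.returncode in SUCCESS_CODES,
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```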

All endpoints return Content-Type: application/json. On error they return an appropriate HTTP status code (e.g., 503 if the runtime is unreachable, 404 if a node is not found) with an "error" field in the response body.

Examples

# Get cluster topology
curl http://127.0.0.1:5000/api/topology

# Get system overview
curl http://127.0.0.1:5000/api/system

# Get worker stats for node 2
curl http://127.0.0.1:5000/api/node/2/workers

# Shut down node 3
curl -X POST http://127.0.0.1:5000/api/topology/node/3/shutdown

# Restart node 3
curl -X POST http://127.0.0.1:5000/api/topology/node/3/restart
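For scripting beyond curl, the same endpoints can be consumed from Python with only the standard library. The field names used below (`hostname`, `cpu`) are assumptions about the response shape, not a documented schema:

```python
import json
from urllib.request import urlopen

def fetch_topology(base="http://127.0.0.1:5000"):
    """Fetch the cluster topology from the dashboard's JSON API."""
    with urlopen(f"{base}/api/topology") as resp:
        return json.load(resp)

def hot_nodes(topology, cpu_threshold=90.0):
    """Return hostnames of nodes whose CPU utilization exceeds the threshold."""
    return [n["hostname"] for n in topology if n.get("cpu", 0.0) > cpu_threshold]

# Example: print(hot_nodes(fetch_topology()))
```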

Configuration File Discovery

The dashboard reads the same config file as the runtime, using the same search order:

| Source | Priority |
|--------|----------|
| `CHI_SERVER_CONF` environment variable | 1st |
| `WRP_RUNTIME_CONF` environment variable | 2nd |
| `~/.chimaera/chimaera.yaml` | 3rd |
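The search order above can be expressed as a small resolver. This is a sketch; the real discovery logic lives in the runtime and may handle edge cases differently:

```python
import os

def find_config(env=os.environ, exists=os.path.exists):
    """Resolve the runtime config path using the documented search order."""
    for var in ("CHI_SERVER_CONF", "WRP_RUNTIME_CONF"):
        path = env.get(var)
        if path:
            return path
    default = os.path.expanduser("~/.chimaera/chimaera.yaml")
    return default if exists(default) else None
```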

See Configuration for details on the config file format.

Connection Lifecycle

The dashboard connects to the runtime lazily — on the first request that needs live data. If the runtime is not yet running when the dashboard starts, it will show a disconnected state and retry on subsequent requests. Shutdown is handled automatically via atexit so the client is finalized cleanly when the server process exits.
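The lazy-connect-and-retry behavior can be sketched like this. It is illustrative only; `connect` stands in for the real client API:

```python
import atexit

class LazyClient:
    """Connect on first use; retry on later requests if the runtime was down."""

    def __init__(self, connect):
        self._connect = connect      # callable returning a live client, or raising
        self._client = None
        atexit.register(self.close)  # finalize cleanly when the process exits

    def get(self):
        if self._client is None:
            try:
                self._client = self._connect()
            except ConnectionError:
                return None  # caller shows a disconnected state and retries later
        return self._client

    def close(self):
        self._client = None
```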

Docker / Remote Access

When running the runtime inside Docker or on a remote host, bind the dashboard to all interfaces and forward the port:

# On the host running the runtime
python -m context_visualizer --host 0.0.0.0 --port 5000
# docker-compose.yml — expose the dashboard port alongside the runtime
services:
  iowarp:
    image: iowarp/deploy-cpu:latest
    ports:
      - "9413:9413"  # Chimaera RPC
      - "5000:5000"  # Dashboard
    command: >
      bash -c "chimaera runtime start &
               python -m context_visualizer --host 0.0.0.0"
Warning: the dashboard has no authentication. Do not expose it on a public network without a reverse proxy that enforces access control.

Try It: Interactive Docker Cluster

An interactive test environment is provided that spins up a 4-node Chimaera cluster with the dashboard so you can explore all features from your browser.

Location

context-runtime/test/integration/interactive/
├── docker-compose.yml # 4-node runtime cluster
├── hostfile # Node IP addresses (172.28.0.10-13)
├── wrp_conf.yaml # Runtime configuration
└── run.sh # Launcher script

How It Works

  • 4 Docker containers (iowarp-interactive-node1 through node4) run the Chimaera runtime on a private 172.28.0.0/16 network, each with sshd for SSH-based shutdown/restart
  • Node 1 also runs the dashboard alongside its runtime
  • The script connects the devcontainer to the Docker network and starts a local port-forward so that localhost:5000 reaches the dashboard inside Docker — VS Code then auto-forwards this to your host browser
  • SSH keys are distributed via a shared Docker volume so the dashboard can authenticate to all nodes

Running

cd context-runtime/test/integration/interactive

# Foreground (Ctrl-C to stop)
bash run.sh

# Or run in the background
bash run.sh start

# Follow runtime container logs
bash run.sh logs

# Stop everything (cluster + dashboard)
bash run.sh stop

Once the cluster is up (~15 seconds), open http://localhost:5000 to browse the topology, click into individual nodes, and use the Restart/Shutdown buttons.

If running from a devcontainer or a host where the workspace is at a different path, set HOST_WORKSPACE:

HOST_WORKSPACE=/host/path/to/workspace bash run.sh