Local Scheduler Guide

Overview

The Chimaera runtime uses a pluggable scheduler architecture to control how tasks are mapped to workers and how workers are organized. Scheduling happens at two levels: container-level scheduling resolves what to execute (via Container::ScheduleTask), and the Scheduler decides where to execute it (which worker thread). This document explains both levels and how to build custom schedulers.

Table of Contents

  1. Architecture Overview
  2. Two-Level Scheduling
  3. Scheduler Interface
  4. Worker Lifecycle
  5. Implementing a Custom Scheduler
  6. DefaultScheduler Example
  7. Best Practices
  8. Integration Points

Architecture Overview

Component Responsibilities

The IOWarp runtime separates concerns across five main components:

  • ConfigManager: Manages configuration (number of threads, queue depth, etc.)
  • WorkOrchestrator: Creates workers, spawns threads, assigns lanes to workers (1:1 mapping for all workers)
  • Scheduler: Decides worker partitioning, task-to-worker mapping, and load balancing
  • IpcManager: Manages shared memory, queues, and provides task routing infrastructure (RouteTask, RouteLocal, RouteGlobal)
  • Container: Provides per-pool dynamic scheduling via ScheduleTask

Data Flow

┌─────────────────┐
│ ConfigManager   │──→ num_threads, queue_depth
└─────────────────┘
        │
        ▼
┌─────────────────┐
│ WorkOrchestrator│──→ Creates num_threads + 1 workers
└─────────────────┘
        │
        ▼
┌─────────────────┐
│ Scheduler       │──→ Tracks worker groups for routing decisions
└─────────────────┘     Updates IpcManager with scheduler queue count
        │
        ▼
┌─────────────────┐
│ WorkOrchestrator│──→ Maps ALL workers to lanes (1:1 mapping)
└─────────────────┘     Spawns OS threads for each worker
        │
        ▼
┌─────────────────┐
│ IpcManager      │──→ num_sched_queues used for client task mapping
└─────────────────┘

Task Routing Flow

When a task is submitted, IpcManager::RouteTask orchestrates the full routing pipeline:

RouteTask(future, force_enqueue)
│
├─ 1. Container::ScheduleTask() → resolve Dynamic pool query
│      (e.g., Dynamic → DirectHash)
│
├─ 2. ResolvePoolQuery() → resolve to physical node(s)
│
├─ 3. IsTaskLocal() → local or remote?
│      │
│      ├─ local: RouteLocal() → resolve container, pick worker
│      │    │
│      │    ├─ RuntimeMapTask() → scheduler picks dest worker
│      │    │
│      │    ├─ dest == current && !force_enqueue → execute directly
│      │    └─ otherwise → enqueue to dest worker's lane
│      │
│      └─ remote: RouteGlobal() → enqueue to net_queue_ for SendIn
│
└─ return true if task can be executed by current worker

Two-Level Scheduling

Level 1: Container Scheduling (ScheduleTask)

Before the Scheduler decides which worker runs a task, the container decides which container instance handles it. This is the Container::ScheduleTask virtual method.

class Container {
 public:
  virtual PoolQuery ScheduleTask(const hipc::FullPtr<Task> &task) {
    return task->pool_query_;  // Default: no transformation
  }
};

Purpose: Transform a high-level PoolQuery (often Dynamic) into a concrete routing mode (DirectHash, Local, Broadcast, etc.) based on the task's semantics.

Example: A distributed filesystem container might schedule read tasks to the node holding the target block:

PoolQuery MyFsContainer::ScheduleTask(const hipc::FullPtr<Task> &task) {
  if (task->method_ == kRead || task->method_ == kWrite) {
    // Route to the container holding the target block
    u64 block_id = GetBlockId(task);
    return PoolQuery::DirectHash(block_id);
  }
  // Metadata ops stay local
  return PoolQuery::Local();
}

ScheduleTask is called by RouteTask before pool query resolution. The returned PoolQuery then goes through ResolvePoolQuery to determine concrete physical nodes and containers.

Level 2: Worker Scheduling (RuntimeMapTask)

After container and node resolution, RouteLocal calls Scheduler::RuntimeMapTask to pick the specific worker thread. This is where I/O-size routing, task group affinity, and network worker pinning happen.

Scheduler Interface

All schedulers must inherit from the Scheduler base class and implement the following methods:

Required Methods

class Scheduler {
 public:
  virtual ~Scheduler() = default;

  // Partition workers into groups after WorkOrchestrator creates them
  virtual void DivideWorkers(WorkOrchestrator *work_orch) = 0;

  // Map tasks from clients to worker lanes
  virtual u32 ClientMapTask(IpcManager *ipc_manager,
                            const Future<Task> &task) = 0;

  // Map tasks from runtime workers to other workers
  virtual u32 RuntimeMapTask(Worker *worker, const Future<Task> &task,
                             Container *container) = 0;

  // Rebalance load across workers (called periodically by workers)
  virtual void RebalanceWorker(Worker *worker) = 0;

  // Adjust polling intervals for periodic tasks
  virtual void AdjustPolling(RunContext *run_ctx) = 0;

  // Get designated GPU worker (optional)
  virtual Worker *GetGpuWorker() const { return nullptr; }

  // Get designated network worker (optional)
  virtual Worker *GetNetWorker() const { return nullptr; }
};

Method Details

DivideWorkers(WorkOrchestrator *work_orch)

Purpose: Partition workers into functional groups after they've been created.

Called: Once during initialization, after WorkOrchestrator creates all workers but before threads are spawned.

Responsibilities:

  • Access workers via work_orch->GetWorker(worker_id)
  • Organize workers into scheduler-specific groups (e.g., scheduler worker, I/O workers, network worker, GPU worker)
  • Update IpcManager with the scheduling queue count via IpcManager::SetNumSchedQueues()
  • Set the network lane via IpcManager::SetNetLane() so the runtime knows where to enqueue network tasks

Important: All workers are assigned lanes by WorkOrchestrator::SpawnWorkerThreads() using 1:1 mapping. The scheduler does NOT control lane assignment — it only tracks worker groups for routing decisions.

Example:

void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) {
  u32 total_workers = work_orch->GetTotalWorkerCount();

  // Worker 0: scheduler worker (metadata + small I/O)
  scheduler_worker_ = work_orch->GetWorker(0);

  // Workers 1..N-2: I/O workers (large I/O round-robin)
  for (u32 i = 1; i < total_workers - 1; ++i) {
    io_workers_.push_back(work_orch->GetWorker(i));
  }

  // Worker N-1: network worker
  net_worker_ = work_orch->GetWorker(total_workers - 1);

  // IMPORTANT: Update IpcManager
  IpcManager *ipc = CHI_IPC;
  if (ipc) {
    ipc->SetNumSchedQueues(1);  // Client tasks go to scheduler worker
    if (net_worker_) {
      ipc->SetNetLane(net_worker_->GetLane());
    }
  }
}

ClientMapTask(IpcManager *ipc_manager, const Future<Task> &task)

Purpose: Determine which worker lane a task from a client should be assigned to.

Called: When clients submit tasks to the runtime via SendRuntimeClient.

Responsibilities:

  • Return a lane ID in range [0, num_sched_queues)
  • Use ipc_manager->GetNumSchedQueues() to get valid lane count
  • Route special tasks (e.g., network Send/Recv) to the appropriate worker
  • Common strategies: PID+TID hash, round-robin, locality-aware

Example:

u32 MyScheduler::ClientMapTask(IpcManager *ipc_manager, const Future<Task> &task) {
  u32 num_lanes = ipc_manager->GetNumSchedQueues();
  if (num_lanes == 0) return 0;

  // Network tasks (Send/Recv/ClientSend/ClientRecv from admin pool)
  // → network worker on the last lane
  Task *task_ptr = task.get();
  if (task_ptr != nullptr && task_ptr->pool_id_ == chi::kAdminPoolId) {
    u32 method_id = task_ptr->method_;
    if (method_id == 14 || method_id == 15 ||
        method_id == 20 || method_id == 21) {
      return num_lanes - 1;
    }
  }

  // Default: scheduler worker (lane 0)
  return 0;
}

RuntimeMapTask(Worker *worker, const Future<Task> &task, Container *container)

Purpose: Determine which worker should execute a task when routing from within the runtime.

Called: By IpcManager::RouteLocal() after the execution container has been resolved. If the returned worker ID differs from the current worker (or if force_enqueue is set), the task is enqueued to the destination worker's lane.

Parameters:

  • worker: The current worker (may be nullptr if called from a non-worker thread)
  • task: The task being routed
  • container: The resolved execution container. Used for task-group affinity lookups. May be nullptr when called without a resolved container.

Responsibilities:

  • Return a worker ID for task execution
  • Route periodic network tasks (Send/Recv) to the dedicated network worker
  • Implement I/O-size-based routing (large I/O → dedicated workers)
  • Implement task group affinity (pin related tasks to the same worker)

Example:

u32 MyScheduler::RuntimeMapTask(Worker *worker, const Future<Task> &task,
                                Container *container) {
  Task *task_ptr = task.get();

  // Task group affinity: if this task's group is already pinned, use that worker
  if (container != nullptr && task_ptr != nullptr &&
      !task_ptr->task_group_.IsNull()) {
    int64_t group_id = task_ptr->task_group_.id_;
    ScopedCoRwReadLock read_lock(container->task_group_lock_);
    auto it = container->task_group_map_.find(group_id);
    if (it != container->task_group_map_.end() && it->second != nullptr) {
      return it->second->GetId();
    }
  }

  // Periodic network tasks → network worker
  if (task_ptr != nullptr && task_ptr->IsPeriodic()) {
    if (task_ptr->pool_id_ == chi::kAdminPoolId) {
      u32 method_id = task_ptr->method_;
      if (method_id == 14 || method_id == 15) {
        if (net_worker_ != nullptr) {
          return net_worker_->GetId();
        }
      }
    }
  }

  // Large I/O → round-robin across I/O workers
  if (task_ptr != nullptr && !io_workers_.empty()) {
    if (task_ptr->stat_.io_size_ >= kLargeIOThreshold) {
      u32 idx = next_io_idx_.fetch_add(1, std::memory_order_relaxed) %
                static_cast<u32>(io_workers_.size());
      return io_workers_[idx]->GetId();
    }
  }

  // Small I/O / metadata → scheduler worker
  if (scheduler_worker_ != nullptr) {
    return scheduler_worker_->GetId();
  }

  return worker ? worker->GetId() : 0;
}

RebalanceWorker(Worker *worker)

Purpose: Balance load across workers by stealing or delegating tasks.

Called: Periodically by workers after processing tasks.

Responsibilities:

  • Implement work stealing algorithms
  • Migrate tasks between workers
  • Optional — can be a no-op for simple schedulers

Example:

void MyScheduler::RebalanceWorker(Worker *worker) {
  // Simple schedulers can leave this empty
  (void)worker;
}

AdjustPolling(RunContext *run_ctx)

Purpose: Adjust polling intervals for periodic tasks based on work done.

Called: After each execution of a periodic task.

Responsibilities:

  • Modify run_ctx->yield_time_us_ based on run_ctx->did_work_
  • Implement adaptive polling (exponential backoff when idle)
  • Reduce CPU usage for idle periodic tasks
  • Important: co_await on Futures sets yield_time_us_ to 0, so this method must restore it to prevent busy-looping

Example:

void MyScheduler::AdjustPolling(RunContext *run_ctx) {
  if (!run_ctx) return;

  const double kMaxPollingIntervalUs = 100000.0;  // 100ms

  if (run_ctx->did_work_) {
    // Reset to original period when work is done
    run_ctx->yield_time_us_ = run_ctx->true_period_ns_ / 1000.0;
  } else {
    // Exponential backoff when idle
    double current = run_ctx->yield_time_us_;
    if (current <= 0.0) {
      current = run_ctx->true_period_ns_ / 1000.0;
    }
    run_ctx->yield_time_us_ = std::min(current * 2.0, kMaxPollingIntervalUs);
  }
}

Worker Lifecycle

Understanding the worker lifecycle is crucial for scheduler implementation:

1. ConfigManager loads configuration (num_threads, queue_depth)

2. WorkOrchestrator::Init()
   - Creates num_threads + 1 workers
   - Calls Scheduler::DivideWorkers()

3. Scheduler::DivideWorkers()
   - Organizes workers into functional groups (scheduler, I/O, network, GPU)
   - Updates IpcManager::SetNumSchedQueues()
   - Sets network lane via IpcManager::SetNetLane()

4. WorkOrchestrator::StartWorkers()
   - Calls SpawnWorkerThreads()
   - Maps ALL workers to lanes (1:1 mapping: worker i → lane i)
   - Spawns actual OS threads

5. Workers run task processing loops
   - ProcessNewTask: pop futures from lane, route via RouteTask()
   - RouteTask calls Container::ScheduleTask() for dynamic resolution
   - RouteLocal calls Scheduler::RuntimeMapTask() for worker selection
   - If dest != current worker (or force_enqueue), re-enqueue to dest lane
   - Call Scheduler::RebalanceWorker() periodically
   - Call Scheduler::AdjustPolling() after periodic task execution

Implementing a Custom Scheduler

Step 1: Create Header File

Create context-runtime/include/chimaera/scheduler/my_scheduler.h:

#ifndef CHIMAERA_INCLUDE_CHIMAERA_SCHEDULER_MY_SCHEDULER_H_
#define CHIMAERA_INCLUDE_CHIMAERA_SCHEDULER_MY_SCHEDULER_H_

#include <atomic>
#include <vector>

#include "chimaera/scheduler/scheduler.h"

namespace chi {

class MyScheduler : public Scheduler {
 public:
  MyScheduler()
      : scheduler_worker_(nullptr), net_worker_(nullptr), gpu_worker_(nullptr) {}
  ~MyScheduler() override = default;

  void DivideWorkers(WorkOrchestrator *work_orch) override;
  u32 ClientMapTask(IpcManager *ipc_manager, const Future<Task> &task) override;
  u32 RuntimeMapTask(Worker *worker, const Future<Task> &task,
                     Container *container) override;
  void RebalanceWorker(Worker *worker) override;
  void AdjustPolling(RunContext *run_ctx) override;
  Worker *GetGpuWorker() const override { return gpu_worker_; }
  Worker *GetNetWorker() const override { return net_worker_; }

 private:
  Worker *scheduler_worker_;
  std::vector<Worker *> io_workers_;
  Worker *net_worker_;
  Worker *gpu_worker_;
  std::atomic<u32> next_io_idx_{0};  // Round-robin cursor for I/O workers
};

}  // namespace chi

#endif  // CHIMAERA_INCLUDE_CHIMAERA_SCHEDULER_MY_SCHEDULER_H_

Step 2: Implement Methods

Create context-runtime/src/scheduler/my_scheduler.cc:

#include "chimaera/scheduler/my_scheduler.h"

#include "chimaera/config_manager.h"
#include "chimaera/ipc_manager.h"
#include "chimaera/work_orchestrator.h"
#include "chimaera/worker.h"

namespace chi {

void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) {
  if (!work_orch) return;

  u32 total_workers = work_orch->GetTotalWorkerCount();

  scheduler_worker_ = work_orch->GetWorker(0);
  net_worker_ = work_orch->GetWorker(total_workers - 1);

  if (total_workers > 2) {
    gpu_worker_ = work_orch->GetWorker(total_workers - 2);
    for (u32 i = 1; i < total_workers - 1; ++i) {
      io_workers_.push_back(work_orch->GetWorker(i));
    }
  }

  IpcManager *ipc = CHI_IPC;
  if (ipc) {
    ipc->SetNumSchedQueues(1);
    if (net_worker_) {
      ipc->SetNetLane(net_worker_->GetLane());
    }
  }
}

u32 MyScheduler::ClientMapTask(IpcManager *ipc_manager, const Future<Task> &task) {
  u32 num_lanes = ipc_manager->GetNumSchedQueues();
  if (num_lanes == 0) return 0;
  return 0;  // All client tasks → scheduler worker
}

u32 MyScheduler::RuntimeMapTask(Worker *worker, const Future<Task> &task,
                                Container *container) {
  Task *task_ptr = task.get();

  // Task group affinity
  if (container != nullptr && task_ptr != nullptr &&
      !task_ptr->task_group_.IsNull()) {
    int64_t group_id = task_ptr->task_group_.id_;
    ScopedCoRwReadLock read_lock(container->task_group_lock_);
    auto it = container->task_group_map_.find(group_id);
    if (it != container->task_group_map_.end() && it->second != nullptr) {
      return it->second->GetId();
    }
  }

  // Periodic network tasks → network worker
  if (task_ptr != nullptr && task_ptr->IsPeriodic() &&
      task_ptr->pool_id_ == chi::kAdminPoolId) {
    u32 m = task_ptr->method_;
    if ((m == 14 || m == 15 || m == 20 || m == 21) && net_worker_) {
      return net_worker_->GetId();
    }
  }

  // Large I/O → round-robin across I/O workers
  if (task_ptr != nullptr && !io_workers_.empty() &&
      task_ptr->stat_.io_size_ >= 4096) {
    u32 idx = next_io_idx_.fetch_add(1, std::memory_order_relaxed) %
              static_cast<u32>(io_workers_.size());
    return io_workers_[idx]->GetId();
  }

  // Default → scheduler worker
  if (scheduler_worker_) return scheduler_worker_->GetId();
  return worker ? worker->GetId() : 0;
}

void MyScheduler::RebalanceWorker(Worker *worker) { (void)worker; }

void MyScheduler::AdjustPolling(RunContext *run_ctx) {
  if (!run_ctx) return;
  run_ctx->yield_time_us_ = run_ctx->true_period_ns_ / 1000.0;
}

}  // namespace chi

Step 3: Register Scheduler

Update context-runtime/src/ipc_manager.cc to create your scheduler:

bool IpcManager::ServerInit() {
  // ... existing initialization code ...

  // Create scheduler based on configuration
  ConfigManager *config = CHI_CONFIG_MANAGER;
  std::string sched_name = config->GetLocalSched();

  if (sched_name == "my_scheduler") {
    scheduler_ = new MyScheduler();
  } else if (sched_name == "default") {
    scheduler_ = new DefaultScheduler();
  } else {
    HLOG(kError, "Unknown scheduler: {}", sched_name);
    return false;
  }

  return true;
}

Step 4: Configure

Update your configuration file to use the new scheduler:

runtime:
  local_sched: "my_scheduler"
  num_threads: 4
  queue_depth: 1024

DefaultScheduler Example

The DefaultScheduler provides a reference implementation with I/O-size-based routing and task group affinity.

Worker Partitioning

  • Worker 0: Scheduler worker — handles metadata and small I/O (< 4KB)
  • Workers 1..N-2: I/O workers — handle large I/O via round-robin
  • Worker N-2: Also serves as the GPU worker (polls GPU queues)
  • Worker N-1: Network worker — handles all Send/Recv/ClientSend/ClientRecv tasks
  • SetNumSchedQueues(1) — client tasks all go to the scheduler worker initially

Dynamic Scheduling via ScheduleTask

The DefaultScheduler works hand-in-hand with Container::ScheduleTask. When a task arrives with a Dynamic pool query, the container's ScheduleTask resolves it before RuntimeMapTask picks the worker:

  1. Task submitted with PoolQuery::Dynamic()
  2. RouteTask calls container->ScheduleTask(task) → returns e.g. DirectHash(block_id)
  3. ResolvePoolQuery resolves DirectHash to a physical node + container
  4. IsTaskLocal checks if the target is this node
  5. RouteLocal calls RuntimeMapTask to pick the worker

This two-level design means containers control what gets executed where, while the scheduler controls how workers are utilized.

Task Group Affinity

The DefaultScheduler implements task group affinity to pin related tasks to the same worker. Each Container maintains a task_group_map_ mapping group IDs to workers:

  1. When RuntimeMapTask sees a task with a non-null task_group_, it checks the container's task_group_map_
  2. If the group is already mapped to a worker, the task goes to that worker
  3. If not, normal routing selects a worker and records the mapping

This ensures tasks in the same group (e.g., operations on the same file handle) execute on the same worker, avoiding lock contention and improving cache locality.

I/O-Size Routing

  • Tasks with stat_.io_size_ >= 4096 → round-robin across I/O workers
  • Tasks with stat_.io_size_ < 4096 → scheduler worker (worker 0)
  • Network tasks (admin pool methods 14, 15, 20, 21) → network worker

Force Enqueue

RouteTask accepts a force_enqueue parameter (default false). When true, RouteLocal always enqueues the task to the destination worker's lane, even if the destination is the current worker. This is used by SendRuntime (non-worker thread path) which cannot execute tasks directly and must always enqueue.

Code Reference

See implementation in:

  • Header: context-runtime/include/chimaera/scheduler/default_sched.h
  • Implementation: context-runtime/src/scheduler/default_sched.cc

Best Practices

1. Always Update IpcManager in DivideWorkers

void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) {
  // ... partition workers ...

  IpcManager *ipc = CHI_IPC;
  if (ipc) {
    ipc->SetNumSchedQueues(num_client_lanes);
    if (net_worker_) {
      ipc->SetNetLane(net_worker_->GetLane());
    }
  }
}

Why: Clients use GetNumSchedQueues() to map tasks to lanes. If this doesn't match the actual number of workers/lanes, tasks will be mapped to non-existent or wrong workers.

2. Route Network Tasks to the Network Worker

Both ClientMapTask and RuntimeMapTask should route Send/Recv tasks (methods 14/15/20/21 from admin pool) to the dedicated network worker (last worker). This prevents network I/O from blocking task processing workers.

3. Implement Task Group Affinity

Use container->task_group_map_ to pin related tasks to the same worker. Protect map access with container->task_group_lock_ (read lock for lookups, write lock for updates).

4. Validate Lane IDs

u32 MyScheduler::ClientMapTask(IpcManager *ipc_manager, const Future<Task> &task) {
  u32 num_lanes = ipc_manager->GetNumSchedQueues();
  if (num_lanes == 0) return 0;

  u32 lane = ComputeLane(...);
  return lane % num_lanes;  // Ensure lane is in valid range
}

5. Handle Null Pointers

void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) {
  if (!work_orch) return;
  // ... proceed ...
}

u32 MyScheduler::RuntimeMapTask(Worker *worker, const Future<Task> &task,
                                Container *container) {
  return worker ? worker->GetId() : 0;
}

6. Consider Thread Safety

If your scheduler maintains shared state accessed by multiple workers:

  • Use atomic operations for counters (e.g., next_io_idx_)
  • Use CoRwLock for complex data structures (e.g., task_group_map_)
  • Prefer lock-free designs when possible

7. Test with Different Configurations

Test your scheduler with various num_threads values:

  • Single thread (num_threads = 1): single worker serves dual role
  • Small (num_threads = 2-4)
  • Large (num_threads = 16+)

Integration Points

Singletons and Macros

Access runtime components via global macros:

// Configuration
ConfigManager *config = CHI_CONFIG_MANAGER;
u32 num_threads = config->GetNumThreads();

// IPC Manager
IpcManager *ipc = CHI_IPC;
u32 num_lanes = ipc->GetNumSchedQueues();

// System Info
auto *sys_info = HSHM_SYSTEM_INFO;
pid_t pid = sys_info->pid_;

// Thread Model
auto tid = HSHM_THREAD_MODEL->GetTid();

Worker Access

Access workers through WorkOrchestrator:

u32 total_workers = work_orch->GetTotalWorkerCount();
Worker *worker = work_orch->GetWorker(worker_id);

// Get worker properties
u32 id = worker->GetId();
TaskLane *lane = worker->GetLane();

Container Access

Containers expose scheduling-related state:

// Task group affinity map (per container)
container->task_group_map_   // std::unordered_map<int64_t, Worker*>
container->task_group_lock_  // CoRwLock protecting the map

// Override ScheduleTask for custom dynamic scheduling
PoolQuery MyContainer::ScheduleTask(const hipc::FullPtr<Task> &task) {
  // Transform Dynamic queries into concrete routing modes
  return PoolQuery::DirectHash(ComputeHash(task));
}

Logging

Use Hermes logging macros:

HLOG(kInfo, "Scheduler initialized with {} workers", num_workers);
HLOG(kDebug, "Mapping task to lane {}", lane_id);
HLOG(kWarning, "Worker {} has empty queue", worker_id);
HLOG(kError, "Invalid configuration: {}", error_msg);

Configuration Access

Read configuration values:

ConfigManager *config = CHI_CONFIG_MANAGER;
u32 num_threads = config->GetNumThreads();
u32 queue_depth = config->GetQueueDepth();
std::string sched_name = config->GetLocalSched();

Advanced Topics

Custom Container Scheduling

Override ScheduleTask to implement application-specific routing:

class DistributedKVStore : public Container {
 public:
  PoolQuery ScheduleTask(const hipc::FullPtr<Task> &task) override {
    switch (task->method_) {
      case kGet:
      case kPut:
      case kDelete: {
        // Route key-value ops to the node owning the key's hash partition
        u64 key_hash = ExtractKeyHash(task);
        return PoolQuery::DirectHash(key_hash);
      }
      case kScan:
        // Range scans may span multiple nodes
        return PoolQuery::Range(ExtractStartKey(task), ExtractEndKey(task));
      case kCreateIndex:
        // Metadata ops broadcast to all nodes
        return PoolQuery::Broadcast();
      default:
        return PoolQuery::Local();
    }
  }
};

Work Stealing

Implement work stealing in RebalanceWorker:

void MyScheduler::RebalanceWorker(Worker *worker) {
  TaskLane *my_lane = worker->GetLane();
  if (my_lane->Empty()) {
    for (Worker *victim : io_workers_) {
      if (victim == worker) continue;

      TaskLane *victim_lane = victim->GetLane();
      if (!victim_lane->Empty()) {
        Future<Task> stolen_task;
        if (victim_lane->Pop(stolen_task)) {
          my_lane->Push(stolen_task);
          break;
        }
      }
    }
  }
}

Priority-Based Scheduling

Use task priorities for scheduling:

void MyScheduler::DivideWorkers(WorkOrchestrator *work_orch) {
  u32 total = work_orch->GetTotalWorkerCount();
  u32 high_prio_count = total / 2;

  for (u32 i = 0; i < high_prio_count; ++i) {
    high_priority_workers_.push_back(work_orch->GetWorker(i));
  }

  for (u32 i = high_prio_count; i < total - 1; ++i) {
    low_priority_workers_.push_back(work_orch->GetWorker(i));
  }

  // Network worker
  net_worker_ = work_orch->GetWorker(total - 1);
}

Troubleshooting

Tasks Not Being Processed

Symptom: Tasks submitted but never execute

Check:

  1. Did you call IpcManager::SetNumSchedQueues() in DivideWorkers?
  2. Are all workers getting lanes via WorkOrchestrator's 1:1 mapping?
  3. Does ClientMapTask return lane IDs in valid range?
  4. Is IpcManager::SetNetLane() called for the network worker?

Client Mapping Errors

Symptom: Assertion failures or crashes in ClientMapTask

Check:

  1. Is returned lane ID in range [0, num_sched_queues)?
  2. Did you check for num_lanes == 0?
  3. Are you using modulo to wrap lane IDs?

Tasks Hang After Re-enqueue

Symptom: Tasks enqueued to a different worker never complete

Check:

  1. When RouteLocal re-enqueues to a different worker, ProcessNewTask on the destination worker updates the RunContext's worker_id_, lane_, and event_queue_ to match the new worker. If RunContext fields are stale, subtask completion events go to the wrong worker.
  2. Check that TASK_STARTED is respected — re-enqueued tasks with live coroutines must be resumed, not restarted.

Worker Crashes

Symptom: Workers crash during initialization

Check:

  1. Are you checking for null pointers?
  2. Does DivideWorkers handle total_workers < expected?
  3. Is the single-worker case handled (when total_workers == 1)?

References

  • Scheduler Interface: context-runtime/include/chimaera/scheduler/scheduler.h
  • DefaultScheduler: context-runtime/src/scheduler/default_sched.cc
  • Container Base: context-runtime/include/chimaera/container.h
  • IpcManager (RouteTask/RouteLocal): context-runtime/src/ipc_manager.cc
  • WorkOrchestrator: context-runtime/src/work_orchestrator.cc
  • Configuration: Configuration Reference