qb
2.0.0.0
C++17 Actor Framework
A practical guide to designing robust actor systems in QB by effectively handling errors and implementing resilience patterns.
Building concurrent and distributed systems with actors requires a thoughtful approach to error handling and fault tolerance. While the QB Actor Framework provides isolation that can contain failures, it's up to the developer to implement strategies for detecting, managing, and recovering from errors. This guide outlines key techniques for creating resilient QB applications.
Each actor should be responsible for handling errors that occur within its own operational scope where possible.
Standard C++ Exception Handling (try...catch):
```cpp
#include <stdexcept> // std::runtime_error, std::invalid_argument
// QB headers providing qb::Actor, qb::Event, qb::string and qb::io::cout are assumed to be included.

struct ProcessCommandReq : qb::Event {
    qb::string<128> command;
};

struct CommandStatusRes : qb::Event {
    bool success;
    qb::string<256> info;
};

class CommandHandlerActor : public qb::Actor {
public:
    bool onInit() override {
        registerEvent<ProcessCommandReq>(*this);
        registerEvent<qb::KillEvent>(*this);
        return true;
    }

    void on(const ProcessCommandReq& event) {
        try {
            if (event.command == "CRASH_NOW") {
                throw std::runtime_error("Simulated critical failure in command processing");
            }
            if (event.command.empty()) {
                throw std::invalid_argument("Command cannot be empty");
            }
            // ... process command ...
            qb::io::cout() << "Actor [" << id() << "] processed command: "
                           << event.command.c_str() << ".\n";
            push<CommandStatusRes>(event.getSource(), true, "Command processed successfully");
        } catch (const std::invalid_argument& ia) {
            // Caller error: report it and keep running.
            qb::io::cout() << "Actor [" << id() << "] Error: Invalid command argument - "
                           << ia.what() << ".\n";
            push<CommandStatusRes>(event.getSource(), false, "Invalid command format");
        } catch (const std::exception& e) {
            // Any other failure: report a generic error to the requester.
            qb::io::cout() << "Actor [" << id() << "] Error: Failed to process command '"
                           << event.command.c_str() << "': " << e.what() << ".\n";
            push<CommandStatusRes>(event.getSource(), false, "Internal processing error");
            // For a severe, unrecoverable error, the actor might decide to terminate:
            // if (isUnrecoverable(e)) {
            //     qb::io::cout() << "Actor [" << id() << "]: Unrecoverable error, terminating.\n";
            //     kill();
            // }
        }
    }

    void on(const qb::KillEvent& /*event*/) {
        kill();
    }
};
```
This is a critical aspect of QB's behavior:
(Reference: test-main.cpp includes tests for hasError() functionality. test-actor-error-handling.cpp simulates various actor error conditions.)
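Whatever the framework does once an exception escapes a handler, a handler that never lets one escape sidesteps the problem. The following is a minimal, framework-independent sketch in plain C++ of such a boundary. `HandlerResult` and `guarded` are hypothetical names introduced for illustration; they are not part of QB.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical result type (not a QB type): outcome of one handler invocation.
struct HandlerResult {
    bool ok;
    std::string error; // empty when ok is true
};

// Runs `body` and converts any escaping exception into a HandlerResult,
// so nothing propagates past the handler boundary.
inline HandlerResult guarded(const std::function<void()>& body) {
    try {
        body();
        return {true, {}};
    } catch (const std::exception& e) {
        return {false, e.what()};
    } catch (...) {
        // Non-std exceptions carry no message; report a fixed marker instead.
        return {false, "unknown exception"};
    }
}
```

Inside an actor, the body of an `on(...)` handler could be passed to `guarded` as a lambda, with the returned `HandlerResult` deciding whether to reply with an error, report to a supervisor, or call `kill()`.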
QB-Core does not provide a built-in, Erlang-style supervisor hierarchy. Instead, you implement supervision using standard actor patterns. This gives you flexibility but requires explicit design.
Conceptual Supervisor Snippet:

```cpp
// Events for supervision
struct PingWorkerEvent : qb::Event {};
struct PongWorkerResponse : qb::Event {};
struct WorkerTimeoutCheck : qb::Event { qb::ActorId worker_id; }; // Self-sent by supervisor
struct WorkerErrorReport : qb::Event { qb::string<128> error_details; };

class WorkerSupervisor : public qb::Actor {
private:
    std::map<qb::ActorId, qb::TimePoint> _pending_pings; // Worker ID -> Ping Sent Time
    std::vector<qb::ActorId> _worker_pool;
    const qb::Duration PING_TIMEOUT = qb::literals::operator""_s(5); // 5 seconds

public:
    bool onInit() override {
        // ... create worker actors and store their IDs in _worker_pool ...
        // for (qb::ActorId worker_id : _worker_pool) { sendPingAndScheduleCheck(worker_id); }
        registerEvent<PongWorkerResponse>(*this);
        registerEvent<WorkerTimeoutCheck>(*this);
        registerEvent<WorkerErrorReport>(*this);
        registerEvent<qb::KillEvent>(*this);
        return true;
    }

    void sendPingAndScheduleCheck(qb::ActorId worker_id) {
        if (!isActorKnownAndAlive(worker_id)) return; // Simplified check
        push<PingWorkerEvent>(worker_id);
        _pending_pings[worker_id] = qb::HighResTimePoint::now();

        // After PING_TIMEOUT, ask ourselves whether the pong ever arrived.
        qb::io::async::callback([this, worker_id]() {
            if (this->is_alive())
                this->push<WorkerTimeoutCheck>(this->id(), worker_id);
        }, PING_TIMEOUT.seconds_float());
    }

    void on(const PongWorkerResponse& event) {
        _pending_pings.erase(event.getSource()); // Pong received, clear pending
        // Optionally, schedule next ping after an interval
        // qb::io::async::callback([this, sid = event.getSource()]() {
        //     if (this->is_alive()) sendPingAndScheduleCheck(sid);
        // }, 30.0);
    }

    void on(const WorkerTimeoutCheck& event) {
        // Still pending means no pong arrived within the timeout.
        if (_pending_pings.count(event.worker_id)) {
            qb::io::cout() << "Supervisor: Worker " << event.worker_id << " timed out!\n";
            _pending_pings.erase(event.worker_id);
            handleWorkerFailure(event.worker_id, "Ping Timeout");
        }
    }

    void on(const WorkerErrorReport& event) {
        qb::io::cout() << "Supervisor: Worker " << event.getSource()
                       << " reported error: " << event.error_details.c_str() << ".\n";
        handleWorkerFailure(event.getSource(), event.error_details.c_str());
    }

    void handleWorkerFailure(qb::ActorId failed_worker_id, const qb::string<128>& reason) {
        // Remove from active pool, potentially restart, log, etc.
        // Example:
        // _worker_pool.erase(std::remove(_worker_pool.begin(), _worker_pool.end(), failed_worker_id),
        //                    _worker_pool.end());
        // auto new_worker_id = addActor<MyWorkerType>(getIndex(), /*..args..*/);
        // _worker_pool.push_back(new_worker_id);
        // sendPingAndScheduleCheck(new_worker_id);
    }

    // ... KillEvent handler to stop pings and workers ...

    bool isActorKnownAndAlive(qb::ActorId /*id*/) { return true; } // Placeholder
};
```
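The ping/timeout bookkeeping in the supervisor can be isolated from the actor so it is testable without timers or a running engine. The sketch below assumes plain `std::chrono` in place of QB's time types and uses an integer `WorkerId` as a stand-in for `qb::ActorId`; `HeartbeatTracker` is a hypothetical helper, not a QB class.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <vector>

using Clock = std::chrono::steady_clock;
using WorkerId = std::uint32_t; // stand-in for qb::ActorId in this sketch

// Hypothetical helper: tracks outstanding pings and reports which have expired.
class HeartbeatTracker {
public:
    explicit HeartbeatTracker(Clock::duration timeout) : _timeout(timeout) {}

    // Record that a ping was sent to `worker` at time `now`.
    void pingSent(WorkerId worker, Clock::time_point now) { _pending[worker] = now; }

    // A pong clears the pending entry; returns false if none was pending.
    bool pongReceived(WorkerId worker) { return _pending.erase(worker) > 0; }

    // Remove and return every worker whose ping is at least `timeout` old.
    std::vector<WorkerId> collectTimedOut(Clock::time_point now) {
        std::vector<WorkerId> expired;
        for (auto it = _pending.begin(); it != _pending.end();) {
            if (now - it->second >= _timeout) {
                expired.push_back(it->first);
                it = _pending.erase(it);
            } else {
                ++it;
            }
        }
        return expired;
    }

private:
    Clock::duration _timeout;
    std::map<WorkerId, Clock::time_point> _pending;
};
```

Passing the current time in as a parameter, rather than reading the clock inside the tracker, keeps the logic deterministic; a supervisor actor would call `collectTimedOut` from its timeout-check handler and feed each returned ID to its failure routine.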
Actors that perform network or file I/O using qb::io::use<> helpers must be prepared to handle I/O-related errors, typically signaled by qb-io events:
(Reference: Client actors in chat_tcp/client/ClientActor.cpp and message_broker/client/ClientActor.cpp demonstrate reconnection logic in on(event::disconnected&).)
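When reconnecting from a disconnect handler, it usually helps to space out attempts so a dead peer is not hammered in a tight loop. The following sketch shows one common policy, capped exponential backoff, in plain C++. It is an assumption for illustration and not necessarily what the referenced example clients do; `ReconnectBackoff` is a hypothetical helper, not a QB API.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper: computes the wait before each reconnection attempt.
class ReconnectBackoff {
public:
    using ms = std::chrono::milliseconds;

    // `base` is the first delay; delays double on each attempt up to `cap`.
    ReconnectBackoff(ms base, ms cap)
        : _base(std::min(base, cap)), _cap(cap), _next(_base) {}

    // Returns the delay for the upcoming attempt and advances the schedule.
    ms nextDelay() {
        const ms delay = _next;
        _next = std::min<ms>(_cap, _next * 2);
        return delay;
    }

    // Call after a successful connection so the next failure starts from `base`.
    void reset() { _next = _base; }

private:
    ms _base;
    ms _cap;
    ms _next;
};
```

An actor could hold one `ReconnectBackoff` per connection, schedule its next connect attempt after `nextDelay()` whenever it is notified of a disconnect, and call `reset()` once the connection is re-established.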
By combining these strategies—internal actor robustness, awareness of framework behavior for unhandled exceptions, application-level supervision, and proper handling of I/O events—you can build QB actor systems that are significantly more resilient to failures and easier to maintain.
(Next: QB Framework: Effective Resource Management, which covers managing actor and system resources.)