This policy ensures botBrains can continue and recover critical services during and after a disruption such as a provider outage, data loss, or loss of a team member. It defines recovery objectives, the technical strategy for staying available, and how botBrains communicates with customers while a disruption degrades service.
botBrains is not yet ISO 27001 certified. We are preparing our ISMS and writing these policies as part of pursuing certification, and we fully intend to get our controls attested.

Scope

This policy covers the production botBrains service (the API, background workers, model inference, and the data stores behind them) and the two-person team that operates it. botBrains is fully remote with no office or self-operated data center, so “facility loss” isn’t a meaningful failure mode. Continuity planning focuses on cloud infrastructure disruption, data loss, and personnel availability. The subprocessor list and the Technical and Organizational Measures detail the underlying provider redundancy.

Recovery objectives

botBrains classifies disruptions by severity and sets the following targets. botBrains validates these objectives against the restore tests defined in the Backup Policy.
Severity | Example | Recovery Time Objective (RTO) | Recovery Point Objective (RPO)
Low | Single component degraded, redundancy absorbs it | 4 hours | Near zero
Medium | Loss of a provider availability zone or non-primary service | 12 hours | Minutes
High | Loss of a primary data store or region | 24 hours | Minutes, recovered from PITR
The low RPO is achievable because the application database supports Point-in-Time Recovery, so botBrains can recover to a moment shortly before the disruption rather than to a nightly snapshot.
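
As an illustration only, the objectives above can be written down as data so that a restore test result is checked against them mechanically. This is a minimal sketch, not botBrains tooling: the names `Severity`, `RestoreResult`, and `meets_objectives` are hypothetical, and the numeric RPO ceilings are assumptions, because the policy states RPO only qualitatively ("near zero", "minutes").

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# RTO targets taken from the recovery objectives table.
RTO = {
    Severity.LOW: timedelta(hours=4),
    Severity.MEDIUM: timedelta(hours=12),
    Severity.HIGH: timedelta(hours=24),
}

# ASSUMPTION: the policy gives RPO only as "near zero" or "minutes".
# These numeric ceilings are placeholders chosen for illustration.
ASSUMED_RPO = {
    Severity.LOW: timedelta(minutes=1),
    Severity.MEDIUM: timedelta(minutes=15),
    Severity.HIGH: timedelta(minutes=15),
}


@dataclass
class RestoreResult:
    """Outcome of one restore test (hypothetical record shape)."""
    severity: Severity
    time_to_recover: timedelta   # measured from disruption to service restored
    data_loss_window: timedelta  # gap between last recoverable point and disruption


def meets_objectives(result: RestoreResult) -> bool:
    """Return True when both the RTO and the assumed RPO ceiling are met."""
    return (
        result.time_to_recover <= RTO[result.severity]
        and result.data_loss_window <= ASSUMED_RPO[result.severity]
    )
```

A restore that takes six hours with five minutes of data loss would pass at high severity but fail at low severity, where the RTO is four hours.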

Continuity strategy

botBrains stays available through provider-level redundancy and rapid, reproducible recovery rather than a manually staffed standby site.
Capability | How botBrains provides it
Compute redundancy | API and worker hosts on Hetzner with load balancing and hot standby; failover to healthy hosts
Data durability and recovery | Managed database in AWS (Germany) with PITR and WAL, plus durable object storage; database backups replicated to a secondary EU region in Ireland. See the Backup Policy
Provider failure | Provider operates multiple availability zones; recover the affected service through a combination of infrastructure-as-code and tested manual restore procedures
Model inference | Multiple inference providers (AWS Bedrock, Azure OpenAI, OpenAI Enterprise) all under EU data residency, allowing failover between them
Configuration and code | Source code and deployment configuration in GitHub, reproducible to a clean environment
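
The multi-provider inference failover described above could look roughly like the following sketch. It is a generic ordered-fallback pattern, not the botBrains implementation: `complete_with_failover` and `AllProvidersFailed` are hypothetical names, and each provider is represented by a plain callable rather than a real SDK client, so no vendor API is assumed.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(RuntimeError):
    """Raised when every configured inference provider has failed."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, completion).

    `providers` is an ordered list of (name, callable) pairs. Each callable
    is a stand-in for a real client call and is an assumption of this sketch.
    """
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider error triggers failover
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Keeping the provider order in configuration rather than in code lets an operator demote a degraded provider without a deploy. A production version would likely add timeouts and distinguish retryable errors from request errors, which this sketch does not do.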

Plan activation

A co-founder activates this plan when botBrains expects a disruption to exceed the low-severity RTO, when botBrains loses a primary data store, or when a security incident threatens availability. A security-driven disruption is also handled under the Incident Management Policy; this policy governs the availability and recovery dimension. Primary decision authority during a disruption rests with the CISO (Liam), and the other co-founder (Ben) serves as backup when the CISO is unavailable. Both co-founders keep a local, offline copy of this plan together with critical provider and emergency contact information, so the plan and its contacts remain accessible if internet access is lost during a disaster.
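
The activation criteria above reduce to a simple rule, sketched here for clarity. The function and parameter names are hypothetical, and the 4-hour threshold is the low-severity RTO from the recovery objectives table. The actual decision remains a human judgment by the person holding decision authority.

```python
from datetime import timedelta

# Low-severity RTO from the recovery objectives table.
LOW_SEVERITY_RTO = timedelta(hours=4)


def should_activate_plan(
    expected_downtime: timedelta,
    primary_data_store_lost: bool,
    security_incident_threatens_availability: bool,
) -> bool:
    """Return True when any one of the three activation criteria holds."""
    return (
        expected_downtime > LOW_SEVERITY_RTO
        or primary_data_store_lost
        or security_incident_threatens_availability
    )


def decision_authority(ciso_available: bool) -> str:
    """The CISO decides; the other co-founder is the backup."""
    return "CISO" if ciso_available else "backup co-founder"
```

For example, a two-hour expected outage with no data store loss and no security incident does not trigger activation, while the same outage combined with loss of a primary data store does.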

Personnel continuity

botBrains is a two-person team, so loss or unavailability of one person is a material risk. botBrains mitigates this by giving each co-founder administrative access to the systems under their own credentials with MFA, keeping infrastructure reproducible from code, and documenting operational procedures so either co-founder can recover the service alone. The Access Control Policy governs access provisioning and revocation.

Communication

botBrains communicates service status to customers through the public status page, updated during and after any disruption that affects availability. Direct customer notification by email follows for incidents that warrant it, and any personal-data breach follows the Breach Notification Policy.

Testing

botBrains tests recovery at least annually. Testing combines a tabletop walkthrough of the activation and communication steps with a technical restore test that recovers the application database to a non-production environment and confirms the RTO and RPO targets above. botBrains records each test in the Employees Only: Backup & DR Test log and feeds findings back into this policy.
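
The annual cadence can be tracked with a small due-date check, shown here as a sketch. The helper names are hypothetical, and reading "at least annually" as a maximum gap of 365 days between tests is an assumption of this example, not a definition from the policy.

```python
from datetime import date, timedelta

# ASSUMPTION: "at least annually" is read as at most 365 days between tests.
MAX_INTERVAL = timedelta(days=365)


def next_test_due(last_test: date) -> date:
    """Latest date by which the next recovery test should run."""
    return last_test + MAX_INTERVAL


def is_overdue(last_test: date, today: date) -> bool:
    """Return True when the last recorded test is older than the allowed gap."""
    return today > next_test_due(last_test)
```

A check like this could run against the most recent entry in the test log and raise a reminder before the deadline passes.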

ISO 27001 mapping

Control | Coverage
A.5.29 Information security during disruption | Maintaining security controls while recovering
A.5.30 ICT readiness for business continuity | Recovery objectives, strategy, and testing
A.8.13 Information backup | Recovery from backups (see the Backup Policy)
A.8.14 Redundancy of information processing facilities | Load balancing, hot standby, reproducible-from-code recovery, and multi-provider model-inference failover
Clause 8.1 Operational planning and control | Activation, roles, and review of continuity arrangements

Review

The CISO owns this policy and reviews it at least annually, after each recovery test, and after any disruption or material architecture change.