This policy ensures botBrains can keep critical services running, and recover them, during and after a disruption such as a provider outage, data loss, or loss of a team member. It defines recovery objectives, the technical strategy for staying available, and how botBrains communicates with customers while a disruption degrades service.
botBrains is not yet ISO 27001 certified. botBrains is building its information security management system (ISMS) and writing these policies in preparation for certification, and intends to have its controls attested.
Scope
This policy covers the production botBrains service (the API, background workers, model inference, and the data stores behind them) and the two-person team that operates it. botBrains is fully remote with no office or self-operated data center, so “facility loss” isn’t a meaningful failure mode. Continuity planning focuses on cloud infrastructure disruption, data loss, and personnel availability. The subprocessor list and the Technical and Organizational Measures detail the underlying provider redundancy.
Recovery objectives
botBrains classifies disruptions by severity and sets the following targets. botBrains validates these objectives against the restore tests defined in the Backup Policy.
| Severity | Example | Recovery Time Objective (RTO) | Recovery Point Objective (RPO) |
|---|---|---|---|
| Low | Single component degraded, redundancy absorbs it | 4 hours | Near zero |
| Medium | Loss of a provider availability zone or non-primary service | 12 hours | Minutes |
| High | Loss of a primary data store or region | 24 hours | Minutes, recovered from PITR |
These low RPO targets are achievable because the application database supports Point-in-Time Recovery (PITR), so botBrains can recover to a moment shortly before the disruption rather than to a nightly snapshot.
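As an illustration of how a restore test result could be checked against these targets, here is a minimal Python sketch. The RTO values are copied from the table above. The table states RPO only qualitatively ("Near zero", "Minutes"), so the numeric RPO bounds, the `RestoreResult` type, and the `meets_objectives` helper are assumptions made for this example, not part of the policy.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# RTO targets taken from the recovery objectives table.
RTO = {
    Severity.LOW: timedelta(hours=4),
    Severity.MEDIUM: timedelta(hours=12),
    Severity.HIGH: timedelta(hours=24),
}

# ASSUMPTION: the policy gives RPO only as "Near zero" or "Minutes".
# These numeric bounds are placeholders chosen for the example.
ASSUMED_RPO = {
    Severity.LOW: timedelta(minutes=1),
    Severity.MEDIUM: timedelta(minutes=15),
    Severity.HIGH: timedelta(minutes=15),
}


@dataclass(frozen=True)
class RestoreResult:
    """Hypothetical record of one measured restore test."""
    severity: Severity
    time_to_recover: timedelta   # measured from start of restore to service healthy
    data_loss_window: timedelta  # gap between the disruption and the recovered state


def meets_objectives(result: RestoreResult) -> bool:
    """Return True when both the RTO and the assumed RPO bound are met."""
    return (
        result.time_to_recover <= RTO[result.severity]
        and result.data_loss_window <= ASSUMED_RPO[result.severity]
    )


# Example: a high-severity restore that took 6 hours and lost 5 minutes of data.
test_run = RestoreResult(Severity.HIGH, timedelta(hours=6), timedelta(minutes=5))
print(meets_objectives(test_run))  # True
```

Keeping the targets in one mapping means a change to the table only needs to be mirrored in one place when test results are evaluated.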
Continuity strategy
botBrains stays available through provider-level redundancy and rapid, reproducible recovery rather than a manually staffed standby site.
| Capability | How botBrains provides it |
|---|---|
| Compute redundancy | API and worker hosts on Hetzner with load balancing and hot standby; failover to healthy hosts |
| Data durability and recovery | Managed database in AWS (Germany) with Point-in-Time Recovery (PITR) and write-ahead logging (WAL), plus durable object storage; database backups replicated to a secondary EU region in Ireland. See the Backup Policy |
| Provider failure | The provider operates multiple availability zones; botBrains recovers the affected service through a combination of infrastructure-as-code and tested manual restore procedures |
| Model inference | Multiple inference providers (AWS Bedrock, Azure OpenAI, OpenAI Enterprise) all under EU data residency, allowing failover between them |
| Configuration and code | Source code and deployment configuration in GitHub, reproducible to a clean environment |
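The model-inference failover listed in the table can be pictured as trying providers in a fixed order and falling through to the next one on error. The sketch below is illustrative only: `complete_with_failover`, the `Provider` type, and the stub callables are hypothetical names invented for this example, and real provider SDK calls are deliberately not shown.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical shape: a provider is a (name, call) pair, where call(prompt)
# returns the completion text or raises on failure.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured inference provider has failed."""


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, completion)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real clients (assumptions for the example).
def unavailable(prompt: str) -> str:
    raise TimeoutError("provider timed out")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


used, text = complete_with_failover(
    "hello", [("primary", unavailable), ("secondary", healthy)]
)
print(used, text)  # secondary echo: hello
```

A production version would likely add timeouts and distinguish retryable errors from request errors that no provider could serve, but the ordering-and-fallthrough idea is the same.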
Plan activation
A co-founder activates this plan when botBrains expects a disruption to exceed the low-severity RTO, when botBrains loses a primary data store, or when a security incident threatens availability. A security-driven disruption is also handled under the Incident Management Policy; this policy governs the availability and recovery dimension.
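The three activation triggers above combine as a simple "any of" rule. A minimal sketch, assuming the low-severity RTO of 4 hours from the recovery objectives table; the function and parameter names are hypothetical and exist only for this example:

```python
from datetime import timedelta

# Low-severity RTO from the recovery objectives table.
LOW_SEVERITY_RTO = timedelta(hours=4)


def should_activate_plan(
    expected_outage: timedelta,
    primary_data_store_lost: bool,
    security_incident_threatens_availability: bool,
) -> bool:
    """Return True if any one of the activation triggers applies."""
    return (
        expected_outage > LOW_SEVERITY_RTO
        or primary_data_store_lost
        or security_incident_threatens_availability
    )


# A two-hour degradation with no data-store loss does not activate the plan.
print(should_activate_plan(timedelta(hours=2), False, False))  # False
# Loss of a primary data store activates it regardless of expected duration.
print(should_activate_plan(timedelta(minutes=30), True, False))  # True
```

The decision itself remains a human judgment by a co-founder; the sketch only makes the stated criteria explicit.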
Primary decision authority during a disruption rests with the CISO (Liam). The other co-founder (Ben) serves as backup when the CISO is unavailable.
Both co-founders keep a local, offline copy of this plan together with critical provider and emergency contact information, so the plan and its contacts remain accessible when internet access is unavailable during a disaster.
Personnel continuity
botBrains is a two-person team, so loss or unavailability of one person is a material risk. botBrains mitigates this by giving each co-founder administrative access to the systems under their own credentials with MFA, keeping infrastructure reproducible from code, and documenting operational procedures so either co-founder can recover the service alone. The Access Control Policy governs access provisioning and revocation.
Communication
botBrains communicates service status to customers through the public status page, updated during and after any disruption that affects availability. Direct customer notification by email follows for incidents that warrant it, and any personal-data breach follows the Breach Notification Policy.
Testing
botBrains tests recovery at least annually. Testing combines a tabletop walkthrough of the activation and communication steps with a technical restore test that recovers the application database to a non-production environment and confirms that the recovery meets the RTO and RPO targets above. botBrains records each test in the Employees Only: Backup & DR Test log and feeds findings back into this policy.
ISO 27001 mapping
| Control | Coverage |
|---|---|
| A.5.29 Information security during disruption | Maintaining security controls while recovering |
| A.5.30 ICT readiness for business continuity | Recovery objectives, strategy, and testing |
| A.8.13 Information backup | Recovery from backups (see the Backup Policy) |
| A.8.14 Redundancy of information processing facilities | Load balancing, hot standby, reproducible-from-code recovery, and multi-provider model-inference failover |
| Clause 8.1 Operational planning and control | Activation, roles, and review of continuity arrangements |
Review
The CISO owns this policy and reviews it at least annually, after each recovery test, and after any disruption or material architecture change.