# Business Continuity and Disaster Recovery

> Ensures botBrains can continue and recover critical services during and after a disruption.

export const PolicyVersion = ({version, effective}) => <p><strong>Version {version}</strong> · Effective {effective}. Change history is tracked in version control.</p>;

This policy ensures botBrains can continue and recover critical services during and after a disruption such as a provider outage, data loss, or loss of a team member. It defines recovery objectives, the technical strategy for staying available, and how botBrains communicates with customers while a disruption degrades service.

<Warning>
  botBrains is **not yet ISO 27001 certified**. We are preparing our ISMS and writing these policies as part of pursuing certification, and we fully intend to get our controls attested.
</Warning>

<PolicyVersion version="1.0" effective="July 1, 2026" />

## Scope

This policy covers the production botBrains service (the API, background workers, model inference, and the data stores behind them) and the two-person team that operates it. botBrains is fully remote with no office or self-operated data center, so "facility loss" isn't a meaningful failure mode. Continuity planning focuses on cloud infrastructure disruption, data loss, and personnel availability. The [subprocessor list](/trust/subprocessors) and the [Technical and Organizational Measures](/trust/toms) detail the underlying provider redundancy.

## Recovery objectives

botBrains classifies disruptions by severity and sets the targets below. The restore tests defined in the [Backup Policy](/trust/policies/backup-policy) validate these objectives.

| Severity | Example                                                     | Recovery Time Objective (RTO) | Recovery Point Objective (RPO) |
| -------- | ----------------------------------------------------------- | ----------------------------- | ------------------------------ |
| Low      | Single component degraded, redundancy absorbs it            | 4 hours                       | Near zero                      |
| Medium   | Loss of a provider availability zone or non-primary service | 12 hours                      | Minutes                        |
| High     | Loss of a primary data store or region                      | 24 hours                      | Minutes, recovered from PITR   |

An RPO of minutes or less is achievable because the application database supports Point-in-Time Recovery (PITR), so botBrains can recover to a moment shortly before the disruption rather than to a nightly snapshot.
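To make the difference concrete, the following sketch compares the data-loss window of a nightly snapshot with that of a PITR restore. All timestamps, the snapshot schedule, and the one-minute PITR margin are assumptions chosen for illustration, not values from this policy.

```python
from datetime import datetime, timedelta


def data_loss_window(disruption: datetime, restore_point: datetime) -> timedelta:
    """Writes made after the restore point and before the disruption are lost."""
    if restore_point > disruption:
        raise ValueError("restore point must not be later than the disruption")
    return disruption - restore_point


# Hypothetical timestamps, for illustration only.
disruption = datetime(2026, 7, 1, 14, 30)
nightly_snapshot = datetime(2026, 7, 1, 2, 0)    # assumed snapshot schedule
pitr_target = disruption - timedelta(minutes=1)  # assumed PITR margin

snapshot_loss = data_loss_window(disruption, nightly_snapshot)
pitr_loss = data_loss_window(disruption, pitr_target)

print(snapshot_loss)  # 12:30:00
print(pitr_loss)      # 0:01:00
```

Under these assumptions, restoring from the snapshot would lose twelve and a half hours of writes, while the PITR restore loses about one minute.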

## Continuity strategy

botBrains stays available through provider-level redundancy and rapid, reproducible recovery rather than a manually staffed standby site.

| Capability                   | How botBrains provides it                                                                                                                                                                                 |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Compute redundancy           | API and worker hosts on Hetzner with load balancing and hot standby; failover to healthy hosts                                                                                                            |
| Data durability and recovery | Managed database in AWS (Germany) with PITR and WAL, plus durable object storage; database backups replicated to a secondary EU region in Ireland. See the [Backup Policy](/trust/policies/backup-policy) |
| Provider failure             | Provider operates multiple availability zones; recover the affected service through a combination of infrastructure-as-code and tested manual restore procedures                                          |
| Model inference              | Multiple inference providers (AWS Bedrock, Azure OpenAI, OpenAI Enterprise) all under EU data residency, allowing failover between them                                                                   |
| Configuration and code       | Source code and deployment configuration in GitHub, reproducible to a clean environment                                                                                                                   |
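Failover between inference providers can be pictured as an ordered list of clients tried in turn. The sketch below is illustrative only: the provider order, the `complete_with_failover` helper, and the stub clients are assumptions, and real SDK calls are deliberately left out.

```python
from typing import Callable, Sequence

# A provider is a (name, call) pair; `call` takes a prompt and returns text.
Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed."""


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub clients standing in for real SDK calls (hypothetical behaviour).
def bedrock_stub(prompt: str) -> str:
    raise ConnectionError("simulated outage")


def azure_openai_stub(prompt: str) -> str:
    return f"echo: {prompt}"


def openai_enterprise_stub(prompt: str) -> str:
    return f"echo: {prompt}"


providers = [
    ("aws-bedrock", bedrock_stub),
    ("azure-openai", azure_openai_stub),
    ("openai-enterprise", openai_enterprise_stub),
]
used, reply = complete_with_failover("hello", providers)
print(used, reply)  # azure-openai echo: hello
```

Collecting the per-provider errors before raising keeps the failure of every provider visible in a single message, which is useful when recording the incident afterwards.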

## Plan activation

A co-founder activates this plan when botBrains expects a disruption to exceed the low-severity RTO, when botBrains loses a primary data store, or when a security incident threatens availability. A security-driven disruption is also handled under the [Incident Management Policy](/trust/policies/incident-management-policy); this policy governs the availability and recovery dimension.
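The three activation triggers can be expressed as a single check. This is a minimal sketch, assuming a hypothetical `Disruption` record; the 4-hour threshold comes from the low-severity row of the recovery objectives table, and the final decision still rests with a co-founder.

```python
from dataclasses import dataclass
from datetime import timedelta

LOW_SEVERITY_RTO = timedelta(hours=4)  # low-severity RTO from the objectives table


@dataclass(frozen=True)
class Disruption:
    """Hypothetical summary of what is known about an ongoing disruption."""
    expected_duration: timedelta
    primary_data_store_lost: bool = False
    security_incident_threatens_availability: bool = False


def should_activate_plan(d: Disruption) -> bool:
    """Return True if any of the three activation triggers applies."""
    return (
        d.expected_duration > LOW_SEVERITY_RTO
        or d.primary_data_store_lost
        or d.security_incident_threatens_availability
    )


print(should_activate_plan(Disruption(timedelta(hours=1))))  # False
print(should_activate_plan(Disruption(timedelta(hours=6))))  # True
print(should_activate_plan(Disruption(timedelta(minutes=10), primary_data_store_lost=True)))  # True
```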

Primary decision authority during a disruption rests with the CISO (Liam); the other co-founder (Ben) serves as backup when the CISO is unavailable.

Both co-founders keep a local, offline copy of this plan together with critical provider and emergency contact information, so the plan and its contacts remain accessible when internet access is unavailable during a disaster.

## Personnel continuity

botBrains is a two-person team, so loss or unavailability of one person is a material risk. botBrains mitigates this by giving each co-founder administrative access to the systems under their own credentials with MFA, keeping infrastructure reproducible from code, and documenting operational procedures so either co-founder can recover the service alone. The [Access Control Policy](/trust/policies/access-control-policy) governs access provisioning and revocation.

## Communication

botBrains communicates service status to customers through the public [status page](https://status.botbrains.io), updated during and after any disruption that affects availability. Direct customer notification by email follows for incidents that warrant it, and any personal-data breach follows the [Breach Notification Policy](/trust/policies/breach-notification-policy).

## Testing

botBrains tests recovery at least annually. Testing combines a tabletop walkthrough of the activation and communication steps with a technical restore test that recovers the application database to a non-production environment and confirms the RTO and RPO targets above. botBrains records each test in the [Employees Only: Backup & DR Test log](https://app.notion.com/p/390481da93cf81e38b0bcb91685c51d2) and feeds findings back into this policy.
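A restore test result can be compared against the targets with simple timestamp arithmetic. The sketch below assumes a hypothetical `RestoreTest` record and invented timestamps. The 24-hour RTO is the high-severity value from the table; the 5-minute RPO is an assumed numeric stand-in for "Minutes".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RestoreTest:
    """Hypothetical record of one technical restore test."""
    disruption_at: datetime    # simulated moment of failure
    restored_to: datetime      # point in time the database was recovered to
    service_back_at: datetime  # moment the restored database was verified usable


def evaluate(test: RestoreTest, rto: timedelta, rpo: timedelta) -> dict:
    """Compute the achieved recovery time and data-loss window and compare to targets."""
    achieved_rto = test.service_back_at - test.disruption_at
    achieved_rpo = test.disruption_at - test.restored_to
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto,
        "rpo_met": achieved_rpo <= rpo,
    }


# Invented timestamps, for illustration only.
test = RestoreTest(
    disruption_at=datetime(2026, 7, 1, 10, 0),
    restored_to=datetime(2026, 7, 1, 9, 58),
    service_back_at=datetime(2026, 7, 1, 13, 15),
)
result = evaluate(test, rto=timedelta(hours=24), rpo=timedelta(minutes=5))
print(result["achieved_rto"], result["rto_met"])  # 3:15:00 True
print(result["achieved_rpo"], result["rpo_met"])  # 0:02:00 True
```

Recording the achieved values alongside the pass or fail flags gives each test log entry a trend to compare against the previous year.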

## ISO 27001 mapping

| Control                                                | Coverage                                                                                                  |
| ------------------------------------------------------ | --------------------------------------------------------------------------------------------------------- |
| A.5.29 Information security during disruption          | Maintaining security controls while recovering                                                            |
| A.5.30 ICT readiness for business continuity           | Recovery objectives, strategy, and testing                                                                |
| A.8.13 Information backup                              | Recovery from backups (see the [Backup Policy](/trust/policies/backup-policy))                            |
| A.8.14 Redundancy of information processing facilities | Load balancing, hot standby, reproducible-from-code recovery, and multi-provider model-inference failover |
| Clause 8.1 Operational planning and control            | Activation, roles, and review of continuity arrangements                                                  |

## Review

The CISO owns this policy and reviews it at least annually, after each recovery test, and after any disruption or material architecture change.
