Business Continuity and Disaster Recovery

This policy ensures botBrains can continue and recover critical services during and after a disruption such as a provider outage, data loss, or loss of a team member. It defines recovery objectives, the technical strategy for staying available, and how botBrains communicates with customers while a disruption degrades service.

botBrains is not yet ISO 27001 certified. botBrains is preparing its ISMS and writing these policies as part of pursuing certification, and intends to have its controls attested.

Scope

This policy covers the production botBrains service (the API, background workers, model inference, and the data stores behind them) and the two-person team that operates it. botBrains is fully remote with no office or self-operated data center, so “facility loss” isn’t a meaningful failure mode. Continuity planning focuses on cloud infrastructure disruption, data loss, and personnel availability. The subprocessor list and the Technical and Organizational Measures detail the underlying provider redundancy.

Recovery objectives

botBrains classifies disruptions by severity and sets the following targets. botBrains validates these objectives against the restore tests defined in the Backup Policy.

Severity	Example	Recovery Time Objective (RTO)	Recovery Point Objective (RPO)
Low	Single component degraded, redundancy absorbs it	4 hours	Near zero
Medium	Loss of a provider availability zone or non-primary service	12 hours	Minutes
High	Loss of a primary data store or region	24 hours	Minutes, recovered from PITR

The low RPO is achievable because the application database supports Point-in-Time Recovery, so botBrains can recover to a moment shortly before the disruption rather than to a nightly snapshot.
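To make the difference concrete, the following sketch compares the data-loss window of a nightly snapshot with that of a point-in-time restore. The snapshot hour, the one-minute restore margin, and the disruption timestamp are illustrative assumptions, not botBrains settings.

```python
from datetime import datetime, timedelta, timezone

# Illustrative assumptions, not values taken from this policy.
SNAPSHOT_HOUR_UTC = 2                # hypothetical nightly snapshot time
PITR_MARGIN = timedelta(minutes=1)   # hypothetical margin before the disruption


def last_nightly_snapshot(disruption: datetime) -> datetime:
    """Return the most recent nightly snapshot at or before the disruption."""
    snapshot = disruption.replace(
        hour=SNAPSHOT_HOUR_UTC, minute=0, second=0, microsecond=0
    )
    if snapshot > disruption:
        snapshot -= timedelta(days=1)
    return snapshot


def pitr_target(disruption: datetime) -> datetime:
    """Return a restore point shortly before the disruption."""
    return disruption - PITR_MARGIN


def data_loss_window(disruption: datetime, restore_point: datetime) -> timedelta:
    """Return how much recent data is lost when restoring to restore_point."""
    return disruption - restore_point


# Hypothetical disruption in the middle of a working day.
disruption = datetime(2025, 1, 15, 14, 30, tzinfo=timezone.utc)
snapshot_loss = data_loss_window(disruption, last_nightly_snapshot(disruption))
pitr_loss = data_loss_window(disruption, pitr_target(disruption))
print(f"nightly snapshot loses {snapshot_loss}, PITR loses {pitr_loss}")
```

Under these assumptions the snapshot restore loses more than twelve hours of data, while the point-in-time restore loses one minute.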

Continuity strategy

botBrains stays available through provider-level redundancy and rapid, reproducible recovery rather than a manually staffed standby site.

Capability	How botBrains provides it
Compute redundancy	API and worker hosts on Hetzner with load balancing and hot standby; failover to healthy hosts
Data durability and recovery	Managed database in AWS (Germany) with PITR and WAL, plus durable object storage; database backups replicated to a secondary EU region in Ireland. See the Backup Policy
Provider failure	Provider operates multiple availability zones; recover the affected service through a combination of infrastructure-as-code and tested manual restore procedures
Model inference	Multiple inference providers (AWS Bedrock, Azure OpenAI, OpenAI Enterprise) all under EU data residency, allowing failover between them
Configuration and code	Source code and deployment configuration in GitHub, reproducible to a clean environment
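The model-inference failover in the table above can be sketched as an ordered list of providers tried in turn. This is a minimal illustration, not the botBrains implementation: the function names, the provider order, and the stub clients are hypothetical, and no real provider SDK is called.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every configured inference provider has failed."""


def infer_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real provider clients.
def bedrock_stub(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def azure_stub(prompt: str) -> str:
    return f"echo: {prompt}"


def openai_stub(prompt: str) -> str:
    return f"echo: {prompt}"


# The order below is an assumption; the policy does not rank the providers.
providers = [
    ("aws-bedrock", bedrock_stub),
    ("azure-openai", azure_stub),
    ("openai-enterprise", openai_stub),
]
used, answer = infer_with_failover("hello", providers)
print(used, answer)
```

Collecting the per-provider errors before raising keeps the failure trail available for the post-disruption review.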

Plan activation

A co-founder activates this plan when botBrains expects a disruption to exceed the low-severity RTO (4 hours), when botBrains loses a primary data store, or when a security incident threatens availability. A security-driven disruption is also handled under the Incident Management Policy; this policy governs the availability and recovery dimension. Primary decision authority during a disruption rests with the CISO (Liam), and the co-founder (Ben) serves as backup when the CISO is unavailable. Both co-founders keep a local, offline copy of this plan together with critical provider and emergency contact information, so the plan and its contacts remain accessible when internet access is unavailable during a disaster.
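The activation criteria can be expressed as a simple decision rule. The sketch below is illustrative only: the class and function names are hypothetical, and the 4-hour threshold is the low-severity RTO from the recovery objectives table.

```python
from dataclasses import dataclass
from datetime import timedelta

# Low-severity RTO from the recovery objectives table.
LOW_SEVERITY_RTO = timedelta(hours=4)


@dataclass(frozen=True)
class Disruption:
    """Hypothetical summary of a disruption at assessment time."""
    expected_duration: timedelta
    primary_data_store_lost: bool = False
    security_incident_threatens_availability: bool = False


def should_activate_plan(d: Disruption) -> bool:
    """Return True if any of the three activation criteria is met."""
    return (
        d.expected_duration > LOW_SEVERITY_RTO
        or d.primary_data_store_lost
        or d.security_incident_threatens_availability
    )


def decision_authority(ciso_available: bool) -> str:
    """Return who decides: the CISO, or the other co-founder as backup."""
    return "CISO" if ciso_available else "backup co-founder"


short_outage = Disruption(expected_duration=timedelta(hours=1))
data_store_loss = Disruption(
    expected_duration=timedelta(minutes=10), primary_data_store_lost=True
)
print(should_activate_plan(short_outage), should_activate_plan(data_store_loss))
```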

Personnel continuity

botBrains is a two-person team, so loss or unavailability of one person is a material risk. botBrains mitigates this by giving each co-founder administrative access to the systems under their own credentials with MFA, keeping infrastructure reproducible from code, and documenting operational procedures so either co-founder can recover the service alone. The Access Control Policy governs access provisioning and revocation.

Communication

botBrains communicates service status to customers through the public status page, updated during and after any disruption that affects availability. Direct customer notification by email follows for incidents that warrant it, and any personal-data breach follows the Breach Notification Policy.

Testing

botBrains tests recovery at least annually. Testing combines a tabletop walkthrough of the activation and communication steps with a technical restore test that recovers the application database to a non-production environment and confirms the RTO and RPO targets above. botBrains records each test in the Employees Only: Backup & DR Test log and feeds findings back into this policy.
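One way to confirm the targets after a restore test is to compare the measured restore duration and data-loss window against them. The sketch below is an assumption-laden illustration: the record fields and function name are hypothetical, and because the policy states the RPO only as "Minutes", the 15-minute bound is an assumed placeholder rather than a botBrains target.

```python
from dataclasses import dataclass
from datetime import timedelta

# RTO targets from the recovery objectives table.
RTO_TARGETS = {
    "low": timedelta(hours=4),
    "medium": timedelta(hours=12),
    "high": timedelta(hours=24),
}

# Assumption: the policy states the RPO only as "Minutes"; this numeric
# bound is a placeholder chosen for illustration.
ASSUMED_RPO_BOUND = timedelta(minutes=15)


@dataclass(frozen=True)
class RestoreTestResult:
    """Hypothetical record of one technical restore test."""
    severity: str
    restore_duration: timedelta
    data_loss_window: timedelta


def evaluate(result: RestoreTestResult) -> list[str]:
    """Return findings; an empty list means both targets were met."""
    findings = []
    rto = RTO_TARGETS[result.severity]
    if result.restore_duration > rto:
        findings.append(f"RTO missed: {result.restore_duration} > {rto}")
    if result.data_loss_window > ASSUMED_RPO_BOUND:
        findings.append(
            f"RPO missed: {result.data_loss_window} > {ASSUMED_RPO_BOUND}"
        )
    return findings


passing = RestoreTestResult("high", timedelta(hours=3), timedelta(minutes=2))
failing = RestoreTestResult("low", timedelta(hours=6), timedelta(hours=1))
print(evaluate(passing), evaluate(failing))
```

Any findings returned here would be the items to record in the test log and feed back into the policy.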

ISO 27001 mapping

Control	Coverage
A.5.29 Information security during disruption	Maintaining security controls while recovering
A.5.30 ICT readiness for business continuity	Recovery objectives, strategy, and testing
A.8.13 Information backup	Recovery from backups (see the Backup Policy)
A.8.14 Redundancy of information processing facilities	Load balancing, hot standby, reproducible-from-code recovery, and multi-provider model-inference failover
Clause 8.1 Operational planning and control	Activation, roles, and review of continuity arrangements

Review

The CISO owns this policy and reviews it at least annually, after each recovery test, and after any disruption or material architecture change.
