Disaster Recovery Planning for SMBs: Protect Your Business from the Inevitable
60% of small businesses that lose their data shut down within 6 months. Yet only 23% of SMBs have a documented, tested disaster recovery plan. The businesses that survive disasters are not the ones that avoided them --- they are the ones that prepared for them.
This guide provides a practical disaster recovery framework for small and mid-sized businesses, covering everything from basic backup strategies to multi-region failover architectures.
Key Takeaways
- Define RPO and RTO before choosing a DR strategy --- these numbers determine your architecture and budget
- The 3-2-1 backup rule (3 copies, 2 media types, 1 offsite) is the minimum acceptable backup strategy
- Untested backups are not backups --- schedule quarterly recovery drills
- DR cost rises steeply as RTO shrinks: a 24-hour RTO can cost roughly 10% of what a 1-hour RTO costs
Defining Recovery Objectives
RPO (Recovery Point Objective)
The maximum acceptable data loss measured in time. If your RPO is 1 hour, you can tolerate losing up to 1 hour of data.
RTO (Recovery Time Objective)
The maximum acceptable downtime. If your RTO is 4 hours, your business can survive being offline for up to 4 hours.
Matching Objectives to Business Impact
| System | RPO | RTO | Justification |
|---|---|---|---|
| eCommerce storefront | 1 hour | 30 minutes | Lost orders = lost revenue |
| ERP (Odoo, SAP) | 4 hours | 2 hours | Internal operations, some manual workaround |
| Email system | 24 hours | 4 hours | Inconvenient but not business-critical |
| Marketing website | 7 days | 24 hours | Can rebuild from Git |
| Analytics/BI | 24 hours | 48 hours | Historical data, not operational |
Backup Strategies
The 3-2-1 Rule
- 3 copies of every critical dataset
- 2 different storage types (local disk + cloud, for example)
- 1 copy in a geographically separate location
Automated PostgreSQL Backup
#!/bin/bash
# /opt/scripts/backup-database.sh
# Run via cron: 0 */6 * * * /opt/scripts/backup-database.sh
set -euo pipefail
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/opt/backups/database"
S3_BUCKET="s3://ecosire-backups/database"
DB_NAME="ecosire"
DB_USER="app"
RETENTION_DAYS=30
mkdir -p "$BACKUP_DIR"
# Create compressed custom-format backup (cron runs assume passwordless auth, e.g. via ~/.pgpass)
echo "Starting backup at $(date)"
pg_dump -h localhost -U "$DB_USER" -Fc "$DB_NAME" > "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump"
# Verify backup integrity (test the command directly; with set -e a separate $? check would never run)
if ! pg_restore --list "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" > /dev/null 2>&1; then
  echo "ERROR: Backup verification failed"
  exit 1
fi
BACKUP_SIZE=$(du -h "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" | cut -f1)
echo "Backup created: ${BACKUP_SIZE}"
# Upload to S3 with server-side encryption
aws s3 cp "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" \
"$S3_BUCKET/${DB_NAME}_${TIMESTAMP}.dump" \
--sse AES256
# Upload to secondary region
aws s3 cp "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" \
"s3://ecosire-backups-dr/database/${DB_NAME}_${TIMESTAMP}.dump" \
--sse AES256 \
--region eu-west-1
# Clean up local backups older than retention period
find "$BACKUP_DIR" -name "*.dump" -mtime +$RETENTION_DAYS -delete
echo "Backup complete at $(date)"
Application File Backup
#!/bin/bash
# Backup application files, uploads, and configuration
set -euo pipefail
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mkdir -p /opt/backups/files /opt/backups/config
# Backup Odoo filestore
tar czf "/opt/backups/files/filestore_${TIMESTAMP}.tar.gz" /opt/odoo/data/filestore/
# Backup uploaded documents
tar czf "/opt/backups/files/uploads_${TIMESTAMP}.tar.gz" /opt/app/uploads/
# Backup configuration (secrets excluded)
tar czf "/opt/backups/config/config_${TIMESTAMP}.tar.gz" \
--exclude='*.env*' \
--exclude='*.pem' \
/opt/app/infrastructure/
# Upload all to S3
aws s3 sync /opt/backups/ s3://ecosire-backups/ --sse AES256
Failover Architectures
Tier 1: Cold Standby (RTO: 4-24 hours)
- Backups stored in cloud storage
- Recovery involves provisioning new infrastructure and restoring from backup
- Cheapest option: only pay for storage
- Suitable for non-critical internal applications
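One lever for keeping that storage-only bill small is an object lifecycle rule that ages older dumps into colder storage. A sketch, assuming the ecosire-backups bucket from the backup script (the retention periods are illustrative):
#!/bin/bash
# Age database dumps into Glacier after 30 days and expire them after a year (periods are illustrative)
set -euo pipefail
aws s3api put-bucket-lifecycle-configuration \
  --bucket ecosire-backups \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-dumps",
      "Status": "Enabled",
      "Filter": {"Prefix": "database/"},
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }]
  }'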
Tier 2: Warm Standby (RTO: 1-4 hours)
- Standby server running but not receiving traffic
- Database replication keeps standby data current
- Recovery involves promoting standby and updating DNS
- Moderate cost: pay for standby server at reduced size
Primary (active) ----replication----> Standby (warm)
                                            |
                               on failure: promote + DNS update
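A warm-standby failover usually comes down to two actions: promote the replica, then repoint DNS. A minimal sketch, assuming a PostgreSQL streaming replica and a Route 53 hosted zone (the zone ID, record name, and standby IP are placeholders):
#!/bin/bash
# Warm-standby failover sketch: promote the replica, then repoint DNS (zone ID, record, and IP are placeholders)
set -euo pipefail
# 1. Promote the PostgreSQL streaming replica to accept writes (pg_promote() requires PostgreSQL 12+)
sudo -u postgres psql -c "SELECT pg_promote();"
# 2. Point the application record at the standby's address
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "203.0.113.20"}]
      }
    }]
  }'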
Tier 3: Hot Standby (RTO: <30 minutes)
- Active-passive or active-active configuration
- Automatic failover via health checks
- Database with synchronous replication
- Higher cost: pay for full duplicate infrastructure
    Health Check
         |
    Load Balancer ------> Primary (active)
         |
         +--------------> Secondary (hot standby, auto-promote)
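The same pattern can also be expressed at the DNS layer with Route 53 failover routing: a health check watches the primary, and a secondary record answers only when the primary is unhealthy. A sketch, assuming placeholder IPs, domain, and zone ID:
#!/bin/bash
# Route 53 failover routing sketch (IPs, zone ID, and domain are placeholders)
set -euo pipefail
# Health check probing the primary's /health endpoint every 30 seconds
aws route53 create-health-check \
  --caller-reference "primary-health-$(date +%s)" \
  --health-check-config IPAddress=203.0.113.10,Port=443,Type=HTTPS,ResourcePath=/health,RequestInterval=30,FailureThreshold=3
# PRIMARY record answers while the health check passes; a matching record with
# Failover=SECONDARY and the standby's IP takes over automatically when it fails
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "primary",
        "Failover": "PRIMARY",
        "TTL": 60,
        "ResourceRecords": [{"Value": "203.0.113.10"}],
        "HealthCheckId": "<id-returned-by-create-health-check>"
      }
    }]
  }'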
Tier 4: Multi-Region Active-Active (RTO: <5 minutes)
- Multiple regions serve traffic simultaneously
- Global load balancer routes by geography
- Database conflict resolution for multi-master writes
- Highest cost and complexity
Cost Comparison
| Tier | Monthly Cost (for $500/mo primary) | RTO | RPO |
|---|---|---|---|
| Cold Standby | $20 (storage only) | 4-24 hours | 6 hours |
| Warm Standby | $200 | 1-4 hours | 1 hour |
| Hot Standby | $500 | <30 minutes | <5 minutes |
| Active-Active | $1,200+ | <5 minutes | Near-zero |
Recovery Testing
Quarterly Recovery Drill
Every quarter, execute a full recovery test:
- Select a random backup from the last 30 days
- Provision recovery infrastructure (separate from production)
- Restore the database from backup
- Restore application files from backup
- Deploy application code from Git
- Run smoke tests against the restored environment
- Measure actual recovery time against RTO target
- Document findings and update the DR plan
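Part of the drill can be scripted so the timing is honest. A minimal sketch, assuming the restore sketch from the backup section and an example 2-hour RTO target (both are placeholders):
#!/bin/bash
# Quarterly drill helper: time the restore against the RTO target (target value and script path are placeholders)
set -euo pipefail
RTO_TARGET_SECONDS=$((2 * 60 * 60))    # example 2-hour RTO
START=$(date +%s)
/opt/scripts/restore-database.sh       # the restore sketch from the backup section
ELAPSED=$(( $(date +%s) - START ))
echo "Recovery took ${ELAPSED}s (target: ${RTO_TARGET_SECONDS}s)"
if [ "$ELAPSED" -gt "$RTO_TARGET_SECONDS" ]; then
  echo "RTO target missed -- record findings and update the DR plan"
  exit 1
fi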
Recovery Drill Checklist
- Database restores successfully from backup
- Application starts and serves requests
- User authentication works
- Critical business flows complete (place order, generate invoice)
- Integration endpoints respond (payment gateway, email)
- Actual recovery time meets RTO target
- Team knows their roles without referencing documentation
- Communication channels work (how do you notify stakeholders?)
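Several of these checks can run as an automated smoke test against the restored environment. A sketch, assuming a hypothetical recovery host and /health and /login endpoints:
#!/bin/bash
# Post-restore smoke test sketch (host and endpoints are assumptions -- adapt to your application)
set -euo pipefail
HOST="https://dr-test.example.com"
check() {
  local name="$1" url="$2"
  if curl --fail --silent --max-time 10 "$url" > /dev/null; then
    echo "OK: $name"
  else
    echo "FAIL: $name"
    exit 1
  fi
}
check "application responds" "$HOST/health"
check "login page reachable" "$HOST/login"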
Incident Response Playbook
Severity Levels
| Level | Definition | Response Time | Communication |
|---|---|---|---|
| SEV1 | Complete outage, revenue impacted | 15 minutes | All hands, customer notification |
| SEV2 | Partial outage, degraded service | 30 minutes | On-call team, stakeholder update |
| SEV3 | Minor issue, workaround available | 2 hours | On-call engineer |
| SEV4 | Non-urgent, no customer impact | Next business day | Ticket queue |
SEV1 Response Steps
- Acknowledge the incident within 15 minutes
- Assess the scope: what is affected, how many users impacted
- Communicate to stakeholders: status page update, customer notification
- Mitigate using the quickest available option (rollback, failover, scaling)
- Resolve the root cause
- Post-mortem within 48 hours: timeline, root cause, action items
Frequently Asked Questions
How much should we budget for disaster recovery?
A reasonable DR budget is 10-25% of your production infrastructure cost. For a company spending $500/month on infrastructure, budget $50-125/month for DR. This covers cloud backup storage, a warm standby server, and monitoring. The ROI calculation: if your business loses $5,000/hour of downtime and DR reduces a potential 24-hour outage to 4 hours, the DR investment saved $100,000.
Do we need DR if we use a managed cloud provider?
Yes. Cloud providers protect against hardware failure and data center outages, but they do not protect against application bugs, accidental deletion, ransomware, or account compromise. Your DR plan must cover scenarios that the cloud provider does not: corrupted data, deleted resources, security breaches, and vendor lock-in risk.
How do we handle DR for our Odoo ERP system?
Odoo DR requires three components: (1) PostgreSQL database backups (automated, encrypted, offsite), (2) filestore backups (uploaded attachments, report templates), (3) custom module code (in Git). Recovery involves: provision a server, install Odoo, restore database, restore filestore, deploy custom modules. ECOSIRE provides managed Odoo DR with automated backups and tested recovery procedures.
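A condensed version of that recovery sequence, assuming an Odoo server that is already installed and the artifacts produced by the backup scripts above (the _latest names stand in for the timestamped files; service name, database role, and paths are typical defaults, not guaranteed):
#!/bin/bash
# Odoo recovery sketch: database, filestore, custom modules (names and paths are assumptions)
set -euo pipefail
DB_NAME="ecosire"
sudo systemctl stop odoo
# 1. Recreate the database and load the latest dump (owner role assumed to be "odoo")
sudo -u postgres createdb -O odoo "$DB_NAME"
sudo -u postgres pg_restore --no-owner --dbname "$DB_NAME" /opt/backups/database/ecosire_latest.dump
# 2. Restore the filestore; the tarball was created from absolute paths, so extract relative to /
sudo tar xzf /opt/backups/files/filestore_latest.tar.gz -C /
# 3. Deploy custom modules from Git, then restart
sudo git -C /opt/odoo/custom-addons pull
sudo systemctl start odoo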
What is the most common DR failure?
Untested backups. Over 30% of backup restores fail due to corruption, incomplete backups, missing dependencies, or changed passwords. The second most common failure is outdated documentation --- the DR plan references servers, credentials, or procedures that no longer exist. Quarterly testing catches both issues.
What Comes Next
Disaster recovery is one pillar of operational resilience. Combine it with monitoring and alerting for early detection, zero-downtime deployments for safe changes, and security hardening for threat prevention.
Contact ECOSIRE for disaster recovery planning and implementation, or explore our DevOps guide for the full infrastructure roadmap.
Published by ECOSIRE -- helping businesses prepare for the inevitable.
Written by
ECOSIRE Team (Technical Writing)
The ECOSIRE technical writing team covers Odoo ERP, Shopify eCommerce, AI agents, Power BI analytics, GoHighLevel automation, and enterprise software best practices. Our guides help businesses make informed technology decisions.