Disaster Recovery Planning for SMBs: Protect Your Business from the Inevitable
60% of small businesses that lose their data shut down within 6 months. Yet only 23% of SMBs have a documented, tested disaster recovery plan. The businesses that survive disasters are not the ones that avoided them --- they are the ones that prepared for them.
This guide provides a practical disaster recovery framework for small and mid-sized businesses, covering everything from basic backup strategies to multi-region failover architectures.
Key Takeaways
- Define RPO and RTO before choosing a DR strategy --- these numbers determine your architecture and budget
- The 3-2-1 backup rule (3 copies, 2 media types, 1 offsite) is the minimum acceptable backup strategy
- Untested backups are not backups --- schedule quarterly recovery drills
- DR cost rises steeply as RTO shrinks: a 24-hour RTO can cost roughly 10% of what a 1-hour RTO costs
Defining Recovery Objectives
RPO (Recovery Point Objective)
The maximum acceptable data loss measured in time. If your RPO is 1 hour, you can tolerate losing up to 1 hour of data.
RTO (Recovery Time Objective)
The maximum acceptable downtime. If your RTO is 4 hours, your business can survive being offline for up to 4 hours.
Matching Objectives to Business Impact
| System | RPO | RTO | Justification |
|---|---|---|---|
| eCommerce storefront | 1 hour | 30 minutes | Lost orders = lost revenue |
| ERP (Odoo, SAP) | 4 hours | 2 hours | Internal operations, some manual workaround |
| Email system | 24 hours | 4 hours | Inconvenient but not business-critical |
| Marketing website | 7 days | 24 hours | Can rebuild from Git |
| Analytics/BI | 24 hours | 48 hours | Historical data, not operational |
Backup Strategies
The 3-2-1 Rule
- 3 copies of every critical dataset
- 2 different storage types (local disk + cloud, for example)
- 1 copy in a geographically separate location
Automated PostgreSQL Backup
#!/bin/bash
# /opt/scripts/backup-database.sh
# Run via cron: 0 */6 * * * /opt/scripts/backup-database.sh
set -euo pipefail
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/opt/backups/database"
S3_BUCKET="s3://ecosire-backups/database"
DB_NAME="ecosire"
DB_USER="app"
RETENTION_DAYS=30
mkdir -p "$BACKUP_DIR"
# Create compressed custom-format backup (cron runs assume passwordless auth, e.g. via ~/.pgpass)
echo "Starting backup at $(date)"
pg_dump -h localhost -U "$DB_USER" -Fc "$DB_NAME" > "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump"
# Verify backup integrity (test the command directly; with set -e a separate $? check would never run)
if ! pg_restore --list "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" > /dev/null 2>&1; then
  echo "ERROR: Backup verification failed"
  exit 1
fi
BACKUP_SIZE=$(du -h "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" | cut -f1)
echo "Backup created: ${BACKUP_SIZE}"
# Upload to S3 with server-side encryption
aws s3 cp "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" \
"$S3_BUCKET/${DB_NAME}_${TIMESTAMP}.dump" \
--sse AES256
# Upload to secondary region
aws s3 cp "$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.dump" \
"s3://ecosire-backups-dr/database/${DB_NAME}_${TIMESTAMP}.dump" \
--sse AES256 \
--region eu-west-1
# Clean up local backups older than retention period
find "$BACKUP_DIR" -name "*.dump" -mtime +$RETENTION_DAYS -delete
echo "Backup complete at $(date)"
Application File Backup
#!/bin/bash
# Backup application files, uploads, and configuration
set -euo pipefail
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
mkdir -p /opt/backups/files /opt/backups/config
# Backup Odoo filestore
tar czf "/opt/backups/files/filestore_${TIMESTAMP}.tar.gz" /opt/odoo/data/filestore/
# Backup uploaded documents
tar czf "/opt/backups/files/uploads_${TIMESTAMP}.tar.gz" /opt/app/uploads/
# Backup configuration (secrets excluded)
tar czf "/opt/backups/config/config_${TIMESTAMP}.tar.gz" \
--exclude='*.env*' \
--exclude='*.pem' \
/opt/app/infrastructure/
# Upload all to S3
aws s3 sync /opt/backups/ s3://ecosire-backups/ --sse AES256
Failover Architectures
Tier 1: Cold Standby (RTO: 4-24 hours)
- Backups stored in cloud storage
- Recovery involves provisioning new infrastructure and restoring from backup
- Cheapest option: only pay for storage
- Suitable for non-critical internal applications
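One lever for keeping that storage-only bill small is an object lifecycle rule that ages older dumps into colder storage. A sketch, assuming the ecosire-backups bucket from the backup script (the retention periods are illustrative):
#!/bin/bash
# Age database dumps into Glacier after 30 days and expire them after a year (periods are illustrative)
set -euo pipefail
aws s3api put-bucket-lifecycle-configuration \
  --bucket ecosire-backups \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-old-dumps",
      "Status": "Enabled",
      "Filter": {"Prefix": "database/"},
      "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }]
  }'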
Tier 2: Warm Standby (RTO: 1-4 hours)
- Standby server running but not receiving traffic
- Database replication keeps standby data current
- Recovery involves promoting standby and updating DNS
- Moderate cost: pay for standby server at reduced size
Primary (active) ----replication----> Standby (warm)
                                            |
                               on failure: promote + DNS update
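A warm-standby failover usually comes down to two actions: promote the replica, then repoint DNS. A minimal sketch, assuming a PostgreSQL streaming replica and a Route 53 hosted zone (the zone ID, record name, and standby IP are placeholders):
#!/bin/bash
# Warm-standby failover sketch: promote the replica, then repoint DNS (zone ID, record, and IP are placeholders)
set -euo pipefail
# 1. Promote the PostgreSQL streaming replica to accept writes (pg_promote() requires PostgreSQL 12+)
sudo -u postgres psql -c "SELECT pg_promote();"
# 2. Point the application record at the standby's address
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "203.0.113.20"}]
      }
    }]
  }'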
Tier 3: Hot Standby (RTO: <30 minutes)
- Active-passive or active-active configuration
- Automatic failover via health checks
- Database with synchronous replication
- Higher cost: pay for full duplicate infrastructure
    Health Check
         |
    Load Balancer ------> Primary (active)
         |
         +--------------> Secondary (hot standby, auto-promote)
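The same pattern can also be expressed at the DNS layer with Route 53 failover routing: a health check watches the primary, and a secondary record answers only when the primary is unhealthy. A sketch, assuming placeholder IPs, domain, and zone ID:
#!/bin/bash
# Route 53 failover routing sketch (IPs, zone ID, and domain are placeholders)
set -euo pipefail
# Health check probing the primary's /health endpoint every 30 seconds
aws route53 create-health-check \
  --caller-reference "primary-health-$(date +%s)" \
  --health-check-config IPAddress=203.0.113.10,Port=443,Type=HTTPS,ResourcePath=/health,RequestInterval=30,FailureThreshold=3
# PRIMARY record answers while the health check passes; a matching record with
# Failover=SECONDARY and the standby's IP takes over automatically when it fails
aws route53 change-resource-record-sets \
  --hosted-zone-id Z0EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "SetIdentifier": "primary",
        "Failover": "PRIMARY",
        "TTL": 60,
        "ResourceRecords": [{"Value": "203.0.113.10"}],
        "HealthCheckId": "<id-returned-by-create-health-check>"
      }
    }]
  }'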
Tier 4: Multi-Region Active-Active (RTO: <5 minutes)
- Multiple regions serve traffic simultaneously
- Global load balancer routes by geography
- Database conflict resolution for multi-master writes
- Highest cost and complexity
Cost Comparison
| Tier | Monthly Cost (for $500/mo primary) | RTO | RPO |
|---|---|---|---|
| Cold Standby | $20 (storage only) | 4-24 hours | 6 hours |
| Warm Standby | $200 | 1-4 hours | 1 hour |
| Hot Standby | $500 | <30 minutes | <5 minutes |
| Active-Active | $1,200+ | <5 minutes | Near-zero |
Recovery Testing
Quarterly Recovery Drill
Every quarter, execute a full recovery test:
- Select a random backup from the last 30 days
- Provision recovery infrastructure (separate from production)
- Restore the database from backup
- Restore application files from backup
- Deploy application code from Git
- Run smoke tests against the restored environment
- Measure actual recovery time against RTO target
- Document findings and update the DR plan
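Part of the drill can be scripted so the timing is honest. A minimal sketch, assuming the restore sketch from the backup section and an example 2-hour RTO target (both are placeholders):
#!/bin/bash
# Quarterly drill helper: time the restore against the RTO target (target value and script path are placeholders)
set -euo pipefail
RTO_TARGET_SECONDS=$((2 * 60 * 60))    # example 2-hour RTO
START=$(date +%s)
/opt/scripts/restore-database.sh       # the restore sketch from the backup section
ELAPSED=$(( $(date +%s) - START ))
echo "Recovery took ${ELAPSED}s (target: ${RTO_TARGET_SECONDS}s)"
if [ "$ELAPSED" -gt "$RTO_TARGET_SECONDS" ]; then
  echo "RTO target missed -- record findings and update the DR plan"
  exit 1
fi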
Recovery Drill Checklist
- Database restores successfully from backup
- Application starts and serves requests
- User authentication works
- Critical business flows complete (place order, generate invoice)
- Integration endpoints respond (payment gateway, email)
- Actual recovery time meets RTO target
- Team knows their roles without referencing documentation
- Communication channels work (how do you notify stakeholders?)
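Several of these checks can run as an automated smoke test against the restored environment. A sketch, assuming a hypothetical recovery host and /health and /login endpoints:
#!/bin/bash
# Post-restore smoke test sketch (host and endpoints are assumptions -- adapt to your application)
set -euo pipefail
HOST="https://dr-test.example.com"
check() {
  local name="$1" url="$2"
  if curl --fail --silent --max-time 10 "$url" > /dev/null; then
    echo "OK: $name"
  else
    echo "FAIL: $name"
    exit 1
  fi
}
check "application responds" "$HOST/health"
check "login page reachable" "$HOST/login"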
Incident Response Playbook
Severity Levels
| Level | Definition | Response Time | Communication |
|---|---|---|---|
| SEV1 | Complete outage, revenue impacted | 15 minutes | All hands, customer notification |
| SEV2 | Partial outage, degraded service | 30 minutes | On-call team, stakeholder update |
| SEV3 | Minor issue, workaround available | 2 hours | On-call engineer |
| SEV4 | Non-urgent, no customer impact | Next business day | Ticket queue |
SEV1 Response Steps
- Acknowledge the incident within 15 minutes
- Assess the scope: what is affected, how many users impacted
- Communicate to stakeholders: status page update, customer notification
- Mitigate using the quickest available option (rollback, failover, scaling)
- Resolve the root cause
- Post-mortem within 48 hours: timeline, root cause, action items
Frequently Asked Questions
How much should we budget for disaster recovery?
A reasonable DR budget is 10-25% of your production infrastructure cost. For a company spending $500/month on infrastructure, budget $50-125/month for DR. This covers cloud backup storage, a warm standby server, and monitoring. The ROI calculation: if your business loses $5,000/hour of downtime and DR reduces a potential 24-hour outage to 4 hours, the DR investment saved $100,000.
Do we need DR if we use a managed cloud provider?
Yes. Cloud providers protect against hardware failure and data center outages, but they do not protect against application bugs, accidental deletion, ransomware, or account compromise. Your DR plan must cover scenarios that the cloud provider does not: corrupted data, deleted resources, security breaches, and vendor lock-in risk.
How do we handle DR for our Odoo ERP system?
Odoo DR requires three components: (1) PostgreSQL database backups (automated, encrypted, offsite), (2) filestore backups (uploaded attachments, report templates), (3) custom module code (in Git). Recovery involves: provision a server, install Odoo, restore database, restore filestore, deploy custom modules. ECOSIRE provides managed Odoo DR with automated backups and tested recovery procedures.
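A condensed version of that recovery sequence, assuming an Odoo server that is already installed and the artifacts produced by the backup scripts above (the _latest names stand in for the timestamped files; service name, database role, and paths are typical defaults, not guaranteed):
#!/bin/bash
# Odoo recovery sketch: database, filestore, custom modules (names and paths are assumptions)
set -euo pipefail
DB_NAME="ecosire"
sudo systemctl stop odoo
# 1. Recreate the database and load the latest dump (owner role assumed to be "odoo")
sudo -u postgres createdb -O odoo "$DB_NAME"
sudo -u postgres pg_restore --no-owner --dbname "$DB_NAME" /opt/backups/database/ecosire_latest.dump
# 2. Restore the filestore; the tarball was created from absolute paths, so extract relative to /
sudo tar xzf /opt/backups/files/filestore_latest.tar.gz -C /
# 3. Deploy custom modules from Git, then restart
sudo git -C /opt/odoo/custom-addons pull
sudo systemctl start odoo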
What is the most common DR failure?
Untested backups. Over 30% of backup restores fail due to corruption, incomplete backups, missing dependencies, or changed passwords. The second most common failure is outdated documentation --- the DR plan references servers, credentials, or procedures that no longer exist. Quarterly testing catches both issues.
What Comes Next
Disaster recovery is one pillar of operational resilience. Combine it with monitoring and alerting for early detection, zero-downtime deployments for safe changes, and security hardening for threat prevention.
Contact ECOSIRE for disaster recovery planning and implementation, or explore our DevOps guide for the full infrastructure roadmap.
Published by ECOSIRE -- helping businesses prepare for the inevitable.
Written by
ECOSIRE Team (Technical Writing)
The ECOSIRE technical writing team covers Odoo ERP, Shopify eCommerce, AI agents, Power BI analytics, GoHighLevel automation, and enterprise software best practices. Our guides help businesses make informed technology decisions.