Every team agrees backups are important. Most teams don't actually test that their backups restore. The first time you find out your backup is corrupted is at 03:00, the night a junior engineer ran `DELETE FROM users` in prod.
This guide covers what to back up, how often, where to store the backups, and — most importantly — how to verify they restore. It's pragmatic, not exhaustive; for a hyperscaler-grade setup, you'll need additional tooling beyond what's covered here.
## RPO and RTO: the two numbers that matter
Two numbers shape every backup decision:
| Acronym | Stands for | Question it answers |
|---|---|---|
| RPO | Recovery Point Objective | "How much data am I willing to lose?" |
| RTO | Recovery Time Objective | "How long can the database stay down?" |
Different applications have different tolerances:
| Application type | Realistic RPO | Realistic RTO |
|---|---|---|
| Marketing site CMS | 24h | A few hours |
| SaaS B2B app | 1h | < 30 min |
| Payment / financial | < 1 min | < 5 min |
| Side project | 7 days | "whenever I'm free this weekend" |
Pick the row that matches your reality. The rest of the guide is about how to actually achieve those numbers.
## Backup strategies
| Strategy | Frequency | RPO floor | Storage cost | Implementation |
|---|---|---|---|---|
| Logical dump (`pg_dump`, `mysqldump`) | Hourly / daily | One backup interval | Cheap | Cron job |
| Filesystem snapshot | Daily | One snapshot interval | Medium | Provider feature (EBS, etc.) |
| WAL / binlog archiving (PITR) | Continuous | Seconds | Medium-high | Built into Postgres / MySQL |
| Streaming replica + delayed replica | Continuous + delayed | Seconds — or "user error window" with delay | High (extra DB) | Postgres streaming replication |
Most production setups end up with PITR + a daily logical dump. The logical dump is your "I can read this 5 years from now" archive; PITR is your "restore to 03:14:22 on Tuesday" weapon.
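To make the "cron job" row concrete, here's a minimal nightly-dump sketch. It assumes a custom-format `pg_dump`, an S3-compatible bucket named `s3://acme-db-backups`, and already-configured `aws` CLI credentials; all names are illustrative.

```bash
#!/usr/bin/env bash
# nightly-dump.sh: minimal logical-backup job (hostnames and bucket are placeholders).
set -euo pipefail

STAMP=$(date -u +%Y%m%dT%H%M%SZ)
DUMP="/tmp/app_${STAMP}.dump"

# Custom-format dump: compressed, and selectively restorable with pg_restore.
pg_dump -h db-host -U backup-user -d app --format=custom --file "$DUMP"

# Ship it off-host immediately; never leave the only copy on the DB server.
aws s3 cp "$DUMP" "s3://acme-db-backups/postgres/app_${STAMP}.dump"
rm -f "$DUMP"
```

A crontab entry like `15 2 * * * /opt/backup/nightly-dump.sh` runs it at 02:15 daily; move it to hourly and the RPO floor shrinks to an hour.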
## Where Launchverse fits
Launchverse marketplace databases (Postgres, MySQL, MariaDB) ship with:
- Daily automated backups written to encrypted object storage, with at least 7 days of retention on every paid tier.
- One-click restore to a fresh database from any retained backup.
- Backup-before-destructive-action prompts when you're about to drop or rename a database.
For PITR specifically (recover to an arbitrary second), production-tier customers can enable continuous WAL archiving — see the database's settings page.
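If you run Postgres yourself instead, the archiving side of PITR is a few lines of `postgresql.conf`. A minimal sketch, assuming a `/backups/wal` path that lives off the data disk:

```ini
# postgresql.conf: minimal continuous WAL archiving
wal_level = replica
archive_mode = on
# Copy each completed WAL segment out of the data directory; never overwrite.
archive_command = 'test ! -f /backups/wal/%f && cp %p /backups/wal/%f'
```

In practice you'd point `archive_command` at object storage via a tool like WAL-G or pgBackRest rather than a local path, so the WAL archive survives losing the server.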
## What to back up
A "backup" is more than just the database tables. A complete backup of a small SaaS app includes:
| What | Why | Cadence |
|---|---|---|
| Database dump | The application data itself | Hourly / daily depending on RPO |
| Object storage (S3 / R2 / B2 user uploads) | Files referenced from DB rows | Provider versioning + lifecycle |
| Database schema migrations (in git) | So you can rebuild the schema | On every commit |
| Environment variables | Application config | At secret rotation; backed up in your secret manager |
| TLS certificates (if not auto-managed) | HTTPS continuity | When rotated |
Launchverse manages TLS automatically (Let's Encrypt) and stores environment variables encrypted in its own database. The user-managed parts are the data and the migrations.
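For the object-storage row in the table above, "provider versioning" is usually a single switch. For example, on an S3 bucket (bucket name hypothetical):

```bash
# Keep prior versions of uploaded objects so an accidental
# overwrite or delete is recoverable rather than fatal.
aws s3api put-bucket-versioning \
  --bucket acme-user-uploads \
  --versioning-configuration Status=Enabled
```

Pair it with a lifecycle rule that expires old versions after your retention window, or the bucket grows without bound.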
## Restore drills (the part everyone skips)
A backup you've never restored is not a backup. It's a hope.
A reasonable cadence: once a quarter, restore the most recent production backup to a brand new database in your staging environment. Verify:
- The dump completes without errors.
- The schema matches what your application expects (run migrations against it).
- A few representative queries return sensible data.
- Row counts match within tolerance (some drift is normal given how old the dump is).
Document what you find. The first time you do this you'll discover something — a schema mismatch, a missing column, a backup that's silently truncated.
### A simple drill script (Postgres)
```bash
#!/usr/bin/env bash
set -euo pipefail
BACKUP_FILE="${1:?usage: $0 <path to .dump or .sql.gz>}"

# 1. Create a throwaway database on a sandbox server
TEST_DB="restore_drill_$(date +%s)"
createdb -h drill-host -U drill-user "$TEST_DB"

# 2. Restore: gzipped plain SQL goes through psql, custom-format archives through pg_restore
if [[ "$BACKUP_FILE" == *.gz ]]; then
  gunzip -c "$BACKUP_FILE" | psql -h drill-host -U drill-user -d "$TEST_DB"
else
  pg_restore --no-owner --no-acl --dbname "$TEST_DB" -h drill-host -U drill-user "$BACKUP_FILE"
fi

# 3. Run a sanity-check query
ROW_COUNT=$(psql -h drill-host -U drill-user -d "$TEST_DB" -tAc "SELECT count(*) FROM users")
echo "Restored DB has $ROW_COUNT users"

# 4. Drop the test DB
dropdb -h drill-host -U drill-user "$TEST_DB"
```
Wire it into a CI job that runs weekly. If the restore ever fails, your backup is broken — find out now, not when you actually need it.
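If a full CI pipeline is overkill, a cron entry with a failure alert does the job; the script path and webhook URL below are placeholders:

```bash
# /etc/cron.d/restore-drill: run the drill every Sunday at 04:00,
# and post to an alerting webhook if it exits non-zero.
0 4 * * 0  deploy  /opt/backup/restore-drill.sh /backups/latest.dump || curl -fsS -d 'restore drill FAILED' https://alerts.example.com/hook
```

The important property is that failure is loud: a drill that fails silently is no better than no drill.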
## Common failure modes
- Backups stored on the same server as the database. If the disk fails, both are gone. Always store backups off-host (object storage).
- Backups encrypted with a key only one person knows. When that person leaves, your backups are inaccessible. Use a KMS and share the recovery procedure.
- Backups that succeed silently but contain no data. A `pg_dump` against the wrong database succeeds and produces an empty dump. Validate row counts post-backup; a cheap check is sketched after this list.
- PITR window too small. A 24h window is fine for "intentional" mistakes; ransomware-style incidents are typically discovered after days. Plan for a longer retention window.
- Logical backups taking longer than the backup interval. A 4-hour `pg_dump` running every hour means you never complete a backup. Use streaming replication or reduce backup frequency.
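For the empty-dump failure mode, one cheap post-backup check is to assert the archive actually contains data for a table you care about; the table name and size floor here are illustrative:

```bash
#!/usr/bin/env bash
# validate-dump.sh: fail loudly if a custom-format dump is missing expected contents.
set -euo pipefail
DUMP="${1:?usage: validate-dump.sh <file.dump>}"

# pg_restore --list prints the archive's table of contents; an empty
# or wrong-database dump has no TABLE DATA entry for users.
pg_restore --list "$DUMP" | grep -q 'TABLE DATA public users' \
  || { echo "dump $DUMP contains no users data" >&2; exit 1; }

# Also guard against silently truncated files (GNU stat; pick a sane floor).
[ "$(stat -c%s "$DUMP")" -gt 1000000 ] \
  || { echo "dump $DUMP is suspiciously small" >&2; exit 1; }
```

Run it right after the dump step in the backup job, so a bad backup fails the job instead of quietly rotating into storage.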
## A pragmatic baseline
For a small SaaS B2B app on Launchverse, a sane baseline:
- Daily automated dump to encrypted object storage with 30-day retention. (Provided.)
- WAL archiving enabled for continuous PITR on production tier.
- Quarterly restore drill in a staging environment.
- Monthly drift check: compare a restored backup's schema against current production (a sketch follows this list).
- Documented runbook for "production database is broken; restore to N minutes ago" — written once, kept in your runbooks repo.
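A minimal version of that drift check, assuming network access to both the restored drill database and production (hostnames, users, and database names illustrative):

```bash
#!/usr/bin/env bash
# schema-drift.sh: compare a restored backup's schema against production.
set -euo pipefail

pg_dump -h drill-host -U drill-user -d restored_db --schema-only > /tmp/restored_schema.sql
pg_dump -h prod-host -U readonly -d app --schema-only > /tmp/prod_schema.sql

# Any diff output (and diff's non-zero exit, which fails the script) means drift.
diff -u /tmp/restored_schema.sql /tmp/prod_schema.sql && echo "schemas match"
```

Schema dumps include version comments, so expect to filter a few benign lines if the two servers run different Postgres minor versions.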
Adopt the baseline, run a real drill, and you're ahead of most teams.