Really Working with SQL: Database Backup Generation (English)

Database Backup Generation (English)

Introduction:

Data protection is a cornerstone of any technology infrastructure. Data loss, whether due to hardware failures, human errors, malicious attacks, or natural disasters, can have devastating consequences for an organization. A robust and well-implemented backup strategy is essential to ensure business continuity, disaster recovery, and regulatory compliance.

This document provides a detailed guide for generating backups of the most relevant database types today, in both on-premise and cloud environments. We will cover both relational (SQL) and non-relational (NoSQL) databases, highlighting the specificities of each and best practices to ensure data integrity and availability.

1. General Concepts and Best Practices for Database Backups:

Before addressing specific databases, it is crucial to understand some fundamental concepts and best practices applicable to any backup strategy:

Types of Backups:

Full Backup: Copies all data in the database at a given time. It is the foundation of any backup strategy and allows for a complete restore.
Differential Backup: Copies only the data that has changed since the last full backup. It requires a previous full backup for restoration.
Incremental Backup: Copies only the data that has changed since the last backup (either full or incremental). It requires the restoration of the last full backup and all subsequent incremental backups.
Logical Backup: Exports the database structure (schema, tables, indexes, etc.) and the data in a readable format (e.g., SQL files, JSON, CSV). Useful for migrations, audits, and object-level recovery.
Physical Backup: Copies the underlying data files that make up the database. Generally faster and more efficient for full restores.

Backup Frequency and Scheduling: The frequency of backups should be based on the criticality of the data and the rate of change. Databases with high activity will require more frequent backups. Automated backup schedules should be established to avoid reliance on manual processes.
Backup Retention Policies: Define how long backups should be kept. This may vary depending on regulatory requirements, business needs, and storage limitations. Hierarchical retention policies should be implemented (e.g., daily backups for a week, weekly backups for a month, monthly backups for a year).
Backup Testing and Validation: It is essential to periodically test backups to ensure their integrity and the ability to restore the database when needed. Tests should simulate different data loss scenarios.
Backup Security: Backups must be protected against unauthorized access, modification, and deletion. This includes encrypting backups, controlling access to backup storage, and implementing physical and logical security measures.
Disaster Recovery (DR) Considerations: The backup strategy should be integrated with the organization's disaster recovery plan. This may involve storing backups in a secondary location (off-site) or using replication and failover solutions.

2. Relational Database (SQL) Backups:

Relational databases store data in tables with defined relationships. Some of the most popular Relational Database Management Systems (RDBMS) include MySQL, PostgreSQL, Microsoft SQL Server, and Oracle Database.

2.1. MySQL:

On-Premise:

Logical Backup: Use the mysqldump utility to export the database to an SQL file.
Bash
mysqldump -u [user] -p[password] --databases [db_name1] [db_name2] ... > backup.sql
For a full backup of all databases:
Bash
mysqldump -u [user] -p[password] --all-databases --routines --events > full_backup.sql
Physical Backup:

File Copy: If MySQL is configured with the InnoDB storage engine and the innodb_file_per_table option enabled, the .ibd (data) and .frm (structure) files for each table can be copied. However, this requires stopping the MySQL server or using tools like xtrabackup.
Xtrabackup: An open-source tool from Percona that allows for hot (non-blocking) physical backups of MySQL InnoDB databases.

Incremental Backups: xtrabackup also supports incremental backups based on the Log Sequence Number (LSN).

Cloud (AWS RDS, Azure Database for MySQL, Google Cloud SQL for MySQL):

Cloud service providers offer managed backup mechanisms.
AWS RDS: Uses EBS (Elastic Block Store) snapshots for physical backups. Automated backups can be configured with a specified time window and retention period. Manual snapshots can also be created.
Azure Database for MySQL: Offers automated server backups (daily full backups, differential backups every hour or every five minutes). The retention period is configurable.
Google Cloud SQL for MySQL: Performs automated daily backups. Backup time and frequency can be configured. On-demand backups can also be created.
In cloud environments, logical backups can also be performed using standard MySQL tools from an instance connected to the cloud database.

2.2. PostgreSQL:

On-Premise:

Logical Backup: Use the pg_dump utility to export the database to an SQL file.
Bash
pg_dump -U [user] -d [db_name] -f backup.sql
For a full backup of all databases:
Bash
pg_dumpall -U [user] -f full_backup.sql
Physical Backup:

File Copy: A copy of the data files from the PostgreSQL server's PGDATA directory can be made. This requires stopping the server to ensure consistency.
Hot Backups with pg_basebackup: A tool to perform hot (online) physical backups of a running PostgreSQL cluster.
Bash
pg_basebackup -h [host] -U [user] -D [backup_directory] -P -v

Incremental Backups: PostgreSQL supports Write-Ahead Logging (WAL), which can be archived for Point-in-Time Recovery (PITR). This allows restoring the database to a specific point in time.

Cloud (AWS RDS, Azure Database for PostgreSQL, Google Cloud SQL for PostgreSQL):

Similar to MySQL, cloud providers offer managed backup services.
AWS RDS: Uses EBS snapshots for physical backups. Automated backups and manual snapshots can be configured. PITR is also supported through the retention of transaction logs.
Azure Database for PostgreSQL: Offers automated server backups (weekly full backups, daily differential backups). PITR is supported with a configurable retention period.
Google Cloud SQL for PostgreSQL: Performs automated daily backups and supports PITR through the retention of transaction logs. On-demand backups can also be created.
Standard PostgreSQL tools can be used to perform logical backups from a connected client.

2.3. Microsoft SQL Server:

On-Premise:

Full Backup: Use SQL Server Management Studio (SSMS) or T-SQL commands:
SQL
BACKUP DATABASE [DatabaseName]
TO DISK = 'C:\Backup\DatabaseName.bak'
WITH FORMAT;
Differential Backup:
SQL
BACKUP DATABASE [DatabaseName]
TO DISK = 'C:\Backup\DatabaseName_diff.bak'
WITH DIFFERENTIAL;
Transaction Log Backup: Essential for Point-in-Time Recovery (PITR) in the Full or Bulk-Logged recovery model.
SQL
BACKUP LOG [DatabaseName]
TO DISK = 'C:\Backup\DatabaseName_log.trn';
Backup Agents: SQL Server Agent allows scheduling automated backups.

Cloud (Azure SQL Database, AWS RDS for SQL Server, Google Cloud SQL for SQL Server):

Azure SQL Database: Performs automated backups (weekly full backups, daily differential backups, transaction log backups every 5-10 minutes). Retention is configurable. On-demand backups can also be performed.
AWS RDS for SQL Server: Offers automated backups (storage volume snapshots) and manual backups (database instance snapshots). PITR is also supported through the retention of transaction logs.
Google Cloud SQL for SQL Server: Performs automated daily backups. Time and frequency can be configured. On-demand backups are also available, and PITR is supported.
Standard SQL Server tools (SSMS, T-SQL commands) can be used to interact with cloud databases and perform backups.

2.4. Oracle Database:

On-Premise:

Recovery Manager (RMAN): Oracle's recommended tool for performing backups and recoveries. RMAN allows for full, incremental, hot backups, and PITR.
Fragmento de código
RMAN> CONNECT TARGET /
RMAN> BACKUP AS BACKUPSET DATABASE PLUS ARCHIVELOG;
Cold Backups (Offline): Requires shutting down the database. Data files, control files, and redo log files are copied.
Hot Backups (Online): The database remains operational. Requires placing tablespaces in backup mode and archiving redo logs. RMAN simplifies this process.

Cloud (AWS RDS for Oracle, Azure Database for Oracle, Google Cloud SQL for Oracle):

AWS RDS for Oracle: Offers automated backups (storage volume snapshots) and manual backups (database instance snapshots). PITR is supported through the retention of archived log files. RMAN can be used for more advanced backups.
Azure Database for Oracle: Offers automated backups and allows for on-demand backups. PITR is supported with a configurable retention period.
Google Cloud SQL for Oracle: Performs automated daily backups and allows for on-demand backups. PITR is supported through the retention of archived log files. RMAN can also be used.

3. Non-Relational Database (NoSQL) Backups:

NoSQL databases have different data models and architectures compared to relational databases. This influences backup strategies.

3.1. MongoDB:

On-Premise:

Logical Backup: Use the mongodump utility to export data to BSON or JSON format.
Bash
mongodump --host [host] --port [port] --username [user] --password [password] --db [db_name] --out /backup/path
For a full backup of all databases:
Bash
mongodump --host [host] --port [port] --username [user] --password [password] --all --out /backup/path
Physical Backup:

File Copy: If MongoDB is configured with the WiredTiger storage engine, the data files in the dbPath directory can be copied. This requires stopping the server or using the db.fsyncLock() command to ensure consistency.
Hot Backups with mongorestore and Oplog: For hot backups, mongodump can be used for an initial copy, and then the operations from the oplog (operation log) can be applied to bring the copy up to date.
MongoDB Atlas (managed cloud service): Performs continuous backups and allows for point-in-time recovery.

Cloud (AWS DocumentDB, Azure Cosmos DB API for MongoDB, Google Cloud Firestore in Datastore mode):

AWS DocumentDB: Performs automated backups and allows for manual snapshots. PITR is supported with a configurable retention period.
Azure Cosmos DB API for MongoDB: Offers automated and on-demand backups. Restoration can be done to a specific point in time within the retention period.
Google Cloud Firestore in Datastore mode: Performs automated daily backups and allows for on-demand backups.

3.2. Cassandra:

On-Premise:

Snapshots: Cassandra provides a snapshot functionality that creates a consistent copy of the data files (SSTables) on disk. Snapshots are fast and do not impact performance.
Bash
nodetool snapshot -t [snapshot_name] [keyspace.table]
Full Backups: For a full backup, the data directories from each node in the cluster should be copied.
Third-Party Tools: Tools like Medusa can facilitate Cassandra backup management.

Cloud (AWS Keyspaces for Cassandra, Azure Managed Instance for Apache Cassandra, Google Cloud Memorystore for Redis - although Redis is not Cassandra):

AWS Keyspaces: Performs automated backups and allows for on-demand backups.
Azure Managed Instance for Apache Cassandra: Offers automated backups and allows for on-demand backups.
Google Cloud Memorystore for Redis: Not Cassandra. For Cassandra on GCP, the self-managed version on Compute Engine can be used with on-premise strategies, or partner solutions can be utilized.

3.3. Redis:

On-Premise:

RDB (Redis Database File): Redis can save snapshots of the database to disk in RDB format. This is configured in the redis.conf file or through commands.
Bash
redis-cli SAVE
Rules can be configured to automatically save the database after a certain number of changes within a specific time period.
AOF (Append Only File): Redis can also log every write operation to an AOF file. This provides higher durability than RDB. It can be configured to write to disk on every command, every second, or never.
Manual Backup: Copy the dump.rdb (for RDB) and appendonly.aof (for AOF) files.

Cloud (AWS ElastiCache for Redis, Azure Cache for Redis, Google Cloud Memorystore for Redis):

AWS ElastiCache for Redis: Allows creating snapshots and configuring automated backups.
Azure Cache for Redis: Offers automated backups and allows for on-demand backups.
Google Cloud Memorystore for Redis: Performs automated daily backups and allows for on-demand backups.

3.4. Elasticsearch:

On-Premise:

Snapshots: Elasticsearch provides an API to create snapshots of indices. First, a snapshot repository must be registered.
JSON
PUT _snapshot/my_backup_repository
{
"type": "fs",
"settings": {
"location": "/mnt/data/elasticsearch_backups"
}
}
Then, a snapshot can be created:
JSON
PUT _snapshot/my_backup_repository/snapshot_20230406
{
"indices": "my-index-*"
}
Third-Party Tools: Tools like Curator help manage snapshots and retention policies.

Cloud (AWS OpenSearch Service, Azure Cognitive Search, Google Cloud Elasticsearch):

AWS OpenSearch Service: Allows creating manual and automated snapshots.
Azure Cognitive Search: Offers backup and restore mechanisms.
Google Cloud Elasticsearch: Supports snapshots of indices. A Google Cloud Storage bucket must be configured as the snapshot repository.

Conclusion:

Implementing an effective backup strategy is fundamental for data protection in any environment. This document has provided an overview of key considerations and specific techniques for backing up the most relevant relational and non-relational databases, both on-premise and in the cloud.

It is important to remember that the optimal backup strategy will depend on the specific requirements of each organization, including data criticality, tolerance for data loss (RPO - Recovery Point Objective), acceptable downtime (RTO - Recovery Time Objective), and budgetary constraints.

It is strongly recommended to periodically test and validate backups, as well as to review and update the backup strategy as business needs and database technologies evolve. Investing in a solid backup strategy is an investment in business resilience and continuity.

Really Working with SQL

domingo, 6 de abril de 2025

Database Backup Generation (English)

No hay comentarios:

Publicar un comentario

Archivo del blog