Disaster Recovery + Migrations Flashcards
What is RPO?
Recovery Point Objective:
It’s the amount of data loss you are willing to accept, measured as time. In practice it comes down to how often you run backups: it is the time between your latest backup and the time of a disaster.
When a disaster happens, the data written between the latest backup and the disaster is lost.
For example, if you back up data every hour, your RPO is 1 hour. When disaster strikes, you can go back at most an hour to recover your data, so the data you lose is whatever was written between the latest backup and the disaster.
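The hourly-backup example above can be sketched in a few lines. This is only an illustration: the function names and timestamps are made up for the example.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, disaster: datetime) -> timedelta:
    """Everything written after the last backup and before the disaster is lost."""
    if disaster < last_backup:
        raise ValueError("disaster cannot precede the last backup")
    return disaster - last_backup


def worst_case_loss(backup_interval: timedelta) -> timedelta:
    """Worst case: the disaster strikes just before the next backup would run."""
    return backup_interval


# Hypothetical timestamps: hourly backups, disaster 45 minutes after the last one.
last_backup = datetime(2024, 1, 1, 12, 0)
disaster = datetime(2024, 1, 1, 12, 45)
lost = data_loss_window(last_backup, disaster)
print(lost)  # 0:45:00
```

With hourly backups the loss never exceeds `worst_case_loss(timedelta(hours=1))`, which is the 1 hour RPO.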
Which value identifies how much data loss you are willing to accept in case a disaster happens?
The RPO: Recovery Point Objective
What happens to the data written between your latest backup (the recovery point) and the time a disaster strikes?
The data you processed is lost.
If you back up your data once a week, what is your RPO?
RPO = 1 week.
What is RTO?
Recovery Time Objective: the amount of downtime an application can have.
RTO is the downtime between the time of a disaster and the time you are back in production (meaning a replica was activated, or a backup was restored and put into production, etc.).
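A minimal sketch of the same idea: downtime is measured from the disaster to the moment you are back in production, and then compared against the target. The names and times below are hypothetical.

```python
from datetime import datetime, timedelta


def downtime(disaster: datetime, back_in_production: datetime) -> timedelta:
    """Time between the disaster and the moment production is serving again."""
    if back_in_production < disaster:
        raise ValueError("recovery cannot finish before the disaster")
    return back_in_production - disaster


def meets_rto(actual: timedelta, rto_target: timedelta) -> bool:
    """True when the measured downtime stays within the RTO target."""
    return actual <= rto_target


# Hypothetical incident: outage at 09:00, backup restored and live at 11:30.
actual = downtime(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 30))
print(actual)  # 2:30:00
print(meets_rto(actual, timedelta(hours=4)))  # True
```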
What are the disaster recovery strategies?
Backup and Restore
Pilot Light
Warm Standby
Hot Site / Multi Site approach
What are warm and cold disaster recovery setups?
Colder setups have higher RPO and RTO; warmer setups have lower RPO and RTO.
For example, backup and restore is cold, since it has a high RPO and RTO compared to site recovery or replication strategies.
What are some backup and restore strategies in AWS?
Backup Examples:
Back up data from the corporate DC into S3 through Storage Gateway, and move it to Glacier with lifecycle policies.
This could have an RPO of 1 day for example.
Or once a week you send a Snowball device with tons of data from your DC into S3, and from there to Glacier. Here your RPO will be 1 week.
Also, when using AWS services like EBS volumes, RDS, or Redshift, you can schedule regular snapshots. You could have an RPO of 1 day, 2 hours, or 1 hour, based on how frequently you run these snapshots.
These are all backup strategies, and have a higher RPO.
Restore Examples:
Use AMIs to recreate EC2 instances and spin up your applications, or restore your RDS database, etc., straight from a snapshot.
Restoring your data from backups takes a lot of time, so you get a high RTO as well.
RTO and RPO are high, but backup and restore is cheaper.
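The "move it to Glacier with lifecycle policies" step from the backup examples can be sketched as a small config builder. This is an illustration under assumptions: the prefix, rule ID, and 30-day threshold are invented, and the dict is shaped like what boto3's `put_bucket_lifecycle_configuration` takes as `LifecycleConfiguration`.

```python
def glacier_lifecycle(prefix: str, days_before_glacier: int) -> dict:
    """Build an S3 lifecycle configuration that transitions objects to Glacier."""
    if days_before_glacier < 0:
        raise ValueError("days_before_glacier must not be negative")
    return {
        "Rules": [
            {
                # Hypothetical rule ID derived from the prefix
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days_before_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


# Assumed values: backups live under "backups/" and are archived after 30 days.
config = glacier_lifecycle("backups/", 30)
print(config["Rules"][0]["ID"])  # archive-backups
```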
What is Pilot Light strategy?
It’s a disaster recovery strategy, in which a smaller version of your production systems (apps, databases, servers, configurations) is always up and running in the cloud. These are the “critical core” components of your systems. You only include what is critical for your business to operate, so that in case of a disaster it’s ready to run and to be scaled into production quickly.
How do you achieve having a version of your critical core running in the cloud? With continuous replication of those critical servers, for example a database.
Then, in case of a disaster, you can restore the not-so-critical servers from backup.
This will lower your RPO and RTO.
This could be from on-premises to the cloud, or from one region in the cloud to another region.
What do you need to do in case of a disaster when using pilot light as a disaster recovery strategy?
Similar to backup and restore, but your critical systems are already running somewhere else (for example, the cloud), so you only need to scale them up and add the restored not-so-critical systems.
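One way to picture the pilot light recovery steps: critical components are already running and only need scaling, everything else is restored from backup. The component names below are hypothetical and the split is just a sketch of the idea.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    critical: bool  # critical components are continuously replicated to the cloud


def recovery_plan(components: list[Component]) -> dict[str, list[str]]:
    """Split components into 'scale up what is running' and 'restore the rest'."""
    plan: dict[str, list[str]] = {"scale_up": [], "restore_from_backup": []}
    for component in components:
        step = "scale_up" if component.critical else "restore_from_backup"
        plan[step].append(component.name)
    return plan


# Hypothetical system: the database is the critical core, the rest is not.
system = [
    Component("orders-db", critical=True),
    Component("web-frontend", critical=False),
    Component("reporting", critical=False),
]
plan = recovery_plan(system)
print(plan["scale_up"])  # ['orders-db']
print(plan["restore_from_backup"])  # ['web-frontend', 'reporting']
```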
What is Warm Standby?
All your servers are ready to go in the cloud, but in a minimal size.
Then, upon a disaster, you can scale them up to production load on the spot.
This could be from on-premises to the cloud, or from one region in the cloud to another region.
Scaling can be triggered with CloudWatch alarms and an Auto Scaling Group (ASG) in the case of EC2, or with RDS scaling.
Lower RTO because all backup resources are already running and only need to be scaled so they can meet the necessary resources for production.
More expensive than pilot light because you have more extra resources up on standby.
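A rough sketch of the scale-up step for a warm standby on EC2. The function only builds keyword arguments shaped like the ones boto3's `update_auto_scaling_group` accepts. The group name, the capacity, and the "double it for headroom" rule are assumptions made for the example.

```python
def scale_to_production(group_name: str, production_capacity: int) -> dict:
    """Build ASG settings that take a minimal standby up to production size."""
    if production_capacity < 1:
        raise ValueError("production_capacity must be at least 1")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": production_capacity,
        "MaxSize": production_capacity * 2,  # assumed headroom, not an AWS rule
        "DesiredCapacity": production_capacity,
    }


# Hypothetical standby group that normally runs at minimal size.
settings = scale_to_production("standby-web-asg", 6)
print(settings["DesiredCapacity"])  # 6
```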
What role does Route 53 take in disaster recovery situations?
Route 53 can fail over your infrastructure when a disaster occurs in your on-premises DC or in an AWS region. The destination would be another AWS region.
Route 53 can reroute traffic from unhealthy resources to backup resources, thus performing failovers.
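As a sketch of what a failover setup looks like, the helper below builds a primary and a secondary record in the shape of a Route 53 change batch (the `ChangeBatch` argument of boto3's `change_resource_record_sets`). The domain, IP addresses, set identifiers, and health check ID are placeholders.

```python
from typing import Optional


def failover_record(name: str, ip: str, role: str, set_id: str,
                    health_check_id: Optional[str] = None) -> dict:
    """Build one UPSERT change for a failover routing record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        # Route 53 uses the health check to decide when to fail over
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# Placeholder values: documentation IP ranges and a made-up health check ID.
change_batch = {
    "Changes": [
        failover_record("app.example.com", "192.0.2.10", "PRIMARY",
                        "primary-site", "hypothetical-health-check-id"),
        failover_record("app.example.com", "198.51.100.10", "SECONDARY",
                        "backup-region"),
    ]
}
print(len(change_batch["Changes"]))  # 2
```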
What is the Multi Site / Hot Site approach?
It has a very low RTO (minutes or seconds).
You have full production scale running both on-premises and in the AWS cloud (or only in the cloud, using 2 AWS regions).
This would be an active-active setup, with Route 53 routing traffic to both sites.
The most expensive option. Lowest RTO and RPO.
Multi DC type of infrastructure.
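The four strategies can be lined up from coldest to warmest, which gives a quick way to compare any two of them. This only encodes the ordering from these cards (colder is cheaper with slower recovery, warmer is more expensive with faster recovery). The helper names are made up.

```python
# Ordered from coldest (cheapest, highest RPO/RTO) to warmest (most expensive, lowest RPO/RTO)
STRATEGIES = ("Backup and Restore", "Pilot Light", "Warm Standby", "Multi Site")


def warmth(strategy: str) -> int:
    """Position in the cold-to-warm ordering; raises ValueError if unknown."""
    return STRATEGIES.index(strategy)


def faster_recovery(a: str, b: str) -> str:
    """The warmer strategy recovers faster (lower RPO and RTO)."""
    return a if warmth(a) >= warmth(b) else b


def cheaper(a: str, b: str) -> str:
    """The colder strategy costs less to keep running."""
    return a if warmth(a) <= warmth(b) else b


print(faster_recovery("Pilot Light", "Warm Standby"))  # Warm Standby
print(cheaper("Pilot Light", "Warm Standby"))  # Pilot Light
```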
What are great backup options when backing up data from on-premises to the cloud?
Snowball
Storage Gateway
Which service helps you migrate DNS from one region to another, or from on-premises to AWS?
Route 53
What is Database Migration Service? What are its characteristics?
DMS:
Quickly and securely migrate DBs from anywhere to the AWS cloud.
It performs live migrations. The DB remains available during the migration.
You can enable continuous data replication.
How does it work?
You need to create an EC2 instance (the replication instance) that will run DMS and perform the migration tasks.
There is also a serverless option that doesn't use EC2.
DMS can run in multi AZ.
Uses the Schema Conversion Tool (SCT) when the source and destination database engines differ.
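A minimal sketch of two settings a DMS migration task needs: which tables to include, and whether to keep replicating after the initial load. The JSON follows the DMS table-mapping format, and the migration type strings are the values boto3's `create_replication_task` accepts for `MigrationType`. The rule name and the defaults are assumptions.

```python
import json


def table_mappings(schema: str = "%", table: str = "%") -> str:
    """Build a DMS selection rule as a JSON string; '%' is a wildcard."""
    rules = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-tables",  # hypothetical rule name
                "object-locator": {"schema-name": schema, "table-name": table},
                "rule-action": "include",
            }
        ]
    }
    return json.dumps(rules)


def migration_type(continuous_replication: bool) -> str:
    """Full load only, or full load followed by ongoing change data capture."""
    return "full-load-and-cdc" if continuous_replication else "full-load"


print(migration_type(continuous_replication=True))  # full-load-and-cdc
```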
What is SCT: Schema Conversion Tool?
It converts your database schema from one engine to another. You use it alongside DMS when your migration source and destination databases have different engines, meaning different types of databases, for example a migration from Oracle to MySQL. (You don't need to run SCT with DMS if the source and destination databases have the same engine.)
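The "same engine or not" check can be sketched like this. The engine-family table is an assumption made for the example (it treats the Aurora flavors as belonging to the MySQL and PostgreSQL families) and is not an official compatibility list.

```python
# Assumed mapping from engine name to engine family (illustrative, not exhaustive)
ENGINE_FAMILY = {
    "mysql": "mysql",
    "aurora mysql": "mysql",
    "postgresql": "postgresql",
    "aurora postgresql": "postgresql",
    "oracle": "oracle",
}


def engine_family(engine: str) -> str:
    """Normalize an engine name; unknown engines count as their own family."""
    key = engine.strip().lower()
    return ENGINE_FAMILY.get(key, key)


def needs_schema_conversion(source_engine: str, target_engine: str) -> bool:
    """SCT is only needed when source and target are different engines."""
    return engine_family(source_engine) != engine_family(target_engine)


print(needs_schema_conversion("Oracle", "MySQL"))  # True
print(needs_schema_conversion("MySQL", "Aurora MySQL"))  # False
```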
How can you migrate an RDS database to Aurora MySQL?
Option 1:
Take a snapshot of your RDS MySQL DB, and restore it into an Aurora MySQL DB. (You have some downtime while you switch the active database from RDS to Aurora).
Option 2:
Create an Aurora read replica of your RDS MySQL DB (this is supported).
Once replica lag is 0, promote it as its own DB cluster. (No downtime, but the replication takes longer and costs more money.)
Option 3, from on-premises:
You can back up to S3 with Percona XtraBackup, then create an Aurora database from this backup.
Option 4, from on-premises:
Create an Aurora MySQL DB, then use the mysqldump utility to migrate MySQL to Aurora. This is slower than the S3 method.
Option 5: If both databases are up and running you should use DMS.
(The same goes for PostgreSQL, but for PostgreSQL any DB backup utility works, contrary to MySQL, which only supports Percona XtraBackup.)
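The options above can be read as a small decision helper. This is just one reading of this card, not AWS guidance. The parameter names and the returned labels are made up for the sketch.

```python
def aurora_migration_option(source: str, target_running: bool = False,
                            downtime_ok: bool = True) -> str:
    """Pick a MySQL-to-Aurora path following the options listed on this card."""
    if target_running:
        # Option 5: both databases are already up and running
        return "DMS"
    if source == "rds":
        # Option 1 has some downtime; option 2 avoids it but takes longer and costs more
        if downtime_ok:
            return "snapshot restore"
        return "Aurora read replica, then promote"
    if source == "on-premises":
        # Option 3 is preferred here because option 4 (mysqldump) is slower
        return "Percona XtraBackup to S3, then import"
    raise ValueError("source must be 'rds' or 'on-premises'")


print(aurora_migration_option("rds", downtime_ok=False))  # Aurora read replica, then promote
```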
How do you export an EC2 instance to on-premises?
With VM Import/Export. VM Import/Export can be used through the CLI.
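For the CLI route, a sketch that assembles an `aws ec2 create-instance-export-task` command as an argument list (ready for something like `subprocess.run`). The instance ID, bucket, and prefix are placeholders, and the VMware/VMDK/OVA defaults are just one assumed combination.

```python
def export_instance_command(instance_id: str, bucket: str, prefix: str,
                            target_environment: str = "vmware",
                            disk_format: str = "vmdk") -> list[str]:
    """Build the CLI arguments to export an EC2 instance image into S3."""
    export_spec = (
        f"DiskImageFormat={disk_format},ContainerFormat=ova,"
        f"S3Bucket={bucket},S3Prefix={prefix}"
    )
    return [
        "aws", "ec2", "create-instance-export-task",
        "--instance-id", instance_id,
        "--target-environment", target_environment,
        "--export-to-s3-task", export_spec,
    ]


# Placeholder identifiers, not real resources.
command = export_instance_command("i-1234567890abcdef0", "my-export-bucket", "vms/")
print(" ".join(command))
```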
How do you run Amazon Linux instances on-premises?
You can download the Amazon Linux image as an ISO file and load it into on-premises platforms like VMware, KVM, VirtualBox, Hyper-V, etc.
What is AWS Server Migration Service?
It performs incremental replication of your on-premises servers to AWS.
How do you migrate existing on-premises applications to EC2?
With VM Import/Export.
Also great for disaster recovery.
What is Application Discovery Service?
It gathers information about your current on-premises infrastructure so you can plan a migration to AWS.
What is AWS Backup?
An AWS managed service. It allows you to centrally manage and automate backups of your AWS services.
Supports: EC2, EBS, S3, RDS, Aurora, DynamoDB, DocumentDB, Neptune, FSx, EFS, Storage Gateway (Volume Gateway), etc.
Supports Cross Region Backups, and Cross Account Backups.
Features:
Point in time recovery for DB services, on demand and scheduled backups, tag based backup policies
Backup plans define the backup frequency, the backup window, the transition of backups to cold storage, and retention periods.
Data is backed up to S3.
With AWS Backup Vault Lock, you can enforce a WORM (Write Once Read Many) policy, meaning backups stored in the vault can't be deleted.
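A backup plan with a frequency, a backup window, a cold storage transition, and a retention period can be sketched as a dict shaped like the `BackupPlan` argument of boto3's `create_backup_plan`. The plan name, vault name, schedule, and day counts are assumptions. The check reflects the AWS Backup rule that backups moved to cold storage must be kept there for at least 90 days.

```python
def backup_plan(name: str, vault: str, schedule: str,
                cold_after_days: int, delete_after_days: int) -> dict:
    """Build a one-rule backup plan with a cold storage transition and retention."""
    if delete_after_days < cold_after_days + 90:
        raise ValueError("retention must be at least 90 days past the cold storage transition")
    return {
        "BackupPlanName": name,
        "Rules": [
            {
                "RuleName": f"{name}-rule",
                "TargetBackupVaultName": vault,
                "ScheduleExpression": schedule,  # backup frequency
                "StartWindowMinutes": 60,  # assumed backup window
                "Lifecycle": {
                    "MoveToColdStorageAfterDays": cold_after_days,
                    "DeleteAfterDays": delete_after_days,
                },
            }
        ],
    }


# Hypothetical plan: daily at 05:00 UTC, cold storage after 30 days, kept for a year.
plan = backup_plan("daily-backups", "my-vault", "cron(0 5 ? * * *)", 30, 365)
print(plan["Rules"][0]["Lifecycle"])
```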