Business Continuity Flashcards by Daniel Egan

Define business continuity…

Seeks to minimise business activity disruption when something unexpected happens

Define Disaster recovery…

The act of responding to an event that threatens business continuity

Define high availability…

Designing in redundancies to reduce the chance of impacting service levels

Define fault tolerance…

The ability to tolerate faults, achieved by designing in the ability to absorb problems without impacting service levels

What is a service level agreement?

An agreed goal or target for a given service on its performance or availability
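An availability target is easier to reason about once it is converted into a downtime budget. The sketch below does that arithmetic; the function name and the example targets are illustrative assumptions, not part of any particular SLA.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, implied by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100.0)


if __name__ == "__main__":
    # Example targets chosen only for illustration.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {budget:.1f} minutes of downtime per year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why tighter targets tend to need more redundancy.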

Define RTO…

Recovery Time Objective…

The time that it takes after a disruption to restore business processes to their service levels

Define RPO…

Recovery Point Objective…

An acceptable amount of data loss measured in time
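As a rough illustration of how RTO and RPO are used together, the sketch below checks a backup schedule against both objectives. It assumes the worst-case data loss equals the backup interval (a failure just before the next backup) and that the restore time is an estimate you supply; all names and figures are hypothetical.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta,
                     estimated_restore_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> dict:
    """Compare a backup schedule and restore estimate with the RPO and RTO."""
    # Worst case: the failure happens just before the next backup would run,
    # so everything written since the previous backup is lost.
    worst_case_data_loss = backup_interval
    return {
        "rpo_met": worst_case_data_loss <= rpo,
        "rto_met": estimated_restore_time <= rto,
    }


if __name__ == "__main__":
    result = meets_objectives(
        backup_interval=timedelta(hours=24),       # nightly backups (assumed)
        estimated_restore_time=timedelta(hours=3),  # assumed restore estimate
        rpo=timedelta(hours=4),
        rto=timedelta(hours=4),
    )
    print(result)
```

In this example the nightly backup misses a 4-hour RPO even though the restore fits within the RTO, so the schedule would need to change rather than the restore process.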

What does the Business continuity plan define?

The acceptable RPO and RTO

What justifies the HA investment?

The RPO and RTO

What does the disaster recovery plan deliver?

The RTO and RPO

Name and provide examples of the 9 categories of disasters…

1) Hardware failure: a network switch power supply fails and brings down a LAN
2) Deployment failure: deploying a patch that breaks a key ERP business process
3) Load induced: DDoS attack
4) Data induced: Ariane rocket float conversion error
5) Credential expiration: an SSL/TLS certificate expires on your site
6) Dependency: S3 subsystem failure which causes other services to fail
7) Infrastructure: a construction crew cuts through a fibre cable
8) Identifier exhaustion: “We currently don’t have sufficient capacity in the AZ you have requested”
9) Human error

What are the 4 disaster recovery architectures?

1) Backup and restore
2) Pilot light
3) Warm standby
4) Multi-site

Name 2 pros and cons of a backup and restore DR architecture…

Pro-

1) Very common entry point into AWS
2) Minimal effort to configure

Con-

1) Least flexibility
2) Analogous to off-site back-up

Name 2 pros and 3 cons of a Pilot light DR architecture…

Pro-

1) Cost effective way to maintain a “hot site”
2) Suitable for a variety of landscapes and applications

Con-

1) Usually requires manual intervention for failover
2) Spinning up cloud environments will take minutes to hours
3) Must keep AMIs up-to-date with on-prem counterparts

Name 2 pros and cons of a Warm standby DR architecture…

Pro-

1) All services are up and ready to accept a failover within minutes or seconds
2) Can be used as a “shadow environment” for testing or production staging

Con-

1) Resources would need to be scaled up to accept production load
2) Still requires some environment adjustments, but these could be scripted

Name 3 pros and 2 cons of a multi-site DR architecture…

Pro-

1) Ready all the time to take full production load; effectively a mirrored data center
2) Fails over in seconds or less
3) No or little intervention required

Con-

1) Most expensive option
2) Can be perceived as wasteful as you have resources just standing around waiting for the primary to fail

Are EBS volumes replicated automatically within a single AZ or multi-AZ by default?

A single AZ by default

… This makes them vulnerable to AZ failure

What is RAID0?

Aka striping. Provides the fastest reads and writes but no redundancy for the data stored on the drives

What is RAID1?

Aka mirroring, where data is mirrored across 2 drives. Can tolerate the total failure of 1 drive.

What is RAID6?

High redundancy, as 2 drives can fail, but writes are very slow
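The trade-off between the three RAID levels above can be summarised as usable capacity versus the number of drive failures tolerated. The sketch below encodes the standard figures for equal-sized drives; the function name is hypothetical, and RAID1 is restricted to the 2-drive mirror described in the card.

```python
from typing import Tuple


def raid_summary(level: int, drives: int, drive_gib: int) -> Tuple[int, int]:
    """Return (usable_gib, drive_failures_tolerated) for equal-sized drives."""
    if level == 0:
        # Striping: all capacity is usable, but any drive failure loses the array.
        if drives < 2:
            raise ValueError("RAID0 needs at least 2 drives")
        return drives * drive_gib, 0
    if level == 1:
        # Mirroring across 2 drives: half the raw capacity, survives 1 failure.
        if drives != 2:
            raise ValueError("this sketch models a 2-drive mirror only")
        return drive_gib, 1
    if level == 6:
        # Double parity: two drives' worth of capacity goes to parity.
        if drives < 4:
            raise ValueError("RAID6 needs at least 4 drives")
        return (drives - 2) * drive_gib, 2
    raise ValueError("unsupported RAID level in this sketch")


if __name__ == "__main__":
    for level, drives in ((0, 2), (1, 2), (6, 4)):
        usable, tolerated = raid_summary(level, drives, drive_gib=100)
        print(f"RAID{level} with {drives} x 100 GiB: {usable} GiB usable, "
              f"tolerates {tolerated} drive failure(s)")
```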

Which RAID configuration does AWS NOT recommend, and why?

RAID5/6, as EBS volumes are accessed over the network and writing parity bits consumes IOPS

Which RAID configuration does AWS recommend?

RAID1

Does EFS support multi-AZ?

Yes

What is critical for rapid failover in HA and BC systems?

Up-to-date AMIs

What is the only way to GUARANTEE that a resource such as an EC2 instance will be available when you need it?

Using Reserved Instances scoped to a specific AZ (zonal), which include a capacity reservation

How can Route53 be used to provide a DR solution?

By conducting health checks and redirecting traffic, e.g. from on-prem to an AWS environment
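The failover idea can be sketched without touching AWS at all. The toy model below mimics health-check-driven failover: traffic goes to the primary while it is healthy and to the secondary otherwise. The `Endpoint` class, the `resolve` function and the hostnames are hypothetical, and falling back to the primary when both endpoints are unhealthy is a choice made for this sketch rather than a statement about Route 53's behaviour.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    healthy: bool  # result of the most recent health check


def resolve(primary: Endpoint, secondary: Endpoint) -> str:
    """Return the endpoint name that should receive traffic."""
    if primary.healthy:
        return primary.name
    if secondary.healthy:
        return secondary.name
    # Assumption for this sketch: with nothing healthy, keep answering
    # with the primary rather than returning no answer at all.
    return primary.name


if __name__ == "__main__":
    on_prem = Endpoint("onprem.example.com", healthy=True)
    aws_env = Endpoint("dr.aws.example.com", healthy=True)
    print(resolve(on_prem, aws_env))  # primary is healthy

    on_prem.healthy = False           # simulate a failed health check
    print(resolve(on_prem, aws_env))  # traffic moves to the DR environment
```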

Describe the order of preference when choosing a database in terms of HA and BC...

DynamoDB > Aurora (redundant and auto recover features) > Multi-AZ RDS with frequent RDS snapshots

Is a master to a standby asynchronous or synchronous in a Multi-AZ RDS architecture?

Synchronous

Is a master to a read replica asynchronous or synchronous in a Multi-AZ RDS architecture?

Asynchronous
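The practical difference between the two replication modes is what each copy holds at the moment of failure. The toy model below is not the RDS API; the class and method names are hypothetical. It applies every write to the standby before returning (synchronous) and queues it for the read replica until a later shipping step (asynchronous).

```python
class Replica:
    """A copy of the database, modelled as an append-only list of records."""

    def __init__(self) -> None:
        self.log = []


class Master:
    def __init__(self, standby: Replica, read_replica: Replica) -> None:
        self.log = []
        self.standby = standby
        self.read_replica = read_replica
        self.pending = []  # writes not yet shipped to the read replica

    def write(self, record: str) -> None:
        self.log.append(record)
        # Synchronous: the standby has the record before the write returns.
        self.standby.log.append(record)
        # Asynchronous: the read replica receives it some time later.
        self.pending.append(record)

    def ship_async(self) -> None:
        """Deliver queued writes to the read replica."""
        self.read_replica.log.extend(self.pending)
        self.pending.clear()


if __name__ == "__main__":
    standby, replica = Replica(), Replica()
    master = Master(standby, replica)
    master.write("order-1")
    master.write("order-2")
    master.ship_async()
    master.write("order-3")  # master fails before this write is shipped
    print("standby:", standby.log)
    print("read replica:", replica.log)
```

If the master is lost at this point, the standby holds every acknowledged write, while the read replica is missing the most recent one. That lag is the data-loss exposure of asynchronous replication.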

What happens if we lose a master RDS in a multi-AZ RDS architecture?

The standby is promoted to the master

What happens if we lose an entire region in a multi-AZ RDS architecture?

The read replica is promoted to the master, and another RDS instance is spun up to be the read replica and standby. This is manual but can be scripted using a CloudWatch alarm.

Does RedShift support multi-AZ deployment?

What is the best HA option for RedShift?

The best option is to use a multi-node cluster that supports data replication and node recovery

What is your only option to restore if a single node RedShift cluster fails?

You have to restore from S3. RedShift does not support replication.

Does Memcached support replication?

No. A node failure will result in data loss

How can you minimise data loss in Memcached?

You can use multiple nodes in each shard to minimise data loss on an AZ failure

How would you architect HA in Redis?

Use multiple nodes in each shard and distribute these nodes across multiple AZs. Can also enable multi-AZ replication to permit automatic failover if the primary node fails.

How do you ensure HA in your VPC network when using VPN?

Create at least 2 VPN tunnels into your virtual private gateway

What is FMEA?

Failure mode and effects analysis: a systematic process to examine what could go wrong, what impact it might have, how likely it is to occur, and what our ability is to detect and react.

What are the 3 steps in a FMEA?

1) Collect all possible failures
2) Assign scores (risk priority number, RPN; higher is worse)
3) Prioritise based on RPN, highest first
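The three steps can be sketched in a few lines. This assumes the conventional FMEA scoring, where the RPN is the product of severity, occurrence and detection scores, with a higher detection score meaning the failure is harder to detect. The failure modes and scores below are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    severity: int    # impact if it happens (higher is worse)
    occurrence: int  # likelihood of it happening (higher is more likely)
    detection: int   # difficulty of detecting it (higher is harder)

    @property
    def rpn(self) -> int:
        """Risk priority number: conventional product of the three scores."""
        return self.severity * self.occurrence * self.detection


def prioritise(modes: List[FailureMode]) -> List[FailureMode]:
    """Step 3: order failure modes by RPN, highest first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


if __name__ == "__main__":
    # Steps 1 and 2: collect failures and assign (hypothetical) scores.
    modes = [
        FailureMode("TLS certificate expires", severity=7, occurrence=4, detection=2),
        FailureMode("Fibre cable cut", severity=9, occurrence=2, detection=3),
        FailureMode("Patch breaks ERP process", severity=6, occurrence=5, detection=4),
    ]
    for mode in prioritise(modes):
        print(f"{mode.rpn:>4}  {mode.name}")
```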

What is the relationship between RPO and BC?

The recovery point objective will define the potential for data loss during a disaster. This can inform an expectation of manual data re-entry for BC planners

Which RAID option provides the highest write performance?

RAID0

What is Aurora Global database?

A service that allows you to failover to a secondary cluster in a different region. It means your database will survive even in the unlikely event of a regional degradation or outage