Business Continuity Flashcards

1
Q

Define business continuity…

A

Seeks to minimise business activity disruption when something unexpected happens

2
Q

Define Disaster recovery…

A

The act of responding to an event that threatens business continuity

3
Q

Define high availability…

A

Designing in redundancies to reduce the chance of impacting service levels

4
Q

Define fault tolerance…

A

The ability to tolerate faults: designing in the capacity to absorb problems without impacting service levels

5
Q

What is a service level agreement?

A

An agreed goal or target for a given service on its performance or availability
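An availability target in an SLA translates directly into a downtime budget. A minimal sketch of that arithmetic, assuming a 30-day measurement period (the function name and the default period are illustrative, not part of any SLA standard):

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Downtime budget, in minutes, implied by an availability target."""
    total_minutes = period_days * 24 * 60
    # The unavailable fraction is whatever the target leaves over.
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows about {budget:.1f} minutes of downtime")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why higher targets call for more redundancy.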

6
Q

Define RTO…

A

Recovery Time Objective…

The maximum acceptable time after a disruption within which business processes must be restored to their service levels

7
Q

Define RPO…

A

Recovery Point Objective…

The maximum acceptable amount of data loss, measured in time
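Because RPO and RTO are both expressed as time, a recovery design can be checked against them with simple comparisons. A rough sketch, assuming the worst-case data loss equals the interval between backups and that the restore time has been measured in a drill (both function names are hypothetical):

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    # Worst case: the failure happens just before the next backup,
    # so everything written since the last backup is lost.
    return backup_interval <= rpo


def meets_rto(measured_restore_time: timedelta, rto: timedelta) -> bool:
    # The restore must complete within the recovery time objective.
    return measured_restore_time <= rto


if __name__ == "__main__":
    print(meets_rpo(timedelta(hours=24), rpo=timedelta(hours=4)))    # nightly backups vs 4h RPO
    print(meets_rto(timedelta(minutes=30), rto=timedelta(hours=1)))  # 30 min restore vs 1h RTO
```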

8
Q

What does the Business continuity plan define?

A

The acceptable RPO and RTO

9
Q

What justifies the HA investment?

A

The RPO and RTO

10
Q

What does the disaster recovery plan deliver?

A

The RTO and RPO

11
Q

Name and provide examples of the 9 categories of disasters…

A

1) Hardware failure- Network switch power supply fails and brings down a LAN
2) Deployment failure- Deploying a patch that breaks a key ERP business process
3) Load induced- DDoS attack
4) Data induced- Ariane 5 rocket failure caused by a float-to-integer conversion overflow
5) Credential expiration- An SSL/TLS certificate expires on your site
6) Dependency- S3 subsystem failure which causes other services to fail
7) Infrastructure- A construction crew cuts through a fibre cable
8) Identifier exhaustion- We currently don’t have sufficient capacity in the AZ you have requested
9) Human error!

12
Q

What are the 4 disaster recovery architectures?

A

1) Backup and restore
2) Pilot light
3) Warm standby
4) Multi-site

13
Q

Name 2 pros and cons of a backup and restore DR architecture…

A

Pro-

1) Very common entry point into AWS
2) Minimal effort to configure

Con-

1) Least flexibility
2) Analogous to off-site back-up

14
Q

Name 2 pros and 3 cons of a Pilot light DR architecture…

A

Pro-

1) Cost effective way to maintain a “hot site”
2) Suitable for a variety of landscapes and applications

Con-

1) Usually requires manual intervention for fail over
2) Spinning up cloud environments will take minutes to hours
3) Must keep AMIs up-to-date with on-prem counterparts

15
Q

Name 2 pros and cons of a Warm standby DR architecture…

A

Pro-

1) All services are up and ready to accept a failover within minutes or seconds
2) Can be used as a “shadow environment” for testing or production staging

Con-

1) Resources would need to be scaled to accept production load
2) Still requires some environment adjustments, but these could be scripted

16
Q

Name 3 pros and 2 cons of a multi-site DR architecture…

A

Pro-

1) Ready all the time to take full production load; effectively a mirrored data center
2) Fails over in seconds or less
3) No or little intervention required

Cons-

1) Most expensive option
2) Can be perceived as wasteful as you have resources just standing around waiting for the primary to fail

17
Q

Are EBS volumes replicated automatically within a single AZ or multi-AZ by default?

A

A single AZ by default

… This makes them vulnerable to AZ failure

18
Q

What is RAID0?

A

Aka striping, provides the fastest reads and writes but no redundancy of data stored on drives

19
Q

What is RAID1?

A

Aka mirroring, where data is mirrored across 2 drives. Can tolerate total failure of 1 drive.

20
Q

What is RAID6?

A

High redundancy, as 2 drives can fail, but writes are very slow
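The trade-off between the three RAID levels above can be made concrete by comparing usable capacity and tolerated drive failures. A small sketch, assuming equally sized drives and the usual minimum drive counts (the function name and signature are illustrative):

```python
from typing import Tuple


def raid_profile(level: str, drives: int, drive_size_gib: int) -> Tuple[int, int]:
    """Return (usable capacity in GiB, number of drive failures tolerated)."""
    if level == "RAID0":
        if drives < 2:
            raise ValueError("RAID0 needs at least 2 drives")
        # Striping: all capacity is usable, but any single failure loses the array.
        return drives * drive_size_gib, 0
    if level == "RAID1":
        if drives < 2:
            raise ValueError("RAID1 needs at least 2 drives")
        # Mirroring: every drive holds a full copy of the data.
        return drive_size_gib, drives - 1
    if level == "RAID6":
        if drives < 4:
            raise ValueError("RAID6 needs at least 4 drives")
        # Double parity: two drives' worth of capacity goes to parity.
        return (drives - 2) * drive_size_gib, 2
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    for level, n in (("RAID0", 2), ("RAID1", 2), ("RAID6", 4)):
        capacity, tolerated = raid_profile(level, n, 100)
        print(f"{level} with {n} x 100 GiB: {capacity} GiB usable, tolerates {tolerated} failure(s)")
```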

21
Q

Which RAID configuration does AWS NOT recommend, and why?

A

RAID5/6, because EBS volumes are accessed over the network and the parity writes consume IOPS available to the volumes

22
Q

Which RAID configuration does AWS recommend?

A

RAID1

23
Q

Does EFS support multi-AZ?

A

Yes

24
Q

What is critical for rapid failover in HA and BC systems?

A

Up-to-date AMIs

25
Q

What is the only way to GUARANTEE that a resource such as an EC2 instance will be available when you need it?

A

Using reserved instances (specifically zonal Reserved Instances, which reserve capacity in a chosen AZ)

26
Q

How can Route53 be used to provide a DR solution?

A

By conducting health checks and redirecting traffic when they fail, e.g. from on-prem to an AWS environment
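The routing decision behind this kind of failover can be sketched in a few lines. This is only an illustration of the active-passive logic, not the Route53 API; the endpoint names and the health-check callable are hypothetical:

```python
from typing import Callable


def resolve(primary: str, secondary: str, is_healthy: Callable[[str], bool]) -> str:
    """Return the endpoint that should receive traffic under active-passive failover."""
    # Traffic stays on the primary while its health check passes,
    # and moves to the secondary as soon as it fails.
    return primary if is_healthy(primary) else secondary


if __name__ == "__main__":
    # Hypothetical health state: the on-prem site is down, the AWS site is up.
    health = {"onprem.example.com": False, "aws.example.com": True}
    print(resolve("onprem.example.com", "aws.example.com", lambda ep: health[ep]))
```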

27
Q

Describe the order of preference when choosing a database in terms of HA and BC…

A

DynamoDB > Aurora (redundant and auto recover features) > Multi-AZ RDS with frequent RDS snapshots

28
Q

Is replication from a master to a standby asynchronous or synchronous in a Multi-AZ RDS architecture?

A

Synchronous

29
Q

Is replication from a master to a read replica asynchronous or synchronous in a Multi-AZ RDS architecture?

A

Asynchronous

30
Q

What happens if we lose a master RDS in a multi-AZ RDS architecture?

A

The standby is promoted to the master

31
Q

What happens if we lose an entire region in a multi-AZ RDS architecture?

A

The read replica is promoted to the master, and another RDS instance is spun up to act as the read replica and standby. This is manual but can be scripted using a CloudWatch alarm.

32
Q

Does RedShift support multi-AZ deployment?

A

No

33
Q

What is the best HA option for RedShift?

A

The best option is to use a multi-node cluster that supports data replication and node recovery

34
Q

What is your only option to restore if a single node RedShift cluster fails?

A

You have to restore from S3.

A single-node RedShift cluster does not support data replication.

35
Q

Does Memcached support replication?

A

No. A node failure will result in data loss

36
Q

How can you minimise data loss in Memcached?

A

You can use multiple nodes in each shard to minimise data loss on an AZ failure

37
Q

How would you architect HA in Redis?

A

Use multiple nodes in each shard and distribute these nodes across multiple AZs. You can also enable Multi-AZ replication to permit automatic failover if the primary node fails.

38
Q

How do you ensure HA in your VPC network when using VPN?

A

Create at least 2 VPN tunnels into your virtual private gateway

39
Q

What is FMEA?

A

Failure mode and effects analysis

A systematic process to examine-

What could go wrong, what impact it might have, how likely it is to occur, and how well we can detect and react to it.

40
Q

What are the 3 steps in a FMEA?

A

1) Collect all possible failures
2) Assign scores (risk priority number- high == worse)
3) Prioritise based on risk score RPN- Highest first
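The three steps above can be sketched in code. This assumes the conventional FMEA scoring, where the risk priority number is severity × occurrence × detection, each on a 1 to 10 scale; the failure modes reuse examples from this deck, and the scores are invented for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    name: str
    severity: int    # impact if it happens: 1 (minor) to 10 (severe)
    occurrence: int  # likelihood: 1 (rare) to 10 (frequent)
    detection: int   # 1 (easily detected) to 10 (hard to detect)

    @property
    def rpn(self) -> int:
        # Risk priority number: a higher value means a worse risk.
        return self.severity * self.occurrence * self.detection


def prioritise(modes: List[FailureMode]) -> List[FailureMode]:
    # Step 3: highest RPN first.
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Steps 1 and 2: collect failure modes and assign (illustrative) scores.
failure_modes = [
    FailureMode("TLS certificate expires", severity=7, occurrence=4, detection=3),
    FailureMode("Fibre cable cut", severity=9, occurrence=2, detection=2),
    FailureMode("Patch breaks ERP process", severity=8, occurrence=5, detection=6),
]
ranked = prioritise(failure_modes)

if __name__ == "__main__":
    for mode in ranked:
        print(mode.rpn, mode.name)
```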

41
Q

What is the relationship between RPO and BC?

A

The recovery point objective will define the potential for data loss during a disaster. This can inform an expectation of manual data re-entry for BC planners

42
Q

Which RAID option provides the highest write performance?

A

RAID0

43
Q

What is Aurora Global database?

A

A service that allows you to failover to a secondary cluster in a different region. It means your database will survive even in the unlikely event of a regional degradation or outage