Lesson 18: Explaining Disaster Recovery and High Availability Concepts Flashcards

1
Q

Define availability

A

The percentage of time that the system is online, measured over a certain period, typically one year.

2
Q

Describe high availability and its goal

A

Metric that defines how closely systems approach the goal of providing data availability 100 percent of the time while maintaining a high level of system performance.

3
Q

Define Maximum Tolerable Downtime (MTD)

A

Longest period that a process can be inoperable without causing irrevocable business failure.

4
Q

How is downtime calculated?

A

Calculated from the sum of scheduled service intervals (Agreed Service Time) plus unplanned outages over the period

5
Q

For critical systems, what is the suggested availability?

A

99% (two nines) to 99.9999% (six nines)
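
As a rough illustration of what these figures mean in practice, the sketch below converts an availability percentage into the downtime it permits per year. It assumes a 365-day year of continuous service; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, system expected online 24x7


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the maximum downtime per year, in hours, for a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [("two nines", 99.0), ("six nines", 99.9999)]:
    hours = allowed_downtime_hours(pct)
    print(f"{label} ({pct}%): {hours:.4f} hours/year ({hours * 3600:.1f} seconds)")
```

Under these assumptions, two nines permits about 87.6 hours (3.65 days) of downtime a year, while six nines permits only about 31.5 seconds.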

6
Q

Define Recovery time objective (RTO)

A

Maximum time allowed to restore a system after a failure event; maximum amount of time allowed to identify that there is a problem and then perform recovery.

7
Q

Define Work Recovery Time (WRT)

A

Time spent performing reintegration and testing of a restored or upgraded system following an event.

8
Q

What two factors are considered in Maximum tolerable downtime (MTD)?

A
  1. RTO - Recovery time objective
  2. WRT - Work recovery time
    Combined, they must not exceed the MTD.
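
The relationship between RTO, WRT, and MTD can be sketched as a simple check. The hour values below are hypothetical, chosen only to show one plan that fits within the MTD and one that does not.

```python
def within_mtd(rto_hours: float, wrt_hours: float, mtd_hours: float) -> bool:
    """Return True if recovery (RTO) plus reintegration/testing (WRT) fits within the MTD."""
    return rto_hours + wrt_hours <= mtd_hours


# Hypothetical figures for a single business process
print(within_mtd(rto_hours=4, wrt_hours=2, mtd_hours=8))  # 4 + 2 = 6, fits within 8
print(within_mtd(rto_hours=6, wrt_hours=4, mtd_hours=8))  # 6 + 4 = 10, exceeds 8
```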
9
Q

Define Recovery Point Objective (RPO)

A

Longest period that an organization can tolerate lost data being unrecoverable.

10
Q

Define a fault

A

An event that causes a service/asset to become unavailable; servers, disk arrays, switches, routers, etc. can have faults

11
Q

What is a KPI?

A

Key performance indicator - used to determine the reliability of each asset and assess whether goals for MTD, RTO, and RPO can be met.

12
Q

Define Mean Time Between Failures (MTBF)

A

Metric for a device or component that predicts the expected time between failures

13
Q

How is Mean Time Between Failures (MTBF) calculated?

A

Total operational time divided by the number of failures
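
A minimal worked example of this formula, using made-up test figures (10 appliances, each run for 50 hours, with 2 failures observed):

```python
def mtbf(total_operational_hours: float, failures: int) -> float:
    """Mean time between failures: total operational time divided by failure count."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_operational_hours / failures


total_hours = 10 * 50  # hypothetical: 10 appliances x 50 hours each
print(mtbf(total_hours, failures=2))  # 500 / 2 = 250.0 hours per failure
```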

14
Q

Define Mean Time to Failure (MTTF)

A

Metric indicating average time a non-repairable component is expected to be in operation

15
Q

What non-repairable components would be measured with mean time to failure (MTTF)?

A

HDDs, SSDs

16
Q

How is Mean Time to Failure (MTTF) calculated?

A

Total operational time divided by the number of devices.
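
As a small sketch of this formula, note that the divisor is the number of devices rather than the number of failures. The figures are hypothetical: 10 drives that together accumulated 500 hours of operation before failing.

```python
def mttf(total_operational_hours: float, devices: int) -> float:
    """Mean time to failure: total operational time divided by the number of devices."""
    if devices <= 0:
        raise ValueError("at least one device is required")
    return total_operational_hours / devices


print(mttf(500, devices=10))  # 500 / 10 = 50.0 hours per device
```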

17
Q

When is Mean Time to Failure (MTTF) used in comparison to Mean Time Between Failures (MTBF)?

A

A hard drive may be described with an MTTF, while a server, which could be repaired by replacing the hard drive, would be described with an MTBF.

18
Q

Define Mean Time to Repair (MTTR)

A

Metric representing average time taken for a device or component to be repaired, replaced, or recover from a failure.

19
Q

How is Mean Time to Repair (MTTR) calculated?

A

Total number of hours of unplanned maintenance divided by the number of failure incidents.
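
The calculation can be sketched as follows. The incident log is invented for illustration; each entry is the number of hours of unplanned maintenance one failure required.

```python
def mttr(repair_hours_per_incident: list[float]) -> float:
    """Mean time to repair: total unplanned maintenance hours divided by incident count."""
    if not repair_hours_per_incident:
        raise ValueError("MTTR is undefined without at least one incident")
    return sum(repair_hours_per_incident) / len(repair_hours_per_incident)


incidents = [2.0, 5.0, 3.0, 2.0]  # hypothetical repair durations in hours
print(mttr(incidents))  # 12 / 4 = 3.0 hours per incident
```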

20
Q

How is Mean Time to Repair (MTTR) used in a recovery effort?

A

Used to estimate whether a recovery time objective (RTO) is achievable.

21
Q

Define fault tolerance

A

A system that can experience failures in individual components and sub-systems and continue to provide the same (or nearly the same) level of service.

22
Q

How is fault tolerance achieved?

A

By provisioning redundancy for critical components to eliminate single points of failure.

23
Q

Define a recovery/spare site

A

Another location that can provide the same (or similar) level of service. A disaster or systems failure at one site will cause services to failover to the alternate processing site.

24
Q

What are the three types of recovery/spare sites?

A
  1. Hot
  2. Warm
  3. Cold
25
Q

Define a hot site

A

Fully configured alternate processing site that can be brought online either instantly or very quickly.

26
Q

Define a warm site

A

Alternate processing location that is dormant or performs noncritical functions under normal conditions, but which can be rapidly converted to a key operations site.

27
Q

Define a cold site

A

Predetermined alternate location where a network can be rebuilt after a disaster.

28
Q

Define a power distribution unit (PDU)

A

Provides filtered output voltage to “clean” the power signal, provides protection against spikes, surges, and brownouts.

29
Q

Define an uninterruptible power supply (UPS)

A

Battery-powered device that will provide a temporary power source in the event of a blackout or power failure.

30
Q

Which metric is used to determine frequency of data backups?

A

Recovery Point Objective (RPO) is the maximum amount of data loss permitted, measured in units of time
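
One way to read this: in the worst case a failure happens just before the next backup runs, so up to one full backup interval of data can be lost. A minimal sketch of that check, with hypothetical intervals:

```python
from datetime import timedelta


def backup_interval_meets_rpo(interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the worst-case data loss (one full interval) stays within the RPO."""
    return interval <= rpo


rpo = timedelta(hours=4)  # hypothetical RPO
print(backup_interval_meets_rpo(timedelta(hours=1), rpo))   # hourly backups satisfy a 4-hour RPO
print(backup_interval_meets_rpo(timedelta(hours=24), rpo))  # daily backups do not
```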

31
Q

Define multipathing and its purpose

A

A network node has more than one physical link to another node, so that an alternative path remains available if one link fails.

32
Q

Define SAN (storage area network) multipathing

A

A node having multiple physical links to the SAN

33
Q

How is multipathing performed with ISPs?

A

Contracting with multiple ISPs and using routing policies to forward traffic over multiple external circuits provides fault tolerance.

34
Q

What needs to be confirmed when contracting with multiple ISPs?

A

Need to ensure that the ISPs are operating separate infrastructure and not using peering arrangements.

35
Q

Define the concept of diverse paths

A

Provisioning links over separate cable conduits that are physically distant from one another.

36
Q

Define link aggregation/bonding

A

Combining two or more separate cabled links between a host and a switch into a single logical channel.

37
Q

How is link aggregation/bonding defined at the host level?

A

NIC teaming

38
Q

How is link aggregation/bonding defined at the switch level?

A

Port aggregation

39
Q

Besides increased bandwidth, what else does link aggregation/bonding offer?

A

Redundancy; if one link is broken, the connection is still maintained by the other.

40
Q

What Ethernet standard does link aggregation/bonding belong to?

A

IEEE 802.3ad/802.1AX

41
Q

What are bonded interfaces known as?

A

Link Aggregation Group (LAG)

42
Q

What protocol is used to implement link aggregation/bonding?

A

802.3ad Link Aggregation Control Protocol (LACP); can be used to detect configuration errors and recover from the failure of one of the physical links.

43
Q

Define a load balancer and its function

A

Type of switch, router, or software that distributes client requests between different resources, such as communications links or a pool of servers.

44
Q

What are the two main types of load balancers?

A
  1. Layer 4 switch
  2. Layer 7 switch (content switch)
45
Q

How does a layer 4 load balancer differ from a layer 7 load balancer?

A

Layer 4 load balancer makes decisions at the transport layer (Layer 4) while layer 7 load balancer makes decisions at the application layer

46
Q

Define the concept of clustering

A

Load balancing technique where a group of servers are configured as a unit and work together to provide network services.

47
Q

What term is used for the public IP address of a load balancer cluster/pair?

A

Virtual IP/shared/floating address

48
Q

How does a load balanced cluster use the virtual IP?

A

The nodes in the cluster share a private connection using their internal IPs. A redundancy protocol allows the active node in the cluster to own the virtual IP and respond to connections.

49
Q

Define an active-passive cluster

A

Only one node is active at a time and the others are passive waiting to be active

50
Q

Define an active-active cluster

A

All nodes are processing connections concurrently.

51
Q

What is the purpose of a first hop redundancy protocol (FHRP)?

A

Designed to provide fault tolerance/redundancy to the default gateway of a subnet by provisioning failover routers in a cluster

52
Q

What are the two types of first hop redundancy protocols (FHRPs)?

A
  1. Hot Standby Router Protocol (HSRP)
  2. Virtual Router Redundancy Protocol (VRRP)
53
Q

How is the Hot Standby Router Protocol (HSRP) configured?

A

Each router interface is connected to the same subnet with its own unique MAC address and IP address. The routers also need to be configured to share a common virtual IP address and a common MAC address.

54
Q

How does a router cluster using Hot Standby Router Protocol (HSRP) communicate?

A

Using IP multicasts

55
Q

How is the active router chosen when using Hot Standby Router Protocol (HSRP)?

A

Based on priorities configured by an administrator: the router with the highest priority becomes the active router. Of the remaining routers in the standby group, the router with the next highest priority is chosen as the standby router.
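
The selection described above can be modelled roughly as a sort by priority. This is a simplified sketch, not an HSRP implementation: it ignores timers, preemption, and hello messages, and the router names, priorities, and addresses are hypothetical. Breaking ties by highest IP address is an assumption added here; the card itself does not cover ties.

```python
from ipaddress import IPv4Address
from typing import NamedTuple, Optional


class Router(NamedTuple):
    name: str
    priority: int
    ip: IPv4Address


def elect(routers: list[Router]) -> tuple[Router, Optional[Router]]:
    """Pick the active and standby routers by descending priority.

    Assumption: equal priorities are broken by the higher interface IP address.
    """
    ranked = sorted(routers, key=lambda r: (r.priority, r.ip), reverse=True)
    active = ranked[0]
    standby = ranked[1] if len(ranked) > 1 else None
    return active, standby


group = [
    Router("R1", 100, IPv4Address("10.0.0.2")),
    Router("R2", 110, IPv4Address("10.0.0.3")),
    Router("R3", 105, IPv4Address("10.0.0.4")),
]
active, standby = elect(group)
print(active.name, standby.name)  # R2 has the highest priority, R3 the next highest
```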

56
Q

What is the difference in functionality between Virtual Router Redundancy Protocol (VRRP) and Hot Standby Router Protocol (HSRP)?

A
  1. In VRRP, the active router is known as the master, and all other routers in the group are known as backup routers
  2. VRRP routers can be configured to only use the virtual IP address