Lesson 18: Explaining Disaster Recovery and High Availability Concepts Flashcards

Question 1

Q

Define availability

Answer

A

The percentage of time that the system is online, measured over a certain period, typically one year.

Question 2

Q

Describe high availability and its goal

Answer

A

Metric that defines how closely systems approach the goal of providing data availability 100 percent of the time while maintaining a high level of system performance.

Question 3

Q

Define Maximum Tolerable Downtime (MTD)

Answer

A

Longest period that a process can be inoperable without causing irrevocable business failure.

Question 4

Q

How is downtime calculated?

Answer

A

Calculated from the sum of scheduled service intervals (Agreed Service Time) plus unplanned outages over the period
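
As a rough illustration of the calculation above, the sketch below sums scheduled service intervals and unplanned outages, then converts the total into an availability percentage for the period. The function names and the sample hour values are assumptions made for this example, not part of the lesson.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours(scheduled_service_hours, unplanned_outage_hours):
    """Total downtime: scheduled service intervals plus unplanned outages."""
    return sum(scheduled_service_hours) + sum(unplanned_outage_hours)


def availability_percent(downtime, period_hours=HOURS_PER_YEAR):
    """Percentage of the period during which the system was online."""
    return 100 * (period_hours - downtime) / period_hours


# Hypothetical year: two 4-hour maintenance windows and one 1.5-hour outage.
total = downtime_hours([4, 4], [1.5])
print(f"Downtime: {total} hours, availability: {availability_percent(total):.3f}%")
```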

Question 5

Q

For critical systems, what is the suggested availability?

Answer

A

99% (two nines) to 99.9999% (six nines)
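
To make the "nines" concrete, this minimal sketch converts an availability target into the downtime it permits per year. It assumes a 365-day year; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def max_annual_downtime_hours(availability_percent):
    """Downtime budget (hours per year) implied by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999, 99.9999):
    minutes = max_annual_downtime_hours(target) * 60
    print(f"{target}% allows about {minutes:.2f} minutes of downtime per year")
```

Under this assumption, two nines allows about 87.6 hours of downtime per year, while six nines allows only about 31.5 seconds.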

Question 6

Q

Define Recovery time objective (RTO)

Answer

A

Maximum time allowed to restore a system after a failure event; maximum amount of time allowed to identify that there is a problem and then perform recovery.

Question 7

Q

Define Work Recovery Time (WRT)

Answer

A

Time spent performing reintegration and testing of a restored or upgraded system following an event.

Question 8

Q

What two factors are considered in Maximum tolerable downtime (MTD)?

Answer

A

RTO - Recovery time objective
WRT - Work recovery time
Combined they must not exceed MTD
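
The constraint can be expressed as a one-line check. This is a sketch only; the function name and the sample hour values are assumptions chosen for illustration.

```python
def recovery_plan_fits_mtd(rto_hours, wrt_hours, mtd_hours):
    """True when RTO plus WRT stays within the maximum tolerable downtime."""
    return rto_hours + wrt_hours <= mtd_hours


# Hypothetical process with an MTD of 8 hours.
print(recovery_plan_fits_mtd(rto_hours=4, wrt_hours=2, mtd_hours=8))  # within MTD
print(recovery_plan_fits_mtd(rto_hours=6, wrt_hours=3, mtd_hours=8))  # exceeds MTD
```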

Question 9

Q

Define Recovery Point Objective (RPO)

Answer

A

Longest period that an organization can tolerate lost data being unrecoverable.

Question 10

Q

Define a fault

Answer

A

An event that causes a service/asset to become unavailable; servers, disk arrays, switches, routers, etc. can have faults

Question 11

Q

What is a KPI?

Answer

A

Key performance indicator - used to determine the reliability of each asset and assess whether goals for MTD, RTO, and RPO can be met.

Question 12

Q

Define Mean Time Between Failures (MTBF)

Answer

A

Metric for a device or component that predicts the expected time between failures

Question 13

Q

How is Mean Time Between Failures (MTBF) calculated?

Answer

A

Total operational time divided by the number of failures
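
A minimal sketch of that formula, using made-up figures: 10 appliances each running for 50 hours, with two failures observed. The function name is hypothetical.

```python
def mtbf(total_operational_hours, failures):
    """Mean time between failures: total operational time / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_operational_hours / failures


# 10 appliances x 50 hours = 500 operational hours, 2 failures.
print(mtbf(10 * 50, 2), "hours per failure")
```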

Question 14

Q

Define Mean Time to Failure (MTTF)

Answer

A

Metric indicating average time a non-repairable component is expected to be in operation

Question 15

Q

What non-repairable components would be measured with mean time to failure (MTTF)?

Answer

A

HDDs, SSDs

Question 16

Q

How is Mean Time to Failure (MTTF) calculated?

Answer

A

Total operational time divided by the number of devices.
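
As an illustrative sketch, the function below takes the hours each device ran before failing and averages them. The function name and the two sample drive lifetimes are assumptions for this example.

```python
def mttf(hours_until_failure):
    """Mean time to failure: total operational time / number of devices."""
    if not hours_until_failure:
        raise ValueError("at least one device is required")
    return sum(hours_until_failure) / len(hours_until_failure)


# Two hypothetical drives: one failed after 10 hours, the other after 14.
print(mttf([10, 14]), "hours")
```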

Question 17

Q

When is Mean Time to Failure (MTTF) used in comparison to Mean Time Between Failures (MTBF)?

Answer

A

A hard drive may be described with an MTTF, while a server, which could be repaired by replacing the hard drive, would be described with an MTBF.

Question 18

Q

Define Mean Time to Repair (MTTR)

Answer

A

Metric representing average time taken for a device or component to be repaired, replaced, or recover from a failure.

Question 19

Q

How is Mean Time to Repair (MTTR) calculated?

Answer

A

Total number of hours of unplanned maintenance divided by the number of failure incidents.

Question 20

Q

How is Mean Time to Repair (MTTR) used in a recovery effort?

Answer

A

Used to estimate whether a recovery time objective (RTO) is achievable.
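
One way to picture this: compute MTTR from past incidents and compare it with the RTO. The sketch below is a simplification that treats the historical average as the expected repair time; the names and numbers are hypothetical.

```python
def mttr(unplanned_maintenance_hours, failure_incidents):
    """Mean time to repair: unplanned maintenance hours / failure incidents."""
    if failure_incidents <= 0:
        raise ValueError("MTTR is undefined without at least one incident")
    return unplanned_maintenance_hours / failure_incidents


def rto_looks_achievable(mttr_hours, rto_hours):
    """Rough estimate only: the average repair time fits inside the RTO."""
    return mttr_hours <= rto_hours


# Hypothetical history: 12 hours of unplanned maintenance across 4 incidents.
average = mttr(12, 4)
print(average, rto_looks_achievable(average, rto_hours=4))
```

Because MTTR is an average, individual repairs can still take longer than the RTO, so this check is an estimate rather than a guarantee.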

Question 21

Q

Define fault tolerance

Answer

A

A system that can experience failures in individual components and sub-systems and continue to provide the same (or nearly the same) level of service.

Question 22

Q

How is fault tolerance achieved?

Answer

A

By provisioning redundancy for critical components to eliminate single points of failure.

Question 23

Q

Define a recovery/spare site

Answer

A

Another location that can provide the same (or similar) level of service. A disaster or systems failure at one site will cause services to failover to the alternate processing site.

Question 24

Q

What are the three types of recovery/spare sites?

Answer

A

Hot
Warm
Cold

Question 25

Q

Define a hot site

Answer

A

Fully configured alternate processing site that can be brought online either instantly or very quickly.

Question 26

Q

Define a warm site

Answer

A

Alternate processing location that is dormant or performs noncritical functions under normal conditions, but which can be rapidly converted to a key operations site.

Question 27

Q

Define a cold site

Answer

A

Predetermined alternate location where a network can be rebuilt after a disaster.

Question 28

Q

Define a power distribution unit (PDU)

Answer

A

Provides filtered output voltage to “clean” the power signal and protects against spikes, surges, and brownouts.

Question 29

Q

Define an uninterruptible power supply (UPS)

Answer

A

Battery-powered device that will provide a temporary power source in the event of a blackout or power failure.

Question 30

Q

Which metric is used to determine frequency of data backups?

Answer

A

Recovery Point Objective (RPO) is the maximum amount of data loss permitted, measured in units of time
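
Since RPO is measured in units of time, it can be compared directly with the interval between backups. The sketch below assumes the worst-case data loss equals the full backup interval (a failure just before the next backup runs); the function name and schedules are hypothetical.

```python
from datetime import timedelta


def backup_interval_meets_rpo(backup_interval, rpo):
    """Worst case, everything since the last backup is lost, so the
    interval between backups must not exceed the RPO."""
    return backup_interval <= rpo


rpo = timedelta(hours=4)
print(backup_interval_meets_rpo(timedelta(hours=24), rpo))  # nightly backups
print(backup_interval_meets_rpo(timedelta(hours=1), rpo))   # hourly backups
```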

Question 31

Q

Define multipathing and its purpose

Answer

A

A network node has more than one physical link to another node, so traffic can continue over a remaining link if one fails.

Question 32

Q

Define SAN (storage area network) multipathing

Answer

A

A node having multiple physical links to the SAN

Question 33

Q

How is multipathing performed with ISPs?

Answer

A

Contracting with multiple ISPs and using routing policies to forward traffic over multiple external circuits provides fault tolerance.

Question 34

Q

What needs to be confirmed when contracting with multiple ISPs?

Answer

A

Need to ensure that the ISPs are operating separate infrastructure and not using peering arrangements.

Question 35

Q

Define the concept of diverse paths

Answer

A

Provisioning links over separate cable conduits that are physically distant from one another.

Question 36

Q

Define link aggregation/bonding

Answer

A

Combining two or more separate cabled links between a host and a switch into a single logical channel.

Question 37

Q

How is link aggregation/bonding defined at the host level?

Answer

A

NIC teaming

Question 38

Q

How is link aggregation/bonding defined at the switch level?

Answer

A

Port aggregation

Question 39

Q

Besides increased bandwidth, what else does link aggregation/bonding offer?

Answer

A

Redundancy; if one link is broken, the connection is still maintained by the other.

Question 40

Q

What ethernet standard does link aggregation/bonding belong to?

Answer

A

IEEE 802.3ad/802.1AX

Question 41

Q

What are bonded interfaces known as?

Answer

A

Link Aggregation Group (LAG)

Question 42

Q

What protocol is used to implement link aggregation/bonding?

Answer

A

802.3ad Link Aggregation Control Protocol (LACP); can be used to detect configuration errors and recover from the failure of one of the physical links.

Question 43

Q

Define a load balancer and its function

Answer

A

Type of switch, router, or software that distributes client requests between different resources, such as communications links or a pool of servers.

Question 44

Q

What are the two main types of load balancers?

Answer

A

Layer 4 switch
Layer 7 switch (content switch)

Question 45

Q

How does a layer 4 load balancer differ from a layer 7 load balancer?

Answer

A

Layer 4 load balancer makes decisions at the transport layer (Layer 4) while layer 7 load balancer makes decisions at the application layer
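
The difference can be sketched as two selection functions: one that sees only transport-layer details (source address and port) and one that inspects application-layer content (here, an HTTP path). This is a toy model, not how any particular product works; the function names, hashing choice, and server names are all assumptions.

```python
import zlib


def pick_layer4(src_ip, src_port, pool):
    """Layer 4 style: choose a server using only address and port information."""
    key = f"{src_ip}:{src_port}".encode()
    return pool[zlib.crc32(key) % len(pool)]


def pick_layer7(http_path, routes, default):
    """Layer 7 style: choose a server based on the content of the request."""
    for prefix, backend in routes.items():
        if http_path.startswith(prefix):
            return backend
    return default


pool = ["app-1", "app-2", "app-3"]      # hypothetical server names
routes = {"/images": "static-1"}        # hypothetical content rule
print(pick_layer4("192.0.2.10", 50123, pool))
print(pick_layer7("/images/logo.png", routes, default="app-1"))
```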

Question 46

Q

Define the concept of clustering

Answer

A

Load balancing technique where a group of servers are configured as a unit and work together to provide network services.

Question 47

Q

What term is used for the public IP address of a load balancer cluster/pair?

Answer

A

Virtual IP/shared/floating address

Question 48

Q

How does a load balanced cluster use the virtual IP?

Answer

A

The nodes in the cluster share a private connection using their internal IPs; a redundancy protocol allows the active node in the cluster to own the virtual IP and respond to connections.

Question 49

Q

Define an active-passive cluster

Answer

A

Only one node is active at a time and the others are passive waiting to be active

Question 50

Q

Define an active-active cluster

Answer

A

All nodes are processing connections concurrently.

Question 51

Q

What is the purpose of a first hop redundancy protocol (FHRP)?

Answer

A

Designed to provide fault tolerance/redundancy to the default gateway of a subnet by provisioning failover routers in a cluster

Question 52

Q

What are the two types of first hop redundancy protocols (FHRPs)?

Answer

A

Hot Standby Router Protocol (HSRP)
Virtual Router Redundancy Protocol (VRRP)

Question 53

Q

How is the Hot Standby Router Protocol (HSRP) configured?

Answer

A

Each router interface is connected to the same subnet with its own unique MAC address and IP address. The interfaces also need to be configured to share a common virtual IP address and a common MAC address.

Question 54

Q

How does a router cluster using Hot Standby Router Protocol (HSRP) communicate?

Answer

A

Using IP multicasts

Question 55

Q

How is the active router chosen when using Hot Standby Router Protocol (HSRP)?

Answer

A

Based on priorities configured by an administrator: the router with the highest priority becomes the active router. Of the remaining routers in the standby group, the router with the next highest priority is chosen as the standby router.
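
The priority-based selection can be sketched as a simple ranking. This toy model assumes every router has a distinct priority and ignores tie-breaking, timers, and preemption; the function name, router names, and priority values are hypothetical.

```python
def elect_active_and_standby(priorities):
    """Return (active, standby): the highest and next-highest priority routers.

    `priorities` maps a router name to its administrator-configured priority.
    Assumes distinct priorities; real protocol tie-breaking is not modeled.
    """
    if len(priorities) < 2:
        raise ValueError("need at least two routers in the group")
    ranked = sorted(priorities, key=priorities.get, reverse=True)
    return ranked[0], ranked[1]


group = {"R1": 100, "R2": 110, "R3": 90}
active, standby = elect_active_and_standby(group)
print(f"active={active}, standby={standby}")
```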

Question 56

Q

What is the difference in functionality between Virtual Router Redundancy Protocol (VRRP) and Hot Standby Router Protocol (HSRP)?

Answer

A

In VRRP, the active router is known as the master, and all other routers in the group are known as backup routers
VRRP routers can be configured to only use the virtual IP address
