CHAPTER 8_Business Continuity and Disaster Recovery Flashcards Preview

Flashcards in CHAPTER 8_Business Continuity and Disaster Recovery Deck (170):
1

Recovery Strategies

Up to this point, the BCP team has carried out the project initiation phase. In this phase, the team obtained management support and the necessary resources, laid out the scope of the project, and identified the BCP team. It also completed the BIA phase. This means that the committee carried out a risk assessment and analysis, which resulted in a report of the real risk level the company faces.

2

Checklist Test

Okay, did we forget anything?

In this type of test, copies of the BCP are distributed to the different departments and functional areas for review. This is done so each functional manager can review the plan and indicate if anything has been left out or if some approaches should be modified or deleted. This method ensures that nothing has been taken for granted or omitted. Once the departments have reviewed their copies and made suggestions, the planning team then integrates those changes into the master plan.

3

10. Which of the following is something that should be required of an offsite backup facility that stores backed-up media for companies?

  A. The facility should be within 10 to 15 minutes of the original facility to ensure easy access.

  B. The facility should contain all necessary PCs and servers and should have raised flooring.

  C. The facility should be protected by an armed guard.

  D. The facility should protect against unauthorized access and entry.

10. D. This question addresses a facility that is used to store backed-up data; it is not talking about an offsite facility used for disaster recovery purposes. The facility should not be only 10 to 15 minutes away, because some types of disasters could destroy both the company’s main facility and this facility if they are that close together, in which case the company would lose all of its information. The facility should have the same security standards as the company’s security, including protection against unauthorized access.

4

20. Which of the following incorrectly describes the concept of executive succession planning?

A. Predetermined steps protect the company if a senior executive leaves.

B. Two or more senior staff cannot be exposed to a particular risk at the same time.

C. It documents the assignment of deputy roles.

D. It covers assigning a skeleton crew to resume operations after a disaster.

Extended Questions:

CORRECT D. A skeleton crew consists of the employees who carry out the most critical functions following a disaster. They are put to work first during the recovery process. A skeleton crew is not related to the concept of executive succession planning, which addresses the steps that will be taken to fill a senior executive role should that person retire, leave the company, or die. The objective of a skeleton crew is to maintain critical operations, while the objective of executive succession planning is to protect the company by maintaining leadership roles.

WRONG A is incorrect because executive succession planning includes predetermined steps that protect the company if someone in a senior executive position retires, leaves the company, or is killed. The loss of a senior executive could tear a hole in the company’s fabric, creating a leadership vacuum that must be filled quickly with the right individual. The line of succession plan defines who would step in and assume responsibility for this role.

WRONG B is incorrect because the concept of two or more senior staff not being exposed to a particular risk at the same time is a policy that some larger organizations establish as part of their executive succession planning efforts. The idea is to protect senior personnel and the organization if a disaster were to strike. For example, an organization may decide that the CEO and president cannot travel on the same plane. If the plane went down and both individuals were killed, then the company could be in danger.

WRONG C is incorrect because executive succession planning can include the assignment of deputy roles. An organization may have a deputy CIO, deputy CFO, and deputy CEO ready to take over the necessary tasks if the CIO, CFO, or CEO becomes unavailable. Executive succession planning is the decision to have these deputies step into the CIO, CFO, or CEO roles.

5

redundant sites

Some companies choose to have redundant sites, or mirrored sites, meaning one site is equipped and configured exactly like the primary site, which serves as a redundant environment. The business-processing capabilities between the two sites can be completely synchronized. These sites are owned by the company and are mirrors of the original production environment. A redundant site has clear advantages: it has full availability, is ready to go at a moment’s notice, and is under the organization’s complete control. This is, however, one of the most expensive backup facility options, because a full environment must be maintained even though it usually is not used for regular production activities until after a disaster takes place that triggers the relocation of services to the redundant site. But expensive is relative here. If the company would lose a million dollars if it were out of business for just a few hours, the loss potential would override the cost of this option. Many organizations are subjected to regulations that dictate they must have redundant sites in place, so expense is not an issue in these situations.

6

Opportunities

Elements that could contribute to the project’s success

7

Parallel Test

Let’s do a little processing here and a little processing there.

A parallel test is done to ensure that the specific systems can actually perform adequately at the alternate offsite facility. Some systems are moved to the alternate site and processing takes place. The results are compared with the regular processing that is done at the original site. This points out any necessary tweaking or reconfiguring.

8

Electronic Backup Solutions

Manually backing up systems and data can be time-consuming, error-prone, and costly. Several technologies serve as automated backup alternatives. Although these technologies are usually more expensive, they are quicker and more accurate, which may be necessary for online information that changes often.

9

Responsibility

Each individual involved with recovery and continuity should have their responsibilities spelled out in writing to ensure a clear understanding in a chaotic situation. Each task should be assigned to the individual most logically situated to handle it. These individuals must know what is expected of them, which is done through training, drills, communication, and documentation. So, for example, instead of just running out of the building screaming, an individual must know that he is responsible for shutting down the servers before he can run out of the building screaming.

10

Sean has been hired as business continuity coordinator. His management has told him that he needs to ensure that the company is in compliance with the ISO/IEC standard that pertains to technology readiness for business continuity. He has also been instructed to find a way to transfer the risk of being unable to carry out critical business functions for a period of time because of a disaster.

28. Which of the following is most likely the standard that Sean has been asked to comply with?

A. ISO/IEC 27031

B. ISO/IEC 27005

C. ISO/IEC BS7799

D. ISO/IEC 2899

Extended Questions:

CORRECT A is correct. ISO/IEC 27031:2011 is a set of guidelines for information and communications technology readiness for business continuity. It is a component of the overall ISO/IEC 27000 series.

WRONG B is incorrect because the purpose of ISO/IEC 27005 is to provide guidelines for information security risk management. It supports the general concepts specified in ISO/IEC 27001 and is designed to assist the satisfactory implementation of information security based on a risk management approach.

11

26. What is the second step that is missing in the following graphic?

  A. Business impact analysis

  B. NIST standard

  C. Management approval and resource allocation

  D. Change control

26. A. The missing step is the BIA. The steps of the BIA are as follows:

• Identify the company’s critical business functions.

• Decide on information-gathering techniques: interviews, surveys, qualitative or quantitative questionnaires.

• Identify resources these functions depend upon.

• Calculate how long these functions can be without these resources.

• Identify vulnerabilities and threats to these functions.

• Calculate the risk for each different business function.

• Develop backup solutions for resources based on tolerable outage times.

• Develop recovery solutions for the company’s individual departments and for the company as a whole.

12

Strengths

Characteristics of the project team that give it an advantage over others

13

Work Recovery Time (WRT)

The Work Recovery Time (WRT) is the remainder of the overall MTD value once the RTO has been accounted for. RTO usually deals with getting the infrastructure and systems back up and running, and WRT deals with restoring data, testing processes, and then making everything "live" for production purposes.
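
As a rough illustration of the relationship described above, the following sketch treats the MTD as the sum of the RTO and the WRT. The function name and the hour values are hypothetical and chosen only for the example.

```python
def work_recovery_time(mtd_hours: float, rto_hours: float) -> float:
    """Return the WRT, taken here as what remains of the MTD after the RTO."""
    if rto_hours > mtd_hours:
        raise ValueError("RTO cannot exceed MTD")
    return mtd_hours - rto_hours


# Hypothetical values: a function that can be down 72 hours in total,
# with 48 of those hours budgeted for bringing systems back up.
print(work_recovery_time(72, 48))  # 24
```

If the RTO alone consumes the whole MTD, no time is left for restoring data and testing, which is why the sketch rejects an RTO larger than the MTD.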

14

Threats

Elements that could contribute to the project’s failure

15

Tertiary Sites

During the BIA phase, the team may recognize the danger of the primary backup facility not being available when needed, which could require a tertiary site. This is a secondary backup site, just in case the primary backup site is unavailable. The secondary backup site is sometimes referred to as a "backup to the backup." This is basically plan B if plan A does not work out.

16

14. Which of the following is a critical first step in disaster recovery and contingency planning?

A. Plan testing and drills.

B. Complete a business impact analysis.

C. Determine offsite backup facility alternatives.

D. Organize and create relevant documentation.

Extended Questions:

CORRECT B. Of the steps listed in this question, completing a business impact analysis would take the highest priority. The BIA is essential in determining the most critical business functions and identifying the threats that correlate to them. Qualitative and quantitative data needs to be gathered, analyzed, interpreted, and presented to management.

WRONG A is incorrect because plan testing and drills are the last step in disaster recovery and contingency planning. It is important to test the business continuity plan regularly because environments continually change. Tests and disaster recovery drills and exercises should be performed at least once a year. Most companies cannot afford for these exercises to interrupt production or productivity, so the exercises may need to take place in sections or at specific times, which requires logistical planning.

WRONG C is incorrect because determining offsite backup facility alternatives is part of the recovery strategy, which takes place in the middle of the disaster recovery and contingency planning process. Organizations must have alternative offsite backup facilities in the case of a larger disaster. Generally, contracts are established with third-party vendors to provide such services. The client pays a monthly fee to retain the right to use the facility in a time of need, and then incurs an activation fee when the facility has to be used.

WRONG D is incorrect because organizing and creating relevant documentation takes place toward the end of the disaster recovery and contingency planning process. Procedures need to be documented because when they are actually needed, it will most likely be a chaotic and frantic atmosphere with a demanding time schedule. The documentation may need to include information on how to install images, configure operating systems and servers, and properly install utilities and proprietary software. Other documentation could include a calling tree, and contact information for specific vendors, emergency agencies, offsite facilities, etc.

17

16. Business continuity plans can be assessed via a number of tests. Which type of test continues up to the point of actual relocation to an offsite facility and actual shipment of replacement equipment?

A. Parallel test

B. Checklist test

C. Structured walk-through test

D. Simulation test

Extended Questions:

CORRECT D. In a simulation test, all employees who participate in operational and support functions come together to practice executing the disaster recovery plan based on a specific scenario. The scenario is used to test the reaction of each operational and support representative. This is done to ensure that specific steps were not left out and certain threats were not overlooked, as well as to act as a catalyst to raise awareness of the people involved. The drill includes only those materials available in an actual disaster to portray a more realistic environment. The simulation test continues up to the point of actual relocation to an offsite facility and actual shipment of replacement equipment.

WRONG A is incorrect because a parallel test is carried out to ensure that the specific systems can actually perform adequately at the alternate offsite facility. The systems are moved to the alternate site and processing takes place. The results are compared with the regular processing that is done at the original site. This activity points out any necessary tweaking, reconfiguring, or steps that need to take place to ensure that proper processing can take place at the alternate site.

WRONG B is incorrect because in a checklist test copies of the disaster recovery and business continuity plans are distributed to the different departments and functional areas for review. This is done so that each functional manager or team can review the plan and indicate if anything has been left out or if some approaches should be modified or deleted. This method ensures that nothing has been taken for granted or omitted. Once the departments have reviewed their copies and made suggestions, the planning team then integrates those changes into the master plan.

WRONG C is incorrect because in a structured walk-through test representatives from each department or functional area come together to go over the plan to ensure its accuracy. The group goes over the objectives of the plan; discusses the scope and assumptions of the plan; reviews the organization and reporting structure; and evaluates the testing, maintenance, and training requirements described. This gives the people who will be responsible for making sure that a disaster recovery happens effectively and efficiently a chance to review what has been decided upon and what is expected of them. The group walks through different scenarios of the plan from beginning to end to make sure nothing was left out and to raise the awareness of the recovery team members.

18

functional analysis

A BIA (business impact analysis) is considered a functional analysis, in which a team collects data through interviews and documentary sources; documents business functions, activities, and transactions; develops a hierarchy of business functions; and finally applies a classification scheme to indicate each individual function’s criticality level. But how do we determine a classification scheme based on criticality levels?

19

10. It is not unusual for business continuity plans to become out of date. Which of the following is not a reason why plans become outdated?

A. Changes in hardware, software, and applications

B. Infrastructure and environment changes

C. Personnel turnover

D. That the business continuity process is integrated into the change management process

Extended Questions:

CORRECT D. Unfortunately, business continuity plans can quickly become out of date. An out-of-date BCP may provide a company with a false sense of security, which could be devastating if and when a disaster actually takes place. One of the simplest and most cost-effective and process-efficient ways to keep a plan up to date is to incorporate it within the change management process of the organization. When you think about it, it makes a lot of sense. Where do you document new applications, equipment, or services? Where do you document updates and patches? Your change management process should be updated to incorporate fields and triggers that alert the BCP team when a significant change will occur and should provide a means to update the recovery documentation. Other measures that can help ensure that the BCP remains current include performing regular drills that use the plan, including the plan’s maintenance in personnel evaluations, and making business continuity a part of every business decision.

WRONG A is incorrect because changes in hardware, software, and applications occur frequently, and unless the BCP is part of the change management process, these changes are unlikely to be included in the BCP. When changes to the environment take place, the BCP needs to be updated. If it is not updated after changes, it is out of date.

WRONG B is incorrect because infrastructure and environment changes occur frequently. Just as with software, hardware, and application changes, unless the BCP is part of the change management process, infrastructure and environment changes are unlikely to make it into the BCP.

WRONG C is incorrect because plans often become outdated as a result of personnel turnover. It is not unusual for a BCP to become abandoned when the person or people responsible for its maintenance leave the organization. These responsibilities must be reassigned. To ensure this happens, maintenance responsibilities should be incorporated into job descriptions and properly monitored.

20

The main parts of a risk assessment are:

  • Review the existing strategies for risk management
  • Construct a numerical scoring system for probabilities and impacts
  • Make use of a numerical score to gauge the effect of the threat
  • Estimate the probability of each threat
  • Weigh each threat through the scoring system
  • Calculate the risk by combining the scores of likelihood and impact of each threat
  • Get the organization’s sponsor to sign off on these risk priorities
  • Weigh appropriate measures
  • Make sure that planned measures that alleviate risk do not heighten other risks
  • Present the assessment’s findings to executive management
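
The scoring steps in the list above can be sketched as follows. This is a minimal illustration, not a prescribed method: the 1 to 5 scales, the choice to multiply likelihood by impact, and the sample threats are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); hypothetical scale
    impact: int      # 1 (negligible) to 5 (severe); hypothetical scale


def risk_score(threat: Threat) -> int:
    """Combine likelihood and impact; multiplication is one common choice."""
    return threat.likelihood * threat.impact


def rank_threats(threats: list[Threat]) -> list[Threat]:
    """Order threats from highest to lowest combined score."""
    return sorted(threats, key=risk_score, reverse=True)


# Hypothetical threats and scores, for illustration only.
threats = [
    Threat("Flood", likelihood=2, impact=5),
    Threat("Power outage", likelihood=4, impact=3),
    Threat("Ransomware", likelihood=3, impact=5),
]
for t in rank_threats(threats):
    print(f"{t.name}: {risk_score(t)}")
```

The ranked output is the kind of prioritized list that the sponsor would be asked to sign off on before measures are weighed.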

21

As indicators of success, the risk assessment should identify, evaluate, and record all relevant items, which may include:

  • Vulnerabilities for all of the organization’s most time-sensitive resources and activities
  • Threats and hazards to the organization’s most urgent resources and activities
  • Measures that cut the possibility, length, or effect of a disruption on critical services and products

22

13. Who has the final approval of the business continuity plan?

  A. The planning committee

  B. Each representative of each department

  C. Management

  D. External authority

13. C. Management really has the final approval over everything within a company, including these plans.

23

11. Which item will a business impact analysis not identify?

  A. Whether the company is best suited for a parallel or full-interrupt test

  B. What areas would suffer the greatest operational and financial loss in the event of a particular disaster or disruption

  C. What systems are critical for the company and must be highly protected

  D. What amount of outage time a company can endure before it is permanently crippled

11. A. All the other answers address the main components of a business impact analysis. Determining the best type of exercise or drill to carry out is not covered under this type of analysis.

24

7. The operations team is responsible for defining which data gets backed up and how often. Which type of backup process backs up files that have been modified since the last time all data was backed up?

A. Incremental process

B. Full backup

C. Partial backup

D. Differential process

Extended Questions:

CORRECT D. Backups can be full, differential, or incremental, and are usually used in some type of combination with each other. Most files are not altered every day, so to save time and resources, it is best to devise a backup plan that does not continually back up data that has not been modified. Backup software reviews the archive bit setting when making its determination on what gets backed up and what does not. If a file is modified or created, the file system sets the archive bit to 1, and the backup software knows to back up that file. A differential process backs up the files that have been modified since the last full backup; in other words, the last time all the data was backed up. When the data needs to be restored, the full backup is laid down first, and then the differential backup is put down on top of it.

WRONG A is incorrect because an incremental process backs up all the files that have changed since the last full or incremental backup. If a company experienced a disaster and it used the incremental process, it would first need to restore the full backup on its hard drives and lay down every incremental backup that was carried out before the disaster took place. So, if the full backup was done six months ago and the operations department carried out an incremental backup each month, the restoration team would restore the full backup and start with the older incremental backups and restore each one of them until they are all restored.

WRONG B is incorrect because with a full backup, all data is backed up and saved to some type of storage media. During a full backup, the archive bit is cleared, which means that it is set to 0. A company can choose to do full backups only, in which case the restoration process is just one step, but the backup and restore processes could take a long time.

WRONG C is incorrect because it is not the best answer to this question. While a backup can be a partial backup, it does not necessarily mean that it backs up all the files that have been modified since the last time a backup process was run.
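
The difference between the three backup types can be sketched with a simulated archive bit. This is a simplified model, not real backup software: the function name and file names are hypothetical, and it assumes the usual convention that full and incremental backups clear the archive bit while a differential backup leaves it set.

```python
def run_backup(files: dict[str, bool], kind: str) -> list[str]:
    """Simulate one backup run.

    `files` maps a file name to its archive bit (True means modified).
    Returns the names copied by this run and updates the bits in place.
    """
    if kind == "full":
        copied = sorted(files)
    elif kind in ("differential", "incremental"):
        copied = sorted(name for name, bit in files.items() if bit)
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    # Assumed convention: only full and incremental runs clear the bit.
    if kind in ("full", "incremental"):
        for name in copied:
            files[name] = False
    return copied


files = {"a.doc": True, "b.xls": True, "c.txt": True}
run_backup(files, "full")                 # copies everything, clears all bits
files["a.doc"] = True                     # a.doc is modified
print(run_backup(files, "differential"))  # ['a.doc']
files["b.xls"] = True                     # b.xls is modified later
print(run_backup(files, "differential"))  # ['a.doc', 'b.xls']
```

Because the differential run leaves the bit set, each differential grows to include everything changed since the last full backup, so a restore needs only the full backup plus the latest differential. With incremental runs, each run clears the bit, so a restore needs the full backup plus every incremental in order.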

25

business continuity plan (BCP)

A disaster recovery plan (DRP) is carried out when everything is still in emergency mode, and everyone is scrambling to get all critical systems back online. A business continuity plan (BCP) takes a broader approach to the problem. It can include getting critical systems to another environment while repair of the original facilities is under way, getting the right people to the right places during this time, and performing business in a different mode until regular conditions are back in place. It also involves dealing with customers, partners, and shareholders through different channels until everything returns to normal. So, disaster recovery deals with, "Oh my goodness, the sky is falling," and continuity planning deals with, "Okay, the sky fell. Now, how do we stay in business until someone can put the sky back where it belongs?"

26

Once the coordinator, management, and salvage team sign off on the readiness of the facility, the salvage team should carry out the following steps:

  • Back up data from the alternate site and restore it within the new facility.
  • Carefully terminate contingency operations.
  • Securely transport equipment and personnel to the new facility.

27

executive succession planning

Organizations should already have executive succession planning in place. This means that if someone in a senior executive position retires, leaves the company, or is killed, the organization has predetermined steps to carry out to protect the company. The loss of a senior executive could tear a hole in the company’s fabric, creating a leadership vacuum that must be filled quickly with the right individual. The line-of-succession plan defines who would step in and assume responsibility for this role. Many organizations have "deputy" roles. For example, an organization may have a deputy CIO, deputy CFO, and deputy CEO ready to take over the necessary tasks if the CIO, CFO, or CEO becomes unavailable.

28

cold site

Most companies use warm sites, which have some devices such as disk drives, tape drives, and controllers, but very little else. These companies usually cannot afford a hot site, and the extra downtime would not be considered detrimental. A warm site can provide a longer-term solution than a hot site. Companies that decide to go with a cold site must be able to be out of operation for a week or two. The cold site usually includes power, raised flooring, climate control, and wiring.

29

18. Which of the following describes a parallel test?

  A. It is performed to ensure that operations performed at the alternate site also give the same results as at the primary site.

  B. All departments receive a copy of the disaster recovery plan and walk through it.

  C. Representatives from each department come together and go through the test collectively.

  D. Normal operations are shut down.

18. A. In a parallel test, some systems are run at the alternate site, and the results are compared with how processing takes place at the primary site. This is to ensure that the systems work in that area and productivity is not affected. This also extends the previous test and allows the team to walk through the steps of setting up and configuring systems at the offsite facility.

31

Organizations can keep the plan updated by taking the following actions:

  • Make business continuity a part of every business decision.
  • Insert the maintenance responsibilities into job descriptions.
  • Include maintenance in personnel evaluations.
  • Perform internal audits that include disaster recovery and continuity documentation and procedures.
  • Perform regular drills that use the plan.
  • Integrate the BCP into the current change management process.
  • Incorporate lessons learned from actual incidents into the plan.

32

24. Which of the following is not an advantage of a hot site?

  A. Offers many hardware and software choices.

  B. Is readily available.

  C. Can be up and running in hours.

  D. Annual testing is available.

24. A. Because hot sites are fully equipped, they do not allow for a lot of different hardware and software choices. The subscription service offers basic software and hardware products, and does not usually offer a wide range of proprietary items.

33

BCP policy

The BCP policy supplies the framework for and governance of designing and building the BCP effort. The policy helps the organization understand the importance of BCP by outlining BCP’s purpose. It provides an overview of the principles of the organization and those behind BCP, and the context for how the BCP team will proceed.

34

End-User Environment

Do you think the users could just use an abacus for calculations and fire for light?

Because the end users are usually the worker bees of a company, they must be provided a functioning environment as soon as possible after a disaster hits. This means that the BCP team must understand the current operational and technical functioning environment and examine critical pieces so they can replicate them.

35

recovery strategy stage

In the recovery strategy stage, the team approaches the information gathered during the BIA stage from a practical perspective. It has to figure out what the company needs to do to actually recover the items it has identified as being so important to the organization overall. In its business continuity and recovery strategy, the team closely examines the critical, agreed-upon business functions, and then evaluates the numerous recovery and backup alternatives that might be used to recover critical business operations.

36

2. Performed the BIA

  • Identified critical business functions, their resources, and MTD values
  • Identified threats and calculated the impact of these threats
  • Identified solutions
  • Presented findings to management

38

22. Which of the following describes a cold site?

  A. Fully equipped and operational in a few hours

  B. Partially equipped with data processing equipment

  C. Expensive and fully configured

  D. Provides environmental measures but no equipment

22. D. A cold site only provides environmental measures—wiring, air conditioning, raised floors—basically a shell of a building and no more.

39

Warm and Cold Site Advantages

  • Less expensive
  • Available for longer timeframes because of the reduced costs
  • Practical for proprietary hardware or software use

40

8. After a disaster occurs, a damage assessment needs to take place. Which of the following steps occurs last in a damage assessment?

A. Determine the cause of the disaster.

B. Identify the resources that must be replaced immediately.

C. Declare a disaster.

D. Determine how long it will take to bring critical functions back online.

Extended Questions:

CORRECT C. The final step in a damage assessment is to declare a disaster. After information from the damage assessment is collected and assessed, it will indicate what teams need to be called to action and whether the BCP actually needs to be activated. The BCP coordinator and team must develop activation criteria before a disaster takes place. After the damage assessment, if one or more of the situations outlined in the criteria have taken place, then the team is moved into recovery mode. Different organizations have different criteria, because the business drivers and critical functions will vary from organization to organization. The criteria may consist of danger to human life, danger to state or national security, damage to facility, damage to critical systems, and estimated value of downtime that will be experienced.

WRONG A is incorrect because determining the cause of the disaster is the first step of the damage assessment. The issue that caused the damage may still be taking place and the team must figure out how to stop it before a full damage assessment can take place.

WRONG B is incorrect because identifying the resources that must be replaced immediately is not the last step of a damage assessment. It does occur near the end of the assessment, however. Once the resources are identified, the team must estimate how long it will take to bring critical functions back online, and then declare a disaster, if necessary.

WRONG D is incorrect because determining how long it will take to bring critical functions back online is the second to last step in a damage assessment. If it will take longer than the previously determined maximum tolerable downtime (MTD) values to restore operations, then a disaster should be declared and the BCP should be put into action.
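
The last two steps of the damage assessment can be sketched as a comparison of restore estimates against MTD values. The sketch covers only the downtime criterion; real activation criteria, as noted above, may also include danger to human life, facility damage, and other factors. The function names, business functions, and hour values are hypothetical.

```python
def functions_over_mtd(restore_estimates: dict[str, float],
                       mtd_values: dict[str, float]) -> list[str]:
    """List the functions whose estimated restore time exceeds their MTD."""
    return sorted(
        name for name, hours in restore_estimates.items()
        if hours > mtd_values[name]
    )


def should_declare_disaster(restore_estimates: dict[str, float],
                            mtd_values: dict[str, float]) -> bool:
    """Declare a disaster if any critical function would exceed its MTD."""
    return bool(functions_over_mtd(restore_estimates, mtd_values))


# Hypothetical estimates and MTD values, in hours.
estimates = {"order processing": 36, "payroll": 24}
mtds = {"order processing": 24, "payroll": 72}
print(functions_over_mtd(estimates, mtds))       # ['order processing']
print(should_declare_disaster(estimates, mtds))  # True
```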

41

failover

If a technology has a failover capability, this means that if there is a failure that cannot be handled through normal means, then processing is "switched over" to a working system. For example, two servers can be configured to send each other heartbeat signals every 30 seconds. If server A does not receive a heartbeat signal from server B after 40 seconds, then all processes are moved to server A so that there is no lag in operations. Also, when servers are clustered, this means that there is an overarching piece of software monitoring each server and carrying out load balancing. If one server within the cluster goes down, the clustering software stops sending it data to process so that there are no delays in processing activities.
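
The heartbeat example above can be sketched as a simple timeout check. This is an illustration only, not how any particular clustering product works: the function name is hypothetical, timestamps are plain numbers in seconds, and the sketch assumes failover triggers once the silence reaches the 40-second mark.

```python
HEARTBEAT_INTERVAL = 30  # seconds between heartbeats, per the example above
FAILOVER_TIMEOUT = 40    # seconds of silence before the peer is presumed down


def should_fail_over(last_heartbeat: float, now: float,
                     timeout: float = FAILOVER_TIMEOUT) -> bool:
    """Return True once the peer has been silent for the full timeout."""
    return (now - last_heartbeat) >= timeout


# Server A last heard from server B at t=0.
print(should_fail_over(last_heartbeat=0, now=30))  # False
print(should_fail_over(last_heartbeat=0, now=40))  # True
```

Setting the timeout slightly longer than the heartbeat interval gives one late heartbeat a chance to arrive before processing is switched over.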

42

Warm site

Warm site: A leased or rented facility that is usually partially configured with some equipment, such as HVAC, and foundational infrastructure components, but not the actual computers. In other words, a warm site is usually a hot site without the expensive equipment such as communication equipment and servers. Staging a facility with duplicate hardware and computers configured for immediate operation is extremely expensive, so a warm site provides an alternate facility with some peripheral devices.

43

6. Which of the following steps comes first in a business impact analysis?

A. Calculate the risk for each different business function.

B. Identify critical business functions.

C. Create data-gathering techniques.

D. Identify vulnerabilities and threats to business functions.

Extended Questions:

CORRECT C. Of the steps listed, the first step in a business impact analysis (BIA) is creating data-gathering techniques. The BCP committee can use surveys, questionnaires, and interviews to gather information from key personnel about how different tasks get accomplished within the organization, whether it’s a process, transaction, or service, along with any relevant dependencies. Process flow diagrams should be built from this data, which will be used throughout the BIA and plan development stages.

WRONG A is incorrect because calculating the risk of each business function occurs after business functions have been identified. And before that can happen, the BCP team must gather data from key personnel. To calculate the risk of each business function, qualitative and quantitative impact information should be gathered and properly analyzed and interpreted. Upon completion of the data analysis, it should be reviewed with the most knowledgeable people within the company to ensure that the findings are appropriate and describe the real risks and impacts the organization faces. This will help flush out any additional data points not originally obtained and will give a fuller understanding of all the possible business impacts.

WRONG B is incorrect because identifying critical business functions takes place after the BCP committee has learned about the business functions that exist by interviewing and surveying key personnel. Upon completion of the data collection phase, the BCP committee conducts an analysis to establish which processes, devices, or operational activities are critical. If a system stands on its own, doesn’t affect other systems, and is of low criticality, then it can be classified as a tier two or three recovery step. This means these resources will not be dealt with during the recovery stages until the most critical (tier one) resources are up and running.

WRONG D is incorrect because identifying vulnerabilities and threats to business functions takes place toward the end of a business impact analysis. Of the steps listed in the answers, it is the last one. Threats can be manmade, natural, or technical. It is important to identify all possible threats and estimate the probability of them happening. Some issues may not immediately come to mind when developing these plans. These issues are often best addressed in a group with scenario-based exercises. This ensures that if a threat becomes a reality, the plan includes the ramifications on all business tasks, departments, and critical operations. The more issues that are thought of and planned for, the better prepared a company will be if and when these events occur.

44

A company needs to address several issues and ask specific questions when it is deciding upon a storage facility for its backup materials. The following provides a list of just some of the issues that need to be thought through before committing to a specific vendor for this service:

  • Can the media be accessed in the necessary timeframe?
  • Is the facility closed on weekends and holidays, and does it only operate during specific hours of the day?
  • Are the access control mechanisms tied to an alarm and/or the police station?
  • Does the facility have the capability to protect the media from a variety of threats?
  • What is the availability of a bonded transport service?
  • Are there any geographical environmental hazards such as floods, earthquakes, tornadoes, and so on that might affect the facility?
  • Is there a fire detection and suppression system?
  • Does the facility provide temperature and humidity monitoring and control?
  • What type of physical, administrative, and logical access controls are used?

45

In the industry, HA is usually thought about only in technology terms, but remember that there are many things that an organization needs to keep functioning. Availability of each of the following items must be thought through and planned:

  • Facility
      • Cold, warm, hot, redundant, rolling, reciprocal sites
  • Infrastructure
      • Redundancy, fault tolerance
  • Storage
      • RAID, Storage Area Network (SAN), mirroring, disk shadowing, cloud
  • Server
      • Clustering, load balancing
  • Data
      • Tapes, backups, vaulting, online replication
  • Business processes
  • People

46

business continuity management (BCM)

While DRP and BCP are directed at the development of plans, business continuity management (BCM) is the holistic management process that should cover both of them. BCM provides a framework for integrating resilience with the capability for effective responses that protects the interests of an organization’s key stakeholders. The main objective of BCM is to allow the organization to continue to perform business operations under various conditions.

47

25. The following is a graphic of a business continuity policy. Which component is missing from this graphic?

A. Damage assessment phase

B. Reconstitution phase

C. Business resumption phase

D. Continuity of operations plan

Extended Questions:

CORRECT B. After a disaster takes place and a company moves out of its facility, it must move back in after the facility is reconstructed. When it is time for the company to move back into its original site or a new site, the company is ready to enter into the reconstitution phase. A company is not out of an emergency state until it is back in operation at the original primary site or a new site that was constructed to replace the primary site, because the company is always vulnerable while operating in a backup facility. Many logistical issues need to be considered as to when a company must return from the alternate site to the original site. The following lists a few of these issues:

• Ensuring the safety of employees

• Ensuring an adequate environment is provided (power, facility infrastructure, water, HVAC)

• Ensuring that the necessary equipment and supplies are present and in working order

• Ensuring proper communications and connectivity methods are working

• Properly testing the new environment

WRONG A is incorrect because a role, or a team, needs to be created to carry out a damage assessment once a disaster has taken place. The assessment procedures should be properly documented and include the following steps:

• Determine the cause of the disaster.

• Determine the potential for further damage.

• Identify the affected business functions and areas.

• Identify the level of functionality for the critical resources.

• Identify the resources that must be replaced immediately.

• Estimate how long it will take to bring critical functions back online.

• If it will take longer than the previously estimated maximum tolerable downtime (MTD) values to restore operations, then a disaster should be declared and the business continuity plan (BCP) should be put into action.

After this information is collected and assessed, it will indicate what teams need to be called to action and whether the BCP actually needs to be activated. The BCP coordinator and team must develop activation criteria. After the damage assessment, if one or more of the situations outlined in the criteria have taken place, then the team is moved into recovery mode.

WRONG C is incorrect because a business resumption plan focuses on how to re-create the necessary business processes that need to be reestablished instead of focusing on only IT components (i.e., it is process-oriented instead of procedure-oriented). This plan could be mentioned in the BCP policy, but the policy does not outline the specifics of reestablishing business processes.

WRONG D is incorrect because a continuity of operations plan (COOP) establishes senior management and a headquarters after a disaster. It provides instructions on how to set up a command center so that all activities and communication take place centrally and in a controlled manner. This type of plan also outlines roles and authorities, orders of succession, and individual role tasks that need to be put into place after a disaster takes place. This plan could be mentioned in the BCP policy, but the policy does not outline the specifics of setting up a command center and its components.

26. The Recovery Time Objective (RTO) and Maximum Tolerable Downtime (MTD) metrics have similar roles, but their values are very different. Which of the following best describes the difference between RTO and MTD metrics?

A. The RTO is a time period that represents the inability to recover, and the MTD represents an allowable amount of downtime.

B. The RTO is an allowable amount of downtime, and the MTD represents a time period that represents the inability to recover.

C. The RTO is a metric used in disruptions, and the MTD is a metric used in disasters.

D. The RTO is a metric pertaining to loss of access to data, and the MTD is a metric pertaining to loss of access to hardware and processing capabilities.

CORRECT B. The RTO value is smaller than the MTD value, because the MTD value represents the time after which an inability to recover significant operations will mean severe and perhaps irreparable damage to the organization’s reputation or bottom line. The RTO assumes that there is a period of acceptable downtime. This means that a company can be out of production for a certain period of time (RTO) and still get back on its feet. But if the company cannot get production up and running within the MTD window, the company is sinking too fast to properly recover.

WRONG A is incorrect because the MTD is a time period that represents the inability to recover, and the RTO represents an allowable amount of downtime.

WRONG C is incorrect because the Recovery Time Objective (RTO) is the maximum time period within which a business process must be restored to a designated service level after a disaster to avoid unacceptable consequences associated with a break in business continuity. The RTO value is smaller than the MTD value, because the MTD value represents the time after which an inability to recover significant operations will mean severe and perhaps irreparable damage to the organization’s reputation or bottom line. The RTO is not a metric reserved for disruptions, and the MTD is not a metric reserved for disasters.

WRONG D is incorrect because the Recovery Time Objective (RTO) is the maximum time period within which a business process must be restored to a designated service level after a disaster to avoid unacceptable consequences associated with a break in business continuity. The RTO value is smaller than the MTD value, because the MTD value represents the time after which an inability to recover significant operations will mean severe and perhaps irreparable damage to the organization’s reputation or bottom line. RTO is not a metric pertaining to loss of access to data, and the MTD is not a metric pertaining to loss of access to hardware and processing capabilities.

48

The initiation process for BCP might include the following:

  • Setting up a budget and staff for the program before the BCP process begins. Dedicated personnel and dedicated hours are essential for executing something as labor-intensive as a BCP.
  • Assigning duties and responsibilities to the BCP coordinator and to representatives from all of the functional units of the organization.
  • Senior management should kick off the BCP with a formal announcement or, better still, an organization-wide meeting to demonstrate high-level support.
  • Awareness-raising activities to let employees know about the BCP program and to build internal support for it.
  • Establishment of skills training for the support of the BCP effort.
  • The start of data collection from throughout the organization to aid in crafting various continuity options.
  • Implementing "quick wins" and picking "low-hanging fruit" to improve the organization’s readiness and to show tangible evidence of that improvement.

49

mean time to repair (MTTR)

Disasters and catastrophes are rare compared to nondisasters, thank goodness. Nondisasters can usually be taken care of by replacing a device or restoring files from onsite backups. For nondisasters, the BCP team needs to think through onsite backup requirements and make well-informed decisions. The team must identify the critical equipment, and estimate the mean time between failures (MTBF) and the mean time to repair (MTTR). This will provide the necessary statistics on when a device may be meeting its maker and a new device may be required.
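As a rough illustration of how these two estimates can be combined, the standard steady-state availability formula is MTBF / (MTBF + MTTR). The sketch below applies it; the function name `availability` and the sample figures are hypothetical and do not come from the card.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a device is expected to be operational."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical device: fails about every 10,000 hours and takes 8 hours to repair.
print(f"{availability(10_000, 8):.4%}")
```

A device with a short MTBF or a long MTTR scores low here, which is one signal that onsite spares or a replacement may be worth planning for.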

50

Interdependencies

Operations depend on manufacturing, manufacturing depends on R&D, payroll depends on accounting, and they all depend on IT.

51

Software Backups

I have a backup server and my backed-up data, but no operating system or applications.

52

23. Which of the following best describes what a disaster recovery plan should contain?

  A. Hardware, software, people, emergency procedures, recovery procedures

  B. People, hardware, offsite facility

  C. Software, media interaction, people, hardware, management issues

  D. Hardware, emergency procedures, software, identified risk

23. A. The recovery plan should contain information about how to deal with people, hardware, software, emergency procedures, recovery procedures, facility issues, and supplies.

53

Recovery Time Objective (RTO)

The Recovery Time Objective (RTO) is the maximum time period within which a business process must be restored to a designated service level after a disaster to avoid unacceptable consequences associated with a break in business continuity. The RTO value is smaller than the MTD value, because the MTD value represents the time after which an inability to recover significant operations will mean severe and perhaps irreparable damage to the organization’s reputation or bottom line. The RTO assumes that there is a period of acceptable downtime. This means that a company can be out of production for a certain period of time (RTO) and still get back on its feet. But if the company cannot get production up and running within the MTD window, the company is sinking too fast to properly recover.
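The relationship between the two values can be sketched as a simple check. This is only an illustration: the function name `assess_outage`, the result labels, and the sample hours are hypothetical. The one rule taken from the card is that the RTO is smaller than the MTD.

```python
def assess_outage(downtime_hours: float, rto_hours: float, mtd_hours: float) -> str:
    """Classify an outage against the RTO and MTD values (labels are illustrative)."""
    if rto_hours > mtd_hours:
        raise ValueError("RTO must not exceed MTD")
    if downtime_hours <= rto_hours:
        return "within RTO"
    if downtime_hours <= mtd_hours:
        return "RTO missed, still within MTD"
    return "MTD exceeded"


# Hypothetical values: RTO of 4 hours, MTD of 24 hours.
for hours in (2, 10, 30):
    print(hours, assess_outage(hours, rto_hours=4, mtd_hours=24))
```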

54

Supply and Technology Recovery: At this point, the BCP team has mapped out the necessary business functions that need to be up and running and the specific backup facility option that is best for its organization. Now the team needs to dig down into the more granular items, such as backup solutions for the following:

  • Network and computer equipment
  • Voice and data communications resources
  • Human resources
  • Transportation of equipment and personnel
  • Environment issues (HVAC)
  • Data and personnel security issues
  • Supplies (paper, forms, cabling, and so on)
  • Documentation

55

The committee should be made up of representatives from at least the following departments:

  • Business units
  • Senior management
  • IT department
  • Security department
  • Communications department
  • Legal department

56

3. A business impact analysis is considered a functional analysis. Which of the following is not carried out during a business impact analysis?

A. A parallel or full-interruption test

B. The application of a classification scheme based on criticality levels

C. The gathering of information via interviews

D. Documentation of business functions

Extended Questions:

CORRECT A. A business impact analysis (BIA) is considered a functional analysis, in which a team collects data through interviews and documentary sources; documents business functions, activities, and transactions; develops a hierarchy of business functions; and finally applies a classification scheme to indicate each individual function’s criticality level. Parallel and full-interruption tests are not part of a BIA. These tests are carried out to ensure the continued validity of a business continuity plan, since environments continually change. A parallel test is done to ensure that specific systems can actually perform adequately at the alternate offsite facility, while a full-interruption test involves shutting down the original site and resuming operations and processing at the alternate site.

WRONG B is incorrect because the application of a classification scheme based on criticality levels is carried out during a business impact analysis (BIA). This is done by identifying the critical assets of the company and mapping them to the following characteristics: maximum tolerable downtime, operational disruption and productivity, financial considerations, regulatory responsibilities, and reputation.

WRONG C is incorrect because the gathering of information during interviews is conducted during a business impact analysis. The BCP committee will not truly understand all business processes, the steps that must take place, or the resources and supplies those processes require. So the committee must gather this information from the people who do know, which are department managers and specific employees throughout the organization. The committee must identify the individuals who will provide information and how that information will be collected (surveys, interviews, or workshops).

WRONG D is incorrect because the BCP committee does document business functions as part of a business impact analysis (BIA). Business activities and transactions must also be documented. This information is obtained from the department managers and specific employees that are interviewed or surveyed. Once the information is documented, the BCP committee can conduct an analysis to determine which processes, devices, or operational activities are the most critical.

57

Develop the continuity planning policy statement

1. Develop the continuity planning policy statement. Write a policy that provides the guidance necessary to develop a BCP, and that assigns authority to the necessary roles to carry out these tasks.

58

19. Which of the following describes a structured walk-through test?

  A. It is performed to ensure that critical systems will run at the alternate site.

  B. All departments receive a copy of the disaster recovery plan and walk through it.

  C. Representatives from each department come together and review the steps of the test collectively without actually performing those steps.

  D. Normal operations are shut down.

19. C. During a structured walk-through test, functional representatives review the plan together to ensure that it is accurate and correctly reflects the company’s recovery strategy.

59

18. Several teams should be involved in carrying out the business continuity plan. Which team is responsible for starting the recovery of the original site?

A. Damage assessment team

B. BCP team

C. Salvage team

D. Restoration team

Extended Questions:

CORRECT C. The BCP coordinator should have an understanding of the needs of the company and the types of teams that need to be developed and trained. Employees should be assigned to the specific teams based on their knowledge and skill set. Each team needs to have a designated leader, who will direct the members and their activities. These team leaders will be responsible not only for ensuring that their team’s objectives are met, but also for communicating with each other to make sure each team is working in parallel phases. The salvage team is responsible for starting the recovery of the original site. It is also responsible for backing up data from the alternate site and restoring it within the new facility, carefully terminating contingency operations, and securely transporting equipment and personnel to the new facility.

WRONG A is incorrect because the damage assessment team is responsible for determining the scope and severity of the damage caused. Whether or not a disaster is declared and the BCP put into action is based on this information collected and assessed by the damage assessment team.

WRONG B is incorrect because the BCP team is responsible for creating and maintaining the business continuity plan. As such, its responsibilities also include identifying regulatory and legal requirements that must be met, identifying all possible vulnerabilities and threats, performing a business impact analysis, and developing procedures and steps in resuming business after a disaster. The BCP team is made up of representatives from a variety of business units and departments, including senior management, the security department, the communications department, and the legal department. This is not the team that starts the physical recovery of the original site.

WRONG D is incorrect because the restoration team is responsible for getting the alternate site into a working and functioning environment. Both the restoration team and the salvage team must know how to do many tasks, such as install operating systems, configure workstations and servers, string wire and cabling, set up the network and configure networking services, and install equipment and applications. Both teams must also know how to restore data from backup facilities, and how to do so in a secure manner that ensures that the systems’ and data’s confidentiality, integrity, and availability are not compromised.

60

12. Management support is critical to the success of a business continuity plan. Which of the following is the most important to be provided to management to obtain their support?

A. Business case

B. Business impact analysis

C. Risk analysis

D. Threat report

Extended Questions:

CORRECT A. The most critical part of establishing and maintaining a current continuity plan is management support. Management may need to be convinced of the necessity of such a plan. Therefore, a business case must be made to obtain this support. The business case may include current vulnerabilities, regulatory and legal obligations, the current status of recovery plans, and recommendations. Management is commonly most concerned with cost/benefit issues, so preliminary numbers can be gathered and potential losses estimated. The decision of how a company should recover is a business decision and should always be treated as such.

WRONG B is incorrect because a business impact analysis (BIA) is conducted after the BCP team has obtained management’s support for their efforts. A BIA is performed to identify the areas that would suffer the greatest financial or operational loss in the event of a disaster or disruption. It identifies the company’s critical systems needed for survival and estimates the outage time that can be tolerated by the company as a result of a disaster or disruption.

WRONG C is incorrect because a risk analysis is a method of identifying risks and assessing the possible damage that could be caused in order to justify security safeguards. In the context of BCP, risk analysis methodologies are used during a business impact analysis to establish which processes, devices, or operational activities are critical and should therefore be recovered first.

WRONG D is incorrect because threat report is a distracter. However, it is critical that management understand what the real threats are to the company, the consequences of those threats, and the potential loss values for each threat. Without this understanding, management may only give lip service to continuity planning, and in some cases that is worse than not having any plans at all because of the false sense of security that it creates.

61

Data Backup Alternatives

As we have discussed so far, backup alternatives are needed for hardware, software, personnel, and offsite facilities. It is up to each company and its continuity team to decide if all of these components are necessary for its survival and the specifics for each type of backup needed.

62

The main reasons plans become outdated include the following:

  • The business continuity process is not integrated into the change management process.
  • Changes occur to the infrastructure and environment.
  • Reorganization of the company, layoffs, or mergers occur.
  • Changes in hardware, software, and applications occur.
  • After the plan is constructed, people feel their job is done.
  • Personnel turn over.
  • Large plans take a lot of work to maintain.
  • Plans do not have a direct line to profitability.

63

6. The purpose of initiating emergency procedures right after a disaster takes place is to prevent loss of life and injuries, and to ____________.

  A. Secure the area to ensure that no looting or fraud takes place

  B. Mitigate further damage

  C. Protect evidence and clues

  D. Investigate the extent of the damages

6. B. The main goal of disaster recovery and business continuity plans is to mitigate all risks that could be experienced by a company. Emergency procedures first need to be carried out to protect human life, and then other procedures need to be executed to reduce the damage from further threats.

64

Conduct the business impact analysis (BIA)

2. Conduct the business impact analysis (BIA). Identify critical functions and systems and allow the organization to prioritize them based on necessity. Identify vulnerabilities and threats, and calculate risks.

65

Test the plan and conduct training and exercises

6. Test the plan and conduct training and exercises. Test the plan to identify deficiencies in the BCP, and conduct training to properly prepare individuals on their expected tasks.

66

ISO 22301

ISO 22301: International Standard for business continuity management systems, published in 2012. It is the specification document against which organizations seek certification.

67

16. During development, testing, and maintenance of the continuity plan, a high degree of interaction and communications is crucial to the process. Why?

  A. This is a regulatory requirement of the process.

  B. The more people who talk about it and are involved, the more awareness will increase.

  C. This is not crucial to the plan and should not be interactive because it will most likely affect operations.

  D. Management will more likely support it.

16. B. Communication not only spreads awareness of these plans and their contents, but also allows more people to discuss the possible threats and solutions, which may lead to ideas that the original team did not consider.

68

Offsite Location

A backup facility should be far enough away from the original site that one disaster does not take out both locations. In other words, it is not logical to have the backup site only a few miles away if the company is concerned about tornado damage, because the backup site could also be affected or destroyed. There is a rule of thumb that suggests that alternate facilities should be, at a bare minimum, 5 miles away from the primary site, while 15 miles is recommended for most low-to-medium critical environments, and 50 to 200 miles is recommended for critical operations to give maximum protection in cases of regional disasters.
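The rule of thumb can be expressed as a small lookup. This is a sketch only: the function name `distance_guidance` and the wording of the returned labels are hypothetical, and the treatment of distances over 200 miles is an assumption, since the rule of thumb does not address them.

```python
def distance_guidance(miles: float) -> str:
    """Map a site-to-site distance onto the rule-of-thumb bands from the card."""
    if miles < 0:
        raise ValueError("distance cannot be negative")
    if miles < 5:
        return "below the bare minimum"
    if miles < 15:
        return "meets the bare minimum only"
    if miles < 50:
        return "recommended for low-to-medium critical environments"
    if miles <= 200:
        return "recommended for critical operations"
    # Assumption: the rule of thumb gives no guidance past 200 miles.
    return "beyond the range the rule of thumb covers"


for miles in (3, 10, 20, 120):
    print(miles, distance_guidance(miles))
```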

69

Hot site

Hot site: A facility that is leased or rented and is fully configured and ready to operate within a few hours. The only missing resources from a hot site are usually the data, which will be retrieved from a backup site, and the people who will be processing the data. The equipment and system software must absolutely be compatible with the data being restored from the main site and must not cause any negative interoperability issues. Some facilities, for a fee, store data backups close to the hot site. These sites are a good choice for a company that needs to ensure a site will be available for it as soon as possible. Most hot-site facilities support annual tests that can be done by the company to ensure the site is functioning in the necessary state of readiness.

70

The BCP team should carry out and address in the resulting plan the following interrelation and interdependency tasks:

  • Define essential business functions and supporting departments.
  • Identify interdependencies between these functions and departments.
  • Discover all possible disruptions that could affect the mechanisms necessary to allow these departments to function together.
  • Identify and document potential threats that could disrupt interdepartmental communication.
  • Gather quantitative and qualitative information pertaining to those threats.
  • Provide alternative methods of restoring functionality and communication.
  • Provide a brief statement of rationale for each threat and corresponding information.

71

BCP Policy

Now that we know what we are doing, we should write this down.

The BCP policy supplies the framework for and governance of designing and building the BCP effort. The policy helps the organization understand the importance of BCP by outlining BCP’s purpose. It provides an overview of the principles of the organization and those behind BCP, and the context for how the BCP team will proceed.

72

Facility Recovery

Disruptions, in BCP terms, are of three main types: nondisasters, disasters, and catastrophes. A nondisaster is a disruption in service due to a device malfunction or failure. The solution could include hardware, software, or file restoration. A disaster is an event that causes the entire facility to be unusable for a day or longer. This usually requires the use of an alternate processing facility and restoration of software and data from offsite copies. The alternate site must be available to the company until its main facility is repaired and usable. A catastrophe is a major disruption that destroys the facility altogether. This requires both a short-term solution, which would be an offsite facility, and a long-term solution, which may require rebuilding the original facility.
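The three disruption types can be sketched as a classification function. The names `Disruption` and `classify_disruption` and the two input parameters are hypothetical simplifications. The thresholds follow the definitions in the card: a destroyed facility is a catastrophe, and a facility unusable for a day or longer is a disaster.

```python
from enum import Enum


class Disruption(Enum):
    NONDISASTER = "nondisaster"
    DISASTER = "disaster"
    CATASTROPHE = "catastrophe"


def classify_disruption(facility_destroyed: bool, facility_unusable_days: float) -> Disruption:
    """Classify a disruption using the card's definitions (simplified inputs)."""
    if facility_destroyed:
        return Disruption.CATASTROPHE   # needs short-term and long-term solutions
    if facility_unusable_days >= 1:
        return Disruption.DISASTER      # usually needs an alternate processing facility
    return Disruption.NONDISASTER       # device malfunction or failure


print(classify_disruption(False, 0).value)
print(classify_disruption(False, 3).value)
print(classify_disruption(True, 0).value)
```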

73

12. Which areas of a company are recovery plans recommended for?

  A. The most important operational and financial areas

  B. The areas that house the critical systems

  C. All areas

  D. The areas that the company cannot survive without

12. C. It is best if every department within the company has its own contingency plan and procedures in place. These individual plans would "roll up" into the overall enterprise BCP.

74

BCP Development Products

Since there is so much work in collecting, analyzing, and maintaining DRP and BCP data, using a product that automates these tasks can prove to be extremely helpful.

75

business continuity plan (BCP)

A disaster recovery plan (DRP) is carried out when everything is still in emergency mode, and everyone is scrambling to get all critical systems back online. A business continuity plan (BCP) takes a broader approach to the problem. It can include getting critical systems to another environment while repair of the original facilities is under way, getting the right people to the right places during this time, and performing business in a different mode until regular conditions are back in place. It also involves dealing with customers, partners, and shareholders through different channels until everything returns to normal. So, disaster recovery deals with, "Oh my goodness, the sky is falling," and continuity planning deals with, "Okay, the sky fell. Now, how do we stay in business until someone can put the sky back where it belongs?"

76

13. Gizmos and Gadgets has restored its original facility after a disaster. What should be moved in first?

A. Management

B. Most critical systems

C. Most critical functions

D. Least critical functions

Extended Questions:

CORRECT D. After the primary site has been repaired, the least critical components are moved in first. This ensures that the primary site is really ready to resume processing. By doing this, you can validate that environmental controls, power, and communication links are working properly. It can also avoid putting the company into another disaster. If the less critical functions survive, then the more critical components of the company can be moved over.

WRONG A is incorrect because personnel should not be moved into the facility until it is determined that the environment is safe, everything is in good working order, and all necessary equipment and supplies are present. Least critical functions should be moved back first, so if there are issues in network configurations or connectivity, or important steps were not carried out, the critical operations of the company are not negatively affected.

WRONG B is incorrect because the most critical systems should not be resumed in the new environment until it has been properly tested. You do not want to go through the trouble of moving the most critical systems and operations from a safe and stable site, only to return them to a main site that is untested. When you move less critical departments over first, they act as the canary. If they survive, then move on to critical systems.

WRONG C is incorrect because the most critical functions should not be moved over before less critical functions, which serve to test the stability and safety of the site. If the site proves to need further preparation, then no harm is done to the critical functions.

77

7. Which of the following is the best way to ensure that the company’s backup tapes can be restored and used at a warm site?

  A. Retrieve the tapes from the offsite facility, and verify that the equipment at the original site can read them.

  B. Ask the offsite vendor to test them, and label the ones that were properly read.

  C. Test them on the vendor’s machine, which won’t be used during an emergency.

  D. Inventory each tape kept at the vendor’s site twice a month.

7. A. A warm site is a facility that will not be fully equipped with the company’s main systems. The goal of using a warm site is that, if a disaster takes place, the company will bring its systems with it to the warm site. If the company cannot bring the systems with it because they are damaged, the company must purchase new systems that are exactly like the original systems. So, to properly test backups, the company needs to test them by recovering the data on its original systems at its main site.

78

Different organizations have different criteria, because the business drivers and critical functions will vary from organization to organization. The criteria may comprise some or all of the following elements:

  • Danger to human life
  • Danger to state or national security
  • Damage to facility
  • Damage to critical systems
  • Estimated value of downtime that will be experienced

79

Hot Site Disadvantages

  • Very expensive
  • Limited on hardware and software choices

80

DRI International Institute’s Professional Practices for Business Continuity Planners: Best practices and a framework for BCM processes, which are broken down into the following sections:

  • Program Initiation and Management
  • Risk Evaluation and Control
  • Business Impact Analysis
  • Business Continuity Strategies
  • Emergency Response and Operations
  • Business Continuity Plans
  • Awareness and Training Programs
  • Business Continuity Plan Exercise, Audit, and Maintenance
  • Crisis Communications
  • Coordination with External Agencies

81

31. Which of the following best describes the relationship between high-availability and disaster recovery techniques and technologies?

  A. High-availability technologies and processes are commonly put into place so that if a disaster does take place, either availability of the critical functions continues or the delay of getting them back online and running is low.

  B. High availability deals with asynchronous replication and recovery time objective requirements, which increases disaster recovery performance.

  C. High availability deals with synchronous replication and recovery point objective requirements, which increases disaster recovery performance.

  D. Disaster recovery technologies and processes are put into place to provide high-availability service levels.

31. A. High availability and disaster recovery are not the same, but they have a relationship. High-availability technologies and processes are commonly put into place so that if a disaster does take place, either availability of the critical functions continues or the delay of getting them back online and running is low.

82

Risk Assessment

To achieve success, the organization should systematically plan and execute a formal BCP-related risk assessment. The assessment fully takes into account the organization’s tolerance for continuity risks. The risk assessment also makes use of the data in the BIA to supply a consistent estimate of exposure.

83

Business Continuity Planning

Preplanned procedures allow an organization to

  • Provide an immediate and appropriate response to emergency situations
  • Protect lives and ensure safety
  • Reduce business impact
  • Resume critical business functions
  • Work with outside vendors and partners during the recovery period
  • Reduce confusion during a crisis
  • Ensure survivability of the business
  • Get "up and running" quickly after a disaster

84

Business Impact Analysis (BIA)

How bad is it going to hurt and how long can we deal with this level of pain?

Business continuity planning deals with uncertainty and chance. What is important to note here is that even though you cannot predict whether or when a disaster will happen, that doesn’t mean you can’t plan for it. Just because we are not planning for an earthquake to hit us tomorrow morning at 10 A.M. doesn’t mean we can’t plan the activities required to successfully survive when an earthquake (or a similar disaster) does hit. The point of making these plans is to try to think of all the possible disasters that could take place, estimate the potential damage and loss, categorize and prioritize the potential disasters, and develop viable alternatives in case those events do actually happen.

85

Disk duplexing

NOTE Disk duplexing means there is more than one disk controller. If one disk controller fails, the other is ready and available.

86

Warm and Cold Site Disadvantages

  • Operational testing not usually available
  • Resources for operations not immediately available

87

The following provides a quick overview of the differences between offsite facilities:

Hot Site Advantages

  • Ready within hours for operation
  • Highly available
  • Usually used for short-term solutions, but available for longer stays
  • Annual testing available

88

Simulation Test

Everyone take your places. Okay, action!

This type of test takes a lot more planning and people. In this situation, all employees who participate in operational and support functions, or their representatives, come together to practice executing the disaster recovery plan based on a specific scenario. The scenario is used to test the reaction of each operational and support representative. Again, this is done to ensure specific steps were not left out and that certain threats were not overlooked. It raises the awareness of the people involved.

89

salvage team

The restoration team should be responsible for getting the alternate site into a working and functioning environment, and the salvage team should be responsible for starting the recovery of the original site. Both teams must know how to do many tasks, such as install operating systems, configure workstations and servers, string wire and cabling, set up the network and configure networking services, and install equipment and applications. Both teams must also know how to restore data from backup facilities. They also must know how to do so in a secure manner, one that ensures the confidentiality, integrity, and availability of the system and data.

90

Business Continuity Institute’s Good Practice Guidelines (GPG): BCM best practices, which are broken down into the following management and technical practices:

  • Management Practices:
  • Technical Practices:

91

3. Identified and implemented preventive controls

  • Put controls into place to reduce the company’s identified risks
  • Bought insurance
  • Implemented facility structural reinforcements
  • Rolled out backup solutions for data
  • Installed redundant and fault-tolerant mechanisms

92

High availability (HA)

High availability (HA) is a combination of technologies and processes that work together to ensure that some specific thing is always up and running. The specific thing can be a database, a network, an application, a power supply, etc. Service providers have SLAs with their customers, which outline the amount of uptime they promise to provide and a turnaround time to get the item fixed if it does go down. For example, a hosting company can promise to provide 98 percent uptime for Internet connectivity. This means they are guaranteeing that at least 98 percent of the time, the Internet connection you purchase from them will be up and running. The hosting company knows that some things may take place to interrupt this service, but within your SLA with them, it promises an eight-hour turnaround time. This means if your Internet connection does go down, they will either fix it or provide you with a different connection within eight hours.
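
The uptime arithmetic in this card can be checked with a short sketch. It is only an illustration: the function names are hypothetical, the measurement period is assumed to be a 365-day year, and real SLAs define their own measurement windows and exclusions.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return the downtime an uptime commitment still permits over a period."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100 - uptime_percent) / 100


def meets_turnaround(outage_hours, turnaround_hours=8):
    """Return True if a single outage was resolved within the promised turnaround."""
    return outage_hours <= turnaround_hours


if __name__ == "__main__":
    # The 98 percent figure and eight-hour turnaround come from the card above.
    print(allowed_downtime_hours(98))
    print(meets_turnaround(6))
    print(meets_turnaround(10))
```

Under these assumptions, a 98 percent commitment still permits 175.2 hours of downtime per year, which is why the turnaround time in the SLA matters as much as the uptime percentage.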

93

ISO/IEC 27031:2011

ISO/IEC 27031:2011: Guidelines for information and communications technology readiness for business continuity. This ISO/IEC standard, which is a component of the overall ISO/IEC 27000 series, was covered in Chapter 2.

94

30. C. Loss criteria must be applied to the individual threats that were identified. The criteria should include at least the following:

  • Loss in reputation and public confidence
  • Loss of competitive advantages
  • Increase in operational expenses
  • Violations of contract agreements
  • Violations of legal and regulatory requirements
  • Delayed income costs
  • Loss in revenue
  • Loss in productivity

95

Understanding the Organization First

A company has no real hope of rebuilding itself and its processes after a disaster if it does not have a good understanding of how its organization works in the first place. This notion might seem absurd at first. You might think, "Well, of course a company knows how it works." But you would be surprised at how difficult it is to fully understand an organization down to the level of detail required to rebuild it. Each individual may know and understand his or her little world within the company, but hardly anyone at any company can fully explain how each and every business process takes place.

96

4. Which of the following is the best way to ensure that the company’s backup tapes can be restored and used at a warm site?

A. Ask the offsite vendor to test them and label the ones that were properly read.

B. Test them on the vendor’s machine, which won’t be used during an emergency.

C. Retrieve the tapes from the offsite facility and verify that the equipment from the original site can read them.

D. Inventory each tape kept at the vendor’s site twice a month.

Extended Questions:

CORRECT C. A warm site is a facility that will not be fully equipped with the company’s main systems. The goal of using a warm site is that, if a disaster takes place, the company will bring its systems with it to the warm site. If the company cannot bring the systems with it because they are damaged, the company must purchase new systems that are exactly like the original systems. So, to properly test backups, the company needs to test them by recovering the data on its original systems at its main site.

WRONG A is incorrect because a warm site is a leased or rented facility that is usually partially configured with some equipment, but not the actual computers. Staging a facility with duplicate hardware and computers configured for immediate operation is extremely expensive, so a warm site provides an alternate facility with some peripheral devices. This is the most widely used model. It is less expensive than a hot site and can be up and running within a reasonable time period. It may be a better choice for companies that depend upon proprietary and unusual hardware and software, because they will bring their own hardware and software with them to the site after the disaster hits.

WRONG B is incorrect because testing backups on machines that won’t be used during an emergency does not provide assurance that the backups will work on the machines that will be used. The backups should be tested by recovering the data on the original systems at the company’s main site because these systems will need to be moved to the warm site in the case of an emergency.

WRONG D is incorrect because inventorying backup tapes does not provide assurance that the data on the tapes will be properly recovered. The tapes must be tested by recovering the data on them on the systems at the company’s main site.

97

Assigning Values to Assets

The next step in the risk analysis is to assign a value to the assets that could be affected by each threat. This helps establish economic feasibility of the overall plan. As discussed in Chapter 2, assigning values to assets is not as straightforward as it seems. The value of an asset is not just the amount of money paid for it. The asset’s role in the company has to be considered, along with the labor hours that went into creating it if it is a piece of software. The value amount could also encompass the liability issues that surround the asset if it were damaged or insecure in any manner. (Review Chapter 2 for an in-depth description and criteria for calculating asset value.)

98

32. Susan is the new BCM coordinator and needs to identify various preventive and recovery solutions her company should implement for BCP/DRP efforts. She and her team have carried out an impact analysis and found out that the company’s order processing functionality cannot be out of operation for more than 15 hours. She has calculated that the order processing systems and applications must be brought back online within eight hours after a disruption. The analysis efforts have also indicated that the data that are restored cannot be older than five minutes of current real-time data. Which of the following best describes the metrics and their corresponding values that Susan’s team has derived?

  A. MTD of the order processing functionality is 15 hours. RPO value is 8 hours. WRT value is 7 hours. RTO value is 5 minutes.

  B. MTD of the order processing functionality is 15 hours. RTO value is 8 hours. WRT value is 7 hours. RPO value is 5 minutes.

  C. MTD of the order processing functionality is 15 hours. RTO value is 7 hours. WRT value is 8 hours. RPO value is 5 minutes.

  D. MTD of the order processing functionality is 8 hours. RTO value is 15 hours. WRT value is 7 hours. RPO value is 5 minutes.

Answers

32. B. The order processing functionality as a whole has to be up and running within 15 hours, which is the maximum tolerable downtime (MTD). The systems and applications have to be up and running in eight hours, which is the Recovery Time Objective (RTO). RTO deals with technology, but we still need processes and people in place to run the technology. Work Recovery Time (WRT) is the remainder of the overall MTD value. RTO usually deals with getting the infrastructure and systems back up and running, and WRT deals with restoring data, testing processes, and then making everything "live" for production purposes. The data that are restored for this function can only be five minutes old; thus, the Recovery Point Objective (RPO) has the value of five minutes.
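
The relationship among the metrics in this answer can be written out as a small calculation. The sketch below assumes, as the explanation states, that WRT is the remainder of the MTD after the RTO; the function names are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def work_recovery_time(mtd, rto):
    """WRT is what remains of the MTD once the RTO has been used."""
    if rto > mtd:
        raise ValueError("RTO cannot exceed MTD")
    return mtd - rto


def restore_meets_rpo(data_age, rpo):
    """Return True if the restored data are no older than the RPO allows."""
    return data_age <= rpo


# Values from question 32.
mtd = timedelta(hours=15)
rto = timedelta(hours=8)
rpo = timedelta(minutes=5)

wrt = work_recovery_time(mtd, rto)

if __name__ == "__main__":
    print(wrt)
    print(restore_meets_rpo(timedelta(minutes=3), rpo))
    print(restore_meets_rpo(timedelta(minutes=20), rpo))
```

With an MTD of 15 hours and an RTO of 8 hours, the WRT works out to 7 hours, which matches answer B.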

99

The organization can take the following steps to better ensure the continuity of its outsourcing:

  • Make the ability of such companies to reliably assure continuity of products and services part of any work proposals.
  • Make sure that BCP is included in contracts with such companies, and that their responsibilities and levels of service are clearly spelled out.
  • Draw up realistic and reasonable service levels that the outsourced firm will meet during an incident.
  • If possible, have the outsourcing companies take part in BCP awareness programs, training, and testing.

100

21. Which of the following does not describe a reciprocal agreement?

  A. The agreement is enforceable.

  B. It is a cheap solution.

  C. It may be able to be implemented right after a disaster.

  D. It could overwhelm a current data processing site.

21. A. A reciprocal agreement is not enforceable, meaning that the company that agreed to let the damaged company work out of its facility can decide not to allow this to take place. A reciprocal agreement is a better secondary backup option if the original plan falls through.

101

BCP Project Components

Before everyone runs off in 2,000 different directions at one time, let’s understand what needs to be done in the project initiation phase. This is the phase in which the company really needs to figure out what it is doing and why. So, after someone gets the donuts and coffee, let’s get down to business.

102

Which types of preventive mechanisms should be put in place depends upon the results of the BIA, but they may include some of the following:

  • Fortification of the facility in its construction materials
  • Redundant servers and communications links
  • Redundant power lines coming in through different transformers
  • Redundant vendor support
  • Purchasing of insurance
  • Purchasing of uninterruptible power supplies (UPSs) and generators
  • Data backup technologies
  • Media protection safeguards
  • Increased inventory of critical equipment
  • Fire detection and suppression systems

103

continuity planning

The goal of disaster recovery is to minimize the effects of a disaster or disruption. It means taking the necessary steps to ensure that the resources, personnel, and business processes are able to resume operation in a timely manner. This is different from continuity planning, which provides methods and procedures for dealing with longer-term outages and disasters. The goal of a disaster recovery plan is to handle the disaster and its ramifications right after the disaster hits; the disaster recovery plan is usually very information technology (IT)-focused.

104

15. What is the most crucial requirement in developing a business continuity plan?

  A. Business impact analysis

  B. Implementation, testing, and following through

  C. Participation from each and every department

  D. Management support

15. D. Management’s support is the first thing to obtain before putting any real effort into developing these plans. Without management’s support, the effort will not receive the necessary attention, resources, funds, or enforcement.

105

The BCP coordinator needs to define several different teams that should be properly trained and available if a disaster hits. The types of teams an organization needs depend upon the organization. The following are some examples of teams that a company may need to construct:

  • Damage assessment team
  • Legal team
  • Media relations team
  • Recovery team
  • Relocation team
  • Restoration team
  • Salvage team
  • Security team

106

Priorities

It is extremely important to know what is critical versus what is merely nice to have. Different departments provide different functionality for an organization. The critical departments must be singled out from the departments that provide functionality that the company can live without for a week or two. It is necessary to know which department must come online first, which second, and so on. That way, the efforts are made in the most useful, effective, and focused manner. Along with the priorities of departments, the priorities of systems, information, and programs must be established. It may be necessary to ensure that the database is up and running before working to bring the web servers online. The general priorities must be set by management with the help of the different departments and IT staff.
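
One way to picture this ordering is as a dependency graph, where each system lists what must be online before it. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the system names and dependencies are hypothetical examples, not a recommended recovery order.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each system maps to the systems that must be
# online before it can be brought up.
dependencies = {
    "network": set(),
    "storage": set(),
    "database": {"network", "storage"},
    "web_servers": {"database"},
}


def recovery_order(deps):
    """Return a bring-up order in which every prerequisite comes first."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, system in enumerate(recovery_order(dependencies), start=1):
        print(step, system)
```

In any order this produces, the database appears before the web servers. The graph only captures technical prerequisites; the business priorities themselves still have to be set by management with the departments and IT staff.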

107

full backup

The first step is to do a full backup, which is just what it sounds like—all data are backed up and saved to some type of storage media. During a full backup, the archive bit is cleared, which means that it is set to 0. A company can choose to do full backups only, in which case the restoration process is just one step, but the backup and restore processes could take a long time.
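
The archive-bit behavior described here can be modeled in a few lines. This is an in-memory illustration only: FileEntry and full_backup are hypothetical names, and the sketch does not read or set real file system attributes.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    name: str
    archive_bit: int = 1  # 1 = changed since the last backup, 0 = unchanged


def full_backup(files):
    """Copy every file regardless of its archive bit, then clear the bit."""
    backed_up = [f.name for f in files]
    for f in files:
        f.archive_bit = 0  # cleared, meaning set to 0
    return backed_up


if __name__ == "__main__":
    files = [FileEntry("orders.db", 1), FileEntry("readme.txt", 0)]
    print(full_backup(files))
    print([f.archive_bit for f in files])
```

Both files are copied even though only one had its archive bit set, and afterward every bit is 0.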

108

Table 8-1 Steps to Be Documented and Approved in Continuity Planning

  • Objective-to-task mapping
  • Resource-to-task mapping
  • Workflows
  • Milestones
  • Deliverables
  • Budget estimates
  • Success factors
  • Deadlines

109

Authority

In times of crisis, it is important to know who is in charge. Teamwork is important in these situations, and almost every team does much better with an established and trusted leader. Such leaders must know that they are expected to step up to the plate in a time of crisis and understand what type of direction they should provide to the rest of the employees. Clear-cut authority will aid in reducing confusion and increasing cooperation.

110

9. Of the following plans, which establishes senior management and a headquarters after a disaster?

A. Continuity of operations plan

B. Cyber-incident response plan

C. Occupant emergency plan

D. IT contingency plan

Extended Questions:

CORRECT A. A continuity of operations plan (COOP) establishes senior management and a headquarters after a disaster. It also outlines roles and authorities, orders of succession, and individual role tasks. Creating a COOP begins with assessing how the organization operates to identify mission-critical staff, materials, procedures, and equipment. If one exists, review the business process flowchart. Identify suppliers, partners, contractors, and other businesses the organization interacts with on a daily basis, and create a list of these and other businesses the organization could use in an emergency. It is important for an organization to make plans for what it will do if the building becomes inaccessible.

WRONG B is incorrect because a cyber-incident response plan focuses on malware, hackers, intrusions, attacks, and other security issues. It outlines procedures for incident response with the goal of limiting damage, minimizing recovery time, and reducing costs. A cyber-incident response plan should include a description of the different types of incidents, who to call when an incident occurs and each person’s responsibilities, procedures for addressing different types of incidents, and forensic procedures. The plan should be tested, and all participants should be trained on their responsibilities.

WRONG C is incorrect because an occupant emergency plan establishes personnel safety and evacuation procedures. The goal of an occupant emergency plan is to reduce the risk to personnel and minimize the disruption to work and operations in the case of an emergency. The plan should include procedures for ensuring the safety of employees with disabilities, including their evacuation from the facility if necessary. All employees should have access to the occupant emergency response plan, and it should be practiced so that everyone knows how to execute it.

WRONG D is incorrect because an IT contingency plan establishes procedures for the recovery of systems, networks, and major applications after disruptions. Steps for creating IT contingency plans are addressed in the NIST 800-34 document.

111

Important issues need to be addressed before a disaster hits if a company decides to participate in a reciprocal agreement with another company:

  • How long will the facility be available to the company in need?
  • How much assistance will the staff supply in integrating the two environments and in providing ongoing support?
  • How quickly can the company in need move into the facility?
  • What are the issues pertaining to interoperability?
  • How many of the resources will be available to the company in need?
  • How will differences and conflicts be addressed?
  • How do change control and configuration management take place?
  • How often can drills and testing take place?
  • How can critical assets of both companies be properly protected?

112

disk mirroring

Disk shadowing is used to ensure the availability of data and to provide a fault-tolerant solution by duplicating hardware and maintaining more than one copy of the information. The data are dynamically created and maintained on two or more identical disks. If only disk mirroring is used, then each disk would have a corresponding mirrored disk that contains the exact same information. If shadow sets are used, the data can be stored as images on two or more disks.
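
The idea of keeping identical copies on two or more disks can be sketched with a toy model. MirroredVolume is a hypothetical class that stores blocks in dictionaries; it illustrates the principle only and is not how a real disk controller or RAID driver works.

```python
class MirroredVolume:
    """Toy model: every write goes to all healthy disks; reads use any survivor."""

    def __init__(self, copies=2):
        if copies < 2:
            raise ValueError("mirroring needs at least two disks")
        self.disks = [dict() for _ in range(copies)]
        self.failed = set()

    def write(self, block, data):
        # Maintain the same data on every disk that is still healthy.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, index):
        # Simulate the loss of one disk.
        self.failed.add(index)

    def read(self, block):
        # Serve the read from the first disk that has not failed.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("all copies have failed")


if __name__ == "__main__":
    volume = MirroredVolume()
    volume.write(0, b"customer records")
    volume.fail(0)
    print(volume.read(0))
```

After the first disk fails, the read is still served from the second copy, which is the fault tolerance the card describes.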

113

11. Preplanned business continuity procedures provide organizations a number of benefits. Which of the following is not a capability enabled by business continuity planning?

A. Resuming critical business functions

B. Letting business partners know your company is unprepared

C. Protecting lives and ensuring safety

D. Ensuring survivability of the business

Extended Questions:

CORRECT B. Preplanned business continuity procedures afford organizations a number of benefits. They allow an organization to provide an immediate and appropriate response to emergency situations, reduce business impact, and work with outside vendors during a recovery period—in addition to the other answer options listed above. The efforts in these areas should be communicated to business partners to let them know that the company is prepared in case a disaster takes place.

WRONG A is incorrect because a business continuity plan allows an organization to resume critical business functions. As part of the BCP creation, the BCP team conducts a business impact analysis, which includes identifying the maximum tolerable downtime for critical resources. This effort helps the team prioritize recovery efforts so that the most critical resources can be recovered first.

WRONG C is incorrect because a business continuity plan allows an organization to protect lives and ensure safety. People are a company’s most valuable asset; thus, human resources are a critical component to any recovery and continuity process and need to be fully thought out and integrated into the plan. When this is done, a business continuity plan helps a company protect its employees.

WRONG D is incorrect because a preplanned business continuity plan allows a company to ensure the survivability of the business. A business continuity plan provides methods and procedures for dealing with longer-term outages and disasters. It includes getting critical systems to another environment while the original facility is being repaired and conducting business operations in a different mode until regular operations are back in place. In short, the business continuity plan deals with how business is conducted during the aftermath of an emergency.

114

14. Which is the proper sequence of steps followed in business continuity management?

  A. Project initiation, strategy development, business impact analysis, plan development, implementation, testing, and maintenance

  B. Strategy development, project initiation, business impact analysis, plan development, implementation, testing, and maintenance

  C. Implementation and testing, project initiation, strategy development, business impact analysis, and plan development

  D. Plan development, project initiation, strategy development, business impact analysis, implementation, testing, and maintenance

14. A. These steps outline the processes that should take place in the correct order from beginning to end in business continuity management.

115

2. As his company’s business continuity coordinator, Matthew is responsible for helping recruit members to the business continuity planning (BCP) committee. Which of the following does not correctly describe this effort?

A. Committee members should be involved with the planning stages, as well as the testing and implementation stages.

B. The smaller the team the better, to keep meetings under control.

C. The business continuity coordinator should work with management to appoint committee members.

D. The team should consist of people from different departments across the company.

Extended Questions:

CORRECT B. The BCP committee should be as large as it needs to be in order to represent each department within the organization. The team must be composed of people who are familiar with the different departments within the company, because each department is unique in its functionality and has distinctive risks and threats. The best plan is when all issues and threats are brought to the table and discussed. This cannot be done effectively with a few people who are familiar with only a couple of departments. The committee should be made up of representatives from at least the following departments: business units, senior management, IT department, security department, communications department, and legal department.

WRONG A is incorrect because it is true that committee members should be involved with the planning stages, as well as the testing and implementation stages. If Matthew, the BCP coordinator, is a good management leader, he will understand that it is best to make team members feel a sense of ownership pertaining to their tasks and roles. The people who develop the BCP should also be the ones who execute it. If you knew that in a time of crisis you would be expected to carry out some critical tasks, you might pay more attention during the planning and testing phases.

WRONG C is incorrect because the BCP coordinator should work with management to appoint committee members. But management’s involvement does not stop there. The BCP team should work with management to develop the ultimate goals of the plan, identify the critical parts of the business that must be dealt with first during a disaster, and ascertain the priorities of departments and tasks. Management also needs to help direct the team on the scope of the project and the specific objectives.

WRONG D is incorrect because it is true that the team should be composed of people from different departments across the company. This is the only way the team will be able to consider the distinctive risks and threats that each department faces.

116

29. Which of the following have incorrect definition mapping when it comes to disaster recovery steps?

i. Develop the continuity planning policy statement. Write a policy that provides the guidance necessary to develop a BCP and that assigns authority to the necessary roles to carry out these tasks.

ii. Conduct the BIA. Identify critical functions and systems, and allow the organization to prioritize them based on necessity. Identify vulnerabilities and threats, and calculate risks.

iii. Identify preventive controls. Once threats are recognized, identify and implement controls and countermeasures to reduce the organization’s risk level in an economical manner.

iv. Develop recovery strategies. Write procedures and guidelines for how the organization can still stay functional in a crippled state.

v. Develop the contingency plan. Formulate methods to ensure systems and critical functions can be brought online quickly.

vi. Test the plan and conduct training and exercises. Test the plan to identify deficiencies in the BCP, and conduct training to properly prepare individuals on their expected tasks.

vii. Maintain the plan. Put in place steps to ensure the BCP is a living document that is updated regularly.

  A. iii, iv, v

  B. ii, vii

  C. iv, v

  D. iii, iv, v

29. C. The correct disaster recovery steps and their associated definition mappings are laid out as follows:

i. Develop the continuity planning policy statement. Write a policy that provides the guidance necessary to develop a BCP and that assigns authority to the necessary roles to carry out these tasks.

ii. Conduct the BIA. Identify critical functions and systems, and allow the organization to prioritize them based on necessity. Identify vulnerabilities and threats, and calculate risks.

iii. Identify preventive controls. Once threats are recognized, identify and implement controls and countermeasures to reduce the organization’s risk level in an economical manner.

iv. Develop recovery strategies. Formulate methods to ensure systems and critical functions can be brought online quickly.

v. Develop the contingency plan. Write procedures and guidelines for how the organization can still stay functional in a crippled state.

vi. Test the plan and conduct training and exercises. Test the plan to identify deficiencies in the BCP, and conduct training to properly prepare individuals on their expected tasks.

vii. Maintain the plan. Put in place steps to ensure the BCP is a living document that is updated regularly.

117

30. Sam is a manager who is responsible for overseeing the development and the approval of the business continuity plan. He needs to make sure that his team is creating correct and all-inclusive loss criteria when it comes to potential business impacts. Which of the following is not a negative characteristic or value that is commonly included in the criteria?

i. Loss in reputation and public confidence

ii. Loss of competitive advantages

iii. Decrease in operational expenses

iv. Violations of contract agreements

v. Violations of legal and regulatory requirements

vi. Delayed income costs

vii. Loss in revenue

viii. Loss in productivity

  A. i, vii, viii

  B. iii, v, vi

  C. iii

  D. vi

30. C. Loss criteria must be applied to the individual threats that were identified. The criteria should include at least the following:

• Loss in reputation and public confidence

• Loss of competitive advantages

• Increase in operational expenses

• Violations of contract agreements

• Violations of legal and regulatory requirements

• Delayed income costs

• Loss in revenue

• Loss in productivity

118

The team has figured out these types of MTD timelines for the individual business functions, operations, and resources. Now it has to identify the recovery mechanisms and strategies that must be implemented to make sure everything is up and running within the timelines it has calculated. The team needs to break down these recovery strategies into the following sections:

  • Business process recovery
  • Facility recovery
  • Supply and technology recovery
  • User environment recovery
  • Data recovery

119

8. Which best describes a hot-site facility versus a warm- or cold-site facility?

  A. A site that has disk drives, controllers, and tape drives

  B. A site that has all necessary PCs, servers, and telecommunications

  C. A site that has wiring, central air-conditioning, and raised flooring

  D. A mobile site that can be brought to the company’s parking lot

8. B. A hot site is a facility that is fully equipped and properly configured so that it can be up and running within hours to get a company back into production. Answer B gives the best definition of a fully functional environment.

120

Loss criteria must be applied to the individual threats that were identified. The criteria may include the following:

  • Loss in reputation and public confidence
  • Loss of competitive advantages
  • Increase in operational expenses
  • Violations of contract agreements
  • Violations of legal and regulatory requirements
  • Delayed income costs
  • Loss in revenue
  • Loss in productivity

121

Structured Walk-Through Test

Let’s get in a room and talk about this.

In this test, representatives from each department or functional area come together and go over the plan to ensure its accuracy. The group reviews the objectives of the plan; discusses the scope and assumptions of the plan; reviews the organization and reporting structure; and evaluates the testing, maintenance, and training requirements described. This gives the people responsible for making sure a disaster recovery happens effectively and efficiently a chance to review what has been decided upon and what is expected of them.

122

Implementing Strategies

Once the strategies have been decided upon, the BCP team needs to document them and put them into place. This moves the efforts from a purely planning stage to an actual implementation and action phase.

123

Up until now, we have established management’s responsibilities as the following:

  • Committing fully to the BCP
  • Setting policy and goals
  • Making available the necessary funds and resources
  • Taking responsibility for the outcome of the development of the BCP
  • Appointing a team for the process

124

Recovery Time Objective (RTO)

The Recovery Time Objective (RTO) is the earliest time period and a service level within which a business process must be restored after a disaster to avoid unacceptable consequences associated with a break in business continuity. The RTO value is smaller than the MTD value, because the MTD value represents the time after which an inability to recover significant operations will mean severe and perhaps irreparable damage to the organization’s reputation or bottom line. The RTO assumes that there is a period of acceptable downtime. This means that a company can be out of production for a certain period of time (RTO) and still get back on its feet. But if the company cannot get production up and running within the MTD window, the company is sinking too fast to properly recover.
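The relationship between the two values can be checked mechanically. The following is a minimal sketch, assuming both values are expressed in hours; the `RecoveryTarget` class, its field names, and the sample figures are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTarget:
    """Hypothetical record for one business function (names are illustrative)."""
    function: str
    rto_hours: float  # target time within which the process should be restored
    mtd_hours: float  # maximum tolerable downtime identified during the BIA


def rto_is_consistent(target: RecoveryTarget) -> bool:
    """Return True when the RTO is non-negative and strictly smaller than the MTD."""
    return 0 <= target.rto_hours < target.mtd_hours


def slack_hours(target: RecoveryTarget) -> float:
    """Time left between the recovery target and the point of severe damage."""
    return target.mtd_hours - target.rto_hours


payroll = RecoveryTarget("payroll", rto_hours=8, mtd_hours=24)
print(rto_is_consistent(payroll), slack_hours(payroll))
```

An RTO that equals or exceeds the MTD leaves no margin for the recovery effort itself to slip, which is why the sketch uses a strict comparison.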

125

Project Management

BCP is not that important; let’s just wing it.

Sound project management processes, practices, and procedures are important for any organizational effort, and doubly so for BCP. Following accepted project management principles will help ensure effective management of the BCP process once it gets underway.

126

Business Continuity and Disaster Recovery

What do we do if everything blows up? And how can we still make our widgets?

The goal of disaster recovery is to minimize the effects of a disaster or disruption. It means taking the necessary steps to ensure that the resources, personnel, and business processes are able to resume operation in a timely manner. This is different from continuity planning, which provides methods and procedures for dealing with longer-term outages and disasters. The goal of a disaster recovery plan is to handle the disaster and its ramifications right after the disaster hits; the disaster recovery plan is usually very information technology (IT)-focused.

127

17. To get proper management support and approval of the plan, a business case must be made. Which of the following is least important to this business case?

  A. Regulatory and legal requirements

  B. Company vulnerabilities to disasters and disruptions

  C. How other companies are dealing with these issues

  D. The impact the company can endure if a disaster hits

17. C. The other three answers are key components when building a business case. Although it is a good idea to investigate and learn about how other companies are dealing with similar issues, it is the least important of the four items listed.

128

2. What is one of the first steps in developing a business continuity plan?

  A. Identify a backup solution.

  B. Perform a simulation test.

  C. Perform a business impact analysis.

  D. Develop a business resumption plan.

2. C. A business impact analysis includes identifying critical systems and functions of a company and interviewing representatives from each department. Once management’s support is solidified, a business impact analysis needs to be performed to identify the threats the company faces and the potential costs of these threats.

129

4. Developed recovery strategies

  • Implemented processes of getting the company up and running in the necessary time
  • Created the necessary teams
  • Developed goals and procedures for each team
  • Created notification steps and planned activation criteria
  • Identified alternate backup solutions

130

4. During a recovery procedure test, one important step is to maintain records of important events that happen during the test. What other step is just as important?

  A. Schedule another test to address issues that were identified during that procedure.

  B. Make sure someone is prepared to talk to the media with the appropriate responses.

  C. Report the events to management.

  D. Identify essential business functions.

4. C. When recovery procedures are carried out, the outcome of those procedures should be reported to the individuals who are responsible for this type of activity, which is usually some level of management. If the procedures worked properly, management should know it, and if problems were encountered, management should definitely be made aware of them. Members of management are the ones who are responsible overall for fixing the recovery system and will be the ones to delegate this work and provide the necessary funding and resources.

131

A role, or a team, needs to be created to carry out a damage assessment once a disaster has taken place. The assessment procedures should be properly documented and include the following steps:

  • Determine the cause of the disaster.
  • Determine the potential for further damage.
  • Identify the affected business functions and areas.
  • Identify the level of functionality for the critical resources.
  • Identify the resources that must be replaced immediately.
  • Estimate how long it will take to bring critical functions back online.
  • If it will take longer than the previously estimated MTD values to restore operations, then a disaster should be declared and the BCP should be put into action.
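The last step of the assessment is a comparison that can be sketched in a few lines. This is an illustration only, assuming restore estimates and MTD values are kept in hours per business function; the function names and dictionary layout are hypothetical, and real activation criteria are defined by the BCP team.

```python
def functions_exceeding_mtd(estimated_restore_hours, mtd_hours):
    """Return, in sorted order, the functions whose estimated restore time exceeds their MTD."""
    return sorted(
        name
        for name, estimate in estimated_restore_hours.items()
        if estimate > mtd_hours[name]
    )


def should_declare_disaster(estimated_restore_hours, mtd_hours):
    """A disaster is declared when at least one function cannot be restored within its MTD."""
    return bool(functions_exceeding_mtd(estimated_restore_hours, mtd_hours))


# Hypothetical figures for illustration.
estimates = {"order processing": 36, "payroll": 12}
mtds = {"order processing": 24, "payroll": 72}
print(functions_exceeding_mtd(estimates, mtds))
print(should_declare_disaster(estimates, mtds))
```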

132

22. Different threats need to be evaluated and ranked based upon their severity of business risk when developing a BCP. Which ranking approach is illustrated in the graphic that follows?

Choose the following statement that best describes the effect on this business unit/cost center should there be an unplanned interruption of normal business operations.

8 hours of an interruption. This business unit/cost center is Vital.

24 hours of an interruption. This business unit/cost center is Critical.

3 days of an interruption. This business unit/cost center is Essential.

5 days of an interruption. This business unit/cost center is Important.

10 days of an interruption. This business unit/cost center is Noncritical.

30 days of an interruption. This business unit/cost center is Deferrable.

A. Mean time to repair

B. Mean time between failures

C. Maximum critical downtime

D. Maximum tolerable downtime

Extended Questions:

CORRECT D. The BIA identifies which of the company’s critical systems are needed for survival and estimates the outage time that can be tolerated by the company as a result of various unfortunate events. The outage time that can be endured by a company is referred to as the maximum tolerable downtime (MTD). This is the time frame between an unplanned interruption of business operations and the resumption of business at a reduced level of service. During the BIA, the BCP team identifies the maximum tolerable downtime for the critical resources. This is done to understand the business impact that would be caused if the assets were unavailable for one reason or another.

WRONG A is incorrect because the mean time to repair (MTTR) is the amount of time it will be expected to take to get a device fixed and back into production. For a hard drive in a redundant array, the MTTR is the amount of time between the actual failure and the time when, after noticing the failure, someone has replaced the failed drive and the redundant array has completed rewriting the information on the new drive. This is likely to be measured in hours. For an unplanned reboot, the MTTR is the amount of time between the failure of the system and the point in time when it has rebooted its operating system, checked the state of its disks (hopefully finding nothing that its file systems cannot handle), restarted its applications, allowed its applications to check the consistency of their data (hopefully finding nothing that their journals cannot handle), and once again begun processing transactions. For well-built hardware running high-quality, well-managed operating systems and software, this may be only minutes. For commodity equipment without high-performance journaling file systems and databases, this may be hours, or, worse, days if automated recovery/rollback does not work and a restore of data from tape is required.

WRONG B is incorrect because the mean time between failures (MTBF) is the estimated lifespan of a piece of equipment. MTBF is calculated by the vendor of the equipment or a third party. The reason for using this value is to know approximately when a particular device will need to be replaced. Either based on historical data or scientifically estimated by vendors, it is used as a benchmark for reliability by predicting the average time that will pass in the operation of a component or a system until its final death. Organizations trending MTBF over time for the devices they use may be able to identify types of devices that are failing above the averages promised by manufacturers and take action such as proactively contacting manufacturers under warranty, or deciding that old devices are reaching the end of their useful life and choosing to replace them en masse before larger-scale failures and operational disruptions occur.

WRONG C is incorrect because maximum critical downtime is not an official term used in BCP and is a distracter answer.
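The ranking scale from the graphic can be expressed as a simple lookup. The sketch below assumes that a business unit falls into the first band whose interruption length is at least as long as its MTD, and that anything beyond 30 days is treated as Deferrable; both rules are assumptions for illustration, as are the names `MTD_BANDS` and `classify_by_mtd`.

```python
from bisect import bisect_left

# (interruption length in hours, label) pairs transcribed from the graphic.
MTD_BANDS = [
    (8, "Vital"),
    (24, "Critical"),
    (3 * 24, "Essential"),
    (5 * 24, "Important"),
    (10 * 24, "Noncritical"),
    (30 * 24, "Deferrable"),
]


def classify_by_mtd(mtd_hours):
    """Map an MTD in hours to the first band that covers it (assumed rule)."""
    thresholds = [hours for hours, _ in MTD_BANDS]
    # Values beyond the last threshold are clamped to the final band.
    index = min(bisect_left(thresholds, mtd_hours), len(MTD_BANDS) - 1)
    return MTD_BANDS[index][1]


for hours in (4, 24, 100, 1000):
    print(hours, classify_by_mtd(hours))
```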

133

The BCP team needs to understand these different steps of the company’s most critical processes. The data are usually presented as a workflow document that contains the roles and resources needed for each process. The BCP team must understand the following about critical business processes:

  • Required roles
  • Required resources
  • Input and output mechanisms
  • Workflow steps
  • Required time for completion
  • Interfaces with other processes

134

28. Business continuity planning needs to provide several types of functionalities and protection types for an organization. Which of the following is not one of these items?

i. Provide an immediate and appropriate response to emergency situations

ii. Protect lives and ensure safety

iii. Reduce business conflicts

iv. Resume critical business functions

v. Work with outside vendors during the recovery period

vi. Reduce confusion during a crisis

vii. Ensure survivability of the business

viii. Get "up and running" quickly after a disaster

  A. ii, iii, vii

  B. ii, iii, v, vi

  C. iii

  D. i, ii

28. C. Preplanned procedures allow an organization to

i. Provide an immediate and appropriate response to emergency situations

ii. Protect lives and ensure safety

iii. Reduce business impact

iv. Resume critical business functions

v. Work with outside vendors during the recovery period

vi. Reduce confusion during a crisis

vii. Ensure survivability of the business

viii. Get "up and running" quickly after a disaster

135

Implementation and testing

It is great to write down very profound ideas and develop plans, but unless they are actually carried out and tested, they may not add up to a hill of beans. Once a continuity plan is developed, it actually has to be put into action. It needs to be documented and put in places that are easily accessible in times of crisis. The people who are assigned specific tasks need to be taught and informed how to fulfill those tasks, and dry runs must be done to walk people through different situations. The drills should take place at least once a year, and the entire program should be continually updated and improved.

136

Risk Assessment Evaluation and Process

In a BCP setting, a risk assessment looks at the impact and likelihood of various threats that could trigger a business disruption. The tools, techniques, and methods of risk assessment include determining threats, assessing probabilities, tabulating threats, and analyzing costs and benefits.

137

The end goals of a risk assessment include:

  • Identifying and documenting single points of failure
  • Making a prioritized list of threats to the particular business processes of the organization
  • Putting together information for developing a management strategy for risk control, and for developing action plans for addressing risks
  • Documenting acceptance of identified risks, or documenting acknowledgment of risks that will not be addressed
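One common way to produce the prioritized list of threats is to score each threat by likelihood multiplied by impact. The sketch below assumes that scoring rule, an annual probability between 0 and 1, and an impact in currency units; the function name and the sample threats are hypothetical.

```python
def prioritize_threats(threats):
    """Rank threats by likelihood x impact, highest score first.

    threats: iterable of (name, likelihood, impact) tuples.
    Ties are broken alphabetically by name so the order is deterministic.
    """
    scored = [(likelihood * impact, name) for name, likelihood, impact in threats]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in ranked]


# Hypothetical figures for illustration.
sample = [
    ("power outage", 0.5, 50_000),
    ("flood", 0.1, 500_000),
    ("vendor failure", 0.2, 100_000),
]
print(prioritize_threats(sample))
```

A rare but costly threat can outrank a frequent but cheap one under this rule, which is why both factors are captured rather than likelihood alone.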

138

17. With what phase of a business continuity plan does a company proceed when it is ready to move back into its original site or a new site?

A. Reconstitution phase

B. Recovery phase

C. Project initiation phase

D. Damage assessment phase

Extended Questions:

CORRECT A. When it is time for the company to move back into its original site or a new site, the company is ready to enter into the reconstitution phase. A company is not out of an emergency state until it is back in operation at the original primary site or a new site that was constructed to replace the primary site, because the company is always vulnerable while operating in a backup facility. Many logistical issues need to be considered as to when a company must return from the alternate site to the original site. Some of these issues include ensuring the safety of the employees, ensuring proper communications and connectivity methods are working, and properly testing the new environment. Once the coordinator, management, and salvage team sign off on the readiness of the facility, the salvage team should back up data from the alternate site and restore it within the new facility, carefully terminate contingency operations, and securely transport equipment and personnel to the new facility.

WRONG B is incorrect because the recovery phase includes the preparation of the offsite facility (if needed), the rebuilding of the network and systems, and the organization of staff to move into a new facility. The recovery process needs to be as organized as possible to get the company up and running as soon as possible. Templates should be developed during the plan development stage that can be used by the different teams during the recovery phase to step them through the necessary phases and to document their findings. The templates keep the teams on task and also quickly tell the team leaders about the progress, obstacles, and potential recovery time.

WRONG C is incorrect because the project initiation phase is how the actual planning of the business continuity plan begins. It does not occur during the execution of the plan. The project initiation phase involves getting management support, developing the scope of the plan, and securing funding and resources.

WRONG D is incorrect because the damage assessment takes place at the start of actually carrying out the business continuity procedures. A damage assessment helps determine whether the business continuity plan should be put into action based on activation criteria predefined by the BCP coordinator and team. After the damage assessment, if one or more of the situations outlined in the criteria have taken place, then the team is moved into recovery mode.

139

1. The NIST organization has defined best practices for creating continuity plans. Which of the following phases deals with identifying and prioritizing critical functions and systems?

A. Identify preventive controls.

B. Develop the continuity planning policy statement.

C. Develop recovery strategies.

D. Conduct the business impact analysis.

Extended Questions:

CORRECT D. Although no specific scientific equation must be followed to create continuity plans, certain best practices have proven themselves over time. The National Institute of Standards and Technology (NIST) organization is responsible for developing many of these best practices and documenting them so that they are easily available to all. NIST outlines seven steps in its Special Publication 800-34, Contingency Planning Guide for Information Technology Systems: develop the continuity planning policy statement; conduct the business impact analysis; identify preventive controls; develop recovery strategies; develop the contingency plan; test the plan and conduct training and exercises; and maintain the plan. Conducting a business impact analysis involves identifying critical functions and systems, and allowing the organization to prioritize them based on necessity. It also includes identifying vulnerabilities and threats, and calculating risks.

WRONG A is incorrect because identifying preventive controls must be done after critical functions and systems have been prioritized, and their vulnerabilities, threats, and risks identified—which is all part of the business impact analysis. Conducting a business impact analysis is step two of creating a continuity plan, and identifying preventive controls is step three.

WRONG B is incorrect because developing the continuity planning policy statement involves writing a policy that provides the guidance necessary to develop a business continuity plan and that assigns authority to the necessary roles to carry out these tasks. It is the first step in creating a business continuity plan and thus comes before identifying and prioritizing critical systems and functions, which is part of the business impact analysis.

WRONG C is incorrect because developing recovery strategies involves formulating methods to ensure systems and critical functions can be brought online quickly. Before this can be done, a business impact analysis must be carried out to determine which systems and functions are critical and should be given priority during recovery.

140

The BCP team’s responsibilities are as follows:

  • Identifying regulatory and legal requirements that must be met
  • Identifying all possible vulnerabilities and threats
  • Estimating the possibilities of these threats and the loss potential
  • Performing a BIA
  • Outlining which departments, systems, and processes must be up and running before any others
  • Identifying interdependencies among departments and processes
  • Developing procedures and steps in resuming business after a disaster

141

Develop the contingency plan

5. Develop the contingency plan. Write procedures and guidelines for how the organization can still stay functional in a crippled state.

142

3. How often should a business continuity plan be tested?

  A. At least every ten years

  B. Only when the infrastructure or environment changes

  C. At least every two years

  D. Whenever there are significant changes in the organization and annually

3. D. The plans should be tested if there have been substantial changes to the company or the environment. They should also be tested at least once a year.
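The rule in this answer can be written as a small check. This is a sketch only, assuming "at least once a year" means no more than 365 days between tests; the function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def plan_test_is_due(last_test, today, significant_change):
    """Return True after a significant change or once a year has passed since the last test."""
    return significant_change or (today - last_test) >= timedelta(days=365)


print(plan_test_is_due(date(2023, 1, 10), date(2023, 6, 1), significant_change=False))
print(plan_test_is_due(date(2023, 1, 10), date(2023, 6, 1), significant_change=True))
print(plan_test_is_due(date(2023, 1, 10), date(2024, 2, 1), significant_change=False))
```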

143

Weaknesses

Characteristics that place the team at a disadvantage relative to others

144

business interruption insurance

A company could also choose to purchase a business interruption insurance policy. With this type of policy, if the company is out of business for a certain length of time, the insurance company will pay for specified expenses and lost earnings. Another policy that can be bought insures accounts receivable. If a company cannot collect on its accounts receivable for one reason or another, this type of coverage covers part or all of the losses and costs.

145

9. Which is the best description of remote journaling?

  A. Backing up bulk data to an offsite facility

  B. Backing up transaction logs to an offsite facility

  C. Capturing and saving transactions to two mirrored servers in-house

  D. Capturing and saving transactions to different media types

9. B. Remote journaling is a technology used to transmit data to an offsite facility, but this usually only includes moving the journal or transaction logs to the offsite facility, not the actual files.

146

5. Which of the following actions is least important when quantifying risks associated with a potential disaster?

  A. Gathering information from agencies that report the probability of certain natural disasters taking place in that area

  B. Identifying the company’s key functions and business requirements

  C. Identifying critical systems that support the company’s operations

  D. Estimating the potential loss and impact the company would face based on how long the outage lasted

5. A. The question asked you about quantifying the risks, which means to calculate the potential business impact of specific disasters. The core components of a business impact analysis are

• Identifying the company’s key functions and business requirements

• Identifying critical systems that support the company’s operations

• Estimating the potential loss and impact the company would face based on how long the outage lasted

147

Full-Interruption Test

Shut down and move out!

This type of test is the most intrusive to regular operations and business productivity. The original site is actually shut down, and processing takes place at the alternate site. The recovery team fulfills its obligations in preparing the systems and environment for the alternate site. All processing is done only on devices at the alternate offsite facility.

148

1. What action should take place to restore a system and its data files after a system failure?

  A. Restore from storage media backup.

  B. Perform a parallel test.

  C. Implement recovery procedures.

  D. Perform a walk-through test.

1. C. In this and similar situations, recovery procedures should be followed, which most likely include recovering data from the backup media. Recovery procedures could include proper steps for rebuilding a system from the beginning, applying the necessary patches and configurations, and taking whatever other steps are needed to ensure productivity is not affected. Some type of redundant system may need to be put into place.

149

The committee needs to step through scenarios in which the following problems result:

  • Equipment malfunction or unavailable equipment
  • Unavailable utilities (HVAC, power, communications lines)
  • Facility becomes unavailable
  • Critical personnel become unavailable
  • Vendors and service providers become unavailable
  • Software and/or data corruption

150

Standards and Best Practices

Although no specific scientific equation must be followed to create continuity plans, certain best practices have proven themselves over time. The National Institute of Standards and Technology (NIST) is responsible for developing best practices and standards as they pertain to U.S. government and military environments. It is common for NIST to document the requirements for these types of environments, and then everyone else in the industry uses their documents as guidelines. So these are "musts" for U.S. government organizations and "good to have" for other nongovernment entities.

151

BCP committee

A leader needs a team, so a BCP committee needs to be put together. Management and the coordinator should work together to appoint specific, qualified people to be on this committee. The team must comprise people who are familiar with the different departments within the company, because each department is unique in its functionality and has distinctive risks and threats. The best plan results when all issues and threats are brought to the table and discussed. This cannot be done effectively with a few people who are familiar with only a couple of departments. Representatives from each department must be involved with not only the planning stages but also the testing and implementation stages.

152

The process of drawing up a policy includes these steps:

  • Identify and document the components of the policy.
  • Identify and define policies of the organization that the BCP might affect.
  • Identify pertinent legislation, laws, regulations, and standards.
  • Identify "good industry practice" guidelines by consulting with industry experts.
  • Perform a gap analysis. Find out where the organization currently is in terms of continuity planning, and spell out where it wants to be at the end of the BCP process.
  • Compose a draft of the new policy.
  • Have different departments within the organization review the draft.
  • Put the feedback from the departments into a revised draft.
  • Get the approval of top management on the new policy.
  • Publish a final draft, and distribute and publicize it throughout the organization.

153

27. What would the items in the following graphic best be collectively called?

  A. Business impact values

  B. Activation phase values

  C. Maximum tolerable downtime values

  D. Reconstitution impact times and values

27. C. Maximum tolerable downtime values. This is the timeframe between an unplanned interruption of business operations and the resumption of business at a reduced level of service. The BIA identifies which of the company’s critical systems are needed for survival and estimates the outage time that can be tolerated by the company as a result of various unfortunate events. The outage time that can be endured by a company is referred to as the maximum tolerable downtime.

154

Single points of failure, that is, concentrations of risk that threaten business continuity

  • Continuity risks from concentrations of critical skills or critical shortages of skills
  • Continuity risks due to outsourced vendors and suppliers
  • Continuity risks that the BCP program has accepted, that are handled elsewhere, or that the BCP program does not address

155

19. ACME Inc. paid a software vendor to develop specialized software, and that vendor has gone out of business. ACME Inc. does not have access to the code and therefore cannot keep it updated. What mechanism should the company have implemented to prevent this from happening?

A. Reciprocal agreement

B. Software escrow

C. Electronic vaulting

D. Business interruption insurance

Extended Questions:

CORRECT B. The protection mechanism that ACME Inc. should have implemented is called software escrow. Software escrow means that a third party holds the source code, and backups of the compiled code, manuals, and other supporting materials. A contract between the software vendor, customer, and third party outlines who can do what and when with the source code. This contract usually states that the customer can have access to the source code only if and when the vendor goes out of business, is unable to carry out stated responsibilities, or is in breach of the original contract. If any of these activities takes place, then the customer is protected because it can still gain access to the source code and other materials through the third-party escrow agent.

WRONG A is incorrect because a reciprocal agreement is an offsite facility option that involves two companies agreeing to share their facility in case a disaster renders one of the facilities unusable. Reciprocal agreements deal with disaster recovery and not software protection when dealing with the developing vendor.

WRONG C is incorrect because electronic vaulting is a type of electronic backup solution. Electronic vaulting makes copies of files as they are modified and periodically transmits them to an offsite backup site. The transmission does not happen in real time but is carried out in batches. So, a company can choose to have all files that have been changed sent to the backup facility every hour, day, week, or month. The information can be stored in an offsite facility and retrieved from that facility in a short period of time. Electronic vaulting has to do with backing up data so that it is available if there is a disruption or disaster.

WRONG D is incorrect because a business interruption insurance policy covers specified expenses and lost earnings if a company is out of business for a certain length of time. This insurance is commonly purchased to protect a company in case a disaster takes place and it has to shut down its services for a specific period of time. It does not have anything to do with protection or accessibility of source code.

156

5. An approach to alternate offsite facilities is to establish a reciprocal agreement. Which of the following describes the pros and cons of a reciprocal agreement?

A. It is fully configured and ready to operate within a few hours, but is the most expensive of the offsite choices.

B. It is an inexpensive option, but takes the most time and effort to get up and running after a disaster.

C. It is a good alternative for companies that depend upon proprietary software, but annual testing is not usually available.

D. It is the cheapest of the offsite choices, but mixing operations could introduce many security issues.

Extended Questions:

CORRECT D. A reciprocal agreement, also referred to as mutual aid, means that company A agrees to allow company B to use its facilities if company B is hit by a disaster, and vice versa. This is a cheaper way to go than the other offsite choices, but it is not always the best choice. Most environments are maxed out pertaining to the use of facility space, resources, and computing capability. To allow another company to come in and work out of the same shop could prove to be detrimental to both companies. The stress of two companies working in the same environment could cause tremendous levels of tension. If it did work out, it would only provide a short-term solution. Configuration management could be a nightmare, and the mixing of operations could introduce many security issues. Reciprocal agreements have been known to work well in specific businesses, such as newspaper printing. These businesses require very specific technology and equipment that will not be available through any subscription service. For most other organizations, reciprocal agreements are generally, at best, a secondary option for disaster protection.

WRONG A is incorrect because a hot site—not a reciprocal agreement—is fully configured and ready to operate within a few hours. A hot site is also the most expensive offsite option. The only missing resources from a hot site are usually the data, which will be retrieved from a backup site, and the people who will be processing the data. The equipment and system software must be compatible with the data being restored from the main site and must not cause any negative interoperability issues. Hot sites are a good choice for a company that needs to ensure a site will be available for it as soon as possible.

WRONG B is incorrect because a cold site is an inexpensive offsite option, but it takes the most time and effort to actually get up and functioning right after a disaster. With cold sites the vendor supplies the basic environment, electrical wiring, air conditioning, plumbing, and flooring, but none of the equipment or additional services. It may take weeks to get the site activated and ready for work.

WRONG C is incorrect because a warm site, not a reciprocal agreement, is a good alternative for companies that depend upon proprietary software. A warm site is supplied with some equipment, but not the actual computers. It is a better choice than a reciprocal agreement or hot site for a company that depends upon proprietary and unusual hardware and software, because the company will bring its own hardware and software to the site after a disaster hits. The disadvantage of using a warm site is that the vendors’ contracts do not usually include annual testing, and it is this testing that helps ensure that the company can return to an operating state within hours.

157

Human Resources

We have everything up and running now—where are all the people to run these systems?

One of the resources commonly left out of the equation is people. A company may restore its networks and critical systems and get business functions up and running, only to realize it doesn’t know the answer to the question, "Who will take it from here?" The area of human resources is a critical component to any recovery and continuity process, and it needs to be fully thought out and integrated into the plan.

158

Identify preventive controls

3. Identify preventive controls. Once threats are recognized, identify and implement controls and countermeasures to reduce the organization’s risk level in an economical manner.

159

Enterprise-Wide BCP

The agreed-upon scope of the BCP will indicate if one or more facilities will be included in the plan. Most BCPs are developed to cover the enterprise as a whole, instead of dealing with only portions of the organization. In larger organizations, it can be helpful for each department to have its own specific contingency plan that will address its specific needs during recovery. These individual plans need to be compatible with the enterprise-wide BCP.

160

High Availability

High availability (HA) is a combination of technologies and processes that work together to ensure that some specific thing is always up and running. The specific thing can be a database, a network, an application, a power supply, etc. Service providers have service level agreements (SLAs) with their customers, which outline the amount of uptime they promise to provide and a turnaround time to get the item fixed if it does go down. For example, a hosting company can promise to provide 98 percent uptime for Internet connectivity. This means it is guaranteeing that at least 98 percent of the time, the Internet connection you purchase from it will be up and running. The hosting company knows that some things may take place to interrupt this service, so within your SLA it also promises an eight-hour turnaround time. This means if your Internet connection does go down, the company will either fix it or provide you with a different connection within eight hours.
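
To make the 98 percent figure concrete, the short Python sketch below converts an uptime percentage into the downtime it still permits. The function name and the choice of a 365-day year and a 30-day month are illustrative assumptions, not part of any real SLA.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime still permitted in a period under an uptime guarantee."""
    return period_hours * (1 - uptime_percent / 100)


HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month

yearly = allowed_downtime_hours(98, HOURS_PER_YEAR)
monthly = allowed_downtime_hours(98, HOURS_PER_MONTH)

print(f"98% uptime permits about {yearly:.1f} hours of downtime per year")
print(f"98% uptime permits about {monthly:.1f} hours of downtime per month")
```

Under these assumptions, 98 percent uptime still leaves room for roughly 175 hours (a little over seven days) of outage per year, which is why the turnaround time in the SLA matters as much as the uptime percentage.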

161

25. Disaster recovery plans can stay updated by doing any of the following except:

  A. Making disaster recovery a part of every business decision

  B. Making sure it is part of employees’ job descriptions

  C. Performing regular drills that use the plan

  D. Making copies of the plan and storing them in an offsite facility

25. D. The plan should be part of normal business activities. A lot of time and resources go into creating disaster recovery plans, after which they are usually stored away and forgotten. They need to be updated continuously as the environment changes to ensure that the company can properly react to any type of disaster or disruption.

162

Business Continuity Planning Requirements

A major requirement for anything that has such far-reaching ramifications as business continuity planning is management support, as mentioned previously. It is critical that management understands what the real threats are to the company, the consequences of those threats, and the potential loss values for each threat. Without this understanding, management may only give lip service to continuity planning, and in some cases, that is worse than not having any plans at all because of the false sense of security it creates. Without management support, the necessary resources, funds, and time will not be devoted, which could result in bad plans that, again, may instill a false sense of security. Failure of these plans usually means a failure in management understanding, vision, and due-care responsibilities.

163

Fault tolerance

Fault tolerance is the capability of a technology to continue to operate as expected even if something unexpected takes place (a fault). If a database experiences an unexpected glitch, it can roll back to a known good state and continue functioning as though nothing bad happened. If a packet gets lost or corrupted during a TCP session, TCP will resend the packet so that system-to-system communication is not affected. If a disk within a RAID system gets corrupted, the system uses its parity data to rebuild the corrupted data so that operations are not affected.
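
The RAID case can be illustrated with a minimal Python sketch of XOR parity. It assumes a simplified single-parity layout with three equal-sized data blocks; real RAID controllers stripe data and rotate parity across disks, and the block contents and the `xor_blocks` helper here are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block.
data_disks = [b"ACME", b"BCP!", b"DRP?"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_disks)

# Simulate the loss of disk 1: only disks 0 and 2 plus parity survive.
surviving = [data_disks[0], data_disks[2], parity]

# XOR-ing the survivors cancels out the known blocks and leaves the lost one.
rebuilt = xor_blocks(surviving)

print(rebuilt == data_disks[1])
```

Because XOR is its own inverse, any single missing block can be recovered from the remaining blocks and the parity, which is what lets the array keep operating after one disk fails.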

164

15. Which of the following is not a reason to develop and implement a disaster recovery plan?

A. Provide steps for a post-disaster recovery.

B. Extend backup operations to include more than just backing up data.

C. Outline business functions and systems.

D. Provide procedures for emergency responses.

Extended Questions:

CORRECT C. Outlining business functions and systems is not a viable reason to create and implement a disaster recovery plan. Although this task will most likely be accomplished as a byproduct of disaster recovery planning, it is not a good reason to develop the plan compared to the other answers in the question. You don’t develop and implement a disaster recovery plan just to outline business functions and systems, although that usually takes place during the planning process.

WRONG A is incorrect because providing steps for a post-disaster recovery is a good reason to develop and implement a disaster recovery plan. In fact, that is exactly what a disaster recovery plan provides. The goal of disaster recovery is to minimize the effects of a disaster and take the necessary steps to ensure that the resources, personnel, and business processes are able to resume operation in a timely manner. The goal of a disaster recovery plan is to handle the disaster and its ramifications right after the disaster hits.

WRONG B is incorrect because extending backup operations to include more than just backing up data is a good reason to develop and implement a disaster recovery plan. When looking at disaster recovery plans, some companies focus mainly on backing up data and providing redundant hardware. Although these items are extremely important, they are just small pieces of the company’s overall operations. Hardware and computers need people to configure and operate them, and data is usually not useful unless it is accessible by other systems and possibly outside entities. All of these things can require backups, not just data.

WRONG D is incorrect because providing procedures for emergency responses is a good reason to develop and implement a disaster recovery plan. A disaster recovery plan is carried out when everything is still in emergency mode and everyone is scrambling to get all critical systems back online. Having well-thought-out written procedures makes this whole process much more effective.

165

Cold site

Cold site: A leased or rented facility that supplies the basic environment, electrical wiring, air conditioning, plumbing, and flooring, but none of the equipment or additional services. A cold site is essentially an empty data center. It may take weeks to get the site activated and ready for work. The cold site could have equipment racks and dark fiber (fiber that does not have the circuit engaged) and maybe even desks. However, it would require the receipt of equipment from the client, since it does not provide any.

166

27. High availability (HA) is a combination of technologies and processes that work together to ensure that specific critical functions are always up and running at the necessary level. To provide this level of high availability, a company has to have a long list of technologies and processes that provide redundancy, fault tolerance, and failover capabilities. Which of the following best describes these characteristics?

A. Redundancy is the duplication of noncritical components or functions of a system with the intention of decreasing reliability of the system. Fault tolerance is the capability of a technology to discontinue to operate as expected even if something unexpected takes place. If a technology has a failover capability, this means that if there is a failure that cannot be handled through normal means, then processing is "switched over" to a working system.

B. Redundancy is the duplication of critical components or functions of a system with the intention of increasing reliability of the system. Fault tolerance is the capability of a technology to discontinue to operate as expected even if something unexpected takes place. If a technology has a failover capability, this means that if there is a failure that cannot be handled through normal means, then processing is "switched over" to a working system.

C. Redundancy is the duplication of critical components or functions of a system with the intention of increasing reliability of the system. Fault tolerance is the capability of a technology to continue to operate as expected even if something unexpected takes place. If a technology has a failover capability, this means that if there is a failure that cannot be handled through normal means, then processing is "switched over" to a nonworking system.

D. Redundancy is the duplication of critical components or functions of a system with the intention of increasing reliability of the system. Fault tolerance is the capability of a technology to continue to operate as expected even if something unexpected takes place. If a technology has a failover capability, this means that if there is a failure that cannot be handled through normal means, then processing is "switched over" to a working system.

Extended Questions:

CORRECT D. High availability (HA) is a combination of technologies and processes that work together to ensure that specific critical functions are always up and running. The function in question can be a database, a network, an application, a power supply, etc. To provide this level of high availability, the company has to have a long list of technologies and processes that provide redundancy, fault tolerance, and failover capabilities. Redundancy, fault tolerance, and failover capabilities increase the reliability of a system or network. High reliability allows for high availability.

WRONG A is incorrect because redundancy within this type of technology encompasses the duplication of critical components or functions of a system with the intention of increasing reliability of the system. Redundancy is commonly built into the network at a routing protocol level. The routing protocols are configured so that if one link goes down or gets congested, traffic is routed over a different network link. Redundant hardware can also be available so that if a primary device goes down, the backup component can be swapped in and activated.

WRONG B is incorrect because fault tolerance is the capability of a technology to continue to operate as expected even if something unexpected takes place (a fault). If a database experiences an unexpected glitch, it can roll back to a known good state and continue functioning as though nothing bad happened. If a packet gets lost or corrupted during a TCP session, TCP will resend the packet so that system-to-system communication is not affected. If a disk within a RAID system gets corrupted, the system uses its parity data to rebuild the corrupted data so that operations are not affected.

WRONG C is incorrect because if a technology has a failover capability, this means that if there is a failure that cannot be handled through normal means, then processing is "switched over" to a working system.
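
The failover behavior described in these answers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the handler functions, the request string, and the use of `ConnectionError` to signal a failed system are all hypothetical, and real failover is usually handled by clustering or load-balancing products rather than application code.

```python
def primary_handler(request):
    # Hypothetical primary system that is currently down.
    raise ConnectionError("primary system is down")


def standby_handler(request):
    # Hypothetical standby system that is working.
    return f"standby processed {request}"


def process_with_failover(request, systems):
    """Try each system in order and switch over when one fails."""
    for name, handler in systems:
        try:
            return name, handler(request)
        except ConnectionError:
            continue  # this system failed, so switch over to the next one
    raise RuntimeError("no working system available")


served_by, result = process_with_failover(
    "order-42",
    [("primary", primary_handler), ("standby", standby_handler)],
)
print(served_by, "->", result)
```

The key point matching the correct answer is that processing is switched over to a working system; if no working system remains, the sketch raises an error instead of pretending to succeed.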

167

The BCP committee must identify the threats to the company and map them to the following characteristics:

  • Maximum tolerable downtime and disruption for activities
  • Operational disruption and productivity
  • Financial considerations
  • Regulatory responsibilities
  • Reputation

168

20. When is the emergency actually over for a company?

  A. When all people are safe and accounted for

  B. When all operations and people are moved back into the primary site

  C. When operations are safely moved to the offsite facility

  D. When a civil official declares that all is safe

20. B. The emergency is not actually over until the company moves back into its primary site. The company is still vulnerable and at risk while it is operating in an altered or crippled state. This state of vulnerability is not over until the company is operating in the way it was prior to the disaster. Of course, this may mean that the primary site has to be totally rebuilt if it was destroyed.

169

Maintain the plan

7. Maintain the plan. Put in place steps to ensure the BCP is a living document that is updated regularly.

170

24. There are several types of redundant technologies that can be put into place. What type of technology is shown in the graphic that follows?

A. Tape vaulting

B. Remote journaling

C. Electronic vaulting

D. Redundant site

Extended Questions:

CORRECT A. Each site should have a full set of the most current and updated information and files. A commonly used backup technology for achieving this is tape vaulting. Many businesses back up their data to tapes that are then manually transferred to an offsite facility by a courier or an employee. With automatic tape vaulting, the data is sent over a serial line to a backup tape system at the offsite facility. The company that maintains the offsite facility maintains the systems and changes out tapes when necessary. Data can be quickly backed up and retrieved when necessary. This technology reduces the manual steps in the traditional tape backup procedures. Basic vaulting of tape data means sending backup tapes to an offsite location, but a manual process can be error prone. Electronic tape vaulting transmits data over a network to tape devices located at an alternate data center. Electronic tape vaulting improves recovery speed and reduces errors, and backups can be run more frequently.

WRONG B is incorrect because remote journaling is a technology used to transmit data to an offsite facility, but this usually only includes moving the journal or transaction logs to the offsite facility, not the actual files. The graphic specifically shows a tape controller, whereas remote journaling mainly takes place between databases. Remote journaling involves transmitting the journal or transaction log offsite to a backup facility. These logs contain the deltas (changes) that have taken place to the individual files. If and when data are corrupted and need to be restored, the company can retrieve these logs, which are used to rebuild the lost data. Journaling is efficient for database recovery, where only the reapplication of a series of changes to individual records is required to resynchronize the database.

WRONG C is incorrect because electronic vaulting most commonly takes place between databases and makes copies of files as they are modified and periodically transmits them to an offsite backup site. The transmission does not happen in real time but is carried out in batches. So, a company can choose to have all files that have been changed sent to the backup facility every hour, day, week, or month. The information can be stored in an offsite facility and retrieved from that facility in a short period of time. This form of backup takes place in many financial institutions, so when a bank teller accepts a deposit or withdrawal, the change to the customer’s account is made locally to that branch’s database and to the remote site that maintains the backup copies of all customer records.

WRONG D is incorrect because while the graphic could be illustrating that the tape controller is located at a redundant site, a redundant site is not actually a technology. Some companies choose to have redundant sites, meaning one site is equipped and configured exactly like the primary site, which serves as a redundant environment. These sites are owned by the company and are mirrors of the original production environment. This is one of the most expensive backup facility options, because a full environment must be maintained even though it usually is not used for regular production activities until after a disaster takes place that triggers the relocation of services to the redundant site.
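
The remote journaling idea mentioned in answer B, where only the logged deltas are shipped offsite and later reapplied, can be sketched as follows. This is a minimal Python illustration, not how any particular database implements it; the account names, balances, and the `replay_journal` function are hypothetical.

```python
def replay_journal(last_backup, journal):
    """Rebuild current records by reapplying logged deltas to a backup copy."""
    records = dict(last_backup)  # work on a copy so the backup stays intact
    for account, delta in journal:
        records[account] = records.get(account, 0) + delta
    return records


# Hypothetical state captured by the last full backup.
last_backup = {"alice": 100, "bob": 50}

# Hypothetical journal entries (deltas) transmitted offsite since that backup.
journal = [("alice", -30), ("bob", 20), ("carol", 75)]

restored = replay_journal(last_backup, journal)
print(restored)
```

Only the small journal needs to travel to the offsite facility; the full files do not. This is the efficiency the explanation attributes to journaling for database recovery.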