Lesson 20 Flashcards

1
Q

High Availability

A

High
availability is usually loosely described as 24x7 (24 hours per day, 7 days per week) or
24x365 (24 hours per day, 365 days per year). For a critical system, availability will be
described as “two-nines” (99%) up to five- or six-nines (99.9999%):

Availability    Annual Downtime (hh:mm:ss)
99.9999%        00:00:32
99.999%         00:05:15
99.99%          00:52:34
99.9%           08:45:36
99.0%           87:36:00
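
These figures are straightforward arithmetic on the number of seconds in a year. As a quick check, here is a minimal Python sketch (assuming a 365-day year, matching the table):

```python
def annual_downtime(availability_pct: float) -> str:
    """Return annual downtime as hh:mm:ss for an availability percentage."""
    total_seconds = 365 * 24 * 60 * 60          # seconds in a 365-day year
    downtime = total_seconds * (1 - availability_pct / 100)
    hours, rem = divmod(round(downtime), 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

for pct in (99.9999, 99.999, 99.99, 99.9, 99.0):
    print(f"{pct}% -> {annual_downtime(pct)}")  # reproduces the table above
```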

2
Q

Maximum tolerable downtime (MTD)

A

The maximum tolerable downtime (MTD) metric
expresses the availability requirement for a particular business function.


3
Q

Scheduled service intervals versus unplanned outages

A

Downtime is calculated from the sum of scheduled service intervals plus unplanned outages over
the period.

4
Q

Scalability

A

Scalability
•Increase capacity within similar cost ratio
•Scale out versus scale up

Scalability is the capacity to
increase resources to meet demand within similar cost ratios. This means that if service
demand doubles, costs do not more than double. There are two types of scalability:
• To scale out is to add more resources in parallel with existing resources.
• To scale up is to increase the power of existing resources.

5
Q

Elasticity

A

Elasticity
•Cope with changes to demand in real time

Elasticity refers to the system’s ability to handle changes in demand in real
time. A system with high elasticity will not experience loss of service or performance if
demand suddenly increases rapidly.

6
Q

Fault tolerance and redundancy

A

A system that can experience failures and continue to provide the same (or nearly the
same) level of service is said to be fault tolerant. Fault tolerance is often achieved
by provisioning redundancy for critical components and single points of failure. A
redundant component is one that is not essential to the normal function of a system
but that allows the system to recover from the failure of another component.

7
Q

Power problems
•Spikes and surges
•Blackouts and brownouts

A

All types of computer systems require a stable power supply to operate. Electrical
events, such as voltage spikes or surges, can crash computers and network appliances,
while loss of power from brownouts or blackouts will cause equipment to fail.

8
Q

Power management

A

Power management means deploying systems to ensure that equipment is protected against
these events [blackouts, brownouts, spikes and surges] and that network operations can either continue uninterrupted or be
recovered quickly.

9
Q

Dual Power Supplies

A

Dual power supplies
•Component redundancy for server chassis

An enterprise-class server or appliance enclosure is likely to feature two or more power
supply units (PSUs) for redundancy. A hot plug PSU can be replaced (in the event of
failure) without powering down the system.
10
Q

Managed power distribution units (PDUs)

A

Managed power distribution units (PDUs)
•Protection against spikes, surges, and brownouts
•Remote monitoring

The power circuits supplying grid power to a rack, network closet, or server room
must be sufficient to meet the load capacity of all the installed equipment, plus room
for growth. Consequently, circuits to a server room will typically be higher capacity
than domestic or office circuits (30 or 60 amps as opposed to 13 amps, for instance).
These circuits may be run through a power distribution unit (PDU). These come with
circuitry to “clean” the power signal, provide protection against spikes, surges, and
brownouts, and can integrate with uninterruptible power supplies (UPSs). Managed
PDUs support remote power monitoring functions, such as reporting load and
status, switching power to a socket on and off, or switching sockets on in a particular
sequence.

11
Q

Battery backups and uninterruptible power supply (UPS)

A

Battery backups and uninterruptible power supply (UPS)
•Battery backup at component level
•UPS battery backups for servers and appliances

If there is loss of power, system operation can be sustained for a few minutes or hours
(depending on load) using battery backup. Battery backup can be provisioned at the
component level for disk drives and RAID arrays. The battery protects any read or write
operations cached at the time of power loss. At the system level, an uninterruptible
power supply (UPS) will provide a temporary power source in the event of a blackout
(complete power loss). This may range from a few minutes for a desktop-rated model
to hours for an enterprise system. In its simplest form, a UPS comprises a bank of
batteries and their charging circuit plus an inverter to generate AC voltage from the DC
voltage supplied by the batteries.

12
Q

Generators

A

A backup power generator can provide power to the whole building, often for several
days. Most generators use diesel, propane, or natural gas as a fuel source. With diesel
and propane, the main drawback is safe storage (diesel also has a shelf-life of between
18 months and two years); with natural gas, the issue is the reliability of the gas
supply in the event of a natural disaster. Data centers are also investing in renewable
power sources, such as solar, wind, geothermal, hydrogen fuel cells, and hydro. The
ability to use renewable power is a strong factor in determining the best site for new
data centers. Large-scale battery solutions, such as Tesla’s Powerpack
(tesla.com/powerpack), may be able to provide an alternative to backup power
generators. There are also emerging technologies to use all the battery resources
of a data center as a microgrid for power storage
(scientificamerican.com/article/how-big-batteries-at-datacenters-could-replace-power-plants/).

13
Q

Network redundancy

A

Networking is another critical resource where a single point of failure could cause
significant service disruption.

14
Q

Network Interface Card (NIC) Teaming

A

Network interface card (NIC) teaming, or adapter teaming, means that the server
is installed with multiple NICs, or NICs with multiple ports, or both. Each port is
connected to separate network cabling. During normal operation, this can provide a
high-bandwidth link. For example, four 1 Gbps ports give an overall bandwidth of 4 Gbps.
If there is a problem with one cable, or one NIC, the network connection will continue
to work, though at just 3 Gbps.

From Wikipedia: A network interface controller (NIC, also known as a network interface card, network adapter, LAN adapter or physical network interface, and by similar terms) is a computer hardware component that connects a computer to a computer network.

15
Q

Switching and routing (for network redundancy)

A

Switching and routing
•Design network with multiple paths

Network cabling should be designed to allow for multiple paths between the various
switches and routers, so that during a failure of one part of the network, the rest
remains operational.

16
Q

Load balancers (for network redundancy)

A

Load balancers
•Load balancing switch to distribute workloads
•Clusters provision multiple redundant servers to share data and session information

NIC teaming provides load balancing at the adapter level. Load balancing and
clustering can also be provisioned at a service level:
• A load balancing switch distributes workloads between available servers.
• A load balancing cluster enables multiple redundant servers to share data and
session information to maintain a consistent service if there is failover from one
server to another.
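
To illustrate the distribution idea only, here is a toy round-robin sketch in Python; the server names are hypothetical, and real load balancers add health checks and more sophisticated scheduling:

```python
from itertools import cycle

# Hypothetical backend pool behind a load balancing switch.
servers = cycle(["web01", "web02", "web03"])

# Round-robin: each incoming request goes to the next server in turn.
for request_id in range(6):
    print(f"request {request_id} -> {next(servers)}")
```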

17
Q

Disk Redundancy

A

Disk and storage resources are critically dependent on redundancy. While backup provides
integrity for when a disk fails, to restore from backup would require installing a new
storage unit, restoring the data, and testing the system configuration. Disk redundancy
ensures that a server can continue to operate if one, or possibly more, storage devices fail.

18
Q

Redundant array of independent disks (RAID)

A

When a storage system is configured as a Redundant Array of Independent Disks
(RAID), many disks can act as backups for each other to increase reliability and fault
tolerance. If one disk fails, the data is not lost, and the server can keep functioning.
The RAID advisory board defines RAID levels, numbered from 0 to 6, where each level
corresponds to a specific type of fault tolerance. There are also proprietary and nested
RAID solutions. Some of the most commonly implemented types of RAID are listed in
the following table.

19
Q

RAID 1, 5, 6, Nested, and Level 0

A

RAID 1
•Mirroring
•50% storage efficiency

RAID 5 and RAID 6
•Striping with distributed parity
•Better storage efficiency

Nested RAID
•Better performance or redundancy

Level 1

Mirroring means that data is written to two
disks simultaneously, providing redundancy
(if one disk fails, there is a copy of data
on the other). The main drawback is that
storage efficiency is only 50%.

Level 5

Striping with parity means that data is
written across three or more disks, but
additional information (parity) is calculated.
This allows the volume to continue if one
disk is lost. This solution has better storage
efficiency than RAID 1.

Level 6
Double parity, or level 5 with an additional
parity stripe, allows the volume to continue
when two devices have been lost.

Nested (0+1, 1+0, or 5+0)

Nesting RAID sets generally improves
performance or redundancy. For example,
some nested RAID solutions can support the
failure of more than one disk.

RAID Level 0

RAID level 0 refers to striping without parity. Data is written in blocks across several disks
simultaneously, but with no redundancy. This can improve performance, but if one disk
fails, so does the whole volume, and data on it will be corrupted. There are some use cases
for RAID 0, but typically striping without parity is only implemented to improve performance
in a nested RAID solution.
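
The storage-efficiency trade-offs above reduce to simple arithmetic. A minimal sketch, assuming n identical disks (real arrays vary by implementation):

```python
def usable_fraction(level: int, n: int) -> float:
    """Fraction of raw capacity usable for data at a given RAID level."""
    if level == 0:              # striping, no redundancy
        return 1.0
    if level == 1 and n == 2:   # mirroring: one disk is a full copy
        return 0.5
    if level == 5 and n >= 3:   # one disk's worth of distributed parity
        return (n - 1) / n
    if level == 6 and n >= 4:   # two disks' worth of parity
        return (n - 2) / n
    raise ValueError("unsupported level/disk combination")

for level, n in [(0, 4), (1, 2), (5, 4), (6, 6)]:
    print(f"RAID {level}, {n} disks: {usable_fraction(level, n):.0%} usable")
```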

20
Q

Multipath

A

Multipath
•Controller and cabling redundancy

Where RAID provides redundancy for the storage devices, multipath is focused on
the bus between the server and the storage devices or RAID array. A storage system is
accessed via some type of controller. The controller might be connected to disk units
locally installed in a server, or it might connect to storage devices within a storage area
network (SAN). Multipath input/output (I/O) ensures that there is controller redundancy
and/or multiple network paths to the storage devices.

21
Q

Replication context

A
  • Local storage (RAID)
  • Storage area network (SAN)
  • Database
  • Virtual machine (VM)

Data replication is technology that maintains exact copies of data at more than one
location. RAID mirroring and parity implement types of replication between local
storage devices. Data replication can be applied in many other contexts:
• Storage Area Network (SAN)—most enterprise storage is configured as a SAN. A
SAN is a high-speed fiber optic network of storage devices built from technologies
such as Fibre Channel, Small Computer System Interface (SCSI), or Infiniband.
Redundancy can be provided within the SAN, and replication can also take place
between SANs using WAN links.
• Database—much data is stored within a database. Where a database is replicated
between multiple servers or sites, it is very important to maintain consistency
between the replicas. Database management systems come with specific tools to
implement different kinds of replication.
• Virtual Machine (VM)—the same VM instance may need to be deployed in multiple
locations. This can be achieved by replicating the VM’s disk image and configuration
settings.

22
Q

Geographic dispersal

A

Geographic dispersal refers to replicating data between hot and warm sites that are
physically distant from one another. This means that data is protected against a
natural disaster wiping out storage at one of the sites. This is also described as a
geo-redundant solution.

23
Q

Asynchronous and synchronous replication

A
  • Synchronous (must be written at both sites—expensive)
  • Asynchronous (one site is primary and the others secondary)
  • Optimum distances between sites

Synchronous replication is designed to write data to all replicas simultaneously.
Therefore, all replicas should always have the same data all of the time. Asynchronous
replication writes data to the primary storage first, and then copies data to the replicas
at scheduled intervals.
Asynchronous replication isn’t a good choice for a solution that requires data in
multiple locations to be consistent, such as data from product inventory lists accessed
in different regions. Many geo-redundant replication services rely on asynchronous
replication due to the distances between data centers in multiple regions. In some
cases, business solutions work around the limitations of asynchronous replication. For
example, an online retailer may choose only to show inventory from their local regional
warehouse.
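
As a toy illustration of the difference (not a real replication protocol; the site names are hypothetical): synchronous writes return only once every replica has the data, while asynchronous writes acknowledge at the primary and copy later:

```python
replicas = {"primary": [], "secondary": []}
pending = []  # queue flushed to the secondary on a schedule

def write_sync(value):
    # Blocks until the value is stored at every site.
    for site in replicas.values():
        site.append(value)

def write_async(value):
    # Acknowledges after the primary write; replication happens later.
    replicas["primary"].append(value)
    pending.append(value)

write_sync("order-1")
write_async("order-2")
replicas["secondary"].extend(pending)  # the scheduled replication job runs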

24
Q

On-Premises versus Cloud

A

High availability through redundancy and replication is resource-intensive, especially
when configuring multiple hot or warm sites. For on-premises sites, provisioning the
storage devices and high-bandwidth, low-latency WAN links required between two
geographically dispersed hot sites could incur unaffordable costs. This cost is one of the
big drivers of cloud services, where local and geographic redundancy are built into the
system, if you trust the CSP to operate the cloud effectively. For example, in the cloud,
geo-redundancy replicates data or services between data centers physically located
in two different regions. Disasters that occur at the regional level, like earthquakes,
hurricanes, or floods, should not impact availability across multiple zones.

25
Backups and Retention Policy
• Short term retention
  • Version control and recovery from corruption/malware
• Long term retention
  • Regulatory/business requirements
• Recovery window
  • Recovery point objective (RPO)
26
•Short term retention
• Short term retention
  • Version control and recovery from corruption/malware

In the short term, files that change frequently might need retaining for version control. Short-term retention is also important in recovering from malware infection. Consider the scenario where a backup is made on Monday, a file is infected with a virus on Tuesday, and when that file is backed up later on Tuesday, the copy made on Monday is overwritten. This means that there is no good means of restoring the uninfected version of the file. Short-term retention is determined by how often the youngest media sets are overwritten.
27
•Long term retention
• Long term retention
  • Regulatory/business requirements
28
•Recovery window
• Recovery window
  • Recovery point objective (RPO)

For these reasons, backups are kept going back to certain points in time. As backups take up a lot of space, and there is never limitless storage capacity, this introduces the need for storage management routines to reduce the amount of data occupying backup storage media while giving adequate coverage of the required recovery window. The recovery window is determined by the recovery point objective (RPO), which is determined through business continuity planning. Advanced backup software can prevent media sets from being overwritten in line with the specified retention policy.
29
Backup Types
See slide or guide (there is a graphic): full, incremental, and differential.
30
•Snapshots
• Snapshots
  • Feature of file system allowing open file copy
  • Volume Shadow Copy Service (VSS)
  • VM snapshots and checkpoints
• Image-based backup
  • System images

Snapshots are a means of getting around the problem of open files. If the data that you're considering backing up is part of a database, such as SQL data or an Exchange messaging system, then the data is probably being used all the time. Often copy-based mechanisms will be unable to back up open files. Short of closing the files, and so too the database, a copy-based system will not work. A snapshot is a point-in-time copy of data maintained by the file system. A backup program can use the snapshot rather than the live data to perform the backup. In Windows, snapshots are provided on NTFS volumes by the Volume Shadow Copy Service (VSS). They are also supported on Sun's ZFS file system, and under some enterprise distributions of Linux.

Virtual system managers can usually take snapshot or cloned copies of VMs. A snapshot remains linked to the original VM, while a clone becomes a separate VM from the point that the cloned image was made.

An image backup is made by duplicating an OS installation. This can be done either from a physical hard disk or from a VM's virtual hard disk. Imaging allows the system to be redeployed quickly, without having to reinstall third-party software, patches, and configuration settings. A system image should generally not contain any user data files, as these will quickly become out of date.
31
•Image-based backup
• Image-based backup
  • System images

An image backup is made by duplicating an OS installation. This can be done either from a physical hard disk or from a VM's virtual hard disk. Imaging allows the system to be redeployed quickly, without having to reinstall third-party software, patches, and configuration settings. A system image should generally not contain any user data files, as these will quickly become out of date.
32
•Volume Shadow Copy Service (VSS)
In Windows, snapshots are provided on NTFS volumes by the Volume Shadow Copy Service (VSS).
33
•VM snapshots and checkpoints
Virtual system managers can usually take snapshot or cloned copies of VMs. A snapshot remains linked to the original VM, while a clone becomes a separate VM from the point that the cloned image was made.
34
Backup Storage Issues
Backup security
• Access control and encryption

Offsite storage
• Distance consideration
• Physical transfer
• Network/cloud backups

Online versus offline backups
• Speed of restore operations
• Risk to online backup data (offline backups take more time to bring back into operation but offer better security)

3-2-1 rule
35
3-2-1 rule
The 3-2-1 rule states that you should have three copies of your data, across two media types, with one copy held offline and offsite.
36
Backup Media Types
Disk
• SOHO backups
• Lack enterprise-level capacity and manageability

Network attached storage (NAS)
• File-level/protocol-based access
• No offsite option

Tape
• Enterprise-level capacity and manageability

Storage area network (SAN) and cloud
• Block-level access to storage devices
• Highly configurable
• Mix storage technologies to implement performance tiers
37
Disk
Disk
• SOHO backups
• Lack enterprise-level capacity and manageability

Individual removable hard drives are an excellent low-cost option for SOHO network backups, but they do not have sufficient capacity or flexibility to be used within an automated enterprise backup solution.
38
Network attached storage (NAS)
Network attached storage (NAS)
• File-level/protocol-based access
• No offsite option

A network attached storage (NAS) appliance is a specially configured type of server that makes RAID storage available over common network protocols, such as Windows File Sharing (SMB) or FTP. A NAS appliance is accessed via an IP address and backup takes place at file-level. A NAS can be another good option for SOHO backup, but as a single device, it provides no offsite option. As it is normally kept online, it can be vulnerable to cryptoransomware as well.
39
Tape
Tape
• Enterprise-level capacity and manageability
• Slow, especially for restore operations

Digital tape systems are a popular choice for institutions with multi-terabyte storage requirements. Tape is very cost effective and, given a media rotation system, tapes can be transported offsite. The latest generation of tape will store about 10-12 terabytes per cartridge or up to about 30 TB with compression. The main drawback of tape is that it is slow, compared to disk-based solutions, especially for restore operations.
40
Storage area network (SAN) and cloud
Storage area network (SAN) and cloud
• Block-level access to storage devices
• Highly configurable
• Mix storage technologies to implement performance tiers

A RAID array or tape drive/autoloader can be provisioned as direct attached storage, where a server hosts the backup devices, usually over serial attached SCSI (SAS). Direct attached storage has limited scalability, so enterprise and cloud storage solutions often use storage area networks (SAN) as a layer of abstraction between the file system objects presented to servers and the configuration of the actual storage media. Where NAS uses file-level access to storage, a SAN is based on block-level addressing. A SAN can incorporate RAID arrays and tape systems within the same network. SANs can achieve offsite storage through replication.
41
Restoration Order
A complex facility such as a data center or campus network must be reconstituted according to a carefully designed order of restoration. If systems are brought back online in an uncontrolled way, there is the serious risk of causing additional power problems or of causing problems in the network, OS, or application layers because dependencies between different appliances and servers have not been met.

1. Power delivery systems
2. Switch infrastructure, then routing appliances and systems
3. Network security appliances
4. Critical network servers
5. Backend and middleware, and verify data integrity
6. Front-end applications
7. Client workstations and devices, and client browser access
42
Non-Persistence
Separate compute instance from data
• Snapshot/revert to known state
• Rollback to known configuration
• Live boot media

Provisioning
• Master image
• Automated build from template

Configuration validation
43
Definition of Non-persistence
Separate compute instance from data
• Snapshot/revert to known state
• Rollback to known configuration
• Live boot media

When recovering systems, it may be necessary to ensure that any artifacts from the disaster, such as malware or backdoors, are removed when reconstituting the production environment. This can be facilitated in an environment designed for nonpersistence. Nonpersistence means that any given instance is completely static in terms of processing function. Data is separated from the instance so that it can be swapped out for an "as new" copy without suffering any configuration problems. The bullets above list the main mechanisms for ensuring nonpersistence.
44
Provisioning
Provisioning
• Master image
• Automated build from template

When provisioning a new or replacement instance automatically, the automation system may use one of two types of mastering instructions:
• Master image—this is the "gold" copy of a server instance, with the OS, applications, and patches all installed and configured. This is faster than using a template, but keeping the image up to date can involve more work than updating a template.
• Automated build from a template—similar to a master image, this is the build instructions for an instance. Rather than storing a master image, the software may build and provision an instance according to the template instructions.
45
Configuration Validation
Another important process in automating resiliency strategies is to provide configuration validation. This process ensures that a recovery solution is working at each layer (hardware, network connectivity, data replication, and application). An automation solution for incident and disaster recovery will have a dashboard of key indicators and may be able to evaluate metrics such as compliance with RPO and RTO from observed data.
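
As a sketch of the kind of metric evaluation such a dashboard might perform (the field names and values here are hypothetical): RPO compliance can be checked by comparing the age of the last good replica against the objective:

```python
from datetime import datetime, timedelta, timezone

rpo = timedelta(hours=1)  # assumed recovery point objective
last_replica = datetime(2024, 5, 1, 11, 30, tzinfo=timezone.utc)  # example
now = datetime(2024, 5, 1, 12, 45, tzinfo=timezone.utc)

age = now - last_replica
print(f"replica age {age}; RPO {'met' if age <= rpo else 'MISSED'}")
```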
46
Configuration Management
Configuration management ensures that each component of ICT infrastructure is in a trusted state that has not diverged from its documented properties. Change control and change management reduce the risk that changes to these components could cause service disruption.

• Service assets
• Configuration items (CIs)
  • Assets that require configuration management
• Baseline configuration
• Configuration management system (CMS)
  • Creating and updating diagrams
  • Workflows
  • Physical and logical network topologies
  • Network rack layouts
  • …
47
Service Assets
Service assets are things, processes, or people that contribute to the delivery of an IT service.
48
Configuration Items (CIs)
• Configuration items (CIs)
  • Assets that require configuration management

A configuration item (CI) is an asset that requires specific management procedures for it to be used to deliver the service. Each CI must be identified by some sort of label, ideally using a standard naming convention. CIs are defined by their attributes and relationships, which are stored in a configuration management database (CMDB).
49
•Baseline configuration
A baseline configuration is the template of settings that a device, VM instance, or other CI was configured to, and that it should continue to match. You might also record performance baselines, such as the throughput achieved by a server, for comparison with monitored levels.
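
A minimal sketch of comparing a device's observed settings against its baseline (the setting names here are hypothetical, not from any particular CMS):

```python
baseline = {"ntp_server": "10.0.0.1", "ssh_v1_enabled": False, "snmp": "v3"}
observed = {"ntp_server": "10.0.0.1", "ssh_v1_enabled": True, "snmp": "v3"}

# Report any setting that has diverged from the documented baseline.
drift = {key: (baseline[key], observed.get(key))
         for key in baseline if observed.get(key) != baseline[key]}
for setting, (expected, actual) in drift.items():
    print(f"DRIFT {setting}: expected {expected!r}, found {actual!r}")
```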
50
•Configuration management system (CMS)
• Configuration management system (CMS)
  • Creating and updating diagrams
  • Workflows
  • Physical and logical network topologies
  • Network rack layouts
  • …

A configuration management system (CMS) is the tools and databases that collect, store, manage, update, and present information about CIs and their relationships. A small network might capture this information in spreadsheets and diagrams; there are dedicated applications for enterprise CMS. Diagrams are the best way to capture the complex relationships between network elements. Diagrams can be used to show how CIs are involved in business workflows, logical (IP) and physical network topologies, and network rack layouts. Remember, it is not sufficient simply to create the diagram; you must also keep the diagram up to date.
51
Asset Management
Inventory/asset management database

Asset identification and standard naming conventions
• Barcodes and RFID tags
• Standard naming conventions for asset IDs
• Attribute fields and tags

Internet protocol (IP) schema
• Static allocation versus DHCP ranges
• IP address management (IPAM) software suites
52
Asset Management (definition)
An asset management process tracks all the organization's critical systems, components, devices, and other objects of value in an inventory. It also involves collecting and analyzing information about these assets so that personnel can make more informed changes or otherwise work with assets to achieve business goals.
53
Inventory/asset management database
There are many software suites and associated hardware solutions available for tracking and managing assets. An asset management database can be configured to store as much or as little information as is deemed necessary, though typical data would be type, model, serial number, asset ID, location, user(s), value, and service information.
54
Asset identification and standard naming conventions
Asset identification and standard naming conventions
• Barcodes and RFID tags
• Standard naming conventions for asset IDs
• Attribute fields and tags

Tangible assets can be identified using a barcode label or radio frequency ID (RFID) tag attached to the device (or more simply, using an identification number). An RFID tag is a chip programmed with asset data. When in range of a scanner, the chip activates and signals the scanner. The scanner alerts management software to update the device's location. As well as asset tracking, this allows the management software to track the location of the device, making theft more difficult.

A standard naming convention for hardware assets, and for digital assets such as accounts and virtual machines, makes the environment more consistent. This means that errors are easier to spot and that it is easier to automate through scripting. The naming strategy should allow administrators to identify the type and function of any particular resource or location at any point in the CMDB or network directory. Each label should conform to rules for host and DNS names (support.microsoft.com/en-us/help/909264/naming-conventions-in-active-directory-for-computers-domains-sites-and). As well as an ID attribute, the location and function of tangible and digital assets can be recorded using attribute tags and fields or DNS CNAME and TXT resource records.
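
A short sketch of enforcing such a convention with a regular expression; the SITE-TYPE-NNNN pattern used here (e.g., NYC-SRV-0042) is a hypothetical example, not a prescribed standard:

```python
import re

# Hypothetical convention: 3-letter site code, asset type, 4-digit number.
ASSET_ID = re.compile(r"^[A-Z]{3}-(SRV|SW|RTR|LPT)-\d{4}$")

for asset_id in ("NYC-SRV-0042", "LON-RTR-0007", "nyc-srv-42"):
    status = "valid" if ASSET_ID.match(asset_id) else "INVALID"
    print(f"{asset_id}: {status}")
```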
55
Internet protocol (IP) schema
Internet protocol (IP) schema
• Static allocation versus DHCP ranges
• IP address management (IPAM) software suites

The division of the IP address space into subnets should be carefully planned and documented in an Internet Protocol (IP) schema. Using a consistent addressing methodology makes it easier to apply firewall access control lists (ACLs) and perform security monitoring (tools.cisco.com/security/center/resources/security_ip_addressing.html). It also makes configuration errors less likely and easier to detect. Within each subnet, the schema should identify IP addresses reserved for manual or static allocation versus DHCP address pools. IP address management (IPAM) software suites can be used to monitor IP usage.
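
The standard library's ipaddress module can document such a schema; a minimal sketch, where the subnet and the static/DHCP split are hypothetical examples:

```python
import ipaddress

subnet = ipaddress.ip_network("10.10.20.0/24")
hosts = list(subnet.hosts())   # .1 through .254

static_pool = hosts[:19]       # .1-.19 reserved for static assignment
dhcp_pool = hosts[19:]         # .20-.254 offered by DHCP

print(f"{subnet}: {len(static_pool)} static, {len(dhcp_pool)} DHCP addresses")
```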
56
Change Control and Change Management
Change control
• Assess whether a change should be made
• Classifying change (reactive, proactive, risk)
• Request for Change (RFC)
• Change Advisory Board (CAB)

Change management
• Ensure changes are applied with minimum disruption
• Rollback plan—every change should be accompanied by a rollback (or remediation) plan, so that the change can be reversed if it has harmful or unforeseen consequences.
57
Site Resiliency
Alternate processing sites/recovery sites
• Provide redundancy for damage to resources stored on the primary site
• Failover to alternate processing site (or system)

Hot site
• Instantaneous failover

Warm site
• Some delay or manual configuration before failover occurs

Cold site
• Significant delay and configuration before failover can occur
58
Hot Site
Hot site
• Instantaneous failover

A hot site can failover almost immediately. It generally means that the site is already within the organization's ownership and is ready to deploy. For example, a hot site could consist of a building with operational computer equipment that is kept updated with a live data set.
59
Warm Site
Warm site
• Some delay or manual configuration before failover occurs

A warm site could be similar, but with the requirement that the latest data set will need to be loaded.
60
Cold Site
Cold site
• Significant delay and configuration before failover can occur

A cold site takes longer to set up. A cold site may be an empty building with a lease agreement in place to install whatever equipment is required when necessary.
61
Diversity and Defense in Depth
Layered security and defense in depth

Technology and control diversity
• Provision different classes and types of controls
• Mix technical, administrative, and physical controls
• Deploy controls to prevent, deter, detect, and correct

Vendor diversity
• Use more than one supplier

Crypto diversity
62
Layered security and defense in depth
Layered security is typically seen as improving cybersecurity resiliency because it provides defense in depth. The idea is that to fully compromise a system, the attacker must get past multiple security controls, providing control diversity. These layers reduce the potential attack surface and make it much more likely that an attack will be deterred or prevented, or at least detected and then prevented by manual intervention.
63
Technology and control diversity
Technology and control diversity
• Provision different classes and types of controls
• Mix technical, administrative, and physical controls
• Deploy controls to prevent, deter, detect, and correct

Allied with defense in depth is the concept of security through (or with) diversity. Technology diversity refers to environments that are a mix of operating systems, applications, coding languages, virtualization solutions, and so on. Control diversity means that the layers of controls should combine different classes of technical and administrative controls with the range of control functions: prevent, detect, correct, and deter.

Consider the scenario where Alan from marketing is sent a USB stick containing designs for a new billboard campaign from an agency. Without defense in depth, Alan might find the USB stick on his desk in the morning, plug it into his laptop without much thought, and from that point is potentially vulnerable to compromise. There are many opportunities in this scenario for an attacker to tamper with the media: at the agency, in the post, or at Alan's desk. Defense in depth, established by deploying a diverse range of security controls, could mitigate the numerous risks inherent in this scenario:
• User training (administrative control) could ensure that the media is not left unattended on a desk and is not inserted into a computer system without scanning it first.
• Endpoint security (technical control) on the laptop could scan the media for malware or block access automatically.
• Security locks inserted into USB ports (physical control) on the laptop could prevent attachment of media without requesting a key, allowing authorization checks to be performed first.
• Permissions restricting Alan's user account (technical control) could prevent the malware from executing successfully.
• The use of encrypted and digitally signed media (technical control) could prevent or identify an attempt to tamper with it.
• If the laptop were compromised, intrusion detection and logging/alerting systems (technical control) could detect and prevent the malware spreading on the network.
64
Vendor diversity
Vendor diversity
• Use more than one supplier

As well as deploying multiple types of controls, you should consider the advantages of leveraging vendor diversity. Vendor diversity means that security controls are sourced from multiple suppliers. A single vendor solution is a tempting choice for many organizations, as it provides interoperability and can reduce training and support costs. Some disadvantages could include the following:
• Not obtaining best-in-class performance—one vendor might provide an effective firewall solution, but the bundled malware scanning is found to be less effective.
• Less complex attack surface—a single vulnerability in a supplier's code could put multiple appliances at risk in a single vendor solution. A threat actor will be able to identify controls and possible weaknesses more easily.
• Less innovation—dependence on a single vendor might make the organization invest too much trust in that vendor's solutions and less willing to research and test new approaches.
65
Crypto diversity
This concept can be extended to the selection of algorithms and implementations of cryptography. Adoption of methods such as blockchain-based IAM (ibm.com/blogs/blockchain/2018/10/decentralized-identity-an-alternative-to-password-based-authentication) or selecting ChaCha in place of AES as a preferred cipher suite (blog.cloudflare.com/it-takes-two-to-chacha-poly) forces threat actors to develop new attack methods.
66
Deception and Disruption Strategies
Asymmetry of attack and defense

Active defense

Fake/decoy assets
• Honeypots, honeynets, and honeyfiles
• Breadcrumbs

Disruption strategies
• Bogus DNS records
• Decoy directories and resources
• Port spoofing to return fake telemetry/monitoring data
• DNS sinkholes
67
Asymmetry of attack and defense
The practice of cybersecurity is often described as asymmetric warfare; the defenders have to win every encounter and be ready all the time. The threat actors can choose when to attack and only have to win once. Some cybersecurity tactics aim to reduce that asymmetry by increasing the attack cost. This means that a threat actor has to commit more resources to even plan an attack.
68
Active defense
Active defense means an engagement with the adversary, but this can be interpreted in several different ways. One type of active defense involves the deployment of decoy assets to act as lures or bait. It is much easier to detect intrusions when an attacker interacts with a decoy resource, because you can precisely control baseline traffic and normal behavior in a way that is more difficult to do for production assets.
69
Fake/decoy assets
Fake/decoy assets
• Honeypots, honeynets, and honeyfiles
• Breadcrumbs
• Disruption strategies
70
Honeypot and honeynet
A honeypot is a computer system set up to attract threat actors, with the intention of analyzing attack strategies and tools, to provide early warnings of attack attempts, or possibly as a decoy to divert attention from actual computer systems. Another use is to detect internal fraud, snooping, and malpractice. A honeynet is an entire decoy network. This may be set up as an actual network or simulated using an emulator.
71
Honeyfile
A honeypot or honeynet can be combined with the concept of a honeyfile, which is convincingly useful, but actually fake, data. This honeyfile can be made trackable, so that when a threat actor successfully exfiltrates it, the attempts to reuse or exploit it can be traced.
72
Disruption strategies
Disruption strategies
• Bogus DNS records
• Decoy directories and resources
• Port spoofing to return fake telemetry/monitoring data
• DNS sinkholes
73
•Bogus DNS records
Using bogus DNS entries to list multiple hosts that do not exist.
74
•Decoy directories and resources
Configuring a web server with multiple decoy directories or dynamically generated pages to slow down scanning.
75
•Port spoofing to return fake telemetry/monitoring data
Using port triggering or spoofing to return fake telemetry data when a host detects port scanning activity. This will result in multiple ports being falsely reported as open and will slow down the scan. Telemetry can refer to any type of measurement or data returned by remote scanning. Similar fake telemetry could be used to report IP addresses as up when they are not, for instance.
76
•DNS sinkholes
Using a DNS sinkhole to route suspect traffic to a different network, such as a honeynet, where it can be analyzed.