Lecture Two - System Availability Flashcards
Information Technology (IT):
Encompasses technologies related to storage, retrieval, manipulation, and communication of information.
Includes computers, networks, phones, and fax machines.
Infrastructure Definition:
The underlying framework or features of a system or organization.
Fundamental facilities and systems serving a country, city, or area, such as transportation and communication systems.
Benefits of IT Infrastructure
Commonly Accepted Benefits:
Automates manual activities.
Handles increased volumes of data efficiently.
Extends the range of tasks that can be performed.
Enhances customer service quality.
Increases the quality of finished products.
Improves information sharing and manipulation capabilities.
Components of IT Infrastructure - Elements of IT Infrastructure
Business Process: Operations that support business goals.
Information and Data: Key resources for decision-making.
Applications and Servers: Software and hardware systems.
Buildings and Electricity Providers: Physical and power resources.
Hardware and Software: Essential computing equipment and programs.
Data & Storage: Systems for data management and retention.
Network Services: Connectivity and communication services.
IT System Model - System Layers
Process/Information: Core business processes and data handling.
Applications: Software tools and systems.
Application Integration: Ensures seamless operation and data flow.
Infrastructure: Physical and virtual resources.
IT System Model - Considerations
Availability: System uptime and reliability.
Performance: Efficiency and speed of operations.
Security: Protection against threats and vulnerabilities.
End User Devices: Interfaces for user interaction.
Operating Systems, Servers, Networks, Virtualisation, Data Centres: Core components for IT operation.
System Availability - Formula
Availability % = (Uptime / Measured Time Period) × 100
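As a minimal sketch of the availability calculation, the Python snippet below treats availability as uptime taken as a share of the measured period. The function name, the use of hours as the unit, and the example figures are illustrative assumptions, not part of the lecture.

```python
def availability_percent(uptime_hours: float, period_hours: float) -> float:
    """Return availability as a percentage of the measured time period."""
    if period_hours <= 0:
        raise ValueError("measured period must be positive")
    if not 0 <= uptime_hours <= period_hours:
        raise ValueError("uptime must lie between 0 and the measured period")
    return uptime_hours / period_hours * 100


# Hypothetical example: a 30-day month (720 hours) with 43.2 minutes of downtime.
period = 30 * 24
uptime = period - 43.2 / 60
print(f"{availability_percent(uptime, period):.2f}%")
```

With these assumed figures the result is roughly 99.9%, the level commonly quoted in SLAs.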
System Availability and SLAs - Common Availability Levels
99.0%, 99.9%, 99.95% typically specified in SLAs.
99.999% known as carrier-grade availability.
System Availability and SLAs - Downtime Estimates
99.8%: 17.5 hours/year, 86.4 minutes/month, 20.2 minutes/week.
99.9%: 8.8 hours/year, 43.2 minutes/month, 10.1 minutes/week.
99.99%: 52.6 minutes/year, 4.3 minutes/month, 1.0 minute/week.
99.999%: 5.3 minutes/year, 25.9 seconds/month, 6.0 seconds/week.
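Downtime estimates like these follow directly from the availability percentage. The sketch below shows one way to derive them, assuming a 365-day year and a 30-day month. The constant and function names are illustrative, and results may differ slightly from a published table depending on the month length and rounding used.

```python
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes
MINUTES_PER_MONTH = 30 * 24 * 60   # assumption: a 30-day month
MINUTES_PER_WEEK = 7 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> dict:
    """Return the permitted downtime in minutes per year, month, and week."""
    unavailable_fraction = 1 - availability_percent / 100
    return {
        "year": unavailable_fraction * MINUTES_PER_YEAR,
        "month": unavailable_fraction * MINUTES_PER_MONTH,
        "week": unavailable_fraction * MINUTES_PER_WEEK,
    }


for level in (99.8, 99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(level)
    print(
        f"{level}%: {budget['year']:.1f} min/year, "
        f"{budget['month']:.2f} min/month, {budget['week']:.2f} min/week"
    )
```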
Unavailability Intervals - Definition
Used in conjunction with availability percentage to define acceptable downtime.
Example for 99.9% Availability:
The roughly 525 minutes (about 8.8 hours) of permitted downtime per year should not occur as a single event.
Downtime can be spread across many short events.
Unavailability Intervals - Interval Specifications
0 - 5 minutes: ≤ 35 times/year
5 - 10 minutes: ≤ 10 times/year
10 - 20 minutes: ≤ 5 times/year
20 - 30 minutes: ≤ 2 times/year
> 30 minutes: ≤ 1 time/year
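The interval specifications above can be checked against a year's outage log. The sketch below is one possible way to do so. The function name and the sample log are hypothetical, and it assumes an outage lasting exactly on a boundary (for example 5 minutes) belongs to the lower interval, which the specification does not state.

```python
# Limits from the interval specifications: (lower, upper, max events per year).
# Bounds are in minutes. Assumption: an outage of exactly 5 minutes counts
# in the 0-5 interval, 10 minutes in the 5-10 interval, and so on.
INTERVAL_LIMITS = [
    (0, 5, 35),
    (5, 10, 10),
    (10, 20, 5),
    (20, 30, 2),
    (30, float("inf"), 1),
]


def find_violations(outage_minutes):
    """Return (lower, upper, count, limit) for each interval whose limit is exceeded."""
    violations = []
    for lower, upper, limit in INTERVAL_LIMITS:
        count = sum(1 for d in outage_minutes if lower < d <= upper)
        if count > limit:
            violations.append((lower, upper, count, limit))
    return violations


# Hypothetical outage log for one year, durations in minutes.
outages = [2, 3, 4, 7, 12, 45, 50]
for lower, upper, count, limit in find_violations(outages):
    print(f"{lower}-{upper} min: {count} outages, limit is {limit}")
```

In this sample log only the two outages longer than 30 minutes break a limit, even though the total downtime (123 minutes) is well inside the yearly budget for 99.9%.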
Estimating System Availability - SLAs
Provide upfront availability guarantees; actual availability is computed afterward.
Estimating System Availability - Estimation Factors
Mean Time to Repair (MTTR): Average time to repair/recover failed components.
Mean Time Between Failures (MTBF): Average time between failures.
Estimating System Availability - Timeline
Failure: Time when a system component fails.
Recovery: Time taken to repair the system.
MTTR and MTBF: Key metrics for assessing system reliability.
Estimating Availability with MTBF and MTTR
Estimated Availability % = (MTBF / (MTBF + MTTR)) × 100
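A short worked example may help here. The sketch below applies the MTBF/MTTR estimate in Python. The function name and the sample figures (a failure every 1,000 hours on average, 2 hours to repair) are assumptions chosen for illustration.

```python
def estimated_availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Estimate availability from mean time between failures and mean time to repair."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    if mttr_hours < 0:
        raise ValueError("MTTR cannot be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


# Hypothetical component: fails every 1,000 hours on average, takes 2 hours to repair.
print(f"{estimated_availability_percent(1000, 2):.3f}%")  # prints 99.800%
```

Halving the repair time to 1 hour raises the estimate to about 99.9%, which shows that shortening MTTR improves availability just as lengthening MTBF does.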
Observed Availability and Failures
Failure Probability: Changes over time, typically following a bathtub curve.
Failure Phases:
Early Failures: Initial phase with higher failure rates.
Random Failures: Stable phase with constant failure rate.
Wear-Out Failures: Increased failures as components age.
Observed Availability: Influenced by component reliability and failure rates.
Multi-Component Availability
Systems comprise multiple components, each with its own availability.
For components in series (all must work for the system to work), A(system) = A1 × A2 × A3 × …, where A1, A2, A3, … are the availabilities of the individual components.
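The product rule for components that must all work can be sketched in a few lines of Python. The function name and the three component availabilities below are hypothetical, and the calculation assumes component failures are independent.

```python
from math import prod


def series_availability(component_availabilities):
    """Multiply the availabilities (as fractions 0..1) of components that must all work."""
    for a in component_availabilities:
        if not 0 <= a <= 1:
            raise ValueError("each availability must be between 0 and 1")
    return prod(component_availabilities)


# Hypothetical system: server 99.9%, network 99.5%, storage 99.0%.
system = series_availability([0.999, 0.995, 0.99])
print(f"{system * 100:.2f}%")  # prints 98.41%
```

The combined figure is lower than that of the weakest component, since every component adds its own chance of failure.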
System Availability and Components - Graphical Representation
System availability decreases as the number of components increases.
Visualizes availability for different component reliability (99%, 95%, 90%).
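The trend described here can be reproduced numerically. The sketch below tabulates system availability for n identical components that must all work, at the three reliability levels mentioned. The function name and the chosen values of n are illustrative assumptions.

```python
def identical_series_availability(component_availability: float, n: int) -> float:
    """Availability of n identical components that must all work (independent failures)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return component_availability ** n


for n in (1, 2, 5, 10):
    row = ", ".join(
        f"{identical_series_availability(a, n) * 100:5.1f}%"
        for a in (0.99, 0.95, 0.90)
    )
    print(f"n={n:2d}: {row}")
```

With ten components at 90% each, the system is available only about a third of the time, while ten components at 99% still give roughly 90%.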
System Availability with Multiple Components - Insight
Increasing the number of components increases the likelihood of system failures.
Redundancy in IT Systems - Purpose
Improves system availability and robustness by duplicating components/functions.
Acts as a backup to mitigate failures.
Redundancy in IT Systems - Cost Implications
Pros: Enhances reliability and reduces downtime.
Cons: Increases overall system cost.
Parallel System Availability
Availability improves as the number of systems/components in parallel increases.
Formula -
A(parallel) = 1 - (1 - A)^m
A is the availability of a single system/component, m is the number in parallel.
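As a quick illustration of the parallel formula, the sketch below computes the availability of m redundant copies of one component. The function name and the 90% sample value are assumptions, and the formula itself assumes the copies fail independently of one another.

```python
def parallel_availability(a: float, m: int) -> float:
    """Availability of m identical components in parallel (any one is enough)."""
    if not 0 <= a <= 1:
        raise ValueError("availability must be between 0 and 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    # The system is down only when all m copies are down at the same time.
    return 1 - (1 - a) ** m


# Hypothetical component with 90% availability, duplicated up to four times.
for m in (1, 2, 3, 4):
    print(f"m={m}: {parallel_availability(0.90, m) * 100:.2f}%")
```

Each extra copy adds another "nine" in this example (90%, 99%, 99.9%, 99.99%), which is the gain that has to be weighed against the added cost of redundancy.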
Business Continuity
Disaster Events: Potential incidents like fires, natural disasters, or social unrest.
Preparedness: Businesses must prepare for contingencies to ensure continuity.
Disaster Recovery Plan (DRP):
Outlines procedures to protect and recover IT infrastructure.
Ensures minimal disruption and swift recovery from incidents.
Business Continuity Concepts
Downtime and Data Loss Metrics:
Recovery Time Objective (RTO):
Target time within which a business process must be restored after a disruption.
Indicates the maximum allowable downtime.
Recovery Point Objective (RPO):
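Maximum acceptable age of the data recovered, i.e. how much recent data the business can afford to lose.
Commonly set at 24 hours (for example with daily backups), which bounds the data lost between the last backup and the incident.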
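To make the two metrics concrete, the sketch below compares one incident against an RTO and an RPO. The function name, the timestamps, and the 4-hour RTO are hypothetical. Only the 24-hour RPO comes from the notes above.

```python
from datetime import datetime, timedelta


def check_recovery(last_backup, incident, restored, rpo, rto):
    """Compare an incident's data-loss window and downtime with the RPO and RTO."""
    data_loss_window = incident - last_backup  # data created in this window is lost
    downtime = restored - incident             # time the process was unavailable
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: backup at 02:00, failure at 15:30, service restored at 21:00.
result = check_recovery(
    last_backup=datetime(2024, 1, 10, 2, 0),
    incident=datetime(2024, 1, 10, 15, 30),
    restored=datetime(2024, 1, 10, 21, 0),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=4),  # assumed value for illustration
)
print(result["rpo_met"], result["rto_met"])  # prints True False
```

Here 13.5 hours of data are lost, which is within the 24-hour RPO, but the 5.5 hours of downtime exceed the assumed 4-hour RTO.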