Lesson 8: Scalability and High Availability Flashcards
High Availability and Scalability - ELB & ASG (9 cards)
Introduction to Scalability and High Availability
Scalability allows an application system to handle increased load by adapting, either vertically or horizontally.
Vertical scalability involves increasing the size or capacity of a single instance, such as moving a server to a larger instance type; it is bounded by hardware limits.
Horizontal scalability involves increasing the number of instances or systems, commonly used in distributed systems and modern web applications.
High availability means running the system in at least two data centers or Availability Zones so that it survives the loss of one; it is often implemented alongside horizontal scaling.
Elastic Load Balancing (ELB) Overview
Load balancers distribute incoming traffic across multiple backend EC2 instances to ensure efficient resource use and reliability.
Elastic Load Balancers provide a single endpoint for users, handle health checks, SSL termination, and support high availability.
AWS offers four types of managed load balancers: Classic Load Balancer (deprecated), Application Load Balancer, Network Load Balancer, and Gateway Load Balancer.
Security groups should be configured to allow user traffic to the load balancer and restrict EC2 instance traffic to only originate from the load balancer’s security group.
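As a rough illustration of how a load balancer spreads traffic and uses health checks, here is a toy round-robin sketch. The class name, method names, and instance IDs are hypothetical, and the code only models the idea; it is not how ELB is implemented.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates over targets and skips unhealthy ones."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)
        self._next = 0  # index of the next target to try

    def mark_unhealthy(self, target):
        # Simulates a failed health check.
        self.healthy.discard(target)

    def mark_healthy(self, target):
        # Simulates a target passing health checks again.
        if target in self.targets:
            self.healthy.add(target)

    def route(self):
        # Try each target at most once, starting where we left off.
        for _ in range(len(self.targets)):
            target = self.targets[self._next]
            self._next = (self._next + 1) % len(self.targets)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")
```

Users only ever call route(), which mirrors the single endpoint a load balancer exposes; an instance that fails its health check stops receiving traffic until it is marked healthy again.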
Application Load Balancer (ALB)
Application Load Balancers (ALBs) operate at Layer 7, supporting HTTP, HTTPS, HTTP/2, and WebSocket.
ALBs enable routing to multiple HTTP applications across machines grouped in target groups, supporting path, host, query string, and header-based routing.
Target groups can consist of EC2 instances, ECS tasks, Lambda functions, or private IP addresses, with health checks performed at the target group level.
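ALBs support advanced features such as automatic HTTP-to-HTTPS redirection, and they pass client information to targets in the X-Forwarded-For, X-Forwarded-Port, and X-Forwarded-Proto headers, since targets otherwise see only the load balancer's private IP.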
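The routing cards above can be sketched as a small rule matcher. This is a simplified model under stated assumptions: the Rule class, its field names, the request dictionary shape, and the target group names are all hypothetical, and real ALB rules support richer conditions (wildcards, multiple values) than this exact-match version.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rule:
    """One listener rule: every condition that is set must match."""
    target_group: str
    host: Optional[str] = None
    path_prefix: Optional[str] = None
    header: Optional[Tuple[str, str]] = None  # (name, value)
    query: Optional[Tuple[str, str]] = None   # (key, value)

    def matches(self, request):
        if self.host is not None and request.get("host") != self.host:
            return False
        if self.path_prefix is not None and not request.get("path", "").startswith(self.path_prefix):
            return False
        if self.header is not None:
            name, value = self.header
            if request.get("headers", {}).get(name) != value:
                return False
        if self.query is not None:
            key, value = self.query
            if request.get("query", {}).get(key) != value:
                return False
        return True


def choose_target_group(rules, request, default):
    """Return the target group of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(request):
            return rule.target_group
    return default
```

Rules are checked in list order and the first match wins, which is why a catch-all default target group is passed separately.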
Network Load Balancer (NLB)
The Network Load Balancer (NLB) operates at layer 4, handling TCP and UDP traffic.
NLB offers ultra-high performance with the ability to handle millions of requests per second and ultra-low latency.
It provides one static IP per availability zone, with the option to assign elastic IPs for static IP exposure.
NLB supports target groups consisting of EC2 instances or hardcoded private IP addresses, including those from on-premises servers.
An NLB can be placed in front of an Application Load Balancer (ALB) to combine fixed IP addresses with advanced HTTP routing rules.
Health checks for NLB target groups support TCP, HTTP, and HTTPS protocols.
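A TCP health check boils down to "can a connection be opened on this port?". The sketch below shows that idea with Python's standard socket module; the function name and the 2-second default timeout are assumptions for illustration, not NLB settings.

```python
import socket


def tcp_health_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed immediately because only reachability matters here.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

An HTTP or HTTPS health check goes one step further and also inspects the response status, which catches applications that accept connections but return errors.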
Elastic Load Balancer - Sticky Sessions
Sticky sessions, or session affinity, ensure that a client consistently connects to the same backend instance through the load balancer.
Sticky sessions can be enabled on Classic Load Balancer and Application Load Balancer using cookies, and on Network Load Balancer based on the client's source IP address (without cookies).
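There are two types of cookies for stickiness: application-based cookies, which are either a custom cookie generated by the target application or an application cookie generated by the load balancer (AWSALBAPP), and duration-based cookies generated by the load balancer (AWSALB for ALB, AWSELB for CLB).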
Enabling stickiness may cause load imbalance if some users are very sticky to specific backend instances.
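To make duration-based stickiness concrete, here is a toy model in which the balancer issues a cookie that pins a client to one target until the cookie expires. The class name, cookie format, and fixed (non-refreshing) expiry are simplifying assumptions, not the actual load balancer behavior.

```python
import time


class StickyBalancer:
    """Toy duration-based stickiness: a cookie pins a client to one target."""

    def __init__(self, targets, duration_seconds):
        self.targets = list(targets)
        self.duration = duration_seconds
        self._next = 0       # round-robin position for new sessions
        self._issued = 0     # counter used to build unique cookie values
        self._sessions = {}  # cookie -> (target, expiry timestamp)

    def route(self, cookie=None, now=None):
        """Return (target, cookie) for a request carrying an optional cookie."""
        now = time.time() if now is None else now
        session = self._sessions.get(cookie)
        if session is not None and session[1] > now:
            # Valid cookie: keep sending this client to the same target.
            return session[0], cookie
        # No cookie or expired cookie: pick a target and issue a new cookie.
        target = self.targets[self._next % len(self.targets)]
        self._next += 1
        self._issued += 1
        new_cookie = f"session-{self._issued}"
        self._sessions[new_cookie] = (target, now + self.duration)
        return target, new_cookie
```

Because returning clients bypass the rotation, a few long-lived, heavy clients can concentrate load on one target, which is the imbalance the last card warns about.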
Elastic Load Balancer - SSL Certificates
SSL certificates encrypt traffic between clients and load balancers, ensuring secure in-flight data transmission.
TLS is the modern version of SSL, but the term SSL is still commonly used for simplicity.
Public SSL certificates are issued by Certificate Authorities and must be regularly renewed.
Server Name Indication (SNI) allows multiple SSL certificates on one load balancer, enabling hosting multiple secure websites.
Application Load Balancers (ALB) and Network Load Balancers (NLB) support multiple SSL certificates with SNI, while Classic Load Balancers support only one.
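With SNI, the client states the hostname it wants during the TLS handshake, and the server picks a certificate to match. The lookup can be sketched as follows; the function name, the one-level wildcard handling, and the certificate labels are assumptions for illustration and not the load balancer's exact selection logic.

```python
def select_certificate(server_name, certificates, default):
    """Pick the certificate for the hostname the client sent via SNI.

    certificates maps hostnames (or one-level wildcards such as
    "*.example.org") to certificate identifiers.
    """
    if server_name is None:
        # Client did not send SNI: fall back to the default certificate.
        return default
    name = server_name.lower()
    if name in certificates:
        return certificates[name]
    # Try a wildcard covering the parent domain, e.g. "*.example.org".
    _, _, parent = name.partition(".")
    if parent:
        return certificates.get("*." + parent, default)
    return default
```

A Classic Load Balancer corresponds to the degenerate case where only the default certificate exists, which is why it cannot serve several secure sites from one endpoint.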
Elastic Load Balancer - Connection Draining
Connection Draining allows instances to complete active requests before deregistration.
Classic Load Balancer uses the term Connection Draining; Application and Network Load Balancers use Deregistration Delay.
The draining period can be configured between 1 and 3600 seconds, with a default of 300 seconds; setting it to 0 disables draining.
Setting a low draining timeout suits short requests, while longer requests require a higher timeout to avoid premature termination.
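The trade-off in choosing a draining timeout can be seen with a small calculation. In this sketch each in-flight request is represented only by the seconds it still needs; the function name and the sample durations are hypothetical.

```python
def drain(in_flight_seconds, deregistration_delay):
    """Split in-flight requests by whether they finish before the timeout.

    in_flight_seconds: remaining time (seconds) of each active request
    deregistration_delay: draining timeout in seconds
    Returns (completed, interrupted).
    """
    completed = [s for s in in_flight_seconds if s <= deregistration_delay]
    interrupted = [s for s in in_flight_seconds if s > deregistration_delay]
    return completed, interrupted
```

With remaining times of 2, 20, and 400 seconds, a 30-second timeout cuts off the 400-second request, while a 3600-second timeout lets all three finish at the cost of keeping the instance around longer.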
Auto Scaling Groups (ASG) Overview
Auto Scaling Groups (ASGs) dynamically adjust the number of EC2 instances to match the load.
ASGs can scale out (add instances) or scale in (remove instances) based on demand.
Minimum, desired, and maximum capacities define the size boundaries of an ASG.
ASGs integrate with load balancers and CloudWatch alarms to ensure healthy instances and automatic scaling.
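The relationship between minimum, desired, and maximum capacity can be sketched in a few lines. Both function names are hypothetical, and the code only models the bookkeeping: scaling never moves the group outside its bounds, and the group launches or terminates instances to reach the desired count.

```python
def clamp_capacity(requested, minimum, maximum):
    """Keep a requested capacity within the group's size boundaries."""
    if minimum > maximum:
        raise ValueError("minimum cannot exceed maximum")
    return max(minimum, min(requested, maximum))


def reconcile(healthy_running, desired):
    """Instances to launch (positive) or terminate (negative) to reach desired."""
    return desired - healthy_running
```

If an instance fails its health check, the healthy count drops below the desired capacity and reconcile() returns a positive number, which corresponds to the group launching a replacement.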
Auto Scaling Groups - Scaling Policies
Auto Scaling Groups (ASGs) support several scaling policies including dynamic, scheduled, and predictive scaling.
Target tracking scaling maintains a specified metric, such as CPU utilization, at a target value by automatically scaling the ASG.
Step scaling uses CloudWatch alarms to trigger scaling actions based on defined thresholds.
Scheduled and predictive scaling anticipate load changes based on known patterns or forecasts.
Important metrics for scaling include CPU utilization, RequestCountPerTarget, network traffic, and custom application-specific metrics.
Scaling cooldown periods prevent rapid scaling actions to allow metrics to stabilize after scaling events.
Using pre-configured AMIs and enabling detailed monitoring can improve scaling responsiveness and efficiency.
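The policies above can be approximated with simple arithmetic. The sketch below uses a proportional rule for target tracking, a threshold table for step scaling, and a time comparison for cooldown. All function names are hypothetical, the proportional formula is an illustrative assumption rather than the algorithm AWS uses internally, and the 300-second cooldown is likewise just an assumed default for the example.

```python
import math


def target_tracking_capacity(current, metric_value, target_value, minimum, maximum):
    """Propose a capacity that would bring the average metric near the target.

    Assumes the metric scales inversely with instance count
    (for example, average CPU utilization).
    """
    if current <= 0 or target_value <= 0:
        raise ValueError("current and target_value must be positive")
    proposed = math.ceil(current * metric_value / target_value)
    return max(minimum, min(proposed, maximum))


def step_adjustment(metric_value, steps):
    """Return the capacity change for the highest threshold reached.

    steps: list of (lower_bound, adjustment) sorted by ascending lower_bound.
    """
    adjustment = 0
    for lower_bound, change in steps:
        if metric_value >= lower_bound:
            adjustment = change
    return adjustment


def in_cooldown(last_activity, now, cooldown=300):
    """True while a new scaling action should still be suppressed."""
    return now - last_activity < cooldown
```

For example, four instances averaging 80% CPU against a 40% target yield a proposed capacity of eight, and a step table of [(60, 1), (80, 3)] adds one instance at 65% CPU but three at 85%.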