Amazon EC2 for SysOps Flashcards

(47 cards)

1
Q

EC2 Changing Instance Type

A
  • This only works for EBS backed instances
  • Stop the instance
  • Instance Settings => Change Instance Type
  • Start Instance
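
A minimal sketch of the sequence above. The `resize_plan` helper is hypothetical: it only returns the ordered steps, named after the EC2 API actions StopInstances, ModifyInstanceAttribute, and StartInstances, and does not call AWS.

```python
from typing import List


def resize_plan(root_device_type: str, state: str, new_type: str) -> List[str]:
    """Return the ordered steps needed to change an instance type."""
    # Only EBS-backed instances can be stopped and resized in place.
    if root_device_type != "ebs":
        raise ValueError("changing the instance type requires an EBS-backed instance")
    steps = []
    # The instance must be stopped before its type can be modified.
    if state != "stopped":
        steps.append("StopInstances")
    steps.append(f"ModifyInstanceAttribute(InstanceType={new_type})")
    steps.append("StartInstances")
    return steps


plan = resize_plan("ebs", "running", "m5.large")
```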
2
Q

EC2 Enhanced Networking (SR-IOV)

A
  • Higher bandwidth, higher PPS (packets per second), lower latency
  • Option 1: Elastic Network Adapter (ENA) up to 100 Gbps
  • Option 2: Intel 82599 VF up to 10 Gbps – LEGACY
  • Works for newer generation EC2 Instances
3
Q

Enhanced Networking - Elastic Fabric Adapter (EFA)

A
  • Improved ENA for HPC, only works for Linux
  • Great for inter-node communications, tightly coupled workloads
  • Leverages Message Passing Interface (MPI) standard
  • Bypasses the underlying Linux OS to provide low-latency, reliable transport
4
Q

Placement Groups

A
  • Sometimes you want control over the EC2 Instance placement strategy
  • That strategy can be defined using placement groups
  • When you create a placement group, you specify one of the following
    strategies for the group:
  • Cluster—clusters instances into a low-latency group in a single Availability Zone
  • Spread—spreads instances across underlying hardware (max 7 instances per group per AZ) – critical applications
  • Partition—spreads instances across many different partitions (which rely on different sets of racks) within an AZ. Scales to 100s of EC2 instances per group
    (Hadoop, Cassandra, Kafka)
5
Q

Placement Groups Cluster

A

  • Pros: Great network (10 Gbps bandwidth between instances with Enhanced Networking enabled - recommended)
  • Cons: If the AZ fails, all instances fail at the same time
  • Use cases:
  • Big Data job that needs to complete fast
  • Application that needs extremely low latency and high network throughput

6
Q

Placement Groups Spread

A
  • Pros:
  • Can span across Availability Zones (AZ)
  • Reduced risk of simultaneous failure
  • EC2 instances are on different physical hardware
  • Cons:
  • Limited to 7 instances per AZ per placement group
  • Use cases:
  • Application that needs to maximize high availability
  • Critical applications where each instance must be isolated from the failures of the others

7
Q

Placement Groups Partition

A
  • Up to 7 partitions per AZ
  • Can span across multiple AZs in the same region
  • Up to 100s of EC2 instances
  • The instances in a partition do not share racks with the instances in the other partitions
  • A partition failure can affect many EC2 instances but won’t affect the other partitions
  • EC2 instances get access to the partition information as metadata.
  • Use cases: HDFS, HBase, Cassandra, Kafka
8
Q

Shutdown Behavior

A
  • Shutdown Behavior: How should the instance
    react when shutdown is done using the OS?
  • Stop (default)
  • Terminate
  • This is not applicable when shutting down from AWS console.
  • CLI Attribute:
    InstanceInitiatedShutdownBehavior
9
Q

Termination Protection

A
  • Enable termination protection:
    To protect against accidental termination in AWS Console or CLI
  • Exam Tip:
  • We have an instance where shutdown behavior = terminate and termination protection is enabled
  • We shut down the instance from the OS; what will happen?
  • The instance will still be terminated!
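
The exam tip can be written out as a small decision function. This is only a sketch of the behavior the cards describe: the function name and the `"os"` / `"api_terminate"` labels are hypothetical, not AWS identifiers.

```python
def shutdown_outcome(initiated_from: str,
                     shutdown_behavior: str = "stop",
                     termination_protection: bool = False) -> str:
    """Model what happens to an instance for a given shutdown source.

    initiated_from: "os" (shutdown run inside the instance) or
    "api_terminate" (terminate requested from the console or CLI).
    """
    if initiated_from == "os":
        # Termination protection is not consulted for OS-initiated shutdowns;
        # only the shutdown behavior attribute matters.
        return "terminated" if shutdown_behavior == "terminate" else "stopped"
    if initiated_from == "api_terminate":
        # Termination protection guards console/CLI termination requests.
        return "blocked" if termination_protection else "terminated"
    raise ValueError(f"unknown shutdown source: {initiated_from}")


exam_case = shutdown_outcome("os", shutdown_behavior="terminate",
                             termination_protection=True)
```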
10
Q

EC2 Launch Troubleshooting - InstanceLimitExceeded

A
  • InstanceLimitExceeded: if you get this error, it means that you have reached your limit on the maximum number of vCPUs per region
  • On-Demand instance limits are set on a per-region basis
  • Resolution: Either launch the instance in a different region or ask AWS to increase your limit for that region
  • NOTE: vCPU-based limits only apply to running On-Demand instances and Spot instances
11
Q

EC2 Launch Troubleshooting - InsufficientInstanceCapacity

A
  • InsufficientInstanceCapacity: if you get this error, it means AWS does not have enough On-Demand capacity in the particular AZ where the instance is launched.
  • Resolution:
  • Wait a few minutes before requesting again.
  • If you are requesting more than one instance, break the request down. If you need 5 instances, request them one by one rather than in a single request of 5.
  • If urgent, submit a request for a different instance type now, which can be resized later.
  • You can also request the EC2 instance in a different AZ
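
The "wait, then retry one by one" advice might look like the sketch below. Everything here is hypothetical scaffolding: the `launch` callable and the `InsufficientInstanceCapacity` exception class stand in for a real launch call and the API error of the same name.

```python
import time
from typing import Callable, List


class InsufficientInstanceCapacity(Exception):
    """Stand-in for the EC2 error of the same name."""


def launch_one_by_one(launch: Callable[[], str], count: int,
                      retries: int = 3, wait_seconds: float = 0.0) -> List[str]:
    """Request instances individually, retrying each on capacity errors."""
    launched = []
    for _ in range(count):
        for _attempt in range(retries):
            try:
                launched.append(launch())
                break  # this instance is launched, move on to the next one
            except InsufficientInstanceCapacity:
                time.sleep(wait_seconds)  # wait before requesting again
        else:
            # Every retry failed: stop and return what was launched so far.
            break
    return launched
```

In practice `wait_seconds` would be minutes rather than zero; the default is kept small so the sketch runs instantly.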
12
Q

EC2 Launch Troubleshooting - Instance Terminates Immediately

A
  • Instance terminates immediately (goes from pending to terminated). Possible causes:
  • You’ve reached your EBS volume limit.
  • An EBS snapshot is corrupt.
  • The root EBS volume is encrypted and you do not have permissions to access the KMS key for decryption.
  • The instance store-backed AMI that you used to launch the instance is missing a required part (an image.part.xx file).
  • To find the exact reason, open the EC2 console, go to Instances, select the Description tab, and note the reason next to the State transition reason label.
13
Q

EC2 SSH troubleshooting - Logging in Errors

A
  • Make sure the private key (pem file) on your Linux machine has 400 permissions, else you will get an “Unprotected private key file” error
  • Make sure the username for the OS is given correctly when logging in via SSH, else you will get a “Host Key Not Found”, “Permission Denied”, or “Connection Closed by [instance] port 22” error.
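
A small sketch for the key-permission check, using only the standard library on a Linux or macOS machine. The helper names are hypothetical; the check assumes that what matters is that group and other users have no access to the key, which mode 400 satisfies.

```python
import os
import stat


def key_is_protected(path: str) -> bool:
    """Return True when group and other users have no access to the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def protect_key(path: str) -> None:
    """Equivalent of `chmod 400 <path>`: owner read-only, nobody else."""
    os.chmod(path, 0o400)
```

Calling `protect_key` on the pem file before running `ssh -i` should avoid the “Unprotected private key file” error.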
14
Q

EC2 SSH troubleshooting - Connection timed out

A
  • Possible reasons for “Connection timed out” to EC2 instance via SSH:
  • SG is not configured correctly
  • NACL is not configured correctly
  • Check the route table for the subnet (routes traffic destined outside VPC to IGW)
  • Instance doesn’t have a public IPv4
  • CPU load of the instance is high
15
Q

SSH vs. EC2 Instance Connect

A
16
Q

EC2 Instances Purchasing Options

A
  • On-Demand Instances – short workload, predictable pricing, pay by second
  • Reserved (1 & 3 years)
  • Reserved Instances – long workloads
  • Convertible Reserved Instances – long workloads with flexible instances
  • Savings Plans (1 & 3 years) – commitment to an amount of usage, long workload
  • Spot Instances – short workloads, cheap, can lose instances (less reliable)
  • Dedicated Hosts – book an entire physical server, control instance placement
  • Dedicated Instances – no other customers will share your hardware
  • Capacity Reservations – reserve capacity in a specific AZ for any duration
17
Q

EC2 On Demand

A
  • Pay for what you use:
  • Linux or Windows - billing per second, after the first minute
  • All other operating systems - billing per hour
  • Has the highest cost but no upfront payment
  • No long-term commitment
  • Recommended for short-term and uninterrupted workloads, where you can’t predict how the application will behave
18
Q

EC2 Savings Plans

A
  • Get a discount based on long-term usage (up to 72% - same as RIs)
  • Commit to a certain type of usage ($10/hour for 1 or 3 years)
  • Usage beyond EC2 Savings Plans is billed at the On-Demand price
  • Locked to a specific instance family & AWS region (e.g., M5 in us-east-1)
  • Flexible across:
  • Instance Size (e.g., m5.xlarge, m5.2xlarge)
  • OS (e.g., Linux, Windows)
  • Tenancy (Host, Dedicated, Default)
19
Q

EC2 Spot Instances

A
  • Can get a discount of up to 90% compared to On-demand
  • Instances that you can “lose” at any point of time if your max price is less than the current spot price
  • The MOST cost-efficient instances in AWS
  • Useful for workloads that are resilient to failure -
  • Batch jobs
  • Data analysis
  • Image processing
  • Any distributed workloads
  • Workloads with a flexible start and end time
  • Not suitable for critical jobs or databases
20
Q

EC2 Dedicated Hosts

A
  • A physical server with EC2 instance capacity fully dedicated to your use
  • Allows you to address compliance requirements and use your existing server-bound
    software licenses (per-socket, per-core, per-VM software licenses)
  • Purchasing Options:
  • On-demand – pay per second for active Dedicated Host
  • Reserved - 1 or 3 years (No Upfront, Partial Upfront, All Upfront)
  • The most expensive option
  • Useful for software that has a complicated licensing model (BYOL – Bring Your
    Own License)
  • Or for companies that have strong regulatory or compliance needs
21
Q

EC2 Dedicated Instances

A
  • Instances run on hardware that’s dedicated to you
  • May share hardware with other instances in same account
  • No control over instance placement (can move hardware after Stop / Start)
22
Q

EC2 Capacity Reservations

A
  • Reserve On-Demand instances capacity in a specific AZ for any
    duration
  • You always have access to EC2 capacity when you need it
  • No time commitment (create/cancel anytime), no billing discounts
  • Combine with Regional Reserved Instances and Savings Plans to benefit
    from billing discounts
  • You’re charged at On-Demand rate whether you run instances or not
  • Suitable for short-term, uninterrupted workloads that need to be in a specific AZ
23
Q

Price Comparison
Example – m4.large – us-east-1

24
Q

AWS charges for IPv4 addresses

A
  • Starting February 1st 2024, there’s a charge for all Public IPv4 created in your account
  • $0.005 per hour of Public IPv4 (~ $3.6 per month)
  • For new accounts in AWS, you have a free tier for the EC2 service: 750 hours of Public IPv4 per month for the first 12 months
  • For all other services there is no free tier
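
The monthly figure follows from the hourly rate: 0.005 × 720 hours ≈ $3.6. A short sketch of that arithmetic, assuming a 30-day (720-hour) month; the function name and the `free_hours` parameter are illustrative.

```python
HOURLY_RATE_USD = 0.005  # per public IPv4 address, from the card above


def monthly_ipv4_cost(addresses: int, hours: float = 720,
                      free_hours: float = 0) -> float:
    """Estimate the monthly charge for public IPv4 addresses.

    free_hours models the EC2 free tier (750 hours per month for new accounts).
    """
    billable_hours = max(addresses * hours - free_hours, 0)
    return round(billable_hours * HOURLY_RATE_USD, 2)


one_address = monthly_ipv4_cost(1)
two_addresses_free_tier = monthly_ipv4_cost(2, free_hours=750)
```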
25
AWS charges for IPv6 addresses
  • What about IPv6?
  • Unfortunately, many Internet Service Providers (ISPs) around the world don’t support IPv6, so the course would not work for some of you
  • You can test IPv6 by going to https://test-ipv6.com/
  • If you use IPv6 in this course, you’re on your own (security groups, networking…) but you can do it!
  • How to troubleshoot charges?
  • Go into your AWS Bill
  • Look into the AWS Public IP Insights service
  • Nice article here: https://repost.aws/articles/ARknH_OR0cTvqoTfJrVGaB8A/why-am-i-seeing-charges-for-public-ipv4-addresses-when-i-am-under-the-aws-free-tier
26
EC2 Spot Instance Requests
  • Can get a discount of up to 90% compared to On-Demand
  • Define a max spot price and get the instance while current spot price < max
  • The hourly spot price varies based on offer and capacity
  • If the current spot price > your max price, you can choose to stop or terminate your instance with a 2-minute grace period
  • Other strategy: Spot Block (AWS no longer offers Spot Blocks to new customers)
  • “Block” a spot instance during a specified time frame (1 to 6 hours) without interruptions
  • In rare situations, the instance may be reclaimed
  • Used for batch jobs, data analysis, or workloads that are resilient to failures
  • Not great for critical jobs or databases
27
How to terminate Spot Instances?
  • See attachment
  • Note: You can only cancel Spot Instance requests that are open, active, or disabled
  • Cancelling a Spot Request does not terminate instances
  • You must first cancel a Spot Request, and then terminate the associated Spot Instances
28
Spot Fleets
  • Spot Fleets = set of Spot Instances + (optional) On-Demand Instances
  • The Spot Fleet will try to meet the target capacity with price constraints
  • Define possible launch pools: instance type (m5.large), OS, Availability Zone
  • Can have multiple launch pools, so that the fleet can choose
  • Spot Fleet stops launching instances when reaching capacity or max cost
  • Strategies to allocate Spot Instances:
  • lowestPrice: from the pool with the lowest price (cost optimization, short workload)
  • diversified: distributed across all pools (great for availability, long workloads)
  • capacityOptimized: pool with the optimal capacity for the number of instances
  • priceCapacityOptimized (recommended): pools with highest capacity available, then select the pool with the lowest price (best choice for most workloads)
  • Spot Fleets allow us to automatically request Spot Instances with the lowest price
29
Burstable Instances (T2/T3)
  • AWS has the concept of burstable instances (T2/T3 machines)
  • Burst means that overall, the instance has OK CPU performance
  • When the machine needs to process something unexpected (a spike in load for example), it can burst, and CPU can be VERY good
  • If the machine bursts, it utilizes “burst credits”
  • If all the credits are gone, the CPU becomes BAD
  • If the machine stops bursting, credits are accumulated over time
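
A rough simulation of the credit mechanism, as a sketch rather than AWS's exact accounting. It assumes one CPU credit equals one vCPU at 100% for one minute, that credits are earned at the rate needed to sustain the baseline, and it ignores the maximum credit balance. The function name, the hourly time step, and the example numbers are all illustrative.

```python
from typing import List, Tuple


def simulate_credits(load_percent_per_hour: List[float], start_balance: float,
                     baseline_percent: float, vcpus: int = 1) -> List[Tuple[float, float]]:
    """Return (credit balance, delivered CPU %) at the end of each hour."""
    # Credits earned per hour are exactly what running at baseline would spend.
    earn_per_hour = baseline_percent / 100 * 60 * vcpus
    balance = start_balance
    history = []
    for wanted in load_percent_per_hour:
        available = balance + earn_per_hour
        spend = wanted / 100 * 60 * vcpus
        if spend <= available:
            # Enough credits: the instance delivers the requested CPU.
            delivered, balance = wanted, available - spend
        else:
            # Credits exhausted: CPU is limited to what the credits allow.
            delivered, balance = available / (60 * vcpus) * 100, 0.0
        history.append((round(balance, 2), round(delivered, 2)))
    return history


# Hypothetical instance: 10% baseline, 30 credits banked, two busy hours then idle.
history = simulate_credits([100, 100, 5], start_balance=30, baseline_percent=10)
```

In this model, once the balance reaches zero the delivered CPU falls back to the baseline, and the balance starts growing again as soon as the load drops below it.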
30
Burstable Instances (T2/T3) cont'
  • Burstable instances can be great for handling unexpected traffic, with the assurance that it will be handled correctly
  • If your instance consistently runs low on credits, you need to move to a different kind of non-burstable instance
31
CPU Credits
32
What happens when credits are exhausted?
  • Experiment: run a CPU stress command (to peak at 100%)
  • After the credits are exhausted, the measured CPU utilization drops
33
T2/T3 Unlimited
  • It is possible to have an “unlimited burst credit balance”
  • You pay extra money if you go over your credit balance, but you don’t lose in performance
  • If average CPU usage over a 24-hour period exceeds the baseline, the instance is billed for additional usage per vCPU/hour
  • Be careful, costs could go high if you’re not monitoring the CPU health of your instances
34
Elastic IPs
  • When you stop and then start an EC2 instance, its public IP can change
  • If you need to have a fixed public IP, you need an Elastic IP
  • An Elastic IP is a public IPv4 address you own as long as you don’t delete it
  • You can attach it to one instance at a time
  • You can remap it across instances
  • Since February 1st 2024, every public IPv4 address is charged, so you pay for an Elastic IP whether or not it is attached to a server (before that date, an attached Elastic IP was free)
  • With an Elastic IP address, you can mask the failure of an instance or software by rapidly remapping the address to another instance in your account
  • You can only have 5 Elastic IPs per region in your account (you can ask AWS to increase that)
35
How you can avoid using Elastic IP?
  • Always think if other alternatives are available to you
  • You could use a random public IP and register a DNS name to it
  • Or use a Load Balancer with a static hostname
36
CloudWatch Metrics for EC2
  • AWS-provided metrics (AWS pushes them):
  • Basic Monitoring (default): metrics are collected at a 5-minute interval
  • Detailed Monitoring (paid): metrics are collected at a 1-minute interval
  • Includes CPU, Network, Disk and Status Check metrics
  • Custom metrics (yours to push):
  • Basic Resolution: 1-minute resolution
  • High Resolution: all the way to 1-second resolution
  • Include RAM, application-level metrics
  • Make sure the IAM permissions on the EC2 instance role are correct!
37
EC2 included metrics
  • CPU: CPU Utilization + Credit Usage / Balance
  • Network: Network In / Out
  • Status Check:
  • Instance status = check the EC2 VM
  • System status = check the underlying hardware
  • Attached EBS status = check attached EBS volumes
  • Disk: Read / Write for Ops / Bytes (only for instance store)
  • Note: RAM is NOT included in the AWS EC2 metrics
38
Unified CloudWatch Agent
  • For virtual servers (EC2 instances, on-premises servers, …)
  • Collect additional system-level metrics such as RAM, processes, used disk space, etc.
  • Collect logs to send to CloudWatch Logs
  • No logs from inside your EC2 instance will be sent to CloudWatch Logs without using an agent
  • Centralized configuration using SSM Parameter Store
  • Make sure IAM permissions are correct
  • Default namespace for metrics collected by the Unified CloudWatch Agent is CWAgent (can be configured and changed)
39
Unified CloudWatch Agent – procstat Plugin
  • Collect metrics and monitor system utilization of individual processes
  • Supports both Linux and Windows servers
  • Example: amount of time the process uses CPU, amount of memory the process uses, …
  • Select which processes to monitor by:
  • pid_file: name of the process identification number (PID) files they create
  • exe: process names that match the string you specify (RegEx)
  • pattern: command lines used to start the processes (RegEx)
  • Metrics collected by the procstat plugin begin with the “procstat” prefix (e.g., procstat_cpu_time, procstat_cpu_usage, …)
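
A sketch of what a procstat section of the agent configuration could look like, built as a Python dictionary and printed as JSON. The layout (`metrics` → `metrics_collected` → `procstat`) and the measurement names reflect my understanding of the agent's configuration format and should be checked against the agent documentation; the `nginx` process name and the helper function are hypothetical.

```python
import json
from typing import List


def procstat_config(exe_pattern: str, measurements: List[str]) -> dict:
    """Build a CloudWatch agent config fragment that monitors processes by name."""
    return {
        "metrics": {
            "metrics_collected": {
                "procstat": [
                    {
                        # "exe" selects processes whose name matches this string.
                        "exe": exe_pattern,
                        # Reported with the procstat_ prefix, e.g. procstat_cpu_usage.
                        "measurement": list(measurements),
                    }
                ]
            }
        }
    }


config = procstat_config("nginx", ["cpu_usage", "memory_rss"])
print(json.dumps(config, indent=2))
```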
40
Status Checks
  • Automated checks to identify hardware and software issues
41
System Status Checks
  • Monitors problems with AWS systems (software/hardware issues on the physical host, loss of power, ...)
  • Check the Personal Health Dashboard for any scheduled critical maintenance by AWS to your instance's host
  • Resolution: stop and start the instance (the instance is migrated to a new host)
42
Instance Status Checks
  • Monitors software/network configuration of your instance (invalid network configuration, exhausted memory, ...)
43
Attached EBS Status Checks
  • Monitors EBS volumes attached to your instance (reachable & complete I/O operations)
  • Resolution: reboot the instance or replace affected EBS volumes
44
Status Checks - CW Metrics & Recovery
  • CloudWatch Metrics (1-minute interval):
  • StatusCheckFailed_System
  • StatusCheckFailed_Instance
  • StatusCheckFailed_AttachedEBS
  • StatusCheckFailed (for any)
  • Option 1: CloudWatch Alarm
  • Recover the EC2 instance with the same private/public IP, EIP, metadata, and Placement Group
  • Send notifications using SNS
  • Option 2: Auto Scaling Group
  • Set min/max/desired to 1 to recover an instance, but it won't keep the same private and Elastic IP
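
Option 1 could be set up roughly as below. This sketch only builds the parameters for a CloudWatch alarm on StatusCheckFailed_System with the EC2 recover action; the helper name, alarm name, and evaluation settings are assumptions, and nothing is sent to AWS.

```python
from typing import Optional


def recovery_alarm_params(instance_id: str, region: str,
                          sns_topic_arn: Optional[str] = None) -> dict:
    """Build alarm parameters that recover an instance on a failed system check."""
    # The recover action is expressed as an "automate" ARN for the region.
    actions = [f"arn:aws:automate:{region}:ec2:recover"]
    if sns_topic_arn:
        actions.append(sns_topic_arn)  # optional SNS notification
    return {
        "AlarmName": f"recover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # status check metrics arrive every minute
        "EvaluationPeriods": 2,  # assumed: two failing minutes in a row
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": actions,
    }


params = recovery_alarm_params("i-0123456789abcdef0", "us-east-1")
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.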
45
Why EC2 Hibernate?
  • We know we can stop and terminate instances
  • Stop – the data on disk (EBS) is kept intact for the next start
  • Terminate – any EBS volume (such as the root volume) set to be destroyed on termination is lost
  • On start, the following happens:
  • First start: the OS boots & the EC2 User Data script is run
  • Following starts: the OS boots up
  • Then your application starts, caches get warmed up, and that can take time!
46
EC2 Hibernate - Behavior
  • Introducing EC2 Hibernate:
  • The in-memory (RAM) state is preserved
  • The instance boot is much faster! (the OS is not stopped / restarted)
  • Under the hood: the RAM state is written to a file in the root EBS volume
  • The root EBS volume must be encrypted
  • Use cases:
  • Long-running processing
  • Saving the RAM state
  • Services that take time to initialize
47
EC2 Hibernate – Good to know
  • Supported Instance Families – C3, C4, C5, I3, M3, M4, R3, R4, T2, T3, …
  • Instance RAM Size – must be less than 150 GB
  • Instance Size – not supported for bare metal instances
  • AMI – Amazon Linux 2, Linux AMI, Ubuntu, RHEL, CentOS & Windows…
  • Root Volume – must be EBS, encrypted, not instance store, and large enough to hold the RAM contents
  • Available for On-Demand, Reserved and Spot Instances
  • An instance can NOT be hibernated for more than 60 days
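
The prerequisites above can be expressed as a simple checklist. This is a sketch: `InstanceSpec` and `hibernation_blockers` are hypothetical names, the instance family and AMI checks are left out because the lists above are open-ended, and comparing the root volume size to RAM is only a necessary condition for "large enough".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceSpec:
    ram_gb: float
    bare_metal: bool
    root_is_ebs: bool
    root_encrypted: bool
    root_volume_gb: float


def hibernation_blockers(spec: InstanceSpec) -> List[str]:
    """Return the reasons (if any) why this instance could not hibernate."""
    blockers = []
    if spec.ram_gb >= 150:
        blockers.append("RAM must be less than 150 GB")
    if spec.bare_metal:
        blockers.append("bare metal instances are not supported")
    if not spec.root_is_ebs:
        blockers.append("root volume must be EBS, not instance store")
    if not spec.root_encrypted:
        blockers.append("root volume must be encrypted")
    if spec.root_volume_gb < spec.ram_gb:
        # The RAM state is written to a file on the root volume.
        blockers.append("root volume is too small to hold the RAM contents")
    return blockers


ok = hibernation_blockers(InstanceSpec(16, False, True, True, 30))
```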