L5 - Autoscaling 1/2 Flashcards
Why an elastic application?
- reduce over/under-provisioning
- reduce cost + increase customer satisfaction
What are 4 typical resources applications use?
- CPU
- Memory
- Disk
- Network
Dynamism for desktop apps on the laptop
seconds, thread scheduling
Dynamism for HPC with a cluster as a shared resource
hours, days for job scheduling
Dynamism for banking with mainframe
periodically every day for processor allocation
Dynamism for web and server clusters
highly dynamic, limited predictability
What is increased throughput?
Ability to handle more workload (requests) in the same amount of time
What is decreased latency?
Individual requests are handled faster
Can we normally decrease latency or increase throughput for web-applications?
Normally we can only increase throughput
Is there a scalability limit for throughput?
Yes, the throughput curve converges to a limit in the long run
Why is there a scalability limit for throughput?
- overhead with parallelization
- bottleneck: initiating parallelization is a sequential step, so at a certain point the sequential part dominates the execution time (Amdahl’s law)
- shared databases limit the load that can be processed
- programming influences whether applications can scale
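The Amdahl's law limit mentioned above can be illustrated with a minimal sketch. The function name `amdahl_speedup` and the 5% sequential fraction are assumptions chosen for illustration, not values from the lecture.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work stays sequential."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # Sequential part is unchanged; only the parallel part is divided by p.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


# Assumed example: 5% of the work is sequential.
for p in (1, 2, 8, 64, 1024):
    print(f"p={p}: speedup <= {amdahl_speedup(0.05, p):.2f}")
```

With a sequential fraction s, the speedup can never exceed 1/s no matter how many processors are added, which is why the throughput curve flattens out.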
What is scalability of applications?
The ability of an application to increase its capacity (throughput)
What does the capacity of an application depend on?
- available resource capacities
- application design (whether the app is programmed for scalability)
What are scalability limits?
- maximum application capacity
- throughput can be limited by max resource capacities or application design
What happens when applications with poor scalability are scaled?
- significant drop in efficiency
What is speedup?
performance (p processors) / performance (1 processor)
e.g. for CPU the transactions per second
What is efficiency?
efficiency (p processors) = speedup (p processors) / p
Does speedup linearly scale?
No. With one processor the efficiency is 1, but it drops as more processors are added, so speedup grows more slowly than the processor count
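The speedup and efficiency definitions can be checked with a small sketch. The transactions-per-second figures in `measured_tps` are made-up illustration values, not measurements from the lecture.

```python
def speedup(perf_p: float, perf_1: float) -> float:
    """performance (p processors) / performance (1 processor)."""
    return perf_p / perf_1


def efficiency(perf_p: float, perf_1: float, p: int) -> float:
    """speedup (p processors) / p."""
    return speedup(perf_p, perf_1) / p


# Hypothetical measurements: processors -> transactions per second.
measured_tps = {1: 100.0, 2: 190.0, 4: 340.0, 8: 520.0}
baseline = measured_tps[1]

for p, tps in measured_tps.items():
    s = speedup(tps, baseline)
    e = efficiency(tps, baseline, p)
    print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

In this assumed data set the efficiency starts at 1 for a single processor and falls with every doubling, which is the sublinear behaviour the card describes.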
What is parallel computing?
A form of computing in which many processors work simultaneously to provide high computational power and significantly reduce the total computation time.
What is elasticity?
- dynamic adaptation of the capacity to a change in the workload
- no shutdown/restart required
- shrink capacity, if workload decreases
- increase capacity, if workload increases
What is autoscaling?
Cloud computing feature that lets organizations scale cloud services, such as server capacity or VMs, up or down automatically based on defined conditions such as traffic or utilization levels.
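A threshold-based scaling rule is one simple way to picture "defined conditions". The sketch below is an assumption for illustration only: the function `desired_instances`, its thresholds, and its instance limits are hypothetical and do not correspond to any particular cloud provider's API.

```python
def desired_instances(
    current: int,
    utilization: float,
    scale_out_above: float = 0.8,  # hypothetical upper utilization threshold
    scale_in_below: float = 0.3,   # hypothetical lower utilization threshold
    min_instances: int = 1,
    max_instances: int = 10,
) -> int:
    """Return the instance count a simple threshold rule would choose."""
    if utilization > scale_out_above:
        target = current + 1  # workload increased: add capacity
    elif utilization < scale_in_below:
        target = current - 1  # workload decreased: shrink capacity
    else:
        target = current      # within the band: leave capacity unchanged
    # Keep the result inside the configured bounds.
    return max(min_instances, min(max_instances, target))


print(desired_instances(current=2, utilization=0.9))
print(desired_instances(current=2, utilization=0.1))
```

An autoscaler would evaluate a rule like this periodically against monitored metrics, so capacity follows the workload without a shutdown or restart.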
What is a backend service?
A service needed to answer requests that arrive at the frontend
What is vertical scaling - scaling up?
Scale the server on which the service is running.
You can increase the capacity of a single service instance by increasing its resources:
- increase CPU time percentage
- increase clock frequency
- add more cores
- replace existing resources with more powerful ones
Pros of vertical scaling
- easy to replace a resource with a more powerful one
- it does not require a re-design of the application