Architecting to scale Flashcards

1
Q

What is a loosely coupled architecture?

A

Where components can stand independently and require little or no knowledge of the inner workings of the other components

2
Q

Why use a loosely coupled architecture for scalability?

4 points

A

1) provides abstraction
2) Interchangeable components
3) More atomic functional units
4) you can scale components independently

3
Q

What is horizontal scaling? (4 points)

A

1) Where you add instances as demand increases
2) no downtime required to scale up or down
3) You can do this automatically using auto-scaling groups
4) theoretically unlimited

4
Q

What is vertical scaling? (4 points)

A

1) Where you add more CPU and/or RAM to an existing instance as demand increases
2) Requires restart to scale up or down
3) Would require scripting to automate
4) limited by instance size

5
Q

Define scale out…

A

Where you add another instance

6
Q

Define scale up…

A

Where you increase resources of an instance

7
Q

Define scale in…

A

Where you remove an instance

8
Q

Define scale down…

A

Where you decrease the resources of an instance

9
Q

Why would you scale out over scaling up?

A

Because demand is never constant! So you will be wasting resources when scaling up…
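
A rough sketch of the waste, assuming a hypothetical hourly demand profile and that one instance handles one unit of demand. Capacity sized for the peak is expressed here in equivalent instance-hours; the numbers are illustrative only.

```python
import math

def scaled_out_instance_hours(hourly_demand, capacity_per_instance):
    """Instance-hours used when the fleet is resized every hour to match demand."""
    return sum(math.ceil(d / capacity_per_instance) for d in hourly_demand)

def peak_sized_instance_hours(hourly_demand, capacity_per_instance):
    """Instance-hours paid for when capacity is fixed at the peak for every hour."""
    peak = math.ceil(max(hourly_demand) / capacity_per_instance)
    return peak * len(hourly_demand)

demand = [2, 2, 8, 3]  # hypothetical units of demand per hour
elastic = scaled_out_instance_hours(demand, 1)
fixed = peak_sized_instance_hours(demand, 1)
```

With this profile the peak-sized option pays for more than twice the instance-hours of the elastic one.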

10
Q

What is the key benefit of scaling out?

A

Cost savings!

11
Q

What are the 2 types of autoscaling offered by AWS?

A

1) EC2 autoscaling

2) Application autoscaling

12
Q

What is AWS auto scaling? and why would you use it?

A

What- Provides a centralised way to manage scalability for whole stacks and can provide predictive scaling.

why- Gives you the ability to manage EC2 and application autoscaling from a unified standpoint

13
Q

What are the 4 scaling options with EC2 autoscaling groups/types?

A

1) Maintaining- keep a specific or minimum number of instances running
2) Manual- use max and min or a specified number of instances
3) Scheduled- increase or decrease instances based on a schedule
4) Dynamic- scale based on real-time metrics of the system

14
Q

What is a launch configuration? and what 7 things do you include in this?

A

A launch configuration is an instance configuration template that an Auto Scaling group uses to launch EC2 instances. When you create a launch configuration, you specify information for the instances.

1) Include the ID of the Amazon Machine Image (AMI)
2) the instance type
3) a key pair
4) one or more security groups
5) a block device mapping
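6) define a health check grace period (strictly a setting of the Auto Scaling group rather than the launch configuration)
7) the scale type (how we want to scale; also set on the Auto Scaling group)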

15
Q

What is a health check grace period?

A

The time allowed for a newly launched instance to spin up before its health is checked.

16
Q

Which use case would be the most appropriate for a maintain scaling type?

A

When you need X instances running at all times

e.g. 3

17
Q

Which use case would be the most appropriate for a manual scaling type?

A

My needs change so rarely that I can just manually add and remove instances

18
Q

Which use case would be the most appropriate for a scheduled scaling type?

A

Every Monday morning we get a rush on our website
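
A minimal sketch of that schedule as plain Python. The rush window (08:00 to 11:59) and the instance counts are assumptions for illustration, not values from the card.

```python
from datetime import datetime

def monday_rush_capacity(now, normal=2, rush=6):
    """Return the desired instance count for a hypothetical Monday-morning schedule."""
    is_monday = now.weekday() == 0        # Monday is 0 in datetime.weekday()
    in_rush_window = 8 <= now.hour < 12   # assumed rush window: 08:00-11:59
    return rush if is_monday and in_rush_window else normal
```

The schedule fires whether or not the rush actually arrives, which is the trade-off against dynamic scaling.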

19
Q

Which use case would be the most appropriate for a dynamic scaling type?

A

When CPU utilisation gets to 70% on current instances, scale out

20
Q

Within the dynamic scaling type, we have EC2 autoscaling policies. Name and describe the 3 policies…

A

1) Target tracking policy- scale based on a pre-defined or custom metric in relation to a target value
2) Simple scaling policy- waits until the health check and cooldown period expire before evaluating new need
3) Step scaling policy- responds to scaling needs with more sophistication and logic

21
Q

Which use case would be the most appropriate for a target tracking policy…

A

When CPU utilization gets to 70% on current instances, scale out
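
The exact target tracking algorithm is not described in these cards, so the following is only a simplified proportional sketch of the idea. The function name, the size limits and the proportional rule are assumptions.

```python
import math

def target_tracking_capacity(current_instances, current_cpu, target_cpu=70.0,
                             min_size=1, max_size=10):
    """Estimate the instance count that would bring average CPU back to the target."""
    # Total load stays the same, so the fleet grows in proportion to the overshoot.
    wanted = math.ceil(current_instances * current_cpu / target_cpu)
    # Never go below the group's minimum or above its maximum.
    return max(min_size, min(max_size, wanted))
```

For example, 4 instances averaging 90% CPU against a 70% target would suggest growing the fleet to 6.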

22
Q

Which use case would be the most appropriate for a simple scaling policy…

A

Let’s add new instances slowly and steadily

23
Q

Which use case would be the most appropriate for a step scaling policy…

A

ARGH, add all the instances! (add more instances the further the metric is past the threshold)
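
A small sketch of the step idea: the bigger the breach, the bigger the adjustment. The step boundaries and instance counts below are hypothetical.

```python
def step_adjustment(metric_value, threshold, steps):
    """Return how many instances to add for the step that the breach falls into.

    steps is a list of (lower_offset, upper_offset, instances_to_add); offsets are
    relative to the alarm threshold and an upper offset of None means no upper limit.
    """
    breach = metric_value - threshold
    for lower, upper, add in steps:
        if breach >= lower and (upper is None or breach < upper):
            return add
    return 0  # metric is below the threshold: no scaling

# Hypothetical steps for a 70% CPU alarm.
CPU_STEPS = [(0, 10, 1), (10, 20, 2), (20, None, 4)]
```

With these steps, 75% CPU adds one instance while 95% CPU adds four.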

24
Q

What is a scaling cooldown?

A

Configurable duration that gives your scaling a chance to “come up to speed” and absorb load.

Different than a health check!

25
Q

How long is the default cooldown period?

A

300 seconds

26
Q

Which scaling policy does cooldown period get applied to by default?

A

Dynamic scaling (specifically simple scaling policies)

27
Q

Can you override the default cooldown period?

A

Yes

28
Q

What is the benefit of a cool down period?

A

Sanity check to see if adding the resource was enough to absorb the load.
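
A minimal sketch of how a cooldown gates repeated scaling, using plain seconds as timestamps. The class and method names are hypothetical.

```python
class CooldownGate:
    """Allow a scaling activity only if the previous one has finished cooling down."""

    def __init__(self, cooldown_seconds=300):  # 300 s matches the default cooldown
        self.cooldown_seconds = cooldown_seconds
        self.last_scaled_at = None

    def try_scale(self, now_seconds):
        """Return True and start a new cooldown if scaling is allowed at this time."""
        in_cooldown = (self.last_scaled_at is not None
                       and now_seconds - self.last_scaled_at < self.cooldown_seconds)
        if in_cooldown:
            return False
        self.last_scaled_at = now_seconds
        return True
```

A request 120 seconds after a scaling activity is ignored; the same request at 300 seconds goes through.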

29
Q

Name 3 types of scaling policies available with application autoscaling…

A

1) Target tracking policy- initiates scaling to try and track as closely as possible to a given metric
2) Step scaling policy- Based on a metric it adjusts capacity to a given defined threshold
3) Scheduled scaling policy- Initiated scaling events based on a pre-defined time, day or date

30
Q

Which use case would be the most appropriate for a target tracking policy…

A

I want my ECS (container) hosts to stay at or below 70% CPU utilization

31
Q

Which use case would be the most appropriate for a step scaling policy…

A

I want to increase my EC2 spot fleet by 20% every time I add another 10,000 connections on my ELB

32
Q

Which use case would be the most appropriate for a scheduled scaling policy…

A

Every Monday at 08:00 I want to increase the read capacity units of my DynamoDB table to 20,000

33
Q

What is a shard?

A

Shard is the base throughput unit of an Amazon Kinesis data stream. One shard provides a capacity of 1MB/sec data input and 2MB/sec data output.
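
Using the per-shard figures above, a rough shard count for a stream can be sketched as follows. The function name is hypothetical and other per-shard limits are ignored.

```python
import math

def shards_needed(ingest_mb_per_sec, egress_mb_per_sec):
    """Smallest shard count that covers both the write and the read throughput."""
    for_writes = math.ceil(ingest_mb_per_sec / 1)  # 1 MB/sec in per shard
    for_reads = math.ceil(egress_mb_per_sec / 2)   # 2 MB/sec out per shard
    return max(1, for_writes, for_reads)
```

For example, 5 MB/sec in and 6 MB/sec out needs 5 shards, because input is the tighter limit.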

34
Q

What are the 3 parts of a record in a shard?

A

1) partition key
2) sequence number (order of the record within the shard)
3) data

35
Q

What are the two dimensions of DynamoDB scaling?

A

1) Throughput- Read capacity units and write capacity units

2) Size- Max item size is 400KB, but the table can scale because you can store as many items as you like

36
Q

What is a partition in DynamoDB?

A

A physical allocation of storage where a DynamoDB table's data is stored

37
Q

What is a partition key in DynamoDB?

A

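An identifier for each item, sometimes called a hash key. It is hashed to decide which partition stores the item, and it must be unique unless it is combined with a sort key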

38
Q

What is a sort key in DynamoDB?

A

An optional key that defines storage order on the partition

39
Q

How does DynamoDB scale out?

A

DynamoDB adds additional partitions to scale out

40
Q

How do you work out the number of partitions you get?

A

Work out how many partitions you need by capacity (how many RCUs and WCUs you have provisioned!) and by size. Then take the larger of the two results and round up to get the total number of partitions

41
Q

What is the formula for calculating the number of partitions for your DynamoDB table by capacity?

A

(total RCU/3000) + (total WCU/1000)

42
Q

What is the formula for calculating the number of partitions for your DynamoDB table by size?

A

Total size in GB/10GB
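
Putting the capacity formula, this size formula and the "take the larger and round up" rule together, a worked sketch might look like this. The function names are hypothetical and the constants are the ones quoted on these cards.

```python
import math

def partitions_by_capacity(rcu, wcu):
    """(total RCU / 3000) + (total WCU / 1000)"""
    return rcu / 3000 + wcu / 1000

def partitions_by_size(size_gb):
    """Total size in GB / 10 GB"""
    return size_gb / 10

def estimated_partitions(rcu, wcu, size_gb):
    """Take the larger of the two dimensions and round up (at least one partition)."""
    largest = max(partitions_by_capacity(rcu, wcu), partitions_by_size(size_gb))
    return max(1, math.ceil(largest))
```

For example, 5,000 RCU and 500 WCU on a 45GB table gives about 2.17 by capacity and 4.5 by size, so 5 partitions.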

43
Q

How are the read and write capacity allocated across partitions?

A

Splits equally across partitions

44
Q

What happens if we increase our RCU/WCU or we reach our 10GB size limit?

A

The partition's data is divided down the middle, based on the partition key hash, and a new partition is created. This keeps happening as the table scales out.

45
Q

What is a hotkey issue? provide an example…

A

When reads and writes are concentrated in the same partition. For example, if you use a date as the partition key and store lots of different data under the same date, queries for this data will access the same partition over and over…
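
A small simulation of the effect. MD5 is used only as a stand-in hash to spread keys over partitions; it is an assumption for illustration, not a description of DynamoDB internals, and the sensor data is made up.

```python
import hashlib
from collections import Counter

def partition_for(key, num_partitions):
    """Map a partition key to a partition number with a stand-in hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def load_per_partition(keys, num_partitions):
    """Count how many requests land on each partition."""
    return Counter(partition_for(k, num_partitions) for k in keys)

# 1,000 hypothetical readings from 50 sensors, all taken on the same date.
readings = [(f"sensor-{i % 50}", "2024-01-01") for i in range(1000)]
by_date = load_per_partition([date for _, date in readings], 4)
by_sensor = load_per_partition([sensor for sensor, _ in readings], 4)
```

Keyed by date, every request hits a single partition; keyed by sensor, the same requests spread across partitions.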

46
Q

How do you avoid a hotkey issue?

A

Choose a different attribute for the partition key, e.g. sensor type rather than date, and use the date as a sort key

47
Q

What is the issue with using a target tracking method to scale a DynamoDB table?

A

It will not scale down when there is no traffic on the table. There are some workarounds, like sending dummy requests at a reducing frequency or reducing the max capacity to equal the min capacity

48
Q

What is a global secondary index?

A

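Like a copy of the table, organised by a different partition key (and optional sort key) so you can query on other attributes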

49
Q

What is an alternative to autoscaling in DynamoDB?

A

Using the ‘on-demand’ setting for DynamoDB. It costs more, but is useful when you are not sure if an app will be super popular

50
Q

What is DynamoDB Accelerator (DAX)?

A

An in-memory cache that sits in front of your table
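
The read-through idea can be sketched with a plain dictionary standing in for the table. This is not the DAX client API, only an illustration of the caching pattern, and all names are hypothetical.

```python
class ReadThroughCache:
    """In-memory cache in front of a slower table lookup (a stand-in for DAX)."""

    def __init__(self, table):
        self.table = table      # plain dict playing the role of the DynamoDB table
        self.cache = {}
        self.table_reads = 0

    def get(self, key):
        if key in self.cache:   # cache hit: the table is not touched
            return self.cache[key]
        self.table_reads += 1   # cache miss: read from the table, then remember it
        value = self.table[key]
        self.cache[key] = value
        return value
```

Repeated reads of the same key reach the table only once, which is why read-heavy workloads benefit most.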

51
Q

What is a good use case for DAX? (3 points)

A

1) When you require the fastest possible reads from a database, such as live auctions or securities trading
2) Read-intensive scenarios where you want to offload the reads from DynamoDB
3) Repeated reads against a large set of DynamoDB data

52
Q

What are bad use cases for DAX? (2 points)

A

1) write intense applications that don’t have many reads

2) Applications where you use client caching methods

53
Q

What types of data content can you cache at edge locations using CloudFront?

A

Static and dynamic content

54
Q

How is dynamic content delivered using CloudFront?

A

Delivered using HTTP cookies forwarded to your origin

55
Q

What protocol do you use for media streaming and live media streaming?

A

HTTP and HTTPS

56
Q

Which services can be used as origins to CloudFront?

A

S3, EC2, ELB or another webserver

57
Q

How can a behaviour be used to configure serving content via CloudFront?

A

You can use behaviours to configure serving up origin content based on URL paths.

This routes users to different content origins based on the URL path, e.g. wp-content/* serves static content while wp-admin/* directs to the ELB…
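
A sketch of path-based routing using shell-style wildcard matching from the Python standard library. The origin names and the ordered list are hypothetical and only mimic how behaviours are evaluated.

```python
from fnmatch import fnmatchcase

# Hypothetical behaviours, checked in order; the last pattern is the default.
BEHAVIOURS = [
    ("wp-content/*", "s3-static-origin"),
    ("wp-admin/*", "elb-origin"),
    ("*", "elb-origin"),
]

def origin_for(path):
    """Return the origin of the first behaviour whose path pattern matches."""
    for pattern, origin in BEHAVIOURS:
        if fnmatchcase(path, pattern):
            return origin
    raise LookupError(f"no behaviour matches {path!r}")
```

A request for wp-content/uploads/logo.png goes to the static origin, while everything else falls through to the ELB.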

58
Q

What is an invalidation request?

A

A way of invalidating a CloudFront cache

59
Q

What are the 4 methods you can use to invalidate content from a CloudFront cache?

A

1) simply delete the file from the origin and wait for the TTL (time to live) to expire
2) Use the AWS console to request invalidation for all content or a specific path such as /images/*
3) Use the CloudFront API to submit an invalidation request
4) Use a 3rd party tool to perform a CloudFront invalidation e.g. CloudBerry, Ylastic…

60
Q

What is a Zone apex? and does CloudFront support it?

A

A domain without a www. or other subdomain in front

Yes, CloudFront supports it

61
Q

Can you add geo-restrictions in CloudFront?

A

Yes, you can whitelist (show) or blacklist (block) content based on location

62
Q

What is SQS?

A

Simple Queue Service. A scalable hosted queuing service. It is integrated with KMS for encrypted messaging

63
Q

What is the data storage type in SQS and how long is data persisted for?

A

Transient. 4 days default, max 14 days

64
Q

What is the max size of messages in SQS?

A

256KB or 2GB using the SDK

65
Q

What are the key benefits of using SQS?

A

Allows the creation of a loosely coupled architecture

66
Q

What is meant by standard and FIFO queuing?

A

Standard- No assurances that a message will enter and leave the queue based on the order they arrived

FIFO- Will maintain the order of the queue

67
Q

What is the risk with Standard queueing?

A

There is a risk that order will be lost for the process

68
Q

What is the risk with FIFO queueing?

A

If a message fails it will hold all the other messages behind it- causes delay or latency

69
Q

What is Amazon MQ?

A

A managed implementation of Apache ActiveMQ. A message broker. Usually used to replace on-prem message brokers.

70
Q

What is a lambda fan-out model?

A

Where a Lambda function call sets off multiple Lambda calls in parallel
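
The shape of a fan-out can be sketched locally with a thread pool. Here `resize_image` is a hypothetical stand-in for a worker Lambda; nothing is invoked on AWS.

```python
from concurrent.futures import ThreadPoolExecutor

def resize_image(name):
    """Stand-in for a worker Lambda; it only returns a label for the finished work."""
    return f"resized:{name}"

def fan_out(items, worker, max_parallel=4):
    """Run one worker call per item in parallel and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(worker, items))
```

One call to `fan_out` plays the role of the dispatching function; each item is handled by its own parallel worker call.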

71
Q

What is the AWS serverless application model (SAM)?

A

An open-source framework for building a serverless app on AWS

72
Q

Which language does AWS serverless application model use as its configuration language?

A

YAML

73
Q

What are the 3 steps of an AWS SAM workflow?

A

1) Create your YAML file
2) Convert this to a CloudFormation template
3) Deploy it to create the AWS infrastructure

74
Q

What are the 3 key features of AWS SAM?

A

1) uses YAML for templates
2) Purpose built to help make developing serverless apps as efficient as possible
3) Generates CloudFormation scripts

75
Q

What are the 4 key features of the Serverless Framework?

A

1) uses YAML for templates
2) Purpose built to help with developing and deploying serverless apps
3) Generates CloudFormation scripts
4) Supports many other cloud providers such as Azure…

76
Q

What is Amazon EventBridge?

A

Designed to link a variety of AWS and 3rd party apps

e.g. integrate ZenDesk with your application

77
Q

What is simple workflow service?

A

Creates distributed, asynchronous workflows. It supports sequential as well as parallel workflows. Work is carried out by activity workers and coordinated by a decider.

78
Q

What is SWF best suited for?

A

Best suited for human-enabled workflows like order fulfilment or procedure requests

79
Q

Would AWS recommend SWF or step function?

A

Step function

80
Q

What is an AWS step function?

A

A way to manage workflows. An orchestration platform. You define your app as a state machine. Each object can assume a different state throughout a process. Creates tasks, sequential steps, parallel steps etc…

81
Q

What language do you use to define step functions?

A

JSON
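
A minimal sketch of what such a JSON definition can look like, built as a Python dictionary and serialised. The workflow, state names and Lambda ARNs are hypothetical placeholders.

```python
import json

# Hypothetical two-step order workflow; the Lambda ARNs are placeholders.
ORDER_WORKFLOW = {
    "Comment": "Illustrative order processing flow",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
            "Next": "ChargeCustomer",
        },
        "ChargeCustomer": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-customer",
            "End": True,
        },
    },
}

def dangling_transitions(definition):
    """Return state names referenced by StartAt or Next that are not defined."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    targets += [s["Next"] for s in states.values() if "Next" in s]
    return [t for t in targets if t not in states]

definition_json = json.dumps(ORDER_WORKFLOW, indent=2)
```

The helper is only a local sanity check that every transition points at a defined state before the JSON is used anywhere.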

82
Q

What is AWS batch?

A

A management tool for creating and executing batch orientated tasks using EC2 instances

83
Q

What are the 4 steps to running a batch using AWS batch?

A

1) Create a compute environment
2) Create a job queue with the priority assigned to a compute environment
3) Create a job definition (script or JSON, env vars, IAM roles, etc.)
4) Schedule the job

84
Q

When would you use a step function and provide a use case?

A

Out-of-the-box coordination of AWS service components

use case- order processing flows

85
Q

When would you use a simple workflow service? and provide a use case…

A

When you need to support external processes or specialised execution logic

use case- loan application process with manual review steps

86
Q

When would you use a simple queue service? and provide a use case…

A

Messaging queue store and forward patterns

use case- image resize process

87
Q

When would you use AWS Batch? and provide a use case…

A

Scheduled or recurring tasks that do not require heavy logic

use case- Rotate logs daily on firewall appliance

88
Q

What is Elastic MapReduce?

A

Designed for big data processing and analysis. It is built on the Hadoop framework and is a collection of services for processing large data sets. “The Zoo”

89
Q

What is Hadoop MapReduce?

A

A tool used for distributed processing

90
Q

What is Hadoop HDFS?

A

A Hadoop distributed file system. A persistent data store

91
Q

What is Zookeeper?

A

A tool to ensure resources are coordinated in a hadoop framework

92
Q

What is Oozie?

A

Hadoop workflow framework

93
Q

What is Pig?

A

A hadoop scripting framework

94
Q

What is Hive?

A

A SQL interface into a Hadoop landscape

95
Q

What is Mahout?

A

A machine learning component in the hadoop landscape

96
Q

What is HBase?

A

A columnar database for storing hadoop data

97
Q

What is Flume?

A

A log collection system for a hadoop landscape

98
Q

What is Sqoop?

A

Facilitates input of data from other data stores into a hadoop landscape

99
Q

What is Ambari?

A

A tool used to manage and monitor a hadoop landscape

100
Q

What is meant by the term green field?

A

When an application/software is developed from scratch

101
Q

What is meant by the term brownfield?

A

When an application/software is developed or built from an existing program

102
Q

What 3 pieces of information do you need to determine the number of partitions in a DynamoDB?

A

1) size of the table
2) Number of RCUs
3) Number of WCUs

103
Q

How can you make scaling more dramatic and responsive?

A

Reduce the cooldown time to allow scaling to be more dramatic and responsive

104
Q

Which Kinesis service can stream into S3?

A

Kinesis Firehose

105
Q

What are the 2 main uses from Kinesis data streams?

A

1) They can enable real-time reporting and analysis of streamed data
2) They can accept data as soon as it has been produced without the need for batching

106
Q

What is the most cost effective way to scale based on sometimes getting spikes on a Monday morning?

A

Dynamic based on a metric like connections or CPU. If using scheduled you would be scaling even when there is no spike.

107
Q

What is the main benefit of a loosely coupled architecture?

A

More atomic functional units

108
Q

What is the Kinesis Client Library (KCL) used for?

A

A method of reading data from a shard

109
Q

What is the best practice way for storing time-series data in a DynamoDB?

A

Use one table per application period.

If all the time-series data sat in a single table, the last partition would receive all the read and write activity.

General DynamoDB best practice is otherwise to keep the number of tables to a minimum.
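
A tiny sketch of the one-table-per-period idea, assuming a monthly period and a hypothetical naming scheme:

```python
from datetime import date

def table_for(day, prefix="sensor_readings"):
    """Name of the table that holds data for the month containing the given day."""
    return f"{prefix}_{day.year}_{day.month:02d}"
```

Writes for the current month go to the current table, so older tables can be given lower throughput or archived.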