Databases and Analytics Flashcards

1
Q

What is Amazon DynamoDB?

A

Fully managed NoSQL database service
- Key/value store and document store (a non-relational database)
- Fully serverless service
- Push-button scaling

  • DynamoDB is made up of:
    • Tables
    • Items -> records
    • Attributes -> details of a record
  • TTL (Time to Live) lets you define when items in a table expire so that they can be deleted automatically; the expiry time is stored in an item attribute as a Unix epoch timestamp (in seconds), and DynamoDB removes expired items in the background rather than at the exact expiry time
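A minimal local sketch of the TTL rule, assuming a hypothetical attribute named `expires_at` and a plain `dict` standing in for a table: the TTL attribute holds a Number in Unix epoch seconds, and an item becomes eligible for deletion once that time has passed. This only models which items qualify; in DynamoDB the deletion itself is done by a background process, not by application code.

```python
import time

# Hypothetical in-memory stand-in for a DynamoDB table: partition key -> item (attributes).
# "expires_at" is an assumed TTL attribute name holding Unix epoch seconds.
TTL_ATTRIBUTE = "expires_at"


def make_session_item(session_id: str, user: str, lifetime_seconds: int, now: float) -> dict:
    """Build an item whose TTL attribute is 'now + lifetime' in epoch seconds."""
    return {
        "session_id": session_id,  # partition key
        "user": user,              # ordinary attribute
        TTL_ATTRIBUTE: int(now) + lifetime_seconds,
    }


def expired_keys(table: dict, now: float) -> list:
    """Return the keys of items whose TTL timestamp is in the past.

    Items without the TTL attribute never expire. DynamoDB performs the actual
    deletion itself; this only models which items would be eligible.
    """
    return [
        key
        for key, item in table.items()
        if TTL_ATTRIBUTE in item and item[TTL_ATTRIBUTE] < int(now)
    ]


if __name__ == "__main__":
    now = time.time()
    table = {
        "s1": make_session_item("s1", "alice", lifetime_seconds=-60, now=now),  # already expired
        "s2": make_session_item("s2", "bob", lifetime_seconds=3600, now=now),   # expires in 1 hour
        "s3": {"session_id": "s3", "user": "carol"},                           # no TTL attribute
    }
    print(expired_keys(table, now))  # ['s1']
```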
2
Q

What is DynamoDB Accelerator (DAX)?

A

DAX is a fully managed, highly available, in-memory cache for DynamoDB

Improves response times from milliseconds to microseconds
- Works as both a read-through cache and a write-through cache
- Mainly speeds up reads for read-heavy or bursty workloads; writes go through DAX to DynamoDB so the cache stays up to date
- You do not need to modify application logic, since DAX is compatible with existing DynamoDB API calls
- DAX is optimized for DynamoDB, so it is usually a better fit for caching DynamoDB than ElastiCache
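To make the read-through / write-through behaviour concrete, here is a toy model with two hypothetical classes: `BackingStore` stands in for a DynamoDB table and `ReadWriteThroughCache` for DAX. It is not the DAX client API; it only shows why repeated reads stop reaching the table.

```python
class BackingStore:
    """Hypothetical stand-in for a DynamoDB table; counts reads to show cache hits."""

    def __init__(self):
        self._items = {}
        self.reads = 0

    def get_item(self, key):
        self.reads += 1
        return self._items.get(key)

    def put_item(self, key, item):
        self._items[key] = item


class ReadWriteThroughCache:
    """Hypothetical cache in front of the store, modelling DAX-style caching."""

    def __init__(self, store: BackingStore):
        self._store = store
        self._cache = {}

    def get_item(self, key):
        # Read-through: serve from memory on a hit; on a miss, read the store and cache the result.
        if key in self._cache:
            return self._cache[key]
        item = self._store.get_item(key)
        if item is not None:
            self._cache[key] = item
        return item

    def put_item(self, key, item):
        # Write-through: write to the store first, then update the cache so later reads stay consistent.
        self._store.put_item(key, item)
        self._cache[key] = item


if __name__ == "__main__":
    table = BackingStore()
    dax = ReadWriteThroughCache(table)
    dax.put_item("user#1", {"name": "alice"})  # write-through: table and cache both updated
    dax.get_item("user#1")                      # cache hit
    table.put_item("user#2", {"name": "bob"})  # written directly to the table, bypassing the cache
    dax.get_item("user#2")                      # miss: read from the table, then cached
    dax.get_item("user#2")                      # hit
    print(table.reads)  # 1
```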

3
Q

What are DynamoDB Global Tables?

A

Global Tables is a multi-Region, multi-active (active-active) database
- Asynchronous cross-Region replication
- Each replica table stores the same set of data items
- Application logic can fail over to a replica in another Region
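Because replication is asynchronous and every replica accepts writes, concurrent updates to the same item in different Regions can conflict; DynamoDB global tables reconcile them with "last writer wins". The toy simulation below models that: each Region keeps its own copy, writes carry a timestamp, and a replicated write is applied only if it is newer. The `Replica` class, `replicate` helper, Region names, and timestamps are hypothetical, and ties are not handled; DynamoDB performs this reconciliation internally.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical model of one Region's replica table: key -> (timestamp, value)."""
    region: str
    items: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)  # local writes not yet replicated

    def put(self, key, value, timestamp):
        # Multi-active: the local write is accepted immediately and queued for async replication.
        self.apply(key, value, timestamp)
        self.outbox.append((key, value, timestamp))

    def apply(self, key, value, timestamp):
        # Last writer wins: keep the incoming write only if it is newer than the stored one.
        current = self.items.get(key)
        if current is None or timestamp > current[0]:
            self.items[key] = (timestamp, value)

    def get(self, key):
        entry = self.items.get(key)
        return None if entry is None else entry[1]


def replicate(replicas):
    """Deliver every queued write to every other replica (one asynchronous replication round)."""
    pending = [(r, w) for r in replicas for w in r.outbox]
    for r in replicas:
        r.outbox.clear()
    for source, (key, value, ts) in pending:
        for target in replicas:
            if target is not source:
                target.apply(key, value, ts)


if __name__ == "__main__":
    us, eu = Replica("us-east-1"), Replica("eu-west-1")
    us.put("cart#1", "2 items", timestamp=100)
    eu.put("cart#1", "3 items", timestamp=105)       # concurrent, later write in another Region
    print(us.get("cart#1"), "|", eu.get("cart#1"))  # 2 items | 3 items (not yet replicated)
    replicate([us, eu])
    print(us.get("cart#1"), "|", eu.get("cart#1"))  # 3 items | 3 items
```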

4
Q

What is Amazon Redshift?

A

A fast, fully managed data warehouse solution
- Analyze data using standard SQL and existing Business Intelligence (BI) tools
- Redshift is a SQL-based data warehouse used for analytics applications
- Redshift is a relational database used for Online Analytical Processing (OLAP) use cases
- Redshift uses Amazon EC2 instances, so you must choose an instance family/type (node type)
- Redshift always keeps three copies of your data
- Redshift provides continuous/incremental backups

5
Q

When to use Redshift?

A
  • Perform complex queries on massive collections of structured and semi-structured data and get fast performance
  • Frequently accessed data that needs a consistent, highly structured format
  • Use Redshift Spectrum to query S3 objects in a data lake directly
  • Managed data warehouse solution with:
    • Automated provisioning, configuration and patching
6
Q

What is Amazon Aurora?

A
  • Developed by Amazon
  • Key Features:
    • MySQL- and PostgreSQL-compatible relational database built for the cloud
    • Up to 5 times faster than standard MySQL and up to 3 times faster than standard PostgreSQL
    • Distributed, fault-tolerant, self-healing storage system that autoscales up to 128 TiB per cluster volume
  • Aurora Replicas are always in the same Region as the primary instance
  • You can have multiple Aurora Replicas (up to 15) of a primary in different AZs
  • Aurora fault tolerance:
    • Data is replicated across 3 AZs (six copies)
    • Single logical volume
    • Can promote an Aurora Replica to be the new primary, or create a new primary
    • Can use Auto Scaling to add replicas
7
Q

What is Amazon Aurora Global Database?

A

Aurora Global Database is designed for globally distributed applications, allowing a single Amazon Aurora database to span multiple AWS Regions. It replicates your data with no impact on database performance, enables fast local reads with low latency in each Region, and provides disaster recovery from Region-wide outages.
An Aurora global database consists of one primary AWS Region, where your data is mastered, and up to five read-only secondary AWS Regions.

8
Q

What is Amazon Kinesis?

A

Amazon Kinesis enables you to ingest, buffer, and process streaming data in real time.
Kinesis can handle any amount of streaming data and process data from hundreds of thousands of sources with very low latency.
This makes it an ideal solution for data ingestion.

  • Producers send data to Kinesis; data is stored in shards for 24 hours by default (the retention period can be extended up to 365 days)
  • Consumers then read the data and process it; the results can then be saved into another AWS service
  • Kinesis Data Firehose is near real time; Kinesis Data Streams is real time
  • Kinesis Client Library (KCL)
    • KCL helps you consume and process data from a Kinesis data stream
    • Each shard is processed by exactly one KCL worker and has exactly one corresponding record processor
    • One worker can process any number of shards, so it’s fine if the number of shards exceeds the number of instances

  • A partition key is specified with each PutRecord call and determines which shard the record is grouped into
  • Order is maintained for records within a shard, not across shards
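Kinesis Data Streams hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value, which is why records sharing a key stay together and in order. The sketch below models this, assuming the hash key space is split evenly across shards (typical for a newly created stream, not necessarily after resharding); `shard_for` and `put_records` are hypothetical helpers, not the Kinesis API.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128  # partition keys are hashed into a 128-bit integer space


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash key ranges."""
    hash_key = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    return hash_key * shard_count // HASH_SPACE


def put_records(records, shard_count):
    """Group (partition_key, data) records by shard, preserving arrival order within each shard."""
    shards = defaultdict(list)
    for partition_key, data in records:
        shards[shard_for(partition_key, shard_count)].append((partition_key, data))
    return shards


if __name__ == "__main__":
    records = [("sensor-A", "t=1"), ("sensor-B", "t=1"), ("sensor-A", "t=2"), ("sensor-A", "t=3")]
    for shard, recs in sorted(put_records(records, shard_count=4).items()):
        print(shard, recs)
```

All `sensor-A` records land in the same shard in the order they were sent; records for different keys may land in different shards, so no ordering holds between them.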
9
Q

Can I create an encrypted RDS Read Replica from an unencrypted RDS master DB instance?

A

No, you cannot create an encrypted Read Replica from an unencrypted master DB instance. You also cannot enable encryption on the master DB instance after launch. Instead, create a new master DB instance: take a snapshot of the existing DB instance, create an encrypted copy of that snapshot, and restore a new DB instance from the encrypted copy. You can then create the encrypted (for example, cross-Region) Read Replica of the new master DB instance.
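The snapshot, encrypted copy, restore, and replica steps can be sketched with boto3 RDS client calls (`create_db_snapshot`, `copy_db_snapshot`, `restore_db_instance_from_db_snapshot`, `create_db_instance_read_replica`) and the `db_snapshot_available` / `db_instance_available` waiters. The function name, identifier naming scheme, and KMS key parameters are hypothetical; a real migration would also carry over settings such as instance class, subnet group, and security groups, and switch the application to the new endpoint.

```python
def encrypt_and_replicate(source_rds, target_rds, *, source_db_id, source_region,
                          kms_key_id, target_kms_key_id, suffix="-encrypted"):
    """Recreate an unencrypted RDS instance as an encrypted one, then add a cross-Region replica.

    source_rds / target_rds: boto3 RDS clients for the source and destination Regions.
    The derived identifiers below are a hypothetical naming scheme.
    """
    snapshot_id = f"{source_db_id}-snap"
    encrypted_snapshot_id = f"{source_db_id}-snap{suffix}"
    new_db_id = f"{source_db_id}{suffix}"
    replica_id = f"{new_db_id}-replica"

    # 1. Snapshot the existing (unencrypted) instance and wait for it to finish.
    source_rds.create_db_snapshot(DBSnapshotIdentifier=snapshot_id,
                                  DBInstanceIdentifier=source_db_id)
    source_rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)

    # 2. Copy the snapshot; supplying a KMS key makes the copy encrypted.
    source_rds.copy_db_snapshot(SourceDBSnapshotIdentifier=snapshot_id,
                                TargetDBSnapshotIdentifier=encrypted_snapshot_id,
                                KmsKeyId=kms_key_id)
    source_rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=encrypted_snapshot_id)

    # 3. Restore a new, encrypted master DB instance from the encrypted snapshot.
    restored = source_rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_db_id, DBSnapshotIdentifier=encrypted_snapshot_id)
    source_rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_db_id)

    # 4. Create the encrypted Read Replica in the destination Region. A cross-Region replica
    #    references its source by ARN and needs a KMS key from the destination Region.
    target_rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=restored["DBInstance"]["DBInstanceArn"],
        KmsKeyId=target_kms_key_id,
        SourceRegion=source_region)
    return new_db_id, replica_id
```

For example, `source_rds` could be `boto3.client("rds", region_name="us-east-1")` and `target_rds` a client for the destination Region; both Region choices are illustrative.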

10
Q

What is Amazon Athena?

A

Query Service:
Amazon Athena is an interactive query service that allows you to analyze data in Amazon S3 using standard SQL queries. It enables you to quickly and easily query data stored in various formats (e.g., Parquet, JSON, CSV) without the need for complex data transformation or loading into a separate database.

Integration with AWS Glue:
Athena can be integrated with AWS Glue for data cataloging and schema discovery.

Serverless:
Athena is serverless, which means you don’t need to provision or manage any infrastructure.

Use Cases:
Athena is commonly used for ad hoc data analysis, log analysis, business intelligence, and other scenarios where you need to query large datasets stored in Amazon S3.
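A sketch of running an ad hoc query from code with the boto3 Athena client: Athena queries run asynchronously, so the caller starts the query, polls its state, and then reads the results. The `run_athena_query` helper, the database name, and the S3 output location are assumptions for illustration, and only the first page of results is read.

```python
import time


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Run a SQL query with Athena and return rows as lists of strings (header row first).

    athena: a boto3 Athena client. database: a database in the data catalog.
    output_location: an S3 URI where Athena writes query results.
    Only the first page of results is read; large results need NextToken paging.
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Queries are asynchronous: poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in {status['State']}: "
                           f"{status.get('StateChangeReason', '')}")

    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows]
```

For example, `run_athena_query(boto3.client("athena"), "SELECT status, count(*) AS hits FROM access_logs GROUP BY status", "weblogs", "s3://example-bucket/athena-results/")`, where the `access_logs` table, `weblogs` database, and bucket are hypothetical.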

11
Q

How can we use encryption in transit between an application running on an EC2 instance and an RDS database?

A

Amazon RDS creates an SSL certificate and installs the certificate on the DB instance when Amazon RDS provisions the instance. These certificates are signed by a certificate authority. The SSL certificate includes the DB instance endpoint as the Common Name (CN) for the SSL certificate to guard against spoofing attacks.

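You can download a root certificate from AWS that works for all Regions, or you can download Region-specific intermediate certificates. The database client is then configured to connect over SSL/TLS and to verify the server certificate against the downloaded certificate.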
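On the client side, the downloaded certificate is used to verify the server certificate, and hostname checking is what makes the endpoint name in the certificate protect against spoofing. A minimal sketch with Python's standard `ssl` module, assuming the bundle was saved locally (the helper name and the file name `global-bundle.pem` are assumptions); how the context is passed to a database driver depends on the driver.

```python
import ssl


def rds_tls_context(ca_bundle_path=None):
    """Return an SSLContext that verifies an RDS server certificate.

    ca_bundle_path: local path to the downloaded AWS RDS certificate bundle
    (for example "global-bundle.pem"; the path is an assumption). With None,
    the system's default trust store is used instead.
    """
    context = ssl.create_default_context(cafile=ca_bundle_path)
    # These are already the defaults; they are set explicitly because they are the point:
    # the server certificate must chain to a trusted CA, and the hostname you connect to
    # (the DB instance endpoint) must match the name in the certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    context.check_hostname = True
    return context
```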

12
Q

What is AWS Glue?

A

ETL Service:
AWS Glue is a fully managed Extract, Transform, Load (ETL) service that helps you prepare and transform your data for analytics. It can automatically discover and catalog metadata about your data sources, generate ETL code, and execute ETL jobs.

Use Cases:
AWS Glue is used for data preparation, data integration, and data transformation tasks. It’s especially valuable in scenarios where you need to combine data from multiple sources, clean and normalize it, and make it ready for analysis.

Serverless and Scalable:
Like Athena, AWS Glue is serverless and can automatically scale to handle large volumes of data and complex ETL workloads.

13
Q

What is Amazon Kinesis Data Firehose?

A

Kinesis Data Firehose simplifies ingesting and loading streaming data into AWS services such as Amazon S3, Amazon Redshift, and Amazon OpenSearch Service (formerly Amazon Elasticsearch Service).
It can automatically transform, compress, and encrypt data before loading it into the destination service.
It is a fully managed, serverless service, making it easy to set up data delivery pipelines.
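Firehose collects incoming records into a buffer and delivers them once a size or time threshold (the delivery stream's buffering hints) is reached, optionally compressing the batch. The toy model below, with a hypothetical `DeliveryBuffer` class and thresholds chosen only for illustration, shows that flush-on-size-or-time behaviour with gzip compression; unlike the real service, it only checks the timer when a new record arrives.

```python
import gzip


class DeliveryBuffer:
    """Hypothetical model of Firehose-style buffering: deliver a batch when either the
    size threshold or the time threshold is reached, compressing it with gzip first."""

    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self._records = []
        self._size = 0
        self._opened_at = None  # arrival time of the first record in the current batch

    def add(self, record: bytes, now: float):
        """Buffer one record; return a compressed batch if a threshold is reached, else None."""
        if self._opened_at is None:
            self._opened_at = now
        self._records.append(record)
        self._size += len(record)
        if self._size >= self.max_bytes or now - self._opened_at >= self.max_seconds:
            return self.flush()
        return None

    def flush(self):
        """Concatenate the buffered records and gzip them into one delivery object."""
        if not self._records:
            return None
        # Records are joined without separators, so producers typically add their own
        # delimiter (here each record is assumed to end with a newline).
        payload = b"".join(self._records)
        self._records, self._size, self._opened_at = [], 0, None
        return gzip.compress(payload)
```

With `max_bytes=20`, two records of 6 and 15 bytes trigger a size-based flush on the second record; with a quiet stream, the next record arriving 60 or more seconds after the batch opened triggers a time-based flush instead.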
