Database & Analytics Flashcards

1
Q

QLDB

A

QLDB stands for “Quantum Ledger Database”

  • A ledger is a book recording financial transactions
  • Fully managed, serverless, highly available, replication across 3 AZs
  • Used to review history of all the changes made to your application data over time
  • Immutable system: no entry can be removed or modified, cryptographically verifiable
  • 2-3x better performance than common ledger blockchain frameworks; data is manipulated using SQL
  • Difference with Amazon Managed Blockchain: no decentralization component, in accordance with
    financial regulation rules
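
To make "immutable" and "cryptographically verifiable" concrete, here is a minimal sketch of a hash-chained, append-only journal. It is not QLDB's API or its internal design; the `Ledger` class and its methods are hypothetical names used only to illustrate why tampering with an old entry is detectable.

```python
import hashlib
import json


class Ledger:
    """Hypothetical append-only journal: each entry's hash covers the previous hash."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def _digest(self, prev_hash, data):
        # Serialize deterministically so the same data always hashes the same way.
        payload = json.dumps(data, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

    def append(self, data):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        digest = self._digest(prev_hash, data)
        self.entries.append({"data": data, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self):
        # Recompute every hash from the start; any edited entry breaks the chain.
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["data"]):
                return False
            prev_hash = entry["hash"]
        return True


ledger = Ledger()
ledger.append({"account": "A", "amount": 100})
ledger.append({"account": "A", "amount": -30})
```

Calling `ledger.verify()` succeeds for an untouched journal and fails as soon as any earlier entry's data is changed, which is the property the card describes.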
2
Q

ElastiCache

A

Managed Redis or Memcached

  • In-memory database with high performance
  • Reduces load on databases by caching read-intensive workloads
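
One common way a cache takes load off a database is the cache-aside pattern. The sketch below assumes a plain dictionary as a stand-in for Redis or Memcached and a hypothetical `load_from_db` callable; it does not use any ElastiCache or Redis client API.

```python
import time


class CacheAside:
    """Hypothetical cache-aside helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db   # assumed slow database lookup
        self._ttl = ttl_seconds
        self._store = {}            # key -> (expires_at, value); stand-in for Redis/Memcached
        self.db_reads = 0           # counts how often the database was actually hit

    def get(self, key):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # cache hit: the database is not touched
        self.db_reads += 1          # cache miss: read from the database once
        value = self._load(key)
        self._store[key] = (now + self._ttl, value)
        return value


cache = CacheAside(lambda key: key.upper())
first = cache.get("user:1")
second = cache.get("user:1")
```

After the two reads above, `cache.db_reads` is 1: the second read was served from memory.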
3
Q

EMR

A

EMR stands for “Elastic MapReduce”

  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • Also supports Apache Spark, HBase, Presto, Flink…
  • EMR takes care of all the provisioning and configuration
  • Auto-scaling and integrated with Spot instances
  • Use cases: data processing, machine learning, web indexing, big data…
4
Q

DocumentDB

A

DocumentDB is AWS's MongoDB-compatible database (MongoDB is a NoSQL database)

  • MongoDB is used to store, query, and index JSON data
  • Similar “deployment concepts” as Aurora
  • Fully Managed, highly available with replication across 3 AZ
  • DocumentDB storage automatically grows in increments of 10GB
  • Automatically scales to workloads with millions of requests per second
5
Q

DynamoDB

A

Key value database

  • Fully managed, highly available with replication across 3 AZ
  • NoSQL database - not a relational database
  • Scales to massive workloads, distributed “serverless” database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent in performance
  • Single-digit millisecond latency – low latency retrieval
  • Integrated with IAM for security, authorization and administration
  • Low cost and auto scaling capabilities
  • Table classes: Standard and Infrequent Access (IA)
6
Q

Redshift

A

Redshift is based on PostgreSQL, but it’s not used for OLTP

  • It’s OLAP – online analytical processing (analytics and data warehousing)
  • Load data once every hour, not every second
  • 10x better performance than other data warehouses, scale to PBs of data
  • Columnar storage of data (instead of row based)
  • Massively Parallel Query Execution (MPP), highly available
  • Pay as you go based on the instances provisioned
  • Has a SQL interface for performing the queries
  • BI tools such as Amazon QuickSight or Tableau integrate with it
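
To illustrate why columnar storage suits analytics, the sketch below holds the same made-up records in a row layout and in a column layout. It is a toy model, not Redshift's storage format; `to_columns` and the sample data are hypothetical.

```python
# Hypothetical sample data in a row-oriented layout (one dict per record).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_amount_row_store(records):
    # Every full record is visited even though only one field is needed.
    return sum(record["amount"] for record in records)


def total_amount_column_store(columns):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(columns["amount"])


columns = to_columns(rows)
```

Both functions return the same total, but the column version reads a single list, which is the kind of access an aggregate query over one field needs.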
7
Q

Athena

A

Serverless query service to analyze data stored in Amazon S3

  • Uses standard SQL language to query the files
  • Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
  • Pricing: $5.00 per TB of data scanned
  • Use compressed or columnar data for cost-savings (less scan)
  • Use cases: Business intelligence / analytics / reporting, analyze &
    query VPC Flow Logs, ELB Logs, CloudTrail trails, etc…
  • Exam Tip: analyze data in S3 using serverless SQL, use Athena
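
The per-TB pricing above makes the cost benefit of scanning less data easy to work out. This sketch uses the $5.00 per TB figure from the card; the binary definition of a terabyte and the absence of any rounding or per-query minimum are simplifying assumptions, and `scan_cost_usd` is a hypothetical helper.

```python
PRICE_PER_TB_USD = 5.00    # price per TB scanned, taken from the card above
BYTES_PER_TB = 1024 ** 4   # assumption: a TB is counted as 2**40 bytes


def scan_cost_usd(bytes_scanned):
    """Estimated query cost, ignoring rounding and any per-query minimum."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# A query that scans a full 1 TB of raw data.
raw_cost = scan_cost_usd(BYTES_PER_TB)

# The same query when compression or a columnar layout means only a quarter is scanned.
reduced_cost = scan_cost_usd(BYTES_PER_TB // 4)
```

Under these assumptions the raw scan costs $5.00 and the reduced scan costs $1.25.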
8
Q

QuickSight

A

Serverless machine learning-powered business intelligence service to create interactive dashboards

  • Fast, automatically scalable, embeddable, with per-session pricing
  • Use cases:
    • Business analytics
    • Building visualizations
    • Perform ad-hoc analysis
    • Get business insights using data
  • Integrated with RDS, Aurora, Athena, Redshift, S3…
9
Q

AMB

A

Amazon Managed Blockchain

  • Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
  • Amazon Managed Blockchain is a managed service to:
    • Join public blockchain networks
    • Or create your own scalable private network
  • Compatible with the frameworks Hyperledger Fabric & Ethereum
10
Q

Glue

A

Managed extract, transform, and load (ETL) service

  • Useful to prepare and transform data for analytics
  • Fully serverless service
  • [Diagram: S3 Bucket / Amazon RDS → Extract → Glue ETL (Transform) → Load]
  • Glue Data Catalog: catalog of datasets
    • Can be used by Athena, Redshift, EMR
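
The extract, transform, load steps can be sketched in a few lines of plain Python. This is not Glue's API: the inline CSV, the `sales` table, and the in-memory SQLite target are all hypothetical stand-ins for a real source and destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent spacing and casing.
RAW_CSV = "id,name,amount\n1, alice ,10.5\n2,BOB,3\n"


def extract(text):
    """Extract: parse the raw CSV into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: cast types and normalize names."""
    return [
        (int(r["id"]), r["name"].strip().lower(), float(r["amount"]))
        for r in records
    ]


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, amount FROM sales ORDER BY id").fetchall()
```

After the pipeline runs, `loaded` holds the cleaned rows, ready for analytics queries.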

11
Q

DMS

A

Database Migration Service

  • Quickly and securely migrate databases to AWS; resilient and self-healing
  • The source database remains available during the migration
  • Supports:
    • Homogeneous migrations, e.g. Oracle to Oracle
    • Heterogeneous migrations, e.g. Microsoft SQL Server to Aurora

12
Q

Neptune

A

Fully managed graph database

  • A popular graph dataset would be a social network
    • Users have friends
    • Posts have comments
    • Comments have likes from users
    • Users share and like posts…
  • Highly available across 3 AZ, with up to 15 read replicas
  • Build and run applications working with highly connected
    datasets – optimized for these complex and hard queries
  • Can store up to billions of relations and query the graph with millisecond latency
  • Highly available with replication across multiple AZs
  • Great for knowledge graphs (Wikipedia), fraud detection,
    recommendation engines, social networking
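
The social network example maps naturally onto a graph of users connected by friendships. The sketch below is a toy in-memory graph, not Neptune or any of its query languages; `SocialGraph` and its methods are hypothetical names, and the "friends of friends" query stands in for the kind of relationship traversal a recommendation engine might run.

```python
from collections import defaultdict


class SocialGraph:
    """Hypothetical undirected friendship graph stored as adjacency sets."""

    def __init__(self):
        self._friends = defaultdict(set)

    def add_friendship(self, a, b):
        # Friendship is mutual, so store the edge in both directions.
        self._friends[a].add(b)
        self._friends[b].add(a)

    def friends_of_friends(self, user):
        """Users two hops away who are not already direct friends."""
        direct = self._friends.get(user, set())
        two_hops = set()
        for friend in direct:
            two_hops |= self._friends[friend]
        return two_hops - direct - {user}


graph = SocialGraph()
graph.add_friendship("alice", "bob")
graph.add_friendship("bob", "carol")
graph.add_friendship("carol", "dave")
suggestions = graph.friends_of_friends("alice")
```

Here `suggestions` contains only `carol`: she is reachable through `bob`, while `dave` is three hops away.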
13
Q

Timestream

A

Fully managed, fast, scalable, serverless time series database

  • Automatically scales up/down to adjust capacity
  • Store and analyze trillions of events per day
  • 1,000s of times faster and 1/10th the cost of relational databases
  • Built-in time series analytics functions (helps you identify patterns in your data in near real-time)
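
A typical time series analytics operation is grouping readings into fixed time windows and averaging each window. The sketch below shows that idea in plain Python; it is not Timestream's query syntax, and `bin_average` and the sample readings are hypothetical.

```python
from collections import defaultdict


def bin_average(points, window_seconds):
    """Average (timestamp, value) readings within fixed-size time windows."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of its window.
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


# Hypothetical sensor readings: (seconds since start, measured value).
readings = [(0, 1.0), (30, 3.0), (60, 10.0)]
per_minute = bin_average(readings, 60)
```

With these readings, the first minute averages to 2.0 and the second to 10.0.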
14
Q

RDS

A

Relational Database Service

Managed database using SQL

  • PostgreSQL
  • MySQL
  • MariaDB
  • Oracle
  • Microsoft SQL Server
  • Aurora

15
Q

Aurora

A

  • Proprietary technology
  • Supports PostgreSQL and MySQL
  • Cloud optimized
  • 5x performance improvement over MySQL on RDS, 3x over PostgreSQL on RDS

16
Q

Three ways to scale RDS

A
  • Read replicas
  • Multi-AZ
  • Multi-region