Database & Analytics Flashcards by Gerardo Barboza

Databases

A database is an organized collection of structured information, or data, typically stored electronically in a computer system.

• You build indexes to efficiently query / search through the data
• You define relationships between your datasets
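
As a rough illustration of the indexing point above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any database engine. The table `users` and the index `idx_users_email` are hypothetical names chosen for this example.

```python
import sqlite3

# In-memory database; table and index names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [("ana@example.com", "Ana"), ("luis@example.com", "Luis")],
)

# The index lets the engine look up rows by email without scanning the table.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# Ask the engine how it plans to run the lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM users WHERE email = ?",
    ("ana@example.com",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

name = conn.execute(
    "SELECT name FROM users WHERE email = ?", ("ana@example.com",)
).fetchone()[0]
print(plan_text)
print(name)
```

The query plan text should mention the index name, which is how you can tell the lookup is using it.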

Relational Databases

A relational database is a collection of information that organizes data in predefined relationships, where data is stored in one or more tables of columns and rows.

• Can use the SQL language to perform queries / lookups
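
A small sketch of what "tables with predefined relationships, queried with SQL" can look like, again using Python's built-in sqlite3 module. The `students` and `enrollments` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- enrollments.student_id points at students.id: the predefined relationship
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(id),
    course TEXT NOT NULL
);
INSERT INTO students VALUES (1, 'Ana'), (2, 'Luis');
INSERT INTO enrollments VALUES (1, 'Databases'), (1, 'Networks'), (2, 'Databases');
""")

# A SQL lookup that follows the relationship between the two tables.
rows = conn.execute("""
    SELECT s.name, e.course
    FROM students AS s
    JOIN enrollments AS e ON e.student_id = s.id
    WHERE e.course = 'Databases'
    ORDER BY s.name
""").fetchall()
print(rows)
```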

NoSQL Databases

• NoSQL databases are purpose built for specific data models and have flexible schemas for building modern applications
• Benefits: Flexibility, Scalability, High-performance, Highly functional
• Examples: Key-value, document, graph, in-memory, search databases

NoSQL data example: JSON

• JSON = JavaScript Object Notation
• JSON is a common form of data that fits into a NoSQL model
• Data can be nested
• Fields can change over time
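
To make the last two bullets concrete, here is a minimal sketch with Python's built-in json module. The records are invented for this example: the first has a nested `address`, the second has no address but adds a `hobbies` field.

```python
import json

raw = """
[
  {"name": "John", "age": 30, "address": {"city": "Lima", "zip": "15001"}},
  {"name": "Maria", "age": 27, "hobbies": ["chess", "hiking"]}
]
"""

people = json.loads(raw)

# Nested data: reach into the inner object.
city = people[0]["address"]["city"]

# Fields can differ between records, so read optional ones defensively.
hobbies_per_person = [p.get("hobbies", []) for p in people]

print(city)
print(hobbies_per_person)
```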

Databases & Shared Responsibility on AWS

• AWS offers to manage different databases for you
• Benefits include:
• Quick Provisioning, High Availability, Vertical and Horizontal Scaling
• Automated Backup & Restore, Operations, Upgrades
• Operating System Patching is handled by AWS
• Monitoring, alerting

AWS RDS

• RDS stands for Relational Database Service
• It’s a managed DB service for databases that use SQL as a query language.
• It allows you to create databases in the cloud that are managed by AWS, with these engines:
• Postgres
• MySQL
• MariaDB
• Oracle
• Microsoft SQL Server
• Aurora (AWS Proprietary database)

Advantages of using RDS versus deploying a DB on EC2

• Automated provisioning, OS patching
• Continuous backups and restore to a specific timestamp (Point in Time Restore)
• Monitoring dashboards
• Read replicas for improved read performance
• Multi AZ setup for DR (Disaster Recovery)
• Maintenance windows for upgrades
• Scaling capability (vertical and horizontal)
• Storage backed by EBS (gp2 or io1)

Amazon Aurora

• Aurora is a proprietary technology from AWS (not open sourced)
• PostgreSQL and MySQL are both supported as Aurora DB engines
• Aurora is “AWS cloud optimized”, better performance than RDS
• Aurora storage automatically grows in increments of 10GB, up to 64 TB.
• Aurora costs more than RDS (20% more) – but is more efficient

RDS Deployments: Read Replicas, Multi-AZ

• Read Replicas:
• Scale the read workload of your DB
• Can create up to 5 Read Replicas
• Data is only written to the main DB

• Multi-AZ:
• Failover in case of AZ outage (high availability)
• Data is only read/written to the main database
• Can only have 1 other AZ as failover
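
The read replica idea (writes go only to the main DB, reads are spread across replicas) can be sketched as a small router. This is an illustration under assumptions, not an RDS API: the class `EndpointRouter` and the hostnames are hypothetical, and treating every statement that starts with SELECT as a read is a simplification.

```python
from itertools import cycle


class EndpointRouter:
    """Pick a database endpoint for a SQL statement (illustrative only)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Rotate through replicas round-robin; fall back to primary if none.
        self._replicas = cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes (and reads when there are no replicas) go to the main DB.
        return self.primary


router = EndpointRouter(
    primary="main.db.example.internal",          # hypothetical hostname
    replicas=["replica-1.db.example.internal",   # hypothetical hostnames
              "replica-2.db.example.internal"],
)

print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("SELECT * FROM orders"))
```

With Multi-AZ, by contrast, the application keeps talking to one endpoint; the standby only takes over on failover, so there is nothing to route.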

RDS Deployments: Multi-Region

• Multi-Region (Read Replicas)
• Disaster recovery in case of region issue
• Local performance for global reads
• Replication cost

Amazon ElastiCache

• ElastiCache provides managed Redis or Memcached
• Caches are in-memory databases with high performance, low latency
• Helps reduce the load on databases for read-intensive workloads
• You store the results of frequent queries in the cache, so that they’re very readily available.
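
One common way to use a cache in front of a database is the cache-aside pattern, sketched below. A plain Python dict stands in for Redis or Memcached, and `slow_database_lookup` is a hypothetical placeholder for a real query; neither is an ElastiCache API.

```python
class CacheAside:
    """Check the cache first; only go to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db
        self._cache = {}     # stand-in for an in-memory cache cluster
        self.db_hits = 0     # counts how often the database was queried

    def get(self, key):
        if key in self._cache:
            return self._cache[key]          # cache hit: no database load
        self.db_hits += 1
        value = self._load_from_db(key)      # cache miss: query the database
        self._cache[key] = value             # keep the result for next time
        return value


def slow_database_lookup(user_id):
    # Hypothetical placeholder for an expensive read query.
    return {"user_id": user_id, "name": f"user-{user_id}"}


users = CacheAside(slow_database_lookup)
first = users.get(42)
second = users.get(42)
print(first == second, users.db_hits)
```

A real cache would also need expiry or invalidation so that stale results are not served forever; that part is left out of this sketch.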

DynamoDB

• Fully managed, highly available with replication across 3 AZ
• NoSQL database, serverless
• Automatically scales up and down to adjust for capacity and maintain performance
• Millions of requests per second, 100s of TB of storage
• Single-digit millisecond latency – low latency retrieval

DynamoDB – type of data

• DynamoDB is a key/value database
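
A key/value model can be pictured as "give me the item stored under this key". The sketch below is an in-memory stand-in, not the DynamoDB SDK: the class `KeyValueTable` is hypothetical, and its method names only loosely mirror the idea of putting and getting items.

```python
class KeyValueTable:
    """Tiny in-memory key/value table (illustrative only)."""

    def __init__(self, key_name):
        self.key_name = key_name
        self._items = {}

    def put_item(self, item):
        # The key attribute decides where the item lives; a second put
        # with the same key replaces the previous item.
        self._items[item[self.key_name]] = dict(item)

    def get_item(self, key):
        return self._items.get(key)


users = KeyValueTable(key_name="user_id")
users.put_item({"user_id": "u1", "name": "Ana", "plan": "free"})
# Items in the same table do not need to share the same attributes.
users.put_item({"user_id": "u2", "name": "Luis", "tags": ["beta"]})

print(users.get_item("u1"))
print(users.get_item("missing"))
```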

DynamoDB Accelerator - DAX

• Fully managed in-memory cache for DynamoDB
• 10x performance improvement: from single-digit millisecond latency to microsecond latency
• Secure, highly scalable & highly available

DynamoDB – Global Tables

• Make a DynamoDB table accessible with low latency in multiple Regions
• Active-Active replication (read/write to any AWS Region)

Redshift

• Relational database
• Redshift is based on PostgreSQL, but it’s not used for OLTP
• It’s OLAP – online analytical processing (analytics and data warehousing)
• Columnar storage of data (instead of row based)
• Load data once every hour, not every second
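
To see why columnar storage suits analytics, the sketch below holds the same made-up orders in a row-based layout and in a column-based layout. It is a conceptual illustration in plain Python, not how Redshift is implemented internally.

```python
# Row-based layout: each record is stored together (good for OLTP lookups).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Column-based layout: each column is stored together (good for OLAP).
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column only has to read that column,
# instead of touching every field of every row.
total_amount = sum(columns["amount"])

print(columns["region"])
print(total_amount)
```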

Amazon EMR “Elastic MapReduce”

• EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
• The clusters can be made of hundreds of EC2 instances
• Use cases: data processing, machine learning, web indexing, big data

Amazon Athena

• Serverless query service to analyze data stored in Amazon S3
• Uses standard SQL language
• Use cases: Business intelligence / analytics / reporting, analyze

• Exam Tip: to analyze data in S3 using serverless SQL, use Athena

Amazon QuickSight

• Serverless, machine learning-powered business intelligence service that lets you create dashboards on your databases, so you can visually represent your data and show your business users the insights they’re looking for
• Fast, automatically scalable, embeddable, with per-session pricing
• Use cases: business analytics, building visualizations

DocumentDB

• DocumentDB is to MongoDB what Aurora is to PostgreSQL / MySQL: an AWS-built, MongoDB-compatible database (MongoDB is a NoSQL database)
• MongoDB is used to store, query, and index JSON data
• Fully managed, highly available with replication across 3 AZ
• DocumentDB storage automatically grows in increments of 10GB, up to 64 TB.
• Automatically scales to workloads with millions of requests per second

Amazon Neptune

• Fully managed graph database
• A popular graph dataset would be a social network
• Highly available across 3 AZ, with up to 15 read replicas
• Build and run applications working with highly connected datasets
• Can store up to billions of relations
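
A "highly connected dataset" such as a social network is naturally a graph: people are nodes and friendships are edges. The sketch below uses a plain Python dict as the graph and answers a typical graph question (friends of friends). The names are made up, and this is not Neptune's query interface.

```python
# Adjacency sets: each person maps to the set of their direct friends.
friends = {
    "ana": {"luis", "eva"},
    "luis": {"ana", "omar"},
    "eva": {"ana", "omar"},
    "omar": {"luis", "eva"},
}


def friends_of_friends(graph, person):
    """People two hops away who are not already direct friends."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {person}


print(friends_of_friends(friends, "ana"))
```

In a relational database the same question needs a self-join per hop, which is part of why graph databases exist for this kind of data.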

Amazon QLDB

• QLDB stands for “Quantum Ledger Database”
• Centralized: the ledger is owned by a single trusted authority
• A ledger is a book recording financial transactions
• Fully managed, serverless, highly available, replication across 3 AZ
• Used to review history of all the changes made to your application data over time
• NoSQL
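
The "review history of all the changes" idea can be sketched as an append-only log where nothing is updated in place. The `Ledger` class below is a hypothetical illustration, not the QLDB API; chaining each entry to the previous one with a SHA-256 hash is an assumption added here to show how tampering with history could be detected.

```python
import hashlib
import json


class Ledger:
    """Append-only history of changes, hash-chained (illustrative only)."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash, doc_id, data):
        payload = json.dumps({"doc_id": doc_id, "data": data}, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

    def append(self, doc_id, data):
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        self._entries.append({
            "doc_id": doc_id,
            "data": data,
            "prev": prev_hash,
            "hash": self._digest(prev_hash, doc_id, data),
        })

    def history(self, doc_id):
        # Every past revision is still there; nothing was overwritten.
        return [e["data"] for e in self._entries if e["doc_id"] == doc_id]

    def verify(self):
        prev_hash = ""
        for e in self._entries:
            expected = self._digest(prev_hash, e["doc_id"], e["data"])
            if e["prev"] != prev_hash or e["hash"] != expected:
                return False
            prev_hash = e["hash"]
        return True


ledger = Ledger()
ledger.append("account-1", {"balance": 100})
ledger.append("account-1", {"balance": 70})
print(ledger.history("account-1"))
print(ledger.verify())
```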

Amazon Managed Blockchain

• Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority
• Amazon Managed Blockchain is a managed service to:
• Join public blockchain networks
• Or create your own scalable private network

AWS Glue

• Fully serverless service
• Managed extract, transform, and load (ETL) service
• Useful to prepare and transform data for analytics

• Glue Data Catalog: catalog of datasets
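
The three ETL steps can be sketched in plain Python. This only illustrates the extract, transform, load idea; it does not use the Glue API, and the CSV data and function names are made up for the example.

```python
import csv
import io
import json

RAW_CSV = """order_id,amount,country
1, 19.99 ,us
2,,mx
3,5.00,MX
"""


def extract(text):
    """Extract: read raw records from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop incomplete rows, fix types, normalize values."""
    cleaned = []
    for record in records:
        amount = record["amount"].strip()
        if not amount:
            continue  # skip rows with no amount
        cleaned.append({
            "order_id": int(record["order_id"]),
            "amount": float(amount),
            "country": record["country"].strip().upper(),
        })
    return cleaned


def load(records):
    """Load: write analytics-ready output (here, JSON lines in a string)."""
    return "\n".join(json.dumps(r) for r in records)


prepared = transform(extract(RAW_CSV))
print(load(prepared))
```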

DMS – Database Migration Service

• Quickly and securely migrate databases to AWS, resilient, self healing
• The source database remains available during the migration

Difference between relational and non-relational DBs

The difference between DynamoDB and, say, RDS is that DynamoDB will have all the data living within one single table, and there's no way to join it with another table.
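
A rough way to picture this difference in plain Python: in the relational style, related data lives in separate tables and is combined with a join at query time; in the single-table style, everything needed for a lookup is stored together in one item. The data and names below are invented for illustration.

```python
# Relational style: two tables, combined by a join on customer_id.
customers = [{"customer_id": "c1", "name": "Ana"}]
orders = [
    {"order_id": "o1", "customer_id": "c1", "total": 30},
    {"order_id": "o2", "customer_id": "c1", "total": 12},
]


def join_customer_orders(customer_id):
    """What a JOIN does conceptually: match rows across two tables."""
    return [
        (c["name"], o["order_id"], o["total"])
        for c in customers
        for o in orders
        if c["customer_id"] == customer_id and o["customer_id"] == customer_id
    ]


# Single-table style: no joins, so the item carries its related data.
table = {
    "c1": {
        "name": "Ana",
        "orders": [
            {"order_id": "o1", "total": 30},
            {"order_id": "o2", "total": 12},
        ],
    }
}

joined_total = sum(total for _, _, total in join_customer_orders("c1"))
item_total = sum(o["total"] for o in table["c1"]["orders"])
print(joined_total, item_total)
```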
