Databases and Analytics Flashcards
Amazon Relational Database Service (RDS)
- RDS uses EC2 instances, so you must choose an instance family/type
- Relational databases are known as Structured Query Language (SQL) databases
- RDS is an Online Transaction Processing (OLTP) type of database
- Easy to set up, highly available, fault tolerant, and scalable
RDS Encryption
- Can encrypt your Amazon RDS instances and snapshots at rest
- Encryption uses AWS Key Management Service (KMS)
RDS supported DB engines?
SQL Server, Oracle, MySQL, PostgreSQL, Aurora, MariaDB
RDS scaling measures and DR?
- Scales up by increasing instance size (compute and storage)
- Read replicas option for read heavy workloads (scales out for reads/queries only)
- Disaster recovery with Multi-AZ option
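A minimal sketch of how an application might take advantage of read replicas: writes go to the primary endpoint, reads are spread across replicas in round-robin order. The `ReadWriteRouter` class and the endpoint names are hypothetical and only illustrate the idea; this is not an RDS API.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the primary endpoint and reads to replicas (round-robin)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no read replicas are configured.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Simplifying assumption: only SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


# Hypothetical endpoint names, for illustration only.
router = ReadWriteRouter(
    primary="mydb.example.internal",
    replicas=["mydb-replica-1.example.internal", "mydb-replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```

Because replicas serve reads only, every statement that modifies data still lands on the primary, which is why read replicas scale out queries but not writes.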
Amazon Aurora
- Amazon Aurora is an AWS database offering in the RDS family
- Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud
- Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128 TB per database instance
Amazon DynamoDB
- Fully managed NoSQL database service
- Key/value store and document store
- It is a non-relational, key-value type of database
- Fully serverless service
- Push button scaling
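To make the key/value and document model concrete, the sketch below converts a plain Python dictionary into the typed JSON form that DynamoDB's low-level API uses for item attributes (`S` for strings, `N` for numbers, `M` for maps, and so on). The helper name `to_attribute_value` and the sample item with a `user_id` key are assumptions for illustration.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a plain Python value to DynamoDB's typed JSON representation."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical item: "user_id" acts as the key, the rest is a nested document.
item = {"user_id": "u-123", "age": 30, "tags": ["a", "b"], "address": {"city": "Oslo"}}
typed_item = {name: to_attribute_value(v) for name, v in item.items()}
print(typed_item)
```

Only the key attribute has to be present on every item; the remaining attributes can differ from item to item, which is what the flexible schema refers to.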
Amazon DynamoDB features and benefits
Serverless - Fully managed, fault-tolerant service
Highly available - 99.99% availability SLA – 99.999% for Global Tables
NoSQL type of database with Name / Value structure - Flexible schema, good for when data is not well structured or unpredictable
Horizontal scaling - Seamless scalability to any scale with push button scaling or Auto Scaling
DynamoDB Accelerator (DAX) - Fully managed in-memory cache for DynamoDB that increases performance (microsecond latency)
Backup - Point-in-time recovery down to the second within the last 35 days; on-demand backup and restore
Global Tables - Fully managed multi-region, multi-master solution
Amazon Redshift
- Redshift is a SQL based data warehouse used for analytics applications
- Redshift is a relational database that is used for Online Analytical Processing (OLAP) use cases
- Redshift uses Amazon EC2 instances, so you must choose an instance family/type
- Redshift always keeps three copies of your data
- Redshift provides continuous/incremental backups
Amazon EMR
- Managed cluster platform that simplifies running big data frameworks including Apache Hadoop and Apache Spark
- Used for processing data for analytics and business intelligence
- Can also be used for transforming and moving large amounts of data
- Performs extract, transform, and load (ETL) functions
Amazon ElastiCache
- Fully managed implementations of Redis and Memcached
- ElastiCache is a key/value store
- In-memory database offering high performance and low latency
- Can be put in front of databases such as RDS and DynamoDB
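Putting a cache in front of a database is commonly done with the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result with a time-to-live. The sketch below uses an in-process dictionary as a stand-in for Redis or Memcached; the `CacheAside` class and the `load_from_db` callback are hypothetical names, not part of ElastiCache.

```python
import time


class CacheAside:
    """Cache-aside lookups with a TTL; a dict stands in for Redis/Memcached."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # cache hit: no database round trip
        value = self._load(key)  # cache miss or expired entry: read the database
        self._cache[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call after a write so the next read fetches fresh data.
        self._cache.pop(key, None)


db_reads = []


def load_user(user_id):
    # Hypothetical database lookup; records each call so misses are visible.
    db_reads.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


users = CacheAside(load_user, ttl_seconds=30)
users.get(1)
users.get(1)
print(len(db_reads))  # the second get is served from the cache
```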
Amazon Athena
- Athena queries data in S3 using SQL
- Can be connected to other data sources with Lambda
- Data can be in CSV, TSV, JSON, Parquet and ORC formats
- Uses a managed Data Catalog (AWS Glue) to store
information and schemas about the databases and tables
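As a rough sketch of querying S3 data with SQL, the snippet below builds the request parameters for the boto3 Athena client's `start_query_execution` call. The database name, table name, and results bucket are placeholders, and `build_athena_query` and `run_query` are hypothetical helpers; actually running the query assumes boto3 is installed and AWS credentials are configured.

```python
def build_athena_query(database, table, output_location, limit=10):
    """Build keyword arguments for the Athena start_query_execution call."""
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(params):
    # Imported lazily so the sketch loads without the AWS SDK installed.
    import boto3

    athena = boto3.client("athena")
    return athena.start_query_execution(**params)["QueryExecutionId"]


# Placeholder names, for illustration only.
params = build_athena_query("weblogs_db", "access_logs", "s3://example-results-bucket/athena/")
print(params["QueryString"])
```

The table referenced in the query is defined in the Data Catalog, which is how Athena knows the schema and S3 location of the underlying files.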
AWS Glue
- Fully managed extract, transform and load (ETL) service
- Used for preparing data for analytics
- AWS Glue runs the ETL jobs on a fully managed, scale-out Apache Spark environment
- Works with data lakes (e.g. data on S3), data warehouses (including Redshift), and data stores (including RDS or EC2 databases)
Amazon Kinesis Data Streams
- Producers send data which is stored in shards for 24 hours by default (retention can be extended up to 365 days)
- Consumers process the data and save to another service
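The sketch below illustrates how a record ends up in a particular shard: Kinesis Data Streams hashes each record's partition key with MD5 to a 128-bit integer, and each shard owns a range of that hash space. The function `shard_for` is a hypothetical helper, and it assumes the shards split the hash space into equal contiguous ranges, which need not hold after a stream has been resharded.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields 128-bit values


def shard_for(partition_key, shard_count):
    """Return the index of the shard whose hash range contains the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Assumption: equal-sized, contiguous hash ranges per shard.
    return hash_value * shard_count // HASH_SPACE


for key in ["sensor-1", "sensor-2", "sensor-3"]:
    print(key, "-> shard", shard_for(key, shard_count=4))
```

Records that share a partition key always map to the same shard, so ordering is preserved per key, while a good spread of keys distributes load across shards.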
Amazon Kinesis Data Firehose
- No shards, completely automated and elastically scalable
- Saves data directly to another service such as S3, Splunk, Redshift, or Elasticsearch
Amazon Kinesis Data Analytics
- Provides real-time SQL processing for streaming data
AWS Data Pipeline
- Processes and moves data between different AWS compute and storage services
- Saves results to services including S3, RDS, DynamoDB, and EMR
Amazon QuickSight
- Business intelligence (BI) service
- Create and publish interactive BI dashboards for Machine Learning-powered insights
Amazon Neptune
- Fully managed graph database service
Amazon DocumentDB
- Fully managed document database service (non-relational)
- Supports MongoDB workloads
- Queries and indexes JSON data
Amazon QLDB
- Fully managed ledger database for immutable change history
- Provides cryptographically verifiable transaction logging
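To give a feel for what "cryptographically verifiable" can mean, here is a generic hash-chain sketch: each entry's SHA-256 hash covers the previous entry's hash, so changing any earlier record breaks every hash after it. This illustrates the general technique only and is not QLDB's actual journal format or API; all function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash used before the first entry


def entry_hash(prev_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + body).hexdigest()


def append(ledger, payload):
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"payload": payload, "hash": entry_hash(prev, payload)})


def verify(ledger):
    """Recompute the chain; any edited payload causes a mismatch."""
    prev = GENESIS
    for entry in ledger:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
append(ledger, {"account": "A", "delta": 100})
append(ledger, {"account": "A", "delta": -40})
print(verify(ledger))
```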
Amazon Managed Blockchain
- Fully managed service for joining public and private networks
using Hyperledger Fabric and Ethereum
AWS Migration Hub
- Provides a single location to track the progress of application
migrations across multiple AWS and partner solutions
AWS Database Migration Service (DMS)
- AWS Database Migration Service helps you migrate databases to AWS quickly and securely
- The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database
AWS Server Migration Service (SMS)
- Migrates servers and virtual machines to Amazon EC2
- Agentless service which makes it easier and faster for you to migrate thousands of on-premises workloads to AWS
- Automate, schedule, and track incremental replications of live server volumes