SQL in AWS: Amazon RDS, Glue, and Redshift Flashcards

Question

What is Redshift’s concurrency scaling?

Answer 1

Concurrency scaling in Redshift adds cluster capacity dynamically to handle query spikes, maintaining performance under load. MySQL manages concurrency via connection pooling, but Redshift’s scaling is more robust, adapting to analytical workloads MySQL isn’t designed for. Example of enabling concurrency scaling aws redshift modify-cluster --cluster-identifier mycluster --concurrency-scaling-mode auto

Answer 2

RDS monitoring includes CloudWatch metrics, Enhanced Monitoring, and Performance Insights, tracking CPU, memory, and query performance. MySQL’s SHOW STATUS provides basic monitoring, but RDS integrates with AWS tools for richer, real-time insights into MySQL performance. Example of describing DB instances aws rds describe-db-instances --db-instance-identifier mydbinstance

Answer 3

Glue connections securely link to data sources like MySQL, storing credentials and network details for easy access in ETL jobs. MySQL connections are managed manually or via drivers. Glue centralizes this, enhancing security and usability for MySQL and beyond. Example of creating a Glue connection aws glue create-connection --connection-input Name=mysql-connection,ConnectionType=JDBC,ConnectionProperties={"JDBC_CONNECTION_URL":"jdbc:mysql://host:3306/db","USERNAME":"user","PASSWORD":"pass"}

Answer 4

AQUA (Advanced Query Accelerator) in Redshift uses hardware acceleration to speed up queries by offloading compute tasks. MySQL’s query caching is less advanced. AQUA provides a significant performance boost for Redshift’s analytical queries, unavailable in MySQL. Example of enabling AQUA aws redshift modify-cluster --cluster-identifier mycluster --aqua-configuration-status enabled

Answer 5

RDS supports minor (automatic) and major (manual) MySQL version upgrades, with major upgrades requiring downtime planning. Self-managed MySQL upgrades are manual and complex. RDS streamlines this, reducing risk and effort for MySQL version management. Example of upgrading an RDS instance aws rds modify-db-instance --db-instance-identifier mydbinstance --engine-version 8.0.23 --apply-immediately

Answer 6

Glue’s data lineage tracks data flow from source to target, providing visibility into transformations and dependencies in ETL processes. MySQL lacks native lineage tracking, requiring manual documentation. Glue enhances data governance for MySQL-sourced ETL workflows. Example of viewing lineage (conceptual) aws glue get-dataflow-graph --workflow-name myworkflow

Answer 7

Redshift’s sort keys order data on disk, while distribution keys spread it across nodes, both optimizing query performance in a distributed system. MySQL uses indexes but lacks distribution keys. Redshift’s keys are critical for analytics, unlike MySQL’s transactional design. -- Example of creating a table with sort and distribution keys CREATE TABLE customers ( customer_id INT, name VARCHAR(100), region VARCHAR(50) ) DISTKEY(region) SORTKEY(customer_id);

Answer 8

RDS encryption at rest uses AWS KMS and can be enabled at creation or later (with downtime). It also supports SSL for data in transit. MySQL supports encryption, but RDS integrates with KMS for easier key management, enhancing security over MySQL’s manual setup. Example of creating an encrypted RDS instance aws rds create-db-instance --db-instance-identifier mydbinstance --engine mysql --storage-encrypted --kms-key-id my-key

Answer 9

Glue’s data quality features let you define rules to monitor and ensure data reliability in ETL jobs, catching issues like inconsistencies. MySQL requires custom validation queries. Glue’s built-in checks provide a more automated, scalable approach for data quality. Example of using data quality in Glue (conceptual) dq_checks = DataQualityChecks.apply(frame=datasource0, ruleset="myruleset")

Answer 10

Redshift compiles SQL into optimized machine code, accelerating complex query execution, especially for analytics. MySQL’s query optimizer lacks this compilation step. Redshift’s approach offers a performance edge for large-scale queries over MySQL. -- Example of a complex query in Redshift SELECT a.*, b.total_sales FROM customers a JOIN ( SELECT customer_id, SUM(amount) as total_sales FROM orders GROUP BY customer_id ) b ON a.customer_id = b.customer_id;

Answer 11

RDS event notifications send alerts via SNS for events like backups or failovers, configurable per instance or category. MySQL lacks built-in event notifications. RDS integrates with AWS SNS, automating monitoring beyond MySQL’s capabilities. Example of subscribing to RDS events aws rds create-event-subscription --subscription-name myeventsub --sns-topic-arn arn:aws:sns:us-west-2:123456789012:mytopic --source-type db-instance --event-categories "availability,failure"

Answer 12

Glue integrates with Git for version control of ETL jobs, enabling collaboration and code management. MySQL scripts lack native version control. Glue’s Git support streamlines ETL development, complementing MySQL data workflows. Example of cloning a Glue job repository (conceptual) git clone https://git-codecommit.us-west-2.amazonaws.com/v1/repos/mygluejobs

Answer 13

Redshift’s data sharing allows live data access across clusters without copying, facilitating real-time analytics collaboration. MySQL requires replication or exports for sharing. Redshift’s approach is more efficient for large-scale, distributed data. -- Example of creating a datashare in Redshift CREATE DATASHARE myshare; ALTER DATASHARE myshare ADD SCHEMA public; GRANT USAGE ON DATASHARE myshare TO ACCOUNT '123456789012';

Answer 14

RDS achieves high availability with Multi-AZ deployments, automatically failing over to a standby instance in another AZ during outages. MySQL’s master-slave replication is manual. RDS automates this with synchronous replication, enhancing reliability over MySQL. Example of checking Multi-AZ status aws rds describe-db-instances --db-instance-identifier mydbinstance --query 'DBInstances[0].MultiAZ'

Answer 15

Glue uses Apache Spark for distributed data processing, enabling scalable ETL jobs with SQL or Python in a serverless environment. MySQL isn’t built for distributed processing. Glue’s Spark integration handles large-scale data transformations, complementing MySQL’s role. Example of using Spark in Glue spark_df = glueContext.create_dynamic_frame.from_catalog(database="mydatabase", table_name="mytable").toDF() spark_df.show()

Answer 16

Redshift automatically adjusts sort and distribution keys based on query patterns, optimizing performance without manual tuning. MySQL requires manual index management. Redshift’s automation reduces effort, enhancing analytics efficiency over MySQL. -- Example of enabling automatic table optimization ALTER TABLE mytable SET AUTO REFRESH = YES;

Answer 17

RDS supports manual snapshots and automated backups, useful for backups, cloning, or cross-account sharing, with point-in-time recovery options. MySQL uses mysqldump or binary logs for backups. RDS snapshots offer a managed, scalable alternative with finer recovery granularity. Example of creating a manual snapshot aws rds create-db-snapshot --db-instance-identifier mydbinstance --db-snapshot-identifier mysnapshot

Answer 18

Glue’s library includes pre-built transforms like DropFields or Join, simplifying ETL job creation without extensive coding. MySQL requires custom SQL for transformations. Glue’s library accelerates development, integrating with MySQL-sourced data. Example of using a Glue transform transformed_data = DropFields.apply(frame=datasource0, paths=["unwanted_column"])

Answer 19

Redshift’s query monitoring uses tables like STL_QUERY to track performance, helping identify bottlenecks in analytical queries. MySQL’s EXPLAIN analyzes queries, but Redshift’s tools are tailored for distributed systems, offering deeper insights than MySQL. -- Example of checking query performance in Redshift SELECT query, starttime, endtime, elapsed FROM stl_query ORDER BY starttime DESC LIMIT 10;

Answer 20

RDS supports cross-region read replicas for disaster recovery and low-latency access, replicating data across AWS regions. MySQL replication can span regions manually, but RDS automates this, simplifying setup and management over MySQL’s approach. Example of creating a cross-region read replica aws rds create-db-instance-read-replica --db-instance-identifier mycrossregionreplica --source-db-instance-identifier arn:aws:rds:us-west-2:123456789012:db:mydbinstance --region us-east-1

Answer 21

Glue integrates with Lake Formation for access control and governance, using the Data Catalog to manage permissions across data lakes. MySQL uses user privileges for access. Lake Formation extends this to centralized governance, enhancing MySQL data security in AWS. Example of granting permissions in Lake Formation aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::123456789012:role/MyRole --permissions "SELECT" --resource '{ "Table": { "DatabaseName": "mydatabase", "Name": "mytable" } }'

Answer 22

Redshift automatically runs VACUUM to reclaim space and sort data, maintaining performance without manual intervention. MySQL’s OPTIMIZE TABLE is manual. Redshift’s automation reduces maintenance overhead, unlike MySQL’s hands-on approach. -- Example of manually running VACUUM in Redshift VACUUM FULL mytable;

Answer 23

RDS Proxy pools database connections, improving scalability and reducing failover times for applications connecting to RDS. MySQL connection pooling is app-managed. RDS Proxy offers a serverless, integrated solution, enhancing MySQL performance in AWS. Example of creating an RDS Proxy aws rds create-db-proxy --db-proxy-name myproxy --engine-family MYSQL --auth '[{"AuthScheme": "SECRETS","SecretArn": "arn:aws:secretsmanager:us-west-2:123456789012:secret:mysecret","IAMAuth": "DISABLED"}]' --role-arn arn:aws:iam::123456789012:role/MyProxyRole --vpc-subnet-ids subnet-12345678

Answer 24

Glue processes streaming data from Kinesis or Kafka, enabling real-time ETL with SQL transformations in a managed environment. MySQL isn’t designed for streaming. Glue’s streaming support extends MySQL data into real-time analytics workflows. Example of a Glue streaming job (conceptual) streaming_source = glueContext.create_dynamic_frame.from_catalog(database="mydatabase", table_name="mystream")

Answer 25

Redshift integrates with SageMaker, allowing ML models to run directly on warehouse data for predictive analytics. MySQL lacks native ML support. Redshift’s integration offers advanced analytics capabilities unavailable in MySQL. -- Example of using ML in Redshift (conceptual) SELECT predict_customer_churn(customer_id) FROM customers;

Answer 26

Redshift cost optimization involves using reserved nodes, pausing clusters when idle, and enabling concurrency scaling only as needed. MySQL costs depend on server size, managed manually. Redshift’s pricing requires strategic management, unlike MySQL’s simpler model. Example of pausing a Redshift cluster aws redshift pause-cluster --cluster-identifier mycluster

SQL in AWS: Amazon RDS, Glue, and Redshift Flashcards

(50 cards)