SQL in AWS: Amazon RDS, Glue, and Redshift Flashcards

(50 cards)

1
Q

What is Amazon RDS and how does it support MySQL?

A

Amazon RDS (Relational Database Service) is a managed database service that simplifies the setup, operation, and scaling of relational databases in the cloud. It supports several database engines, including MySQL. With RDS, you can easily create a MySQL instance, manage backups, apply patches, and scale your database without worrying about the underlying infrastructure. RDS handles routine database tasks, allowing you to focus on your application.

RDS provides a fully managed MySQL environment, ensuring compatibility with standard MySQL features while adding cloud-specific benefits like automated backups, point-in-time recovery, and high availability through Multi-AZ deployments. Unlike a self-managed MySQL instance where you’d handle server maintenance, RDS abstracts these complexities, making it ideal for production workloads.

– Example of creating a table in a MySQL RDS instance
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

How do you secure a MySQL database in Amazon RDS?

A

Securing a MySQL database in RDS involves multiple layers: (1) VPC and Security Groups: Place your RDS instance in a Virtual Private Cloud (VPC) and configure security groups to control traffic; (2) IAM Roles: Use AWS Identity and Access Management (IAM) to manage access; (3) Encryption: Enable encryption at rest with AWS KMS and in transit with SSL/TLS; (4) Database Authentication: Use strong passwords or IAM database authentication.

MySQL offers basic security features like user privileges and SSL connections, but RDS enhances these with AWS-specific tools. For instance, while MySQL requires manual SSL configuration, RDS simplifies it with pre-configured certificates and IAM integration, offering a more robust security model.

Example of enabling SSL for MySQL connections
mysql -h myrdsinstance.abcdef123456.us-west-2.rds.amazonaws.com –ssl-ca=rds-ca-2019-root.pem –ssl-mode=VERIFY_IDENTITY

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How do automated backups work in Amazon RDS for MySQL?

A

RDS provides automated backups with daily snapshots taken during a backup window, retained for up to 35 days. It also supports continuous backups for point-in-time recovery to any second within that period. These features ensure data durability and easy recovery.

In a self-managed MySQL setup, backups require tools like mysqldump or custom scripts, often scheduled via cron jobs. RDS eliminates this overhead by automating the process, offering more granular recovery options than MySQL’s native binary log-based recovery.

Example of restoring a MySQL RDS instance to a point in time
aws rds restore-db-instance-to-point-in-time –source-db-instance-identifier mydbinstance –target-db-instance-identifier myrestoredinstance –restore-time 2025-06-20T23:45:00Z

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is AWS Glue and how does it relate to SQL?

A

AWS Glue is a fully managed ETL (Extract, Transform, Load) service that simplifies data preparation for analytics. It can crawl data sources, build a catalog, and generate ETL code. Glue supports SQL for transformations, allowing you to write queries to process data efficiently.

Glue can use MySQL as a data source or target, enabling SQL-based transformations before loading data elsewhere. Unlike MySQL, which focuses on transactional queries, Glue’s SQL capabilities are geared toward batch processing and data integration across AWS services.

Example of a Glue job script using SQL
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, [‘JOB_NAME’])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args[‘JOB_NAME’], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database=”mydatabase”, table_name=”mytable”)
sql_transform = datasource0.sql(“SELECT id, name FROM mytable WHERE age > 18”)
datasink4 = glueContext.write_dynamic_frame.from_options(frame=sql_transform, connection_type=”s3”, connection_options={“path”: “s3://mybucket/output”}, format=”json”)
job.commit()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the Glue Data Catalog and how is it used?

A

The Glue Data Catalog is a centralized metadata repository that stores information about data assets across AWS. It acts as a hive metastore for services like Glue, EMR, and Athena, enabling data discovery and management.

By crawling MySQL databases, Glue populates the Data Catalog, making MySQL tables queryable via SQL in tools like Athena. MySQL itself doesn’t offer a comparable metadata service, requiring manual schema tracking.

Example of creating a Glue crawler for a MySQL database
aws glue create-crawler –name mysql-crawler –role GlueServiceRole –database-name mydatabase –targets “JdbcTargets=[{ConnectionName=mysql-connection, Path=mydatabase/%}]”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is Amazon Redshift and how does it differ from MySQL?

A

Amazon Redshift is a fully managed data warehouse service designed for large-scale analytics. It uses columnar storage and parallel processing, contrasting with MySQL’s row-based, transactional focus. Redshift excels at complex analytical queries over massive datasets.

MySQL is suited for OLTP workloads, while Redshift targets OLAP. Redshift’s SQL dialect, based on PostgreSQL, differs from MySQL’s, offering features like advanced analytics that MySQL lacks natively.

– Example of creating a table in Redshift
CREATE TABLE sales (
sale_id INT,
product VARCHAR(50),
quantity INT,
sale_date DATE
)
DISTKEY(product)
SORTKEY(sale_date);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How do you load data into Amazon Redshift from S3?

A

Redshift uses the COPY command to efficiently load data from S3, supporting formats like CSV or JSON. This bulk ingestion method leverages Redshift’s distributed architecture for speed.

MySQL’s LOAD DATA INFILE serves a similar purpose but operates on a single server. Redshift’s COPY is optimized for parallel loading across nodes, far surpassing MySQL’s capabilities for large datasets.

– Example of loading data from S3 to Redshift
COPY sales FROM ‘s3://mybucket/sales_data.csv’
IAM_ROLE ‘arn:aws:iam::123456789012:role/MyRedshiftRole’
FORMAT AS CSV;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are distribution styles in Redshift and why are they important?

A

Distribution styles (EVEN, KEY, ALL) dictate how data is spread across Redshift nodes. Choosing the right style reduces data movement during queries, boosting performance for joins and aggregations.

MySQL lacks data distribution as it’s not a clustered database. Redshift’s distribution styles are a key optimization absent in MySQL, tailored for distributed analytics.

– Example of creating a table with KEY distribution
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE
)
DISTKEY(customer_id);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are window functions in Redshift and how do they compare to MySQL?

A

Window functions in Redshift, like ROW_NUMBER() or RANK(), compute values across rows without collapsing the result set. They’re powerful for analytics, supported by Redshift’s PostgreSQL-based SQL.

MySQL added window functions in version 8.0, aligning its capabilities with Redshift’s. However, Redshift’s implementation benefits from its distributed architecture, handling larger datasets more efficiently.

– Example of using ROW_NUMBER() in Redshift
SELECT sale_id, product, quantity,
ROW_NUMBER() OVER (PARTITION BY product ORDER BY quantity DESC) as rank
FROM sales;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How can AWS Glue be used with Amazon Redshift?

A

Glue extracts data from sources, transforms it with SQL or Python, and loads it into Redshift. The Glue Data Catalog also supports Redshift Spectrum for querying external data.

Glue can extract from MySQL, but its Redshift integration excels in data warehousing. MySQL lacks native ETL orchestration, making Glue a complementary tool for MySQL-to-Redshift pipelines.

Example of a Glue job loading data into Redshift
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, [‘JOB_NAME’])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args[‘JOB_NAME’], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database=”mydatabase”, table_name=”mytable”)
transformed_data = datasource0.select_fields([“id”, “name”])
glueContext.write_dynamic_frame.from_jdbc_conf(frame=transformed_data,
catalog_connection=”redshift-connection”,
connection_options={“dbtable”: “public.mytable”, “database”: “mydatabase”},
redshift_tmp_dir=args[“TempDir”])
job.commit()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a Multi-AZ deployment in Amazon RDS?

A

Multi-AZ deployments enhance availability by replicating your RDS database to a standby instance in another Availability Zone. If the primary fails, RDS automatically fails over to the standby.

MySQL requires manual replication setup for high availability. RDS automates this with synchronous replication, reducing downtime compared to MySQL’s asynchronous options.

Example of enabling Multi-AZ for an RDS instance
aws rds modify-db-instance –db-instance-identifier mydbinstance –multi-az

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are Glue job bookmarks?

A

Glue job bookmarks track processed data across job runs, enabling incremental processing of new data only. This optimizes ETL jobs handling growing datasets.

MySQL might use timestamps or IDs for incremental updates, but Glue automates this with bookmarks, integrating seamlessly with AWS data sources beyond MySQL’s scope.

Example of enabling job bookmarks in Glue
job = Job(glueContext)
job.init(args[‘JOB_NAME’], args, bookmark=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is Redshift workload management (WLM)?

A

WLM in Redshift manages query concurrency and memory by defining queues, prioritizing workloads to optimize cluster performance.

MySQL uses connection limits and timeouts for concurrency, but Redshift’s WLM offers finer control, critical for managing diverse analytical workloads unlike MySQL’s transactional focus.

– Example of setting up a WLM queue in Redshift
ALTER WLM CONFIGURATION
SET query_concurrency = 5
FOR queue_name = ‘default’;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are RDS parameter groups?

A

Parameter groups in RDS manage database engine settings, allowing customization for performance or features. You can create and apply custom groups to instances.

MySQL uses my.cnf for configuration, requiring manual edits. RDS parameter groups offer a managed, scalable alternative, simplifying MySQL tuning in the cloud.

Example of creating a custom parameter group
aws rds create-db-parameter-group –db-parameter-group-name myparamgroup –db-parameter-group-family mysql8.0 –description “My custom parameter group”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How do Glue crawlers detect schema changes?

A

Glue crawlers detect schema changes in data sources and update the Data Catalog, ensuring metadata reflects the latest structure without manual updates.

MySQL requires manual schema management. Glue’s automation extends beyond MySQL, supporting dynamic schema evolution across diverse AWS data stores.

Example of scheduling a Glue crawler
aws glue start-crawler –name mysql-crawler –schedule “cron(0 12 * * ? *)”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are materialized views in Redshift?

A

Materialized views in Redshift store precomputed query results, speeding up repeated analytical queries. They can be refreshed as needed.

MySQL lacks native materialized views, relying on triggers or jobs to simulate them. Redshift’s built-in support offers a performance edge for analytics.

– Example of creating a materialized view in Redshift
CREATE MATERIALIZED VIEW sales_summary AS
SELECT product, SUM(quantity) as total_quantity
FROM sales
GROUP BY product;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What are RDS read replicas?

A

RDS read replicas are copies of the primary database for read-only traffic, enhancing scalability. They can be promoted to standalone instances if needed.

MySQL supports replication, but RDS automates read replica management, reducing setup complexity and improving scalability over MySQL’s manual approach.

Example of creating a read replica
aws rds create-db-instance-read-replica –db-instance-identifier myreadreplica –source-db-instance-identifier mydbinstance

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are Glue triggers and workflows?

A

Glue triggers automate ETL job execution based on schedules or events, while workflows orchestrate multiple jobs and dependencies for complex pipelines.

MySQL uses cron jobs or event schedulers for automation. Glue’s managed workflows provide a broader, scalable solution beyond MySQL’s capabilities.

Example of creating a Glue trigger
aws glue create-trigger –name mytrigger –type SCHEDULED –schedule “cron(0 12 * * ? *)” –actions JobName=myjob

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What are sort keys in Redshift?

A

Sort keys in Redshift physically order data on disk, optimizing query performance by reducing sorting overhead, especially for range-based queries.

MySQL uses indexes for sorting, but Redshift’s sort keys are integrated into table design, offering superior performance for large-scale analytics.

– Example of creating a table with a sort key
CREATE TABLE events (
event_id INT,
event_time TIMESTAMP
)
SORTKEY(event_time);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is RDS Performance Insights?

A

Performance Insights in RDS monitors database performance, identifying bottlenecks and optimizing queries with visual tools and metrics.

MySQL offers EXPLAIN and slow query logs, but Performance Insights provides a more comprehensive, user-friendly interface, enhancing MySQL monitoring in RDS.

Example of enabling Performance Insights
aws rds modify-db-instance –db-instance-identifier mydbinstance –enable-performance-insights

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How does Glue handle schema evolution?

A

Glue detects schema changes and updates the Data Catalog, supporting schema evolution without disrupting ETL jobs.

MySQL requires manual schema migrations. Glue’s automated evolution simplifies managing dynamic data sources, complementing MySQL’s static structure.

Example of using Glue’s schema evolution
glueContext.create_dynamic_frame.from_catalog(database=”mydatabase”, table_name=”mytable”, push_down_predicate=”partition_key=2025”)

22
Q

What is Redshift Spectrum?

A

Redshift Spectrum lets you query S3 data using SQL without loading it into Redshift, leveraging the Glue Data Catalog for metadata.

MySQL’s federated tables offer similar external querying, but Redshift Spectrum scales better for data lakes, far exceeding MySQL’s scope.

– Example of querying S3 data with Redshift Spectrum
SELECT * FROM s3.my_external_table WHERE year=2025;

23
Q

How do you scale an RDS MySQL instance?

A

RDS MySQL scales vertically by changing instance types or horizontally with read replicas. Storage can also auto-scale.

Self-managed MySQL scaling is manual and labor-intensive. RDS automates these processes, making scalability more accessible than in MySQL alone.

Example of modifying an RDS instance type
aws rds modify-db-instance –db-instance-identifier mydbinstance –db-instance-class db.m5.large –apply-immediately

24
Q

What is Glue’s machine learning transform?

A

Glue’s ML transforms use machine learning to detect data quality issues like duplicates, automating cleanup tasks in ETL jobs.

MySQL requires custom SQL for such tasks. Glue’s ML capabilities extend beyond MySQL, offering advanced data processing automation.

Example of using FindMatches ML transform
find_matches = FindMatches.apply(frame=datasource0, transform_id=”tfm-123”, transformation_ctx=”find_matches”)

25
What is Redshift’s concurrency scaling?
Concurrency scaling in Redshift adds cluster capacity dynamically to handle query spikes, maintaining performance under load. MySQL manages concurrency via connection pooling, but Redshift’s scaling is more robust, adapting to analytical workloads MySQL isn’t designed for. Example of enabling concurrency scaling aws redshift modify-cluster --cluster-identifier mycluster --concurrency-scaling-mode auto
26
How do you monitor an RDS MySQL instance?
RDS monitoring includes CloudWatch metrics, Enhanced Monitoring, and Performance Insights, tracking CPU, memory, and query performance. MySQL’s SHOW STATUS provides basic monitoring, but RDS integrates with AWS tools for richer, real-time insights into MySQL performance. Example of describing DB instances aws rds describe-db-instances --db-instance-identifier mydbinstance
27
What is Glue’s connection feature?
Glue connections securely link to data sources like MySQL, storing credentials and network details for easy access in ETL jobs. MySQL connections are managed manually or via drivers. Glue centralizes this, enhancing security and usability for MySQL and beyond. Example of creating a Glue connection aws glue create-connection --connection-input Name=mysql-connection,ConnectionType=JDBC,ConnectionProperties={"JDBC_CONNECTION_URL":"jdbc:mysql://host:3306/db","USERNAME":"user","PASSWORD":"pass"}
28
What is Redshift’s AQUA feature?
AQUA (Advanced Query Accelerator) in Redshift uses hardware acceleration to speed up queries by offloading compute tasks. MySQL’s query caching is less advanced. AQUA provides a significant performance boost for Redshift’s analytical queries, unavailable in MySQL. Example of enabling AQUA aws redshift modify-cluster --cluster-identifier mycluster --aqua-configuration-status enabled
29
How do you handle RDS MySQL upgrades? Back:
RDS supports minor (automatic) and major (manual) MySQL version upgrades, with major upgrades requiring downtime planning. Self-managed MySQL upgrades are manual and complex. RDS streamlines this, reducing risk and effort for MySQL version management. Example of upgrading an RDS instance aws rds modify-db-instance --db-instance-identifier mydbinstance --engine-version 8.0.23 --apply-immediately
30
What is Glue’s data lineage feature?
Glue’s data lineage tracks data flow from source to target, providing visibility into transformations and dependencies in ETL processes. MySQL lacks native lineage tracking, requiring manual documentation. Glue enhances data governance for MySQL-sourced ETL workflows. Example of viewing lineage (conceptual) aws glue get-dataflow-graph --workflow-name myworkflow
31
What are Redshift’s sort and distribution keys?
Redshift’s sort keys order data on disk, while distribution keys spread it across nodes, both optimizing query performance in a distributed system. MySQL uses indexes but lacks distribution keys. Redshift’s keys are critical for analytics, unlike MySQL’s transactional design. -- Example of creating a table with sort and distribution keys CREATE TABLE customers ( customer_id INT, name VARCHAR(100), region VARCHAR(50) ) DISTKEY(region) SORTKEY(customer_id);
32
How do you enable RDS encryption?
RDS encryption at rest uses AWS KMS and can be enabled at creation or later (with downtime). It also supports SSL for data in transit. MySQL supports encryption, but RDS integrates with KMS for easier key management, enhancing security over MySQL’s manual setup. Example of creating an encrypted RDS instance aws rds create-db-instance --db-instance-identifier mydbinstance --engine mysql --storage-encrypted --kms-key-id my-key
33
What is Glue’s data quality feature?
Glue’s data quality features let you define rules to monitor and ensure data reliability in ETL jobs, catching issues like inconsistencies. MySQL requires custom validation queries. Glue’s built-in checks provide a more automated, scalable approach for data quality. Example of using data quality in Glue (conceptual) dq_checks = DataQualityChecks.apply(frame=datasource0, ruleset="myruleset")
34
What is Redshift’s query compilation?
Redshift compiles SQL into optimized machine code, accelerating complex query execution, especially for analytics. MySQL’s query optimizer lacks this compilation step. Redshift’s approach offers a performance edge for large-scale queries over MySQL. -- Example of a complex query in Redshift SELECT a.*, b.total_sales FROM customers a JOIN ( SELECT customer_id, SUM(amount) as total_sales FROM orders GROUP BY customer_id ) b ON a.customer_id = b.customer_id;
35
How do you set up RDS event notifications?
RDS event notifications send alerts via SNS for events like backups or failovers, configurable per instance or category. MySQL lacks built-in event notifications. RDS integrates with AWS SNS, automating monitoring beyond MySQL’s capabilities. Example of subscribing to RDS events aws rds create-event-subscription --subscription-name myeventsub --sns-topic-arn arn:aws:sns:us-west-2:123456789012:mytopic --source-type db-instance --event-categories "availability,failure"
36
What is Glue’s Git integration?
Glue integrates with Git for version control of ETL jobs, enabling collaboration and code management. MySQL scripts lack native version control. Glue’s Git support streamlines ETL development, complementing MySQL data workflows. Example of cloning a Glue job repository (conceptual) git clone https://git-codecommit.us-west-2.amazonaws.com/v1/repos/mygluejobs
37
What is Redshift’s data sharing feature?
Redshift’s data sharing allows live data access across clusters without copying, facilitating real-time analytics collaboration. MySQL requires replication or exports for sharing. Redshift’s approach is more efficient for large-scale, distributed data. -- Example of creating a datashare in Redshift CREATE DATASHARE myshare; ALTER DATASHARE myshare ADD SCHEMA public; GRANT USAGE ON DATASHARE myshare TO ACCOUNT '123456789012';
38
How do you configure RDS for high availability?
RDS achieves high availability with Multi-AZ deployments, automatically failing over to a standby instance in another AZ during outages. MySQL’s master-slave replication is manual. RDS automates this with synchronous replication, enhancing reliability over MySQL. Example of checking Multi-AZ status aws rds describe-db-instances --db-instance-identifier mydbinstance --query 'DBInstances[0].MultiAZ'
39
What is Glue’s Spark integration?
Glue uses Apache Spark for distributed data processing, enabling scalable ETL jobs with SQL or Python in a serverless environment. MySQL isn’t built for distributed processing. Glue’s Spark integration handles large-scale data transformations, complementing MySQL’s role. Example of using Spark in Glue spark_df = glueContext.create_dynamic_frame.from_catalog(database="mydatabase", table_name="mytable").toDF() spark_df.show()
40
What is Redshift’s automatic table optimization?
Redshift automatically adjusts sort and distribution keys based on query patterns, optimizing performance without manual tuning. MySQL requires manual index management. Redshift’s automation reduces effort, enhancing analytics efficiency over MySQL. -- Example of enabling automatic table optimization ALTER TABLE mytable SET AUTO REFRESH = YES;
41
How do you handle RDS database snapshots?
RDS supports manual snapshots and automated backups, useful for backups, cloning, or cross-account sharing, with point-in-time recovery options. MySQL uses mysqldump or binary logs for backups. RDS snapshots offer a managed, scalable alternative with finer recovery granularity. Example of creating a manual snapshot aws rds create-db-snapshot --db-instance-identifier mydbinstance --db-snapshot-identifier mysnapshot
42
What is Glue’s data transformation library?
Glue’s library includes pre-built transforms like DropFields or Join, simplifying ETL job creation without extensive coding. MySQL requires custom SQL for transformations. Glue’s library accelerates development, integrating with MySQL-sourced data. Example of using a Glue transform transformed_data = DropFields.apply(frame=datasource0, paths=["unwanted_column"])
43
What is Redshift’s query monitoring?
Redshift’s query monitoring uses tables like STL_QUERY to track performance, helping identify bottlenecks in analytical queries. MySQL’s EXPLAIN analyzes queries, but Redshift’s tools are tailored for distributed systems, offering deeper insights than MySQL. -- Example of checking query performance in Redshift SELECT query, starttime, endtime, elapsed FROM stl_query ORDER BY starttime DESC LIMIT 10;
44
How do you set up RDS for cross-region replication?
RDS supports cross-region read replicas for disaster recovery and low-latency access, replicating data across AWS regions. MySQL replication can span regions manually, but RDS automates this, simplifying setup and management over MySQL’s approach. Example of creating a cross-region read replica aws rds create-db-instance-read-replica --db-instance-identifier mycrossregionreplica --source-db-instance-identifier arn:aws:rds:us-west-2:123456789012:db:mydbinstance --region us-east-1
45
What is Glue’s integration with AWS Lake Formation?
Glue integrates with Lake Formation for access control and governance, using the Data Catalog to manage permissions across data lakes. MySQL uses user privileges for access. Lake Formation extends this to centralized governance, enhancing MySQL data security in AWS. Example of granting permissions in Lake Formation aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::123456789012:role/MyRole --permissions "SELECT" --resource '{ "Table": { "DatabaseName": "mydatabase", "Name": "mytable" } }'
46
What is Redshift’s automatic vacuuming?
Redshift automatically runs VACUUM to reclaim space and sort data, maintaining performance without manual intervention. MySQL’s OPTIMIZE TABLE is manual. Redshift’s automation reduces maintenance overhead, unlike MySQL’s hands-on approach. -- Example of manually running VACUUM in Redshift VACUUM FULL mytable;
47
How do you use RDS Proxy?
RDS Proxy pools database connections, improving scalability and reducing failover times for applications connecting to RDS. MySQL connection pooling is app-managed. RDS Proxy offers a serverless, integrated solution, enhancing MySQL performance in AWS. Example of creating an RDS Proxy aws rds create-db-proxy --db-proxy-name myproxy --engine-family MYSQL --auth '[{"AuthScheme": "SECRETS","SecretArn": "arn:aws:secretsmanager:us-west-2:123456789012:secret:mysecret","IAMAuth": "DISABLED"}]' --role-arn arn:aws:iam::123456789012:role/MyProxyRole --vpc-subnet-ids subnet-12345678
48
What is Glue’s support for streaming data?
Glue processes streaming data from Kinesis or Kafka, enabling real-time ETL with SQL transformations in a managed environment. MySQL isn’t designed for streaming. Glue’s streaming support extends MySQL data into real-time analytics workflows. Example of a Glue streaming job (conceptual) streaming_source = glueContext.create_dynamic_frame.from_catalog(database="mydatabase", table_name="mystream")
49
What is Redshift’s support for machine learning? Back:
Redshift integrates with SageMaker, allowing ML models to run directly on warehouse data for predictive analytics. MySQL lacks native ML support. Redshift’s integration offers advanced analytics capabilities unavailable in MySQL. -- Example of using ML in Redshift (conceptual) SELECT predict_customer_churn(customer_id) FROM customers;
50
How do you optimize Redshift for cost?
Redshift cost optimization involves using reserved nodes, pausing clusters when idle, and enabling concurrency scaling only as needed. MySQL costs depend on server size, managed manually. Redshift’s pricing requires strategic management, unlike MySQL’s simpler model. Example of pausing a Redshift cluster aws redshift pause-cluster --cluster-identifier mycluster