Database Specialty - Database Migration, DMS and SCT Flashcards
1
Q
Why database migration? (Use cases)
A
- Move from legacy to cloud
- Switch the database solution
- Replicate databases across regions
- Replicate to streaming platforms
- In-place DB upgrades
- Archive data to S3
2
Q
Migration scenarios
A
- On-premises → AWS cloud
- Relational → Non-relational
- DBs hosted on EC2 → Managed AWS services
3
Q
Migration tools
A
- Native tools (e.g. mysqldump)
- SCT (Schema Conversion Tool)
- AWS DMS (Database Migration Service)
- Migration Playbooks (migration templates + best practices)
4
Q
DMS – Database Migration Service
A
- Quickly and securely migrate databases to the AWS cloud; the service is resilient and self-healing
- The source database remains available during the migration
- Minimal to zero-downtime
- Costs as low as $3 / TB
- Supports homogeneous and heterogeneous migrations
- Continuous data replication using CDC (Change data capture)
5
Q
DMS architecture
A
- Endpoints include DB connection and credentials information
- Replication instance is an EC2 instance running in a VPC, can be Multi-AZ
- Replication task specifies actual data tables to migrate and data transformation rules
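For illustration (not from the original card), a minimal boto3 sketch of these components; every identifier, host and credential below is a placeholder:

import boto3

dms = boto3.client("dms")

# Endpoints hold the DB connection details and credentials (source and target)
source = dms.create_endpoint(
    EndpointIdentifier="onprem-mysql-source",   # hypothetical name
    EndpointType="source", EngineName="mysql",
    ServerName="10.0.0.15", Port=3306,
    Username="dms_user", Password="example-password",
)
target = dms.create_endpoint(
    EndpointIdentifier="rds-mysql-target",      # hypothetical name
    EndpointType="target", EngineName="mysql",
    ServerName="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com", Port=3306,
    Username="admin", Password="example-password",
)

# Replication instance = an EC2 instance in a VPC, optionally Multi-AZ
instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="demo-repl-instance",
    ReplicationInstanceClass="dms.t3.medium",
    MultiAZ=True,
)
# A replication task (see the cards that follow) then ties the two endpoints
# and the replication instance together.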
6
Q
DMS – Database Migration Service > Points
A
- DMS can create target tables / primary keys if they don’t exist on the target
- Tables/keys can also be created manually
- Creates only those objects that are required for an efficient migration (e.g. primary keys / unique indexes) – this is called Basic Schema Copy
- Remaining DB elements/schema can be migrated using other tools (e.g. secondary indexes, FK constraints, or data defaults)
- e.g. use SCT or native schema export tools
7
Q
SCT – AWS Schema Conversion Tool
A
- Converts DB/DW schema from source to target (including procedures / views / secondary indexes / FKs and constraints)
- Mainly for heterogeneous DB migrations and DW migrations
- For homogeneous migrations, can use it for migration assessment
- Can be installed on EC2 or locally on your computer (closer to the source DB)
- Application conversion – can convert SQL statements embedded in the application code
- Script conversion – can convert ETL scripts (Oracle / Microsoft / Teradata scripts → Aurora / Redshift scripts)
- Can optimize schemas in Redshift (recommends distribution and sort keys, different from the native Redshift advisor)
9
Q
SCT – AWS Schema Conversion Tool Provides
A
- Provides an assessment report
- Granular report showing which objects can be converted automatically and which need manual intervention (with color codes)
- Can be used to create a business case for migration
- Identifies issues / limitations / actions for schema conversion
- For objects that cannot be converted automatically, SCT provides guidance to help you create the equivalent schema manually
10
Q
Workload Qualification Framework (WQF)
A
- Standalone app that is included with SCT
- For workload assessment
- Qualifies OLTP workloads based on
- Ease/complexity of migration
- Estimated time/effort required
- Recommends migration strategy and tools
- Integrated with SCT and DMS
- Just provide DB connection strings and WQF can generate the assessment reports
- Available as EC2 AMI
11
Q
DMS tasks (replication tasks)
A
- Lets you specify migration options
- Table mapping (transformation rules)
- Filters (to migrate data selectively)
- Migration type – full load, CDC, or both
- Supports data validation
- Monitoring options
- task status
- task’s control table
- Allows reloading table data (in case of errors)
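A minimal boto3 sketch of a replication task using the table-mapping, filter and migration-type options above; the ARNs, schema and rule names are placeholder assumptions:

import boto3, json

dms = boto3.client("dms")

table_mappings = {
    "rules": [
        {   # selection rule: include every table in a hypothetical 'sales' schema
            "rule-type": "selection", "rule-id": "1", "rule-name": "include-sales",
            "object-locator": {"schema-name": "sales", "table-name": "%"},
            "rule-action": "include",
        },
        {   # transformation rule: rename the schema on the target
            "rule-type": "transformation", "rule-id": "2", "rule-name": "rename-schema",
            "rule-target": "schema",
            "object-locator": {"schema-name": "sales"},
            "rule-action": "rename", "value": "sales_new",
        },
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="sales-migration",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SRC",   # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TGT",   # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INST",  # placeholder
    MigrationType="full-load-and-cdc",   # full load, CDC, or both
    TableMappings=json.dumps(table_mappings),
)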
12
Q
DMS task assessment reports
A
- For pre-migration assessment (optional, recommended)
- To run an assessment, a task must be in a Stopped state
- Lets you know potential issues that might occur during a given migration
- Includes JSON output containing a summary and details of unsupported data types
13
Q
DMS migration types - Full Load
A
- Migrate existing data
- performs a one-time migration from source to target
14
Q
DMS migration types - CDC Only
A
- Ongoing replication
- Only replicates data changes from source to target without migrating existing data
- CDC = Change Data Capture
15
Q
DMS migration types - Full Load + CDC
A
- Migrate existing data and replicate ongoing changes
- First, performs a one-time migration from source to target
- Then, continues replicating data changes
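For reference, a small illustrative mapping of the three migration types above to the MigrationType values used when creating a DMS task:

# Console wording -> MigrationType value in the DMS API / CLI
MIGRATION_TYPES = {
    "Migrate existing data": "full-load",
    "Replicate data changes only": "cdc",
    "Migrate existing data and replicate ongoing changes": "full-load-and-cdc",
}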
16
Q
DMS – Good things to know
A
- Recommended to create only the primary keys before full load
- Secondary keys and FKs should be created only after full load is complete
- In full load, multiple tables are loaded in parallel and this can create issues if you have FKs (hence create them later)
- Secondary keys can slow down the full load operation (hence create them later)
- Enable Multi-AZ for ongoing replication (for high availability and failover support)
- DMS can read/write from/to encrypted DBs
17
Q
Migrating Large Tables
A
- Break the migration into multiple tasks
- Use row filtering on a key or partition key to create multiple tasks
- Example 1 – if you have an integer primary key from 1 to 8,000,000
- Create 8 tasks using row filtering to migrate 1,000,000 records each (see the sketch after this card)
- Example 2 – if you have a date field as primary key
- Partition the data by month using row filtering
- Use full load tasks to migrate data of previous months
- Use full load + CDC to migrate current month data
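A sketch of the row-filtering idea from Example 1, assuming a hypothetical sales.orders table with an integer order_id primary key (one table mapping per DMS task):

import json

def mapping_for_range(rule_id, start, end):
    # selection rule restricted to a key range via a source filter
    return {
        "rule-type": "selection", "rule-id": str(rule_id), "rule-name": f"chunk-{rule_id}",
        "object-locator": {"schema-name": "sales", "table-name": "orders"},
        "rule-action": "include",
        "filters": [{
            "filter-type": "source", "column-name": "order_id",
            "filter-conditions": [
                {"filter-operator": "between",
                 "start-value": str(start), "end-value": str(end)}
            ],
        }],
    }

# 8 tasks x 1,000,000 rows each, covering keys 1 .. 8,000,000
table_mappings_per_task = [
    json.dumps({"rules": [mapping_for_range(i + 1, i * 1_000_000 + 1, (i + 1) * 1_000_000)]})
    for i in range(8)
]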
18
Q
DMS Migrating LOBs / CLOBs
A
- LOB = Large Object
- CLOB = Character Large Object
- DMS migrates LOB data in two phases
- creates a new row in the target table and populates all data except the LOB data
- updates the row with the LOB data
- LOB options
- Don’t include LOB columns – LOB data is ignored
- Full LOB mode – migrates all LOB data, piecewise in chunks (you provide the LOB chunk size)
- Limited LOB mode – truncates each LOB to the Max LOB size (is faster)
19
Q
Best practices for handling LOBs with DMS
A
- Full LOB mode performance is slowest, but data is not truncated
- For LOBs < few MBs
- Use Limited LOB mode with Max LOB size = largest LOB size in your DB
- Allocate enough memory to the replication instance
- For LOBs > few MBs
- Create a separate DMS task with Full LOB mode
- Ideally, separate task on a new replication instance
- LOB chunk size should allow DMS to capture most LOBs in as few chunks as possible
- Inline LOB mode
- Combines the advantages of both modes (full LOB and limited LOB mode)
- Migrate without truncating the data or slowing the task’s performance
- You specify InlineLobMaxSize (Full LOB mode must be set to true)
- Small LOBs are transferred inline, large LOBs by using source table lookup
- Supported only during full load (not during CDC)
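An illustrative sketch of the LOB-related task settings (the key names follow the DMS task-settings JSON; the values shown are assumptions, passed as the ReplicationTaskSettings string):

import json

lob_settings = json.dumps({
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": False,        # True = Full LOB mode (chunked, no truncation)
        "LobChunkSize": 64,          # chunk size (KB) used in Full LOB mode
        "LimitedSizeLobMode": True,  # True = Limited LOB mode (faster, truncates)
        "LobMaxSize": 32,            # Max LOB size (KB) for Limited LOB mode
        "InlineLobMaxSize": 0,       # > 0 enables inline LOBs (needs FullLobMode = True)
    }
})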
20
Q
SCT Extractors (for DW migration)
A
- SCT extractors are migration agents installed locally (or on EC2)
- Extracts data from the source DW in parallel
- Supports encryption
- Data is optimized for Redshift and stored in local files
- Files are then loaded into S3 (over the network or using Snowball Edge)
- Use the COPY command to load data from S3 into Redshift
- If you are using Oracle or SQL Server, you can use DMS to keep your DBs in sync with Redshift (or the target engine)
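An illustrative sketch of the COPY step above using the Redshift Data API from boto3; the cluster, database, table, bucket and IAM role are all placeholders:

import boto3

redshift_data = boto3.client("redshift-data")

redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=(
        "COPY sales_fact "
        "FROM 's3://my-dw-staging/sales_fact/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' "
        "GZIP;"
    ),
)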
21
Q
Migration playbooks
A
- Series of step-by-step guides published by AWS
- Provide best practices and blueprints for different heterogeneous migrations
- https://aws.amazon.com/dms/resources/
22
Q
Service Substitutions
A
- Service Substitutions are highlighted in playbooks
- SQL Server – Database Mail to send email based on certain events
- This feature is not available in open source engines
- We can use Lambda functions through Aurora MySQL along with SNS to emulate the email functionality
- Similar substitutions are possible with queues, file management etc.
- Can use AWS Glue to manage ETL pipelines
- SCT can convert your Teradata / Oracle scripts and move them into AWS Glue by auto-generating the necessary Python code
23
Q
Monitoring DMS
A
- Can monitor task progress by:
- checking the task status
- using the task’s control table
- or with CloudWatch
- DMS Task Logs and service metrics/statistics are provided by CloudWatch
- Task monitoring tab shows CloudWatch metrics
- Table statistics tab shows statistics for each table
- Can subscribe to event notifications (uses SNS)
- API calls are logged in CloudTrail
24
Q
DMS Task Logs
A
- Certain DMS issues/warnings / error messages appear only in the task log
- e.g. data truncation issues or row rejections due to FK violations are only written to the task log
- Must enable CloudWatch logs when creating the replication task
25
DMS Validation
* DMS can validate the migrated data for RDBMS migrations
* Supports partial validation of LOBs
* You enable validation in the DMS task settings
* Tracks the progress of the migration and incrementally validates new data as it is written to the target (by comparing source and target data)
* Table must have a primary key or unique index for validation to work
* Requires additional time to complete
* Validation is recommended during CDC (but can also be done during full load)
* SCT extractors do not support validation. Only DMS does
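A minimal sketch of how validation is enabled in the task-settings JSON (ThreadCount is shown only as an illustrative tuning option):

import json

validation_settings = json.dumps({
    "ValidationSettings": {
        "EnableValidation": True,  # compare source and target rows during the task
        "ThreadCount": 5,          # parallel validation threads (illustrative value)
    }
})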
26
Table statistics
* Table statistics tab shows the table state (performance statistics) for each table that's being migrated
* Use the describe-table-statistics command to receive the data validation report in JSON format
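The same data is available programmatically; a sketch with boto3 and a placeholder task ARN:

import boto3

dms = boto3.client("dms")

stats = dms.describe_table_statistics(
    ReplicationTaskArn="arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"  # placeholder
)
for table in stats["TableStatistics"]:
    print(table["SchemaName"], table["TableName"], table.get("ValidationState"))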
27
Task Statistics
* If you enable data validation, DMS provides table-level statistics for the given task
* Indicates the validation state for each table
* You can revalidate tables from Table Statistics tab, if desired
* Validation errors and diagnostic info are written to a table named awsdms_validation_failures_v1 at the target endpoint
* Example – to troubleshoot validation errors, run this query:
SELECT * FROM awsdms_validation_failures_v1 WHERE TASK_NAME = 'C89TDNZRYUKH56DR5RGNM'
28
Control tables
* Help with troubleshooting DMS migrations
* Provide useful statistics to help you plan and manage current/future migration tasks
* Can be created when creating the replication task
* Common control tables
29
DMS Security – IAM and Encryption
* Use IAM for managing DMS access and resource permissions
* Can encrypt DMS endpoints using SSL certificates
* Can assign a certificate to an endpoint (via DMS console or API)
* Each endpoint may need a different SSL configuration depending on the DB engine
* e.g. Redshift already uses an SSL connection, so it does not require DMS SSL setup
* Oracle SSL requires you to upload Oracle wallet instead of certificate (.pem) files
* Encryption at rest (for storage) uses KMS keys
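A minimal boto3 sketch of importing a certificate and enforcing SSL on an endpoint; identifiers, host, credentials and the .pem file are placeholders:

import boto3

dms = boto3.client("dms")

# Upload the CA certificate used by the source DB
cert = dms.import_certificate(
    CertificateIdentifier="source-db-ca",
    CertificatePem=open("source-ca.pem").read(),
)

# Require SSL on the endpoint and pin it to that certificate
dms.create_endpoint(
    EndpointIdentifier="secure-postgres-source",
    EndpointType="source", EngineName="postgres",
    ServerName="db.example.internal", Port=5432,
    Username="dms_user", Password="example-password",
    SslMode="verify-full",   # none | require | verify-ca | verify-full
    CertificateArn=cert["Certificate"]["CertificateArn"],
)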
30
DMS Security - Networking
* DMS replication instance is always created within a VPC
* The DB endpoints must include NACLs / SG config to allow incoming access from the replication instance
* Network configurations for DMS
* Single VPC
* Two VPCs
* On-premises Network to VPC (using DX / VPN / Internet)
* RDS outside VPC (on EC2) to a DB inside VPC (via ClassicLink)
31
DMS Networking – Single VPC
* Simplest network configuration – all components within the same VPC
32
DMS Networking –Two VPCs
* Source and target endpoints in different VPCs
* Create replication instance in one of the VPCs and use VPC peering
* Generally, you'd get better performance by placing the primary DMS replication instance in the same AZ as the target DB
33
DMS Networking – On-premises to VPC
* Can use either DX or VPN
* Use an Internet Gateway if DX or VPN cannot be used
* Using an IGW = public replication instance in a VPC
34
DMS Networking – RDS outside VPC to VPC
* Use ClassicLink with a proxy server
* Replication instance in the VPC cannot use ClassicLink directly (hence the need for a proxy)
* Port forwarding on the proxy server allows communication between source and target
35
DMS Pricing
* You only pay for
* replication instances
* additional log storage
* data transfer
36
DMS general best practices
* Disable backups and transaction logs during migration
* Carry out validation during CDC (instead of during full-load)
* Use Multi-AZ deployment for replication instances
* Provision appropriate instance resources
* https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.html
37
Minimizing downtime due to migration
* Solution architectures for migration with DMS
* Fallback
* Roll forward / Fall forward
* Dynamic connections
* Dual write
* Minimal downtime / near-zero downtime / zero downtime
* Zero downtime => Full load + CDC
38
Migrating large databases
* Use multiphase migration
* Copy static tables first (before migrating active tables)
* Cleanup old unwanted data to reduce DB size
* Alternatively, use Snowball Edge to move data to S3 and then migrate using DMS
39
Migrating to MySQL / MariaDB on RDS
* From MySQL / MariaDB (on-premises / S3 / EC2)
* Small to medium DBs – use mysqldump / mysqlimport utilities (some downtime)
* One time – Restore from backup (or data dump) stored on S3 (some downtime)
* Ongoing – Configure binlog replication from existing source (minimal downtime)
* From MySQL / MariaDB on RDS
* One time / Ongoing – Promote a read replica to be a standalone instance
* From any DB
* One time / Ongoing – Use DMS (minimal downtime)
40
Migrating to PostgreSQL on RDS
* From PostgreSQL (on-premises / EC2)
* One time – use pg_dump / pg_restore (some downtime)
* From CSV data stored on S3
* Use the aws_s3 PostgreSQL extension and import data using the aws_s3.table_import_from_s3 function (some downtime; see the sketch after this card)
* From PostgreSQL on RDS (large DBs)
* Use pg_transport extension (streams data, is extremely fast, minimal downtime)
* From any DB
* One time / Ongoing – Use DMS (minimal downtime)
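A sketch of the aws_s3 import mentioned above, assuming psycopg2 and placeholder connection, table and bucket details:

import psycopg2

conn = psycopg2.connect(
    host="mydb.xxxxxxxx.us-east-1.rds.amazonaws.com",
    dbname="appdb", user="admin", password="example-password",
)
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;")
    cur.execute("""
        SELECT aws_s3.table_import_from_s3(
            'orders',                        -- target table
            '',                              -- column list ('' = all columns)
            '(format csv, header true)',     -- COPY options
            aws_commons.create_s3_uri('my-bucket', 'orders.csv', 'us-east-1')
        );
    """)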
41
Migrating to Oracle on RDS
* For small DBs – use Oracle SQL Developer tool (freeware)
* Perform Database Copy with Oracle SQL Developer
* Supports Oracle and MySQL as source
* For large DBs – use Oracle Data Pump
* Can export and import between Oracle DBs (on-prem / EC2 / RDS)
* Can use S3 to transfer the dump file (use option group with option S3_INTEGRATION)
* Can also transfer by creating a database link between source and target
* From any DB
* One time / Ongoing – Use DMS (minimal downtime)
42
Migrating to SQL Server on RDS
* From SQL Server (on-premises / EC2)
* Use native backup and restore (.bak backup files stored on S3; see the sketch after this card)
* Supports encryption and compression
* Or use SQL Server Management Studio
* Microsoft SQL Server Management Studio (freeware, has three options)
* Generate and Publish Scripts wizard – creates a script with schema and/or data
* Import and Export wizard
* Bulk copy
* From SQL Server (on RDS)
* Restore from a snapshot
* Or use native backup and restore feature
* Or use SQL Server Management Studio
* From any DB
* One time / Ongoing – Use DMS (minimal downtime)
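A sketch of the native restore path via the msdb stored procedure, assuming pyodbc, a placeholder connection string and .bak location, and an option group with SQLSERVER_BACKUP_RESTORE attached to the RDS instance:

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=mydb.xxxxxxxx.us-east-1.rds.amazonaws.com;UID=admin;PWD=example-password",
    autocommit=True,
)
conn.execute("""
    exec msdb.dbo.rds_restore_database
        @restore_db_name = 'appdb',
        @s3_arn_to_restore_from = 'arn:aws:s3:::my-backups/appdb.bak';
""")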
43
Homogenous Migration to Aurora
* MySQL 5.6-compatible DBs (MySQL / MariaDB / Percona)
* Homogenous Migration with Downtime
* Restore from RDS snapshot
* Full load using native DB tools
* Full load using DMS (migrate schema with native tools first)
* Homogenous Migration with Near-Zero Downtime (minimal downtime)
* Restore from RDS snapshot + MySQL binlog replication
* Full load using native tools + MySQL binlog replication
* Full load + CDC using DMS
* Create an Aurora replica from MySQL / PostgreSQL on RDS and promote it to a standalone DB
44
Heterogeneous Migration to Aurora
* Schema migration – use SCT
* Data Migration – use DMS
* For near-zero downtime – use continuous replication with DMS (Full load + CDC)
45
Migrating to Aurora Serverless
* Can migrate between Aurora provisioned cluster and Aurora Serverless cluster
* Cannot migrate from RDS to Aurora Serverless directly – migrate from RDS to Aurora (provisioned) first, then to Aurora Serverless
46
Strategies for Migration to Aurora
* MySQL / PostgreSQL on RDS to Aurora (Homogeneous)
* Restore from a snapshot to Aurora (manual or automated snapshot)
* Replicate to an Aurora read replica and then promote the replica to a standalone DB
* Use DMS
* MySQL on EC2 or MySQL on-premise to Aurora (Homogeneous)
* Restore from backup files stored on S3
* Restore from text files (CSV / XML) stored in S3
* Restore using mysqldump utility
* Use DMS
* PostgreSQL on EC2 or PostgreSQL on-premise to Aurora (Homogeneous)
* Migrate PostgreSQL to RDS and then migrate to Aurora
* Use DMS
* MariaDB / Oracle / SQL Server to Aurora (Heterogeneous)
* Use DMS and SCT
47
Migrating Redis workloads to ElastiCache
* Two approaches
* Offline migration (using backup)
* Online migration (migrate data from endpoint)
48
Offline migration to ElastiCache for Redis
* Create a Redis backup (.RDB file)
* Create an S3 bucket
* Upload the backup to S3
* Create the target ElastiCache cluster and choose the option to seed the RDB file from the S3 location
* ElastiCache should have read access to the RDB file
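A minimal boto3 sketch of seeding the new cluster from the uploaded RDB file; the cluster name, node type and S3 ARN are placeholders:

import boto3

elasticache = boto3.client("elasticache")

elasticache.create_cache_cluster(
    CacheClusterId="migrated-redis",
    Engine="redis",
    CacheNodeType="cache.t3.micro",
    NumCacheNodes=1,
    SnapshotArns=["arn:aws:s3:::my-redis-backups/dump.rdb"],  # seed from the RDB file in S3
)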
49
Online migration to ElastiCache for Redis
* Real-time data migration
* Migrate self-hosted Redis on EC2 to ElastiCache
* Create the target Redis cluster (or choose an existing one)
* Under Actions, choose Migrate Data from Endpoint
* Monitor using the Events section on the ElastiCache console
* Failover to the new DB
* You can decide when to failover
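A minimal boto3 sketch of the same flow via the API; the replication group id and the source address are placeholders:

import boto3

elasticache = boto3.client("elasticache")

# Start streaming data from the self-hosted Redis on EC2 into the target replication group
elasticache.start_migration(
    ReplicationGroupId="migrated-redis-rg",
    CustomerNodeEndpointList=[{"Address": "10.0.1.25", "Port": 6379}],
)

# When you are ready to fail over, complete the migration:
# elasticache.complete_migration(ReplicationGroupId="migrated-redis-rg")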
50
Good to know – Online Redis migration
* Source cluster
* must have Redis AUTH disabled
* must have “protected-mode” disabled
* “bind” config if present should allow requests from ElastiCache
51
Good to know – Online Redis migration
* Target cluster
* must have Cluster-mode disabled
* must have Multi-AZ enabled
* must have Encryption disabled
* must have sufficient memory
52
Migrating to DocumentDB
* Four approaches
* Offline migration
* Online migration
* Hybrid approach
* Dual write approach
54
Offline migration to DocumentDB
* Simplest and fastest, but with the longest downtime
* Non-critical / non-production workloads
* Homogeneous – mongodump / mongorestore (BSON)
* Homogeneous / Heterogeneous – mongoexport / mongoimport (JSON / CSV)
* Use DocumentDB index tool to export/import indexes
55
Online migration to DocumentDB
* Medium complexity, slowest, and with minimal downtime
* For production workloads
* Uses DMS
* Migrate indexes using DocumentDB index tool
* DMS does not migrate indexes
56
Hybrid Approach to DocumentDB Migration
* Mix of offline and online migration
* Most complex, faster than online, and with minimal downtime
* For production workloads
* when DB size is large
* or if you don’t have enough network bandwidth
* Phase 1
* Export with mongodump
* Transfer it to AWS if on-premise (Direct Connect / Snowball)
* Migrate indexes using DocumentDB index tool
* Restore to DocumentDB
* Phase 2
* Use DMS in CDC mode
57
Dual write approach for DocumentDB Migration
* Typically used for heterogeneous migrations
* Example – RDBMS to DocumentDB
* Create indexes manually (in case of heterogeneous migration)
58
Streaming use cases for DMS
* Can stream data from source to target (= ongoing replication)
* Redshift – stream from OLTP to Redshift for analytics
* S3 data lakes – hydrate data lakes
* Stream to Kinesis Data Streams
* Stream to Elasticsearch Service
* Can use fan-out architecture
* Can also use fan-in architecture