Database Specialty - Database Migration, DMS and SCT Flashcards

1
Q

Why database migration? (Use cases)

A
  • Move from legacy to cloud
  • Switch the database solution
  • Replicate databases across regions
  • Replicate to streaming platforms
  • In-place DB upgrades
  • Archive data to S3
2
Q

Migration scenarios

A
  • On-premises → AWS cloud
  • Relational → Non-relational
  • DBs hosted on EC2 → Managed AWS services
3
Q

Migration tools

A
  • Native tools (e.g. mysqldump)
  • SCT (Schema Conversion Tool)
  • AWS DMS (Database Migration Service)
  • Migration Playbooks (migration templates + best practices)
4
Q

DMS – Database Migration Service

A
  • Quickly and securely migrates databases to the AWS cloud; the service is resilient and self-healing
  • The source database remains available during the migration
  • Minimal to zero-downtime
  • Costs as low as $3 / TB
  • Supports homogeneous and heterogeneous migrations
  • Continuous data replication using CDC (Change data capture)
5
Q

DMS architecture

A
  • Endpoints include DB connection and credentials information
  • Replication instance is an EC2 instance running in a VPC, can be Multi-AZ
  • Replication task specifies actual data tables to migrate and data transformation rules
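  • Example – a minimal boto3 sketch of these components (identifiers, host, and credentials below are hypothetical):

import boto3

dms = boto3.client("dms")

# Endpoint: connection details and credentials for one database
dms.create_endpoint(
    EndpointIdentifier="mysql-source",          # hypothetical identifier
    EndpointType="source",
    EngineName="mysql",
    ServerName="onprem-db.example.com",         # hypothetical host
    Port=3306,
    Username="admin",
    Password="secret",                          # prefer Secrets Manager in practice
)

# Replication instance: an EC2 instance running in a VPC, optionally Multi-AZ
dms.create_replication_instance(
    ReplicationInstanceIdentifier="dms-instance",
    ReplicationInstanceClass="dms.t3.medium",
    MultiAZ=True,
)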
6
Q

DMS – Database Migration Service > Points

A
  • DMS can create target tables / primary keys if they don’t exist on the target
  • Tables/keys can also be created manually
  • Creates only those objects required for an efficient migration (e.g. primary keys / unique indexes) – this is called Basic Schema Copy
  • Remaining DB elements/schema can be migrated using other tools (e.g. secondary indexes, FK constraints, or data defaults)
    • e.g. use SCT or native schema export tools
7
Q

SCT – AWS Schema Conversion Tool

A
  • Converts DB/DW schema from source to target (including procedures /
    views / secondary indexes / FK and constraints)
  • Mainly for heterogeneous DB migrations and DW migrations
  • For homogeneous migrations, can use it for migration assessment
  • Can be installed on EC2 or locally on your computer (closer to source DB)
  • Application conversion – can convert SQL statements embedded in the
    application code
  • Script conversion – can convert ETL scripts (Oracle / Microsoft / Teradata
    scripts → Aurora / Redshift scripts)
  • Can optimize schemas in Redshift (recommends distribution and sort keys,
    different from native Redshift advisor)
9
Q

SCT – AWS Schema Conversion Tool Provides

A
  • Provides assessment report
    • Granular report showing which objects can be converted automatically and which
      need manual intervention (with color codes)
    • Can be used to create a business case for migration
    • Identifies issues/limitations / actions for schema conversion
  • For objects that cannot be converted automatically, SCT
    provides guidance to help you create the equivalent
    schema manually
10
Q

Workload Qualification Framework (WQF)

A
  • Standalone app that is included with SCT
  • For workload assessment
  • Qualifies OLTP workloads based on
    • Ease/complexity of migration
    • Estimated time/effort required
  • Recommends migration strategy and tools
  • Integrated with SCT and DMS
  • Just provide DB connection strings and WQF can generate the assessment reports
  • Available as EC2 AMI
11
Q

DMS tasks (replication tasks)

A
  • Lets you specify migration options
    • Table mapping (transformation rules)
    • Filters (to migrate data selectively)
    • Migration type – full load, CDC, or both
  • Supports data validation
  • Monitoring options
    • task status
    • task’s control table
  • Allows reloading table data (in case of errors)
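  • Example – a boto3 sketch of a task with a selection rule, a transformation rule, and a migration type (ARNs and names are hypothetical):

import json
import boto3

dms = boto3.client("dms")

table_mappings = {
    "rules": [
        {   # selection rule - which schemas/tables to migrate
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-sales",
            "object-locator": {"schema-name": "sales", "table-name": "%"},
            "rule-action": "include",
        },
        {   # transformation rule - lowercase table names on the target
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "lowercase-tables",
            "rule-target": "table",
            "object-locator": {"schema-name": "sales", "table-name": "%"},
            "rule-action": "convert-lowercase",
        },
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="sales-migration",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SRC",   # hypothetical ARNs
    TargetEndpointArn="arn:aws:dms:...:endpoint:TGT",
    ReplicationInstanceArn="arn:aws:dms:...:rep:RI",
    MigrationType="full-load-and-cdc",                  # or "full-load" / "cdc"
    TableMappings=json.dumps(table_mappings),
)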
12
Q

DMS task assessment reports

A
  • For pre-migration assessment (optional, recommended)
  • To run an assessment, a task must be in a Stopped state
  • Lets you know potential issues that might occur during a given migration
  • Includes JSON output containing summary and details of unsupported data
    types
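  • Example – running and fetching an assessment with boto3 (the task ARN is hypothetical):

import boto3

dms = boto3.client("dms")
TASK_ARN = "arn:aws:dms:...:task:EXAMPLE"   # hypothetical

# Run the pre-migration assessment (the task must be stopped), then fetch results
dms.start_replication_task_assessment(ReplicationTaskArn=TASK_ARN)
report = dms.describe_replication_task_assessment_results(ReplicationTaskArn=TASK_ARN)
for r in report["ReplicationTaskAssessmentResults"]:
    print(r["AssessmentStatus"], r.get("S3ObjectUrl"))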
13
Q

DMS migration types - Full Load

A
  • Migrate existing data
  • Performs a one-time migration from source to target
14
Q

DMS migration types - CDC Only

A
  • Ongoing replication
  • Only replicates data changes from source to target without migrating
    existing data
  • CDC = Change data capture
15
Q

DMS migration types - Full Load + CDC

A
  • Migrate existing data and replicate ongoing changes
  • First, performs a one-time migration from source to target
  • Then, continues replicating data changes
16
Q

DMS – Good things to know

A
  • Recommended to create only the primary keys before full load
  • Secondary indexes and FKs should be created only after full load is complete
  • In full load, multiple tables are loaded in parallel and this can
    create issues if you have FKs (hence create them later)
  • Secondary indexes can slow down the full load operation (hence create them later)
  • Enable Multi-AZ for ongoing replication (for high availability and failover support)
  • DMS can read/write from/to encrypted DBs
17
Q

Migrating Large Tables

A
  • Break the migration into multiple tasks
  • Use row filtering on a key or partition key to create multiple tasks
  • Example – if you have an integer primary key from 1 to 8,000,000
    • Create 8 tasks using row filtering to migrate 1,000,000 records each (sketched below)
  • Example 2 – if you have a date field as primary key
    • Partition the data by month using row filtering
    • Use full load tasks to migrate data of previous months
    • Use full load + CDC to migrate current month data
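  • Example – a sketch of the row-filtering table mapping for the first case (schema app and column order_id are hypothetical):

import json

def chunk_mapping(low, high):
    """Selection rule that migrates only rows with order_id in [low, high]."""
    return json.dumps({"rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": f"orders-{low}-{high}",
        "object-locator": {"schema-name": "app", "table-name": "orders"},
        "rule-action": "include",
        "filters": [{
            "filter-type": "source",
            "column-name": "order_id",
            "filter-conditions": [{
                "filter-operator": "between",
                "start-value": str(low),
                "end-value": str(high),
            }],
        }],
    }]})

# One TableMappings document per task - 8 tasks of 1,000,000 rows each
mappings = [chunk_mapping(i * 1_000_000 + 1, (i + 1) * 1_000_000) for i in range(8)]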
18
Q

DMS Migrating LOBs / CLOBs

A
  • LOB = Large Object (BLOB = Binary Large Object)
  • CLOB = Character Large Object
  • DMS migrates LOB data in two phases
    • creates a new row in the target table and populates all data except the LOB data
    • updates the row with the LOB data
  • LOB options
    • Don’t include LOB columns – LOB data is ignored
    • Full LOB mode – migrates all LOB data, piecewise in chunks (you provide LOB
      chunk size)
    • Limited LOB mode – truncates each LOB to Max LOB size (is faster)
19
Q

Best practices for handling LOBs with DMS

A
  • Full LOB mode performance is slowest, but data is not truncated
  • For LOBs < few MBs
    • Use Limited LOB mode with Max LOB size = largest LOB size in your DB
    • Allocate enough memory to the replication instance
  • For LOBs > few MBs
    • Create a separate DMS task with Full LOB mode
    • Ideally, separate task on a new replication instance
    • LOB chunk size should allow DMS to capture most LOBs in as few chunks as possible
  • Inline LOB mode
    • Combines the advantages of both modes (full LOB and limited LOB mode)
    • Migrate without truncating the data or slowing the task’s performance
    • You specify InlineLobMaxSize (Full LOB mode must be set to true)
    • Small LOBs are transferred inline, large LOBs by using source table lookup
    • Supported only during full load (not during CDC)
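  • Example – the relevant task-settings keys as a boto3 sketch (task ARN hypothetical; sizes are in KB):

import json
import boto3

dms = boto3.client("dms")

lob_settings = {
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": True,        # must be true for inline LOB mode
        "LobChunkSize": 64,         # chunk size in KB for full LOB mode
        "LimitedSizeLobMode": False,
        "LobMaxSize": 0,            # truncation limit in KB for limited LOB mode
        "InlineLobMaxSize": 32,     # LOBs up to 32 KB are transferred inline
    }
}

dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:...:task:EXAMPLE",   # hypothetical
    ReplicationTaskSettings=json.dumps(lob_settings),
)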
20
Q

SCT Extractors (for DW migration)

A
  • SCT extractors are migration agents installed locally (or on EC2)
  • They extract data from the source DW in parallel
  • Supports encryption
  • Data is optimized for Redshift and stored in local files
  • Files are then loaded into S3 (over the network or via Snowball Edge)
  • Use the COPY command to load data from S3 into Redshift (see the sketch below)
  • If you are using Oracle or SQL Server, you can use DMS to keep
    your DBs in sync with Redshift (or another target engine)
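  • Example – the final COPY step run from Python via psycopg2 (cluster endpoint, bucket, and IAM role are hypothetical):

import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical
    port=5439, dbname="dw", user="admin", password="secret",
)

with conn, conn.cursor() as cur:
    # COPY loads the extractor's files from S3 into Redshift in parallel
    cur.execute("""
        COPY analytics.sales
        FROM 's3://my-extract-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        GZIP DELIMITER '|';
    """)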
21
Q

Migration playbooks

A
  • Series of step-by-step guides published by AWS
  • Provide best practices and blueprints for different heterogeneous migrations
  • https://aws.amazon.com/dms/resources/
22
Q

Service Substitutions

A
  • Service substitutions are highlighted in playbooks
  • Example – SQL Server Database Mail sends email based on certain events
    • This feature is not available in open source engines
    • You can invoke Lambda functions from Aurora MySQL along with SNS to
      emulate the email functionality
  • Similar substitutions are possible for queues, file management, etc.
  • Can use AWS Glue to manage ETL pipelines
  • SCT can convert your Teradata / Oracle scripts and move them
    into AWS Glue by auto-generating the necessary Python code
23
Q

Monitoring DMS

A
  • Can monitor task progress by:
    • checking the task status
    • using the task’s control table
    • or with CloudWatch
  • DMS Task Logs and service metrics/statistics are provided by CloudWatch
  • Task monitoring tab shows CloudWatch metrics
  • Table statistics tab shows statistics for each table
  • Can subscribe to event notifications (uses SNS)
  • API calls are logged in CloudTrail
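  • Example – an SNS-backed event subscription as a boto3 sketch (the topic ARN is hypothetical):

import boto3

dms = boto3.client("dms")

# Notify an SNS topic on task state changes and failures
dms.create_event_subscription(
    SubscriptionName="dms-task-events",
    SnsTopicArn="arn:aws:sns:us-east-1:123456789012:dms-alerts",  # hypothetical
    SourceType="replication-task",
    EventCategories=["state change", "failure"],
    Enabled=True,
)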
24
Q

DMS Task Logs

A
  • Certain DMS issues, warnings, and error messages appear only in the task log
    • e.g. data truncation issues or row rejections due to FK violations are only written to the task log
  • Must enable CloudWatch logs when creating the replication task
25
Q

DMS Validation

A
  • DMS can validate the migrated data for RDBMS migrations
  • Supports partial validation of LOBs
  • You enable validation in the DMS task settings
  • Tracks the progress of the migration and incrementally validates new
    data as it is written to the target (by comparing source and target data)
  • Table must have a primary key or unique index for validation to work
  • Requires additional time to complete
  • Validation is recommended during CDC (but can also be done during
    full load)
  • SCT extractors do not support validation. Only DMS does
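  • Example – enabling validation through the task settings, as a boto3 sketch (task ARN hypothetical):

import json
import boto3

dms = boto3.client("dms")

dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:...:task:EXAMPLE",   # hypothetical
    ReplicationTaskSettings=json.dumps(
        {"ValidationSettings": {"EnableValidation": True, "ThreadCount": 5}}
    ),
)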
26
Q

Table statistics

A
  • Table statistics tab shows the table state (performance statistics) for
    each table that’s being migrated
  • Use the describe-table-statistics command to receive the data validation
    report in JSON format (boto3 equivalent below)
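  • Example – the boto3 equivalent of that CLI call (task ARN hypothetical):

import boto3

dms = boto3.client("dms")

stats = dms.describe_table_statistics(
    ReplicationTaskArn="arn:aws:dms:...:task:EXAMPLE"    # hypothetical
)
for t in stats["TableStatistics"]:
    print(t["SchemaName"], t["TableName"], t["FullLoadRows"], t.get("ValidationState"))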
27
Q

Task Statistics

A
  • If you enable data validation, DMS provides table-level statistics for
    the given task
  • Indicates the validation state for each table
  • You can revalidate tables from Table Statistics tab, if desired
  • Validation errors and diagnostic info are written to a table named
    awsdms_validation_failures_v1 at the target endpoint
  • Example – to troubleshoot validation errors, run this query:

SELECT * FROM awsdms_validation_failures_v1
WHERE TASK_NAME = 'C89TDNZRYUKH56DR5RGNM';

28
Q

Control tables

A
  • Help with troubleshooting DMS migrations
  • Provide useful statistics to help you plan and manage current/future migration tasks
  • Can be created when creating the replication task
  • Common control tables include awsdms_apply_exceptions, awsdms_status,
    awsdms_suspended_tables, and awsdms_history
29
Q

DMS Security – IAM and Encryption

A
  • Use IAM for managing DMS access and resource permissions
  • Can encrypt DMS endpoints using SSL certificates
    • Can assign a certificate to an endpoint (via DMS console or API)
    • Each endpoint may need different SSL configuration depending
      on the DB engine
    • e.g. Redshift already uses an SSL connection, so it does not require DMS SSL
    • Oracle SSL requires you to upload Oracle wallet instead of certificate (.pem) files
  • Encryption at rest (for storage) uses KMS keys
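  • Example – attaching a certificate to an endpoint with boto3 (file name and ARNs are hypothetical):

import boto3

dms = boto3.client("dms")

# Upload a .pem CA certificate, then reference it on the endpoint
cert = dms.import_certificate(
    CertificateIdentifier="source-db-ca",
    CertificatePem=open("rds-ca-cert.pem").read(),   # hypothetical file
)

dms.modify_endpoint(
    EndpointArn="arn:aws:dms:...:endpoint:SRC",      # hypothetical
    SslMode="verify-full",                           # none | require | verify-ca | verify-full
    CertificateArn=cert["Certificate"]["CertificateArn"],
)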
30
Q

DMS Security - Networking

A
  • DMS replication instance is always created within a VPC
  • The DB endpoints must include NACLs / SG config to allow incoming access from the replication instance
  • Network configurations for DMS
    • Single VPC
    • Two VPCs
    • On-premises Network to VPC (using DX / VPN / Internet)
    • RDS outside a VPC (EC2-Classic) to a DB inside a VPC (via ClassicLink)
31
Q

DMS Networking – Single VPC

A
  • Simplest network configuration – all components within the same VPC
32
Q

DMS Networking –Two VPCs

A
  • Source and target endpoints in different VPCs
  • Create replication instance in one of the VPCs and use VPC peering
  • Generally, you’d get better performance by placing the primary DMS
    replication instance in the same AZ as the target DB
33
Q

DMS Networking – On-premises to VPC

A
  • Can use either DX or VPN
  • Use Internet Gateway if DX or VPN cannot be used
  • Using an IGW means using a public replication instance in the VPC
34
Q

DMS Networking – RDS outside VPC to VPC

A
  • Use ClassicLink with a proxy server
  • Replication instance in the VPC cannot use ClassicLink directly
    (hence the need for proxy)
  • Port forwarding on the proxy server allows communication between
    source and target
35
Q

DMS Pricing

A
  • You only pay for
    • replication instances
    • additional log storage
    • data transfer
36
Q

DMS general best practices

A
  • Disable backups and transaction logs during migration
  • Carry out validation during CDC (instead of during full-load)
  • Use Multi-AZ deployment for replication instances
  • Provision appropriate instance resources
  • https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.html
37
Q

Minimizing downtime due to migration

A
  • Solution architectures for migration with DMS
    • Fallback
    • Roll forward / Fall forward
    • Dynamic connections
    • Dual write
  • Downtime targets – minimal / near-zero / zero downtime
  • Zero downtime => Full load + CDC
38
Q

Migrating large databases

A
  • Use multiphase migration
  • Copy static tables first (before migrating active tables)
  • Clean up old unwanted data to reduce DB size
  • Alternatively, use Snowball Edge to move data to S3 and then migrate using DMS
39
Q

Migrating to MySQL / MariaDB on RDS

A
  • From MySQL / MariaDB (on-premises / S3 / EC2)
    • Small to medium DBs – use mysqldump / mysqlimport utilities (some downtime)
    • One time – Restore from backup (or data dump) stored on S3 (some downtime)
    • Ongoing – Configure binlog replication from existing source (minimal downtime)
  • From MySQL / MariaDB on RDS
    • One time / Ongoing – Promote a read replica to be a standalone
      instance
  • From any DB
    • One time / Ongoing – Use DMS (minimal downtime)
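  • Example – two of these options as boto3 calls (identifiers, bucket, and role are hypothetical; the S3 restore expects a Percona XtraBackup file):

import boto3

rds = boto3.client("rds")

# One time - restore a Percona XtraBackup stored on S3 into a new MySQL instance
rds.restore_db_instance_from_s3(
    DBInstanceIdentifier="mysql-restored",
    DBInstanceClass="db.m5.large",
    Engine="mysql",
    SourceEngine="mysql",
    SourceEngineVersion="5.7.30",
    S3BucketName="my-backup-bucket",
    S3IngestionRoleArn="arn:aws:iam::123456789012:role/RdsS3Import",
    MasterUsername="admin",
    MasterUserPassword="secret",
)

# From RDS - promote a read replica to a standalone instance
rds.promote_read_replica(DBInstanceIdentifier="mysql-replica")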
40
Q

Migrating to PostgreSQL on RDS

A
  • From PostgreSQL (on-premises / EC2)
    • One time – use pg_dump / pg_restore (some downtime)
  • From CSV data stored on S3
    • Use aws_s3 PostgreSQL extension and import data using
      the aws_s3.table_import_from_s3 function (some downtime)
  • From PostgreSQL on RDS (large DBs)
    • Use pg_transport extension (streams data, is extremely fast, minimal downtime)
  • From any DB
    • One time / Ongoing – Use DMS (minimal downtime)
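  • Example – the aws_s3 import run from Python via psycopg2 (connection details, bucket, and table are hypothetical):

import psycopg2

conn = psycopg2.connect(
    host="mydb.abc123.us-east-1.rds.amazonaws.com",   # hypothetical
    port=5432, dbname="app", user="postgres", password="secret",
)

with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;")
    # Import a CSV object from S3 straight into the target table
    cur.execute("""
        SELECT aws_s3.table_import_from_s3(
            'orders', '', '(format csv)',
            aws_commons.create_s3_uri('my-data-bucket', 'orders.csv', 'us-east-1')
        );
    """)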
41
Q

Migrating to Oracle on RDS

A
  • For small DBs – use Oracle SQL Developer tool (freeware)
    • Perform Database Copy with Oracle SQL Developer
    • Supports Oracle and MySQL as source
  • For large DBs – use Oracle Data Pump
    • Can export and import between Oracle DBs (on-prem / EC2 / RDS)
    • Can use S3 to transfer the dump file (use option group with option S3_INTEGRATION)
    • Can also transfer by creating a database link between source and target
  • From any DB
    • One time / Ongoing – Use DMS (minimal downtime)
42
Q

Migrating to SQL Server on RDS

A
  • From SQL Server (on-premises / EC2)
    • Use native backup and restore (.bak backup files stored on S3)
    • Supports encryption and compression
    • Or use SQL Server Management Studio
  • Microsoft SQL Server Management Studio (freeware, has three options)
    • Generate and Publish Scripts wizard – creates a script with schema and/or data
    • Import and Export wizard
    • Bulk copy
  • From SQL Server (on RDS)
    • Restore from a snapshot
    • Or use native backup and restore feature
    • Or use SQL Server Management Studio
  • From any DB
    • One time / Ongoing – Use DMS (minimal downtime)
43
Q

Homogenous Migration to Aurora

A
  • MySQL 5.6-compatible DBs (MySQL / MariaDB / Percona)
  • Homogenous Migration with Downtime
    • Restore from RDS snapshot
    • Full load using native DB tools
    • Full load using DMS (migrate schema with native tools first)
  • Homogenous Migration with Near-Zero Downtime (minimal downtime)
    • Restore from RDS snapshot + MySQL binlog replication
    • Full load using native tools + MySQL binlog replication
    • Full load + CDC using DMS
    • Create an Aurora replica from MySQL / PostgreSQL on RDS and promote it to a standalone DB
44
Q

Heterogeneous Migration to Aurora

A
  • Schema migration – use SCT
  • Data Migration – use DMS
  • For near-zero downtime – use continuous replication using DMS (Full
    load + CDC)
45
Q

Migrating to Aurora Serverless

A
  • Can migrate between Aurora provisioned cluster and Aurora Serverless cluster
  • Cannot migrate from RDS to Aurora Serverless directly – migrate from RDS to Aurora first, then to Aurora Serverless
46
Q

Strategies for Migration to Aurora

A
  • MySQL / PostgreSQL on RDS to Aurora (Homogeneous)
    • Restore from a snapshot to Aurora (manual or automated snapshot)
    • Replicate to an Aurora read replica and then promote the replica to a standalone DB
    • Use DMS
  • MySQL on EC2 or MySQL on-premise to Aurora (Homogeneous)
    • Restore from backup files stored on S3
    • Restore from text files (CSV / XML) stored in S3
    • Restore using mysqldump utility
    • Use DMS
  • PostgreSQL on EC2 or PostgreSQL on-premise to Aurora (Homogeneous)
    • Migrate PostgreSQL to RDS and then migrate to Aurora
    • Use DMS
  • MariaDB / Oracle / SQL Server to Aurora (Heterogeneous)
    • Use DMS and SCT
47
Q

Migrating Redis workloads to ElastiCache

A
  • Two approaches
    • Offline migration (using backup)
    • Online migration (using the Migrate Data from Endpoint feature)
48
Q

Offline migration to ElastiCache for Redis

A
  • Create a Redis backup (.RDB file)
  • Create an S3 bucket
  • Upload the backup to S3
  • Create the target ElastiCache cluster and choose the option to seed the
    RDB file from the S3 location
  • ElastiCache should have read access to the RDB file
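  • Example – the seeding step as a boto3 sketch (names and the S3 ARN are hypothetical):

import boto3

ec = boto3.client("elasticache")

# Seed the new cluster from the .rdb backup uploaded to S3
ec.create_replication_group(
    ReplicationGroupId="redis-migrated",
    ReplicationGroupDescription="Seeded from S3 RDB backup",
    Engine="redis",
    CacheNodeType="cache.r5.large",
    SnapshotArns=["arn:aws:s3:::my-backup-bucket/redis-backup.rdb"],  # hypothetical
)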
49
Q

Online migration to ElastiCache for Redis

A
  • Real-time data migration
  • Migrate self-hosted Redis on EC2 to ElastiCache
  • Create the target Redis cluster (or choose an existing one)
  • Under Actions, choose Migrate Data from Endpoint
  • Monitor using the Events section on the ElastiCache console
  • Failover to the new DB
  • You can decide when to failover
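  • Example – the API equivalents of those console actions, as a boto3 sketch (group ID and address are hypothetical):

import boto3

ec = boto3.client("elasticache")

# "Migrate Data from Endpoint" - point the target cluster at self-hosted Redis
ec.start_migration(
    ReplicationGroupId="redis-target",
    CustomerNodeEndpointList=[{"Address": "10.0.1.15", "Port": 6379}],
)

# When replication has caught up and you are ready to fail over:
ec.complete_migration(ReplicationGroupId="redis-target")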
50
Q

Good to know – Online Redis migration: Source cluster

A
  • must have Redis AUTH disabled
  • must have “protected-mode” disabled
  • “bind” config if present should allow requests from ElastiCache
51
Q

Good to know – Online Redis migration: Target cluster

A
  • must have Cluster-mode disabled
  • must have Multi-AZ enabled
  • must have Encryption disabled
  • must have sufficient memory
53
Q

Migrating to DocumentDB

A
  • Four approaches
  • Offline migration
  • Online migration
  • Hybrid Approach
  • Dual write approach
54
Q

Offline migration to DocumentDB

A
  • Simplest and fastest, but with the longest downtime
  • Non-critical / non-production workloads
  • Homogeneous – mongodump / mongorestore (BSON)
  • Homogeneous / heterogeneous – mongoexport / mongoimport (JSON / CSV)
  • Use DocumentDB index tool to export/import indexes
55
Q

Online migration to DocumentDB

A
  • Medium complexity, slowest, and with minimal downtime
  • For production workloads
  • Uses DMS
  • Migrate indexes using DocumentDB index tool
  • DMS does not migrate indexes
56
Q

Hybrid Approach to DocumentDB Migration

A
  • Mix of offline and online migration
  • Most complex, faster than online, and with minimal downtime
  • For production workloads
    • when DB size is large
    • or if you don’t have enough network bandwidth
  • Phase 1
    • Export with mongodump
    • Transfer it to AWS if on-premises (Direct Connect / Snowball)
    • Migrate indexes using the DocumentDB index tool
    • Restore to DocumentDB
  • Phase 2
    • Use DMS in CDC mode
57
Q

Dual write approach for DocumentDB Migration

A
  • Typically used for heterogeneous migrations
  • Example – RDBMS to DocumentDB
  • Create indexes manually (in case of heterogeneous migration)
58
Q

Streaming use cases for DMS

A
  • Can stream data from source to target (= ongoing replication)
    • Redshift – stream from OLTP to Redshift for analytics
    • S3 data lakes – hydrate data lakes
    • Stream to Kinesis Data Streams (see the sketch below)
    • Stream to Amazon Elasticsearch Service
  • Can use fan-out architecture
  • Can also use fan-in architecture
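  • Example – a Kinesis target endpoint as a boto3 sketch (stream and role ARNs are hypothetical):

import boto3

dms = boto3.client("dms")

# A Kinesis data stream as the DMS target for ongoing replication (fan-out)
dms.create_endpoint(
    EndpointIdentifier="kinesis-target",
    EndpointType="target",
    EngineName="kinesis",
    KinesisSettings={
        "StreamArn": "arn:aws:kinesis:us-east-1:123456789012:stream/cdc-stream",
        "MessageFormat": "json",
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/DmsKinesisRole",
    },
)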