Data Transport / Replication Flashcards

1
Q

A company uses a legacy on-premises reporting application that operates on gigabytes of .json files representing years of data. The legacy application cannot handle the growing size of the .json files. New .json files are added daily from various data sources to a central on-premises storage location. The company wants to continue to support the legacy application. The company has hired you as a solutions architect to build a solution that can manage ongoing data updates from the on-premises storage location to Amazon S3.

A

Set up an on-premises file gateway. Configure the data sources to write the .json files to the file gateway. Point the legacy reporting application to the file gateway. The file gateway replicates the .json files to Amazon S3.

A file gateway provides a simple solution for presenting one or more Amazon S3 buckets and their objects as a mountable NFS or SMB file share to one or more clients on-premises.

2
Q

A Hollywood production studio is looking at transferring their existing digital media assets of around 20PB to AWS Cloud in the shortest possible timeframe.

Why Snowmobile?
Why not Snowball?
Why not Direct Connect?

A

You can transfer up to 100 PB per Snowmobile.

AWS recommends Snowball for transfers larger than 10 TB between your on-premises data centers and Amazon S3, but for datasets of 10 PB or more in a single location it recommends Snowmobile. Moving 20 PB with Snowball would take many devices.

Storage Gateway will not work since the studio’s data centers are in remote locations where internet speed may not be optimal, thereby increasing both the cost and the time of migrating 20 PB of data.

A Direct Connect connection is costly and takes significant time to provision. It is not the correct solution since the studio wants the data transfer done in the shortest possible time.
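
To make the "shortest possible timeframe" argument concrete, here is a rough back-of-the-envelope sketch in Python. The link speeds, the 80% utilization factor, and the 80 TB usable capacity per Snowball device are illustrative assumptions, not figures from the card; substitute your own numbers.

```python
PB = 10**15  # decimal petabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes


def transfer_days(data_bytes: int, link_bits_per_sec: float,
                  utilization: float = 0.8) -> float:
    """Days needed to push data_bytes over a network link.

    utilization is an assumed fraction of the link that is really usable.
    """
    seconds = data_bytes * 8 / (link_bits_per_sec * utilization)
    return seconds / 86_400


def devices_needed(data_bytes: int, device_capacity_bytes: int) -> int:
    """Number of storage devices needed, rounded up."""
    return -(-data_bytes // device_capacity_bytes)  # integer ceiling division


if __name__ == "__main__":
    data = 20 * PB
    # Hypothetical link speeds for a network-based transfer.
    for label, bps in [("1 Gbps", 1e9), ("10 Gbps", 1e10)]:
        print(f"{label}: about {transfer_days(data, bps):,.0f} days")
    # Assumed usable capacity per Snowball device: 80 TB.
    print("Snowball devices:", devices_needed(data, 80 * TB))
```

Under these assumptions, even a dedicated 10 Gbps link needs well over half a year for 20 PB, and Snowball would need hundreds of devices, which is why a single Snowmobile is the better fit here.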

3
Q

A Big Data company wants to optimize its daily Extract-Transform-Load (ETL) process that migrates and transforms data from its S3 based data lake to a Redshift cluster. The team wants to manage this daily job in a serverless environment.

Why are Data Pipeline and EMR not good options in this case?

A

AWS Data Pipeline launches compute resources in your account, giving you direct access to the Amazon EC2 instances or Amazon EMR clusters. Because it exposes the underlying EC2 instances, it is not a serverless solution.

EMR uses EC2 instances as well.

4
Q

A retail company needs a secure connection between its on-premises data center and AWS Cloud. This connection does not need high bandwidth and will handle a small amount of traffic. The company wants a quick turnaround time to set up the connection.

Why not Direct Connect?
Why not an internet gateway?
Why not a bastion host?

A

Direct Connect is expensive and takes time to set up.

An internet gateway connects a VPC to the internet, not to the on-premises network.

A bastion host is an EC2 instance in a public subnet used to reach other EC2 instances in private subnets. It does not connect AWS to the on-premises network.
