Big Data BDS-C00 Flashcards
(263 cards)
An organization is developing a mobile social application and needs to collect logs from all devices on which it is installed. The organization is evaluating Amazon Kinesis Data Streams to push logs and Amazon EMR to process data. They want to store data on HDFS using the default replication factor to replicate data among the cluster, but they are concerned about the durability of the data. Currently, they are producing 300 GB of raw data daily, with additional spikes during special events. They will need to scale out the Amazon EMR cluster to match the increase in streamed data.
Which solution prevents data loss and matches compute demand?
A. Use multiple Amazon EBS volumes on Amazon EMR to store processed data and scale out the Amazon EMR cluster as needed.
B. Use the EMR File System and Amazon S3 to store processed data and scale out the Amazon EMR cluster as needed.
C. Use Amazon DynamoDB to store processed data and scale out the Amazon EMR cluster as needed.
D. Use Amazon Kinesis Data Firehose and, instead of using Amazon EMR, stream logs directly into Amazon Elasticsearch Service.
B. Use the EMR File System and Amazon S3 to store processed data and scale out the Amazon EMR cluster as needed.
A user is running a web server on EC2. The user wants to receive an SMS when the EC2 instance utilization is above a threshold limit.
Which AWS services should the user configure in this case?
A. AWS CloudWatch + AWS SES
B. AWS CloudWatch + AWS SNS
C. AWS CloudWatch + AWS SQS
D. AWS EC2 + AWS CloudWatch
B. AWS CloudWatch + AWS SNS
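A minimal sketch of how this pairing fits together, assuming hypothetical instance and topic identifiers: the CloudWatch alarm watches CPUUtilization and routes its alarm action to an SNS topic, whose SMS subscribers receive the text.

```python
# Sketch of a CloudWatch alarm wired to an SNS topic (IDs are hypothetical).
def build_cpu_alarm(instance_id, topic_arn, threshold=80.0):
    """Parameters for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,                        # evaluate 5-minute averages
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],          # SNS fans out to SMS subscribers
    }

def create_alarm(params):
    import boto3                              # deferred so the builder stays testable offline
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

The SMS delivery itself is just an SNS subscription with protocol `sms` on the topic; the alarm only needs the topic ARN.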
It is advised that you watch the Amazon CloudWatch “_____” metric (available via the AWS Management Console or Amazon CloudWatch APIs) carefully and recreate the Read Replica should it fall behind due to replication errors.
A. Write Lag
B. Read Replica
C. Replica Lag
D. Single Replica
C. Replica Lag
You have been asked to use your department’s existing continuous integration (CI) tool to test a three-tier web architecture defined in an AWS CloudFormation template. The tool already supports AWS APIs and can launch new AWS CloudFormation stacks after polling version control. The CI tool reports on the success of the AWS CloudFormation stack creation by using the DescribeStacks API to look for the CREATE_COMPLETE status.
The architecture tiers defined in the template consist of:
- One load balancer
- Five Amazon EC2 instances running the web application
- One multi-AZ Amazon RDS instance
How would you implement this?
Choose 2 answers
A. Define a WaitCondition and a WaitConditionHandle for the output of a UserData command that does sanity checking of the application’s post-install state
B. Define a CustomResource and write a script that runs architecture-level integration tests through the load balancer to the application and database for the state of multiple tiers
C. Define a WaitCondition and use a WaitConditionHandle that leverages the AWS SDK to run the DescribeStacks API call until the CREATE_COMPLETE status is returned
D. Define a CustomResource that leverages the AWS SDK to run the DescribeStacks API call until the CREATE_COMPLETE status is returned
E. Define a UserDataHandle for the output of a UserData command that does sanity checking of the application’s post-install state and runs integration tests on the state of multiple tiers through load balancer to the application
F. Define a UserDataHandle for the output of a CustomResource that does sanity checking of the application’s post-install state
A. Define a WaitCondition and a WaitConditionHandle for the output of a UserData command that does sanity checking of the application’s post-install state
B. Define a CustomResource and write a script that runs architecture-level integration tests through the load balancer to the application and database for the state of multiple tiers
By default what are ENIs that are automatically created and attached to instances using the EC2 console set to do when the attached instance terminates?
A. Remain as is
B. Terminate
C. Hibernate
D. Pause
B. Terminate
Without _____, you must either create multiple AWS accounts, each with its own billing and subscriptions to AWS products, or your employees must share the security credentials of a single AWS account.
A. Amazon RDS
B. Amazon Glacier
C. Amazon EMR
D. Amazon IAM
D. Amazon IAM
The project you are working on currently uses a single AWS CloudFormation template to deploy its AWS infrastructure, which supports a multi-tier web application. You have been tasked with organizing the AWS CloudFormation resources so that they can be maintained in the future, and so that different departments such as Networking and Security can review the architecture before it goes to Production.
How should you do this in a way that accommodates each department, using their existing workflows?
A. Organize the AWS CloudFormation template so that related resources are next to each other in the template, such as VPC subnets and routing rules for Networking and Security groups and IAM information for Security
B. Separate the AWS CloudFormation template into a nested structure that has individual templates for the resources that are to be governed by different departments, and use the outputs from the networking and security stacks for the application template that you control
C. Organize the AWS CloudFormation template so that related resources are next to each other in the template for each department’s use, leverage your existing continuous integration tool to constantly deploy changes from all parties to the Production environment, and then run tests for validation
D. Use a custom application and the AWS SDK to replicate the resources defined in the current AWS CloudFormation template, and use the existing code review system to allow other departments to approve changes before altering the application for future deployments
B. Separate the AWS CloudFormation template into a nested structure that has individual templates for the resources that are to be governed by different departments, and use the outputs from the networking and security stacks for the application template that you control
An administrator is processing events in near real time using Kinesis streams and Lambda. Lambda intermittently fails to process batches from one of the shards due to a 5-minute time limit.
What is a possible solution for this problem?
A. Add more Lambda functions to improve concurrent batch processing
B. Reduce the batch size that Lambda is reading from the stream
C. Ignore and skip events that are older than 5 minutes and put them to Dead Letter Queue (DLQ)
D. Configure Lambda to read from fewer shards in parallel
B. Reduce the batch size that Lambda is reading from the stream
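Option B’s approach can be sketched with a back-of-envelope helper: reducing the batch size shrinks the work each invocation must finish inside the timeout. The per-record cost and the event source mapping UUID below are illustrative assumptions, not measurements.

```python
LAMBDA_TIMEOUT_SECS = 300                    # the 5-minute limit from the question

def max_batch_size(secs_per_record, safety_factor=0.8):
    """Largest batch that should complete within the timeout, with headroom."""
    return int(LAMBDA_TIMEOUT_SECS * safety_factor / secs_per_record)

def shrink_batch(mapping_uuid, new_size):
    """Apply the new batch size to the Kinesis event source mapping."""
    import boto3                             # deferred so the math stays testable offline
    boto3.client("lambda").update_event_source_mapping(
        UUID=mapping_uuid, BatchSize=new_size)
```

For example, if each record takes about one second to process, a batch of 240 leaves a 20% safety margin against the 300-second limit.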
Fill in the blanks: A _____ is a storage device that moves data in sequences of bytes or bits (blocks). Hint: These devices support random access and generally use buffered I/O.
A. block map
B. storage block
C. mapping device
D. block device
D. block device
What does Amazon EBS stand for?
A. Elastic Block Storage
B. Elastic Business Server
C. Elastic Blade Server
D. Elastic Block Store
D. Elastic Block Store
You have an ASP.NET web application running in AWS Elastic Beanstalk. Your next version of the application requires a third-party Windows installer package to be installed on the instance on first boot and before the application launches.
Which options are possible? Choose 2 answers
A. In the application’s Global.asax file, run msiexec.exe to install the package using Process.Start() in the Application_Start event handler
B. In the source bundle’s .ebextensions folder, create a file with a .config extension. In the file, under the “packages” section and “msi” package manager, include the package’s URL
C. Launch a new Amazon EC2 instance from the AMI used by the environment. Log into the instance, install the package and run sysprep. Create a new AMI. Configure the environment to use the new AMI
D. In the environment’s configuration, edit the instances configuration and add the package’s URL to the “Packages” section
E. In the source bundle’s .ebextensions folder, create a “Packages” folder. Place the package in the folder
B. In the source bundle’s .ebextensions folder, create a file with a .config extension. In the file, under the “packages” section and “msi” package manager, include the package’s URL
C. Launch a new Amazon EC2 instance from the AMI used by the environment. Log into the instance, install the package and run sysprep. Create a new AMI. Configure the environment to use the new AMI
A gas company needs to monitor gas pressure in their pipelines. Pressure data is streamed from sensors placed throughout the pipelines to monitor the data in real time. When an anomaly is detected, the system must send a notification to open a valve. An Amazon Kinesis stream collects the data from the sensors, and an anomaly Kinesis stream triggers an AWS Lambda function to open the appropriate valve.
Which solution is the MOST cost-effective for responding to anomalies in real time?
A. Attach a Kinesis Firehose to the stream and persist the sensor data in an Amazon S3 bucket. Schedule an AWS Lambda function to run a query in Amazon Athena against the data in Amazon S3 to identify anomalies. When a change is detected, the Lambda function sends a message to the anomaly stream to open the valve.
B. Launch an Amazon EMR cluster that uses Spark Streaming to connect to the Kinesis stream and Spark machine learning to detect anomalies. When a change is detected, the Spark application sends a message to the anomaly stream to open the valve.
C. Launch a fleet of Amazon EC2 instances with a Kinesis Client Library application that consumes the stream and aggregates sensor data over time to identify anomalies. When an anomaly is detected, the application sends a message to the anomaly stream to open the valve.
D. Create a Kinesis Analytics application by using the RANDOM_CUT_FOREST function to detect an anomaly. When the anomaly score that is returned from the function is outside of an acceptable range, a message is sent to the anomaly stream to open the valve.
D. Create a Kinesis Analytics application by using the RANDOM_CUT_FOREST function to detect an anomaly. When the anomaly score that is returned from the function is outside of an acceptable range, a message is sent to the anomaly stream to open the valve.
An Amazon Redshift Database is encrypted using KMS. A data engineer needs to use the AWS CLI to create a KMS encrypted snapshot of the database in another AWS region.
Which three steps should the data engineer take to accomplish this task? (Select Three.)
A. Create a new KMS key in the destination region
B. Copy the existing KMS key to the destination region
C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key created in the destination region
D. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key from the source region
E. In the source, enable cross-region replication and specify the name of the copy grant created
F. In the destination region, enable cross-region replication and specify the name of the copy grant created
A. Create a new KMS key in the destination region
C. Use CreateSnapshotCopyGrant to allow Amazon Redshift to use the KMS key created in the destination region
E. In the source, enable cross-region replication and specify the name of the copy grant created
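A sketch of the mechanics behind cross-region copies of KMS-encrypted Redshift snapshots (cluster, region, and grant names are hypothetical): the snapshot copy grant lets Redshift use a KMS key in the destination region, and cross-region snapshot copy is then enabled on the source cluster. Expressed as an inspectable plan of boto3 calls:

```python
def copy_grant_plan(cluster_id, dest_region, grant_name, dest_kms_key_id):
    """Ordered (service, operation, params) steps; inspectable without AWS access."""
    return [
        # 1. A KMS key must exist in the destination region.
        ("kms", "create_key", {"Description": "Redshift cross-region snapshots"}),
        # 2. Grant Redshift permission to use that key, also in the destination region.
        ("redshift", "create_snapshot_copy_grant",
         {"SnapshotCopyGrantName": grant_name, "KmsKeyId": dest_kms_key_id}),
        # 3. On the source cluster, enable cross-region copy, naming the grant.
        ("redshift", "enable_snapshot_copy",
         {"ClusterIdentifier": cluster_id,
          "DestinationRegion": dest_region,
          "SnapshotCopyGrantName": grant_name}),
    ]
```

Steps 1 and 2 run against clients in the destination region; step 3 runs against the Redshift client in the source region, where the cluster lives.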
You have been tasked with implementing an automated data backup solution for your application servers that run on Amazon EC2 with Amazon EBS volumes. You want to use a distributed data store for your backups to avoid single points of failure and to increase the durability of the data. Daily backups should be retained for 30 days so that you can restore data within an hour.
How can you implement this through a script that a scheduling daemon runs daily on the application servers?
A. Write the script to call the ec2-create-volume API, tag the Amazon EBS volume with the current date-time group, and copy backup data to a second Amazon EBS volume. Use the ec2-describe-volumes API to enumerate existing backup volumes. Call the ec2-delete-volume API to prune backup volumes that are tagged with a date-time group older than 30 days
B. Write the script to call the Amazon Glacier upload archive API, and tag the backup archive with the current date-time group. Use the list vaults API to enumerate existing backup archives. Call the delete vault API to prune backup archives that are tagged with a date-time group older than 30 days
C. Write the script to call the ec2-create-snapshot API, and tag the Amazon EBS snapshot with the current date-time group. Use the ec2-describe-snapshot API to enumerate existing Amazon EBS snapshots. Call the ec2-delete-snapshot API to prune Amazon EBS snapshots that are tagged with a date-time group older than 30 days
D. Write the script to call the ec2-create-volume API, tag the Amazon EBS volume with the current date-time group, and use the ec2-copy-snapshot API to back up data to the new Amazon EBS volume. Use the ec2-describe-snapshot API to enumerate existing backup volumes. Call the ec2-delete-snapshot API to prune backup Amazon EBS volumes that are tagged with a date-time group older than 30 days
C. Write the script to call the ec2-create-snapshot API, and tag the Amazon EBS snapshot with the current date-time group. Use the ec2-describe-snapshot API to enumerate existing Amazon EBS snapshots. Call the ec2-delete-snapshot API to prune Amazon EBS snapshots that are tagged with a date-time group older than 30 days
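The 30-day pruning step of the chosen answer can be sketched as a pure helper plus a thin wrapper around the EC2 APIs named in the option (the volume ID and the boto3 calls are illustrative):

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)

def snapshots_to_prune(snapshots, now=None):
    """From (snapshot_id, start_time) pairs, pick IDs older than 30 days."""
    now = now or datetime.now(timezone.utc)
    return [sid for sid, started in snapshots if now - started > RETENTION]

def prune(volume_id):
    """Tie the helper to the describe/delete snapshot APIs (sketch only)."""
    import boto3                              # deferred so the helper stays testable offline
    ec2 = boto3.client("ec2")
    snaps = ec2.describe_snapshots(
        Filters=[{"Name": "volume-id", "Values": [volume_id]}])["Snapshots"]
    pairs = [(s["SnapshotId"], s["StartTime"]) for s in snaps]
    for sid in snapshots_to_prune(pairs):
        ec2.delete_snapshot(SnapshotId=sid)
```

Using the snapshot's own StartTime (or a date-time tag, as the option suggests) keeps the retention decision independent of when the daemon happens to run.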
An enterprise customer is migrating to Redshift and is considering using dense storage nodes in its Redshift cluster. The customer wants to migrate 50 TB of data. The customer’s query patterns involve performing many joins with thousands of rows. The customer needs to know how many nodes are needed in its target Redshift cluster. The customer has a limited budget and needs to avoid performing tests unless absolutely needed.
Which approach should this customer use?
A. Start with many small nodes
B. Start with fewer large nodes
C. Have two separate clusters with a mix of small and large nodes
D. Insist on performing multiple tests to determine the optimal configuration
A. Start with many small nodes
An organization is using Amazon Kinesis Data Streams to collect data generated from thousands of temperature devices and is using AWS Lambda to process the data. Devices generate 10 to 12 million records every day, but Lambda is processing only around 450 thousand records. Amazon CloudWatch indicates that throttling on Lambda is not occurring.
What should be done to ensure that all data is processed? (Choose two.)
A. Increase the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.
B. Decrease the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.
C. Create multiple Lambda functions that will consume the same Amazon Kinesis stream.
D. Increase the number of vCores allocated for the Lambda function.
E. Increase the number of shards on the Amazon Kinesis stream.
A. Increase the BatchSize value on the EventSource, and increase the memory allocated to the Lambda function.
E. Increase the number of shards on the Amazon Kinesis stream.
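A back-of-envelope check of why the shard count matters here (the figures are assumptions, not measurements): with one concurrent Lambda invocation per shard, each shard drains at roughly batch_size / invocation_seconds records per second, so total consumer throughput scales with the number of shards.

```python
import math

def shards_needed(records_per_day, batch_size, invocation_secs):
    """Shards required for the consumers to keep up with the incoming rate."""
    incoming = records_per_day / 86_400          # records/sec arriving
    per_shard = batch_size / invocation_secs     # records/sec one shard's consumer drains
    return max(1, math.ceil(incoming / per_shard))
```

For 12 million records a day with a 1,000-record batch that takes a minute to process, this estimates 9 shards; the same load with 100-record batches would need 84, which is why increasing both batch size and shard count appears in the answer.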
A company needs to monitor the read and write IOPS metrics for their Amazon RDS MySQL instances and send real-time alerts to their operations team. Which AWS services can accomplish this?
Choose 2 answers
A. Amazon Simple Email Service
B. Amazon CloudWatch
C. Amazon Simple Queue Service
D. Amazon Route 53
E. Amazon Simple Notification Service
B. Amazon CloudWatch
E. Amazon Simple Notification Service
When should I choose Provisioned IOPS over Standard RDS storage?
A. If you use production online transaction processing (OLTP) workloads.
B. If you have batch-oriented workloads
C. If you have workloads that are not sensitive to consistent performance
A. If you use production online transaction processing (OLTP) workloads.
A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its Redshift schema. The ORDERS table has foreign key relationships with multiple dimension tables in this schema.
How should the company determine the most appropriate distribution key for the ORDERS table?
A. Identify the largest and most frequently joined dimension table and ensure that it and the ORDERS table both have EVEN distribution
B. Identify the target dimension table and designate the key of this dimension table as the distribution key of the ORDERS table
C. Identify the smallest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table
D. Identify the largest and most frequently joined dimension table and designate the key of this dimension table as the distribution key for the ORDERS table
D. Identify the largest and most frequently joined dimension table and designate the key of this dimension table as the distribution key for the ORDERS table
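As an illustration of answer D's heuristic (table and column names are hypothetical), the key of the largest, most frequently joined dimension becomes the DISTKEY of ORDERS, so joined rows land on the same slice:

```python
def pick_dist_key(dimensions):
    """From (name, row_count, joins_per_day, key_column) tuples, return the key
    column of the most frequently joined dimension, breaking ties by size."""
    best = max(dimensions, key=lambda d: (d[2], d[1]))  # join frequency, then rows
    return best[3]

def orders_ddl(dist_column):
    """Redshift DDL co-locating ORDERS rows with that dimension's rows."""
    return (
        "CREATE TABLE orders (\n"
        "    order_id BIGINT,\n"
        f"    {dist_column} BIGINT,\n"
        "    order_date DATE\n"
        f") DISTKEY ({dist_column});"
    )
```

Matching the fact table's distribution key to the dimension's join key turns the most expensive join into a local, per-slice operation instead of a network redistribution.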
In the 'Detailed' monitoring data available for your Amazon EBS volumes, Provisioned IOPS volumes automatically send _____ minute metrics to Amazon CloudWatch.
A. 5
B. 2
C. 1
D. 3
C. 1
A medical record filing system for a government medical fund is using an Amazon S3 bucket to archive documents related to patients. Every patient visit to a physician creates a new file, which can add up to millions of files each month. Collection of these files from each physician is handled via a batch process that runs every night using AWS Data Pipeline. This is sensitive data, so the data and any associated metadata must be encrypted at rest.
Auditors review some files on a quarterly basis to see whether the records are maintained according to regulations. Auditors must be able to locate any physical file in the S3 bucket for a given date, patient, or physician. Auditors spend a significant amount of time locating such files.
What is the most cost-and time-efficient collection methodology in this situation?
A. Use Amazon Kinesis to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.
B. Use Amazon API Gateway to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.
C. Use Amazon S3 event notifications to populate an Amazon DynamoDB table with metadata about every file loaded to Amazon S3, and partition them based on the month and year of the file.
D. Use Amazon S3 event notifications to populate an Amazon Redshift table with metadata about every file loaded to Amazon S3, and partition them based on the month and year of the file.
B. Use Amazon API Gateway to get the data feeds directly from physicians, batch them using a Spark application on Amazon Elastic MapReduce (EMR), and then store them in Amazon S3 with folders separated per physician.
You have written a server-side Node.js application and a web application with an HTML/JavaScript front end that uses the Angular.js framework. The server-side application connects to an Amazon Redshift cluster, issues queries, and then returns the results to the front end for display. Your user base is very large and distributed, but it is important to keep the cost of running this application low.
Which deployment strategy is both technically valid and the most cost-effective?
A. Deploy an AWS Elastic Beanstalk application with two environments: one for the Node.js application and another for the web front end. Launch an Amazon Redshift cluster, and point your application to its Java Database Connectivity (JDBC) endpoint
B. Deploy an AWS OpsWorks stack with three layers: a static web server layer for your front end, a Node.js app server layer for your server-side application, and a Redshift DB layer for the Amazon Redshift cluster
C. Upload the HTML, CSS, images, and JavaScript for the front end to an Amazon Simple Storage Service (S3) bucket. Create an Amazon CloudFront distribution with this bucket as its origin. Use AWS Elastic Beanstalk to deploy the Node.js application. Launch an Amazon Redshift cluster, and point your application to its JDBC endpoint
D. Upload the HTML, CSS, images, and JavaScript for the front end, plus the Node.js code for the server-side application, to an Amazon S3 bucket. Create a CloudFront distribution with this bucket as its origin. Launch an Amazon Redshift cluster, and point your application to its JDBC endpoint
E. Upload the HTML, CSS, images, and JavaScript for the front end to an Amazon S3 bucket. Use AWS Elastic Beanstalk to deploy the Node.js application. Launch an Amazon Redshift cluster, and point your application to its JDBC endpoint
C. Upload the HTML, CSS, images, and JavaScript for the front end to an Amazon Simple Storage Service (S3) bucket. Create an Amazon CloudFront distribution with this bucket as its origin. Use AWS Elastic Beanstalk to deploy the Node.js application. Launch an Amazon Redshift cluster, and point your application to its JDBC endpoint
What is one key difference between an Amazon EBS-backed and an instance-store backed instance?
A. Amazon EBS-backed instances can be stopped and restarted
B. Instance-store backed instances can be stopped and restarted
C. Auto scaling requires using Amazon EBS-backed instances
D. Virtual Private Cloud requires EBS backed instances
A. Amazon EBS-backed instances can be stopped and restarted
You are configuring your company’s application to use Auto Scaling and need to move user state information.
Which of the following AWS services provides a shared data store with durability and low latency?
A. Amazon Simple Storage Service
B. Amazon DynamoDB
C. Amazon EC2 instance storage
D. Amazon ElastiCache Memcached
B. Amazon DynamoDB