Week 5 - practice quiz Flashcards
Which company created the MapReduce framework as a concept?
1) Amazon
2) Oracle
3) Microsoft
4) Google
4) Google
Which company implemented Hadoop as an open-source version of MapReduce?
1) Google
2) Amazon
3) Microsoft
4) Yahoo
4) Yahoo
Which of the following is true about the Hadoop file system?
1) Files are append-only
2) Files are split into 1 GB blocks
3) Meta node stores metadata
4) Each node stores distinct data blocks
1) Files are append-only
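As a rough illustration of block splitting, here is a minimal sketch that divides a file into fixed-size blocks. It assumes a 128 MB block size (a commonly cited HDFS default, much smaller than the 1 GB in the distractor above). The name split_into_blocks is hypothetical and is not part of any Hadoop API.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MB)

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A hypothetical 300 MB file ends up in three blocks: 128 MB, 128 MB, 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))  # 3
```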
What does HDFS stand for?
1) Highly Distributed File System
2) Highly Disturbed File System
3) High Definition File System
4) Hadoop File System
4) Hadoop File System
Full name: Hadoop Distributed File System
What is the data type used by Hadoop for a MapReduce process?
1) Column-based
2) Document-based
3) Graph-based
4) Key-value
4) Key-value
What is the output of the Map function in a MapReduce process?
1) List of graph nodes
2) List of key-value pairs.
3) List of table columns
4) List of network nodes
2) List of key-value pairs.
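To make the answer concrete, here is a minimal word-count sketch of a Map function in plain Python. It is not Hadoop's Java API, and the name map_words is hypothetical. It shows only the shape of the output: one (key, value) pair per word.

```python
def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the input line."""
    return [(word.lower(), 1) for word in line.split()]

pairs = map_words("big data big clusters")
print(pairs)  # [('big', 1), ('data', 1), ('big', 1), ('clusters', 1)]
```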
Where do mapper nodes save their outputs before serving to reducer nodes?
1) Local disk
2) Another node
3) Central node
4) Master node
1) Local disk
What does Hadoop do with a task that crashes in a node?
1) The task is retried on another node.
2) The node is rebooted.
3) The task is failed.
4) The node is shut down.
1) The task is retried on another node.
Apache Spark organizes its data processing operations, such as collect, filter, and sort, by building a graph called a DAG. What does DAG stand for?
1) Derived Apache Graph
2) Distributed Apache Graph
3) Directed Acyclic Graph
4) Distributed Asymmetric Graph
3) Directed Acyclic Graph
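The idea of ordering operations with a directed acyclic graph can be sketched with Python's standard graphlib module (Python 3.9 or later). The operation names below are illustrative only, and this is not how Spark builds its DAG internally. It just shows that a DAG yields a valid execution order.

```python
from graphlib import TopologicalSorter

# Each operation maps to the set of operations it depends on.
dag = {
    "filter": {"load"},
    "sort": {"filter"},
    "collect": {"sort"},
}

# A topological order runs every operation after its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['load', 'filter', 'sort', 'collect']
```

Because the graph has no cycles, such an order always exists; graphlib raises CycleError if a cycle is introduced.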
Which of the following statements about the difference between Hadoop and Spark is true?
1) Hadoop supports in-memory cluster computing.
2) Hadoop is faster than Spark.
3) Both Hadoop and Spark can load data from Hadoop File System (HDFS)
4) Hadoop provides multiple built-in data processing operations such as filter and join.
3) Both Hadoop and Spark can load data from Hadoop File System (HDFS)
What is the input for the Reduce function in a MapReduce process?
1) Keys and their corresponding list of values.
2) Keys and their corresponding maps.
3) Keys and their corresponding nodes.
4) Maps and their corresponding values.
1) Keys and their corresponding list of values.
What is the output of the Reduce function in a MapReduce process?
1) List of key-value pairs
2) List of key-node pairs.
3) List of key-reducer pairs.
4) List of key-mapper pairs.
1) List of key-value pairs
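The two Reduce cards above can be illustrated together with a minimal plain-Python sketch. The input is a key plus the list of values collected for it, and the output is again a list of key-value pairs. The name reduce_counts is hypothetical and not part of Hadoop's API.

```python
def reduce_counts(key, values):
    """Reduce step: collapse all values for one key into a single pair."""
    return [(key, sum(values))]

# Input: one key and its list of values. Output: a list of key-value pairs.
result = reduce_counts("big", [1, 1, 1])
print(result)  # [('big', 3)]
```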
Which of the following is the correct sequence of phases in a MapReduce process?
1) Input, Splitting, Shuffling, Mapping, Reducing, Output
2) Input, Splitting, Mapping, Reducing, Shuffling, Output
3) Input, Splitting, Mapping, Shuffling, Reducing, Output
4) Input, Mapping, Splitting, Shuffling, Reducing, Output
3) Input, Splitting, Mapping, Shuffling, Reducing, Output
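The phase order in the correct answer can be traced with a small single-machine sketch. All function names here are hypothetical, and a real cluster runs the phases across many nodes. The sketch only shows how data flows from one phase to the next.

```python
from collections import defaultdict

def split_input(text):
    """Splitting: one record per non-empty line."""
    return [line for line in text.splitlines() if line.strip()]

def map_phase(records):
    """Mapping: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in records for word in line.split()]

def shuffle_phase(pairs):
    """Shuffling: group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

def reduce_phase(grouped):
    """Reducing: collapse each key's list of values into one value."""
    return {key: sum(values) for key, values in grouped.items()}

text = "big data\nbig clusters\n"  # Input
counts = reduce_phase(shuffle_phase(map_phase(split_input(text))))
print(counts)  # Output: {'big': 2, 'data': 1, 'clusters': 1}
```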
What does Hadoop do with a task that repeatedly crashes in a MapReduce system?
1) The task is failed.
2) The task is retried on another system.
3) The system is rebooted.
4) The system is shut down.
1) The task is failed.
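The two crash-handling cards (retry on another node, then fail after repeated crashes) can be combined in one sketch. The attempt limit of 4, the node names, and every function name below are assumptions made for illustration. This is not Hadoop's scheduler code.

```python
MAX_ATTEMPTS = 4  # assumed attempt limit for this sketch

class TaskFailed(Exception):
    """Raised when a task has crashed on every allowed attempt."""

def run_with_retries(task, nodes, max_attempts=MAX_ATTEMPTS):
    """Try the task on successive nodes; fail it after max_attempts crashes."""
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # move on to another node each time
        try:
            return task(node)
        except RuntimeError:
            continue  # the task crashed here, so retry elsewhere
    raise TaskFailed(f"task crashed {max_attempts} times")

def flaky_task(node):
    """Hypothetical task that crashes only on node-a."""
    if node == "node-a":
        raise RuntimeError("crash")
    return f"done on {node}"

outcome = run_with_retries(flaky_task, ["node-a", "node-b"])
print(outcome)  # done on node-b
```

A task that crashes on every node exhausts its attempts and raises TaskFailed, which mirrors the "task is failed" answer.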
What does Hadoop do when a node crashes during a MapReduce process?
1) Ignores all of the maps created on all of the nodes.
2) Ignores all of the maps created on the node crashed.
3) Re-launches any maps the node previously ran.
4) Re-launches any maps all of the nodes previously ran.
3) Re-launches any maps the node previously ran.
Which of the following data operators requires implementation of a reduce function in a MapReduce process?
1) GROUP BY
2) SELECT
3) PROJECT
4) SORT
1) GROUP BY
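A small sketch of why GROUP BY needs a reduce step: a row filter such as SELECT can be decided one row at a time in the map step, while an aggregate per group needs all values for a key brought together. The table, column names, and function names below are made up for illustration.

```python
from collections import defaultdict

rows = [
    {"dept": "sales", "amount": 10},
    {"dept": "ops", "amount": 5},
    {"dept": "sales", "amount": 7},
]

def select_map(row):
    """SELECT-style filter: map-only, each row is kept or dropped on its own."""
    return [row] if row["amount"] > 6 else []

def group_map(row):
    """GROUP BY, map side: emit (grouping key, value)."""
    return [(row["dept"], row["amount"])]

def group_reduce(key, values):
    """GROUP BY, reduce side: aggregate every value seen for one key."""
    return (key, sum(values))

selected = [kept for row in rows for kept in select_map(row)]

grouped = defaultdict(list)  # stands in for the shuffle phase
for row in rows:
    for key, value in group_map(row):
        grouped[key].append(value)
totals = dict(group_reduce(key, values) for key, values in grouped.items())
print(totals)  # {'sales': 17, 'ops': 5}
```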
What is the output of a JOIN operation in a MapReduce process?
1) Key-column pairs
2) Key-node pairs
3) Key-map pairs
4) Key-value pairs
4) Key-value pairs
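One common way to express a JOIN in MapReduce is a reduce-side join: the map step tags each record with its source table, and the reduce step pairs up records that share a key. The sketch below uses made-up tables and hypothetical function names. Its output is a list of key-value pairs, as in the answer above.

```python
from collections import defaultdict

users = [(1, "ada"), (2, "lin")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]

def map_tagged(records, tag):
    """Map step: keep the join key, tag the value with its source table."""
    return [(key, (tag, value)) for key, value in records]

def reduce_join(key, tagged_values):
    """Reduce step: pair every user value with every order value for a key."""
    left = [value for tag, value in tagged_values if tag == "user"]
    right = [value for tag, value in tagged_values if tag == "order"]
    return [(key, (l, r)) for l in left for r in right]

grouped = defaultdict(list)  # stands in for the shuffle phase
for key, tagged in map_tagged(users, "user") + map_tagged(orders, "order"):
    grouped[key].append(tagged)

joined = []
for key in sorted(grouped):
    joined.extend(reduce_join(key, grouped[key]))
print(joined)  # [(1, ('ada', 'book')), (1, ('ada', 'pen')), (2, ('lin', 'lamp'))]
```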
What is Apache Spark?
1) Cloud-based spreadsheet software.
2) Interconnected computing nodes.
3) A cluster of server computers.
4) Distributed data-processing software.
4) Distributed data-processing software.
Apache Spark relies on a data abstraction called RDD. What does RDD stand for?
1) Relational Dynamic Database
2) Recoverable Distributed Database
3) Resilient Distributed Dataset
4) Rigorous Distributed Database
3) Resilient Distributed Dataset
There are two types of RDD operations in Apache Spark: transformation and action. Which of the following is an action operation?
1) Count
2) Map
3) Filter
4) Join
1) Count
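The difference between a transformation and an action can be sketched with a toy class. ToyRDD is a hypothetical stand-in written for this example and is not the PySpark API. Transformations (map, filter) only describe work, and nothing runs until an action (count) asks for a result.

```python
class ToyRDD:
    """A toy stand-in for an RDD, for illustration only."""

    def __init__(self, source):
        self._source = source  # a callable that returns a fresh iterator

    def map(self, fn):
        # Transformation: returns a new ToyRDD, computes nothing yet.
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred):
        # Transformation: also lazy.
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    def count(self):
        # Action: forces the whole pipeline to run.
        return sum(1 for _ in self._source())

calls = []

def traced_square(x):
    calls.append(x)  # record that real work happened
    return x * x

rdd = ToyRDD(lambda: iter(range(5)))
pipeline = rdd.map(traced_square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # still 0: only transformations so far
n = pipeline.count()              # the action triggers the computation
print(calls_before_action, n, len(calls))  # 0 3 5
```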
Which of the following was written on top of the Apache Spark software?
1) Python
2) GraphX
3) Java
4) Scala
2) GraphX
Which of the following big data software components includes an implementation of PageRank, the algorithm popularized by Google for ranking websites?
1) Oracle
2) MySQL
3) Spark SQL
4) GraphX
4) GraphX
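For readers who have not seen PageRank, here is a minimal power-iteration sketch in plain Python. It is not the GraphX API (GraphX is used from Scala). The damping factor of 0.85, the iteration count, and the three-page link graph are assumptions chosen for illustration, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no links spreads its rank evenly over all pages.
                for target in pages:
                    new_ranks[target] += damping * ranks[page] / n
            else:
                share = damping * ranks[page] / len(outgoing)
                for target in outgoing:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
# Page "c" is linked from both "a" and "b", so it outranks "b".
print(ranks["c"] > ranks["b"])  # True
```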
What is the method implemented by Apache Spark to process live streaming data?
1) Real time processing
2) Batch processing (in small micro-batches)
3) Binary processing
4) On-demand processing
2) Batch processing (in small micro-batches)
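Processing a live stream as a series of small batches can be sketched with a generator. This example batches by a fixed count for simplicity, which is an assumption of the sketch; Spark Streaming groups incoming data by time interval. The event strings and the name micro_batches are made up.

```python
def micro_batches(stream, batch_size):
    """Group an incoming stream of events into small fixed-size batches."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch

events = iter(["#acme up", "#acme down", "hello", "#acme again", "bye"])

# Each batch is then processed like an ordinary small batch job:
# here, counting the events that mention a hashtag.
batch_counts = [
    sum("#acme" in event for event in batch)
    for batch in micro_batches(events, 2)
]
print(batch_counts)  # [2, 1, 0]
```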
Which of the following is an example of live streaming data?
1) Student grades submitted by an instructor.
2) An online banking statement for an individual.
3) A Wikipedia article about a historical figure.
4) A Twitter hashtag containing a company name.
4) A Twitter hashtag containing a company name.