4. Hadoop-Related Projects Flashcards

1
Q

Hive was originally developed by:
A) Google
B) Facebook
C) Apache Software Foundation
D) IBM

A

B) Facebook

2
Q

Which of the following is NOT a characteristic of Hive?
A) It supports real-time data processing
B) It uses a SQL-like language called HiveQL
C) It is built on top of Hadoop
D) It is used for data warehousing

A

A) It supports real-time data processing

3
Q

What is the main purpose of Spark?
A) To provide a more efficient alternative to MapReduce
B) To support online transaction processing
C) To manage Hadoop clusters
D) To store large datasets

A

A) To provide a more efficient alternative to MapReduce

4
Q

Resilient Distributed Datasets (RDDs) in Spark are:
A) Mutable collections of data items
B) Fault-tolerant and can be operated on in parallel
C) Stored on disk by default
D) Only accessible in Scala

A

B) Fault-tolerant and can be operated on in parallel

5
Q

Which of the following is an advantage of Spark over MapReduce?
A) Spark cannot handle large datasets
B) Spark writes intermediate results to disk
C) Spark can cache intermediate results in memory
D) Spark supports only batch processing

A

C) Spark can cache intermediate results in memory

6
Q

In which language was Spark originally developed?
A) Java
B) Python
C) R
D) Scala

A

D) Scala

7
Q

Which of the following is a limitation of HiveQL compared to ANSI SQL?
A) It supports “insert into” for existing tables
B) It does not support the equality operator in join predicates
C) It does not support “update” or “delete” operations
D) It is fully ANSI-compliant

A

C) It does not support “update” or “delete” operations (true of early Hive; UPDATE and DELETE were added for transactional tables in Hive 0.14)

8
Q

Spark’s ability to cache intermediate results in memory is particularly useful for:
A) Online transaction processing
B) Iterative algorithms
C) Long-term data storage
D) Reducing network traffic

A

B) Iterative algorithms
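
The benefit for iterative algorithms can be sketched in plain Python: each pass of the loop reads the same dataset, so keeping it in memory avoids reloading it every time. This is a toy model, not Spark code; `load_points`, `cached`, and `iterate_mean` are hypothetical names chosen for illustration.

```python
loads = {"count": 0}  # how many times the "expensive" load has run


def load_points():
    """Stand-in for reading a dataset from disk."""
    loads["count"] += 1
    return [1.0, 2.0, 3.0, 4.0]


def cached(loader):
    """Wrap a loader so the data is read once and then kept in memory."""
    store = []

    def get():
        if not store:
            store.append(loader())
        return store[0]

    return get


def iterate_mean(get_points, iterations):
    """Gradient-descent estimate of the mean; every pass reads all points."""
    guess = 0.0
    for _ in range(iterations):
        points = get_points()
        gradient = sum(guess - p for p in points) / len(points)
        guess -= 0.5 * gradient
    return guess


# Without caching: the data is reloaded on every iteration.
uncached_result = iterate_mean(load_points, 10)
uncached_loads = loads["count"]

# With caching: the data is loaded once and reused.
loads["count"] = 0
cached_result = iterate_mean(cached(load_points), 10)
cached_loads = loads["count"]
```

Both runs arrive at the same estimate; only the number of loads differs.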

9
Q

In Hive, which command is used to load data into a table?
A) INSERT INTO
B) LOAD DATA INPATH
C) UPDATE TABLE
D) SET DATA

A

B) LOAD DATA INPATH

10
Q

Which of the following is NOT a feature of Spark’s RDDs?
A) They are mutable
B) They are distributed across the cluster
C) They are resilient
D) They can be cached in memory

A

A) They are mutable

11
Q

Which of the following operations is an action in Spark?
A) map()
B) filter()
C) reduce()
D) flatMap()

A

C) reduce()
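
The difference between transformations and actions can be illustrated with a small pure-Python model. `ToyRDD` is a hypothetical stand-in and not the PySpark API: its `map` and `filter` only record a step and return a new object, while `reduce` and `count` actually run the recorded steps and return a plain value.

```python
from functools import reduce as fold


class ToyRDD:
    """Hypothetical stand-in for an RDD; not the PySpark API."""

    def __init__(self, source, steps=()):
        self._source = list(source)  # the base data
        self._steps = tuple(steps)   # recorded transformations, not yet run

    # Transformations: record a step and return a NEW ToyRDD (lazy).
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _compute(self):
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)

    # Actions: run the recorded steps and return a plain value.
    def reduce(self, fn):
        return fold(fn, self._compute())

    def count(self):
        return len(self._compute())


calls = []  # records every element the mapped function actually sees


def traced_square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD([1, 2, 3, 4])
squares = numbers.map(traced_square)          # transformation: nothing runs
evens = squares.filter(lambda x: x % 2 == 0)  # transformation: nothing runs
calls_before_action = len(calls)
total = evens.reduce(lambda a, b: a + b)      # action: triggers the work
```

Note that `numbers` itself is never modified; each transformation produces a new object, which mirrors the immutability of RDDs.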

12
Q

HiveQL supports which of the following operations?
A) Real-time processing
B) Transactional updates
C) Ad-hoc querying
D) In-memory computations

A

C) Ad-hoc querying

13
Q

In Spark, an RDD can be created from:
A) Only HDFS files
B) Only local files
C) Both HDFS files and local files
D) Neither HDFS files nor local files

A

C) Both HDFS files and local files

14
Q

Which of the following is a limitation of HiveQL?
A) It does not support JOIN operations
B) It cannot handle large datasets
C) It does not support “insert into” for existing tables
D) It requires data to be structured

A

C) It does not support “insert into” for existing tables (true only of early Hive; INSERT INTO was added in Hive 0.8)

15
Q

Spark’s ability to cache data in memory is beneficial for:
A) Long-term data storage
B) Real-time transaction processing
C) Iterative algorithms
D) Disk-based data processing

A

C) Iterative algorithms

16
Q

Which of the following is a transformation in Spark?
A) count()
B) saveAsTextFile()
C) groupByKey()
D) take()

A

C) groupByKey()

17
Q

HiveQL’s “insert overwrite” command:
A) Appends data to an existing table
B) Deletes the existing data before inserting new data
C) Updates existing data with new data
D) Inserts data without affecting existing data

A

B) Deletes the existing data before inserting new data

18
Q

Which of the following accurately describes Spark’s RDD lineage?
A) A record of all actions performed on an RDD
B) A history of transformations applied to an RDD
C) The distribution of an RDD across the cluster
D) The sequence of RDDs created during a Spark job

A

B) A history of transformations applied to an RDD
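
Lineage as "a history of transformations" can be sketched with a toy model in which each dataset remembers its parent and the step that produced it, so a lost result can be rebuilt from the source. `LineageRDD` is a hypothetical class for illustration; real Spark tracks lineage per partition and is considerably more involved.

```python
class LineageRDD:
    """Hypothetical toy model of an RDD that remembers how it was derived."""

    def __init__(self, parent=None, op="source", fn=None, data=None):
        self.parent = parent  # the dataset this one was derived from
        self.op = op          # name of the transformation that produced it
        self.fn = fn          # function applied by that transformation
        self._data = data     # only the source holds actual data

    def map(self, fn):
        return LineageRDD(parent=self, op="map", fn=fn)

    def filter(self, fn):
        return LineageRDD(parent=self, op="filter", fn=fn)

    def lineage(self):
        """Names of the steps from the source down to this dataset."""
        chain, node = [], self
        while node is not None:
            chain.append(node.op)
            node = node.parent
        return list(reversed(chain))

    def collect(self):
        """Rebuild the data by replaying the lineage from the source."""
        if self.parent is None:
            return list(self._data)
        upstream = self.parent.collect()
        if self.op == "map":
            return [self.fn(x) for x in upstream]
        return [x for x in upstream if self.fn(x)]


source = LineageRDD(data=[1, 2, 3, 4, 5])
derived = source.map(lambda x: x * 10).filter(lambda x: x > 20)

history = derived.lineage()
first_result = derived.collect()
# Pretend the computed result was lost; replaying the lineage rebuilds it.
recomputed = derived.collect()
```

Because nothing but the source and the recorded steps is needed to rebuild `derived`, no copy of the intermediate data has to be kept for recovery.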

19
Q

Hive is primarily used for:
A) Online transaction processing
B) Real-time data analysis
C) Data warehousing and batch processing
D) In-memory data processing

A

C) Data warehousing and batch processing

20
Q

The main advantage of using Spark over MapReduce is:
A) Spark’s support for SQL queries
B) Spark’s ability to process real-time data
C) Spark’s faster data processing due to in-memory computation
D) Spark’s compatibility with Hadoop’s HDFS

A

C) Spark’s faster data processing due to in-memory computation

21
Q

Which of the following is NOT a way to create an RDD in Spark?
A) From an existing RDD
B) From a local file system
C) From a remote database
D) From an HDFS file

A

C) From a remote database

22
Q

HiveQL’s support for “join” operations:
A) Is limited to equality joins
B) Includes support for full outer joins
C) Allows for non-equi joins
D) Is not available in Hive

A

A) Is limited to equality joins
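
One common explanation for the equality restriction is that a key-based shuffle can only bring together rows whose keys are equal. The sketch below shows that idea in plain Python; it is an illustration under that assumption, not Hive's implementation, and `equi_join` and the sample data are hypothetical.

```python
from collections import defaultdict


def equi_join(left, right):
    """Join two lists of (key, value) pairs on equal keys."""
    # "Shuffle": group the rows of each side by join key.
    buckets = defaultdict(lambda: ([], []))
    for key, value in left:
        buckets[key][0].append(value)
    for key, value in right:
        buckets[key][1].append(value)

    # "Reduce": within each key, pair every left row with every right row.
    joined = []
    for key in sorted(buckets):
        left_rows, right_rows = buckets[key]
        for left_value in left_rows:
            for right_value in right_rows:
                joined.append((key, left_value, right_value))
    return joined


users = [(1, "ana"), (2, "bo"), (3, "cy")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
result = equi_join(users, orders)
```

A condition such as `a.key < b.key` has no single bucket that gathers all matching rows, which is why this grouping approach does not extend directly to non-equi joins.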