Review Quiz - SAS and Hadoop Flashcards
What role does the NameNode play in the Hadoop cluster?
Select one:
a. provides naming conventions used to store data
b. provides developer access to the Hadoop cluster
c. stores metadata about where the data is stored
d. provides table names to sql queries
c. stores metadata about where the data is stored
What is the purpose of the HDFS file system?
Select one:
a. be the access point for developers to use the Hadoop cluster
b. provide scalable and reliable data storage across the DataNodes
c. maintain metadata about the Hadoop ecosystem components
b. provide scalable and reliable data storage across the DataNodes
What is the purpose of MapReduce in the Hadoop ecosystem?
Select one:
a. perform distributed data processing across the DataNodes
b. manage cluster utilization for jobs and applications
c. provide browser access to the Hadoop jobs
a. perform distributed data processing across the DataNodes
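As a rough illustration of the processing model behind this answer, the map, shuffle, and reduce steps can be sketched in plain Python. This is a single-process model, not Hadoop code: the function names and the input lines are hypothetical, and on a real cluster each input split would be handled by a separate task running on the DataNodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: combine the values collected for one key."""
    return key, sum(values)

# Hypothetical input splits; on a cluster each would go to its own map task.
input_splits = ["big data big cluster", "data node"]

mapped = [pair for line in input_splits for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```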
Which Hive component contains table definitions that point to HDFS files?
Select one:
a. Hive Client
b. Hive Server
c. Hive Metastore
d. Hive Driver
c. Hive Metastore
Examine the following Hive database structures.
Which of the following database operations will execute successfully in this Hive structure?
Select one:
a. create database student;
b. drop database student;
c. create table default.TableB (col1 int);
d. use student; create table dihdm.TableB (col1 int);
d. use student; create table dihdm.TableB (col1 int);
When implementing a data governance solution that leverages Hive tables, which table type would you consider the best?
Select one:
a. managed
b. external
c. temporary
d. hidden
b. external
Which interface would you use to submit a Pig script from the Client Node command prompt?
Select one:
a. Beeline
b. Grunt
c. Beeswax
d. hdfs dfs
b. Grunt
The Grunt shell is the command-line interface for submitting Pig scripts.
Is this a valid multi-line comment?
Select one:
a. Yes
b. No
a. Yes
Starting each comment line with -- is valid for a comment that spans multiple lines.
Will this Pig script execute?
Select one:
a. Yes
b. No
b. No
PigStorage is a case-sensitive keyword.
Will this Pig script execute?
Select one:
a. Yes
b. No
a. Yes
Positional notation, starting at $0, is valid for reading an entire row into a single field.
What is the result of dumping the B relation after you execute the following Pig script?
Select one:
a. (x,y),(x,z)
b. (x,y,x,z)
c. (x,y,z)
c. (x,y,z)
The FLATTEN operator removes the inside parentheses from the row.
The FLATTEN operator enables you to “un-nest” or “transpose” tuples or bags. For tuples, the FLATTEN operator substitutes the fields of a tuple in place of the tuple. For bags, FLATTEN creates new tuples.
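The two behaviors described above can be modeled in a few lines of Python. This is a sketch of the semantics only, not Pig code. It assumes an input row such as (x,(y,z)), which the card does not show, and it represents a bag as a Python list of tuples.

```python
def flatten_tuple(row):
    """Model FLATTEN on a nested tuple: splice its fields into the row."""
    out = []
    for field in row:
        if isinstance(field, tuple):
            out.extend(field)   # the inner fields replace the tuple
        else:
            out.append(field)
    return tuple(out)

def flatten_bag(row, bag_index):
    """Model FLATTEN on a bag: one output row per tuple in the bag."""
    before = row[:bag_index]
    bag = row[bag_index]
    after = row[bag_index + 1:]
    return [before + tuple(item) + after for item in bag]

# Assumed input row with a nested tuple: (x,(y,z))
flat_row = flatten_tuple(("x", ("y", "z")))

# Assumed input row with a bag: (x,{(y),(z)})
bag_rows = flatten_bag(("x", [("y",), ("z",)]), 1)
```

In this model the tuple case yields one wider row, and the bag case yields one row per tuple in the bag.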
When using the SAMPLE keyword in a Pig program, is the data sample ever guaranteed to be the same data?
Select one:
a. Yes
b. No
b. No
The SAMPLE keyword returns a random sample of data.
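One way to picture why the result is not repeatable is to model sampling as an independent random keep-or-drop decision for each row. The Python below is a hedged sketch of that idea, not Pig's implementation. The function name and the rng parameter are hypothetical.

```python
import random

def sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`."""
    return [row for row in rows if rng.random() < fraction]

rows = list(range(1000))

# Two runs over the same data: both the size and the contents of the
# sample can differ, because every row is a separate random decision.
first = sample(rows, 0.1, random.Random())
second = sample(rows, 0.1, random.Random())
```

Under this model a fraction of 0.1 returns roughly, but not exactly, 10% of the rows on each run.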
Which of the following statements are true? Select all that apply.
Select one or more:
a. The results of the SPLIT operator guarantee that every row is assigned to a relation.
b. The results of the SPLIT operator guarantee that every row will reside in only one relation.
c. The results of the SPLIT operator do not have to be in two equal relations.
d. The IF keyword is required when using the SPLIT operator.
c. The results of the SPLIT operator do not have to be in two equal relations.
d. The IF keyword is required when using the SPLIT operator.
The expression that follows the required IF keyword determines how the rows are split, so the resulting relations might not contain equal numbers of rows.
T = LOAD … ;
SPLIT T INTO T1 IF population > 50000,
T2 IF population <= 50000;
STORE T1 INTO 'output/split1';
STORE T2 INTO 'output/split2';
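The Python model below sketches why statements a and b are false: each row is tested against every condition, so a row can land in no relation or in more than one. It is an illustration of the semantics, not Pig code. The helper names and the sample rows are hypothetical, and a missing population is treated as matching no condition, mirroring how a null comparison does not evaluate to true.

```python
def pig_split(rows, conditions):
    """Test every row against every condition, as SPLIT does."""
    relations = {name: [] for name in conditions}
    for row in rows:
        for name, predicate in conditions.items():
            if predicate(row):
                relations[name].append(row)
    return relations

def greater_than(limit):
    return lambda row: row["population"] is not None and row["population"] > limit

def at_most(limit):
    return lambda row: row["population"] is not None and row["population"] <= limit

# Hypothetical rows; town D has no population value.
towns = [
    {"name": "A", "population": 80000},
    {"name": "B", "population": 50000},
    {"name": "C", "population": 1200},
    {"name": "D", "population": None},
]

# Same conditions as the card: unequal relations, and D is in neither.
by_size = pig_split(towns, {"T1": greater_than(50000), "T2": at_most(50000)})

# Overlapping conditions: town A satisfies both and appears twice.
overlapping = pig_split(towns, {"big": greater_than(50000), "not_tiny": greater_than(1000)})
```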
Which specialized join works well if the entire relation can fit into memory?
Select one:
a. merge
b. bloom
c. skewed
d. replicated
d. replicated
Replicated joins require that the entire relation fit into memory. If not, the join fails.
In Pig Latin, a replicated join is a specialized join that works well when one or more of the relations are small enough to fit into memory. If the smaller relation cannot fit into memory, the join fails and generates an error. Replicated joins support only inner and left outer joins.
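The idea behind holding the small relation in memory can be sketched as a hash join in Python: the small relation is loaded into a lookup table once, and the large relation is streamed past it. This is a simplified single-process model, not Pig's implementation. The function name, the sample rows, and the convention that the join key is the first field are all assumptions made for illustration.

```python
from collections import defaultdict

def replicated_join(large_rows, small_rows, left_outer=False):
    """Join on the first field, keeping only the small relation in memory."""
    lookup = defaultdict(list)
    for row in small_rows:          # the part that must fit into memory
        lookup[row[0]].append(row)
    small_width = len(small_rows[0]) if small_rows else 0

    for row in large_rows:          # streamed one row at a time
        matches = lookup.get(row[0])
        if matches:
            for match in matches:
                yield row + match
        elif left_outer:
            # Left outer join: keep the unmatched row, padded with nulls.
            yield row + (None,) * small_width

# Hypothetical relations.
orders = [(1, "pen"), (2, "ink"), (3, "pad")]
regions = [(1, "East"), (2, "West")]

inner = list(replicated_join(orders, regions))
left = list(replicated_join(orders, regions, left_outer=True))
```

The sketch also suggests why only inner and left outer joins fit this pattern: an unmatched row from the in-memory side would need knowledge of all the streamed rows, which no single task has.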
Which programming language is most extensively supported when writing Pig user-defined functions?
Select one:
a. Python
b. Ruby
c. Java
d. Groovy
c. Java
Java functions have the most extensive support and are very efficient because they are written in the same language as Pig.
What option do you need in your FILENAME statement to indicate that the fileref is reading from a directory in HDFS?
Select one:
a. dir
b. concat
c. all
d. folder
b. concat
The concat option is used for reading files in an HDFS directory. The dir option is used for writing to an HDFS directory.
filename in hadoop
'/user/student/test_table' concat user='student';
Does this DATA step read a single HDFS file or a concatenated directory?
Select one:
a. single HDFS file
b. concatenated directory
a. single HDFS file
This is a single HDFS file. The concat option was not used in the Hadoop FILENAME statement.
Which HDFS command is not a command that you can submit using PROC HADOOP?
Select one:
a. MKDIR
b. CHMOD
c. DELETE
d. RMDIR
d. RMDIR
RMDIR is not a valid PROC HADOOP HDFS command. To remove a directory using PROC HADOOP, use the HDFS DELETE command.
Which Hive schema is SAS connected to in this code?
Select one:
a. DIHDM
b. DIAHD
c. DEFAULT
d. none
c. DEFAULT
When a schema is not included in the connection parameters, the schema called DEFAULT is used.
When the CREATE TABLE statement below is executed, which of the following is true?
Select one:
a. A table definition is stored in the Hive metadata.
b. A table definition is stored in the Hive metadata and a data file is created in HDFS.
c. A data file is stored in HDFS in /user/student/test.
d. Data is transferred from /user/student/test into the Hive database.
a. A table definition is stored in the Hive metadata.
When the CREATE TABLE statement below is executed, which of the following is true?
Select one:
a. The data must already exist in the HDFS directory /user/student/test.
b. The data must not yet exist in the HDFS directory /user/student/test.
c. The data can either exist already or be placed there later.
c. The data can either exist already or be placed there later.
When using the SAS/ACCESS LIBNAME to Hadoop method to work with Hive tables, the DATA step and SAS procedures can be used.
Select one:
True
False
True
Although any DATA step or procedure can be used, it is important to check where the processing actually happens, because that depends on how the code is written.
This DS2 DATA program INIT method contains a SET statement that reads orion.banks. If the data set contains three observations, how many times is the SET statement executed?
Select one:
a. 0
b. 1
c. 3
d. cannot be determined
b. 1
The INIT system method automatically executes only once, when the DS2 DATA program first begins execution.
Will this program execute without producing an error?
Select one:
a. Yes
b. No
a. Yes
The SCOND=NONE option in the PROC DS2 statement overrides the DS2SCOND=ERROR system option.