technology and tools Flashcards

(50 cards)

1
Q

What is Hadoop?

A

An open-source framework for distributed storage and processing of large datasets across clusters of computers

2
Q

What is Hadoop written in?

A

Java

3
Q

What are the 4 main components of Hadoop?

A

MapReduce
YARN
HDFS
Hadoop Common

4
Q

What block size does the Hadoop Distributed File System (HDFS) use?

A

128 MB blocks by default (configurable)
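To see what the block size means for a given file, the arithmetic can be sketched in plain Python. This is not an HDFS API: `split_into_blocks` is a hypothetical helper, and it assumes the 128 MB default, which a cluster can change through the `dfs.blocksize` setting.

```python
from __future__ import annotations

BLOCK_SIZE_MB = 128  # assumed HDFS default block size


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Return the size in MB of each block a file would be split into."""
    if file_size_mb <= 0:
        return []
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds the leftover data; it is not padded to 128 MB.
        sizes.append(remainder)
    return sizes


print(split_into_blocks(300))        # [128, 128, 44]
print(len(split_into_blocks(1024)))  # 8
```

Only the last block is partial, so a 300 MB file uses two full blocks plus one 44 MB block.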

5
Q

In HDFS, is hardware failure considered normal?

A

Yes. HDFS treats hardware failure as the norm rather than the exception, so it is designed to be highly fault-tolerant.
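Much of that fault tolerance comes from block replication: each block is stored on several DataNodes, so losing one node does not lose data. The Python sketch below assumes a replication factor of 3 (the usual HDFS default) and uses a simple round-robin placement with hypothetical block and node names; real HDFS uses a rack-aware placement policy instead.

```python
from __future__ import annotations

REPLICATION = 3  # assumed replication factor for this sketch


def place_replicas(blocks: list[str], datanodes: list[str],
                   replication: int = REPLICATION) -> dict[str, list[str]]:
    """Put each block on `replication` distinct DataNodes, round-robin.

    Simplified on purpose: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        for i, block in enumerate(blocks)
    }


def readable_blocks(placement: dict[str, list[str]], failed: set[str]) -> set[str]:
    """Blocks that still have at least one replica on a live DataNode."""
    return {b for b, nodes in placement.items() if any(n not in failed for n in nodes)}


blocks = ["blk_1", "blk_2", "blk_3", "blk_4"]  # hypothetical block IDs
nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]    # hypothetical DataNodes
placement = place_replicas(blocks, nodes)

# With 3 replicas, losing any one or two DataNodes still leaves every block readable.
print(readable_blocks(placement, failed={"dn2"}) == set(blocks))         # True
print(readable_blocks(placement, failed={"dn1", "dn2"}) == set(blocks))  # True
```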

6
Q

What is the NameNode?

A

The master server of an HDFS cluster

7
Q

What does the NameNode do?

A

Manages the file system namespace (metadata)
Executes file and directory operations such as opening, closing, and renaming
Maps blocks to DataNodes
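A toy model can make the NameNode's role concrete: it keeps only metadata (the namespace and which DataNodes hold each block), never the file contents. `NameNodeSketch` and its methods are hypothetical names for illustration, not Hadoop classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameNodeSketch:
    """Toy model of NameNode metadata; it never stores file contents."""
    # path -> ordered list of block IDs that make up the file
    namespace: dict[str, list[str]] = field(default_factory=dict)
    # block ID -> DataNodes currently holding a replica
    block_locations: dict[str, list[str]] = field(default_factory=dict)

    def create(self, path: str, block_ids: list[str]) -> None:
        self.namespace[path] = list(block_ids)

    def rename(self, old: str, new: str) -> None:
        # Namespace operation: only metadata changes, no block data moves.
        self.namespace[new] = self.namespace.pop(old)

    def report_block(self, block_id: str, datanode: str) -> None:
        # Called when a DataNode reports that it holds a block.
        self.block_locations.setdefault(block_id, []).append(datanode)

    def locate(self, path: str) -> list[tuple[str, list[str]]]:
        """Block-to-DataNode mapping a client needs before reading `path`."""
        return [(b, self.block_locations.get(b, [])) for b in self.namespace[path]]


nn = NameNodeSketch()
nn.create("/logs/day1.txt", ["blk_1", "blk_2"])  # hypothetical path and block IDs
nn.report_block("blk_1", "dn1")
nn.report_block("blk_2", "dn2")
nn.rename("/logs/day1.txt", "/archive/day1.txt")
print(nn.locate("/archive/day1.txt"))  # [('blk_1', ['dn1']), ('blk_2', ['dn2'])]
```

Because `rename` touches only the `namespace` dictionary, no block data moves between DataNodes.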

8
Q

What is a DataNode?

A

A worker server that stores the blocks a file is split into (a file is split into one or more blocks)

9
Q

What do DataNodes do?

A

Serve read and write requests from clients
Report back to the NameNode (heartbeats and block reports)
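Reporting back to the NameNode happens through periodic heartbeats; a DataNode that stays silent too long is treated as dead, and the NameNode re-replicates its blocks elsewhere. The sketch below models only the timeout check, with time passed in explicitly so it is deterministic. `HeartbeatMonitor` is hypothetical, and the 630-second timeout is an illustrative assumption rather than a value to rely on.

```python
from __future__ import annotations

DEAD_AFTER_SECONDS = 630.0  # illustrative timeout, an assumption for this sketch


class HeartbeatMonitor:
    """Toy NameNode-side tracker of DataNode heartbeats (hypothetical class)."""

    def __init__(self, dead_after: float = DEAD_AFTER_SECONDS) -> None:
        self.dead_after = dead_after
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, datanode: str, now: float) -> None:
        # A DataNode "reports back": record when it was last heard from.
        self.last_seen[datanode] = now

    def dead_nodes(self, now: float) -> set[str]:
        # Nodes silent for longer than the timeout are considered dead.
        return {dn for dn, t in self.last_seen.items() if now - t > self.dead_after}


monitor = HeartbeatMonitor()
monitor.heartbeat("dn1", now=0.0)
monitor.heartbeat("dn2", now=0.0)
monitor.heartbeat("dn1", now=600.0)   # dn1 keeps reporting, dn2 goes quiet
print(monitor.dead_nodes(now=700.0))  # {'dn2'}
```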

10
Q

What are the weaknesses of HDFS?

A

Not good for small, low-latency reads
Not good for many small files
Files can be appended to, not amended in place

11
Q

What is MapReduce?

A

A Java-based programming paradigm for processing large datasets in parallel across a cluster

12
Q

When should you use MapReduce?

A

For problems that are embarrassingly parallel (the work splits into independent pieces)

13
Q

What does the Map step of MapReduce do?

A

Performs a map function on input key-value pairs to generate intermediate key-value pairs

14
Q

What does the Reduce step of MapReduce do?

A

Performs a reduce function on intermediate key-value groups to generate output key-value pairs
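The Map and Reduce steps can be sketched as a single-process Python simulation of the classic word-count job, including the shuffle step that groups intermediate pairs by key between them. This is an illustration only: real Hadoop jobs are usually written in Java against Hadoop's `Mapper` and `Reducer` classes, and the framework runs the map and reduce calls in parallel across the cluster. The function names here are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(_key: int, line: str) -> Iterator[tuple[str, int]]:
    """Map: (line number, line text) -> intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: (word, [1, 1, ...]) -> output (word, total)."""
    return word, sum(counts)


def run_mapreduce(lines: list[str]) -> dict[str, int]:
    # Map phase: each input record is processed independently.
    intermediate = [pair for i, line in enumerate(lines) for pair in map_fn(i, line)]
    # Shuffle: group intermediate values by key (done by the framework in Hadoop).
    groups: dict[str, list[int]] = defaultdict(list)
    for word, one in intermediate:
        groups[word].append(one)
    # Reduce phase: one call per key group.
    return dict(reduce_fn(w, c) for w, c in sorted(groups.items()))


print(run_mapreduce(["the cat sat", "the cat ran"]))
# {'cat': 2, 'ran': 1, 'sat': 1, 'the': 2}
```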

15
Q

Name some cases where you would use MapReduce.

A

Data mining
Spam detection
Ad optimisation
Index building in search engines
Article clustering for news
Statistical machine translation

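Index building is a good example of why these workloads fit MapReduce: each document can be mapped independently, and each reduce call only needs the postings for one term. The sketch below builds a small inverted index in plain Python; the helper names are hypothetical and the tokenizer is deliberately naive (lowercase, split on whitespace).

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterator


def map_doc(doc_id: str, text: str) -> Iterator[tuple[str, str]]:
    """Map: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_term(term: str, doc_ids: list[str]) -> tuple[str, list[str]]:
    """Reduce: turn all doc IDs for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs: dict[str, str]) -> dict[str, list[str]]:
    grouped: dict[str, list[str]] = defaultdict(list)
    for doc_id, text in docs.items():
        for term, d in map_doc(doc_id, text):
            grouped[term].append(d)  # shuffle: group postings by term
    return dict(reduce_term(t, ds) for t, ds in grouped.items())


index = build_index({"d1": "Hadoop stores data", "d2": "Spark processes data"})
print(index["data"])    # ['d1', 'd2']
print(index["hadoop"])  # ['d1']
```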
16
Q

What does YARN stand for?

A

Yet Another Resource Negotiator

17
Q

What does YARN do?

A

Manages cluster resources and schedules and monitors workloads

18
Q
What are the main features of YARN?
a) shared
b) fast
c) scalability
d) flexibility
e) efficiency
A

a) shared
c) scalability
d) flexibility
e) efficiency

19
Q

What is Pig?

A

A high-level data flow language (Pig Latin) for Hadoop

20
Q

What is Hive/HiveQL?

A

A SQL-style query language for data stored in Hadoop

21
Q

What is HBase?

A

A column-oriented NoSQL database that runs on top of HDFS

22
Q

What is Mahout?

A

Machine learning library

23
Q

What is Spark?

A

An in-memory processing engine

24
Q
In Hadoop, which of these are data ingestion programs?
Flume
HBase
Sqoop 
Storm
A

Flume
Sqoop
Storm

25
In Hadoop, which of these are analytics and machine learning programs? Spark, Giraph, Mahout
Giraph | Mahout
26
Which of these are NoSQL programs on Hadoop? Tez, HBase, Cassandra, Spark
HBase | Cassandra
27
In Hadoop, which of these programs are engines? Spark, Storm, Tez
Spark | Tez
28
What is ZooKeeper in Hadoop?
Cluster and workflow management
29
What does Hive do?
Converts SQL queries into Java MapReduce jobs
30
What does HBase allow you to do?
Real-time read/write operations on large datasets
31
What does Spark do?
Analytics engine for large-scale data processing
32
What is different about Spark's data sharing?
It shares data in memory rather than on disk
33
What is Greenplum?
Open-source massively parallel processing (MPP) data platform built on PostgreSQL
34
What is PostgreSQL?
An RDBMS with object-oriented features
35
What is MADlib?
Open-source library for in-database analytics
36
In Greenplum, what is the INTERSECT operation?
Rows that appear in all answer sets
37
In Greenplum, what is the EXCEPT operation?
Rows from the first answer set minus rows from the second
38
In Greenplum, what is the UNION ALL operation?
Rows from all answer sets, including duplicate rows
39
In Greenplum, what is the UNION operation?
Rows from all answer sets, with duplicate rows removed
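The four set operations can be tried out with Python's built-in `sqlite3` module. SQLite is used here only as a stand-in for Greenplum; for simple queries like these the operators follow the same standard SQL semantics. The tables `a` and `b` and their data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")


def rows(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))


print(rows("SELECT x FROM a INTERSECT SELECT x FROM b"))  # [2, 3]
print(rows("SELECT x FROM a EXCEPT SELECT x FROM b"))     # [1]
print(rows("SELECT x FROM a UNION SELECT x FROM b"))      # [1, 2, 3, 4]
print(rows("SELECT x FROM a UNION ALL SELECT x FROM b"))  # [1, 2, 2, 2, 3, 3, 4]
```

Note that UNION ALL is the only one of the four that keeps the duplicate 2s and 3s.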
40
In Greenplum, what is the GROUP BY operation?
Groups results based on one or more specified columns
41
In Greenplum, what does GROUP BY combined with UNION ALL do?
Adds subtotals and grand totals
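A minimal sketch of subtotals built from GROUP BY and UNION ALL, again using SQLite through Python's `sqlite3` module as a stand-in for Greenplum (the `sales` table is hypothetical). Each branch groups at a different level, and NULL fills the columns that a branch does not group by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 'pens', 10), ('east', 'ink', 5),
        ('west', 'pens', 7),  ('west', 'ink', 3);
""")

# Detail rows, then one subtotal per region, then a grand total.
query = """
    SELECT region, product, SUM(amount) FROM sales GROUP BY region, product
    UNION ALL
    SELECT region, NULL, SUM(amount) FROM sales GROUP BY region
    UNION ALL
    SELECT NULL, NULL, SUM(amount) FROM sales
"""
for row in conn.execute(query):
    print(row)
```

In Greenplum, `GROUP BY ROLLUP(region, product)` should return the same set of rows from a single query; SQLite does not support ROLLUP, which is why the example spells out the UNION ALL form.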
42
In Greenplum, what is the ROLLUP operation?
Replaces the GROUP BY with UNION ALL pattern: produces subtotals and a grand total in a single query
43
In Greenplum, what is the CUBE operation?
Creates subtotals for all possible combinations of the grouping columns
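ROLLUP and CUBE differ in which column combinations get subtotals. The Python sketch below only enumerates those grouping sets for a list of columns; it does not run any SQL, and `rollup_sets` / `cube_sets` are hypothetical helpers. ROLLUP drops columns from the right one at a time, while CUBE takes every subset, so n columns give n + 1 versus 2^n grouping sets.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(a, b, c) -> (a, b, c), (a, b), (a,), (): drop columns from the right."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(a, b, c) -> every subset of the columns: 2**n grouping sets."""
    return [c for n in range(len(cols), -1, -1) for c in combinations(cols, n)]


print(rollup_sets(["region", "product"]))
# [('region', 'product'), ('region',), ()]
print(cube_sets(["region", "product"]))
# [('region', 'product'), ('region',), ('product',), ()]
```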
44
In Greenplum, what is the GROUPING function?
Distinguishes real NULL values in the data from the NULLs that mark summary (subtotal) rows
45
In Greenplum, what is a window function?
Performs a calculation across a set of rows that are related to the current row
46
In Greenplum window functions, which clause specifies the data window?
OVER
47
In Greenplum window functions, how do you define window partitions?
PARTITION BY
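OVER and PARTITION BY can be seen together in a small query. This sketch again uses SQLite via Python's `sqlite3` module as a stand-in for Greenplum and assumes an SQLite build of version 3.25 or later (the first with window functions); the `sales` table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.executescript("""
    CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('east', 'ann', 10), ('east', 'bob', 5),
        ('west', 'cat', 7),  ('west', 'dan', 3);
""")

query = """
    SELECT region, rep, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
    ORDER BY region, rank_in_region
"""
for row in conn.execute(query):
    print(row)
# ('east', 'ann', 10, 15, 1)
# ('east', 'bob', 5, 15, 2)
# ('west', 'cat', 7, 10, 1)
# ('west', 'dan', 3, 10, 2)
```

Unlike GROUP BY, the window function keeps every input row and attaches the per-partition result to each one.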
48
What does MAD stand for in MADlib?
Magnetic, Agile, Deep
49
What are the MADlib in-database analytical functions? a) regression b) classification c) validation d) text analysis e) descriptive analytics f) clustering and topic modelling g) association rule mining
a) regression b) classification c) validation e) descriptive analytics f) clustering and topic modelling g) association rule mining
50
What does MADlib do?
Builds models without moving data out of the DBMS