Ch 3 - Class 3: Hadoop File System Flashcards

(35 cards)

1
Q

What are the two components of Hadoop?

A

MapReduce and HDFS

2
Q

HDFS

A

A file system that manages storage on the hard drives; it sits on top of the native file system on each hard drive.

3
Q

command interface

A

Used to communicate with HDFS and the hard drive.

4
Q

How do you communicate with the server from your hard drive?

A

WinSCP

5
Q

What does the file system deal with?

A

Very large files; write-once, read-many access; high throughput.

6
Q

What size is data stored in?

A

Block size: HDFS files are divided into blocks, 64 MB by default and commonly 128 MB in practice.

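The block arithmetic above can be sketched in plain Java (the class name and the 1 GB file size are just illustrative values, not part of any Hadoop API):

```java
public class BlockMath {
    // Number of HDFS blocks a file occupies: ceiling of fileSize / blockSize.
    static long blocksFor(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;  // 128 MB block size, the common practical value
        long fileSize  = 1024L * 1024 * 1024; // a hypothetical 1 GB file
        System.out.println(blocksFor(fileSize, blockSize)); // 1 GB / 128 MB = 8 blocks
    }
}
```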
7
Q

Can many files be on the same block?

A

Yes!

8
Q

How do you check the status of the file system's blocks?

A

% hadoop fsck / -files -blocks

9
Q

Namenode

A

Manages the filesystem namespace; keeps track of blocks, block locations, and the namespace image.

10
Q

cluster

A

A namenode and datanodes.

11
Q

What is the single point of failure?

A

The namenode, with its persistent metadata files.

12
Q

The system has 2 namenodes; what are their roles?

A

Active and standby.

13
Q

What is the datanode known as?

A

The workhorse of the file system: it stores and retrieves blocks and reports back to the namenode.

14
Q

HDFS high availability

A

Uses a pair of namenodes in an active-standby configuration.

The standby has the latest log entries and an up-to-date block mapping in memory.

15
Q

How do you set replication for datanodes?

A

set dfs.replication=3

16
Q

Pseudo-distributed configuration

A

fs.defaultFS=hdfs://localhost/

dfs.replication=1
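These two properties usually live in Hadoop's XML configuration files; a minimal sketch, assuming the current property name fs.defaultFS (older releases called it fs.default.name):

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost/</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```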

17
Q

Where is the default filesystem?

A

On the master computer, the namenode.

18
Q

where is local filesystem?

A

on the server

19
Q

command to copy from hard drive to hdfs

A

% hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/quangle.txt

20
Q

for checksum

A

Use md5sum to check file integrity by comparing checksums.
md5sum uses a hash function that produces a 128-bit hash value, giving a checksum to verify data integrity.
128 bits = 16 bytes = 32 hex characters.
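The 32-hex-character checksum can be seen directly with Java's standard MessageDigest API (the class name and the "abc" input are just an example; the expected digest is the well-known MD5 test vector):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Demo {
    // Compute the MD5 digest of a string and return it as lowercase hex.
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two hex chars per byte
            }
            return hex.toString();                    // 16 bytes -> 32 hex characters
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);            // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```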

21
Q

How should the data be split?

A

You want each split of the data to fit on one block, so use the block size as the split size.

22
Q

hdfs

A

Just one implementation of the Hadoop filesystem abstraction; S3 is another.

23
Q

2 ways to handle an exception

A

catch or finally: two ways to respond to an exception raised in a try block.

24
Q

finally

A

Runs regardless of whether an exception occurred; that's how it differs from catch.

It's a strong guarantee.
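A minimal sketch of the try/finally pattern in plain Java (the Resource class is a made-up stand-in for a file stream, not Hadoop's API), showing that the finally block runs whether or not an exception is thrown:

```java
public class FinallyDemo {
    // A toy resource that records whether close() was called.
    static class Resource {
        boolean closed = false;
        void read(boolean fail) { if (fail) throw new RuntimeException("read failed"); }
        void close() { closed = true; }
    }

    // Returns true if the resource got closed, even when read() throws.
    static boolean useResource(boolean fail) {
        Resource r = new Resource();
        try {
            r.read(fail);
        } catch (RuntimeException e) {
            // swallowed for the demo; finally still runs afterwards
        } finally {
            r.close(); // runs regardless of whether an exception occurred
        }
        return r.closed;
    }

    public static void main(String[] args) {
        System.out.println(useResource(false)); // true
        System.out.println(useResource(true));  // true: finally ran despite the exception
    }
}
```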

25
Q

How can you tell whether something is an HDFS shell command or not?

A

You will see hadoop fs, not hadoop URLCat etc.; hadoop URLCat runs a Java program.

26
Q

FileSystemCat

A

A public Java Hadoop program that handles a file stream.

27
Q

What makes a complete Java program?

A

A public class and a main method.

28
Q

glob characters

A

Wildcard pattern characters, similar to simplified regular expressions.

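Glob matching can be demonstrated with Java's standard PathMatcher (this uses java.nio rather than Hadoop's own glob support, so it only illustrates the pattern syntax; the file names are made up):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    // Returns true if the file name matches the glob pattern.
    static boolean matches(String glob, String name) {
        PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return m.matches(Paths.get(name));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.txt", "quangle.txt")); // true: * matches any run of characters
        System.out.println(matches("200?", "2007"));         // true: ? matches exactly one character
        System.out.println(matches("*.txt", "quangle.csv")); // false: extension differs
    }
}
```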
29
Q

On a datanode error, what does each party do?

A

Client: adds the packets in the ack queue back to the data queue and removes the failed datanode from the pipeline. Namenode: arranges further replicas for the under-replicated blocks. Failed datanode: deletes the partial block when the node recovers later on.

30
Q

ack

A

Sent when data is received by the datanode, not when it is written.

31
Q

What is a way to write in parallel to speed up the process of copying data?

A

distcp, used for copying large amounts of data to and from Hadoop filesystems in parallel.

32
Q

RPC

A

Remote procedure call: communication with the nodes.

33
Q

HAR files

A

A file archiving facility that packs files into HDFS blocks more efficiently.

34
Q

-l / -r

A

-l: long format; -r: recursive, shows all the entries in the subtree.

35
Q

-p

A

The -p option preserves file attributes (timestamp, ownership, permissions, etc.).