Ch 3 - Class 3 Hadoop File System Flashcards
(35 cards)
what are the two components of Hadoop?
MapReduce and HDFS
HDFS
file system that manages storage on the hard drives. it sits on top of the local file system on each hard drive
command interface
used to communicate between HDFS and the hard drive (local file system)
tool to transfer files from your hard drive to the server
WinSCP
what does the file system deal with?
large files. write once, read many times. high throughput
data size?
block size. HDFS files are divided into blocks. 64 MB by default in Hadoop 1, 128 MB in practice (and the default since Hadoop 2).
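A rough sketch of the block arithmetic on this card: the number of blocks a file needs is its size divided by the block size, rounded up. The class name `BlockMath` and the 200 MB file are made up for illustration; this is plain arithmetic and does not call any Hadoop API.

```java
// Sketch only: counts how many HDFS-style blocks a file of a given size would need.
final class BlockMath {
    static final long MB = 1024L * 1024L;

    // Ceiling division: the last block may be only partly filled.
    static long blocksNeeded(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("file size must be >= 0 and block size > 0");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long file = 200 * MB; // hypothetical 200 MB file
        System.out.println(blocksNeeded(file, 64 * MB));  // prints 4
        System.out.println(blocksNeeded(file, 128 * MB)); // prints 2
    }
}
```

With 64 MB blocks the 200 MB file takes three full blocks plus one 8 MB partial block; with 128 MB blocks it takes one full block plus one 72 MB partial block.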
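can many files be on the same block?
No. each HDFS block belongs to a single file. a file smaller than the block size only uses the space it needs, and blocks from many different files can sit on the same datanode.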
check status of file system blocks
% hadoop fsck / -files -blocks
Namenode
manages the filesystem namespace. keeps track of blocks, block locations, and the namespace image
cluster
namenode and datanodes
single point of failure
the namenode, because it holds the persistent metadata files
system has 2 namenodes
active and standby
datanode known as
workhorse of the file system. stores and retrieves blocks, reports to the namenode.
HDFS high availability
uses a pair of namenodes in an active-standby configuration.
the standby has the latest log entries and an up-to-date block mapping in memory
how do you set the replication factor for datanodes?
set dfs.replication=3 (in hdfs-site.xml)
Pseudo-distributed configuration
fs.default.name=hdfs://localhost/
dfs.replication=1
where is the default filesystem?
on the master computer (the namenode)
where is local filesystem?
on the server
command to copy from hard drive to HDFS
hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/quangle.txt
for checksum
use MD5 to check file integrity: compute the checksum of each copy and compare
md5sum uses the MD5 hash function, which produces a 128-bit hash value and gives a checksum to verify data integrity.
128 bits = 16 bytes = 32 hexadecimal characters.
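A minimal sketch of the checksum comparison, using the JDK's `java.security.MessageDigest` rather than the `md5sum` command. The class name `Md5Demo` and the sample strings are assumptions made for illustration. An MD5 digest is 16 bytes (128 bits), which prints as 32 hexadecimal characters.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch only: computes an MD5 checksum and renders it as hex, like md5sum does.
final class Md5Demo {
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data); // 16 bytes = 128 bits
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex characters per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard JDK algorithm, so this is not expected to happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String original = md5Hex("hello".getBytes(StandardCharsets.UTF_8));
        String copy = md5Hex("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(original);              // prints 5d41402abc4b2a76b9719d911017c592
        System.out.println(original.length());     // prints 32
        System.out.println(original.equals(copy)); // prints true: the contents match
    }
}
```

If the two checksums differ, the copy does not match the original.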
split of data
you want each split of the data to fit in 1 block,
so use the block size as the split size
HDFS
just one implementation of the Hadoop filesystem. S3 is another.
2 ways to deal with an exception
try/catch or try/finally
finally
runs regardless of whether there is an exception or not. that's how it differs from catch
it's a strong guarantee, which makes it a dependable place for cleanup.
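A small sketch of the difference between catch and finally. The class name `FinallyDemo` and the simulated failure are made up for illustration. The `catch` block runs only when the exception is thrown, while the `finally` block runs either way.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: records which blocks run, with and without an exception.
final class FinallyDemo {
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            log.add("try");
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } catch (IllegalStateException e) {
            log.add("catch");   // runs only when the exception was thrown
        } finally {
            log.add("finally"); // runs in both cases
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [try, finally]
        System.out.println(run(true));  // prints [try, catch, finally]
    }
}
```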