File System Flashcards

1
Q

What is the File System for?

A

File System provides:

  • Large and cheap storage space
  • Non-volatility
  • Sharing information between processes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a File System?

A

It is a collection of files + directory structures

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are files?

A

A container for storing information.

File abstracts away the different kinds of information into a single struct - devices, pipes, text files, data files.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Where are files stored?

A

Generally, they are stored on disk.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

what are the different kinds of representations a file has?

A

Three kinds of files:

  • byte sequence
  • record sequence
  • tree of records
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What kind of types a file has?

A

User files:

  • ASCII
  • Binary

System files

  • Directories
  • Special files
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How many kinds of access-to-file we have?

A

Sequential access:

  • Read all bytes/records from the beginning
  • cannot jump around

Random access

  • bytes/records read in any order
  • All files of modren OSs are random access
  • read/write functions can
    • recieve an offset parameter
    • seek(offset, start_from)
      • start_from = {0,1,2}
        • 0 - beginning
        • 1 - current
        • 2 - end
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the file attributes?

A

General info - user ID, Group ID, dates, times

Location & size - pointer to a device and location on it

Flags that store information for the system

Another special flags - Protection, password..

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the file’s operations?

A

Create ; Delete

Open ; Close

Write ; Read ; Seek

Get attributes

Set attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are the directory’s operations?

A

Create entry ; Delete entry (?)

Search for a file

Create/Delete a directory file

List a directory

Rename a file

Link a file to a directory

Traverse a file system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the dot and dotdot directory entries?

What’s the relation between a process and a path?

A

. - the current directory.

.. - the parent directory

Each process has its own working directory, which is shared by its threads.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Using which tool files(inodes) are shared?

A

Links

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are soft links and hard links?

A

Soft

  • Containing a path name. access is slower
  • If source is removed the link becomes broken

Hard

  • information about a shared file is duplicated in sharing directories.
  • fast, points to file
  • Link count must be maintained.
  • if source is removed link still accessible
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the difference between shared and exclusive locks?

A

Exclusive - protects updates to file resources and can be owned by only one transaction at a time.

Acquiring an exclusive lock demands waiting if another process is currently holding an exclusive lock or a shared lock of the same resource.

shared - can be owned by several processes at a time.

Acquiring a shared lock demands waiting if another process is currently holding an exclusive lock.

A new request for a shared lock must wait if there’s a request for an exclusive lock on a resource that already has a shared lock.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How can i block or unblock a file?

A

Using the command flock(file descriptor, operation)

  • operation:
    • LOCK_SH - place a shared lock.
    • LOCK_EX - place an exclusive lock.
    • LOCK_UN - remove an existing lock held by this process.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What happens to the lock when a file is closed or a process terminates?

A

File lock is removed.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Can we lock parts of the file?

A

Yes.

Any part of the file may be blocked.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are the concerns of the user/kernel regarding files?

A

User:

  • File names
  • operations allowed
  • location(directory structures)

System:

  • Storage of files and directories
  • Disk space management
  • implementation efficiency and reliability.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Disk allocation:

Advantages/Disadvantages of:

  • Contiguous allocation?
  • Linked list of disk blocks
  • Linked list using in-memory Files Allocation Table
A

Contiguous:

  • Simple; access is fast
  • external fragmentation
    • How much size to allocate at creation time?

Linked list of disk blocks:

  • No fragmentation and easy allocation
  • slow random access
    • n disk accesses to get to the n’th block
      • because the pointer to the next block is held within the block.
  • weird block size

Linked list using in-memory Files Allocation Table

  • none of the above disadvantages.
  • Uses a table to store the pointers of all blocks in the linked list that represent files - last file block has a special EOF symbol.
    • we don’t need allocations to be continguous - segmentation free
    • Can access the nth block of some file using the table and not using the table blocks.
  • A very large table in memory
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is an inode and who holds it?

A

It is a place, on disk, where data about the file is stored.

Each file struct holds inode. (might be two files pointing to the same inode..)

There is a struct called dinode which holds the:

  • File Attributes
    • Time last accessed
    • Time last modified
    • Size
    • File type, protection bits
  • Nlinks
    • Number of directory entries pointing to this i-node.
  • Address of disk block 0
  • Address of disk block 1
  • ..
  • Address of block of pointer
    • Containing additional disk addresses
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

What is the classic Unix disk structure?

A
  • Boot Sector
  • Super Block
    • Number of inodes
    • Number of Blocks
    • Number of free blocks
    • Pointer to free blocks list
    • Pointer to free i-node list
  • i-nodes
  • Data blocks
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Unix inodes - how bytes counting work?

A

I didn’t get it. You try.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Tell me in which block byte 355,000 can be found?

A
  • First 10 blocks - 10K:*
  • 0 - 10240
  • Indircet block - 10K + 256K = 266K =*
  • 10240 - 272384
  • double indirect block -*
  • refers to a list of 256K indirect blocks - each of size 256KB.
  • 355,000 = 272384 + X*
  • X = 82616. That is, it is placed within the 0th indirect block, in the 80th block. (starting at byte 81920), at byte number 696*
24
Q

What is the file descriptor table for? who holds it?

A
  • It is for quick access of a proccess to its files.
  • Each process has a file descriptors table
  • One entry per each open file.
25
* Where should we keep the file offset?* * what if two processes want to read from the same file but starting from a different offset?* * What if i want a couple of threads to write to the same file one after the other?*
* we keep the offset within the file struct .* * Two files might refer to the same inode but have different offset, thus no problem for them to read different parts of the data.* * For the mission of writing/reading together **Unix** provided the following solution:* * Parent's file descriptors-table-entries and child's file descriptors-table-entries point to the **same** file struct, thus holding the same offset and flags.* * Unrelated processes point to **different** files thus have different offsets.*
26
Can a user access the i-node of a file?
**No.** Each process has its own *file descriptors table* that points to the entries in the **kernel's** *open files description table.* *The kernel's open files description table entries point to the i-node of the file.*
27
What does the *open* call do?
It adds an entry to both the *kernel's open file descriptions* and the *process'* file description table.
28
*How is a directory implemented?*
*A simple directory(like in MS-DOS):* * *fixed size entries* * *disk addresses & attributes in diretory entry* *Directory entries simply point to i-nodes.*
29
There are two ways of handling long file names: * *In-line* * *In a heap* what does *in-line* means?
*In-line* means it is included in the file. That is, there is no pointer referring to some outsource which holds the content, it is "physically" here.
30
How **Unix** *directories* are implemented?
Implemented using struct *dirent* which holds : * *name* * *inode array*
31
*What are the Pros/Cons with having large blocks?*
*_Pros:_* * *Typically a multiple number of sectors* * *Because its size requires division* * *In sequential access, less blocks to read/write - less seek/search.* *_Cons:_* * *High internal fragmentatios* * *In Random access larger transfer time, larger memory buffers.*
32
What are the Pros/Cons with having small blocks?
*_Pros:_* * Smaller internal fragmentatios * Faster random access *_Cons:_* * Slower sequential access(*more seeks*)
33
What are the disk access time parameters?
* A - Average seek-time * Avg time for head to get above a cylinder. * B - Average time to get to track block. * C - Rotation time * Time for disk to complete full rotation * D - Transfer time * (block size/track size)\*rotation time ## Footnote *Average time to access block is given by:* **A + B + D.**
34
*How can we keep track of free blocks?*
***As a linked list of blocks*** * *Adresses of the first n free blocks are stored at the _super block._* * *First n-1 blocks are free to be assigned.* * *Last block contains address of a block which contains n more addresses to free blocks..* * Addresses of many free blocks are retrieved with one disk access.* * Unix maintains a single block of free-block addresses in memory.* * Whenever the last free block is reached, the next block of free-blocks is read and used.* ***As a bit map***
35
Didn't get it.. you try
36
***File system consistency*** * *we count the number of references in free lists* * *we count the number of references in inodes* *_what results are troubling?_* *_How is consistency mainted?_*
* *Both counts 0 - block is missing* * *Add to free list* * *More than once in free list* * *delete all references but one* * *More than once in files - **Trouble*** * *two or more different files write/read from the same place. which to delete? problem.* * *In both file and free list* * *Delete from free list.* ***_Hard links may solve the problem_*** * *count - number of references to each i-node which is found by _descending_ down the file system tree.* * *Compare count with the link-count field in the i-node struct.* * *If different, correct link-count field.*
37
What are the hazards if: * *Link-count \>* actual links number * *Link-count \<* actual links number
_*Link-count \>* actual links number_ * This file will not be removed. * Even after removing all actual links - there's a fake one which tell the kernel to avoid removal. * A waste of memory. _*Link-count \<* actual links number_ * The file will be removed even though there's a file pointing to it. It is a much greater problem
38
How does Unix enhance performance?
**Using *filename* caching**
39
What is the **Buffer Cache?** Who maintains that?
It is a pool of internal **data buffers** - the buffer cache. The *kernel* maintains that. _Important notes_ * Data *written to disk* is cached, for later use. * Algorithms instruct the buffer to _delay-write._ * If data is not found in cache, it is read from disk to cache. * Algoritms instruct the buffer to _pre-cache._
40
* What are the buffer's states?* * How buffers are implemented?*
*Buffers are categorized into:* * *busy- currently being used.* * *Clean - available for use, sync with block content on disk.* * *Free - empty and haven't been used yet.* * *Dirty - needs to be moved to write list.* * Each has a header, that includes the pair .* * The Buffer Cache is implemented using **LRU list:*** * *Buffers are on a doubly-linked list in LRU order.* * *Each hash-queue entry points to a linked list of buffers that have the same hash value.* * *A block may be in only **one** hash-queue.* * *A **free block** is on the free-list(maintained by the kernel, remember?) _in addition_ to being on a hash-queue.* * *When looking for a particular block, the hash-queue for it is searched. When in need of a new block , it is removed from the free list.*
41
What happens if the block is: 1. Found in its *hash queue* and is *free* 2. Found in its *hash queue* and is *busy* 3. Not found in the *hash queue* and there are free buffers 4. Not found in the hash queue and in searching the free list for a free buffer, one or more "*delayed-write"* buffers are found. 5. Not found in the *hash queue* and free list is empty.
1. *Found in its hash queue and is free​* * *_​buffer is marked busy_* * *_buffer is removed from free list_* 2. *Found in its has queue and is busy* * *_​process sleeps until buffer is freed, then **recheak**._* 3. *Not found in the hash queue and there are free buffers* * *_A free buffer is allocated from the pool_* 4. *Not found in the hash queue and in searching the free list for a free buffer, one or more "delayed-write" buffers are found.* * *_write delayed-write buffer(s) to disk_* * *_move them to the most recently used side of the list and find a free buffer_* 5. *Not found in the hash queue and free list is empty.* * *_Block requesting process, when scheduled, go through hash-queue again._*
42
Process B looks up for block **b** to find that its buffer is locked. He waits until the buffer of block B is available. When is returned, it needs to search again in the hash table for block *b.* *_Why?_*
Because while waiting some other process **C** might have gotten the buffer, might have loaded it with another block **c.**
43
Why not pure LRU?
* Some blocks should be written as quickly as possible. * Insert critical blocks at the head of the queue ,to be replaced soon and written to disk. * Have a system daemon that calls *sync* every 30 seconds, to help in updating blocks. * Some blocks are likely to be used again. (partly filled blocks being written) * Insert to the end to stay longer in the cache.
44
What is special about the ***NTFS***?
***NTFS:*** * A journaling file system - a file system that keeps track of changed not yet commited to the file system's main part by *_recording_* the intentions of such changes. * Each file is represented by a record in a special file called the *master file table(MFT)*
45
*MFT - Master File Table* * *what does it hold?* * *does its records are of the same size for process?* * *Can it save file's data?* * *Is there any limit of file size?* * *Where does the MFT address lie?*
MFT - *Master File Table* * Each file/directory has one or more **1K records** in the *MFT.* * A record contains file attributes and a list of block numbers. * Larger files need more then one MFT record for the list of blocks - records are **extended** by pointing to other records. * Disk blocks are described by sequence of records, each of which is a series of *runs.* * *​A run* is a continguous sequence of blocks and represented by a pair: (offset, length) * Case the file is very small - data can be kept *directly.* * The boot sector contains the MFT address * NTFS tries to allocate blocks continguosuly * That is why *run is* a contiguous sequence of blocks. * No upper bound on file size
46
Is it possible that a short file uses more MFT records than a longer file?
**Yes.** **Assume files A,B where B is a longer.** A - most of its blocks are not sequential. * A few *runs* structs. * Might have a few MFT records. B - all of his blocks are sequential. * Only one *run* struct is needed. * Only one MFT record is needed.
47
*How is **NTFS**(new technology file system) implemented with directories?*
*With small directories:* * *Standard info* * *A directory entry contains the MFT index for the file, length of the file's name, file's name, and various fields and flags.* *With large directories:* * *Organized as B+ trees* * *Supports transparent file compression* * *Compresses in groups of 16 blocks.* * *Can select to compress whole volume, specific directories or files.*
48
Is compression good or bad for performance? Disk throughput - increased/decreased? * CPU works harder/lighter? * As function of compression unit size - RAM access slowed down or increased?
**Disk throughput is increased** * *because size is smaller.* **CPU works harder** * Because it needs to compress/decompress **Access to RAM slowed down**
49
*_DFS - Distributed File Systems_* * What is it? * what is *mount?* * remote file access can be "*stateful"* or *"stateless".* explain. * What is it used for?
*_DFS - Distributed File Systems_* * A collection of interconnected machines that **do not share memory or a clock.** * We can use *file naming scheme.* (**{hostname:path}**) * This method is not **transparent**. * we want the kernel to hide details * This can be solved using _mount_ * _​_*mounting* is a process by which the OS makes files and directories on a storage device available for users to access via the computer's file system. * A server exports part of its file system * We don't want clients to be exposed to the server's data.. * *stateless protocol -* server remembers nothing about previous requests from a client * *stateful protocol -* more efficient(information kept in server's kernel) * _can be different machines and different OSs_ * _Any machine can be both client and server_ * _Clients access directories by *mounting*_ * *_remote mounts are invisible_* * *_​guess it means A request B that request C.._* * *_​Servers specify their exported directories upon boot, listed in /etc/exports_* * *_File sharing: accessing a file in a directory mounted by other (sharing) clients._*
50
NFS(v3) protocols * How mounting requests work? * What *file handles* contain? * What is the file operations protocol?
NFS(v3) protocols * *send*: {host-name;target-directory *path}, get*: *file handle* * *File handle contains:* * *​File system type* * *Disk ID* * *i-node number* * *Protection information* * *​File operations - directory and file access:* * *​lookup* provides a *file handle* * *reads* and *writes* have all the needed information by using file handles - offsets are absolute. * No *open* or *close* calls. * Server **does not** keep open files tables. * Files cannot be locked
51
*Remote Procedure Calls (RPC)* * How does it activate code on a remote machine? * using send/receive protocol * Proccess on one machine calls a procedure on another machine. What must be maintained? * - synchronous(!) - *blocking* send * Problems? * What is the general scheme?
*Remote Procedure Calls (RPC)* * How does it activate code on a remote machine? * using send/receive protocol * Proccess on one machine calls a procedure on another machine. What must be maintained? * synchronous(!) - *blocking* send * Problems? * Different address spaces * Parameters and results have to be passed - by they refer to addresses the client/server don't have * Machines can crash... * general scheme * A client *stub* and a server *stub* * A stub is a piece of code that converts parameters passed between *client* and *sever* * Server *stub* blocked on recieve. * Client *stub* blocks for returning message
52
What is ***NFS*?**
*NFS* is a distributed file system protocol allowing a user on a client computer to access files over a _computer network_ much like local storage is accessed.
53
How ***NFS*** is implemented?
***_NFS implemenation:_*** * Client & Server have: * Virtual File System layer. * Keeps *v-nodes* for *open-files.* * *v-node* = {*i-node* | *r-node*} * *NFS* module * Used for remote requests * Client creates *r-node* in its internal tables and stores the *file handle.*
54
_What are the steps in NFS mounting?_ _What are the steps in opening a remote file?_
*_NFS mounting_* * mount program invoked manually by admin * input: {remote target | local dir pathname} * remote target = {remote server name | dir path name} * *_local dir is where the file will stay in the **client's** VFS_* * server sends *file handle* representing directory * kernel construct a *v-node* for remote directory * Client(NFS) creates r-node for remote directory _Opening a remote file_ * Client kernel parses a path name to reach desired directory * Find a pointer to *r-node* in *v-node* * Asks *NFS* client to open the file * *NFS* clients look at the last name of the path name and gets back a *file-handle.* * *NFS* client creates an *r-node* entry for the open file, stores the *file-handle* in it and the VFS creates a *v-node*, pointing to the *r-node* * Calling process is given a file descriptor in return, pointing to the *v-node.* * Any subsequent calls this file descriptor will be traced by the VFS to the *r-node* and the suitable *read/write* operations will be performed.
55
_What is an obsolete value?_ _What is read-ahead?_
_obsolete value:_ *A value which is not in use anymore.* _read-ahead_ *Reading data in advance so that it is immedietly available when requested.*
56
*_NFS: what are the performance issues?_*
*_NFS: performance issues_* * Data sent in 8KB chunks * not necesarilly efficient * *Read-ahead* might not be coherent. * Client caching is important for efficiecy * If the policy is **not** *write-through NFS* exposed to problems with coherency and data loss * Cached block discarded * Data block after 3 seconds * Directory block after 30 seconds * Every 30 seconds all dirty cached blocks are written. * Check with the server whenever a cached file is opened * if not in used discard from cache.