File Systems Flashcards
Give five different path names for the file /etc/passwd. (Hint: Think about the directory
entries “.” and “..”.)
You can go up and down the tree as often as you want using "..". Some of the many possible paths:
/etc/passwd
/./etc/passwd
/././etc/passwd
/./././etc/passwd
/etc/../etc/passwd
/etc/../etc/../etc/passwd
/etc/../etc/../etc/../etc/passwd
/etc/../etc/../etc/../etc/../etc/passwd
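A quick, purely lexical check with Python's os.path.normpath confirms that these spellings all name the same file. (normpath does not consult the file system, so in a directory tree with symbolic links ".." can behave differently; os.path.realpath resolves those.)

```python
import os.path

# All of these spellings collapse to the same canonical path.
# normpath simplifies "." and ".." components lexically.
spellings = [
    "/etc/passwd",
    "/./etc/passwd",
    "/././etc/passwd",
    "/etc/../etc/passwd",
    "/etc/../etc/../etc/passwd",
]
normalized = {os.path.normpath(p) for p in spellings}
print(normalized)  # {'/etc/passwd'}
```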
In Windows, when a user double clicks on a file listed by Windows Explorer, a program is
run and given that file as a parameter. List two different ways the operating system could
know which program to run.
The Windows way is to use the file extension. Each extension corresponds to a file type and to
some program that handles that type. Another way is to remember which program created the
file and run that program. The Macintosh works this way.
In early UNIX systems, executable files (a.out files) began with a very specific magic
number, not one chosen at random. These files began with a header, followed by the text and
data segments. Why do you think a very specific number was chosen for executable files,
whereas other file types had a more-or-less random magic number as the first word?
(TUT10)
These systems loaded the program directly in memory and began executing at word 0, which
was the magic number. To avoid trying to execute the header as code, the magic number was a
BRANCH instruction with a target address just above the header. In this way it was possible to
read the binary file directly into the new process’ address space and run it at 0, without even
knowing how big the header was.
Is the open() system call in UNIX absolutely essential? What would the consequences be of
not having it?
(TUT10)
To start with, if there were no open, every read would have to specify the name of
the file to be read. The system would then have to fetch the i-node for it, although that could
be cached. One issue that quickly arises is deciding when to flush the i-node back to disk; a
timeout could be used for that. The scheme would be a bit clumsy, but it might work.
Systems that support sequential files always have an operation to rewind files. Do systems
that support random-access files need this, too?
No. If you want to read the file again, just randomly access byte 0.
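On such a system "rewind" is just a seek to offset 0, as this small sketch with os.lseek shows:

```python
import os
import tempfile

# Write a small file, then "rewind" by seeking the current position to byte 0.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")

os.lseek(fd, 0, os.SEEK_SET)   # rewind: move the current position to byte 0
first = os.read(fd, 5)
os.lseek(fd, 0, os.SEEK_SET)   # rewind again and re-read
second = os.read(fd, 5)

print(first == second)  # True

os.close(fd)
os.remove(path)
```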
Some operating systems provide a system call rename to give a file a new name. Is there
any difference at all between using this call to rename a file and just copying the file to a new
file with the new name, followed by deleting the old one?
(TUT10)
Yes. The rename call does not change the creation time or the time of last modification, but
creating a new file causes it to get the current time as both the creation time and the time of
last modification. Also, if the disk is nearly full, the copy might fail.
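The timestamp difference is easy to observe from Python: os.rename keeps the file's modification time, while a copy gets the current time. (The file names below are invented for the demonstration.)

```python
import os
import shutil
import tempfile
import time

d = tempfile.mkdtemp()
old = os.path.join(d, "old.txt")
with open(old, "w") as f:
    f.write("data")

# Backdate the file's modification time by one day.
past = time.time() - 86400
os.utime(old, (past, past))

# rename keeps the old timestamps; copy-then-delete yields a fresh file.
renamed = os.path.join(d, "renamed.txt")
os.rename(old, renamed)
copied = os.path.join(d, "copied.txt")
shutil.copyfile(renamed, copied)

preserved = abs(os.stat(renamed).st_mtime - past) < 1   # rename kept mtime
fresh = os.stat(copied).st_mtime - past > 86000         # copy got current time
print(preserved, fresh)  # True True

shutil.rmtree(d)
```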
In some systems it is possible to map part of a file into memory. What restrictions must
such systems impose? How is this partial mapping implemented?
The mapped portion of the file must start at a page boundary and be an integral number of
pages in length. Each mapped page uses the file itself as backing store. Unmapped memory uses
a scratch file or partition as backing store.
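Python's mmap module exposes exactly this restriction: a mapping's file offset must be a multiple of mmap.ALLOCATIONGRANULARITY (the page size on most Unix systems). A small sketch:

```python
import mmap
import os
import tempfile

# Create a file two "granules" long so we can map its second half.
gran = mmap.ALLOCATIONGRANULARITY
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * (2 * gran))

# Mapping may only start on an allocation-granularity (page) boundary.
m = mmap.mmap(fd, gran, offset=gran)   # OK: offset is page aligned
m.close()

try:
    mmap.mmap(fd, gran, offset=1)      # not page aligned: rejected
    aligned_required = False
except (ValueError, OSError):
    aligned_required = True
print(aligned_required)  # True

os.close(fd)
os.remove(path)
```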
A simple operating system supports only a single directory but allows it to have arbitrarily
many files with arbitrarily long file names. Can something approximating a hierarchical file
system be simulated? How?
(TUT10)
Use file names such as /usr/ast/file. While it looks like a hierarchical path name, it is really just a
single name containing embedded slashes.
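A toy sketch of the idea, using a Python dict as the single directory; the names and contents are invented for illustration. Slashes are ordinary name characters here, yet listing by prefix makes the flat namespace feel hierarchical:

```python
# A flat, single-directory "file system": every file lives in one dict,
# keyed by its full (arbitrarily long) name.
flat = {
    "/usr/ast/mbox": b"mail",
    "/usr/ast/notes": b"todo",
    "/usr/jim/mbox": b"other mail",
}

def listdir(prefix):
    """Simulate listing a 'directory' by scanning name prefixes."""
    return sorted(name for name in flat if name.startswith(prefix + "/"))

print(listdir("/usr/ast"))  # ['/usr/ast/mbox', '/usr/ast/notes']
```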
In UNIX and Windows, random access is done by having a special system call that moves
the ‘‘current position’’ pointer associated with a file to a given byte in the file. Propose an
alternative way to do random access without having this system call.
One way is to add an extra parameter to the read system call that tells what address to read
from. In effect, every read then has a potential for doing a seek within the file. The
disadvantages of this scheme are (1) an extra parameter in every read call, and (2) requiring the
user to keep track of where the file pointer is.
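POSIX later standardized exactly this alternative as pread()/pwrite(), which read or write at an explicit offset without consulting or moving the current-position pointer. Python exposes it as os.pread:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"abcdefghij")

# os.pread(fd, length, offset): the offset parameter replaces the seek,
# and the file's current-position pointer is left untouched.
middle = os.pread(fd, 3, 4)
print(middle)  # b'efg'

os.close(fd)
os.remove(path)
```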
Consider the directory tree of Fig. 4-8 (https://imgur.com/a/U8wMPfK). If /usr/jim is the working directory, what is the
absolute path name for the file whose relative path name is ../ast/x?
The dotdot component moves the search to /usr, so ../ast puts it in /usr/ast. Thus ../ast/x is the
same as /usr/ast/x.
Contiguous allocation of files leads to disk fragmentation, as mentioned in the text,
because some space in the last disk block will be wasted in files whose length is not an
integral number of blocks. Is this internal fragmentation or external fragmentation? Make an
analogy with something discussed in the previous chapter.
Since the wasted storage is between the allocation units (files), not inside them, this is external
fragmentation. It is precisely analogous to the external fragmentation of main memory that
occurs with a swapping system or a system using pure segmentation.
Describe the effects of a corrupted data block for a given file for: (a) contiguous, (b) linked,
and (c) indexed (or table based).
(TUT10)
If a data block gets corrupted in a contiguous allocation system, then only this block is affected;
the remainder of the file’s blocks can be read. In the case of linked allocation, the corrupted
block cannot be read; also, location data about all blocks starting from this corrupted block is
lost. In the case of indexed allocation, only the corrupted data block is affected.
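A toy simulation makes the linked-allocation case concrete; the block numbers and contents are invented:

```python
# Toy linked allocation: each block stores (data, next_block).
blocks = {
    0: ("AAA", 1),
    1: ("BBB", 2),
    2: ("CCC", 3),
    3: ("DDD", None),
}

def read_file(first):
    """Follow the chain of next-pointers from the first block."""
    data, cur = [], first
    while cur is not None and cur in blocks:
        chunk, nxt = blocks[cur]
        data.append(chunk)
        cur = nxt
    return data

print(read_file(0))        # ['AAA', 'BBB', 'CCC', 'DDD']
blocks[1] = (None, None)   # corrupt block 1: its data AND its next-pointer are gone
print(read_file(0))        # ['AAA', None] -- blocks 2 and 3 are now unreachable
```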
One way to use contiguous allocation of the disk and not suffer from holes is to compact
the disk every time a file is removed. Since all files are contiguous, copying a file requires a seek and rotational delay to read the file, followed by the transfer at full speed. Writing the
file back requires the same work. Assuming a seek time of 5 msec, a rotational delay of 4
msec, a transfer rate of 80 MB/sec, and an average file size of 8 KB, how long does it take to
read a file into main memory and then write it back to the disk at a new location? Using these
numbers, how long would it take to compact half of a 16-GB disk?
(FFE18)
It takes 9 msec (5 msec + 4 msec) to start the transfer.
The given transfer rate is 80 MB/sec = 10 × 2²³ bytes per second.
The size of file is 8 KB = 2¹³ bytes.
The time taken to read the file = Size of file / Transfer rate = 2¹³ / (10 × 2²³) sec ≈ 0.0977 msec.
Total time needed copying one file = 2 × 9.0977 = 18.1954 msec (to read and write back file)
To compact 8 GB of storage (half of a 16-GB disk), which is 8 GB / 8 KB = 2²⁰ files, it will take 2²⁰
× 18.1954 msec = 19,079.26 sec = 5.3 hours. Clearly, compacting the disk after every file
removal is not a great idea.
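The arithmetic can be replayed in a few lines:

```python
# The compaction estimate, step by step (times in seconds).
seek_plus_rotation = 0.005 + 0.004     # 9 msec to start each transfer
rate = 80 * 2**20                      # 80 MB/sec in bytes/sec
file_size = 8 * 2**10                  # 8 KB

transfer = file_size / rate            # ~0.0977 msec per transfer
per_file = 2 * (seek_plus_rotation + transfer)   # read it, then write it back
files = (8 * 2**30) // file_size       # 2**20 files in 8 GB

total = files * per_file
print(round(total))            # 19079 seconds
print(round(total / 3600, 1))  # 5.3 hours
```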
Does compacting the disk ever make any sense?
If done right, yes. While compacting, each file should be organized so that all of its blocks are
consecutive, for fast access. Windows has a program that defragments and reorganizes the disk.
Users are encouraged to run it periodically to improve system performance. But given how long
it takes, running once a month might be a good frequency.
Some digital consumer devices need to store data, for example as files. Name a modern
device that requires file storage and for which contiguous allocation would be a fine idea.
A digital still camera records some number of photographs in sequence on a nonvolatile storage
medium (e.g., flash memory). When the camera is reset, the medium is emptied. Thereafter,
pictures are recorded one at a time in sequence until the medium is full, at which time they are
uploaded to a hard disk. For this application, a contiguous file system inside the camera (e.g., on
the picture storage medium) is ideal.
Consider the i-node shown in Fig. 4-13 (https://imgur.com/a/gxnY2FF). If it contains 10 direct addresses and these were 8
bytes each and all disk blocks were 1024 bytes, what would the largest possible file be?
The indirect block can hold 128 disk addresses. Together with the 10 direct disk addresses, the
maximum file has 138 blocks. Since each block is 1 KB, the largest file is 138 KB.
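The calculation in a few lines:

```python
# Largest file for an i-node with 10 direct addresses plus one single
# indirect block, assuming 1024-byte blocks and 8-byte disk addresses.
block_size = 1024
addr_size = 8

direct = 10
indirect = block_size // addr_size      # 128 addresses fit in the indirect block
total_blocks = direct + indirect        # 138 blocks

print(total_blocks)                              # 138
print(total_blocks * block_size // 1024, "KB")   # 138 KB
```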
For a given class, the student records are stored in a file. The records are randomly accessed and updated. Assume that each student’s record is of fixed size. Which of the three allocation schemes (contiguous, linked and table/indexed) will be most appropriate?
(TUT10)
For random access, table/indexed and contiguous will be both appropriate, while linked
allocation is not as it typically requires multiple disk reads for a given record.
Consider a file whose size varies between 4 KB and 4 MB during its lifetime. Which of the
three allocation schemes (contiguous, linked and table/indexed) will be most appropriate?
Since the file size changes a lot, contiguous allocation will be inefficient requiring reallocation of
disk space as the file grows in size and compaction of free blocks as the file shrinks in size. Both
linked and table/indexed allocation will be efficient; between the two, table/indexed allocation
will be more efficient for random-access scenarios.
It has been suggested that efficiency could be improved and disk space saved by storing
the data of a short file within the i-node. For the i-node of Fig. 4-13 (https://imgur.com/a/gxnY2FF), how many bytes of data
could be stored inside the i-node?
(TUT10)
There must be a way to signal that the address-block pointers hold data, rather than pointers. If
there is a bit left over somewhere among the attributes, it can be used. This leaves all nine
pointers for data. If the pointers are k bytes each, the stored file could be up to 9k bytes long. If
no bit is left over among the attributes, the first disk address can hold an invalid address to mark
the following bytes as data rather than pointers. In that case, the maximum file is 8k bytes.
Two computer science students, Carolyn and Elinor, are having a discussion about inodes.
Carolyn maintains that memories have gotten so large and so cheap that when a file is opened, it is simpler and faster just to fetch a new copy of the i-node into the i-node table,
rather than search the entire table to see if it is already there. Elinor disagrees. Who is right?
Elinor is right. Having two copies of the i-node in the table at the same time is a disaster, unless
both are read only. The worst case is when both are being updated simultaneously. When the i-
nodes are written back to the disk, whichever one gets written last will erase the changes made
by the other one, and disk blocks will be lost.
Name one advantage of hard links over symbolic links and one advantage of symbolic links
over hard links.
Hard links do not require any extra disk space, just a counter in the i-node to keep track of how
many there are. Symbolic links need space to store the name of the file pointed to. Symbolic
links can point to files on other machines, even over the Internet. Hard links are restricted to
pointing to files within their own partition.
Explain how hard links and soft links differ with respect to i-node allocation.
A single i-node is pointed to by all directory entries of hard links for a given file. In the case of
soft-links, a new i-node is created for the soft link and this i-node essentially points to the
original file being linked.
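Both behaviors can be observed directly on a Unix system (the file names below are invented for the demo):

```python
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "target")
open(target, "w").close()

hard = os.path.join(d, "hard")
soft = os.path.join(d, "soft")
os.link(target, hard)       # hard link: a second directory entry, same i-node
os.symlink(target, soft)    # soft link: its own i-node holding the target path

same_inode = os.stat(target).st_ino == os.stat(hard).st_ino   # shared i-node
own_inode = os.lstat(soft).st_ino != os.stat(target).st_ino   # separate i-node
link_count = os.stat(target).st_nlink                         # i-node's counter

print(same_inode)  # True
print(own_inode)   # True
print(link_count)  # 2
```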
Consider a 4-TB disk that uses 4-KB blocks and the free-list method. How many block
addresses can be stored in one block?
Block size = 4 KB = 2¹² bytes
Disk size = 4 TB = 2⁴² bytes
The number of blocks on the disk = 4 TB / 4 KB = 2³⁰.
Thus, each block address needs 30 bits, rounded up to 32 bits = 4 bytes (the nearest multiple of 8 bits).
Thus, each block can store 2¹² / 4 = 1024 addresses.
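The same calculation in code:

```python
# Block addresses per block, from the numbers above.
block_size = 2**12                   # 4 KB
disk_size = 2**42                    # 4 TB
n_blocks = disk_size // block_size   # 2**30 blocks on the disk

# 2**30 blocks need 30-bit addresses; round up to 32 bits = 4 bytes.
addr_bytes = 4
per_block = block_size // addr_bytes

print(n_blocks == 2**30)   # True
print(per_block)           # 1024
```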
Free disk space can be kept track of using a free list or a bitmap. Disk addresses require D
bits. For a disk with B blocks, F of which are free, state the condition under which the free list uses less space than the bitmap. For D having the value 16 bits, express your answer as a
percentage of the disk space that must be free.
The bitmap requires B bits. The free list requires D × F bits. The free list requires fewer bits if
D × F < B. Alternatively, the free list is shorter if F / B < 1 / D, where F / B is the fraction of blocks
free. For 16-bit disk addresses, the free list is shorter if 6.25% or less of the disk is free.
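A quick check of the threshold, with a concrete (made-up) disk size for the sanity test:

```python
# Free list beats the bitmap when D * F < B, i.e. when F/B < 1/D.
D = 16                     # bits per disk address

threshold = 1 / D          # fraction of the disk that must be free
print(threshold * 100)     # 6.25 (percent)

# Sanity check on a concrete disk of B blocks with F free:
B = 2**20
assert D * (B // 20) < B   # 5% free: the free list is smaller
assert D * (B // 10) > B   # 10% free: the bitmap is smaller
```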