Organisation and Structure of Data Flashcards

1
Q

What does the MOD function do?

A

The MOD function returns the remainder after integer division.

e.g. 10 MOD 3 = 1 (3 × 3 = 9, remainder 1)

2
Q

What does the DIV function do?

A

The DIV function returns the whole number of times one number goes into another (integer division), discarding any remainder.

e.g. 10 DIV 3 = 3
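
A minimal Python sketch of both operations, assuming non-negative whole numbers. Python writes DIV as `//` and MOD as `%`; the variable names are illustrative only.

```python
# DIV and MOD for non-negative integers, using Python's operators.
dividend, divisor = 10, 3

quotient = dividend // divisor   # DIV: whole number of times 3 goes into 10
remainder = dividend % divisor   # MOD: what is left over

# The two results always rebuild the original number.
assert quotient * divisor + remainder == dividend
print(quotient, remainder)
```

The built-in `divmod(dividend, divisor)` returns both values as a pair in one call.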

3
Q

What is a transaction file and what does it do?

A

A transaction file is a temporary file.

It collects data over a short period of time (e.g. a month). At the end of the timescale, the data from the transaction file is copied to the master file, often using batch processing.

4
Q

What is a master file?

A

A master file is the main, permanent file. At the end of a transaction file's timescale, the data from it is copied over to the master file (often using batch processing).

Master files store all of the data required to perform batch processing operations such as generating bills.

5
Q

What happens to both a master and transaction file before performing the update?

A

The transaction file will be sorted in the same primary key order as the master file

This is to speed up the update as both files only have to be read through once to perform the update
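
A rough sketch of why sorting helps, assuming each record is a `(key, data)` pair held in a list, at most one transaction per key, and that a transaction with a matching key replaces the master record. These are assumptions for illustration, and `batch_update` is a hypothetical name.

```python
def batch_update(master, transactions):
    """Merge a key-sorted master list with a transaction list in one pass."""
    # Sort the transaction file into the same key order as the master file.
    transactions = sorted(transactions, key=lambda rec: rec[0])
    updated = []
    m = t = 0
    # Each file is read from start to end exactly once.
    while m < len(master) and t < len(transactions):
        m_key, t_key = master[m][0], transactions[t][0]
        if m_key < t_key:
            updated.append(master[m])        # unchanged master record
            m += 1
        elif m_key > t_key:
            updated.append(transactions[t])  # brand new record
            t += 1
        else:
            updated.append(transactions[t])  # assumed: transaction replaces master
            m += 1
            t += 1
    updated.extend(master[m:])               # whatever is left in either file
    updated.extend(transactions[t:])
    return updated


master = [(1, "a"), (3, "c"), (5, "e")]
transactions = [(4, "d"), (3, "C")]
new_master = batch_update(master, transactions)
```

Without the sort, every transaction would need its own search through the master file.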

6
Q

What are fixed length records?

A

Where all records are exactly the same length

When using fixed-length records, you need to make the record length equal to the length of the longest possible record.

7
Q

What is an advantage of a fixed-length record?

A

They are faster to process since the exact position of the record is known.

This is because all records are the same length, therefore you are able to jump to the desired record.
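
A small sketch of the jump, assuming a made-up record layout of a 4-byte ID followed by a 20-byte name. The position of record n is simply n × record length. An in-memory `io.BytesIO` stands in for a file on disk.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte unsigned ID + 20-byte name field.
RECORD = struct.Struct("<I20s")


def read_record(f, index):
    """Jump straight to record number `index` (counting from 0)."""
    f.seek(index * RECORD.size)  # position = record number x record length
    rec_id, name = RECORD.unpack(f.read(RECORD.size))
    return rec_id, name.rstrip(b"\x00").decode("ascii")


f = io.BytesIO()
for rec_id, name in [(1, "Ann"), (2, "Bartholomew"), (3, "Cy")]:
    # Short names are padded with zero bytes up to the full 20 bytes.
    f.write(RECORD.pack(rec_id, name.encode("ascii")))

third = read_record(f, 2)  # no need to read records 0 and 1 first
```

Note that "Cy" still occupies 20 bytes, which is the wasted space that variable-length records avoid.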

8
Q

What are Variable-Length records?

A

Where the length of each record varies according to the data it holds.

9
Q

What is an advantage of variable-length records?

A

Using variable-length records usually enables you to save disk space.

10
Q

What is a serial file?

A

A serial file is where records are stored one after the other, in no particular order (records are added to the end of a file)

Can be used with either tape or disk storage

11
Q

What is the main use of a serial file?

A

A transaction file

e.g. recording sales in a shop, or staff “clocking in”

12
Q

What is an advantage of a serial file?

A

Adding records is very straightforward and fast

13
Q

What are the disadvantages of a serial file?

A

Very slow to search

This is because the records are stored one after the other in no particular order, so there is no option other than to check them one by one to find an item. This is a serial search.

14
Q

What is a sequential file?

A

A sequential file has records stored one after the other in order of a key field (e.g. employee ID)

Can be used with either tape or disk storage

15
Q

What is a sequential file often used as?

A

A sequential file is often used as a master file to allow records to be searched via key field.

16
Q

What are the advantages of a sequential file?

A

If performing a batch update, the update is quicker when both the master file and the transaction file it is updated from are in sequential order.

When searching for a record that doesn’t exist, it is quicker to spot that it isn’t there, because the search can stop as soon as a larger key is reached.

Easier to program than methods such as indexed sequential.
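
A short sketch of the "quicker to spot a missing record" point, assuming records are `(key, data)` pairs already in key order. The function name and the read counter are illustrative additions.

```python
def sequential_search(records, target_key):
    """Search key-ordered records; return (data or None, records read)."""
    reads = 0
    for key, data in records:
        reads += 1
        if key == target_key:
            return data, reads
        if key > target_key:
            break  # keys are in order, so the target cannot appear later
    return None, reads


records = [(2, "a"), (4, "b"), (9, "c"), (12, "d"), (15, "e")]
found = sequential_search(records, 4)
missing = sequential_search(records, 5)  # stops at key 9, not at the end
```

In a serial file the same failed search would have to read all five records.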

17
Q

What is the disadvantage of a sequential file?

A

Can be slow to add records as they need to be added in the correct place

18
Q

What is the process of adding a record to a serial file?

A

  1. The file is opened
  2. The record is added to the end of the file
  3. The file is closed
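
The same three steps as a small Python sketch, assuming one text record per line. The file name and record contents are made up for illustration.

```python
import os
import tempfile


def append_record(path, record):
    """Add one record to the end of a serial file."""
    with open(path, "a", encoding="utf-8") as f:  # open in append mode
        f.write(record + "\n")                    # record goes at the end
    # leaving the with-block closes the file


# Hypothetical sales records written to a temporary file.
path = os.path.join(tempfile.mkdtemp(), "sales.txt")
append_record(path, "1001,bread,1.20")
append_record(path, "1002,milk,0.95")
```

Nothing already in the file has to be read or moved, which is why adding is fast.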

19
Q

What is the process of adding records to a sequential file?

A
  1. First, a new file is created
  2. Each record is copied from the old file to the new file until a record with a primary key greater than that of the new record is found;
  3. The new record is added to the new file;
  4. All the remaining records are copied over to the new file.

(lesson 2 slide 8 for diagram)
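
The four steps can be sketched as follows, assuming the file is modelled as a list of `(key, data)` records with unique keys. `insert_record` is a hypothetical name.

```python
def insert_record(old_file, new_record):
    """Return a new key-ordered file containing new_record."""
    new_file = []                             # step 1: create a new file
    position = 0
    # Step 2: copy records until one with a greater key is found.
    while position < len(old_file) and old_file[position][0] <= new_record[0]:
        new_file.append(old_file[position])
        position += 1
    new_file.append(new_record)               # step 3: add the new record
    new_file.extend(old_file[position:])      # step 4: copy the rest
    return new_file


old_file = [(1, "a"), (3, "c"), (7, "g")]
new_file = insert_record(old_file, (5, "e"))
```

Every existing record is copied once, which is why adding to a sequential file is slower than appending to a serial one.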

20
Q

What is the process of deleting a record from a serial or sequential file?

A
  1. A new file is created
  2. Every record is copied from the old file to the new file apart from the record(s) to be deleted

(lesson 2 slide 9 for diagram)
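
A matching sketch for deletion, again assuming a list of `(key, data)` records. `delete_records` is a hypothetical name.

```python
def delete_records(old_file, keys_to_delete):
    """Copy every record to a new file except those being deleted."""
    new_file = []                              # step 1: create a new file
    for record in old_file:                    # step 2: copy all the others
        if record[0] not in keys_to_delete:
            new_file.append(record)
    return new_file


old_file = [(1, "a"), (3, "c"), (5, "e"), (7, "g")]
new_file = delete_records(old_file, {3, 7})
```

The order of the surviving records is preserved, so this works for both serial and sequential files.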

21
Q

What is a block?

A

A block is where a track and a sector intersect.

A computer reads and writes data a block at a time; the block is the smallest unit of transfer.

22
Q

What does a disk contain?

A

A disk contains thousands of concentric circles called tracks, divided into a number of sectors.

23
Q

What does a block contain?

A

Each disk block can hold several records.

24
Q

What is an indexed sequential file?

A

An indexed sequential file is a sequential file, so it has all the associated advantages:

- able to update quickly when modifying large numbers of records in key field order.

However, it also contains an index file:

- allows records to be searched for directly and very quickly, without having to read through all the other records in the file.
(similar to a book, which can be read straight through but has an index for skipping to the desired page)

It is considered the ‘jack of all trades’, having the properties of a sequential file but also allowing very quick direct access via the index to go straight to the record. However, it is not as fast as Random (Direct) Access files.

Indexed sequential files will not work on tape storage as there is no way to directly go to a record without reading through all the previous records.

The indexes are set up so that each block will only be partially filled with records so that there is room for new records to be added in future.
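
A simplified sketch of an index lookup. It assumes the index stores the highest key held in each block, which is one common layout but is not stated on the card. The data and the `find` function are made up for illustration.

```python
from bisect import bisect_left

# Three partially filled blocks of (key, data) records, in key order.
blocks = [
    [(3, "Ali"), (8, "Bea")],
    [(12, "Cal"), (19, "Dee")],
    [(25, "Eve"), (31, "Fay")],
]
# Assumed index layout: the highest key stored in each block.
index = [block[-1][0] for block in blocks]


def find(key):
    """Use the index to pick one block, then scan only that block."""
    block_no = bisect_left(index, key)  # first block whose highest key >= key
    if block_no == len(blocks):
        return None                     # key is beyond the last block
    for rec_key, data in blocks[block_no]:
        if rec_key == key:
            return data
    return None


result = find(12)
```

Only one block is read for any search, instead of every record that comes before the target.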

25
Q

What is overflow?

A

If there is not enough space in the block where the record should be written, then this is known as an ‘overflow’.

26
Q

How do we deal with overflow?

A

There are a number of blocks set aside in an indexed sequential file known as the “overflow area”.

When an overflow occurs:

  • the record is written into the overflow area
  • a pointer is left in the block where it was supposed to be stored, to indicate in which block in the overflow area it can be found
27
Q

What are the problems associated with overflow?

A

The more records that are added to the overflow area, the longer it takes to search for records.

This is because:

  1. The index is searched which indicates the block where the record should be
  2. The block is located (requiring disk read/write heads to move)
  3. If there is a tag/pointer, the block in the overflow area now has to be found as well, adding an extra step.
28
Q

How do we reorganise Indexed sequential files?

A

As records are also deleted from files, there may now be room for some of the overflowed records in their home blocks.

Therefore, periodically (nightly, weekly etc.) a file reorganisation program is run. This reads each record sequentially through the file, writing it back into its correct home location.

29
Q

How do you delete records in an indexed sequential file?

A
  1. A record to be deleted is searched for using the index and then marked as deleted (to speed up the process)
  2. New records added will be able to overwrite the deleted records

A special character (e.g. “$”) denotes that the record can be overwritten.
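
A rough sketch of marking and reusing deleted records within a single block. It assumes each record is a `[key, data]` pair and that the marker replaces the data field. The index search is left out, and the record layout is an assumption.

```python
DELETED = "$"  # marker character from the card; where it is stored is assumed

# One block of [key, data] records, kept in key order.
block = [[3, "Ali"], [8, "Bea"], [12, "Cal"]]


def delete(key):
    """Mark a record as deleted instead of physically removing it."""
    for record in block:
        if record[0] == key and record[1] != DELETED:
            record[1] = DELETED
            return True
    return False


def add(key, data):
    """Reuse a deleted slot if there is one, otherwise take a new slot."""
    for record in block:
        if record[1] == DELETED:
            record[0], record[1] = key, data  # overwrite the deleted record
            break
    else:
        block.append([key, data])             # no deleted slot to reuse
    block.sort(key=lambda rec: rec[0])        # keep the block in key order


delete(8)
slots_after_delete = len(block)
add(10, "Jo")
```

Marking is quick because nothing else in the block has to move at the time of deletion.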

30
Q

What is a multi-level index?

A

In a multi-level index there is one main index.

The entries in this index refer to more indexes. These in turn refer to more indexes and so on.

(example of 3 level index on slide 13 and 14 lesson 3)
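
A small two-level sketch, assuming each index entry is a `(highest_key, child)` pair where the child is either another index or a block of records. The structure and names are illustrative, not taken from the slides.

```python
from bisect import bisect_left

# Main index -> second-level indexes -> blocks of record names.
main_index = [
    (19, [(8, ["rec3", "rec8"]), (19, ["rec12", "rec19"])]),
    (40, [(31, ["rec25", "rec31"]), (40, ["rec36", "rec40"])]),
]


def find_block(index, key):
    """Follow index entries down each level until a block is reached."""
    node = index
    # A level whose entries are tuples is still an index, not a block.
    while node and isinstance(node[0], tuple):
        highest_keys = [high for high, _child in node]
        i = bisect_left(highest_keys, key)
        if i == len(node):
            return None        # key is larger than anything indexed
        node = node[i][1]      # drop down to the next level
    return node


block = find_block(main_index, 25)
```

Each level narrows the search, so no single index has to be very large.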

31
Q

What do Random (Direct) Access files do?

A

When a file is searched using a key field, a random access file provides the fastest method.

It uses a hashing algorithm to generate a disk address from the key field of the record being searched for.

e.g. hashing the key “John Smith” generates disk address 873

32
Q

What is the most commonly used hashing algorithm?

A

The most commonly used hashing algorithm is known as the division/remainder method.

Disk Address = (Key_Field MOD Maximum_No_Records) + 1
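
The formula as a short Python sketch, assuming a numeric key field and a file with room for 1000 records (both assumptions). `disk_address` is a hypothetical name.

```python
MAX_RECORDS = 1000  # assumed capacity of the file


def disk_address(key_field, max_records=MAX_RECORDS):
    """Division/remainder method: addresses run from 1 to max_records."""
    return (key_field % max_records) + 1


address = disk_address(10872)  # 10872 MOD 1000 = 872, then add 1
```

The `+ 1` shifts the remainder range 0 to 999 up to addresses 1 to 1000. A text key would first need converting to a number, for example by adding up its character codes.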

33
Q

What are the signs of a good hashing algorithm?

A
  • Executes quickly
  • Distributes keys evenly across the available addresses

34
Q

What is a collision?

A

A collision is when two keys have the same hash value, so the position that the new element hashes to is already occupied.

35
Q

Name the two ways you can handle collisions.

A

Linear Probing

Overflow (Flagging)

36
Q

What do you do in Linear Probing to handle a collision?

A

You search sequentially for an unoccupied position
- uses a wraparound (circular) array, so the search carries on from the start once the end is reached

  1. Apply Hashing Algorithm
  2. Examine the result
  3. Check to see if the index is free
  4. If free, store the value
  5. If not free, move sequentially to the next free slot and store the value there
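
The steps above can be sketched as follows, assuming a tiny 7-slot table indexed from 0 and a plain `key MOD size` hash. The table size and function names are illustrative.

```python
TABLE_SIZE = 7
table = [None] * TABLE_SIZE  # None means the slot is free


def insert(key, value):
    """Store (key, value) in the first free slot at or after its hash."""
    start = key % TABLE_SIZE                  # steps 1-2: hash the key
    for step in range(TABLE_SIZE):
        slot = (start + step) % TABLE_SIZE    # wraparound to slot 0
        if table[slot] is None:               # steps 3-4: free, so store
            table[slot] = (key, value)
            return slot
    raise RuntimeError("table is full")       # step 5 found no free slot


def lookup(key):
    """Probe from the hashed slot until the key or a free slot is found."""
    start = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        slot = (start + step) % TABLE_SIZE
        if table[slot] is None:
            return None
        if table[slot][0] == key:
            return table[slot][1]
    return None


slots = [insert(6, "a"), insert(13, "b"), insert(20, "c")]  # all hash to 6
```

Keys 13 and 20 collide with key 6, so they wrap around to slots 0 and 1.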
37
Q

What do you do in Overflow to handle a collision?

A
  • a tag/pointer is left in the block to indicate that an overflow has occurred
  • the record is added to the overflow area

  1. Apply Hashing Algorithm
  2. Examine the Result
  3. Check to see if the index is free
  4. If free, store the value
  5. If not free, add flag pointing to the overflow
  6. Add the record to the next free slot in the overflow
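
The steps above can be sketched as follows, assuming a 7-slot table, a separate list as the overflow area, and a list of overflow positions stored with each home slot as the flag. All names and the layout are assumptions for illustration.

```python
TABLE_SIZE = 7
table = [None] * TABLE_SIZE  # used slots hold a record plus overflow flags
overflow_area = []


def insert(key, value):
    """Store in the home slot if free, otherwise in the overflow area."""
    slot = key % TABLE_SIZE                        # steps 1-2: hash the key
    if table[slot] is None:                        # steps 3-4: free, so store
        table[slot] = {"record": (key, value), "overflow": []}
        return ("home", slot)
    overflow_area.append((key, value))             # step 6: next free slot
    position = len(overflow_area) - 1
    table[slot]["overflow"].append(position)       # step 5: flag in home slot
    return ("overflow", position)


def lookup(key):
    """Check the home slot, then follow its flags into the overflow area."""
    entry = table[key % TABLE_SIZE]
    if entry is None:
        return None
    if entry["record"][0] == key:
        return entry["record"][1]
    for position in entry["overflow"]:
        if overflow_area[position][0] == key:
            return overflow_area[position][1]
    return None


results = [insert(6, "a"), insert(13, "b"), insert(20, "c")]  # all hash to 6
```

Unlike linear probing, the colliding records never take up other keys' home slots.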