Lossless compression (1.3.1 b) Flashcards
(7 cards)
What is dictionary encoding?
replaces common phrases with a reference to the original data
e.g Romeo_Romeo_where_art_tho_Romeo = Romeo_ _where_art_tho
What is run length encoding?
turns runs of identical data into a single value and the number of repetitions
e.g AAABBBBCCCCDDBBBBAAA =3A4B4C2D4B3A
Run length encoding
simplest compression method
also works with images
Run length encoding limits
does not work with files of natural origin
englishrarely features words with lots of repeated character
images often don’t have regions of identical colour (especially photographs)
Dictionary encoding
replace repeated values with pointers to the original data
I am Sam Sam I am
token | data
———|———- 231123
1 | Sam
2 | I
3 | am
aren’t limited to individual words or characters
Dictionary limitations
dictionary and compressed file can have a larger file size than the uncompressed file
need for a dictionary and pointers is a major overhead
if the file doesn’t have sections that repeat the efficiency reduces
Real life lossless
often combine multiple compression approaches
e.g ZIP files use both dictionary encoding and variable length encoding (Huffman encoding)