Compression, encryption and hashing Flashcards
(33 cards)
why is data compressed?
- to reduce storage space of files on disk
- to send data faster
- use less bandwidth - eg. if the ISP sets bandwidth limits/charges
- less buffering on audio/video streams
- webpages load quicker with compressed images
what is lossy compression?
non-essential information is removed permanently from the data
eg. different shades of the same colour are removed from an image, sound frequencies outside human hearing range are removed
what is lossless compression?
patterns in data are spotted and summarised in a shorter format without permanently removing any information
how is an image compressed using bitmapping?
- the image is made up of pixels of different colours arranged in rows and columns
- file size is reduced by reducing the variety of colours shown
- the image is reconstructed without the missing data
- leads to much smaller file size but also lower quality
how does lossy audio compression work in an MP3 file?
- removes sounds of frequencies too high for most people to hear
- removes quieter sounds played at the same time as loud sounds
- resulting file is about 10% of the original size - roughly 1MB per minute of audio
where is lossy compression used vs lossless?
- lossy: in images, audio and video where one slightly different/missing pixel/note wouldn't make much difference
- lossless: in text files/program files where one missing letter could cause an error/loss of critical information
how does compression compare between lossy and lossless compression?
- lossy: very significant file size reduction
- lossless: not as much reduction as lossy, but still a pretty significant reduction
what is run length encoding and how does it work?
- a form of lossless compression
- works by summarising consecutive patterns of data
- good for where there is lots of repeated data eg. pixels in an image
- eg. in an image: records each pixel colour and the number of times it is repeated consecutively, in the order the runs appear
- eg. in sound: a note held for a longish time produces 1000s of identical samples - one sample is stored along with the number of times it repeats
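a minimal sketch of run length encoding on one row of pixels, assuming each pixel is a single-letter colour code ("W" = white, "B" = black). the function names `rle_encode` and `rle_decode` are made up for illustration and the (count, value) pair format is just one possible way to store the runs:

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand each (count, value) pair back into the original run."""
    out = []
    for count, value in pairs:
        out.extend([value] * count)
    return out

# One row of an image: 4 white, 2 black, 3 white pixels
row = ["W", "W", "W", "W", "B", "B", "W", "W", "W"]
encoded = rle_encode(row)        # [(4, 'W'), (2, 'B'), (3, 'W')]
restored = rle_decode(encoded)   # identical to row, so nothing was lost
```

nine stored values become three pairs here. on data with few repeats the pairs can take more space than the original, which is why RLE suits data with long runs.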
what is dictionary compression and how does it work?
- a form of lossless compression
- best used for long passages of text
- repeated patterns/words are identified
- a dictionary is created consisting of all the words in the passage
- the passage is now stored as numbers that correspond to the correct word in the dictionary - uses up significantly less space, even when including the size of the dictionary
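the steps above could be sketched roughly like this, assuming words are separated by single spaces and ignoring punctuation. `dict_compress` and `dict_decompress` are hypothetical names, and real dictionary coders (eg. the LZ family) are more sophisticated than this:

```python
def dict_compress(text):
    """Return (dictionary, codes): unique words plus one index per word."""
    dictionary = []   # unique words in order of first appearance
    index = {}        # word -> position in the dictionary
    codes = []        # the passage, stored as dictionary positions
    for word in text.split():
        if word not in index:
            index[word] = len(dictionary)
            dictionary.append(word)
        codes.append(index[word])
    return dictionary, codes

def dict_decompress(dictionary, codes):
    """Look each number up in the dictionary to rebuild the passage."""
    return " ".join(dictionary[c] for c in codes)

passage = "the cat sat on the mat and the dog sat on the cat"
dictionary, codes = dict_compress(passage)
restored = dict_decompress(dictionary, codes)
```

the 13 words of the passage are stored as 13 small numbers plus a 7-word dictionary. the saving grows as the passage gets longer and words repeat more often.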
how could dictionary compression of a passage of text become even more efficient?
instead of storing words in the dictionary, store repeated phrases
what is encryption?
the transformation of data from one form to another to prevent an unauthorised third party from being able to understand it
what is plaintext?
the original unencrypted text/data
what is ciphertext?
the encrypted text/data
what is the cipher?
the encryption method or algorithm
what is the key?
the secret information to lock/unlock the message
how do the Caesar cipher and the Vernam cipher compare in terms of level of security?
- Caesar: very weak security - easily broken with little to no computational power (eg. by brute force attack)
- Vernam: perfect security
- all other ciphers are in between
what is the Caesar cipher and how does it work?
- a substitution cipher
- all the letters of the alphabet are shifted along by a constant amount - indicated in the key
- very basic encryption + the most insecure
- made a teeny bit more secure by removing spaces to mask word length
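a minimal sketch of the shift, assuming only the letters A-Z are encrypted and everything else (spaces, digits, punctuation) is left unchanged. the function name `caesar` and the example key of 3 are illustrative choices:

```python
import string

def caesar(text, shift):
    """Shift each letter A-Z by `shift` places, wrapping round at Z."""
    result = []
    for ch in text.upper():
        if ch in string.ascii_uppercase:
            position = (ord(ch) - ord("A") + shift) % 26
            result.append(chr(position + ord("A")))
        else:
            result.append(ch)   # leave non-letters as they are
    return "".join(result)

ciphertext = caesar("HELLO WORLD", 3)    # "KHOOR ZRUOG"
plaintext = caesar(ciphertext, -3)       # shifting back decrypts
```

decryption is just the same shift in the opposite direction, so anyone who knows or guesses the key can read the message.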
what is a brute force attack?
attempts to apply every possible key to decrypt the ciphertext until one works
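a sketch of a brute force attack on a Caesar cipher, which only has 25 useful keys. it assumes an uppercase A-Z ciphertext, and `shift_letters` and `brute_force` are hypothetical helper names. spotting which candidate is readable is left to a human here:

```python
def shift_letters(text, shift):
    """Shift uppercase letters by `shift`, leaving other characters alone."""
    return "".join(
        chr((ord(ch) - 65 + shift) % 26 + 65) if "A" <= ch <= "Z" else ch
        for ch in text
    )

def brute_force(ciphertext):
    """Return every (key, candidate plaintext) pair for keys 1 to 25."""
    return [(key, shift_letters(ciphertext, -key)) for key in range(1, 26)]

candidates = brute_force("KHOOR ZRUOG")
for key, guess in candidates:
    print(key, guess)   # only one of the 25 lines reads as English
```

with so few keys the search is instant. the same approach against a modern cipher with a long key would need far too many attempts to be practical.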
what is the Vernam cipher and how does it work?
the only cipher proven to be unbreakable (when used correctly) - it relies on:
1) a one-time pad
2) the bitwise exclusive OR (XOR) operation
how does the one time pad work in the Vernam cipher?
- it's the encryption key
- must be at least as long in characters as the plaintext
- can only be used once
- sender + recipient are both party to the key - they meet in person to securely share the key + destroy it immediately after decryption
- it is truly random - the character distribution is random, so no cryptanalysis will give any meaningful results
what is the xor operation carried out in the Vernam cipher?
- the second step
- XOR operation is done between the binary value of each plaintext character and the corresponding character of the one-time pad (first with first, second with second, and so on)
- often produces strange/unprintable symbols as ASCII ciphertext but this isn't normally an issue since the message is transmitted in binary
- to decrypt: XOR operation carried out on ciphertext using the one-time pad, restoring to plaintext
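a minimal sketch of the XOR step, working on bytes. `xor_bytes` is a hypothetical name, and `secrets.token_bytes` is only a stand-in for the pad: it comes from the operating system's secure generator, which is not the truly random physical source a real one-time pad needs:

```python
import secrets

def xor_bytes(data, pad):
    """XOR each byte of data with the byte at the same position in the pad."""
    if len(pad) < len(data):
        raise ValueError("one-time pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))

plaintext = b"MEET AT NOON"
pad = secrets.token_bytes(len(plaintext))   # stand-in pad (assumption)

ciphertext = xor_bytes(plaintext, pad)      # encrypt
recovered = xor_bytes(ciphertext, pad)      # same operation decrypts

# Single character example: "A" (01000001) XOR "K" (01001011) = 00001010,
# which is byte value 10, an unprintable control character in ASCII
single = xor_bytes(b"A", b"K")
```

XORing twice with the same pad cancels out, which is why encryption and decryption are the same operation.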
how is perfect security guaranteed with the Vernam cipher?
- the key is generated using a truly random source - eg. white noise, the timing of a hard disk read/write head, radioactive decay
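- with a truly random key used only once, every plaintext of the same length is equally likely, so the ciphertext is mathematically impossible to break
- computer generated random keys aren't actually random - they are mathematically generated (pseudo-random), so they just have the illusion of being random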
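a small sketch of why mathematically generated keys only look random, using Python's standard `random` module. `pseudo_random_key` and the seed value 1234 are made up for illustration:

```python
import random

def pseudo_random_key(seed, length):
    """Generate `length` key bytes from a seeded pseudo-random generator."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(length))

key_a = pseudo_random_key(1234, 16)
key_b = pseudo_random_key(1234, 16)
key_c = pseudo_random_key(1235, 16)

# Same seed, same "random" key: the output is fully determined by the seed
print(key_a == key_b)
```

anyone who learns or guesses the seed can regenerate the whole key, which is what a truly random physical source avoids.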
what is frequency analysis?
- you use facts about the English language to try and work out the key and decrypt ciphertext
- not all letters in the English language are used equally often - most common: E,T,A,O,I,N,S,R,H
- least common: Z,J,K,Q,X
- the most commonly used letters and words depend on the language
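a sketch of frequency analysis against a Caesar cipher, assuming the most common ciphertext letter stands for E. the helper names and the example message are made up, and on short or unusual messages the guess can easily be wrong:

```python
from collections import Counter

def shift_letters(text, shift):
    """Shift uppercase letters by `shift`, leaving other characters alone."""
    return "".join(
        chr((ord(ch) - 65 + shift) % 26 + 65) if "A" <= ch <= "Z" else ch
        for ch in text
    )

def guess_caesar_shift(ciphertext):
    """Guess the key by assuming the most frequent letter is an encrypted E."""
    letters = [ch for ch in ciphertext if "A" <= ch <= "Z"]
    most_common_letter, _ = Counter(letters).most_common(1)[0]
    return (ord(most_common_letter) - ord("E")) % 26

message = "MEET ME HERE WHEN THE BELL RINGS"
ciphertext = shift_letters(message, 3)    # encrypted with key 3
shift = guess_caesar_shift(ciphertext)    # key recovered from counts alone
recovered = shift_letters(ciphertext, -shift)
```

unlike brute force, this finds the key in one pass by counting letters, and the same idea extends to substitution ciphers where trying every key is not practical.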
what is symmetric encryption?
- aka private key encryption
- same key is used to encrypt + decrypt data
- the key must also be sent to the same destination as the ciphertext (key exchange)
- this is a security problem - the key can be intercepted as easily as the ciphertext and then used to decrypt the data