1.3.1 Flashcards
Compression, encryption and hashing (10 cards)
Run length encoding
- A lossless compression method.
- Repeated values are replaced with a single instance of the value and the number of times the value occurs.
- It relies on runs of consecutive identical values in the data.
- It offers poor reduction in file size if there is little repetition.
- Used with some image formats, for example TIFF and BMP.
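The points above can be illustrated with a minimal Python sketch. The function names `rle_encode` and `rle_decode` are made up for this example, and storing each run as a (value, count) pair is only one possible representation.

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[str, int]]:
    """Replace each run of repeated characters with a (value, run length) pair."""
    return [(value, len(list(run))) for value, run in groupby(data)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original data by repeating each value 'count' times."""
    return "".join(value * count for value, count in pairs)

encoded = rle_encode("WWWWBBBWW")
print(encoded)              # [('W', 4), ('B', 3), ('W', 2)]
print(rle_decode(encoded))  # WWWWBBBWW (lossless: the original comes back)

# With little repetition there is no saving: 4 characters become 4 pairs.
print(rle_encode("ABCD"))
```

Nine characters are reduced to three pairs in the first case, while the second case shows why data with little repetition compresses poorly.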
Lossy compression
Some of the data is permanently removed during compression and cannot be recovered.
Provides more compression than lossless techniques.
Popular where the removal of data is not noticeable, as with images and audio.
Example file formats: .jpg, .mp3
Lossless compression
All the original data can be recovered when the file is uncompressed.
Used with text, as all data must be recoverable.
Example techniques: run-length encoding, dictionary encoding.
Example file formats: .zip, .gif, .png, .pdf
Not normally applied to video, as higher compression rates are needed to make the files small enough.
Dictionary encoding
Lossless method commonly used with text.
Any data matching an entry in the ‘dictionary’ (a lookup table) is substituted with a shorter code that uses fewer bits.
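As a rough sketch of the idea, the Python below builds a dictionary of whole words and replaces each word with a small integer code. Working on words rather than arbitrary byte sequences, and the names `build_dictionary`, `dict_encode` and `dict_decode`, are assumptions made for this illustration.

```python
def build_dictionary(words):
    """Give each distinct word a small integer code (first word seen gets 0)."""
    dictionary = {}
    for word in words:
        if word not in dictionary:
            dictionary[word] = len(dictionary)
    return dictionary

def dict_encode(text):
    """Return the lookup table and the text as a list of codes."""
    words = text.split(" ")
    dictionary = build_dictionary(words)
    return dictionary, [dictionary[word] for word in words]

def dict_decode(dictionary, codes):
    """Reverse the lookup table to recover the original text exactly."""
    lookup = {code: word for word, code in dictionary.items()}
    return " ".join(lookup[code] for code in codes)

text = "the cat sat on the mat the cat ran"
dictionary, codes = dict_encode(text)
print(codes)                           # [0, 1, 2, 3, 0, 4, 0, 1, 5]
print(dict_decode(dictionary, codes))  # the original sentence
```

The dictionary has to be stored or sent along with the codes, so the saving only appears when entries are reused often enough.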
Benefits of compression
Storage space. This is a ‘no-brainer’: if you can compress your photos, music and videos, you can fit more of them on a device, reducing cost.
Bandwidth. Compression reduces the wait times to download files. As well as saving time, it can reduce data usage on limited (or expensive) networks.
Symmetric encryption
The same key is used to encrypt and decrypt the message.
Both the sender and the recipient must know the key.
A simple example of a symmetric algorithm is the Caesar cipher.
In this substitution cipher, each character in the plaintext is shifted left or right in the alphabet; the number of places the character is shifted corresponds to the key.
To decrypt the message, you would reverse the process by shifting the other way using the same key.
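The Caesar cipher described above might be sketched in Python as follows. This version assumes a 26-letter uppercase alphabet, wraps around at the ends, and leaves other characters unchanged. The function names are illustrative only.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_shift(text: str, key: int) -> str:
    """Shift each letter 'key' places along the alphabet, wrapping round at Z."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)  # spaces and punctuation are left as they are
    return "".join(result)

def encrypt(plaintext: str, key: int) -> str:
    return caesar_shift(plaintext, key)

def decrypt(ciphertext: str, key: int) -> str:
    # Same key, opposite direction.
    return caesar_shift(ciphertext, -key)

ciphertext = encrypt("HELLO WORLD", 3)
print(ciphertext)              # KHOOR ZRUOG
print(decrypt(ciphertext, 3))  # HELLO WORLD
```

Because the same key (here 3) is used in both directions, anyone who learns it can read the message, which is why both parties must keep it secret.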
Asymmetric encryption
Asymmetric encryption uses two separate but mathematically connected keys, a private key and a public key, to encrypt and decrypt the data.
The public key can be widely shared. The private key is known only to its owner and must be kept secure. A message encrypted with one key of the pair can only be decrypted with the other.
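To give a feel for how two keys can be mathematically connected, here is a toy sketch using RSA with very small primes. The card does not name an algorithm, so RSA is an assumption chosen for illustration, and numbers this small offer no real security. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy RSA key pair (illustrative values only; real keys use enormous primes).
p, q = 61, 53
n = p * q                # 3233, shared by both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, chosen to share no factor with phi
d = pow(e, -1, phi)      # private exponent: (e * d) % phi == 1

public_key = (e, n)      # can be widely shared
private_key = (d, n)     # must be kept secret

def apply_key(value: int, key: tuple[int, int]) -> int:
    """Raise the value to the key's exponent, modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65
ciphertext = apply_key(message, public_key)     # anyone can encrypt
recovered = apply_key(ciphertext, private_key)  # only the key owner can decrypt

print(d)           # 2753
print(ciphertext)  # 2790
print(recovered)   # 65
```

The two exponents are linked through `phi`, which is easy to compute from `p` and `q` but hard to work out from `n` alone when the primes are large.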
Why do we need encryption?
To secure electronic communication over a network such as the internet. Encryption provides confidentiality and privacy. Today, more than 80% of all websites use HTTPS. ISPs, governments and big data collection firms love our browsing data, which is why many websites encrypt your traffic even when you are not sending sensitive information.
To keep data secure, for example to comply with data protection legislation. Customer records and accounting and financial data, for instance, must be protected by law.
Authenticity (verification of a person’s or device’s identity). For example, you use your private key to create a digital signature (or encrypt with your private key) and the reader decrypts it with your public key, and so knows that you sent the message.
Hashing
An algorithm that is applied to data. The output is called a digest (or hash). The digest cannot be reverted to the original data, so hashing is said to be one-way.
A collision is a situation that occurs when two distinct pieces of data have the same hash value.
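A short Python sketch of both ideas follows. SHA-256 from the standard `hashlib` module is used as an example algorithm (the card does not name one), and `toy_hash` is a deliberately weak function invented here so that a collision is easy to see.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

a = sha256_hex(b"hello")
b = sha256_hex(b"hello")
c = sha256_hex(b"hellp")

print(len(a))  # 64 hex characters, whatever the input size
print(a == b)  # True: same data always gives the same digest
print(a == c)  # False: a one-character change gives a different digest

def toy_hash(text: str) -> int:
    """Deliberately weak hash: sum of the character codes, modulo 10."""
    return sum(ord(ch) for ch in text) % 10

# Two distinct inputs with the same hash value: a collision.
print(toy_hash("ab"), toy_hash("ba"))  # 5 5
```

With only ten possible outputs `toy_hash` collides constantly. Practical hash functions have such a large output range that collisions are extremely unlikely to be found.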
Uses of hashing
Hashing can be used to check that two files are the same, or that a message has not been altered.
Hash tables: hashing a key gives an index into an array, so an item can be found directly without searching the whole collection.
Hashing in databases: the primary key (or another value) can be hashed to determine the location on disk at which to store the record, or from which to retrieve or delete it.
Storing confidential data
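The hash table and database uses listed in this card can be sketched as a small Python class. It is an in-memory illustration only: the class name `HashTable`, the bucket count of 8 and the choice to keep colliding keys in a list per bucket are all assumptions, and a real database would map the hash to a disk location instead.

```python
class HashTable:
    """Fixed-size array of buckets; each bucket holds (key, record) pairs."""

    def __init__(self, size: int = 8):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key) -> int:
        # Hash the key, then reduce it to a valid position in the array.
        return hash(key) % len(self.buckets)

    def put(self, key, record) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)  # overwrite the existing record
                return
        bucket.append((key, record))

    def get(self, key):
        # Only one bucket is searched, not the whole table.
        for existing_key, record in self.buckets[self._index(key)]:
            if existing_key == key:
                return record
        return None

    def delete(self, key) -> None:
        index = self._index(key)
        self.buckets[index] = [(k, r) for k, r in self.buckets[index] if k != key]

table = HashTable()
table.put(1001, "Alice")  # primary key 1001
table.put(1009, "Bob")    # in CPython both keys land in bucket 1 (a collision)
print(table.get(1009))    # Bob
table.delete(1001)
print(table.get(1001))    # None
```

Storing, retrieving and deleting all start by hashing the key to find the bucket, so the work does not grow with the total number of records as long as the buckets stay short.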