Suppose we’re inspecting a text file (ilovepessoa.txt) using different editors.
They may show different things! Why?
The data is encoded using a character encoding scheme that some editors have trouble identifying.
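A minimal Python sketch of why editors disagree: the same bytes decoded under two different encodings. The sample sentence is hypothetical, not the real contents of ilovepessoa.txt, and Latin-1 stands in for whatever wrong guess an editor might make.

```python
# Hypothetical sample text (not the real contents of ilovepessoa.txt).
text = "Fernando Pessoa é poeta"
raw = text.encode("utf-8")          # the bytes as they would sit on disk

as_utf8 = raw.decode("utf-8")       # editor guesses the right encoding
as_latin1 = raw.decode("latin-1")   # editor guesses Latin-1 (ISO-8859-1) instead

print(raw)        # b'Fernando Pessoa \xc3\xa9 poeta'
print(as_utf8)    # Fernando Pessoa é poeta
print(as_latin1)  # Fernando Pessoa Ã© poeta
```

The two UTF-8 bytes of "é" (C3 A9) are shown by the Latin-1 reading as two separate characters, "Ã" and "©".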
What is the common character encoding for text?
ASCII
(7 bits)
How many bits does extended ASCII use?
8 bits
What is the most common Unicode encoding?
UTF-8
In forensics, why do we usually need to inspect the raw bytes using a hexadecimal editor (hex editor)?
Because the raw bytes contain the ground truth.
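A hex editor shows each byte as two hexadecimal digits next to its offset, plus a printable-character column. The following is a minimal sketch of that view; the hexdump name and the exact column layout are our own choices, not those of any specific tool.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as: offset | hex bytes | printable ASCII, like a hex editor."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside printable ASCII (0x20-0x7E) are shown as '.'.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}}  {ascii_part}")
    return "\n".join(lines)

print(hexdump("Pessoa é poeta\n".encode("utf-8")))
```

In the output, the accented "é" appears as the two bytes c3 a9, which the ASCII column renders as "..", something a text editor would hide.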
In UTF-8, most characters use 1 byte; accented characters use … bytes.
Accented characters use 2 bytes.
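A quick way to check UTF-8 byte lengths in Python; the sample characters are arbitrary picks.

```python
# UTF-8 byte length of a few sample characters.
samples = ["a", "P", "é", "ã", "ç"]
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    print(ch, sizes[ch], ch.encode("utf-8").hex(" "))
```

For example, it prints "é 2 c3 a9". The 2-byte rule holds for the accented Latin letters used in Portuguese; characters above U+07FF take 3 or 4 bytes in UTF-8.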
How does an editor determine the encoding of a text file?
Uses a default encoding scheme
Uses heuristics
Uses embedded information: BOM
What is BOM?
The Byte Order Mark (BOM) is a Unicode character (U+FEFF) that appears as a magic number at the start of a text stream, signaling its encoding.
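The three strategies can be combined in order: embedded information first, then a heuristic, then a default. This is a minimal sketch under our own assumptions: the only heuristic is "does it decode as valid UTF-8?", and the Latin-1 fallback is a hypothetical default, not what any particular editor uses.

```python
import codecs

# Check longer BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_encoding(data: bytes, default: str = "latin-1") -> str:
    """Return an encoding name; the BOM bytes (if any) must still be skipped before decoding."""
    # 1. Embedded information: a BOM at the start of the stream.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. Heuristic: does the whole stream decode as valid UTF-8?
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. Fall back to a (hypothetical) default encoding scheme.
    return default
```

Pure ASCII input is reported as "utf-8", which is harmless because ASCII is a subset of UTF-8.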
How does an editor check whether a file is ASCII?
It tests whether bit 7 (the most significant bit) is unset in every byte.
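The bit-7 test translates directly into code. A minimal sketch; Python 3.7+ also offers the built-in bytes.isascii(), which gives the same answer.

```python
def is_ascii(data: bytes) -> bool:
    # ASCII is a 7-bit code: bit 7 (mask 0x80) must be 0 in every byte.
    return all((b & 0x80) == 0 for b in data)

print(is_ascii(b"ilovepessoa"))           # True
print(is_ascii("Pessoa é".encode()))      # False: 0xC3 and 0xA9 have bit 7 set
```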
Integers are stored as a … of one or more bytes.
sequence
Endianness tells the … in which the sequence of bytes is stored
order
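A small illustration of byte order using Python's int.to_bytes and int.from_bytes; the value 0x12345678 is arbitrary.

```python
value = 0x12345678

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex(" "))     # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12

# Reading the bytes with the wrong byte order yields a different number.
print(hex(int.from_bytes(little, "big")))  # 0x78563412
```

This is why a forensic tool must know the endianness of a format before interpreting any multi-byte integer in it.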
Can 2-byte and 4-byte characters (e.g., UTF-16/UTF-32) be encoded in either little endian or big endian?
Yes.
Does the BOM include information about endianness?
Yes.
Without a BOM, is UTF-16/UTF-32 big endian or little endian by default?
Big endian.
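A minimal sketch of a UTF-16 reader that honors a BOM when present and otherwise assumes big endian. We choose the byte order explicitly instead of relying on Python's generic "utf-16" codec, whose no-BOM behavior may not match this convention; decode_utf16 is a hypothetical helper name.

```python
import codecs

def decode_utf16(data: bytes) -> str:
    """Decode UTF-16: use the BOM if present, otherwise default to big endian."""
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")            # no BOM: big endian by default

print("Pé".encode("utf-16-be"))  # b'\x00P\x00\xe9'
print("Pé".encode("utf-16-le"))  # b'P\x00\xe9\x00'
```

The same two characters produce mirrored byte pairs depending on the byte order, which is exactly the information the BOM carries.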
Base64 is a popular encoding scheme. What is it used for?
To represent binary data as ASCII text, e.g., email attachments and certificates.
How does Base64 encoding work?
[Binary data] -> 3 bytes (24 bits)
↓ split
[4 groups of 6 bits] -> 4 numbers (0-63)
↓ Base64 table
[4 safe ASCII characters]
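The 3-bytes-to-4-characters pipeline can be sketched in a few lines and checked against Python's base64 module. To stay short, this sketch omits the "=" padding used when the input length is not a multiple of 3; b64_encode_block and b64_encode are our own illustrative names.

```python
import base64

# Standard Base64 alphabet: index 0-63 -> character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode_block(block: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    n = int.from_bytes(block, "big")                             # one 24-bit integer
    groups = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]  # four 6-bit values, 0-63
    return "".join(ALPHABET[g] for g in groups)                 # table lookup

def b64_encode(data: bytes) -> str:
    # Padding is not handled here, so only lengths that are multiples of 3 are accepted.
    if len(data) % 3 != 0:
        raise ValueError("length must be a multiple of 3 in this sketch")
    return "".join(b64_encode_block(data[i:i + 3]) for i in range(0, len(data), 3))

print(b64_encode(b"Man"))                 # TWFu
print(base64.b64encode(b"Man").decode())  # TWFu
```

For "Man", the bytes 4D 61 6E split into the 6-bit values 19, 22, 5, 46, which the table maps to T, W, F, u.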
What if we need to forensically analyze structured data types?
They are everywhere, e.g., file formats, disk layouts, network packets, etc.
Strategy: decode raw bytes using knowledge of the data type layout (tools can help us).
Beyond payload data, can file formats include useful metadata?
Yes.
E.g., timestamps, camera model, GPS coordinates, user name, software version that generated the file, etc.
What is a raster image?
An array of pixels, each encoding a specific color.
How many bits per pixel does a raster image use?
E.g., 24-bit RGB images encode each pixel with 3 bytes, one per channel: red, green, and blue.
Each byte gives the intensity of its channel, from 0 (min) to 255 (max).
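A minimal sketch of a 24-bit raster image as a flat byte array. The layout is an assumption for illustration: rows stored top to bottom, pixels as R, G, B, and no row padding. Real formats differ; BMP, for example, stores B, G, R, usually bottom row first, with each row padded to a multiple of 4 bytes.

```python
# Hypothetical 2x2 image, 3 bytes per pixel (R, G, B), rows top to bottom.
WIDTH, HEIGHT = 2, 2
pixels = bytes([
    255, 0, 0,    0, 255, 0,      # row 0: red, green
    0, 0, 255,    255, 255, 255,  # row 1: blue, white
])

def get_pixel(data: bytes, x: int, y: int, width: int) -> tuple:
    """Return the (R, G, B) triple at column x, row y."""
    offset = (y * width + x) * 3   # 3 bytes per pixel
    r, g, b = data[offset:offset + 3]
    return (r, g, b)

print(get_pixel(pixels, 1, 0, WIDTH))  # (0, 255, 0)
```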
What is BMP?
Bitmap Image File (or simply “bitmap”).
File Header - 14 bytes
DIB Header (Bitmap Info Header) - 40 bytes
Pixel Data (Bitmap Data) - variable size
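Decoding a structured format means unpacking its fields at known offsets. This sketch uses Python's struct module to read the 14-byte file header and the 40-byte BITMAPINFOHEADER, both little endian. The test image is built in memory by a hypothetical make_bmp helper, so no real file is needed; only a subset of header fields is returned.

```python
import struct

FILE_HEADER = "<2sIHHI"       # signature, file size, 2 reserved, pixel data offset (14 bytes)
INFO_HEADER = "<IiiHHIIiiII"  # BITMAPINFOHEADER fields (40 bytes)

def make_bmp(width: int, height: int) -> bytes:
    """Build a hypothetical all-black 24-bit BMP in memory."""
    row_size = (width * 3 + 3) & ~3   # each row is padded to a multiple of 4 bytes
    pixel_data = bytes(row_size * height)
    offset = struct.calcsize(FILE_HEADER) + struct.calcsize(INFO_HEADER)  # 14 + 40
    file_header = struct.pack(FILE_HEADER, b"BM", offset + len(pixel_data), 0, 0, offset)
    info_header = struct.pack(INFO_HEADER, 40, width, height, 1, 24, 0,
                              len(pixel_data), 2835, 2835, 0, 0)
    return file_header + info_header + pixel_data

def parse_bmp_headers(data: bytes) -> dict:
    signature, file_size, _, _, pixel_offset = struct.unpack_from(FILE_HEADER, data, 0)
    if signature != b"BM":
        raise ValueError("missing 'BM' signature: not a BMP file")
    (dib_size, width, height, _planes, bpp, compression,
     *_rest) = struct.unpack_from(INFO_HEADER, data, struct.calcsize(FILE_HEADER))
    return {"file_size": file_size, "pixel_offset": pixel_offset, "dib_size": dib_size,
            "width": width, "height": height, "bits_per_pixel": bpp,
            "compression": compression}

print(parse_bmp_headers(make_bmp(2, 2)))
```

For a 2x2 image, each 6-byte row is padded to 8 bytes, so the file is 14 + 40 + 16 = 70 bytes with pixel data at offset 54.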
ZIP files are archives that store multiple files. They start with the letters…
PK.
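A quick way to confirm the "PK" signature is to build a small archive in memory with Python's zipfile module and look at its first bytes. The looks_like_zip helper and the archived file name are hypothetical, for illustration only.

```python
import io
import zipfile

def looks_like_zip(data: bytes) -> bool:
    # ZIP records begin with "PK"; the first local file header is PK\x03\x04.
    return data[:2] == b"PK"

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("poem.txt", "hypothetical file contents")
data = buffer.getvalue()

print(data[:4])                  # b'PK\x03\x04'
print(looks_like_zip(data))      # True
print(looks_like_zip(b"BM..."))  # False
```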
How do we decode a file without knowing its format?
What does the strings command do?
It prints the sequences of printable characters found in a file.
Useful for examining data files without the program that created them, searching executables for useful strings, etc.
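A rough approximation of what strings does: extract runs of printable ASCII characters of at least a minimum length (GNU strings uses 4 by default). This sketch ignores tabs, other encodings, and the tool's options, so its output will not match the real command in every case.

```python
import re

def strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII (0x20-0x7E) at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

blob = b"\x00\x01hello\x00ab\x00world!\xff\xfe"
print(strings(blob))  # ['hello', 'world!']
```

The short run "ab" is dropped because it falls below the minimum length, which is how the tool filters out the noise of random bytes that happen to be printable.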