How are files physically stored on storage media?
Strings of 1s and 0s (binary data) in data blocks.
How those bits are interpreted depends on standards, manufacturer specifications, and hardware structure.
What command shows a file in binary (bits) and hexadecimal?
xxd -b myfile → binary (bits)
xxd myfile → hexadecimal dump
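A minimal Python sketch of the same two views, for cases where xxd is not available. The helper names to_bits and to_hex are made up for illustration, and the output is simplified (no offsets or ASCII column as xxd prints).

```python
def to_bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, separated by spaces."""
    return " ".join(f"{byte:08b}" for byte in data)


def to_hex(data: bytes) -> str:
    """Render each byte as 2 hexadecimal digits, separated by spaces."""
    return " ".join(f"{byte:02x}" for byte in data)


sample = b"Hi"
print(to_bits(sample))  # 01001000 01101001
print(to_hex(sample))   # 48 69
```

For a real file, the bytes could be read with open(path, "rb").read() and passed to either helper.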
What is Encoding?
Rules for how data is stored and retrieved.
Binary encoding → for machines
Character encoding → for human understanding
What is ASCII?
American Standard Code for Information Interchange.
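7-bit code: values 0-127 are standard ASCII, usually stored as 1 byte = 1 character. Values 128-255 come from "extended" 8-bit code pages, which are not part of ASCII itself.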
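A short sketch of the character-to-number mapping, using only Python built-ins. The helper is_ascii is a hypothetical name for illustration.

```python
text = "Az"

# ord() gives the numeric code of a character; chr() goes the other way.
codes = [ord(ch) for ch in text]
print(codes)     # [65, 122]
print(chr(65))   # A

# Encoding to ASCII yields exactly one byte per character.
encoded = text.encode("ascii")
print(len(encoded))  # 2


def is_ascii(data: bytes) -> bool:
    """True if every byte is in the standard 0-127 range."""
    return all(byte < 128 for byte in data)
```

A byte such as 200 fails the is_ascii check, because its meaning depends on which extended code page is assumed.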
Compare UTF-8, UTF-16, and UTF-32
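UTF-8: Variable length, 1 to 4 bytes per character; backward compatible with ASCII
UTF-16: Variable length, 2 or 4 bytes per character (surrogate pairs above U+FFFF); Little or Big Endian
UTF-32: Fixed 4 bytes (32 bits) per character; Little or Big Endian. All three can encode every Unicode character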
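To make the size differences concrete, the sketch below encodes a few characters and counts the resulting bytes. The function name encoded_sizes is an assumption for illustration; the little-endian codec names are used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes needed for ch in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(ch.encode(name)) for name in encodings}


print(encoded_sizes("A"))   # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("€"))   # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("😀"))  # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```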
What is Base64 encoding used for?
Represents binary data as text using an alphabet of 64 printable ASCII characters (A-Z, a-z, 0-9, + and /, with = as padding). Commonly used in email attachments, URLs (in a URL-safe variant), and payloads
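A brief sketch with Python's standard base64 module, showing encoding, padding, decoding, and the URL-safe alphabet. The sample bytes are arbitrary.

```python
import base64

# Three input bytes map to exactly four Base64 characters.
print(base64.b64encode(b"Man"))  # b'TWFu'

# Two input bytes need one '=' of padding.
print(base64.b64encode(b"Ma"))   # b'TWE='

# Decoding restores the original bytes.
decoded = base64.b64decode(b"TWFu")
print(decoded)                   # b'Man'

# The URL-safe variant swaps '+' and '/' for '-' and '_'.
raw = bytes([0xFB, 0xFF])
standard = base64.b64encode(raw)
url_safe = base64.urlsafe_b64encode(raw)
```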
Main categories of files in forensics?
Data Files – Binary or Character encoded
Program Files – Executables and libraries
Difference between Character-encoded and Binary-encoded files?
Character-encoded: Human-readable in text editor (XML, JSON, CSV, YAML, etc.)
Binary-encoded: Machine-readable, illegible as text, often begin with “magic numbers”
Describe the JSON format structure
Uses { } for objects, : between a key and its value, , between pairs, and [ ] for arrays (lists). Supports nesting
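A small sketch that parses a nested JSON document with the standard json module. The field names and the case identifier are invented for illustration.

```python
import json

# Hypothetical document: an object holding a string, a list of objects, and a boolean.
doc = '{"case": "2024-001", "files": [{"name": "a.txt", "size": 12}], "closed": false}'

data = json.loads(doc)

# Objects become dicts, arrays become lists, false becomes False.
first_file = data["files"][0]
print(first_file["name"])  # a.txt
print(data["closed"])      # False
```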
Describe the CSV format
Values separated by commas (or another delimiter such as a semicolon or tab). One record per line. Optional header row with field names
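A sketch of reading CSV with a header row using the standard csv module. The sample data is made up; the second record shows that a value containing the delimiter has to be quoted.

```python
import csv
import io

# Hypothetical CSV text: header row, then one record per line.
text = 'name,size\nreport.docx,2048\n"notes, old.txt",17\n'

# DictReader uses the header row as field names.
rows = list(csv.DictReader(io.StringIO(text)))

print(rows[0]["name"])  # report.docx
print(rows[1]["name"])  # notes, old.txt
print(rows[1]["size"])  # 17
```

All values come back as strings; converting "17" to a number is left to the caller. For semicolon-separated files, csv.DictReader accepts delimiter=";".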
Describe the XML format
Data enclosed in tags <tag>content</tag>. Supports nesting. Processing instructions start with <?
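A sketch of walking nested tags with the standard xml.etree.ElementTree module. The tag and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a declaration starting with <?, then nested tags.
doc = '<?xml version="1.0"?><case><file name="a.txt"><size>12</size></file></case>'

root = ET.fromstring(doc)

print(root.tag)  # case

file_element = root.find("file")
print(file_element.get("name"))      # a.txt
print(root.find("file/size").text)   # 12
```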
Describe YAML and INI formats
YAML: Key: Value pairs, uses indentation for nesting, - for lists
INI: Key=Value pairs, sections in [Section], no standard list support
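A sketch of reading the INI layout with Python's standard configparser module. The section and key names are hypothetical. YAML is left out here because the standard library has no YAML parser; a third-party package would be needed.

```python
import configparser

# Hypothetical INI text: one [Section] with Key=Value pairs.
text = "[Network]\nhost=example.local\nport=8080\n"

parser = configparser.ConfigParser()
parser.read_string(text)

host = parser["Network"]["host"]
port = parser.getint("Network", "port")  # values are strings unless converted

print(parser.sections())  # ['Network']
print(host, port)         # example.local 8080
```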
How can you identify a file type without relying on the extension?
Using the Magic Number (file signature) – specific bytes at the start of the file
Give examples of common magic numbers.
GIF → GIF87a or GIF89a
WAV → RIFF (followed by WAVE at offset 8)
ZIP (including .docx) → PK (usually bytes 50 4B 03 04)
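The signature check can be sketched in a few lines of Python. The table below covers only the three formats listed above, and the names SIGNATURES and identify are assumptions for illustration, not a replacement for the file command.

```python
from typing import Optional

# Signatures that sit at offset 0 of the file.
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP"),
]


def identify(header: bytes) -> Optional[str]:
    """Guess a file type from its first bytes, or return None if unknown."""
    # RIFF is a container format; WAV files carry 'WAVE' at offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV"
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None


print(identify(b"GIF89a\x01\x00"))                 # GIF
print(identify(b"RIFF\x24\x00\x00\x00WAVEfmt "))   # WAV
print(identify(b"hello"))                          # None
```

In practice the header would come from open(path, "rb").read(12) or similar.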
Why can .docx files be opened with a ZIP tool?
Because .docx is a ZIP archive containing XML files (word/document.xml, etc.)
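To illustrate the container idea without needing a real document, the sketch below builds a tiny ZIP archive in memory with .docx-style member names and then lists them with the standard zipfile module. The XML bodies are placeholders, so the result is not a valid Word document.

```python
import io
import zipfile

# Build a stand-in archive in memory (placeholder contents, not a valid .docx).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")

data = buffer.getvalue()

# The archive starts with the ZIP signature.
print(data[:2])  # b'PK'

# Any ZIP tool, including zipfile, can list and read the members.
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    names = archive.namelist()
    body = archive.read("word/document.xml")

print(names)  # ['[Content_Types].xml', 'word/document.xml']
```

For a real file, zipfile.ZipFile("report.docx") would open it the same way (the file name here is hypothetical).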
Signed vs Unsigned Integers – explain the difference
Unsigned: Non-negative numbers only (0 to 2^n - 1 for n bits)
Signed: Can hold negative values; almost always uses Two’s Complement, where the most significant bit indicates the sign
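The same byte can mean two different numbers depending on which interpretation is chosen. A short sketch using int.from_bytes, with a single byte so that byte order does not matter:

```python
def as_unsigned(raw: bytes) -> int:
    """Interpret raw bytes as an unsigned integer."""
    return int.from_bytes(raw, "big", signed=False)


def as_signed(raw: bytes) -> int:
    """Interpret the same bytes as a two's complement signed integer."""
    return int.from_bytes(raw, "big", signed=True)


print(as_unsigned(b"\xff"), as_signed(b"\xff"))  # 255 -1
print(as_unsigned(b"\x80"), as_signed(b"\x80"))  # 128 -128
print(as_unsigned(b"\x7f"), as_signed(b"\x7f"))  # 127 127
```

The helper names are made up for illustration; values with the top bit clear read the same either way.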
What is Little Endian vs Big Endian?
Byte order for multi-byte data types:
Little Endian: Least significant byte first
Big Endian: Most significant byte first
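A sketch with the standard struct module showing how one 32-bit value is laid out in each byte order, and what happens when the wrong order is assumed on reading. The value 0x12345678 is an arbitrary example.

```python
import struct

value = 0x12345678

# "<" means little endian, ">" means big endian, "I" is a 4-byte unsigned integer.
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading little-endian bytes as big endian gives a different number.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0x78563412
```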
What standard is used for floating-point numbers?
IEEE 754
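As a sketch of what the standard defines, the code below packs a number as a 32-bit (single precision) IEEE 754 value and splits it into its sign, exponent, and fraction fields. The function name float32_fields is an assumption for illustration.

```python
import struct


def float32_fields(number: float) -> tuple:
    """Return (sign, biased exponent, fraction) of number as a 32-bit float."""
    raw = struct.pack(">f", number)         # 4 bytes, big endian
    bits = int.from_bytes(raw, "big")
    sign = bits >> 31                       # 1 bit
    exponent = (bits >> 23) & 0xFF          # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF              # 23 bits
    return sign, exponent, fraction


print(struct.pack(">f", 1.0).hex())  # 3f800000
print(float32_fields(1.0))           # (0, 127, 0)
print(float32_fields(-2.0))          # (1, 128, 0)
```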
List the 4 main methods to identify a file type (roughly from least to most reliable)
Try to open it (on isolated/offline machine – risky)
Check file extension (easily changed/hidden)
Check Magic Number / File Signature (e.g. with the file command)
Examine contents (text editor, hexdump, strings, etc.)
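For the last method, a rough Python approximation of what the strings tool does: pull runs of printable ASCII out of arbitrary bytes. The function name printable_strings and the default minimum length of 4 are assumptions for illustration.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII characters found in data."""
    # 0x20-0x7e covers space through tilde, the printable ASCII range.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


blob = b"\x00\x01GIF89a\x00ab\x00hello world\xff"
print(printable_strings(blob))  # ['GIF89a', 'hello world']
```

Runs shorter than the minimum, such as "ab" above, are dropped to reduce noise from random byte sequences.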
What are the main problems forensic analysts face when examining files?
Older or private formats
Overwritten/changed signatures
Fragmented or partially overwritten files
Encrypted files
Steganography (hidden data)
Context-dependent interpretation (endianness, encoding, structures)
Proving the interpretation matches the creator’s original intent