Chapter 2: Data Structures, Types, and Formats Flashcards
What is the main focus of this chapter?
Data storage and the various formats of data.
What are the two main categories of databases?
- Structured
- Unstructured
What defines a structured database?
It follows a standardized format with a clear and logical structure.
What are the two main archetypes of structured databases?
- Defined rows/columns
- Key-value pairs
How are defined rows and columns organized?
In tables or spreadsheets where columns represent variables and rows represent data points.
What do key-value pairs represent in a structured database?
Data objects where each object has the same set of keys with different values.
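To make the two archetypes concrete, here is a minimal Python sketch of the same data held both ways. The column names and values are invented for illustration and do not come from the chapter.

```python
# Rows/columns: each column is a variable, each row is one data point.
columns = ["name", "age", "city"]  # hypothetical variables
rows = [
    ("Ana", 34, "Lisbon"),
    ("Ben", 29, "Oslo"),
]

# Key-value pairs: each object carries the same keys with different values.
records = [dict(zip(columns, row)) for row in rows]

# Check that every object exposes the same set of keys.
same_keys = all(set(record) == set(columns) for record in records)

print(records[0])
print(same_keys)
```

Both forms count as structured because the set of fields is fixed in advance; only the layout differs.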
What characterizes unstructured data?
No attempt is made to organize it into a standardized format, and it is often stored as individual files.
What are the two groups of unstructured data?
- Undefined fields
- Machine data
What types of file formats are included in undefined fields?
- Text files
- Audio files
- Video files
- Images
- Social media data
- Emails
What is machine data?
Data automatically generated by software without human intervention.
What is the difference between relational and non-relational databases?
Relational databases store information together with the relationships between pieces of information, while non-relational databases store the information only.
What language is primarily used for querying relational databases?
Structured Query Language (SQL)
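As a rough illustration of how SQL queries both the information and the relationships in a relational database, the sketch below uses Python's built-in sqlite3 module. The customers and orders tables, their columns, and their values are hypothetical.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
-- orders.customer_id records the relationship to customers.
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# The JOIN follows the stored relationship between the two tables.
order_totals = conn.execute("""
SELECT c.name, SUM(o.total)
FROM customers AS c
JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
""").fetchall()
conn.close()

print(order_totals)
```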
True or False: All SQL databases are structured and relational.
True
True or False: All non-relational databases are unstructured.
False. Key-value pair databases are non-relational but still structured.
What are the two most basic types of data schemas covered in this chapter?
- Star schema
- Snowflake schema
What is the structure of a star schema?
A central key table (commonly called a fact table) with dimension tables connected directly to it.
What are the pros of a star schema?
- Simple
- Fewer joins required
- Easier to understand
What are the cons of a star schema?
- High redundancy
- Denormalized
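The trade-off can be sketched with a tiny star layout in sqlite3. The table names (fact_sales, dim_product) and all values are assumptions made up for this example. The category name is repeated on every product row, which is the redundancy, but a single join answers the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: category_name is repeated per product (redundant).
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_name TEXT
);
-- Central table: one row per sale, pointing directly at the dimension.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);
INSERT INTO dim_product VALUES
    (1, 'Pen', 'Stationery'),
    (2, 'Notebook', 'Stationery'),
    (3, 'Mug', 'Kitchen');
INSERT INTO fact_sales VALUES (100, 1, 2.0), (101, 2, 5.0), (102, 3, 8.0);
""")

# One join is enough to total sales by category.
star_result = conn.execute("""
SELECT p.category_name, SUM(s.amount)
FROM fact_sales AS s
JOIN dim_product AS p ON p.product_id = s.product_id
GROUP BY p.category_name
ORDER BY p.category_name
""").fetchall()
conn.close()

print(star_result)
```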
What distinguishes a snowflake schema from a star schema?
Its dimension tables are normalized into further dimension tables, giving two levels of dimension tables instead of one.
What are the pros of a snowflake schema?
- Low redundancy
- Normalized
What are the cons of a snowflake schema?
- More complicated
- More joins required
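A small sqlite3 sketch of a snowflake layout shows both sides of this trade-off. The table names (fact_sales, dim_product, dim_category) and values are hypothetical. Each category name is stored once in its own second-level table, which removes the redundancy, but the same question now needs two joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Second-level dimension: each category name is stored exactly once.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT
);
-- First-level dimension: refers to the category by id only.
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
-- Central table: one row per sale.
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Kitchen');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Notebook', 1), (3, 'Mug', 2);
INSERT INTO fact_sales VALUES (100, 1, 2.0), (101, 2, 5.0), (102, 3, 8.0);
""")

# Two joins are needed to reach the category name from a sale.
snowflake_result = conn.execute("""
SELECT c.category_name, SUM(s.amount)
FROM fact_sales AS s
JOIN dim_product AS p ON p.product_id = s.product_id
JOIN dim_category AS c ON c.category_id = p.category_id
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
conn.close()

print(snowflake_result)
```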
What is a data warehouse?
A database of structured relational tables that holds large amounts of processed transactional data.
What is a data mart?
A specialized subset of a data warehouse holding processed information on a specific topic.
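One way to picture the subset relationship is to carve a topic-specific table out of a larger one. This is only a toy sketch in sqlite3: the warehouse_transactions and mart_sales names, the department column, and the values are all invented, and real data marts are built with dedicated tooling rather than a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Stand-in for a warehouse table covering every department.
CREATE TABLE warehouse_transactions (
    txn_id INTEGER PRIMARY KEY,
    department TEXT,
    amount REAL
);
INSERT INTO warehouse_transactions VALUES
    (1, 'sales', 120.0),
    (2, 'hr', 30.0),
    (3, 'sales', 80.0);
-- Stand-in for a data mart: only the rows about one topic.
CREATE TABLE mart_sales AS
    SELECT txn_id, amount
    FROM warehouse_transactions
    WHERE department = 'sales';
""")

mart_rows = conn.execute(
    "SELECT txn_id, amount FROM mart_sales ORDER BY txn_id"
).fetchall()
conn.close()

print(mart_rows)
```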
What is a data lake?
A storage system for large amounts of raw, unprocessed data, which can be structured, unstructured, or a combination.