Chapter 3 - Databases and Data Acquisition Flashcards
(80 cards)
What does an entity contain? What kind of database does it relate to?
An entity contains data about a single subject. Think of it as a noun that describes a person, place or thing. Used in relational databases.
What’s an entity instance?
It’s like a single record or instances of an entity subject. Like picking out one record in a table about pets.
What is the name given to describe the relationship between two entities?
Cardinality
Which types of entity relationships are comparatively complex and rare?
Unary and Ternary.
What is the difference between an entity instance and an entity attribute?
An entity attribute is akin to the column heading in a table. An entity instance is akin to a single record in a table, for example, the record about one particular customer. The record contains multiple entity attributes.
Entities become a separate _____ in a database
table
What two things together link data between two tables?
An associative table and a foreign keys
the F-keys link to the primary keys in the two related tables
A foreign key helps to enforce what?
foreign key enforces referential integriy, or how consistent the data is in related tables
What describes the modifications necessary to to create tables for storing data?
A schema
A table that contains a primary key in more than one column has a primary key called what?
A composite primary key
You need to store and retrieve simple data quickly that doesn’t have a lot of structure, which of the two non-relational databases would you choose?
Key-value or Column-family
Key-value
Name 2 high level key benefits of a column-family database?
1) DYNAMIC/FLEXIBLE - Column data can vary dynamically between key values
2) ABILITY TO SCALE - can process vast amounts of data
What kind of database is best at exploring relationships between pieces of data?
Graph databases
What database type do the below relate to and what are they?
1) NODES -
2) PROPERTIES -
3) RELATIONSHIPS -
GRAPH DATABASES
2) 1) NODES - these are the equivalent of an entity instance; a person or a thing.
2) PROPERTIES - these store attributes about a node
3) RELATIONSHIPS - these are arrows connecting nodes denoting their relationship to each other
What kind of database is often behind website features like related or RECOMMENDED items?
Graph database.
What database process is employed to reduce data redundancy?
Normalization
List the 5 rules of 1NF
1) KEY - a table without a primary key is not permitted
2) GROUPS - Repeating groups are not permitted
3) ORDER - Row order is not permitted
4) DATATYPES - Mixing of datatypes in a column is not permitted
5) VALUES - non-atomic values are not permitted
What level of normalization does this statement describe? “every attribute in a table should depend on the key, the whole key. and nothing but the key”?
3NF - third normal form
it doesn’t use a schema
it stores data in raw format
it requires specialist knowledge (data scientists) about the data to operate
a datalake
a subset of a datawarehouse that is data that serves a particular part or subject within the organization as opposed to data that serves the entire organziation
A Datamart
List 2 key features of the Star Schema design pattern
1) Denormalized to improve read performance over large datasets
2) Uses a central fact table with dimension tables as spokes.
Regarding OLAP databases, which schema design pattern would you pick if storage was a concern?
Snowflake schema
Regarding design patterns, which design pattern does a datawarehouse usually use and why and which design pattern does a datamart usually use and why?
1) Datawarehouses usually use snowflake schema design patterns because they’re supplied from many differences sources.
2) Datamarts usually use the Star schema because they represent a single data subject area.
When designing a dimension table, it is crucial to understand what to help you build it?
The questions that analysts will be asking about the the business or subject?