Big Data Lecture 02 Lessons Learnt Flashcards
Explain data independence
The logical model of the data (the interface used for querying and displaying it) is independent of the physical storage, so the storage can be swapped without changing the logical view.
What 4 pieces constitute the architecture of data storage?
<ul><li>Language (how you query),</li><li>model (representation, driver of independence),</li><li>compute (execution of computation),</li><li>storage (physical hardware).</li></ul>
What does the data model describe? (2)
<ul><li>What the data looks like,</li><li>what you can do with it (manipulation primitives).</li></ul>
What is a table?
Collection of rows that all share the same attributes (columns).
What is a row?
One record in the table.
What is an attribute?
One column in a table.
What is a primary key?
An attribute (or set of attributes) whose value uniquely identifies each record in a table.
What is a value?
The single entry at the intersection of a row and a column of a table.
What is relational algebra?
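A formal algebra of operations, such as selection, projection, and join, that take tables (relations) as input and produce a table as output.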
How is a relation (table) expressed formally in relational algebra?<br></br><br></br>What are its two components?
Each attribute has a domain; the relation is a subset of the Cartesian product of these domains, and its tuples are the rows of the table.<br></br><br></br>Components:<br></br>1. set of attributes (schema),<br></br>2. set/bag/list of tuples.
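A minimal Python sketch of the definition above, assuming two hypothetical attributes with small made-up domains: the relation is just a set of tuples that is contained in the Cartesian product of the domains.

```python
from itertools import product

# Hypothetical domains for two attributes (illustration only)
domains = {
    "name": {"Alice", "Bob"},
    "age": {30, 40},
}
attributes = ["name", "age"]  # the schema (ordered here only to build tuples)

# Cartesian product of the domains: every possible (name, age) tuple
all_tuples = set(product(*(domains[a] for a in attributes)))

# A relation is some subset of that product, stored here as a set of tuples
relation = {("Alice", 30), ("Bob", 40)}

assert relation <= all_tuples  # each value comes from its attribute's domain
print(len(all_tuples))  # 4
```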
Explain: set, list, and bag.
<ul><li>Set: unordered collection without duplicates,</li><li>list: ordered collection, can have duplicates,</li><li>bag: unordered collection, can have duplicates.</li></ul>
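The three collection types can be approximated in Python as sketched below. Using `collections.Counter` as a bag (multiset) is a stand-in chosen for illustration, not a dedicated bag type.

```python
from collections import Counter

values = ["b", "a", "b"]

as_list = list(values)    # ordered, duplicates kept: ['b', 'a', 'b']
as_set = set(values)      # unordered, duplicates removed: {'a', 'b'}
as_bag = Counter(values)  # unordered, duplicates counted: {'b': 2, 'a': 1}

# Order matters for lists, but not for sets or bags
print(as_list == ["a", "b", "b"])          # False
print(as_set == {"a", "b"})                # True
print(as_bag == Counter(["a", "b", "b"]))  # True
```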
How can a tuple be seen as a function?
It maps each attribute of the table to a value from that attribute's domain.
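A Python `dict` is a natural way to sketch this view: its keys are the attributes (the function's domain) and looking up a key "applies" the tuple. The row data and the helper `value_of` are hypothetical.

```python
# A tuple viewed as a function from attributes to values (hypothetical data)
row = {"name": "Alice", "age": 30, "city": "Zurich"}

def value_of(t, attribute):
    """Apply the tuple t (a function) to an attribute name."""
    return t[attribute]

print(value_of(row, "age"))  # 30
print(sorted(row.keys()))    # the attributes it is defined on: ['age', 'city', 'name']
```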
What is relational integrity?
All records must have identical support, i.e., every row has exactly the same set of attributes; e.g., there cannot be missing values. (Not to be confused with referential integrity, where keys must point to valid records in other tables.)
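A minimal check for relational integrity, assuming rows are represented as Python dicts so that a missing value shows up as a missing key. The function name `has_relational_integrity` is hypothetical.

```python
def has_relational_integrity(rows):
    """True if every row has exactly the same set of attributes (identical support)."""
    if not rows:
        return True
    support = set(rows[0])
    return all(set(r) == support for r in rows)

good = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
bad = [{"id": 1, "name": "Alice"}, {"id": 2}]  # second row is missing "name"

print(has_relational_integrity(good))  # True
print(has_relational_integrity(bad))   # False
```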
What is atomic integrity?
There are no tables nested in a table; every value is atomic.
When is a table in first normal form (1NF)?
When it satisfies atomic integrity.
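A sketch of an atomic-integrity (1NF) check over dict rows. Which Python types count as "atomic" is an assumption made for this example, and `is_first_normal_form` is a hypothetical name.

```python
# Assumption: these scalar types count as atomic; lists, dicts, and sets do not
ATOMIC_TYPES = (str, int, float, bool)

def is_first_normal_form(rows):
    """True if every value in every row is atomic (no nested lists, dicts, or tables)."""
    return all(isinstance(v, ATOMIC_TYPES) for r in rows for v in r.values())

flat = [{"id": 1, "phone": "555-0100"}]
nested = [{"id": 1, "phones": ["555-0100", "555-0101"]}]  # a nested collection

print(is_first_normal_form(flat))    # True
print(is_first_normal_form(nested))  # False
```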
What is domain integrity?
All values of an attribute (column) must come from the same domain, e.g., all Booleans or all strings.
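A sketch of a per-column domain check, assuming a column's "domain" can be approximated by a single Python type. The function name and sample rows are hypothetical.

```python
def has_domain_integrity(rows):
    """True if, for each attribute, all values share one Python type."""
    types_per_attribute = {}
    for r in rows:
        for attribute, value in r.items():
            types_per_attribute.setdefault(attribute, set()).add(type(value))
    return all(len(ts) == 1 for ts in types_per_attribute.values())

ok = [{"id": 1, "active": True}, {"id": 2, "active": False}]
mixed = [{"id": 1, "active": True}, {"id": "two", "active": False}]  # id mixes int and str

print(has_domain_integrity(ok))     # True
print(has_domain_integrity(mixed))  # False
```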
What is NoSQL?
What we get when we break these integrity constraints (relational, atomic, domain): data outside the relational model, which is what we study in Big Data!
What is selection?
Selecting rows of a table.
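Selection can be sketched as a filter over rows with a predicate; every column of the surviving rows is kept. The data and the `select` helper are made up for illustration.

```python
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 17},
    {"name": "Carol", "age": 45},
]

def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate; all columns kept."""
    return [r for r in rows if predicate(r)]

adults = select(people, lambda r: r["age"] >= 18)
print([r["name"] for r in adults])  # ['Alice', 'Carol']
```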
What is projection?
Selecting columns of a table.
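Projection keeps only some columns of every row. The sketch below, with a hypothetical `project` helper, uses bag semantics (duplicates kept, as SQL does by default) and then shows how set semantics would collapse duplicate rows.

```python
people = [
    {"name": "Alice", "city": "Zurich"},
    {"name": "Bob", "city": "Geneva"},
    {"name": "Carol", "city": "Zurich"},
]

def project(rows, attributes):
    """Projection (pi): keep only the given columns; every row is kept (bag semantics)."""
    return [{a: r[a] for a in attributes} for r in rows]

cities = project(people, ["city"])
print(cities)  # [{'city': 'Zurich'}, {'city': 'Geneva'}, {'city': 'Zurich'}]

# Under set semantics, the duplicate row would collapse
print({tuple(r.values()) for r in cities})
```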
What is grouping?
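Collapsing the rows of a table that share the same value of an attribute (or satisfy the same condition) into one row per group, usually combined with an aggregation such as SUM or COUNT.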
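A sketch of grouping followed by a SUM aggregation, roughly what `GROUP BY shop` with `SUM(amount)` does. The sales data and the `group_sum` helper are hypothetical.

```python
from collections import defaultdict

sales = [
    {"shop": "A", "amount": 10},
    {"shop": "B", "amount": 5},
    {"shop": "A", "amount": 7},
]

def group_sum(rows, key, value):
    """Group rows by `key` and sum `value` within each group (like GROUP BY + SUM)."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[key]] += r[value]
    return dict(totals)

print(group_sum(sales, "shop", "amount"))  # {'A': 17, 'B': 5}
```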
What is sorting?
Ordering the rows of a table by one or more attributes, each ascending or descending.
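A sketch of sorting on two keys, similar in spirit to `ORDER BY age DESC, name ASC`, using hypothetical data. It relies on Python's `sorted` being stable: sort by the secondary key first, then by the primary key.

```python
people = [
    {"name": "Bob", "age": 17},
    {"name": "Alice", "age": 30},
    {"name": "Carol", "age": 30},
]

# ORDER BY age DESC, name ASC: sort by name first, then by age (sorted() is stable)
by_name = sorted(people, key=lambda r: r["name"])
ordered = sorted(by_name, key=lambda r: r["age"], reverse=True)
print([r["name"] for r in ordered])  # ['Alice', 'Carol', 'Bob']
```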
What is Cartesian product?
Combining every row of one table with every row of the other (each with each).
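A sketch of the Cartesian product of two small hypothetical tables: the result has as many rows as the product of the two row counts.

```python
from itertools import product

colors = [{"color": "red"}, {"color": "blue"}]
sizes = [{"size": "S"}, {"size": "M"}, {"size": "L"}]

# Cartesian product: every row of `colors` paired with every row of `sizes`
pairs = [{**c, **s} for c, s in product(colors, sizes)]
print(len(pairs))  # 6 = 2 * 3
print(pairs[0])    # {'color': 'red', 'size': 'S'}
```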
What is join?
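Combining the rows of two tables that match on a common attribute (or, more generally, on a condition); equivalent to a selection on their Cartesian product.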
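A naive nested-loop sketch of an inner join on one shared attribute. It is literally a filtered Cartesian product; real systems use faster strategies (hash or sort-merge joins). Tables and the `join` helper are hypothetical.

```python
students = [{"sid": 1, "name": "Alice"}, {"sid": 2, "name": "Bob"}]
enrolled = [{"sid": 1, "course": "Big Data"}, {"sid": 1, "course": "Databases"}]

def join(left, right, attribute):
    """Inner join: combine rows whose values for `attribute` are equal."""
    return [{**l, **r} for l in left for r in right if l[attribute] == r[attribute]]

for row in join(students, enrolled, "sid"):
    print(row)
# {'sid': 1, 'name': 'Alice', 'course': 'Big Data'}
# {'sid': 1, 'name': 'Alice', 'course': 'Databases'}
```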
What are anomalies?
If data is duplicated instead of being stored once and properly linked, an update, delete, or insert can leave the copies inconsistent (update, deletion, and insertion anomalies).
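A small sketch of an update anomaly on a hypothetical denormalized table, followed by a normalized version where the fact is stored once and linked by key.

```python
# Denormalized table: the department's building is repeated in every employee row
employees = [
    {"emp": "Alice", "dept": "R&D", "building": "B1"},
    {"emp": "Bob", "dept": "R&D", "building": "B1"},
]

# Update anomaly: R&D moves, but only one copy gets updated
employees[0]["building"] = "B2"
buildings = {r["building"] for r in employees if r["dept"] == "R&D"}
print(buildings)  # two conflicting buildings for one department

# Normalized alternative: store the fact once and link to it by key
departments = {"R&D": {"building": "B2"}}
staff = [{"emp": "Alice", "dept": "R&D"}, {"emp": "Bob", "dept": "R&D"}]
print({departments[r["dept"]]["building"] for r in staff})  # {'B2'}
```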
Functional: queries can be nested, like functions in math.
SELECT attributes
FROM table_name
WHERE condition
GROUP BY attribute
HAVING condition
ORDER BY attribute and direction
LIMIT number_to_display
OFFSET number_to_skip
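The clauses above can be tried end to end with Python's built-in `sqlite3` module on an in-memory database. The `sales` table and its rows are made up for this sketch; WHERE filters rows before grouping, HAVING filters groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (shop TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("A", 10), ("B", 5), ("A", 7), ("C", 20), ("C", 1), ("D", 2)],
)

query = """
SELECT shop, SUM(amount) AS total  -- projection + aggregation
FROM sales                         -- source table
WHERE amount > 1                   -- row selection before grouping
GROUP BY shop                      -- one group per shop
HAVING SUM(amount) > 4             -- group selection after aggregation
ORDER BY total DESC                -- sort the groups
LIMIT 2                            -- show at most 2 rows
OFFSET 1                           -- after skipping the first one
"""
result = conn.execute(query).fetchall()
print(result)  # [('A', 17), ('B', 5)]
```

Without the OFFSET, the first row would be `('C', 20)`.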
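- Theta join matches rows on an arbitrary condition θ (e.g., an equality or inequality between attributes),
- left and right (outer) joins keep every row of one table, attach the matching rows of the other, and fill in the rest with NULL,
- full outer join does both the left and the right join, keeping unmatched rows from both tables,
- natural join joins on equality of all attributes with the same name.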
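A sketch of left and full outer joins in plain Python, with `None` standing in for NULL. The tables and helper names are hypothetical; a right join is the left join with the two tables swapped.

```python
students = [{"sid": 1, "name": "Alice"}, {"sid": 2, "name": "Bob"}]
grades = [{"sid": 1, "grade": 5.5}, {"sid": 3, "grade": 4.0}]

def left_join(left, right, attr, right_cols):
    """Keep every left row; fill the right columns with None (NULL) when nothing matches."""
    out = []
    for l in left:
        matches = [r for r in right if r[attr] == l[attr]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, **{c: None for c in right_cols}})
    return out

def full_outer_join(left, right, attr, left_cols, right_cols):
    """Left join plus the right rows that matched nothing, padded with None."""
    out = left_join(left, right, attr, right_cols)
    left_keys = {l[attr] for l in left}
    out.extend({**{c: None for c in left_cols}, **r} for r in right if r[attr] not in left_keys)
    return out

print(left_join(students, grades, "sid", ["grade"]))
print(full_outer_join(students, grades, "sid", ["name"], ["grade"]))
```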
- atomicity: either everything or nothing is executed,
- consistency: every transaction leaves the data in a consistent state (all constraints hold),
- isolation: many people use the database concurrently, but it feels as if you were the only one,
- durability: updates that are committed are persistent.
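A sketch of atomicity with Python's built-in `sqlite3` module, assuming a hypothetical `accounts` table whose CHECK constraint forbids negative balances. The second update violates the constraint, so the whole transaction is rolled back, including the first update. It does not demonstrate isolation or durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")  # violates CHECK
except sqlite3.IntegrityError:
    pass  # the transfer failed as a whole

rows = conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall()
print(rows)  # [('alice', 100), ('bob', 0)]: the credit to bob was rolled back too
```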
- rows,
- columns,
- nesting.