Chapters 1&2 Knowledge Testers Flashcards

(36 cards)

1
Q

Edgar Codd?

A

Introduced data independence (as part of his relational model of data) -> revolutionized data storage
Developed relational algebra

2
Q

Data Independence

A

Separation of the physical and logical representation of data
Hide the physical storage; present a logical view that is simple and clear for human understanding

3
Q

Data Shapes

A

trees, cubes, tables, vectors (text), graphs

4
Q

Data Model

A

What data looks like and what you can do with it
How much data? What shape? How is the data organized?

5
Q

Table Synonyms

A

Collection, Relation, Relational Table

6
Q

Row Synonyms

A

Business Object, Item, Entity, Document, Record, Tuple

7
Q

Attribute Synonyms

A

Column, Field, Property, Key

8
Q

Primary Key Synonyms

A

Row ID, Name, Key

9
Q

Value Synonyms

A

Scalar, Cell, Characteristic, Fact

10
Q

Relational Tables have

A

set of attributes (schema) and set/bag/list of tuples

11
Q

Atomic Integrity

A

All values are atomic (e.g. strings, numbers), NOT arrays or nested objects
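
A minimal Python sketch of an atomic-integrity check. It assumes rows are represented as dictionaries from attribute names to values. The names `ATOMIC_TYPES` and `is_atomic` are hypothetical. Strings, numbers, booleans and None count as atomic here; lists and nested dictionaries do not.

```python
# Types treated as atomic in this sketch (an assumption, not a standard list).
ATOMIC_TYPES = (str, int, float, bool, type(None))

def is_atomic(rows: list[dict]) -> bool:
    """True if every value in every row is atomic (no arrays, no nested objects)."""
    return all(isinstance(v, ATOMIC_TYPES) for row in rows for v in row.values())

flat = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
nested = [{"id": 1, "phones": ["555-1234", "555-5678"]}]  # an array value
print(is_atomic(flat), is_atomic(nested))  # True False
```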

12
Q

Relational Integrity

A

All its records have identical support: every record has all the attributes (and the same ones)
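
A minimal Python sketch of a relational-integrity check. It assumes each record is a dictionary keyed by attribute name, so the support of a record is just its set of keys. `has_relational_integrity` is a hypothetical helper name.

```python
def has_relational_integrity(rows: list[dict]) -> bool:
    """True if all records have exactly the same set of attributes (identical support)."""
    if not rows:
        return True
    support = set(rows[0])  # attribute names of the first record
    return all(set(row) == support for row in rows)

table = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}]
ragged = [{"id": 1, "name": "Ada"}, {"id": 2}]  # second record lacks "name"
print(has_relational_integrity(table), has_relational_integrity(ragged))  # True False
```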

13
Q

Sketch the history of databases (ancient and modern) to a colleague in a few minutes?

A

DNA - first data storage
Brain - first human-controlled data storage
Humans told stories -> ISSUE: not reliable, stories change over time
Writing - clay tablets - tables -> ISSUE: how to make copies?
Printing press -> easily make copies, mass produce and distribute
Computers

14
Q

Difference between data, information and knowledge

A

Data -> numbers
Information -> meaning from data, processed data
Knowledge -> meaning from information, interpreting information

15
Q

How can structured data be characterized?

A

Order and organization

16
Q

Do you know the standard prefixes of the International System of Units (when the exponent in base 10 is a positive multiple of 3)?

A

Karl Marx gave the proletariat eleven zeppelins, yo
Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, Ronna, Quetta
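
A minimal Python sketch of how the prefixes map to powers of ten. Each step up the list is a factor of 10^3 = 1000. The function name `format_bytes` is hypothetical. The sketch uses the decimal SI prefixes, not the binary ones (KiB, MiB) that some tools report.

```python
# SI prefixes for 10^0, 10^3, 10^6, ..., 10^30.
SI_PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta",
               "exa", "zetta", "yotta", "ronna", "quetta"]

def format_bytes(n: float) -> str:
    """Express a byte count with the largest SI prefix that keeps the value >= 1."""
    i = 0
    while n >= 1000 and i < len(SI_PREFIXES) - 1:
        n /= 1000  # each prefix step is a factor of 10**3
        i += 1
    return f"{n:g} {SI_PREFIXES[i]}bytes"

print(format_bytes(1_500))       # 1.5 kilobytes
print(format_bytes(2 * 10**15))  # 2 petabytes
```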

17
Q

4 technologies commonly referred to as NoSQL

A

key-value, document, column family, graph

18
Q

3 Vs

A

Volume - Amount of data
Velocity - Capacity, latency, throughput
Variety - Shapes

19
Q

Define capacity, throughput and latency with units

A

Capacity: how much data fits in a given volume of storage (bytes)
Latency: how long to wait before the data starts being read (milliseconds)
Throughput: how much data is read per unit of time (bytes per second, e.g. MB/s)
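
The three quantities combine into a back-of-the-envelope estimate: reading a whole drive takes roughly latency + capacity / throughput. Below is a minimal Python sketch. The function name is hypothetical, and the drive figures (1 TB, 100 MB/s, 10 ms) are illustrative round numbers, not measurements.

```python
def full_read_seconds(capacity_bytes: float, throughput_bytes_per_s: float,
                      latency_s: float) -> float:
    """Time to read a whole drive: wait once (latency), then stream at full throughput."""
    return latency_s + capacity_bytes / throughput_bytes_per_s

# Hypothetical drive: 1 TB capacity, 100 MB/s throughput, 10 ms latency.
t = full_read_seconds(10**12, 100 * 10**6, 0.010)
print(f"{t:.2f} s, about {t / 3600:.1f} hours")  # 10000.01 s, about 2.8 hours
```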

20
Q

Can you explain why and how the evolution of capacity, throughput and latency over the last few decades has influenced the design of modern database systems?

A

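Capacity grew much faster than throughput, and latency improved least, so reading a whole drive takes longer and longer. Systems compensate with parallelization (read from many disks/machines at once, i.e. scale out) for throughput, and batch processing (read large sequential chunks) to amortize latency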
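
A rough Python sketch of both remedies. It makes the simplifying assumption that the drives are identical and fully independent. Spreading the data over n drives divides the streaming time by n (parallelization). Reading many small records one by one pays the latency each time, so grouping them into one large sequential read amortizes it (batch processing). The function names and all numbers are hypothetical.

```python
def parallel_read_seconds(capacity_bytes, throughput_bytes_per_s, latency_s, n_drives):
    """Scale out: each of n drives streams 1/n of the data at the same time."""
    return latency_s + capacity_bytes / (n_drives * throughput_bytes_per_s)

def small_reads_seconds(n_reads, bytes_per_read, throughput_bytes_per_s, latency_s):
    """Random access: every individual read pays the latency again."""
    return n_reads * (latency_s + bytes_per_read / throughput_bytes_per_s)

def batched_read_seconds(n_reads, bytes_per_read, throughput_bytes_per_s, latency_s):
    """Batch processing: pay the latency once, then one long sequential read."""
    return latency_s + n_reads * bytes_per_read / throughput_bytes_per_s

# Hypothetical: 1 TB at 100 MB/s with 10 ms latency, spread over 1000 drives.
print(parallel_read_seconds(10**12, 10**8, 0.010, 1000))  # roughly 10 s instead of hours
# One million 1 KB records: one-by-one reads vs a single batch.
print(small_reads_seconds(10**6, 1000, 10**8, 0.010))   # roughly 10010 s
print(batched_read_seconds(10**6, 1000, 10**8, 0.010))  # roughly 10 s
```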

21
Q

Scale out vs Scale up

A

scale out - more machines
scale up - more powerful machines

22
Q

Name a few big players in the industry that accumulate and analyze massive amounts of data?

23
Q

bit vs byte

A

a bit is 0 or 1, a byte is a collection of 8 bits

24
Q

Name a few concrete examples that illustrate the various orders of magnitude of amounts of data?

A

Files: KB, movies: GB

25
Q

Why is it important to consider whether a use case is read-intensive, write-intensive, or in-between?

A

Guess:
Read-intensive -> benefits from denormalized data, which reduces query complexity and latency
Write-intensive -> benefits from normalized data, which avoids redundant writes and maintains data integrity

26
Q

Why are normal forms important?

A

They prevent insertion, update and deletion anomalies (errors)
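
A small sqlite3 sketch (Python standard library) of an update anomaly. The tables and data are invented for illustration. In the denormalized table, the department's building is repeated on every employee row, so an update that reaches only some of the copies leaves the data contradictory. In the normalized version the building is stored once, so a single UPDATE keeps every employee consistent.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Denormalized: each employee row repeats the department's building.
con.execute("CREATE TABLE emp_denorm (emp TEXT, dept TEXT, building TEXT)")
con.executemany("INSERT INTO emp_denorm VALUES (?, ?, ?)",
                [("Ada", "R&D", "B1"), ("Alan", "R&D", "B1")])
# The department moves, but the update reaches only one of the copies.
con.execute("UPDATE emp_denorm SET building = 'B2' WHERE emp = 'Ada'")
denorm_buildings = {b for (b,) in con.execute(
    "SELECT building FROM emp_denorm WHERE dept = 'R&D'")}
print(sorted(denorm_buildings))  # ['B1', 'B2']: R&D is now in two buildings at once

# Normalized: the building is a fact about the department, stored once.
con.execute("CREATE TABLE emp (emp TEXT, dept TEXT)")
con.execute("CREATE TABLE dept (dept TEXT PRIMARY KEY, building TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?)", [("Ada", "R&D"), ("Alan", "R&D")])
con.execute("INSERT INTO dept VALUES ('R&D', 'B1')")
con.execute("UPDATE dept SET building = 'B2' WHERE dept = 'R&D'")
norm_buildings = {b for (b,) in con.execute(
    "SELECT d.building FROM emp e JOIN dept d ON e.dept = d.dept")}
print(sorted(norm_buildings))  # ['B2']
```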

27
Q

First normal form in simple terms?

A

Atomic integrity -> all values are atomic (simple) -> no nesting
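
A minimal Python sketch of one way to bring a nested record into first normal form. The array is unnested into one flat row per element, and the other attributes are repeated on each row. The record and the function `to_first_normal_form` are hypothetical.

```python
# A record with a nested array: it violates first normal form.
customer = {"id": 1, "name": "Ada", "phones": ["555-1234", "555-5678"]}

def to_first_normal_form(record: dict, list_field: str) -> list[dict]:
    """Emit one flat row per element of the nested list, repeating the other attributes."""
    base = {k: v for k, v in record.items() if k != list_field}
    return [{**base, list_field: item} for item in record[list_field]]

rows = to_first_normal_form(customer, "phones")
for row in rows:
    print(row)
# {'id': 1, 'name': 'Ada', 'phones': '555-1234'}
# {'id': 1, 'name': 'Ada', 'phones': '555-5678'}
```

The repeated `id` and `name` values are the kind of redundancy that higher normal forms then remove by splitting the table.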

28
Q

Describe in simple terms how higher normal forms (like Boyce-Codd) are related to joins?

A

Normal forms are like the opposite of joins: they separate a big table into smaller tables, and joining the smaller tables back together recovers the original
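
A small sqlite3 sketch (Python standard library) with invented data. A wide table is split into two projections, and a join puts it back together. This decomposition is lossless because each lecturer has exactly one office (lecturer determines office). Splitting along columns without such a dependency can make the join produce spurious rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# One wide table with redundancy: Smith's office is stored twice.
con.execute("CREATE TABLE wide (course TEXT, lecturer TEXT, office TEXT)")
con.executemany("INSERT INTO wide VALUES (?, ?, ?)", [
    ("Big Data", "Smith", "HG 1"),
    ("Databases", "Smith", "HG 1"),
    ("Networks", "Jones", "CAB 2"),
])

# Normalize: decompose into two smaller tables (projections of the wide one).
con.execute("CREATE TABLE teaches AS SELECT DISTINCT course, lecturer FROM wide")
con.execute("CREATE TABLE office AS SELECT DISTINCT lecturer, office FROM wide")

# The join reverses the decomposition and recovers the original rows.
rejoined = con.execute(
    "SELECT t.course, t.lecturer, o.office "
    "FROM teaches t JOIN office o ON t.lecturer = o.lecturer "
    "ORDER BY t.course").fetchall()
original = con.execute("SELECT * FROM wide ORDER BY course").fetchall()
print(rejoined == original)  # True
```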

29
Q

Why is it common, for large amounts of data, to drop several levels of normal form and denormalize data instead?

A

GUESSING: to avoid expensive joins and keep queries simpler

30
Q

Declarative language

A

User specifies what they want, not how to compute it; it is up to the system to figure out how to execute the query
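
A small Python sketch contrasting the two styles on invented data. The loop spells out how to compute per-customer totals. The SQL query, run through the standard-library `sqlite3` module, only states what result is wanted and leaves the execution plan to the engine.

```python
import sqlite3

orders = [("Ada", 30), ("Alan", 15), ("Ada", 20), ("Grace", 50)]

# Imperative: spell out HOW to compute the totals, step by step.
totals_imperative = {}
for customer, amount in orders:
    totals_imperative[customer] = totals_imperative.get(customer, 0) + amount

# Declarative: state WHAT is wanted; the SQL engine chooses how to compute it.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?)", orders)
totals_declarative = dict(con.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))

print(totals_imperative == totals_declarative)  # True
```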

31
Q

Functional language

A

Nesting - expressions can nest in each other
Queries are like Lego - building blocks that you can combine and rearrange
But changing the order can change the outcome
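
A minimal Python sketch of the Lego idea. It assumes queries are modeled as functions from a list of rows to a list of rows. `where`, `limit` and `pipe` are hypothetical helpers, not a real query library. Nesting the same two blocks in a different order gives a different answer.

```python
from typing import Callable

Row = dict
Query = Callable[[list[Row]], list[Row]]  # a query maps rows to rows

def where(pred: Callable[[Row], bool]) -> Query:
    """Building block: keep the rows that satisfy pred."""
    return lambda rows: [r for r in rows if pred(r)]

def limit(n: int) -> Query:
    """Building block: keep the first n rows."""
    return lambda rows: rows[:n]

def pipe(*steps: Query) -> Query:
    """Nest the blocks: pipe(f, g)(rows) == g(f(rows))."""
    def run(rows: list[Row]) -> list[Row]:
        for step in steps:
            rows = step(rows)
        return rows
    return run

people = [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41},
          {"name": "Grace", "age": 85}]
over_40 = where(lambda r: r["age"] > 40)

# Same blocks, different order, different result.
print(pipe(over_40, limit(1))(people))  # [{'name': 'Alan', 'age': 41}]
print(pipe(limit(1), over_40)(people))  # []
```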

32
Q

Why design query languages that are declarative and functional?

A

GUESS:
Declarative: focus on what rather than how (easy for users, leave the how to the machine)
Functional: modularity, pieces can be moved around and combined

33
Q

Describe the major relational algebra operators: select, project, aggregate, sort, Cartesian product, join?

A

Select -> choose rows
Project -> choose columns
Aggregate -> group by some columns, then combine the values of other columns (e.g. sum)
Sort -> order rows by the values of a specified column
Cartesian product -> pair every row of one table with every row of the other
Join -> Cartesian product filtered on matching values, e.g. A = B
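
A minimal Python sketch of the six operators. It assumes a table is a list of dictionaries (bag semantics, so duplicates are kept) and that the two tables of a product or join have distinct column names. The function names mirror the operator names above, and the data is invented; real engines implement these far more efficiently.

```python
# Rows are dicts; tables are lists of rows (bag semantics: duplicates allowed).

def select(rows, pred):
    """Selection: keep the rows that satisfy a predicate."""
    return [r for r in rows if pred(r)]

def project(rows, cols):
    """Projection: keep only the listed columns."""
    return [{c: r[c] for c in cols} for r in rows]

def aggregate(rows, group_by, col, fn):
    """Group rows by one column, then combine another column with fn (e.g. sum)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_by], []).append(r[col])
    return [{group_by: key, col: fn(values)} for key, values in groups.items()]

def sort(rows, col):
    """Sort: order rows by the values of one column."""
    return sorted(rows, key=lambda r: r[col])

def cartesian_product(left, right):
    """Pair every left row with every right row (column names assumed distinct)."""
    return [{**a, **b} for a in left for b in right]

def join(left, right, lcol, rcol):
    """Join: a Cartesian product filtered on matching values (lcol = rcol)."""
    return select(cartesian_product(left, right), lambda r: r[lcol] == r[rcol])

emp = [{"name": "Ada", "dept_id": 1, "salary": 100},
       {"name": "Alan", "dept_id": 2, "salary": 80},
       {"name": "Grace", "dept_id": 1, "salary": 120}]
dept = [{"id": 1, "dept": "R&D"}, {"id": 2, "dept": "Sales"}]

joined = join(emp, dept, "dept_id", "id")
totals = sort(aggregate(joined, "dept", "salary", sum), "dept")
print(totals)  # [{'dept': 'R&D', 'salary': 220}, {'dept': 'Sales', 'salary': 80}]
print(project(select(emp, lambda r: r["salary"] > 90), ["name"]))
# [{'name': 'Ada'}, {'name': 'Grace'}]
```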

34
Q

The names of the basic components of the tabular shape at an abstract level (table, row, column, primary key) as well as the names of the most common corresponding counterparts in the NoSQL world?

A

Table -> Relation / Collection
Row -> Record
Column -> Attribute / Field
Primary key -> Id

35
Q

ACID

A

Atomicity, Consistency, Isolation, Durability
A - either an update (called a transaction if it consists of several updates) is applied to the database completely, or not at all
C - before and after a transaction, the data is in a consistent state (e.g., some values sum to another value, another value is positive, etc.)
I - the system “feels like” the user is the only one using it, when in fact thousands of people may be using it concurrently
D - any data written to the database is durably stored and will not be lost (e.g., in a power outage or a disk crash)
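
A small sqlite3 sketch (Python standard library) of atomicity, with consistency enforced by a `CHECK` constraint. The bank-transfer scenario, the table and the `transfer` helper are hypothetical. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception escapes, so a failed transfer leaves no half-applied update.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
            "balance INTEGER CHECK (balance >= 0))")  # consistency rule
con.executemany("INSERT INTO account VALUES (?, ?)", [("Ada", 100), ("Alan", 0)])
con.commit()

def transfer(con, src, dst, amount):
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with con:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit leaves a partial change to undo.
            con.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            con.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
    except sqlite3.IntegrityError:
        print("transfer rejected, rolled back")

transfer(con, "Ada", "Alan", 30)   # succeeds
transfer(con, "Ada", "Alan", 500)  # would make Ada's balance negative
balances = dict(con.execute("SELECT name, balance FROM account ORDER BY name"))
print(balances)  # {'Ada': 70, 'Alan': 30}
```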

36
Q

Describe the following SQL terms: SELECT, FROM, WHERE, GROUP BY, HAVING, JOIN, ORDER BY, LIMIT, OFFSET

A

SELECT - projection (choose columns)
FROM - the table(s) to read from
WHERE - selection of certain rows
GROUP BY - group rows to aggregate information
HAVING - selection on groups, after aggregation
JOIN - combine 2 tables
ORDER BY - sorting
LIMIT - maximum number of rows to return, e.g. 10 rows
OFFSET - skip the first n rows
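
A small sqlite3 sketch (Python standard library) that uses every clause from the card in one query. The tables and data are invented. WHERE filters rows before grouping and HAVING filters groups after aggregation. OFFSET 1 skips the top row. The query therefore returns the department with the second-highest payroll among departments that have at least two employees earning 65 or more.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, dept_id INTEGER, salary INTEGER)")
con.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, dept TEXT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("Ada", 1, 100), ("Grace", 1, 120), ("Alan", 2, 80),
    ("Edsger", 2, 90), ("Barbara", 3, 70), ("Linus", 1, 60)])
con.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "R&D"), (2, "Sales"), (3, "HR")])

query = """
SELECT d.dept, COUNT(*) AS n, SUM(e.salary) AS total  -- projection + aggregates
FROM emp e JOIN dept d ON e.dept_id = d.id            -- combine two tables
WHERE e.salary >= 65                                  -- filter rows before grouping
GROUP BY d.dept                                       -- one output row per department
HAVING COUNT(*) >= 2                                  -- filter groups after aggregating
ORDER BY total DESC                                   -- sort
LIMIT 1 OFFSET 1                                      -- skip 1 row, then return at most 1
"""
rows = con.execute(query).fetchall()
print(rows)  # [('Sales', 2, 170)]
```

With this data, WHERE drops Linus, HAVING drops HR (one employee), and R&D (220) ranks first and is skipped by OFFSET. That leaves Sales.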