Chapters 1&2 Knowledge Testers Flashcards
(36 cards)
Edgar Codd?
Introduced Data Independence -> revolutionized data storage
did work on relational algebra
Data Independence
Separation of physical and logical representation of data
Makes the logical view simple and clear for human understanding
Data Shapes
trees, cubes, tables, vectors (text), graphs
Data Model
What data looks like and what you can do with it
How much data? What shape? How data is organized?
Table Synonyms
Collection, Relation, Relational Table
Row Synonyms
Business Object, Item, Entity, Document, Record, Tuple
Attribute Synonyms
Column, Field, Property, Key
Primary Key Synonyms
Row ID, Name, Key
Value Synonyms
Scalar, Cell, Characteristic, Fact
Relational Tables have
set of attributes (schema) and set/bag/list of tuples
Atomic Integrity
All values are atomic (string, number), NOT ARRAY
Relational Integrity
All records have identical support: every record has all attributes
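The two integrity rules above can be sketched as small checks. This is a minimal illustration, assuming records are modeled as plain Python dicts; the names `ATOMIC_TYPES`, `has_atomic_integrity`, and `has_relational_integrity` are hypothetical and not taken from the course material.

```python
# Hypothetical helpers: records are modeled as plain dicts (an assumption).
ATOMIC_TYPES = (str, int, float, bool)


def has_atomic_integrity(records):
    """True if every value is a scalar (no lists, dicts, or other nested values)."""
    return all(
        isinstance(value, ATOMIC_TYPES)
        for record in records
        for value in record.values()
    )


def has_relational_integrity(records):
    """True if all records have identical support (the same set of attributes)."""
    supports = {frozenset(record.keys()) for record in records}
    return len(supports) <= 1


good = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Edgar", "city": "Oxford"},
]
nested = [{"id": 1, "name": "Ada", "phones": ["123", "456"]}]  # array value
ragged = [{"id": 1, "name": "Ada"}, {"id": 2}]                 # missing attribute
```

Here `good` passes both checks, `nested` fails atomic integrity because of the list, and `ragged` fails relational integrity because the second record lacks `name`.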
Sketch the history of databases (ancient and modern) to a colleague in a few minutes?
DNA - first data storage
Brain - First human controlled data storage
Humans told stories->ISSUE: not reliable, story changes over time
Writing - clay tablets - tables -> ISSUE: how to make copies??
Printing Press-> easily make copies and mass produce/distribute
Computers
Difference between data, information and knowledge
Data -> numbers
Information -> Meaning from data, processed data
Knowledge -> meaning from information, interpreting information
How can structured data be characterized?
Order and organization
Do you know the standard prefixes of the International System of Units (when the exponent in base 10 is a positive multiple of 3)?
Karl Marx gave the proletariat eleven zeppelins, yo
Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, Ronna, Quetta
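As a quick self-check, the prefixes map to powers of ten in steps of 3. The sketch below is illustrative only; `SI_PREFIXES` and `prefix_for` are hypothetical names.

```python
# SI prefixes for positive multiples of 3 in the base-10 exponent.
SI_PREFIXES = [
    ("kilo", 3), ("mega", 6), ("giga", 9), ("tera", 12), ("peta", 15),
    ("exa", 18), ("zetta", 21), ("yotta", 24), ("ronna", 27), ("quetta", 30),
]


def prefix_for(n_bytes):
    """Return (scaled value, prefix name) using the largest prefix not exceeding n_bytes."""
    best = (n_bytes, "")
    for name, exponent in SI_PREFIXES:
        if n_bytes >= 10 ** exponent:
            best = (n_bytes / 10 ** exponent, name)
    return best
```

For example, `prefix_for(5 * 10**12)` gives `(5.0, "tera")`, i.e. 5 terabytes, while a value below 1000 is returned unchanged with an empty prefix.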
4 technologies commonly referred to as NoSQL
key-value, document, column family, graph
3 Vs
Volume - Amount of data
Velocity - Capacity, latency, throughput
Variety - Shapes
Define capacity, throughput and latency with units
Capacity: how much data can be stored per unit of physical volume (bytes)
Latency: Wait time before data starts arriving (milliseconds)
Throughput: Data read per unit of time (bytes per second, e.g. MB/s)
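A rough way to see how latency and throughput combine is a simplified read-time model: total time is the wait before the first byte plus the transfer time. The function name and the disk figures below (10 ms, 200 MB/s) are made-up assumptions for illustration, not measurements.

```python
def read_time_seconds(size_bytes, latency_seconds, throughput_bytes_per_second):
    """Simplified model: wait for the first byte, then transfer at constant throughput."""
    return latency_seconds + size_bytes / throughput_bytes_per_second


# Hypothetical disk: 10 ms latency, 200 MB/s throughput.
small = read_time_seconds(1_000, 0.010, 200e6)          # 1 kB read
large = read_time_seconds(1_000_000_000, 0.010, 200e6)  # 1 GB read
```

Under these assumptions the 1 kB read takes about 0.010 s (almost entirely latency), while the 1 GB read takes about 5.01 s (almost entirely throughput).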
Can you explain why and how the evolution of capacity, throughput and latency over the last few decades has influenced the design of modern database systems?
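Capacity has grown far more than throughput, and throughput far more than latency has improved. Parallelization (many machines reading at once) compensates for limited throughput, and batch processing compensates for latency, which is why modern systems scale out.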
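The case for parallelization can be illustrated with back-of-the-envelope arithmetic. This sketch assumes the data is split evenly and ignores coordination overhead; `scan_time_seconds` and the 200 MB/s figure are hypothetical.

```python
def scan_time_seconds(total_bytes, throughput_bytes_per_second, machines=1):
    """Time to scan a dataset split evenly across machines (no coordination overhead)."""
    return total_bytes / (throughput_bytes_per_second * machines)


PETABYTE = 10 ** 15

one = scan_time_seconds(PETABYTE, 200e6)         # a single machine
many = scan_time_seconds(PETABYTE, 200e6, 1000)  # scaled out to 1000 machines
```

With these made-up numbers, one machine needs 5,000,000 s (roughly 58 days) to scan 1 PB, while 1000 machines need 5,000 s (about 83 minutes).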
Scale out vs Scale up
scale out - more machines
scale up - more powerful machines
Name a few big players in the industry that accumulate and analyze massive amounts of data?
Amazon (S3), Microsoft (Azure)
bit vs byte
a bit is 0 or 1, a byte is a collection of 8 bits
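A short sketch of the arithmetic behind this card, with hypothetical names: 8 bits give 256 distinct values, and a quantity in bits converts to bytes by dividing by 8.

```python
BITS_PER_BYTE = 8

# Each bit doubles the number of representable values.
distinct_values_per_byte = 2 ** BITS_PER_BYTE


def bits_to_bytes(bits):
    """Convert a quantity in bits to bytes."""
    return bits / BITS_PER_BYTE


# A 100 megabit per second link, expressed in bytes per second.
link_bytes_per_second = bits_to_bytes(100 * 10**6)
```

So a byte can hold 256 distinct values, and a 100 Mbit/s link moves 12.5 MB/s, which is why lowercase b (bit) and uppercase B (byte) should not be mixed up.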
Name a few concrete examples that illustrate the various orders of magnitude of amounts of data?
Small files: kB; movies: GB