Introduction - Mabel Flashcards
What is data independence?
The separation of logical schema (how data is structured) from physical schema (how data is stored).
Why is data independence important?
It ensures that changes to physical storage (e.g., hardware upgrades) do not affect how users and applications interact with the data at the logical level.
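A minimal sketch of the idea, assuming two made-up physical layouts (a row-oriented list and a column-oriented dict) that hold the same logical table. All names here are hypothetical; the point is only that the logical query function does not change when the storage layout does.

```python
# Two hypothetical physical layouts holding the same logical table.
row_store = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Edgar"}]
column_store = {"id": [1, 2], "name": ["Ada", "Edgar"]}


def rows_from_row_store(store):
    """Expose a row-oriented layout as a stream of logical rows."""
    for row in store:
        yield dict(row)


def rows_from_column_store(store):
    """Expose a column-oriented layout as the same stream of logical rows."""
    columns = list(store)
    for values in zip(*(store[c] for c in columns)):
        yield dict(zip(columns, values))


def names_with_id_above(rows, threshold):
    """Logical-level query: it only sees rows, never the storage layout."""
    return [row["name"] for row in rows if row["id"] > threshold]
```

Calling names_with_id_above on either adapter returns the same answer, which is what data independence promises at a much larger scale.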
Who introduced the concept of data independence?
Edgar Codd in 1970.
What is normalization?
The process of organizing data to reduce redundancy and improve integrity.
What is denormalization?
Combining data into fewer tables to reduce the need for complex joins, enhancing performance and scalability.
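A small sketch of the trade-off, using made-up customers and orders tables represented as plain Python structures. The table contents and the denormalize helper are assumptions for illustration only.

```python
# Normalized: each fact is stored once; orders refer to customers by id.
customers = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Edgar", "city": "San Jose"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "item": "disk"},
    {"order_id": 101, "customer_id": 1, "item": "tape"},
    {"order_id": 102, "customer_id": 2, "item": "cable"},
]


def denormalize(orders, customers):
    """Join once up front so that later reads need no join."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]


# Denormalized: one wide table; Ada's name and city are now repeated.
wide_orders = denormalize(orders, customers)
```

In the wide table a read needs no join, but a change to Ada's city must now be applied to every one of her rows, which is the integrity cost that normalization avoids.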
What are the three Vs of Big Data?
Volume, Variety, and Velocity.
What are examples of NoSQL technologies?
Key-value stores, triple stores, column stores, document stores.
What is the purpose of normalization in traditional databases?
To prioritize data integrity and minimize redundancy.
Why is normalization often relaxed in Big Data systems?
To prioritize performance and scalability over strict data integrity.
How is velocity defined in Big Data?
By capacity (how much data can be stored), throughput (speed of data transfer), and latency (time delay in data availability).
How has velocity changed from 1956 to 2024?
Capacity increased by 23 billion times, throughput by 20,800 times, and latency improved by 400 times.
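A quick arithmetic sketch that takes the three factors above at face value. The derived ratio is an interpretation, not a figure from the source: it estimates how much longer it takes to read a full unit of storage sequentially, given that capacity grew much faster than throughput.

```python
# Growth factors from 1956 to 2024, as stated on the card above.
capacity_growth = 23_000_000_000
throughput_growth = 20_800
latency_improvement = 400

# If capacity grows faster than throughput, reading everything takes longer.
read_time_growth = capacity_growth / throughput_growth

# Throughput also improved far more than latency did.
throughput_vs_latency = throughput_growth / latency_improvement
```

Under these assumptions read_time_growth comes out at roughly 1.1 million, which is one way to see why large datasets are usually read in parallel rather than from a single device.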
What is the timeline of storage systems?
1960s: file systems; 1970s: relational databases; 1980s: object era; 2000s: NoSQL era.
What is a data model?
A framework defining how data is structured, organized, and stored.
What are the fundamental shapes of data?
Tables, Trees, Graphs, Cubes, Text.
What are the units for capacity in data velocity?
Megabytes per cubic centimeter (MB/cm³).
What are the units for throughput in data velocity?
Megabytes per second (MB/s).
What are the units for latency in data velocity?
Seconds.
What is the difference between data, information, and knowledge?
Data: Raw facts; Information: Processed data with context (e.g., averages); Knowledge: Interpreted information combined with experience and insights.
What are the 10 principles of Big Data?
Among them: learn from the past, keep the design simple, modularize the architecture, homogeneity in the large, heterogeneity in the small, separate metadata from data, shard the data, replicate the data.
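The last two principles, sharding and replication, can be sketched as follows. This is an illustrative scheme only (hash-based sharding with replicas placed on consecutive nodes); the function names and the placement rule are assumptions, not how any particular system does it.

```python
import zlib


def shard_for(key, num_shards):
    """Shard: map a key to one of num_shards partitions, deterministically."""
    # crc32 is used because Python's built-in hash() for str varies per process.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def replica_nodes(shard, num_nodes, replication_factor):
    """Replicate: place copies of a shard on consecutive nodes, wrapping around."""
    return [(shard + i) % num_nodes for i in range(replication_factor)]


shard = shard_for("user:42", num_shards=4)
nodes = replica_nodes(shard, num_nodes=4, replication_factor=3)
```

Sharding spreads the data so that no single machine has to hold all of it; replication keeps several copies so that losing one machine does not lose the shard.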
What are some examples of Big Data technologies?
S3, HDFS, XML, HBase, OLAP, Neo4j, Hadoop MapReduce, Spark, MongoDB.
What are some key relational algebra operations?
Select, Project, Union, Difference, Cartesian Product, Rename.
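A minimal sketch of five of these operations, modelling a relation as a list of dicts (one dict per row). The representation and the sample data are assumptions for illustration; real engines work quite differently.

```python
from itertools import product


def _dedupe(rows):
    """Relations are sets: drop duplicate rows, keeping first-seen order."""
    seen, result = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def select(relation, predicate):
    """Keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Keep only the named columns, then remove duplicate rows."""
    return _dedupe([{a: row[a] for a in attributes} for row in relation])


def union(r, s):
    """Rows that appear in r or in s (same attributes assumed)."""
    return _dedupe(r + s)


def difference(r, s):
    """Rows of r that do not appear in s."""
    return [row for row in r if row not in s]


def cartesian_product(r, s):
    """Every row of r paired with every row of s (disjoint attribute names assumed)."""
    return [{**a, **b} for a, b in product(r, s)]


employees = [
    {"name": "Ada", "dept": "R&D"},
    {"name": "Edgar", "dept": "R&D"},
    {"name": "Grace", "dept": "Ops"},
]
depts = project(employees, ["dept"])
ops_staff = select(employees, lambda row: row["dept"] == "Ops")
```

Here project collapses the two R&D rows into one, which shows the set semantics of relations.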
What is the purpose of the rename (ρ) operator in relational algebra?
To rename columns in a relation, similar to the AS clause in SQL.
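A short sketch of the rename operator, again modelling a relation as a list of dicts. The rename helper and the sample row are hypothetical; the SQL counterpart would be along the lines of SELECT name, dept AS department FROM people.

```python
def rename(relation, mapping):
    """Return a copy of the relation with attributes renamed per mapping."""
    return [
        {mapping.get(attr, attr): value for attr, value in row.items()}
        for row in relation
    ]


people = [{"name": "Ada", "dept": "R&D"}]
renamed = rename(people, {"dept": "department"})
```

Only the attribute name changes; the values and the original relation are left untouched.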
How much data was stored digitally worldwide as of 2021?
Close to 100 zettabytes (ZB).
What prefixes are used for measuring data sizes?
Pico (10⁻¹²), Nano (10⁻⁹), Micro (10⁻⁶), Milli (10⁻³), Kilo (10³), Mega (10⁶), Giga (10⁹), Tera (10¹²), Peta (10¹⁵), Exa (10¹⁸), Zetta (10²¹), Yotta (10²⁴).
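The prefixes can be turned into a small conversion helper. This is a sketch; the to_base_units name is made up, and Fraction is used so that the negative exponents stay exact instead of becoming rounded floats.

```python
from fractions import Fraction

# Decimal exponent for each prefix listed on the card.
PREFIX_EXPONENTS = {
    "pico": -12, "nano": -9, "micro": -6, "milli": -3,
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}


def to_base_units(value, prefix):
    """Convert a prefixed quantity to base units, exactly."""
    return value * Fraction(10) ** PREFIX_EXPONENTS[prefix.lower()]


# 100 zettabytes expressed in bytes.
hundred_zb_in_bytes = to_base_units(100, "Zetta")

# How many terabytes fit in one zettabyte.
tb_per_zb = to_base_units(1, "zetta") / to_base_units(1, "tera")
```

With these definitions, 100 ZB is 10²³ bytes and one zettabyte is a billion terabytes.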