DBMS endsem Flashcards
(27 cards)
Relational database
method of organization and management of data through relations, attributes and tuples
Ex.
Components (8) :
Relation, Attribute, Tuple, Constraint, Domain (allowable values of an attribute), PK, FK, Normalization
Advantages:
Scalability
SQL support
User friendly
Data integrity
Relation | Table (6)
1 Theoretical concept with mathematical rules | Practical organization of data
2 No nulls | Nulls allowed
3 Unique entries | Duplicate entries allowed
4 Storage abstract | Storage physical
5 Unordered | Ordered
6 Cannot have duplicate tuples | Can have duplicate rows
Candidate key and superkey
Super key uniquely identifies each tuple but may contain unnecessary attributes (no minimality requirement)
Candidate key is a minimal super key (no attribute can be removed without losing uniqueness)
Every candidate key is super key but not every..
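Sketch (Python) of the super key vs candidate key check. The relation, its attribute names and its rows are made up for illustration. Note that testing one instance only shows uniqueness for this data; a real key must hold for every valid instance.

```python
from itertools import combinations

# Hypothetical sample relation: each row is (student_id, email, name).
ATTRS = ("student_id", "email", "name")
ROWS = [
    (1, "a@x.edu", "Asha"),
    (2, "b@x.edu", "Ravi"),
    (3, "c@x.edu", "Asha"),
]

def is_superkey(attrs):
    """True if no two rows agree on all of the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

def is_candidate_key(attrs):
    """A candidate key is a super key with no proper subset that is a super key."""
    if not is_superkey(attrs):
        return False
    return not any(
        is_superkey(sub)
        for r in range(1, len(attrs))
        for sub in combinations(attrs, r)
    )

print(is_superkey(("student_id", "name")))       # True
print(is_candidate_key(("student_id", "name")))  # False: student_id alone suffices
print(is_candidate_key(("student_id",)))         # True
```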
Codd’s rules
0: Foundation rule
1: Information rule
2: Guaranteed access
3: Systematic treatment of null values
4: Active online catalog
5: Comprehensive data sub-language rule
6: View updating rule
7: High-level insert, update, delete
8: Physical data independence
9: Logical data independence
10: Integrity independence
11: Distribution independence
12: Non subversion rule
Normalization
1NF:
Atomicity
Uniqueness
No repeating groups
2NF:
No partial dependency
Non-key attributes fully dependent on the whole primary key
3NF:
No transitive dependence
BCNF:
Stronger form of 3NF
Ensures that in every non-trivial functional dependency, the determinant (left-hand side) is a super key
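Sketch (Python) of a BCNF check using attribute closure: a non-trivial dependency X -> Y violates BCNF when the closure of X does not cover the whole relation, i.e. X is not a super key. The relation R(student, course, instructor) and its dependencies are assumed for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose determinant is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(all_attrs)
    ]

# Hypothetical relation and dependencies (not from the flashcards).
R = {"student", "course", "instructor"}
FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]

violations = bcnf_violations(R, FDS)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))  # ['instructor'] -> ['course']
```

Here instructor -> course breaks BCNF because instructor alone does not determine student, even though the relation satisfies 3NF (course is part of a candidate key).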
Characteristics of a good relational database
1 Normalization
2 Data integrity and consistency
3 Scalability
4 Referential integrity
5 Proper data types
6 Minimal data redundancy
7 Security and access control
Why is normalization needed?
1 Reduces data redundancy
2 Supports scalability
3 Can improve query performance (smaller tables, fewer anomalies to work around)
4 Improves data integrity
Explain Query optimization wrt SQL database
-selecting efficient query plan
-goals: minimize execution time, reduce resource usage, overall performance improvement
Steps:
1 Parsing the query (syntax, identifier, keywords)
2 Semantic analysis
3 Query rewrite
4 Plan generation
5 Plan selection (estimated cost)
6 Plan execution
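Sketch (Python, sqlite3) of plan selection in practice: SQLite's EXPLAIN QUERY PLAN reports the plan the optimizer picked, and adding an index changes that choice. The table, column and index names are assumptions for illustration, and the exact plan wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = 42"

def plan(sql):
    """Return the human-readable detail text of each step in the chosen plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = plan(QUERY)  # no index yet: a full table scan is the only option
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY)   # the optimizer can now search via the index

print(plan_before)
print(plan_after)
```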
Distributed database
Database spread across multiple physical locations connected by communication links
Components:
1 Data nodes (physical locations)
2 Connection network (connects data nodes)
3 Distributed DBMS (software that manages the distributed database)
Types:
Homogeneous (nodes run the same DBMS software)
Heterogeneous (nodes may run different DBMS software)
Hybrid (combination)
Explain architecture of parallel databases
designed for simultaneous execution of tasks
1 Shared memory arch
2 Shared disk arch
3 Distributed (shared-nothing) arch
4 Hybrid (Mem+Disk+Distributed)
Key elements of parallel database processing
1 Parallelism
2 Multiprocessing
3 Data partitioning
4 Task parallelism
5 Data parallelism
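6 Inter-query parallelism (different queries run at the same time)
7 Intra-query parallelism (one query is split across processors)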
8 Synchronization
9 Communication
10 Load balancing
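Sketch (Python) of data partitioning, data parallelism and the final merge. The row layout and the four-way split are assumptions for illustration. Threads are used only to show the structure; in CPython they do not give true CPU parallelism for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor

def hash_partition(rows, key, n_parts):
    """Data partitioning: route each row to a partition by hashing its key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def partial_sum(part):
    """The same operation applied independently to one partition."""
    return sum(row["amount"] for row in part)

# Hypothetical rows: ids 1..100, amount = id * 10.
rows = [{"id": i, "amount": i * 10} for i in range(1, 101)]
partitions = hash_partition(rows, "id", n_parts=4)

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, partitions))

total = sum(partials)  # merge step: combine the per-partition results
print([len(p) for p in partitions])  # [25, 25, 25, 25], an evenly balanced load
print(total)                         # 50500
```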
2 tier | 3 tier
layers
names
complexity
scalability
security
cost
flexibility
performance
example (simple web app, mobile apps | enterprise software systems)
OLAP
Online analytical processing
technology that allows large amounts of data to be analyzed quickly from multiple perspectives
Slice and dice through data for reporting and decision making
key feature:
Multidimensional view (organized in cubes)
Advanced analysis (comparisons, forecasting)
Interactive (users can slice, dice, roll-up)
Roll up (aggregates to summary)
Drill down (to more detailed data)
Slice (selects a single value for 1 dimension)
Dice (selects multiple values across multiple dimensions)
Pivot (reorients the multidimensional view)
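Sketch (Python) of the five OLAP operations on a tiny in-memory cube. The fact rows, dimension names and figures are invented for illustration; a real OLAP engine would work on precomputed cubes rather than plain lists.

```python
from collections import defaultdict

# Hypothetical facts: dimensions (year, region, product), measure (sales).
FACTS = [
    {"year": 2023, "region": "North", "product": "Pen", "sales": 100},
    {"year": 2023, "region": "South", "product": "Pen", "sales": 150},
    {"year": 2023, "region": "North", "product": "Ink", "sales": 40},
    {"year": 2024, "region": "North", "product": "Pen", "sales": 120},
    {"year": 2024, "region": "South", "product": "Ink", "sales": 60},
]

def roll_up(facts, dims):
    """Aggregate sales to the given dimensions (fewer dims = more summary)."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["sales"]
    return dict(totals)

def slice_(facts, dim, value):
    """Fix a single value on one dimension."""
    return [f for f in facts if f[dim] == value]

def dice(facts, allowed):
    """Keep rows whose value is in the allowed set for every listed dimension."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]

def pivot(facts, row_dim, col_dim):
    """Reorient the view: row_dim values as rows, col_dim values as columns."""
    table = defaultdict(dict)
    for (r, c), total in roll_up(facts, (row_dim, col_dim)).items():
        table[r][c] = total
    return dict(table)

by_year = roll_up(FACTS, ("year",))                  # roll up to yearly totals
by_year_region = roll_up(FACTS, ("year", "region"))  # drill down one level
north = slice_(FACTS, "region", "North")
pens = dice(FACTS, {"year": {2023, 2024}, "product": {"Pen"}})
region_by_year = pivot(FACTS, "region", "year")

print(by_year)         # {(2023,): 290, (2024,): 180}
print(region_by_year)  # {'North': {2023: 140, 2024: 120}, 'South': {2023: 150, 2024: 60}}
```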
Data warehouse architecture and its components
Architecture:
Bottom tier (data sources: flat files and operational databases)
Middle tier (staging and storage, OLAP cubes)
Top tier (presentation)
Components:
Data sources (logs, spreadsheets, APIs, cloud storage)
ELT tools (Extract, Load, Transform)
Staging area
Storage area (organized by subject)
Metadata repository (usage, source, structure)
OLAP engine
Query tools (run query, reports, analyze trends)
What is KDD
overall process of discovering useful, valid, and understandable patterns or knowledge from large volumes of data.
Data selection (sources)
Data preprocessing (Clean data)
Data transformation (Convert to suitable form)
Data mining (Apply algorithms to extract patterns)
Pattern Evaluation (valid, novel, useful, and understandable)
Knowledge presentation
Goal of data mining
main goal - discover hidden patterns and analyze data to gain insights
1 Forecasting and prediction
2 Classification
3 Clustering
4 Anomaly detection
What is big data?
Datasets so large or complex that traditional data processing tools cannot handle them
Volume
Velocity (real time processing and streaming)
Value (insights that bring business value)
Variety
Explain data mining task
Descriptive tasks
Purpose (describe general properties or patterns in the data)
Focus (Understand what’s happening)
Ex - Clustering, Summarization
Predictive tasks
Purpose (predict future or unknown values based on current data)
Focus (Make predictions or classifications)
Ex - Classification, regression
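Sketch (Python) contrasting the two task types on the same data: summarization as the descriptive task, least-squares regression as the predictive task. The spend and sales figures are invented for illustration.

```python
from statistics import mean

# Hypothetical data: monthly ad spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

# Descriptive task (summarization): describe what the data looks like now.
summary = {
    "mean_sales": mean(sales),
    "min_sales": min(sales),
    "max_sales": max(sales),
}

# Predictive task (regression): fit y = slope * x + intercept by least squares.
def fit_line(xs, ys):
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx

slope, intercept = fit_line(spend, sales)
predicted = slope * 6.0 + intercept  # estimate sales for an unseen spend of 6.0

print(summary)
print(slope, intercept, predicted)
```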
NoSQL database
For handling unstructured or semi-structured data with a flexible schema
Compensate for limitations of relational databases
Characteristics:
Schema less (flexible data modelling)
Non relational (key-value, column family, document)
High performance (optimized for fast data retrieval)
Distributed (scale horizontally across multiple servers)
Types:
Key value (Riak or Redis)
Document oriented (self describing docs like JSON or XML - MongoDB)
Column family (Cassandra)
Graph based (complex relationships - Amazon Neptune)
Multimodel (ArangoDB)
Adv: Flexible, High scalability
Disadv: Lack of standardization (different query language and models), Limited transaction support (no roll back)
Internet databases
Web based
Provide remote access
Application like e-commerce, social media
Traits:
Remote access
Scalability
Security
Web-based architecture
Cloud databases
Hosted and managed by cloud services like Amazon Web Services, Google Cloud Platform
Wide range of applications, from small-scale to large-scale enterprise
Traits:
Cloud based architecture
Managed by cloud services
Scalability
Security
Adv: Cost effective, Scalability
Disadv: Dependence on service provider, security risks
SQLite
self contained, File based RDBMS
web, mobile and desktop applications
Traits:
Relational
Self contained (no server)
File-based (easy sharing)
Zero configuration
SQL support
Adv: Easy to use, Flexible
Disadv: Limited scalability, limited concurrency
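Sketch (Python, sqlite3) of the self-contained, file-based and zero-configuration traits: connecting creates the database file, and reopening that file returns the same data. The file name and table are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

# The whole database is one ordinary file; this path is a throwaway example.
db_path = os.path.join(tempfile.mkdtemp(), "notes.db")

conn = sqlite3.connect(db_path)  # creates the file; no server or setup step
conn.execute("CREATE TABLE cards (id INTEGER PRIMARY KEY, topic TEXT NOT NULL)")
conn.executemany("INSERT INTO cards (topic) VALUES (?)", [("OLAP",), ("KDD",)])
conn.commit()
conn.close()

# Reopening the same file (or a copy of it) gives back the same data.
conn = sqlite3.connect(db_path)
topics = [row[0] for row in conn.execute("SELECT topic FROM cards ORDER BY id")]
conn.close()

print(os.path.isfile(db_path))  # True
print(topics)                   # ['OLAP', 'KDD']
```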
XML database
Stores, manages and queries data in XML format
For large unstructured or semi-structured datasets
Traits:
1 XML model (hierarchical elements and att)
2 Schema less
3 Indexing (high query performance)
4 Query support (XPath and XQuery)
adv: flexibility, scalability
disadv: complexity, performance
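Sketch (Python) of the hierarchical XML model and path-based querying. The standard library's ElementTree is not an XML database and supports only a subset of XPath; it stands in here to show the idea. The document content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: elements nest hierarchically and carry attributes.
DOC = """
<library>
  <book id="b1" year="2009"><title>Intro to Databases</title></book>
  <book id="b2" year="2015"><title>Data Warehousing</title></book>
  <journal id="j1" year="2015"><title>DB Letters</title></journal>
</library>
"""

root = ET.fromstring(DOC)

# Path query: titles of all <book> children of the root.
titles = [t.text for t in root.findall("./book/title")]

# Predicate on an attribute: books whose year attribute equals 2015.
books_2015 = [b.get("id") for b in root.findall("./book[@year='2015']")]

print(titles)      # ['Intro to Databases', 'Data Warehousing']
print(books_2015)  # ['b2']
```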
MongoDB
provides flexible and scalable way of organizing large amounts of data
Released in 2009
Traits:
Document oriented
Schema less
Scalable
High performance
adv:
disadv: weaker transaction support than an RDBMS, limited support for complex queries (joins), data redundancy