DBMS endsem Flashcards

(27 cards)

1
Q

Relational database

A

method of organization and management of data through relations, attributes and tuples
Ex.
Components (8) :
Relation, Attribute, Tuple, Constraint, Domain (allowable values in att), PK, FK, Normalization

Advantages:
Scalability
SQL support
User friendly
Data integrity
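
Sketch of the components above, using Python's built-in sqlite3 module. The table names, column names and sample rows are made up for illustration. It shows a primary key, a foreign key, a domain-style CHECK constraint, and how the database rejects rows that break integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Relation "department": dept_id is the primary key (PK).
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    )
""")
# Relation "student": dept_id is a foreign key (FK) into department,
# and the CHECK restricts the domain of the age attribute.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        age     INTEGER CHECK (age BETWEEN 16 AND 100),
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'CSE')")
conn.execute("INSERT INTO student VALUES (101, 'Asha', 20, 1)")


def try_insert(row):
    """Insert one student tuple and report whether a constraint rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


fk_result = try_insert((102, "Ravi", 21, 99))  # department 99 does not exist
pk_result = try_insert((101, "Dup", 22, 1))    # roll_no 101 is already taken
student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Both extra inserts are rejected, so only the first student tuple remains.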

2
Q

Relation | Table (6)

A

1 theoretical concept with mathematical rules | organization of data
2 No null values | Nulls allowed
3 Unique entries | Repeated entries allowed
4 Storage abstract | Physical
5 Unordered | Ordered
6 Cannot have duplicate tuples | Can have duplicate rows
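
Rough analogy for points 5 and 6, not a database implementation: a relation behaves like a Python set of tuples, a table like a list of rows. The sample rows are made up.

```python
rows = [("Asha", "CSE"), ("Ravi", "ECE"), ("Asha", "CSE")]

table = list(rows)    # table: keeps insertion order and the duplicate row
relation = set(rows)  # relation: a set of tuples, so the duplicate collapses

table_size = len(table)        # 3
relation_size = len(relation)  # 2
```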

3
Q

Candidate key and superkey

A

Super key: set of attributes that uniquely identifies a tuple; may contain unnecessary attributes (no minimality requirement)

Candidate key is a minimal super key

Every candidate key is a super key, but not every super key is a candidate key
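
Sketch of the difference on one small sample relation. The helper names and the student rows are hypothetical, and the rows are assumed to be distinct. Strictly, keys are a property of the schema (they must hold for every legal instance), so a check on sample data can only suggest them.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(rows)


def candidate_keys(rows):
    """Minimal super keys: super keys that contain no smaller key."""
    attributes = list(rows[0].keys())
    keys = []
    # Try small attribute sets first, so any key found later that
    # contains an earlier key is known to be non-minimal.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in keys)
            if is_superkey(rows, combo) and not contains_smaller_key:
                keys.append(combo)
    return keys


students = [
    {"roll_no": 1, "email": "a@x.edu", "name": "Asha"},
    {"roll_no": 2, "email": "r@x.edu", "name": "Ravi"},
    {"roll_no": 3, "email": "a2@x.edu", "name": "Asha"},
]

keys = candidate_keys(students)  # [("roll_no",), ("email",)]
```

Here (roll_no, name) is a super key but not a candidate key, because roll_no alone is already unique.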

4
Q

Codd’s rules

A

0: Foundation rule
1: Information rule
2: Guaranteed access
3: Systematic treatment of null values
4: Active online catalog
5: Comprehensive data sub-language rule
6: View updating rule
7: High-level insert, update, delete
8: Physical data independence
9: Logical data independence
10: Integrity independence
11: Distribution independence
12: Non subversion rule

5
Q

Normalization

A

1NF:
Atomicity
Uniqueness
No groups

2NF:
No partial dependence
Completely dependent on primary key

3NF:
No transitive dependence

BCNF:
Stronger form of 3NF
Ensures that in every non-trivial functional dependency X -> Y, the determinant X is a super key
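
Sketch of a BCNF check using attribute closure. The function names and the student/course/instructor schema are illustrative assumptions (a common textbook case that is in 3NF but not in BCNF).

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose left side is not a super key."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) == set(all_attrs)
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


attrs = {"student", "course", "instructor"}
fds = [
    (("student", "course"), ("instructor",)),  # left side is a key
    (("instructor",), ("course",)),            # left side is not a super key
]

violations = bcnf_violations(attrs, fds)
```

Only instructor -> course is reported, since instructor alone does not determine student.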

6
Q

Characteristics of a good relational database

A

1 Normalization
2 Data integrity and consistency
3 Scalability
4 Referential integrity
5 Proper data types
6 Minimal data redundancy
7 Security and access control

7
Q

Why is normalization needed?

A

1 Reduces data redundancy
2 Supports scalability
3 Improves query performance
4 Improves data integrity

8
Q

Explain Query optimization wrt SQL database

A

-selecting efficient query plan
-goals: minimize execution time, reduce resource usage, overall performance improvement

Steps:
1 Parsing the query (syntax, identifier, keywords)
2 Semantic analysis
3 Query rewrite
4 Plan generation
5 Plan selection (estimated cost)
6 Plan execution
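
Sketch of plan selection in practice, using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The orders table and the index name are made up. The exact plan wording differs between SQLite versions, so the code only reads the detail text and does not assume a fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of the chosen query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # no usable index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the optimizer can now pick an index search
```

Printing plan_before and plan_after shows the optimizer switching from a scan to a search on idx_orders_customer for the same SQL text.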

9
Q

Distributed database

A

Database spread across multiple physical locations connected using communication links

Components:
1 Data nodes (physical locations)
2 Connection network (connects data nodes)
3 Distributed DBMS (software that manages the distributed database)

Types:
Homogeneous (nodes run the same DBMS software)
Heterogeneous (nodes may run different DBMS software)
Hybrid (combination)

10
Q

Explain architecture of parallel databases

A

designed for simultaneous (parallel) execution of tasks
1 Shared memory arch
2 Shared disk arch
3 Distributed arch (shared nothing)
4 Hybrid (Mem+Disk+Distributed)

11
Q

Key elements of parallel database processing

A

1 Parallelism
2 Multiprocessing
3 Data partitioning
4 Task parallelism
5 Data parallelism
6 Inter-query parallelism
7 Intra-query parallelism
8 Synchronization
9 Communication
10 Load balancing
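
Sketch of data partitioning plus data parallelism: rows are hash-partitioned, each partition is aggregated by its own worker, and the partial results are combined. The sales rows and function names are made up. Threads are used only to keep the example short; in CPython they do not give real CPU parallelism, so this illustrates the structure rather than the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, n_partitions):
    """Assign each row to a partition by hashing its partitioning key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def partial_sum(partition):
    """Work done independently on one partition."""
    return sum(row["amount"] for row in partition)


sales = [{"region_id": i % 7, "amount": i} for i in range(1, 101)]
partitions = hash_partition(sales, "region_id", n_partitions=4)

# One worker per partition; results are gathered once all workers finish.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, partitions))

total = sum(partials)  # same answer as summing all rows in one pass
```

Partition sizes are uneven here (7 key values over 4 partitions), which is the kind of skew that load balancing has to deal with.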

12
Q

2 tier | 3 tier

A

layers
names
complexity
scalability
security
cost
flexibility
performance
example (simple web app, mobile apps | enterprise software systems)

13
Q

OLAP

A

Online analytical processing
technology that allows users to analyze large amounts of data quickly from multiple perspectives
Slice and dice through data for reporting and decision making

key feature:
Multidimensional view (organized in cubes)
Advanced analysis (comparisons, forecasting)
Interactive (users can slice, dice, roll-up)
Roll up (aggregates to summary)
Drill down (to more detailed data)
Slice (selects a single value for 1 dimension)
Dice (multiple values across multiple dim)
Pivot (reorients multidimensional view)
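
Sketch of roll-up, drill-down, slice and dice on a tiny fact table in plain Python. The data, dimension names and helper functions are made up, and a real OLAP engine would work on precomputed cubes rather than a list.

```python
from collections import defaultdict

facts = [
    {"year": 2023, "region": "North", "product": "Pen", "sales": 10},
    {"year": 2023, "region": "South", "product": "Pen", "sales": 20},
    {"year": 2023, "region": "North", "product": "Book", "sales": 30},
    {"year": 2024, "region": "North", "product": "Pen", "sales": 40},
    {"year": 2024, "region": "South", "product": "Book", "sales": 50},
]


def roll_up(rows, dims):
    """Aggregate sales over every dimension not listed in dims."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["sales"]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix a single value on one dimension."""
    return [r for r in rows if r[dim] == value]


def dice(rows, allowed):
    """Dice: keep rows whose values fall in the allowed sets on several dims."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


by_year = roll_up(facts, ["year"])                   # summary level
by_year_region = roll_up(facts, ["year", "region"])  # drill down one level
north_only = slice_(facts, "region", "North")
pens = dice(facts, {"product": {"Pen"}, "year": {2023, 2024}})
```

Going from by_year to by_year_region is a drill-down; going back is a roll-up.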

14
Q

Data warehouse architecture and its components

A

Architecture:
Bottom tier (data sources: flat files and operational databases)
Middle tier (staging and storage, OLAP cubes)
Top tier (presentation)

Components:
Data source (logs, spreadsheets, API, cloud storage)
ELT tools (Extract, Load, Transform)
Staging area
Storage area (organized by subject)
Metadata repository (usage, source, structure)
OLAP engine
Query tools (run query, reports, analyze trends)

15
Q

What is KDD

A

overall process of discovering useful, valid, and understandable patterns or knowledge from large volumes of data.
Data selection (sources)
Data preprocessing (Clean data)
Data transformation (Convert to suitable form)
Data mining (Apply algorithms to extract patterns)
Pattern Evaluation (valid, novel, useful, and understandable)
Knowledge presentation

16
Q

Goal of data mining

A

main goal - discover hidden patterns and analyze data to gain insights
1 Forecasting and prediction
2 Classification
3 Clustering
4 Anomaly detection

17
Q

What is big data?

A

Datasets so large that traditional data processing cannot handle
Volume
Velocity (real time processing and streaming)
Value (insights that bring business value)
Variety

18
Q

Explain data mining task

A

Descriptive tasks
Purpose (describe general properties or patterns in the data)
Focus (Understand what’s happening)
Ex - Clustering, Summarization

Predictive tasks
Purpose (predict future values based on current)
Focus (Make predictions or classification)
Ex - Classification, regression

19
Q

NoSQL database

A

For handling unstructured or semi-structured data in flexible schema
Compensate for limitations of relational databases

Characteristics:
Schema less (flexible data modelling)
Non relational (key-value, column family, document)
High performance (optimized for fast data retrieval)
Distributed (scale horizontally across multiple servers)

Types:
Key value (Riak or Redis)
Document oriented (self describing docs like JSON or XML - MongoDB)
Column family (Cassandra)
Graph based (complex relations - Amazon Neptune)
Multimodel (ArangoDB)
Adv: Flexible, High scalability
Disadv: Lack of standardization (different query language and models), Limited transaction support (no roll back)

20
Q

Internet databases

A

Web based
Provide remote access
Application like e-commerce, social media

Traits:
Remote access
Scalability
Security
Web-based architecture

21
Q

Cloud databases

A

Hosted and managed by cloud services like Amazon web services, Google cloud platform
Wide range of applications from small scale to large scale enterprise

Traits:
Cloud based architecture
Managed by cloud services
Scalability
Security

Adv: Cost effective, Scalability
Disadv: Dependence on service provider, security risks

22
Q

SQLite

A

Self-contained, file-based RDBMS
Used in web, mobile and desktop applications

Traits:
Relational
Self contained (no server)
File-based (easy sharing)
Zero configuration
SQL support

Adv: Easy to use, Flexible
Disadv: Limited scalability, limited concurrency
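
Sketch of the "self contained, file-based, zero configuration" traits using Python's built-in sqlite3 module. The file name and the note table are made up. No server is started: the whole database is one ordinary file that can be closed, copied and reopened.

```python
import os
import sqlite3
import tempfile

# The whole database lives in this single file.
db_path = os.path.join(tempfile.mkdtemp(), "notes.db")

conn = sqlite3.connect(db_path)  # creates the file if it does not exist
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO note (body) VALUES (?)", ("revise normalization",))
conn.commit()
conn.close()

# Reopen the same file later (or after copying it to another machine).
conn = sqlite3.connect(db_path)
bodies = [row[0] for row in conn.execute("SELECT body FROM note")]
conn.close()

file_exists = os.path.isfile(db_path)
```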

23
Q

XML database

A

Stores, manages and queries data in XML format
For large unstructured, semi-structured datasets
Traits:
1 XML model (hierarchical elements and att)
2 Schema less
3 Indexing (high query performance)
4 Query support (XPath and XQuery)
adv: flexibility, scalability
disadv: complexity, performance

24
Q

MongoDB

A

provides flexible and scalable way of organizing large amounts of data
Released in 2009

Traits:
Document oriented
Schema less
Scalable
High performance
adv:
disadv: less support for transactions, no complex queries, data redundancy

25
Q

JSON

A

Lightweight, text-based data interchange format
Used for data exchange between web servers and web applications

Data types: number, string, boolean, array, object (unordered collection of key-value pairs), null
Syntax: values can be numbers, strings, booleans, arrays, objects or null

adv: text based, lightweight
disadv: schema less, few data types
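
Sketch of the six JSON data types as a round trip through Python's standard json module. The record itself is made up.

```python
import json

record = {
    "name": "Asha",                  # string
    "age": 20,                       # number (integer)
    "cgpa": 8.7,                     # number (fraction)
    "enrolled": True,                # boolean
    "courses": ["DBMS", "OS"],       # array
    "address": {"city": "Pune"},     # object (key-value pairs)
    "middle_name": None,             # null
}

text = json.dumps(record)    # Python value -> JSON text for exchange
restored = json.loads(text)  # JSON text -> Python value on the other side
```

In the JSON text, Python's True and None appear as true and null.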

26
Q

HDFS

A

Hadoop Distributed File System: used to store and manage large amounts of data across a cluster of machines
Integral component of the Apache Hadoop ecosystem, used for big data analytics

1 Distributed arch
2 Block based storage (64 or 128 MB)
3 Replication (across multiple systems in case of h/w failure)
4 High throughput access (fast data processing)

adv: scalability, fault tolerance
disadv: complexity, less support for small files
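
Worked arithmetic for block-based storage and replication. The function name is made up; the 128 MB block size comes from the card, and the replication factor of 3 is assumed here because it is the usual HDFS default.

```python
import math


def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate how a file of file_size_mb (> 0) is laid out in blocks."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only holds whatever data is left over.
    last_block_mb = file_size_mb - (blocks - 1) * block_size_mb
    return {
        "blocks": blocks,
        "last_block_mb": last_block_mb,
        "block_replicas": blocks * replication,
        "raw_storage_mb": file_size_mb * replication,
    }


footprint = hdfs_footprint(1000)
# 1000 MB -> 8 blocks (7 full, last one 104 MB), 24 replicas, 3000 MB raw
```

A 1 MB file still needs one block entry and three replicas, which is one reason many small files are costly to track.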

27
Q

MapReduce and Hadoop

A

MapReduce: programming model used to process large datasets across a cluster of machines
Hadoop: open source software framework used to store and process large datasets

Hadoop:
1 HDFS
2 MapReduce
3 YARN (resource management layer)
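
Sketch of the MapReduce model as a single-process word count. This is plain Python, not the Hadoop API; the function names and input lines are made up. On a real cluster the map and reduce calls run on different machines and the framework performs the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big cluster", "data node"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts == {"big": 2, "data": 2, "cluster": 1, "node": 1}
```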