de Flashcards
(57 cards)
What does ACID stand for
ATOMICITY
CONSISTENCY
ISOLATION
DURABILITY
A in ACID
ATOMICITY
All or nothing rule
Entire transaction takes place at once or doesn’t happen at all
Abort & commit
C in ACID
CONSISTENCY
Database must be consistent before and after the transaction
I in ACID
ISOLATION
Multiple transactions occur independently without interference
Changes only visible after they have been made to main memory
Isolation prevents dirty reads, non-repeatable reads, phantom reads
D in ACID
DURABILITY
Changes of a successful transaction occurs even if the system failure occurs
Once transaction is committed, changes are permanent
Advantages of ACID in DBMS
Data consistency - data remains consistent and accurate after any transaction execution
Data integrity - maintain the integrity of the data by ensuring that any changes to the database are permanent and cannot be lost
Concurrency control - help to manage multiple transactions occurring concurrently by preventing interference between them.
Recovery - in case of any failure or crash, the system can recover the data up to the point of failure or crash.
Disadvantages of ACID in DBMS
Performance - can cause a performance overhead in the system, as they require additional processing to ensure data consistency and integrity.
Scalability - may cause scalability issues in large distributed systems where multiple transactions occur concurrently.
Complexity - can increase the complexity of the system and require significant expertise and resources.
What is the importance of ACID properties?
ACID properties ensure data consistency, integrity and reliability in DBMS. They manage multiple transactions that are occurring concurrently and help to recover data if any system failures occur. Thus, they play an important role.
How can we achieve Atomicity in transactions?
To achieve atomicity, a transaction should be treated as a single unit. If at any point of transaction, execution fails then the whole transaction should be rolled back by undoing the changes made.
What is a data warehouse
Centralized system designed to store process and analyze large volumes of data from multiple sources
Optimized for queries and reporting rather than transactions.
What are some drawbacks of ACID Properties in DBMS?
ACID properties cause performance overhead because of additional processing required for maintaining data integrity and consistency. This also increases the complexity of the system, so we require more significant expertise and resources.
OLTP
Fast reads / writes
Day to day transactions
Real-time, current data
OLAP
Complex queries
Analytical queries and reporting
Historical, aggregated data
Read optimized for querying
Data warehousing architectures
Trad data warehouse
* Centralized storage using ETL before loading
Cloud data warehouse
* Scalable, pay as you go
Data lakehouse
* Combines DWH + data lake (lakehouse on databricks)
Database
OLTP online transaction processing
-structured system for storing and managing real time transactions handling read/writes for day to day
-sql server
-maintaining inventory records in a store
Data warehouse
OLAP online analytical processing
-central repo optimized for storing historical data and performing complex analytical queries. Holds structured data
-red shift, snowflake
-trends on last 5 years, hr report across departments
Data lake
-massive storage holding raw unstructed semi struct and structured data.
-used for data processing, ml
-s3, azure data lake, Hadoop
-storing raw sensor data from IoT devices
-unfiltered ocean of information
Data mart
Subset of warehouse focused on specific department or business unit (sales, hr, marketing)
-provides faster access to relevant insights
-structured data
-hr analyzing employee retention
- Star schema
o One fact connected to multiple dimension tables
o Optimized for fast querying
o Sales fact table
- Snowflake schema
o Normalizes dimensions into sub-dimensions
o Reduces data redundancy but increases query complexity
- Galaxy schema – fact constellation
o Multiple fact tables share common dimension tables
o Complex business scenarios (sales, returns data)
Slowly changing dimensions (SCD) type 1
o Overwrites old data
Slowly changing dimensions (SCD) type 2
o New row is added for every address change with start and end date
Slowly changing dimensions (SCD) type 3
o Limited history (prev_addr is added to track only the last change)