dbs Flashcards

Question

4V characteristics of Big Data

Answer 1

Volume: The large scale of data, typically measured in terabytes, petabytes, or more, which exceeds the capacity of traditional data processing systems.
Velocity: The high speed at which data is generated, collected, and processed, requiring real-time or near-real-time analytics to extract timely insights.
Variety: The diverse types and formats of data, including structured, semi-structured, and unstructured data from various sources, which complicates data integration and analysis.
Veracity: The quality and reliability of data, that is, its accuracy, consistency, and trustworthiness in the context of Big Data analytics.

Answer 2

Read and Divide: The input dataset is divided into smaller runs, each of which fits into the available system memory.
Sort Each Run: Each run is sorted individually in memory and written back to disk.
Merge Runs with Priority Queue: During the merge phase, a priority queue selects the smallest element among the current front elements of all runs. The selected element is written to the output file, and the run it came from advances to its next element.
Memory Management: System memory holds only portions (pages) of the runs being merged. An input page is refilled from disk when it empties, and the output page is written to disk when it fills.
Final Merge: If there are more runs than can be merged in one pass, merging is repeated on the intermediate results until a single sorted output file is obtained.
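The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production implementation: the function name external_sort and the run_size parameter are hypothetical, the input is assumed to be integers, runs are stored as temporary text files, and the merged result is returned as a list for brevity (a real external sort would stream it to an output file). The standard-library heapq.merge performs the priority-queue merge over the sorted runs.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Sort integers using sorted runs of at most run_size items (illustrative)."""
    run_paths = []

    def flush(run):
        # Sort one run in memory and write it to its own temporary file.
        run.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)

    # Read and divide: cut the input into runs that "fit in memory".
    run = []
    for v in values:
        run.append(v)
        if len(run) == run_size:
            flush(run)
            run = []
    if run:
        flush(run)

    # Merge: heapq.merge keeps a heap holding the front element of each run.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


result = external_sort([5, 3, 9, 1, 7, 2, 8], run_size=3)
```

With run_size=3 the input is split into three runs, and only one line per run needs to be in memory during the merge.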

Answer 3

Basically Available: The system favors availability in the sense of the CAP theorem: it keeps responding to requests even when parts of it fail, but the data it returns is not necessarily up to date for all users.
Soft state: The state of the system may change over time even without new input, for example while replicas converge under eventual consistency.
Eventually consistent: The system will become consistent over time, provided it receives no new input during that time.
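A toy sketch of eventual consistency, assuming two in-memory replicas that resolve conflicts by last-write-wins on a timestamp. The Replica class and sync function are hypothetical names for illustration and do not correspond to any particular database.

```python
class Replica:
    """One copy of the data; each key maps to (timestamp, value)."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, ts):
        # Last-write-wins: keep the entry with the newest timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None


def sync(a, b):
    # Exchange entries in both directions so the replicas converge.
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("x", "old", ts=1)
sync(r1, r2)

r2.write("x", "new", ts=2)   # update reaches only one replica
stale = r1.read("x")         # r1 is available but still serves the old value
sync(r1, r2)                 # no new input arrives; replicas exchange state
fresh = r1.read("x")         # both replicas now agree
```

Between the write and the second sync, r1 answers reads with stale data, which is the "basically available" trade-off; once synchronization runs without further writes, the replicas agree.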

Answer 4

Atomicity: A transaction is treated as a single, indivisible operation that either succeeds completely or fails completely; for instance, in a banking system, a fund transfer should move the complete amount or nothing at all.
Consistency: A transaction brings the database from one valid state to another, maintaining database invariants; for example, in a school database, the total number of enrolled students should decrease by one when a student drops out.
Isolation: Concurrent execution of transactions leaves the database in the same state as if the transactions had been executed sequentially; for instance, simultaneous withdrawals from a bank account should not result in an incorrect balance.
Durability: Once a transaction has been committed, it remains committed even in the case of a system failure; for example, once a purchase is finalized in an online shopping system, the purchase record persists even if the system crashes afterwards.
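The banking example for atomicity can be sketched with Python's built-in sqlite3 module. The accounts table, the account names, and the transfer function are assumptions made for this illustration. The credit is applied before the debit on purpose, so that a failed debit shows the already-applied credit being rolled back.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, bob's balance does not include the 500 that was briefly credited, because the whole transaction was rolled back.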

Answer 5

BASE: NoSQL databases generally favor the BASE model (basically available, soft state, eventually consistent) over strict ACID guarantees.
Schema-less Data: NoSQL databases do not require a predefined schema, allowing for the storage of diverse and complex data structures; for instance, MongoDB stores documents in flexible, JSON-like structures.
Scalability: NoSQL databases are designed to scale out by distributing data across multiple servers; for example, Cassandra can distribute data across many nodes in a cluster to handle large amounts of data.
Speed: NoSQL databases are often optimized for a specific data model (key-value, document, column, graph) for quick data access and manipulation; Redis, a key-value store, provides high-speed data operations because it keeps data in memory.
Flexibility: NoSQL databases can easily accommodate changes in data and queries, often without downtime; CouchDB, a document store, allows changes to data fields without affecting existing data.

Answer 6

First normal form (1NF): A table is in 1NF if every cell contains a single, atomic value and all entries in a column are of the same kind.
Second normal form (2NF): A table is in 2NF if it is in 1NF and all non-key attributes are fully dependent on the primary key.
Third normal form (3NF): A table is in 3NF if it is in 2NF and there is no transitive dependency between non-key attributes.

Answer 7

A pair of database operations in a history is conflicting only if all of the following conditions are met:
Both operations are performed by different transactions. Operations performed by the same transaction are not considered conflicting, because the operations within a single transaction are totally ordered.
At least one of the operations is a write operation. If both operations are reads (read-read), they do not conflict, because neither changes the state of the database.
Both operations access the same data item. Operations that access different data items cannot conflict, because each operates independently of the other.
So a pair of operations is not in conflict if the operations are executed by the same transaction, or are both reads, or operate on different data items.
Conflicting operations come in three variants:
Write-Read (WR) conflict: A write by one transaction is followed by a read by another transaction on the same data item, so the read sees the value written by the first transaction.
Read-Write (RW) conflict: A read by one transaction is followed by a write by another transaction on the same data item, so the write changes the value that the first transaction read.
Write-Write (WW) conflict: A write by one transaction is followed by a write by another transaction on the same data item, so the second write may overwrite the value written by the first.
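The three conditions can be expressed as a small check. This is a sketch under assumptions: the Op tuple and the conflict_type function are hypothetical names, and an operation is modeled only by its transaction, its kind ("r" or "w"), and the data item it touches.

```python
from typing import NamedTuple, Optional


class Op(NamedTuple):
    txn: str   # transaction identifier, e.g. "T1"
    kind: str  # "r" for read, "w" for write
    item: str  # data item accessed, e.g. "x"


def conflict_type(first: Op, second: Op) -> Optional[str]:
    """Return "WR", "RW", or "WW" if first followed by second conflicts, else None."""
    if first.txn == second.txn:
        return None  # same transaction: operations are totally ordered
    if first.item != second.item:
        return None  # different data items: independent operations
    if first.kind == "r" and second.kind == "r":
        return None  # read-read: neither changes the database state
    return (first.kind + second.kind).upper()


wr = conflict_type(Op("T1", "w", "x"), Op("T2", "r", "x"))
rw = conflict_type(Op("T1", "r", "x"), Op("T2", "w", "x"))
ww = conflict_type(Op("T1", "w", "x"), Op("T2", "w", "x"))
rr = conflict_type(Op("T1", "r", "x"), Op("T2", "r", "x"))
```

The order of the arguments reflects the order of the operations in the history, which is what distinguishes a WR conflict from an RW conflict.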

(31 cards)