Relational Algebra, Optimization, and more Flashcards
(24 cards)
Selection condition
sigma, allows us to specify the subset of tuples we’re interested in (similar to SQL WHERE)
Project condition
pi, select a subset of attributes to be returned (similar to SQL SELECT)
Relation expressions
allow us to combine selection and projection together
Union
U, finds the set of tuples that belong to one set, the other, or both
Set difference
-, remove a set of tubles from a second set
Union-compatible relations
two relations must have the same number of attributes, same names of the attributes, and same data type for each matching attribute
Cartesian product
x, produces the set of all tuples by concatenating each tuple (same as cross product), computationally the most expensive operator
Joins
bowtie, join two tables based on some conditions
View
psuedo-table used to hold a result set
Stored Procedures
query you can store in database to be executed, similar to a function
SQL Injection
common attack by entering malicious data via user input
Query optimizer
chooses the most efficient way to execute a query
Logical optimization
rearranges the query via relational algebra (performed first), want to reduce the number of tuples as quickly as possible
Physical optimization
chooses indexes/types of joins (performed second)
Database index
data structure stored separately in memory for improving query operations
Covering index
index that contains all search terms as well as the output attributes
Non-covering index
index that points you to information, but does not store it
Distributed database
databases spread across multiple machines, in either the same or separate locations, approached via replication or duplication
Replication
looks for changes in any database and pushed changes as necessary (NoSQL approach)
Duplication
assigns one database as the master and duplicates that database on the the others any time changes are made to it (RDBMS approach)
CAP
Consistency, Availability, Partition tolerance
CAP - Consistency
all clients see the same data at the same time
CAP - Availability
any client making a request for data gets a response, even if one or more nodes are down
CAP - Partition tolerance
the cluster must continue to work despite any number of communication breakdowns (partitions) between nodes in the system