Database Concept Flashcards
(35 cards)
Design a relational database
two sub-system: storage(file system) and management software
management software:
storage manager (block read in/write)
cache
SQL parser
log system (for undo/redo)
authorization management (roles)
Disaster recover system
indexing (fast read)
lock management (for concurrence)
why use index
Indexes are created to speed up the data retrieval and the query processing operations from a database table or view, by providing swift access to the database table rows, without the need to scan all the table’s data, in order to retrieve the requested data.
what column(s) can be index
The column or columns have a high degree of uniqueness
You frequently need to look for a certain value or range of values for the column.
index’s data structure
B+ tree
other candidates: binary search tree, B tree, Hash table
dense index and sparse index
For MySQL’s two storage engines, myisam use sparse index, innodb has one and only one dense index: if primary key is defined, it is the dense index; if not, the first non-empty index will be chosen to be the dense index; if neither, innodb will generate a hidden primary key for us.
In the dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store index records itself. Index records contain search key value and a pointer to the actual record on the disk.
In the sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts sequential search until the desired data is found.

Why not use a binary search tree to store index
Since each node will only contain two children, the average lookup time will be O(logN). That means O(logN) times of I/O, which is too much – even slower than a full table scan.
B+ tree as index’s data structure
- non-leaf node’s number of child = number of keywords
- non-leaf node’s child pointer P[i] -> sub-tree with keywords[K[i], K[i+1])
- the non-leaf node only store index, data records are all in leaf nodes
- all leaf nodes have a pointer to next leaf nodes

Why use B+ tree as the DS for index?
- low I/O cost, internal nodes contains only indexes, a bulk read in contains more indexes which in turn reduce the total times of read
- search time cost is stable
- better for full-table scan (scan all leafs)
Why not use hash index for database?
- time complexity is better
- However, it only supports ‘=’, not range query
- does not keep the order, thus need constant sorting operation
- query by composition key is very complex for hash-based method
- hash collision will cause unnecessary table scan
- large hash collision will reduce the time efficiency
Bitmap for index?
yes, oracle is using it.
not good for high concurrency, thus, it is an option for OLAP, but not for OLTP
How to optimize slow query
identify slow query through slow query log
use EXPLAIN ANALYZE e.g. tool to analyze the query
revise query to let it exploit more with indexes
Is using primary key always faster than using other indexes?
No.
For example, when count(id), it probably will not use id(primary key) to aggregate, it will choose a sparse index, because a sparse index store only key and value in its nodes, it could read in more records in one batch each time/ eliminate more records this way, which in turn will speed up the query.
Script command related to tuning/optimalize query
- show variables like ‘%query%’ // find slow_query related var
- show status like ‘%slow_queries%’ // show the number of slow queries executed within this session
- explain ########query content########## //explain analyze // type=all or index: need to improve | extra=Using filesort or temporary : can’t use index
If the above 3 showed slow query, change the script to use other field to query or alter the table to add the current field as index
sometimes Mysql will choose a non-primary key to query to achieve better performance
If you want to force the query to use other index:
select count(id) from table1 force index(primary)
Composite Index
Like a single index, a composite index is also a data structure of records sorted on something. But unlike a single index, that something is not a field, but a concatenation of multiple fields.
Key related SQL command
PRIMARY KEY(‘id’),
UNIQUE KEY ‘account’ (‘account’),
KEY ‘index_area_title’ (‘area’,’title’),
KEY ‘idx_name’(‘name’)
Left-prefix rule of composite index
The left-prefix rule says a query can search an index efficiently if it provides search terms (such as values in a WHERE clause) that match the leading columns of the index, in left-to-right order.
mySQL will continue its matching from left to right until range operator(>, <, between, like)
key(a,b,c,d): where a=1.b=2.c>5,d=4 will not use index
key(a,b,d,c) where a=1.b=2.c>5,d=4 will use index
d=4 will be move upper stream by MySQL to fit the index
‘in’ and ‘=’ could be not in order
example:
table: KEY ‘mycomkey’ (A,B)
query 1: select * from table1 where A=1,b=2
query 2: select * from table where A=1
query 3: select * from table where B=2
query1 and 2 will use index mycomkey, query 3 will do whole table scan. because the left prefix rule, B=2 without A will not provoke the index
A and B’s order in the WHERE clause is not a matter, MySQL will change the order for you to fit the index
is index the more the better
no.
- The small table should not use an index, because the overhead of establishing and store the index system will exceed the efficiency gain compared to a whole table scan
- Update table will update the index, more index meaning more cost in maintaining the index system
- More index means more spatial cost
What is the difference between MyISAM’s and InnoDB’s internal Locking
MyISAM’s default is table-level locking, not support row-level locking;
InnoDB’s default is row-level locking, and support table-level locking
read lock
A READ lock has the following features:
- A READ lock for a table can be acquired by multiple sessions at the same time. In addition, other sessions can read data from the table without acquiring the lock.
- The session that holds the READ lock can only read data from the table, but cannot write. And other sessions cannot write data to the table until the READ lock is released. The write operations from another session will be put into the waiting states until the READ lock is released.
- If the session is terminated, either normally or abnormally, MySQL will release all the locks implicitly. This feature is also relevant for the WRITE lock.
read lock is generally shared lock but you can force mutex lock like:
select * from table1 where id between 1 and 300 _for update_
write lock
A WRITE lock has the following features:
- The only session that holds the lock of a table can read and write data from the table.
- Other sessions cannot read data from and write data to the table until the WRITE lock is released.
script to add a shared lock
lock in shared mode
MyISAM is suitable for what case
- frequent full table count(*)
- read-intensive, write-low-intensive
- no transaction
InnoDB is suitable for what case
- read and write-intensive
- high requirement for reliability, support transaction
different type of lock
level: row-level, table-level, page-level
shared lock and mutex lock
auto-lock and explicit lock
DML lock DDL lock
Optimistic lock pessimistic lock