Beyond Relational Databases Flashcards
What is a database?
A large amount of information stored in a computer system in such a way that it can be easily looked at or changed.
What are the two types of ‘traditional’ database?
OLTP: on-line transaction processing
OLAP: on-line analytical processing
What is the purpose of a database?
To organise and store information in a structured way, making it easily accessible for retrieval and analysis.
What are OLTP databases used for?
Line of business applications
What are OLAP databases used for?
Data warehousing
Are OLTP and OLAP standard relational databases?
OLTP: yes
OLAP: sometimes
Are OLTP and OLAP databases normalised?
OLTP: normalised
OLAP: denormalised
Are OLTP and OLAP databases ACID compliant?
OLTP: yes
OLAP: not necessarily
What does ACID mean in relation to databases?
Atomicity: each transaction is treated as a single, indivisible unit.
Consistency: data must adhere to all defined constraints and rules, ensuring data accuracy and integrity.
Isolation: concurrent transactions do not interfere with each other.
Durability: once a transaction is committed, the changes are permanently stored and survive system failures or crashes.
These properties ensure the reliable processing of transactions.
How do OLTP databases operate?
They have minimised redundancy as each piece of information is only stored once (normalised), making it quick to work on and update multiple pieces of information simultaneously. This is the general structure of an operating system.
How do OLAP databases operate?
They are optimised for reporting and are oriented around FACTS of interest. These FACTS can be updated quickly, but the DIMENSIONS of the database change slowly.
What is a FACT in regards to databases?
Something we can measure (e.g. 1 cGy was delivered to a target) or something that happened (e.g. a patient died).
What is a DIMENSION in regards to databases?
Information about a FACT (e.g. patient information, planning protocol, date of treatment).
What are the FACTS in radiotherapy databases?
- CT series dose report generated
- CT slice acquired
- Plan prepared
- Treatment history recorded
- Treatment sessions delivered
- Images acquired
What are the DIMENSIONS in radiotherapy databases?
- Patient
- Date
- Time of day
- Equipment
- Operator
- Modality
- Scan protocol
- Slice details
- Site
- Planning technique
- Mould room technique
- Treatment technique
- Course
- Patient ID
Describe the general structure of a database
ETL = extract, transform, load
Decribe the strucure of an imaging dose database
What is ETL?
Extract, transform, load. It is the process used to collect, clean, and prepare data from multiple sources for analysis and storage in a data warehouse or other target system.
How often is the ETL process run?
Nightly (typically)
Describe each stage of the ETL process
Extract: data dump is obtained from the source system
Transform: value added, initial data is modelled, reconciliation, (Pseudo/An)onymisation
Load: import data into warehouse database
What are the two data types?
Structured data (anything standardised)
Unstructured data
Give 3 examples of structured data
- Numbers
- Results of calculations
- Codified text
Give 4 examples of unstructured data
- Free text
- Hand written/dictated documents
- Images
- Tumour/OAR outlines
Give 5 examples of data analysis tools
- Straight SQL queries (Excel, Access, Matlab)
- Business reporting tools
- Data exploration tools
- ‘Big Data’ tools
- Data mining tools