Beyond Relational Databases Flashcards
(22 cards)
What is an ETL process?
Extract -> Transform -> Load
Obtain data dump from the source system
Add value, initial data modelling, reconciliation, (pseud/an)onymisation
Import data into warehouse database
What is a data warehouse?
A data management system consisting of a series of hard drives that store, and enable analysis and reporting of, large amounts of data
OLAP database
What is an example of a data warehouse?
Accumulation of data from multiple departments or hospitals to be able to analyse and find trends, e.g. PACS, dose management software
What are the two types of databases?
- OLTP
- OLAP
What is OLTP?
On-Line Transaction Processing
Normalised
- Database used for ‘line of business’ applications
- Standard relational database
- Normalised
- ACID compliant
- Minimise redundancy
- Each piece of information only stored once
- Quick to work on and update multiple pieces of information simultaneoulsy
- General structure of operating systems
What is OLAP?
On-Line Analytical Processing
Denormalised
- Database used for ‘data warehousing’
- Not necessarily relational database
- De-normalised
- Not necessarily ACID compliant
- Optimised for reporting
- Orientated around FACTS of interest
- Designed so FACTS can be updated quickly
- DIMENSIONS change only slowly
What is ACID?
Atomicity
Consistency
Isolation
Durability
Updates to databases based around concept of transactions
What are FACTS?
- Something we can measure e.g. 1cGy delivered to target
- Something that happened e.g. patient died
What are DIMENSIONS?
- Information about the fact
e.g. patient information: ID, DOB, Sex
planning protocol, doctor, date of treatment
What is granularity?
When something should be recorded e.g. in radiotherapy treatment, each episode, each day, each control point, each plan, each beam
What are the data types?
Structured
Unstructured
What is a structured data type?
- Numbers, results of calculations
- Codified text
- Anything standardised
What is an unstructured data type?
- Free text
- Hand written/dictated documents
- Images
- Tumour/OAR outlines
What is a database?
A large amount of information stored in a computer system in such a way that it can be easily looked at or changed
What is a flat file database?
A straightforward list (single table) containing all the data in one place
What are the disadvantages of a flat file database?
Increased chance of duplication
Potential for non unique records
Harder to update
Inefficient use of memory
Poor for complex enquiries
Poor at limiting access
What is a relational database?
Data is organised into multiple tables with the records within the tables linked (related) to records in other tables by way of common fields
What is the advantage of a relational database?
Data only stored once
Easier modification
Efficient storage
Capable of complex enquiries
More secure - can limit access to certain parts of data
What is a database engine? Give an example of a database engine software
Underlying piece of software that performs all the reading/creating/updating/deleting of the database
Microsoft access
What is a primary key?
One field needs to be selected to identify the unique nature of each record in each table - use of an ID number for each entry
Unique identifier acting as a reference when relating any of the record’s fields to another table/query
What is a foreign key?
A primary key from one table which is also in a related table
What is a common field in a database?
A field that appears in more than one table in a database