Databases Flashcards
(66 cards)
What is Big Data?
Data sets so large or complex that traditional systems cannot process them effectively. It includes unstructured and semi-structured formats like images, videos, and logs.
What are the 5 V’s of Big Data?
Volume - enormous size
Velocity - speed at which data is generated and processed (e.g. stock market trades)
Variety - different formats of data: structured, semi-structured, and unstructured.
Veracity - the quality and trustworthiness of the data; managing bias, noise, and misinformation in data.
Value - the insight and benefits derived from the data.
What is Data Mining?
The process of discovering patterns in large datasets, together with the principles and steps that process involves.
What is a Data Warehouse?
A centralised repository designed for structured data, optimised for analytics and reporting. It uses schema-on-write, supports historical analysis and BI tools, and is optimised for SQL-based queries.
What is a Data Lake?
A repository that stores raw data in its native format, supporting structured, semi-structured, and unstructured data. It uses schema-on-read, is scalable and cost-effective for large datasets, and supports machine learning and big data analytics.
What is a Data Lakehouse?
Combines the flexibility of a data lake with the analytical capabilities of a data warehouse. It supports both schema-on-read and schema-on-write, structured and unstructured data, and is ideal for machine learning and advanced analytics.
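The difference between schema-on-write and schema-on-read can be sketched as follows. This is a minimal illustration, not how any particular warehouse or lake product works; the SCHEMA, the function names, and the sample records are all hypothetical.

```python
import json

# Hypothetical schema used only for this illustration.
SCHEMA = {"user_id": int, "amount": float}

def write_validated(table, record):
    """Schema-on-write (warehouse style): check the record before storing it."""
    for field, expected in SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    table.append(record)

def read_with_schema(raw_lines):
    """Schema-on-read (lake style): keep raw text, apply structure when querying."""
    for line in raw_lines:
        doc = json.loads(line)
        try:
            yield {"user_id": int(doc["user_id"]), "amount": float(doc["amount"])}
        except (KeyError, ValueError, TypeError):
            continue  # this raw record does not fit the schema for this query

warehouse = []
write_validated(warehouse, {"user_id": 1, "amount": 9.5})

# The lake accepts anything; structure is imposed only at read time.
lake = [
    '{"user_id": "2", "amount": "3.25", "note": "free text"}',
    '{"clickstream": [1, 2, 3]}',
]
rows = list(read_with_schema(lake))
```

Here the warehouse rejects a badly typed record at write time, while the lake stores both lines and only the first survives the read-time schema.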
What are some key steps in data cleaning?
Remove duplicates, handle missing data (fill, drop, or interpolate), fix structural errors (e.g. inconsistent date formats), and standardise data (e.g. convert measurements to the same unit).
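The four steps above can be sketched in plain Python. The records, field names, and the choice to fill a missing height with the mean are assumptions made for illustration only; real pipelines often use a library such as pandas.

```python
from datetime import datetime

# Hypothetical raw records: one duplicate, one missing height,
# mixed date formats, and mixed units.
raw = [
    {"name": "Ada", "joined": "2024-01-05", "height": "170cm"},
    {"name": "Ada", "joined": "2024-01-05", "height": "170cm"},
    {"name": "Bob", "joined": "05/02/2024", "height": "1.8m"},
    {"name": "Cy", "joined": "2024-03-09", "height": None},
]

def parse_date(text):
    """Fix structural errors: accept ISO or day/month/year, return ISO."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")

def height_cm(text):
    """Standardise units: convert '1.8m' or '170cm' to centimetres."""
    if text is None:
        return None
    if text.endswith("cm"):
        return float(text[:-2])
    if text.endswith("m"):
        return round(float(text[:-1]) * 100, 1)
    raise ValueError(f"unrecognised height: {text!r}")

def clean(records):
    seen, rows = set(), []
    for rec in records:
        key = (rec["name"], rec["joined"], rec["height"])
        if key in seen:  # remove exact duplicates
            continue
        seen.add(key)
        rows.append({
            "name": rec["name"],
            "joined": parse_date(rec["joined"]),
            "height_cm": height_cm(rec["height"]),
        })
    # Handle missing data: fill gaps with the mean of the known values.
    known = [r["height_cm"] for r in rows if r["height_cm"] is not None]
    mean = sum(known) / len(known)
    for r in rows:
        if r["height_cm"] is None:
            r["height_cm"] = mean
    return rows

cleaned = clean(raw)
```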
What is the Freedom of Information Act (2000)?
Gives any person the right to access information held by public authorities.
Public authorities must produce a publication scheme approved by the Information Commissioner.
Public authorities must respond to requests for non-personal information.
The public have the right to be told whether information exists and to receive it on request, where possible in the manner requested.
What is the Human Rights Act (1998)?
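Incorporates the rights of the European Convention on Human Rights into UK law. The key right for data handling is Article 8: the right to respect for private and family life, home and correspondence.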
What is the Data Protection Act (1998)?
Requires anyone handling personal data to comply with eight enforceable principles of good information handling practice.
What are some of the key principles of the Data Protection Act (1998)?
Data must be fairly and lawfully processed.
Processed for limited purposes.
Adequate, relevant, and not excessive.
Accurate.
Not kept longer than necessary.
Processed in accordance with the data subject’s rights.
Secure.
Not transferred to countries without adequate protection.
What are some of the rights of individuals under the Data Protection Act?
Right to subject access.
Right to prevent processing likely to cause substantial damage or distress.
Right to prevent processing for the purposes of direct marketing.
Rights in relation to automated decision-taking.
Right to compensation.
Right to rectify, block, erase, or destroy inaccurate data.
Right to ask the Commissioner to assess whether the Act has been complied with.
What does GDPR stand for?
General Data Protection Regulation.
What is personal data under GDPR?
Any information relating to an identified or identifiable natural person, including:
HR records
CCTV images
Emails
Confidential opinions
Automated and manual filing data
Even “anonymised” data can often be identifiable.
What is sensitive/special category personal data under GDPR?
Racial/ethnic origin
Political opinions
Religious/philosophical beliefs
Trade union membership
Genetic or biometric data
Health
Sex life/sexual orientation.
What are some key requirements for companies under GDPR?
Implement appropriate technical & organisational measures to ensure and demonstrate compliance.
Maintain relevant documentation.
Implement data protection by design.
Use Data Protection Impact Assessments/Risk Assessments.
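Appoint a Data Protection Officer (required for public authorities and for organisations whose core activities involve large-scale monitoring of individuals or large-scale processing of special category data).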
What are the basic operators in Relational Algebra?
Projection (π)
Selection (σ)
Cross product (×)
Union (∪)
Rename (ρ)
Set difference (-)
What are the derived operators in Relational Algebra?
Join (⋈)
Intersect (∩)
Division (/,÷)
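These operators are called derived because each can be written using only the basic ones. A minimal sketch, assuming relations are modelled as Python sets of tuples; the relation names and data are hypothetical, and the join shown is an equi-join that keeps both matching columns rather than a natural join.

```python
from itertools import product

# Hypothetical relations: ENROLLED(student, course), COURSE(course, room),
# REQUIRED(course).
ENROLLED = {("Ada", "DB"), ("Ada", "AI"), ("Bob", "DB")}
COURSE = {("DB", "Room 1"), ("AI", "Room 2")}
REQUIRED = {("DB",), ("AI",)}

def theta_join(r, s, condition):
    """Join = selection applied to the cross product of r and s."""
    return {a + b for a, b in product(r, s) if condition(a, b)}

def divide(r, s):
    """R ÷ S for binary R(x, y) and unary S(y): x values paired with every y in S."""
    xs = {(x,) for x, _ in r}                                # π_x(R)
    missing = {(x, y) for (x,), (y,) in product(xs, s)} - r  # (π_x(R) × S) − R
    return xs - {(x,) for x, _ in missing}                   # π_x(R) − π_x(missing)

joined = theta_join(ENROLLED, COURSE, lambda e, c: e[1] == c[0])
took_all = divide(ENROLLED, REQUIRED)  # students enrolled on every required course
```

In this data only Ada is enrolled on both required courses, so the division keeps her and drops Bob.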
What does the Projection operator (π) do?
Selects specific columns from a relation.
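A minimal sketch of projection, assuming a relation is modelled as a set of tuples plus a tuple of attribute names. The STUDENT data and the project helper are hypothetical.

```python
# Hypothetical relation STUDENT(id, name, course).
STUDENT_ATTRS = ("id", "name", "course")
STUDENT = {
    (1, "Ada", "CS"),
    (2, "Bob", "Maths"),
    (3, "Cy", "CS"),
}

def project(attrs, rows, keep):
    """Return π_keep(rows): only the named columns, with duplicates collapsed."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}

courses = project(STUDENT_ATTRS, STUDENT, ["course"])
```

Three input rows yield two output rows here, because a relation is a set and the repeated "CS" value collapses.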
What does the Selection operator (σ) do?
Selects rows that satisfy a given condition.
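A minimal sketch of selection, assuming rows are named tuples held in a set. The Student type, the data, and the select helper are hypothetical.

```python
from collections import namedtuple

# Hypothetical relation STUDENT(id, name, course).
Student = namedtuple("Student", ["id", "name", "course"])
STUDENT = {
    Student(1, "Ada", "CS"),
    Student(2, "Bob", "Maths"),
    Student(3, "Cy", "CS"),
}

def select(rows, predicate):
    """Return σ_predicate(rows): the rows for which the condition holds."""
    return {row for row in rows if predicate(row)}

cs_students = select(STUDENT, lambda s: s.course == "CS")
```

Selection keeps whole rows and never changes the columns, which is the opposite of projection.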
What does the Cross product operator (×) do?
Combines tuples from two relations in every possible way.
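A minimal sketch of the cross product, assuming relations are sets of tuples. The relations R and S and the cross helper are hypothetical.

```python
from itertools import product

# Hypothetical relations R(id, name) and S(course).
R = {(1, "Ada"), (2, "Bob")}
S = {("CS",), ("Maths",)}

def cross(r, s):
    """Return R × S: every row of r concatenated with every row of s."""
    return {a + b for a, b in product(r, s)}

pairs = cross(R, S)
row_count = len(pairs)  # equals len(R) * len(S)
```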
What does the Union operator (∪) do?
Combines tuples from two relations, removing duplicates.
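A minimal sketch of union, assuming relations are sets of tuples. Union is only defined for relations with the same attributes; the column-count check below is a simplified stand-in for that rule, and the data is hypothetical.

```python
# Hypothetical relations with the same attributes (id, name).
R = {(1, "Ada"), (2, "Bob")}
S = {(2, "Bob"), (3, "Cy")}

def union(r, s):
    """Return R ∪ S; both inputs must have the same number of columns."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations are not union-compatible")
    return r | s

combined = union(R, S)  # (2, "Bob") appears once, not twice
```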
What does the Intersection operator (∩) do?
Returns tuples that are present in both relations.
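A minimal sketch of intersection, assuming relations are sets of tuples with hypothetical data. It also shows why intersection counts as a derived operator: it can be rebuilt from set difference alone.

```python
# Hypothetical relations with the same attributes (id, name).
R = {(1, "Ada"), (2, "Bob")}
S = {(2, "Bob"), (3, "Cy")}

both = R & S            # R ∩ S using Python's built-in set intersection
derived = R - (R - S)   # the same result using only set difference
```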
What does the Set difference operator (-) do?
Returns tuples from the first relation that are not in the second relation.
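A minimal sketch of set difference, assuming relations are sets of tuples with hypothetical data. Unlike union and intersection, the order of the operands matters.

```python
# Hypothetical relations with the same attributes (id, name).
R = {(1, "Ada"), (2, "Bob")}
S = {(2, "Bob"), (3, "Cy")}

only_in_r = R - S  # rows of R that do not appear in S
only_in_s = S - R  # swapping the operands gives a different result
```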