All Flashcards

1
Q

Define Statistics

A

The art, language and science of data.

2
Q

What is synonymous with Domain Knowledge?

A

Business/context understanding.

3
Q

Define Data

A

The raw, unorganised facts used in analysis.

4
Q

Define Information

A

Data which has been processed to make it useful.

5
Q

Define Knowledge

A

Understanding of the information.

6
Q

List three common data formats

A

CSV
XML
RTF

7
Q

Define Open Data

A

Data which may have no copyright or referencing requirement. E.g. open-source software like R.

8
Q

Define Public Data

A

Data within the public domain. Free to use, but still has ownership and restrictions.

9
Q

Define Proprietary Data

A

Opposite of public data. Private IP of a company.

10
Q

Define Operational Data

A

Used in the day-to-day activities of a business, e.g. customer records.

11
Q

Define Administrative Data

A

Data used to make informed decisions, often the subject of analysis.

12
Q

Define Structured and Unstructured Data

A

Structured data has a well-defined model. It’s easy to tabularise.

Unstructured data has no defined model.

13
Q

Types of Quantitative Data

A

Discrete/categorical data are numeric variables which can only take specific, separate values that can be counted.

Continuous data can take any value within an interval.

14
Q

Types of Qualitative Data

A

Nominal is label data with no order.

Ordinal is label data which can be ordered.

Binomial is a binary data label, e.g. TRUE/FALSE.

15
Q

What are the stages of the Data Lifecycle?

A
Created
Initial storage
Archived 
Obsolete
Deleted
16
Q

How do Databases and Structured Data relate?

A

A database is a repository of structured data.

17
Q

What is a Relational Database?

A

A large grouping of schemas, tables, queries, reports, views and other elements.

18
Q

Explain Tables in the relational model

A

In the relational model, every relation must have a header (columns) and body (rows).

19
Q

Define Keys

A

Designated columns within a table with which the data can be ordered and linked.

20
Q

What are some examples of Semi-Structured data?

A

XML and CSV are technically semi-structured, as some processing is required to get them into table form.
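
The "some processing" mentioned above can be sketched with Python's standard library. This is a minimal illustration, not a general parser: the sample XML and CSV text, the `customer` tag names and both helper functions are invented for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration.
XML_TEXT = """
<customers>
  <customer id="1"><name>Ada</name><city>London</city></customer>
  <customer id="2"><name>Grace</name></customer>
</customers>
"""
CSV_TEXT = "id,name,city\n1,Ada,London\n2,Grace,\n"

def xml_to_rows(text):
    """Flatten each <customer> element into a dict; missing tags become None."""
    root = ET.fromstring(text.strip())
    rows = []
    for customer in root.findall("customer"):
        rows.append({
            "id": customer.get("id"),
            "name": customer.findtext("name"),
            "city": customer.findtext("city"),
        })
    return rows

def csv_to_rows(text):
    """Read CSV text into dicts keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))

xml_rows = xml_to_rows(XML_TEXT)
csv_rows = csv_to_rows(CSV_TEXT)
```

Both results are lists of rows with the same columns. Note that the second customer has no city: the XML omits the tag entirely, while the CSV leaves an empty field.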

21
Q

Define Big Data

A

Sets of data which are beyond the capabilities of traditional data processing software. They must be analysed computationally.

22
Q

What are the four Vs of Big Data?

A

Volume
Variety
Velocity
Veracity

23
Q

What are Requirements?

A

The constraints placed on an analysis project, usually determining the data to analyse. Aims to establish the purpose of the project.

24
Q

What is Explicit Knowledge?

A

Knowledge that can easily and swiftly be articulated to other people and is usually stored somewhere.

25
What is Tacit Knowledge?
Knowledge that cannot be readily articulated to other people, may be assumed and may not be stored.
26
What is Elicitation?
A proactive activity, where the analyst initiates conversations with stakeholders to gain an understanding of the problem.
27
What are some techniques of Requirement Elicitation?
Interviewing
Observing
Recounting
Apprenticing
28
What is Recounting?
The method of having multiple stakeholders articulate their requirements. Aims to identify misunderstandings, assumptions and reach consensus.
29
What is the difference between Requirements Elicitation and Gathering?
Requirements gathering is a reactive activity - data exists and must be collected and analysed. Elicitation is a proactive activity. The analyst initiates conversations with stakeholders to gain an understanding of their problem.
30
What are some Elicitation challenges?
Problems of scope - customers give ill-defined or unnecessary requirements.
Problems of volatility - requirements change over time.
Problems of understanding - customers unsure of what is needed and the capabilities in their computing environment.
31
What are some Elicitation solutions?
Visualisation
Consistent language
Guidelines
Consistent use of templates
Documenting dependencies
32
What are the Elicitation guidelines?
Assess business + technical feasibility.
Identify requirement specifiers and their bias.
Define technical environment.
Identify domain constraints.
Select 1+ Elicitation techniques.
Encourage participation from many stakeholders.
Identify ambiguous requirements for prototyping.
Use usage scenarios to help customers better identify their key requirements.
33
What is the difference between Validation and Verification?
Validation judges the accuracy of something, eg 50% of company records are compliant. Verification is concerned with meeting standards in absolute terms, eg the company records are not compliant.
34
Define the types of Data Models
Conceptual - high-level mappings of database elements and the relationships between them. Identifies info to collect, attributes and class relationships.
Logical - converts business requirements into a model. Revolves around customer need, rather than technical needs. eg a flow diagram.
Physical - a full server model diagram, showing the detail of the database. Shows constraints, eg keys and check constraints.
35
Define Check Constraints
Check whether an attribute meets a certain requirement.
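
As a minimal sketch of a check constraint in practice, the example below uses Python's built-in `sqlite3` module. The `employee` table and the minimum-age rule are hypothetical, chosen only to show a row being rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age >= 16)"  # hypothetical business rule
    ")"
)

# This row satisfies the constraint and is stored.
conn.execute("INSERT INTO employee (name, age) VALUES (?, ?)", ("Ada", 36))

# This row violates the constraint, so the database refuses it.
try:
    conn.execute("INSERT INTO employee (name, age) VALUES (?, ?)", ("Tom", 12))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the valid row ends up in the table, because the check runs inside the database rather than in application code.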
36
Define Quality
The standard of something when compared to other things of a similar kind. For data, quality doesn't need to be perfect - just high enough for the specific analysis.
37
What are the 8 principles of the Data Protection Act?
Used fairly and lawfully.
Used for limited, specifically stated purposes.
Used in a way that's adequate, relevant and not excessive.
Accurate.
Kept for no longer than absolutely necessary.
Handled according to people's data protection rights.
Kept safe and secure.
Not transferred outside the EEA without adequate protection.
38
Under the Data Protection Act, for what do stronger legal protections exist?
Race, ethnic background, political opinions, religious beliefs, trade union membership, genetics, biometrics, health, sexual orientation.
39
What are the 8 rights under GDPR?
Right to be informed.
Right of access.
Right of rectification.
Right of erasure.
Right to restrict processing.
Right to data portability.
Right to object.
Right in relation to automated decision making and profiling.
40
Which acronym gives the fundamentals of Data Security?
CIA
Confidentiality
Integrity
Availability
41
What are the reasons for Dirty Data?
Data is missing.
Data is incorrect.
Incorrectly formatted.
Entered into wrong fields.
Stale (out of date).
Missing links, eg relationship.
Duplicated.
42
What are the sources of Data Error?
Completeness - does not capture the entire problem.
Uniqueness - no duplicates.
Timeliness - data is available when expected and needed.
Accuracy - data reflects reality.
Consistency - providing the same data for the same data object.
Conformity - the data follows the required format.
43
How can Data Error be avoided?
Process - put greater controls around data creation.
Entry - have independent checking of drop-down lists to ensure correct data entry.
Identification - searching for errors in data.
Validate - automatically or manually check accuracy in data.
44
What are the steps in the Data Analysis Process?
Problem hypothesis
Identify what to measure
Collect data
Cleanse data
Model data
Visualise data
Analyse data
Interpret results
Document/communicate results
45
Define a Hypothesis
A possible explanation for something, which serves as a starting point for further investigation.
46
What's the difference between H0 and H1?
The null hypothesis (H0) is the default assumption, that nothing has changed. The alternative hypothesis (H1) is the prediction you make, and is accepted if H0 is rejected.
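
To make the H0/H1 distinction concrete, here is a small sketch of an exact one-sided binomial test in plain Python. The coin-flip scenario, the counts (16 heads in 20 flips) and the 0.05 significance level are assumptions made for the example.

```python
from math import comb

def one_sided_p_value(heads, flips, p_null=0.5):
    """P(at least `heads` heads in `flips` flips) if H0 (a fair coin) is true."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null)**(flips - k)
        for k in range(heads, flips + 1)
    )

ALPHA = 0.05  # conventional significance level, an assumption here

# H0: the coin is fair. H1: the coin favours heads.
p_value = one_sided_p_value(heads=16, flips=20)
reject_h0 = p_value < ALPHA
```

The p-value is the chance of seeing a result at least this extreme if H0 were true. Here it falls below `ALPHA`, so H0 is rejected in favour of H1.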
47
Define Data Accessibility
Data in a format that is easy to handle/manage. Similar to data quality.
48
Define Data Extraction
Adding further structure to data. Yields usable data from unstructured data.
49
What are the types of Data Cleansing?
Filtering - data is included based on a Boolean condition.
Interpolation - using other data points to fill in the gaps.
Masking - hides certain data from view by unauthorised people, but still allows analysis to occur.
Blending - combining data from different sources into a single dataset. May be warehoused.
Transformation - changing data from one format/structure to another.
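
Three of these techniques (interpolation, filtering and masking) can be sketched in a few lines of Python. The readings, the plausibility threshold of 50 and the helper names are all invented for illustration, and real cleansing rules would depend on the dataset.

```python
readings = [
    {"user": "alice@example.com", "temp": 20.0},
    {"user": "bob@example.com", "temp": None},   # gap to interpolate
    {"user": "carol@example.com", "temp": 24.0},
    {"user": "dave@example.com", "temp": 99.0},  # implausible, filtered out
]

def interpolate(rows):
    """Fill a missing temp with the mean of its two neighbours."""
    filled = [dict(r) for r in rows]
    for i in range(1, len(filled) - 1):
        if filled[i]["temp"] is None:
            before, after = filled[i - 1]["temp"], filled[i + 1]["temp"]
            if before is not None and after is not None:
                filled[i]["temp"] = (before + after) / 2
    return filled

def mask(email):
    """Hide the local part of an email but keep the domain for analysis."""
    local, _, domain = email.partition("@")
    return "*" * len(local) + "@" + domain

interpolated = interpolate(readings)
# Filtering: keep rows that satisfy a Boolean condition.
filtered = [r for r in interpolated if r["temp"] is not None and r["temp"] < 50]
# Masking: analysts still see temperatures and domains, not identities.
masked = [{**r, "user": mask(r["user"])} for r in filtered]
```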
50
What is an ETL process?
Extract Transform Load
Define the source.
Define the target.
Define the mapping.
Create the session.
Create the workflow.
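
A minimal extract-transform-load sketch using only Python's standard library is shown below. The CSV source text, the `customer` target table and the field mapping are hypothetical. Dedicated ETL tools wrap the same ideas in sessions and workflows.

```python
import csv
import io
import sqlite3

# Source: a hypothetical CSV export, invented for illustration.
SOURCE = "name,joined,spend\n ada ,2021-03-01,10.50\nGRACE,2020-11-15,7.25\n"

def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: map source fields to target columns and tidy the values."""
    return [
        (row["name"].strip().title(), row["joined"], round(float(row["spend"]) * 100))
        for row in rows
    ]

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE customer (name TEXT, joined TEXT, spend_pence INTEGER)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", records)
    conn.commit()

target = sqlite3.connect(":memory:")  # Target: an in-memory database
load(target, transform(extract(SOURCE)))
loaded = target.execute(
    "SELECT name, spend_pence FROM customer ORDER BY name"
).fetchall()
```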
51
Define Data Models
Mathematical abstractions of reality. Seek to capture relationships between variables. Data = Model + Error
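
The decomposition of data into model plus error can be illustrated with a straight-line fit. This is a sketch under stated assumptions: the observations are made up, and ordinary least squares is just one possible choice of model.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up observations

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)
model = [slope * x + intercept for x in xs]   # what the model predicts
error = [y - m for y, m in zip(ys, model)]    # what the model misses
```

Adding each `model` value to its `error` value recovers the original observation, which is the relationship the card describes.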
52
Explain Inferential Statistics
A branch of statistics which quantifies relationships (as opposed to descriptive statistics).
Correlation quantifies the strength of a linear trend.
Hypothesis testing assesses the significance of patterns in data.
Regression analysis models trends.
53
What are some types of Data Visualisation?
Infographics
Time series
Part to whole
Geospatial
54
Define Data Analysis
Deriving insight and meaning from data. Includes assessing trends and correlations.
55
Define a Variable (data structure)
A reference to a particular location in a computer's memory (an address).
56
Define an Array (data structure)
A sequence of slots of memory, where each slot contains an element (value or object). Deleting and inserting can be slow - it will change the address of all elements in the array.
57
Define a List (data structure)
Similar to arrays, but permit elements of more than one data type. Values can be inserted/deleted without changing the address of other elements.
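
The contrast between the two cards above can be sketched as follows. The `Node` class and helper functions are illustrative names. A Python `list` stands in for the array, and a hand-rolled linked list stands in for the "list" that inserts without moving other elements.

```python
class Node:
    """One element of a singly linked list."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_after(node, value):
    """Link a new node after `node`; no other node is moved."""
    node.next = Node(value, node.next)
    return node.next

def to_list(head):
    """Collect the values by following the links from the head."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values

# Array-style insert: every later element shifts one slot to the right.
array = ["a", "b", "d"]
index_of_d_before = array.index("d")
array.insert(2, "c")
index_of_d_after = array.index("d")

# Linked-list insert: only one link is rewired.
head = Node("a", Node("b", Node("d")))
insert_after(head.next, "c")
linked = to_list(head)
```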
58
Define a Class (data structure)
A data structure containing data fields. It offers a blueprint defining the variables common to an object.
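
A short sketch of a class acting as a blueprint, using a Python dataclass. The `Customer` class and its fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    """Blueprint: every Customer object has these fields."""
    name: str
    email: str
    orders: int = 0

    def record_order(self):
        """Behaviour shared by all objects built from the blueprint."""
        self.orders += 1

# Two objects from the same class, each with its own field values.
ada = Customer("Ada", "ada@example.com")
grace = Customer("Grace", "grace@example.com", orders=3)
ada.record_order()
```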
59
Define a Tree (data structure)
Shows a hierarchical data structure. The top node is called the root. Faster than arrays when inserting and deleting, but slower than linked lists.
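
A minimal sketch of a tree with a root and child nodes is given below. `TreeNode`, `depth` and `walk` are illustrative names, and the organisation chart is invented.

```python
class TreeNode:
    """A node holding a label and any number of child nodes."""
    def __init__(self, label):
        self.label = label
        self.children = []

    def add(self, label):
        child = TreeNode(label)
        self.children.append(child)
        return child

def depth(node):
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((depth(c) for c in node.children), default=0)

def walk(node):
    """Labels in depth-first order, parents before children."""
    labels = [node.label]
    for child in node.children:
        labels.extend(walk(child))
    return labels

root = TreeNode("company")  # the root is the top node
sales = root.add("sales")
root.add("engineering")
sales.add("uk sales")
```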
60
Define a Record (data structure)
A value that contains other values. Aka a tuple or struct. (A row in a table).
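
In Python, a record can be sketched with `typing.NamedTuple`. The `Employee` fields below are hypothetical.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    """One record: a fixed set of named values, like a row in a table."""
    id: int
    name: str
    salary: float

row = Employee(id=1, name="Ada", salary=52000.0)

# Records built this way are immutable; _replace returns a modified copy.
raised = row._replace(salary=55000.0)
```

Fields can be read by name (`row.name`) or by position (`row[0]`), which mirrors how a table row can be addressed by column name or column order.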
61
Define a Schema
A database design including conceptual, logical and physical considerations.
62
What are the types of Schema?
Conceptual schema - a representation of an organisation, showing the entities, attributes and relationships.
Logical schema - the natural successor, articulates data structures, eg tables, objects and shows relationships.
Physical schema - successor to the logical schema, includes precise detail on the database structure.
63
What is a Relational Database?
It breaks data into multiple tables. Tables linked through primary and foreign keys.
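
As a sketch of tables linked through primary and foreign keys, the example below uses Python's built-in `sqlite3` module. The `customer` and `purchase` tables and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO purchase VALUES (1, 1, 'laptop'), (2, 1, 'mouse'), (3, 2, 'desk');
""")

# The foreign key lets the two tables be joined back together.
joined = conn.execute(
    "SELECT c.name, p.item FROM customer c"
    " JOIN purchase p ON p.customer_id = c.id"
    " ORDER BY p.id"
).fetchall()

# A purchase pointing at a customer that does not exist is refused.
try:
    conn.execute("INSERT INTO purchase VALUES (4, 99, 'chair')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```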
65
What is a Flat File Database?
Before relational, all data was stored in a single table (eg a spreadsheet).
66
What is a Hierarchical Database?
Organised into a tree structure. Parent records can have many child records. Each child record can have one parent record. Still widely used for certain functions.
67
What is a Network Database?
Aims to boost the flexibility of hierarchical databases by allowing many-many relationships between records. Still less flexible than relational.
68
What is an Object Orientated Database?
Info on each entity stored within a single object. Eg each customer has an object to store their own file info.
69
What is a Multi-dimensional Database?
Data visualised as a collection of cubes. Includes data cubes and hyper cubes (more than three dimensions).
70
What is a NoSQL Database?
Not only SQL database. Came from the need to have large scale, clustered databases. Useful for unstructured data.
71
What are the types of NoSQL Database?
Document store - stores semi-structured data by allowing devs to update code without referring to a central schema. eg JSON and XML.
Wide-column store - organises data into columns rather than rows. Each column has lots of info on the same entity. Can be faster to query large volumes.
Graph store - data stored in nodes, rather than traditional records. Node connections known as edges.
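
The document and graph ideas can be imitated with plain Python structures. This is a toy sketch rather than a real NoSQL client: the documents, the friendship graph and the `friends_of_friends` helper are all invented.

```python
import json

# Document store: each record is a self-describing document, so two
# documents in the same collection need not share the same fields.
documents = [
    json.loads('{"id": 1, "name": "Ada", "tags": ["vip"]}'),
    json.loads('{"id": 2, "name": "Grace", "address": {"city": "London"}}'),
]
with_address = [d["name"] for d in documents if "address" in d]

# Graph store: nodes plus edges (the connections between nodes).
edges = {
    "Ada": {"Grace"},
    "Grace": {"Ada", "Alan"},
    "Alan": {"Grace"},
}

def friends_of_friends(graph, start):
    """Nodes two edges away from `start`, excluding direct neighbours."""
    direct = graph[start]
    reachable = set()
    for neighbour in direct:
        reachable |= graph[neighbour]
    return reachable - direct - {start}

suggested = friends_of_friends(edges, "Ada")
```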
72
Define Normalisation
The process of organising tables (and their columns) in order to improve data integrity.
73
What are the two types of Anomalies?
Insertion anomalies - describes when data cannot be added into the table.
Deletion anomalies - describes attributes being lost when other attributes are deleted.
74
What are the three Normal Forms?
First normal form - stored in a relational table with no multi-valued columns.
Second normal form - all non-key columns depend on the whole of the table's primary key.
Third normal form - no non-key column has a transitive dependency on the primary key.
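
The first step, removing multi-valued columns, can be sketched with plain Python data. The customer records and the comma-separated `phones` field are made up for the example.

```python
# Unnormalised: the "phones" column holds several values in one field.
unnormalised = [
    {"customer_id": 1, "name": "Ada", "phones": "0111,0222"},
    {"customer_id": 2, "name": "Grace", "phones": "0333"},
]

# First normal form: one value per field, so phones move to their own
# table and link back to the customer through customer_id.
customers = [
    {"customer_id": r["customer_id"], "name": r["name"]} for r in unnormalised
]
phones = [
    {"customer_id": r["customer_id"], "phone": phone}
    for r in unnormalised
    for phone in r["phones"].split(",")
]
```

Each phone number now sits in its own row, so one number can be added or removed without rewriting the customer record.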
75
Define Data Warehousing
Data stored ready to be dispatched/used.
76
Explain four Database Maintenance techniques
Log file maintenance - log files contain a history of every transaction against the database. Log files are a form of redundancy (they're data additional to the actual data).
Data compaction - frees up unused space for new data, but doesn't necessarily reduce the size of the database file. May require downtime.
Defragmentation - identifies data that is related, and relocates it to the same physical location to improve performance.
Integrity checks - look for problems with data that may cause corruption or other problems. Eg a virus scan.
77
What is a Canonical data model?
Provides a high-level view of entities and their relationships across an organisation.
78
Define Data Architecture
The set of rules, policies, standards or models set by the organisation that govern the use of its data. It's primarily a business process rather than a technical one.
79
Define Data Policies, Standards and Rules
Data policies - a broad framework for how decisions should be made regarding data.
Data standards - provide detailed rules on how to implement data policies.
Data rules - provide specific instructions on how to implement data standards.
80
Define Data Migration
The transfer of data from one storage/computing environment to another.
81
Define Data Integration
Combining data from different sources to provide a unified view.
82
What are the four features of Database Architecture?
Database design, data warehousing, migration and integration.
83
Define Domain Context
Understanding of the business environment the data is in.
84
Define Decision Analytics
Using visual data techniques to support choices or decisions made by people.
85
Define Descriptive Analytics
Focusses entirely on the understanding of historical data. Can inform decision-making.
86
Define Predictive Analytics
Using historical data to understand or predict the future and inform decisions.
87
Define Prescriptive Analytics
The integration of predictive analytics into business systems. Seeks to identify what will happen, when and why.
88
What is a Functional Requirement?
It describes a feature which the solution should have.
89
What are the steps of the ETL process?
Define the source
Define the target
Create the mapping
Create the session
Create the workflow
90
What is data validation?
The process of ensuring a program operates on clean, correct and useful data.
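
A minimal validation sketch in Python follows. The three rules (non-empty name, an `@` in the email, an ISO-format date) and the field names are assumptions chosen for illustration. Real validation rules come from the project's requirements.

```python
from datetime import date

def validate(record):
    """Return a list of problems found in one record (empty means clean)."""
    problems = []
    if not record.get("name", "").strip():
        problems.append("name is missing")
    if "@" not in record.get("email", ""):
        problems.append("email is malformed")
    try:
        date.fromisoformat(record.get("joined", ""))
    except ValueError:
        problems.append("joined is not an ISO date")
    return problems

clean = {"name": "Ada", "email": "ada@example.com", "joined": "2021-03-01"}
dirty = {"name": " ", "email": "grace.example.com", "joined": "01/03/2021"}

clean_problems = validate(clean)
dirty_problems = validate(dirty)
```

Returning every problem, rather than stopping at the first, lets a record be fixed in one pass.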