All Flashcards
Define Statistics
The art, language and science of data.
What is synonymous with Domain Knowledge?
Business/context understanding.
Define Data
The raw, unorganised facts used in analysis.
Define Information
Data which has been processed to make it useful.
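A minimal Python sketch of the step from data to information. The sales figures and the summary fields are made up for illustration; they are not from the course material.

```python
from statistics import mean

# Hypothetical raw data: unorganised daily sales figures.
raw_sales = [120, 95, 130, 110, 145]

# Processing the raw data produces information that can support a decision.
information = {
    "total": sum(raw_sales),
    "average": mean(raw_sales),
    # Day numbers start at 1, list indexes start at 0.
    "best_day": raw_sales.index(max(raw_sales)) + 1,
}
print(information)
```

The list on its own is data; the summary built from it is information.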
Define Knowledge
Understanding of the information.
List three common data formats
CSV
XML
RTF
Define Open Data
Data which may have no copyright or referencing requirement. E.g. open-source software like R.
Define Public Data
Data within the public domain. Free to use, but still has ownership and restrictions.
Define Proprietary Data
The opposite of public data: the private intellectual property (IP) of a company.
Define Operational Data
Used in the day-to-day activities of a business, e.g. customer records.
Define Administrative Data
Data used to make informed decisions, often the subject of analysis.
Define Structured and Unstructured Data
Structured data has a well-defined model. It’s easy to tabularise.
Unstructured data has no defined model.
Types of Quantitative Data
Discrete/categorical is numeric data which can only take specific, separate values that can be counted, e.g. the number of customers.
Continuous is data which can take any value within an interval, e.g. a height or a temperature.
Types of Qualitative Data
Nominal is label data with no order.
Ordinal is label data which can be ordered.
Binomial is a binary data label, e.g. TRUE/FALSE.
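The three label types can be sketched in Python as follows. The example values and the SIZE_ORDER ranking are assumptions chosen for illustration.

```python
# Nominal: labels with no inherent order, so only equality checks make sense.
colours = ["red", "blue", "green", "blue"]
distinct_colours = set(colours)

# Ordinal: labels with an order, defined here by an explicit ranking.
SIZE_ORDER = ["small", "medium", "large"]
sizes = ["large", "small", "medium", "small"]
sorted_sizes = sorted(sizes, key=SIZE_ORDER.index)

# Binomial: a label with exactly two possible values.
is_member = [True, False, True]
member_count = sum(is_member)  # True counts as 1, False as 0

print(distinct_colours, sorted_sizes, member_count)
```

Sorting the colours alphabetically would be possible but meaningless, whereas sorting the sizes by rank reflects a real order.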
What are the stages of the Data Lifecycle?
Created
Initial storage
Archived
Obsolete
Deleted
How do Databases and Structured Data relate?
A database is a repository of structured data.
What is a Relational Database?
A large grouping of schemas, tables, queries, reports, views and other elements.
Explain Tables in the relational model
In the relational model, every relation must have a header (columns) and a body (rows).
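A short sketch of a header and body, using Python's built-in sqlite3 module with an in-memory database. The customer table and its rows are hypothetical.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ada", "Leeds"), (2, "Ben", "York")],
)

cursor = conn.execute("SELECT * FROM customer ORDER BY id")
# Header: the column names of the relation.
header = [column[0] for column in cursor.description]
# Body: the rows, each with one value per column.
body = cursor.fetchall()

print(header)
print(body)
```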
Define Keys
Designated columns within a table with which the data can be ordered and linked.
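As a rough illustration of keys linking tables, the sketch below uses sqlite3 with two hypothetical tables. customer_id is the primary key of customer and is referenced from purchase, which lets the two tables be joined and ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,  -- identifies each customer row
        name TEXT
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),  -- link column
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO purchase VALUES (10, 2, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# Link the tables on the shared key, then order the result by the keys.
rows = conn.execute("""
    SELECT c.name, p.amount
    FROM purchase AS p
    JOIN customer AS c ON c.customer_id = p.customer_id
    ORDER BY c.customer_id, p.purchase_id
""").fetchall()

print(rows)
```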
What are some examples of Semi-Structured data?
XML and CSV are technically semi-structured, as some processing is required to get them into table form.
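The processing step can be sketched with Python's standard library. The sample CSV and XML text below is invented for illustration; real files would need their own field names and error handling.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs holding the same two records.
csv_text = "id,name\n1,Ada\n2,Ben\n"
xml_text = (
    "<people>"
    "<person id='1'><name>Ada</name></person>"
    "<person id='2'><name>Ben</name></person>"
    "</people>"
)

# CSV: the first line is treated as the header, each later line as a row.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# XML: walk the tree and pull out the attributes and child text we want.
root = ET.fromstring(xml_text)
xml_rows = [
    {"id": person.get("id"), "name": person.findtext("name")}
    for person in root.findall("person")
]

print(csv_rows)
print(xml_rows)
```

Both formats end up as the same list of rows, but only after the code has decided how each one maps onto columns.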
Define Big Data
Sets of data which are beyond the capabilities of traditional data processing software. They must be analysed computationally.
What are the four Vs of Big Data?
Volume
Variety
Velocity
Veracity
What are Requirements?
The constraints placed on an analysis project, usually determining the data to analyse. They aim to establish the purpose of the project.
What is Explicit Knowledge?
Knowledge that can easily and swiftly be articulated to other people and is usually stored somewhere.