data exploration Flashcards

(34 cards)

1
Q

data does not

A
  • speak for itself
  • it can be biased and is not objective (based on how selected)
  • the people behind it interpret the data
2
Q

answers/results depend on...

A

the question to be solved and the perspective taken

3
Q

types of data sets

A
  • cross-sectional
  • time-series
  • panel
4
Q

cross-sectional

A
  • many subjects/variables, one point in time
  • eg sales, expenses, profit
5
Q

time-series

A
  • one subject/variable, many points in time
  • eg sales over time
6
Q

panel

A
  • many subjects/variables, many points in time
  • eg sales, expenses, profit over time
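
The three data set types can be sketched as plain Python structures. The store names and figures below are hypothetical and exist only to show the shape of each type.

```python
# Hypothetical figures, for illustration only.

# Cross-sectional: many subjects/variables at one point in time.
cross_sectional = {
    "Store A": {"sales": 120, "expenses": 80, "profit": 40},
    "Store B": {"sales": 95, "expenses": 70, "profit": 25},
}

# Time-series: one subject/variable at many points in time.
time_series = {"2021": 100, "2022": 110, "2023": 120}

# Panel: many subjects/variables at many points in time.
panel = {
    ("Store A", "2022"): {"sales": 110, "expenses": 75, "profit": 35},
    ("Store A", "2023"): {"sales": 120, "expenses": 80, "profit": 40},
    ("Store B", "2022"): {"sales": 90, "expenses": 68, "profit": 22},
    ("Store B", "2023"): {"sales": 95, "expenses": 70, "profit": 25},
}

# Fixing the period in a panel gives a cross-section...
cross_section_2023 = {s: row for (s, year), row in panel.items() if year == "2023"}
# ...and fixing the subject and variable gives a time series.
store_a_sales = {year: row["sales"] for (s, year), row in panel.items() if s == "Store A"}
```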
7
Q

dimensions of data quality

A
  • completeness
  • consistency
  • conformity
  • accuracy
  • integrity
  • timeliness
8
Q

completeness

A

data is comprehensive and meets expectations

9
Q

consistency

A

data held across all systems, or sourced from different places, reflects the same information

10
Q

conformity

A

follows a set of standard data definitions such as data type, size, and format
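
A conformity check can be sketched as a lookup against a standard definition. The `STANDARD` table, its field names, and the `conforms` helper are hypothetical and are not taken from any particular data standard.

```python
import re

# Hypothetical standard: field name -> (data type, max size, format as a regex).
STANDARD = {
    "postcode": (str, 4, r"\d{4}"),
    "order_date": (str, 10, r"\d{4}-\d{2}-\d{2}"),
}

def conforms(field, value):
    """Return True if value matches the type, size and format defined for field."""
    expected_type, max_len, pattern = STANDARD[field]
    return (
        isinstance(value, expected_type)
        and len(value) <= max_len
        and re.fullmatch(pattern, value) is not None
    )
```

For example, `conforms("postcode", "2000")` passes, while `conforms("postcode", 2000)` fails on data type and `conforms("order_date", "01/02/2024")` fails on format.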

11
Q

accuracy

A

correctly reflects the real-world object or event being described

12
Q

integrity

A

all data in a database can be traced and connected to other data
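
One way to picture integrity is a check for records that cannot be connected to the data they reference. The `customers` and `orders` tables below are hypothetical.

```python
# Hypothetical tables: every order should trace back to a known customer.
customers = {"C1": "Alice", "C2": "Bob"}
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C3"},  # no such customer
]

# Orders whose customer_id cannot be connected to the customers table.
orphans = [o["order_id"] for o in orders if o["customer_id"] not in customers]
```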

13
Q

timeliness

A

information is available when it is expected and needed

14
Q

first two steps of data cleansing/processing

A
  • sourcing raw data
  • technically correct data
15
Q

sourcing raw data

A

What do we want and need to achieve?
What data will support this outcome?
How can we source it and ensure it is of a high quality?

16
Q

technically correct data

A
  • each value can be directly recognised as belonging to a certain variable
  • each value is stored in a data type that represents the value domain of the real-world variable
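
As a minimal sketch, making a raw text record technically correct means giving each value a named variable and a suitable data type. The record and its field names are hypothetical.

```python
from datetime import date

# Hypothetical raw record: every value arrives as text.
raw = {"customer": " Alice ", "age": "34", "joined": "2023-05-01", "spend": "1,250.50"}

# Store each value in a type that matches its real-world value domain.
typed = {
    "customer": raw["customer"].strip(),             # text
    "age": int(raw["age"]),                          # whole number
    "joined": date.fromisoformat(raw["joined"]),     # calendar date
    "spend": float(raw["spend"].replace(",", "")),   # decimal amount
}
```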
17
Q

data issues

A
  • formatting/data type
  • missing values
  • outliers
18
Q

formatting/data type

A
  • sex: Male, M, Boy
  • month: January, 1-Jan, 1
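
Inconsistent formats like these can be mapped to one standard code. The lookup tables and the `normalise` helper below are hypothetical, and treating "1-Jan" as the month January is an assumption.

```python
# Hypothetical lookup tables mapping each variant to one standard code.
SEX_CODES = {"male": "M", "m": "M", "boy": "M"}
MONTHS = {"january": 1, "1-jan": 1, "1": 1}  # assumes "1-Jan" means January

def normalise(value, mapping):
    """Return the standard code for value, or None if it is not recognised."""
    key = str(value).strip().lower()
    return mapping.get(key)

sexes = [normalise(v, SEX_CODES) for v in ["Male", "M", "Boy"]]
months = [normalise(v, MONTHS) for v in ["January", "1-Jan", 1]]
```

Unrecognised values come back as `None`, so they can be reviewed rather than silently guessed.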
19
Q

missing values - listwise deletion

A

remove records with missing values in any variable
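
A minimal sketch of listwise deletion, assuming missing values are stored as `None`; the records are hypothetical.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"sales": 100, "expenses": 60},
    {"sales": None, "expenses": 55},
    {"sales": 120, "expenses": None},
    {"sales": 90, "expenses": 50},
]

# Keep only the records that have a value for every variable.
complete = [r for r in records if all(v is not None for v in r.values())]
```

Half of the records are lost here, which is the main cost of this approach.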

20
Q

missing values - mode/median/mean imputation

A
  • mean for continuous variables
  • median for skewed continuous variables
  • mode for categorical variables
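
The three choices can be sketched with Python's `statistics` module. The `impute` helper and the sample values are hypothetical, and missing values are assumed to be stored as `None`.

```python
from statistics import mean, median, mode

def impute(values, statistic):
    """Fill None entries with the given statistic of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]

heights = impute([170, None, 180, 175], mean)          # continuous -> mean
incomes = impute([30, 35, None, 40, 500], median)      # skewed -> median
colours = impute(["red", "blue", None, "red"], mode)   # categorical -> mode
```

The single extreme income of 500 would pull a mean up to 151.25, while the median fill of 37.5 stays close to the typical values.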
21
Q

missing values - model imputation

A
  • interpolate/extrapolate
  • use regression model to predict missing value
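
Both approaches can be sketched on a small hypothetical series. The `fit_line` helper is an illustrative ordinary least squares fit with one predictor, not a reference implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope

# Hypothetical data: sales is missing where advertising spend is 3.
advertising = [1, 2, 3, 4, 5]
sales = [12, 14, None, 18, 20]

# Interpolation: take the midpoint of the two neighbouring values.
interpolated = (sales[1] + sales[3]) / 2

# Regression: fit on the complete pairs, then predict the missing value.
known = [(x, y) for x, y in zip(advertising, sales) if y is not None]
intercept, slope = fit_line([x for x, _ in known], [y for _, y in known])
predicted = intercept + slope * advertising[2]
```

The two methods agree here only because the sample data is exactly linear.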
22
Q

outliers - drop outlier record

A

completely remove the record to avoid severe skewness
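
A minimal sketch of dropping outlier records. The card does not say how outliers are identified, so the common 1.5 × IQR rule is assumed here; the data is hypothetical.

```python
from statistics import quantiles

# Hypothetical data with one extreme value.
data = [10, 12, 11, 13, 12, 95]

# Assumed detection rule: outside 1.5 * IQR from the quartiles.
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Drop every record that falls outside the fences.
kept = [v for v in data if lower <= v <= upper]
```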

23
Q

outliers - winsorisation

A
  • cap the outlier values
  • limit extreme values in statistical data to reduce the effect of possibly spurious (false) outliers
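
Winsorisation can be sketched as replacing the most extreme values at each end with the nearest value that is kept. The `winsorise` helper, the 10% fraction, and the data are hypothetical choices.

```python
def winsorise(values, fraction=0.1):
    """Cap the lowest and highest `fraction` of values at the nearest retained value."""
    ordered = sorted(values)
    k = int(fraction * len(ordered))          # number of values to cap at each end
    low, high = ordered[k], ordered[-k - 1]   # the caps
    return [min(max(v, low), high) for v in values]

# Hypothetical data with one extreme value at each end.
data = [1, 10, 11, 12, 12, 13, 13, 14, 15, 95]
capped = winsorise(data, fraction=0.1)
```

Unlike dropping the record, every observation is kept; only the extreme values move (1 becomes 10 and 95 becomes 15).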
24
Q

outliers - imputation

A
  • assign a new value to the outlier
  • use the mean or a regression prediction
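
A minimal sketch of mean imputation for outliers. The 1.5 × IQR detection rule is an assumption (the card does not specify one), and the `impute_outliers` helper and data are hypothetical.

```python
from statistics import mean, quantiles

def impute_outliers(values, k=1.5):
    """Replace values outside the k * IQR fences with the mean of the other values."""
    q1, _, q3 = quantiles(values, n=4)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    inliers = [v for v in values if lower <= v <= upper]
    fill = mean(inliers)
    return [v if lower <= v <= upper else fill for v in values]

# Hypothetical data with one extreme value.
data = [10, 12, 11, 13, 12, 95]
cleaned = impute_outliers(data)
```

The mean is computed from the non-outlier values only, so the extreme value does not influence its own replacement.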
25
Q

data privacy

A

the claim of individuals, groups, and institutions to determine for themselves when, how, and to what extent information about them is communicated to others

26
Q

data privacy principles

A
  • notice
  • choice and consent
  • use and retention
  • access
  • protection
  • enforcement and redress
27
Q

notice

A

inform users about the privacy policy and protection procedures

28
Q

choice and consent

A

obtain consent from individuals for the collection, use, disclosure, and retention of information

29
Q

use and retention

A

data is retained and protected as required by law or business practice

30
Q

access

A

give individuals access to review, update, and modify their personal information

31
Q

protection

A

data is used only for the purpose stated

32
Q

enforcement and redress

A

provide channels for individuals to report, provide feedback, or complain

33
Q

ethics of data security

A
  • managing quality personnel to address ethical issues
  • a perceived potential conflict of interest also exists relative to ethical behaviours and technical knowledge
34
Q

Australian Security Principles protect against

A
  • misuse
  • interference
  • loss
  • unauthorised access, modification, disclosure