Summarising & Analysing Data Flashcards

(23 cards)

1
Q

Big Data

A
  • mass of data that society creates every year
  • extends beyond traditional data created by companies
  • sources include social networking sites, internet search engines, mobile devices
2
Q

What are the main characteristics of big data?

A

Volume
- amount of data created and stored, growing due to advances in technology
Velocity
- real time data, timeliness is key
Variety
- structured or unstructured
Value
- insights gained add value
Veracity
- truthfulness, careful of hidden biases

3
Q

Structured Data

A
  • contained within a field or data record
  • easy to analyse, store, search
  • in standard format or in specific location within data
  • rows/columns
  • expiry date on card
4
Q

Semi-Structured Data

A
  • doesn’t reside in a fixed field but contains some properties that can be organised/analysed
  • e.g. email: the content is unstructured but the info stamps (e.g. sender, date) are structured
5
Q

Unstructured Data

A
  • not easily contained within data fields
  • video, audio, images
  • difficult to analyse, manage, search
6
Q

Data Analytics

A
  • process of collecting/examining data
  • to extract meaningful business insights
  • used to inform decision making
7
Q

Descriptive analysis/analytics of data

A
  • summarises or describes what the data shows
8
Q

Inferential Analysis of data

A
  • makes predictions about a population based on a sample
9
Q

What are the key effects of big data on decisions for businesses?

A
  • can be made quickly
  • respond earlier to environmental changes/ be more flexible
  • decisions are based on the current situation but still include an element of future situations
  • based on hard evidence
  • outside-the-box decisions, as all factors are used
10
Q

Frequencies of data

A
  • how often data occurs
  • can be grouped together into bands/classes if in large set
  • then shown in a frequency distribution or table but this means individual values are lost
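A minimal sketch of grouping values into bands, assuming made-up observations and a band width of 10 (neither comes from the cards). It shows how the frequency table keeps the counts per band but no longer shows the individual values.

```python
from collections import Counter

# Hypothetical observations (e.g. order values); not taken from the cards.
values = [3, 7, 12, 15, 18, 21, 24, 24, 29, 35]
band_width = 10  # assumed class width


def band_label(value, width):
    """Return the band a value falls into, e.g. 12 -> '10-19'."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Count how many observations fall into each band.
grouped = Counter(band_label(v, band_width) for v in values)

# Print the frequency table in band order.
for band in sorted(grouped, key=lambda b: int(b.split("-")[0])):
    print(band, grouped[band])
```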
11
Q

Grouped Data

A
  • frequency is shown in terms of range
12
Q

Ungrouped data

A
  • frequency shown in terms of specific measure/value
13
Q

Arithmetic Mean

A

add all observations and divide by the number of observations
$\bar{x} = \frac{\sum x}{n}$ (x bar)

Advs
- most frequently used/understood
- uses all data

Disadvs
- value may not be in distribution
- can be distorted by extreme high/low values
- ignores dispersion
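A short sketch of the mean calculation, using a hypothetical data set with one extreme value. It illustrates two of the disadvantages listed: the mean may not be a value in the distribution, and a single outlier can distort it.

```python
# Hypothetical observations with one extreme value (30).
observations = [4, 5, 5, 6, 30]

# Arithmetic mean: sum of observations divided by number of observations.
mean = sum(observations) / len(observations)

# Same calculation with the extreme value left out, for comparison.
trimmed = observations[:-1]
mean_without_outlier = sum(trimmed) / len(trimmed)

print(mean)                  # mean of all five observations
print(mean_without_outlier)  # mean of the first four only
print(mean in observations)  # is the mean an actual observed value?
```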

14
Q
A
15
Q

Mode

A
  • modal value
  • most frequently occurring value

advs
- not distorted by high/low
- actual value in distribution

disadvs
- ignores dispersion
- does not use all data
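As a rough illustration, the mode can be found with Python's standard `statistics.multimode`, which returns every value tied for most frequent. The data set is hypothetical.

```python
from statistics import multimode

# Hypothetical observations; 5 occurs twice, everything else once.
observations = [4, 5, 5, 6, 30]

# multimode returns a list because a data set can have more than one mode.
modes = multimode(observations)
print(modes)
```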

16
Q

Median

A
  • value of middle member of array
  • use (n+1)/2 to find the position of the middle item when the data is arranged in order
  • if there is an even number of items, take the mean of the two middle values

advs
- not distorted by low/high
- corresponds to actual value in distribution

disadvs
- ignores dispersion
- limited use
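The (n+1)/2 rule can be sketched as follows. The function name `median_by_position` and the sample data are illustrative assumptions, not part of the cards.

```python
def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)      # data must be arranged in order first
    n = len(ordered)
    position = (n + 1) / 2        # 1-based position of the middle item

    if n % 2 == 1:
        # Odd count: position is a whole number, so take that item.
        return ordered[int(position) - 1]

    # Even count: position falls between two items, so average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([4, 5, 5, 6, 30]))  # odd number of items
print(median_by_position([3, 8, 1, 10]))     # even number of items
```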

17
Q

Standard Deviation

A
  • measure of dispersion/ spread of data
  • measures spread of data around the mean

$\sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{x}^2}$

= square root of variance

advs
- uses all data
- gives weight to values far away from mean
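A minimal sketch of the standard deviation calculation from a frequency table, assuming the population form of the formula (sum of f times x squared over sum of f, minus the mean squared). The table itself is made up for illustration.

```python
from math import sqrt

# Hypothetical frequency table: value x -> frequency f.
frequency_table = {2: 1, 4: 3, 6: 1}

total_f = sum(frequency_table.values())

# Mean = sum(f * x) / sum(f)
mean = sum(f * x for x, f in frequency_table.items()) / total_f

# Mean of the squares = sum(f * x^2) / sum(f)
mean_of_squares = sum(f * x**2 for x, f in frequency_table.items()) / total_f

# Variance = mean of squares minus square of the mean.
variance = mean_of_squares - mean**2

# Standard deviation = square root of the variance.
std_dev = sqrt(variance)

print(mean, variance, std_dev)
```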

18
Q

Variance

A

variance is the square of the standard deviation

19
Q

Coefficient of Variation =

A

= standard deviation / mean

the bigger the coefficient, the wider the spread relative to the mean
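A brief sketch comparing two hypothetical data sets that have the same standard deviation but different means. The helper `coefficient_of_variation` is an illustrative name, and the population standard deviation (`statistics.pstdev`) is an assumption.

```python
from statistics import mean, pstdev

# Hypothetical data: same absolute spread, very different means.
product_a = [98, 100, 102]
product_b = [8, 10, 12]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


cv_a = coefficient_of_variation(product_a)
cv_b = coefficient_of_variation(product_b)

# The data set with the smaller mean has the wider relative spread.
print(round(cv_a, 4), round(cv_b, 4))
```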

20
Q

The Normal Distribution Properties

A
  • probability distribution
  • arises frequently in real life
  • majority of items lie near to average
  • bell-shaped curve on graph
  • the mean is μ (mu); the curve is symmetrical, with 50% of the area on each side of the mean
  • at a given number of standard deviations from the mean, the area under the curve always represents the same % of the population
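To illustrate the last property, the sketch below uses `statistics.NormalDist` from the Python standard library to compute the share of a normal population lying within 1, 2 and 3 standard deviations of the mean. The helper name `share_within` is an assumption for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of the population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Symmetry: half of the area lies below the mean.
print(standard_normal.cdf(0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```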
21
Q

z score

A
  • distance from mean in normal distribution measured by number of standard deviations

$z = \frac{x - \mu}{\sigma}$ (value of variable minus mean, divided by standard deviation)

  • can then be looked up in tables to find proportion
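A small sketch of the z score calculation with hypothetical figures (mean 100, standard deviation 15, value 130). Instead of a printed table, it uses `statistics.NormalDist().cdf` to find the proportion of the population below the value.

```python
from statistics import NormalDist

# Hypothetical figures, chosen only for illustration.
mean = 100
std_dev = 15
value = 130

# z = (value - mean) / standard deviation
z = (value - mean) / std_dev

# Proportion of a normal population lying below this z score.
proportion_below = NormalDist().cdf(z)

print(z, round(proportion_below, 4))
```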
22
Q

Expected Value

A
  • weighted average value of different possible outcomes from decision
  • weightings are based on probability of each possible outcome

= sum of (probability × outcome/result), i.e. $EV = \sum p x$
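The weighted-average calculation can be sketched as below. The probabilities and profit figures are invented for illustration; note that the result need not equal any single possible outcome.

```python
# Hypothetical outcomes of one decision: (probability, profit).
outcomes = [(0.5, 100), (0.3, 40), (0.2, -50)]

# The probabilities of all possible outcomes should sum to 1.
total_probability = sum(p for p, _ in outcomes)

# Expected value = sum of probability x outcome.
expected_value = sum(p * result for p, result in outcomes)

print(total_probability, expected_value)
```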

23
Q

What are the limitations of expected value?

A

limitations
- long-run average result, so not appropriate for one-off decisions
- heavily dependent on probability distribution
- ignores risk