Summarising & Analysing Data Flashcards
(23 cards)
Big Data
- mass of data that society creates every year
- extends beyond traditional data created by companies
- social networking sites, internet search engines, mobile devices
What are the main characteristics of big data?
Volume
- large amounts of data created and stored due to advances in technology
Velocity
- real time data, timeliness is key
Variety
- structured or unstructured
Value
- insights gained add value
Veracity
- truthfulness, careful of hidden biases
Structured Data
- contained within a field or data record
- easy to analyse, store, search
- in standard format or in specific location within data
- rows/columns
- e.g. expiry date on a card
Semi-Structured Data
- doesn’t reside in fixed field but contains some properties that can be organised/analysed
- e.g. email: content is unstructured but info stamps are structured
Unstructured Data
- not easily contained within data fields
- video, audio, images
- difficult to analyse, manage, search
Data Analytics
- process of collecting/examining data
- to extract meaningful business insights
- used to inform decision making
Descriptive analysis/analytics of data
- summarises or describes what the data shows
Inferential Analysis of data
- makes predictions about a population based on sample
What are the key effects of big data on decisions for businesses?
- can be made quickly
- respond earlier to environmental changes/ be more flexible
- decisions are based on the current situation but still take account of future situations
- based on hard evidence
- more "outside the box" decisions, as all factors are used
Frequencies of data
- how often data occurs
- can be grouped together into bands/classes if in large set
- then shown in a frequency distribution or table but this means individual values are lost
Grouped Data
- frequency is shown in terms of range
Ungrouped data
- frequency shown in terms of specific measure/value
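A minimal Python sketch of the difference between ungrouped and grouped frequencies. The order data and the band width of 5 are made-up assumptions for illustration only.

```python
from collections import Counter

# Hypothetical sample: items bought per customer order
orders = [1, 2, 2, 3, 3, 3, 7, 8, 12, 15]

# Ungrouped: frequency of each specific value
ungrouped = Counter(orders)

def band(value, width=5):
    """Return the (lower, upper) band containing value, e.g. 1-5, 6-10, 11-15."""
    lower = ((value - 1) // width) * width + 1
    return (lower, lower + width - 1)

# Grouped: frequency of each band; the individual values are no longer visible
grouped = Counter(band(v) for v in orders)

print(ungrouped[3])     # how often the value 3 occurs
print(grouped[(1, 5)])  # how many orders fall in the band 1-5
```

The grouped table is shorter, but it no longer shows which values inside each band occurred.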
Arithmetic Mean
adding all observations and dividing by number of observations.
- written as x̄ (x bar)
Advs
- most frequently used/understood
- uses all data
Disadvs
- value may not be in distribution
- can be distorted
- ignores dispersion
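A short Python sketch of the arithmetic mean, using made-up salary figures (an assumption, not data from these cards) to show how one extreme value can distort it.

```python
from statistics import mean

salaries = [20, 22, 24, 26, 28]   # hypothetical values, in thousands
with_outlier = salaries + [200]   # one extreme observation added

# Mean = sum of observations / number of observations
mean_plain = mean(salaries)            # 120 / 5 = 24
mean_distorted = mean(with_outlier)    # 320 / 6, about 53.3

print(mean_plain, mean_distorted)
```

The distorted mean of about 53.3 is not a value that appears in the data, and it is far from five of the six observations.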
Mode
- modal value
- most frequently occurring value
advs
- not distorted by high/low
- actual value in distribution
disadvs
- ignores dispersion
- does not use all data
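A small Python sketch of finding the mode with the standard library. The shoe-size data is a made-up assumption.

```python
from statistics import mode, multimode

shoe_sizes = [6, 7, 7, 8, 8, 8, 9, 13]   # hypothetical data

# Most frequently occurring value; the extreme value 13 has no effect
modal_value = mode(shoe_sizes)

# A data set can have more than one mode; multimode returns all of them
tied_modes = multimode([1, 1, 2, 2, 3])

print(modal_value, tied_modes)
```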
Median
- value of middle member of array
- use (n+1)/2 to find the position of the middle item when data is arranged in order
- if there is an even number of items, take the mean of the two middle values
advs
- not distorted by low/high
- corresponds to actual value in distribution
disadvs
- ignores dispersion
- of limited use in further statistical calculations
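The (n+1)/2 rule can be sketched in Python as below. The helper name median_by_position is hypothetical; the result is checked against the standard library's statistics.median.

```python
from statistics import median

def median_by_position(values):
    """Find the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)          # data must be arranged in order first
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position of the middle item
    if position.is_integer():
        return ordered[int(position) - 1]
    # Even number of items: mean of the two middle values
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

odd_result = median_by_position([9, 3, 5, 1, 7])        # middle of 1,3,5,7,9
even_result = median_by_position([9, 3, 5, 1, 7, 11])   # mean of 5 and 7

print(odd_result, even_result)
```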
Standard Deviation
- measure of dispersion/ spread of data
- measures spread of data around the mean
= √(Σfx² / Σf − x̄²), where f is the frequency of each value x and x̄ is the mean
= square root of variance
advs
- uses all data
- gives weight to values far away from mean
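A Python sketch of the frequency-based calculation, sd = √(Σfx² / Σf − mean²), assuming a small made-up frequency table. The result is cross-checked against statistics.pstdev (the population standard deviation).

```python
from math import sqrt
from statistics import pstdev

# Hypothetical frequency table: value -> frequency
frequencies = {2: 1, 4: 3, 6: 1}

total_f = sum(frequencies.values())                               # Σf
mean_x = sum(x * f for x, f in frequencies.items()) / total_f     # Σfx / Σf
mean_of_squares = sum(f * x**2 for x, f in frequencies.items()) / total_f  # Σfx² / Σf

variance = mean_of_squares - mean_x**2   # 17.6 - 16 = 1.6
std_dev = sqrt(variance)                 # square root of variance

# Cross-check: expand the table into raw observations
raw = [x for x, f in frequencies.items() for _ in range(f)]
check = pstdev(raw)

print(variance, std_dev, check)
```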
Variance
variance is square of standard deviation
Coefficient of Variation
= standard deviation / mean
- the bigger the coefficient, the wider the spread relative to the mean
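A Python sketch comparing two made-up data sets that have the same standard deviation but different means, to show why dividing by the mean is useful. The product figures are assumptions.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population standard deviation)."""
    return pstdev(values) / mean(values)

product_a = [98, 100, 102]   # hypothetical weekly sales
product_b = [8, 10, 12]      # same absolute spread, much smaller mean

cv_a = coefficient_of_variation(product_a)
cv_b = coefficient_of_variation(product_b)

print(cv_a, cv_b)
```

Both sets have the same standard deviation, but product_b has the larger coefficient, so its spread is wider relative to its mean.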
The Normal Distribution Properties
- probability distribution
- arises frequently in real life
- majority of items lie near to average
- bell-shaped curve on graph
- the mean is μ (mu); the curve is symmetrical, so 50% of the area lies on each side of the mean
- at certain points of standard deviation from the mean the area under the curve represents same % of population
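A Python sketch of the fixed percentages, using statistics.NormalDist from the standard library (Python 3.8+). The helper name share_within is hypothetical.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)   # standard normal distribution

def share_within(k):
    """Proportion of the population within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

below_mean = standard.cdf(0)   # half the area lies below the mean

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

Roughly 68%, 95% and 99.7% of the population lie within 1, 2 and 3 standard deviations of the mean, whatever the mean and standard deviation are.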
z score
- distance from mean in normal distribution measured by number of standard deviations
= (value of variable − mean) / standard deviation
- can then be looked up in tables to find proportion
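A Python sketch of a z score calculation, with statistics.NormalDist standing in for the printed tables. The figures (value 65, mean 50, standard deviation 10) are made-up assumptions.

```python
from statistics import NormalDist

def z_score(value, mean, std_dev):
    """Number of standard deviations the value lies from the mean."""
    return (value - mean) / std_dev

z = z_score(65, 50, 10)                  # (65 - 50) / 10 = 1.5

# Proportion of the population below this value (what a table lookup gives)
proportion_below = NormalDist().cdf(z)

print(z, round(proportion_below, 4))
```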
Expected Value
- weighted average value of different possible outcomes from decision
- weightings are based on probability of each possible outcome
= sum of (probability × outcome) across all possible outcomes
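A Python sketch of an expected value calculation. The probabilities and profit figures are made-up assumptions for a hypothetical product launch.

```python
# Each pair is (probability, outcome); outcomes are hypothetical profits
outcomes = [
    (0.5, 100_000),   # strong demand
    (0.3, 20_000),    # moderate demand
    (0.2, -50_000),   # weak demand (a loss)
]

total_probability = sum(p for p, _ in outcomes)   # should come to 1

# Expected value = sum of (probability x outcome)
expected_value = sum(p * x for p, x in outcomes)  # 50,000 + 6,000 - 10,000

print(expected_value)
```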
What are the limitations of expected value?
limitations
- long run average result and so not appropriate for one off decisions
- heavily dependent on probability distribution
- ignores risk
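A Python sketch of the "ignores risk" limitation: two made-up options with the same expected value but very different worst cases. All figures are assumptions.

```python
def expected_value(outcomes):
    """Sum of probability x outcome over (probability, outcome) pairs."""
    return sum(p * x for p, x in outcomes)

safe_option = [(1.0, 10_000)]                        # certain profit
risky_option = [(0.5, 120_000), (0.5, -100_000)]     # big win or big loss

ev_safe = expected_value(safe_option)
ev_risky = expected_value(risky_option)

worst_safe = min(x for _, x in safe_option)
worst_risky = min(x for _, x in risky_option)

print(ev_safe, ev_risky, worst_safe, worst_risky)
```

Expected value ranks the two options equally, even though a one-off choice of the risky option could produce a large loss.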