Organising, Visualising and Describing Data Flashcards

1
Q

3 Classes of Data Types

A
  1. Numerical (quantitative) vs Categorical (qualitative)
  2. Time series vs X-Sectional
  3. Structured vs Unstructured
2
Q

What are the 2 types of Categorical Data?

A

Nominal - no logical order
Ordinal - logical order

3
Q

Can you perform mathematical operations on categorical data?

A

No

4
Q

What is the difference between Time Series and x-sectional data?

A

Time series - a sequence of observations of one variable taken at successive (usually equally spaced) points in time.
X-Sectional - a set of comparable observations of different units, all taken at one specific point in time.

5
Q

What is Panel Data?

A

Combines x-sectional and time series data: the same set of units is observed over multiple time periods.
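
As a rough illustration of how the three layouts relate, the sketch below stores a small panel and slices it both ways. The firms, years and revenue figures are made up for the example.

```python
# Hypothetical panel: revenue for two firms over three years,
# keyed by (firm, year). All figures are invented for illustration.
panel = {
    ("A", 2021): 10.0, ("A", 2022): 11.5, ("A", 2023): 12.1,
    ("B", 2021): 7.0,  ("B", 2022): 6.8,  ("B", 2023): 8.3,
}

# Time series: one unit (firm A) followed through time.
time_series_a = {year: v for (firm, year), v in panel.items() if firm == "A"}

# Cross-section: all units observed at one point in time (2022).
cross_section_2022 = {firm: v for (firm, year), v in panel.items() if year == 2022}

print(time_series_a)
print(cross_section_2022)
```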

6
Q

Unstructured data can be classified according to how the data is generated. Give an example

A

Individuals (social media post)
Processes (withdrawal)
Sensors (Camera)

7
Q

Define the following:
1.Absolute, Relative and Cumulative Frequency

A

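Absolute - the actual number of observations in each interval (this is what a histogram plots)
Relative - the absolute frequency as a % of the total number of observations
Cumulative - the running total of the absolute or relative frequencies; the cumulative relative frequency adds up to 100%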
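
The three frequency measures can be sketched on a small data set. The observations and the interval labels below are hypothetical; only the arithmetic matters.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical sample: ten observations already sorted into three intervals.
observations = ["low", "low", "mid", "mid", "mid", "mid", "mid",
                "high", "high", "high"]
intervals = ["low", "mid", "high"]  # ordered, so a running total is meaningful

counts = Counter(observations)
total = len(observations)

absolute = [counts[k] for k in intervals]           # raw counts per interval
relative = [a / total for a in absolute]            # share of all observations
cumulative_absolute = list(accumulate(absolute))    # running count
cumulative_relative = [c / total for c in cumulative_absolute]  # ends at 1.0 (100%)

for row in zip(intervals, absolute, relative, cumulative_absolute, cumulative_relative):
    print(row)
```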

8
Q

What is a joint frequency and what is a marginal frequency ?

A

Joint - data cell of a contingency table (two-dimensional array = a normal table with columns and rows). Basically when the 2 variables (row and column label) occur simultaneously.
Marginal - Total frequency for a row or column.
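
A minimal sketch of joint and marginal frequencies, assuming a made-up list of stocks classified by sector (rows) and size (columns):

```python
from collections import Counter

# Hypothetical observations: one (sector, size) pair per stock.
stocks = [
    ("tech", "large"), ("tech", "small"), ("tech", "small"), ("tech", "large"),
    ("energy", "large"), ("energy", "large"), ("energy", "small"), ("energy", "large"),
]

# Joint frequency: how often each row label and column label occur together.
joint = Counter(stocks)

# Marginal frequency: the total for each row, and the total for each column.
row_marginal = Counter(sector for sector, _ in stocks)
col_marginal = Counter(size for _, size in stocks)

print(joint[("energy", "large")])   # one cell of the contingency table
print(row_marginal["energy"])       # row total
print(col_marginal["large"])        # column total
```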

9
Q

What is a contingency table and what is a confusion matrix?

A

Table to analyse 2 variables
A confusion matrix is an example of a contingency table. One variable is predicted…. and the other variable is actual… . so shows actual vs predicted
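
To make the actual vs predicted layout concrete, here is a small sketch that counts the four cells of a two-class confusion matrix. The labels are invented, with 1 standing for the positive class and 0 for the negative class.

```python
from collections import Counter

# Hypothetical labels for eight observations (1 = positive, 0 = negative).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

# Each cell is a joint frequency of (actual, predicted).
matrix = Counter(zip(actual, predicted))

true_positive  = matrix[(1, 1)]
false_negative = matrix[(1, 0)]
false_positive = matrix[(0, 1)]
true_negative  = matrix[(0, 0)]

print("actual 1:", [true_positive, false_negative])
print("actual 0:", [false_positive, true_negative])
```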

10
Q

Benefits of a : Histogram, Frequency Polygon, Cumulative frequency distro chart, Bar Chart, grouped bar chart or clustered bar chart, a stacked bar chart

A
  1. Quickly see where the concentration lies
  2. Joins the midpoints of the histogram intervals
  3. Can be either relative or absolute
  4. Illustrates RELATIVE sizes/degrees/magnitudes
  5. Can illustrate 2 categories at once (adds another variable)
  6. Shows both the cumulative and joint frequency in the same bar
11
Q

Benefits and features of: A Tree Map; word cloud; line charts; bubble line chart; scatter plot + scatter plot matrix (3 variables); Heat Map

A
  1. Visualise relative size of categories
  2. Visualise text - categorical data
  3. illustrate time series data. Can plot multiple lines if scale is comparable
  4. adds another dimension to a line chart, each point has a bubble that is in proportion to its variable
  5. shows the relation between 2 variables and the strength of it.
  6. Is drawn off of a contingency table and uses colour to visualise the concentration of data.
12
Q

Place all charts you can think of into the following 3 categories: Relationships, Comparisons and Distributions. Can be more than one

A

Relationships: Scatter/scatter plot matrix, heat maps
Comparisons: Bar chart, tree maps, heat maps, dual line charts, bubble line charts
Distributions: Histogram, frequency polygon, cumu distro charts, bar charts, tree maps, heat maps for categorical data, word clouds for text data

13
Q

Formula for $\mu$ (population mean) and $\bar{X}$ (sample mean), and what is an Arithmetic Mean and its 4 properties?

A

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ (sum of all observations / no. of obs in the population)
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ (same, but for a sample of n observations)
An arithmetic mean = sum of observations / no. of obs
1. All interval and ratio data sets have an arithmetic mean
2. All data values are considered and included in the arithmetic mean
3. A data set has only one arithmetic mean
4. The deviations of the data points from the mean sum to 0: $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$
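
Property 4 is easy to check numerically. The sample below is arbitrary; any data set should give deviations that sum to zero (up to floating-point rounding when the mean is not a whole number).

```python
from statistics import mean

# Arbitrary sample chosen so the mean is a whole number.
sample = [4, 8, 15, 16, 23, 42]

x_bar = mean(sample)                       # sum of observations / no. of obs
deviations = [x - x_bar for x in sample]   # X_i minus the sample mean

print(x_bar)
print(sum(deviations))
```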

14
Q

2 techniques to deal with the pitfalls of the arithmetic mean

A

Trimmed Mean - excludes a stated % of the most extreme observations, e.g. a 1% trimmed mean would exclude the top 0.5% and bottom 0.5%
Winsorized Mean - substitutes values for the extreme observations rather than excluding them (doesn’t sound very good)
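
A minimal sketch of both techniques, under some simplifying assumptions: the function names are made up for this example, the number of observations cut from each tail is rounded down, and Winsorizing replaces each extreme value with the nearest value that is kept. Textbook versions are usually stated in terms of percentiles, so treat this as an approximation.

```python
def trimmed_mean(values, fraction):
    """Mean after dropping fraction/2 of the observations from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction / 2)      # observations cut per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def winsorized_mean(values, fraction):
    """Mean after replacing fraction/2 of each tail with the nearest kept value."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction / 2)
    low, high = ordered[k], ordered[-k - 1]   # boundary values that are kept
    clipped = [min(max(v, low), high) for v in ordered]
    return sum(clipped) / len(clipped)


# Hypothetical data with two large outliers at the top.
data = [1, 2, 3, 4, 5, 6, 7, 8, 20, 100]

plain = sum(data) / len(data)
trimmed = trimmed_mean(data, 0.20)        # drops the 1 and the 100
winsorized = winsorized_mean(data, 0.20)  # 1 becomes 2, 100 becomes 20

print(plain, trimmed, winsorized)
```

Both adjusted means sit well below the plain mean, because the single value of 100 no longer dominates the result.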

15
Q

How do you calculate Weighted Mean

A

$\bar{X}_w = \sum_{i=1}^{n} w_i X_i = w_1 X_1 + w_2 X_2 + \cdots + w_n X_n$, where the weights $w_i$ sum to 1

e.g. A portfolio consists of 50% common stocks, 40% bonds, and 10% cash. If the return on
common stocks is 12%, the return on bonds is 7%, and the return on cash is 3%, what is the
portfolio return?
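
Working the portfolio example through the weighted mean formula gives 0.50(12%) + 0.40(7%) + 0.10(3%) = 6.0% + 2.8% + 0.3% = 9.1%. The same calculation as a short sketch (variable names are just for this example):

```python
# Weights and returns from the portfolio example: stocks, bonds, cash.
weights = [0.50, 0.40, 0.10]
returns = [0.12, 0.07, 0.03]

# The weights should sum to 1 for a weighted mean.
assert abs(sum(weights) - 1.0) < 1e-12

portfolio_return = sum(w * r for w, r in zip(weights, returns))
print(f"{portfolio_return:.1%}")
```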

16
Q

When is the geometric mean used? give a general formula for the geometric mean for calculating the returns of a data set

A

To calculate investment returns over multiple periods or when measuring compound growth rates.
$1 + G = \sqrt[n]{(1 + x_1)(1 + x_2) \cdots (1 + x_n)}$
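
A short sketch of the geometric mean return, using three made-up annual returns of +10%, -5% and +20%. The result is the single constant rate that compounds to the same ending value, here roughly 7.8% a year.

```python
import math

# Hypothetical returns for three consecutive periods.
returns = [0.10, -0.05, 0.20]

# Multiply the (1 + x_i) terms, take the n-th root, then subtract 1.
growth_factor = math.prod(1 + r for r in returns)
geometric_mean_return = growth_factor ** (1 / len(returns)) - 1

print(f"{geometric_mean_return:.4f}")
```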

17
Q

What is a harmonic mean and when is it used

A

$\bar{X}_H = \frac{n}{\sum_{i=1}^{n} (1/X_i)}$, where there are n values of $X_i$.
Good for calculating average cost of shares
eg An investor purchases $1,000 of mutual fund shares each month, and over the last three
months, the prices paid per share were $8, $9, and $10. What is the average cost per share?
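
For the mutual fund example, the harmonic mean is 3 / (1/8 + 1/9 + 1/10) = 3 / (121/360) ≈ $8.93 per share. The sketch below gets the same figure the long way, as total dollars spent divided by total shares bought, and compares it with the standard library's `harmonic_mean`.

```python
from statistics import harmonic_mean

monthly_spend = 1000       # dollars invested each month
prices = [8, 9, 10]        # price paid per share in each month

# Shares bought each month: more shares when the price is lower.
shares_bought = [monthly_spend / p for p in prices]

# Average cost per share = total spent / total shares.
average_cost = monthly_spend * len(prices) / sum(shares_bought)

print(round(average_cost, 4))
print(round(harmonic_mean(prices), 4))
```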

18
Q

Harmonic<Geometric<Arithmetic (in terms of the result). This fact has resulted in investors claiming benefit of … ?

A

Cost Averaging: buying shares on a regular basis
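
The ordering behind the cost averaging claim can be checked on the share prices from the previous card ($8, $9, $10). Investing an equal dollar amount each period gives an average cost equal to the harmonic mean, which comes out below the arithmetic mean of the prices.

```python
from statistics import mean, geometric_mean, harmonic_mean

prices = [8, 9, 10]

arithmetic = mean(prices)            # simple average of the prices
geometric = geometric_mean(prices)
harmonic = harmonic_mean(prices)     # average cost with equal dollar purchases

print(harmonic, geometric, arithmetic)
```

The three means are equal only when every value in the data set is the same.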
