Data Flashcards

Question

Properties of high-quality data

Answer 1

1. Relevance 2. Accuracy 3. Timeliness 4. Accessibility and clarity of results 5. Comparability 6. Coherence 7. Completeness

Answer 2

1. Prevention: keep bad data out (most) 2. Detection: look for bad data already entered 3. Repair: let the bad data find you and fix things (least)

Answer 3

- Edit data before it enters the database, to prevent issues rather than fix them afterwards
- Encourage staff to improve the data management process
- Improve the data collection instrument
- Improve the data collection method
- Build in redundancy

Answer 4

- Deterministic tests
- Probabilistic tests to detect outliers
- Exploratory data analysis
- Frequency counts
- Two-way tabulations
- Record linkage

Answer 5

- Range test
- If-then test
- Ratio control test
- Zero control test
- Internal consistency test
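As a rough illustration of the first two tests in this list, here is a minimal Python sketch. The record fields, the age bounds, and the retirement rule are all hypothetical and chosen only for the example; the other tests in the list are not covered.

```python
def range_test(value, low, high):
    """Range test: the value must lie inside [low, high]."""
    return low <= value <= high


def if_then_test(record):
    """If-then test (hypothetical rule): if status is 'retired', age must be at least 50."""
    if record["status"] == "retired":
        return record["age"] >= 50
    return True


def failed_checks(record):
    """Return the names of the deterministic tests this record fails."""
    failures = []
    if not range_test(record["age"], 0, 120):  # assumed plausible age bounds
        failures.append("range")
    if not if_then_test(record):
        failures.append("if-then")
    return failures


# Made-up records for illustration only.
records = [
    {"id": 1, "age": 34, "status": "active"},
    {"id": 2, "age": 212, "status": "active"},   # fails the range test
    {"id": 3, "age": 41, "status": "retired"},   # fails the if-then test
]

flagged = {}
for record in records:
    failures = failed_checks(record)
    if failures:
        flagged[record["id"]] = failures

print(flagged)
```

Because the rules are deterministic, a record either passes or fails; no judgment about how unusual a value is comes into it.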

Answer 6

Data quality should:
1) be a regular item on the Management Board’s agenda
2) receive ongoing priority attention within the organization
3) be a structural component of operational management
4) be applied to the reporting and operational processes

Answer 7

Pro: easy to understand
Con: not a coherent risk measure
Con: doesn't describe the tail of the distribution well
Con: considers upside and downside risk equal
Con: underestimates risk if the underlying distribution is leptokurtic (thicker tails than the normal distribution)

Answer 8

The relationship between variables (like risks)

Answer 9

1) Pearson's Rho 2) Spearman's Rho 3) Kendall's Tau 4) Tail Correlation

Answer 10

The linear correlation coefficient. A value of 0 does not necessarily mean the variables are unrelated, only that there is no linear relationship between them.
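A small sketch of the caveat, using made-up data in which y is completely determined by x and yet the linear correlation coefficient is zero. The function name `pearson_rho` is just an illustrative helper.

```python
from math import sqrt


def pearson_rho(xs, ys):
    """Sample linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# y is an exact (but non-linear) function of x, symmetric around zero.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

rho = pearson_rho(xs, ys)
print(rho)  # zero linear correlation despite full dependence
```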

Answer 11

- Equal to Pearson's Rho when the marginal distributions are uniform.
- Only the order (rank) of the observations matters, so Spearman's Rho is independent of the underlying statistical distribution.
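To illustrate the rank-based nature of the measure, the sketch below computes Spearman's Rho as the linear correlation of the ranks and compares it with Pearson's Rho on the raw values. It assumes there are no tied observations; the helper names and the data are illustrative.

```python
from math import exp, sqrt


def pearson_rho(xs, ys):
    """Sample linear correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank each value from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's rho applied to the ranks."""
    return pearson_rho(ranks(xs), ranks(ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [exp(x) for x in xs]  # increasing but strongly non-linear in x

spearman = spearman_rho(xs, ys)  # order is perfectly preserved
pearson = pearson_rho(xs, ys)    # penalized for the non-linearity
print(spearman, pearson)
```

Any increasing transformation of either variable leaves the ranks, and therefore Spearman's Rho, unchanged.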

Answer 12

Calculated by comparing pairs of data points. If X and Y both increase or both decrease from one data point to another, the pair of observations is concordant. If not, it is discordant.
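The pairwise comparison can be sketched as follows. This uses the simple "tau-a" form, (concordant - discordant) divided by the total number of pairs, which is an assumption here since the card does not give the formula; tied pairs are counted as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau (tau-a form) from concordant and discordant pair counts."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:      # X and Y moved the same way
            concordant += 1
        elif direction < 0:    # X and Y moved in opposite ways
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up data: one of the six pairs is out of order.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

tau = kendall_tau(xs, ys)
print(tau)  # (5 - 1) / 6
```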

Answer 13

Correlation between 2 variables may not be constant for all values. We can measure the correlation of the tails.

Answer 14

- If the variables are numeric and you care about the linear relationship specifically, choose Pearson's.
- If any variables are ordinal, choose Spearman's or Kendall's.
- If you care about the degree of deviation between data points, choose Spearman's. If not, choose Kendall's.

Answer 15

- Describes the degree of flatness of a distribution
- An indication of the likelihood of extreme observations relative to those that would be expected with the normal distribution

Answer 16

- Mesokurtic distribution: normal distribution, with kurtosis of 3
- Platykurtic distribution: thinner tails than the normal distribution, so kurtosis less than 3
- Leptokurtic distribution: thicker tails than the normal distribution, so kurtosis greater than 3
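As a quick check of these thresholds, the sketch below computes sample kurtosis as the fourth central moment divided by the squared variance (the convention under which the normal distribution scores 3). Both samples are made up for illustration.

```python
def kurtosis(values):
    """Sample kurtosis: fourth central moment over squared second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2


# Evenly spread values with no extremes: expected to be platykurtic.
flat_sample = [1, 2, 3, 4, 5]

# Mostly small values with two extreme observations: expected to be leptokurtic.
heavy_sample = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]

print(kurtosis(flat_sample))   # below 3
print(kurtosis(heavy_sample))  # above 3
```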

Answer 17

It means the distribution has thicker tails than the normal distribution. If it is present and not properly accounted for, then the probability of extreme events will be underestimated.

Answer 18

Often referred to as the difference between the returns on 2 assets

Answer 19

- 2 risks are comonotonic if one can be expressed as an increasing deterministic function of the other
- If X lies at its q-quantile, Y will also be at its q-quantile
- More than 2 risks can be comonotonic

Answer 20

- 2 risks are countermonotonic if one can be expressed as a strictly decreasing function of the other
- If X lies at its q-quantile, Y will be at its (1-q)-quantile
- More than 2 risks cannot be countermonotonic

Answer 21

For full credibility, there needs to be a sufficient number of observations. Technically, true full credibility can never be achieved, but we say full credibility exists at a given confidence level for a given distance from the expected value. There are 3 types of credibility: classical, Bühlmann, and Bayesian.
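The "confidence level for a given distance from the expected value" idea can be made concrete with the usual classical (limited fluctuation) standard. The sketch below assumes Poisson claim counts and a normal approximation, neither of which is stated on the card, and the function name is illustrative.

```python
from statistics import NormalDist


def full_credibility_claims(confidence, tolerance):
    """Expected claim count needed for full credibility of claim frequency.

    Assumes Poisson claim counts and a normal approximation: the observed
    count should fall within +/- tolerance of its expected value with the
    given two-sided confidence.
    """
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return (z / tolerance) ** 2


# 90% confidence of being within 5% of the expected value.
n_full = full_credibility_claims(0.90, 0.05)
print(round(n_full))  # roughly 1,082 expected claims
```

Tightening either the confidence level or the tolerance raises the standard, which is one way to see why full credibility is only ever defined relative to those two choices.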

Answer 22

Sometimes considered the same as TVaR. While TVaR is the average value in the tail, expected shortfall is the probability of loss multiplied by the expected loss given that a loss has occurred.

Answer 23

It has many of the same benefits as TVaR, but it has little intuitive meaning.

Answer 24

1. Empirical: sum the losses in the tail and divide by the total number of observations
2. Parametric: (1 - alpha) × TVaR
3. Stochastic: apply the empirical approach to the output from the stochastic model
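The first two approaches can be compared on a small made-up loss sample. In this sketch the tail is taken to be the worst (1 - alpha) share of observations, and TVaR is the simple average of those tail losses; both conventions are assumptions for illustration.

```python
def tail_losses(losses, alpha):
    """Return the worst (1 - alpha) share of the observed losses."""
    ordered = sorted(losses)
    cutoff = round(alpha * len(ordered))
    return ordered[cutoff:]


def empirical_expected_shortfall(losses, alpha):
    """Sum of the tail losses divided by the TOTAL number of observations."""
    return sum(tail_losses(losses, alpha)) / len(losses)


def empirical_tvar(losses, alpha):
    """Average loss within the tail only."""
    tail = tail_losses(losses, alpha)
    return sum(tail) / len(tail)


# Made-up losses; with alpha = 0.8 the tail is the two largest values.
losses = [1, 2, 3, 4, 5, 6, 7, 8, 50, 150]
alpha = 0.8

es = empirical_expected_shortfall(losses, alpha)
tvar = empirical_tvar(losses, alpha)

print(es)                  # 200 / 10
print((1 - alpha) * tvar)  # same value, up to floating-point rounding
```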

Answer 25

It is the probability that a given extreme loss will occur. It is the converse of VaR: VaR gives the maximum loss for a given level of confidence, whereas this measure gives the probability of exceeding a given loss.
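The link with VaR can be seen on a small made-up loss sample: fix the confidence level and read off the loss (VaR), or fix the loss and read off the probability of exceeding it. The empirical quantile convention used for VaR here is one of several in use and is an assumption of this sketch.

```python
from math import ceil


def empirical_var(losses, confidence):
    """Smallest observed loss not exceeded with at least the given confidence."""
    ordered = sorted(losses)
    # round() guards against floating-point noise before taking the ceiling
    index = ceil(round(confidence * len(ordered), 9)) - 1
    return ordered[index]


def probability_of_exceeding(losses, threshold):
    """Share of observed losses strictly above the threshold."""
    return sum(1 for loss in losses if loss > threshold) / len(losses)


losses = [1, 2, 3, 4, 5, 6, 7, 8, 50, 150]

var_80 = empirical_var(losses, 0.80)              # loss at 80% confidence
p_exceed = probability_of_exceeding(losses, var_80)

print(var_80)    # 8
print(p_exceed)  # 0.2, i.e. 1 - 0.80
```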

Answer 26

It generally has the same limitations as VaR, and the assessment of loss if it occurs is not usually a priority. We use VaR and TVaR to determine the loss for which capital is required.

Answer 27

Values are generally higher than an underlying trend at some points in the period and below it in others. To handle this, we can use an ARIMA model because seasonality is essentially an autoregressive process.

Answer 28

1) Immediate: direct immediate causal relationship
2) Time-lagged: delayed causal relationship
3) Feedback: variables interact with each other over time
4) Phase-shift: one variable affects another only after a change has reached a threshold

Answer 29

Experience has shown that in many situations, dependencies under stress differ from those under normal conditions. Ex: Low interest rates, high credit risk, and widespread panic among investors (a financial crisis) can happen in many countries at the same time and quickly deplete a firm's capital.

Answer 30

1) Correlation matrix approach 2) Copula approach 3) Structured scenario approach 4) Multivariate distribution approach

Answer 31

Assume linear correlation. Use a correlation matrix or a collection of matrices. Different matrices can be used for different percentiles to reflect higher correlation in tail events.
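One common way to use such a matrix is the square-root (variance-covariance style) aggregation of standalone capital amounts, sketched below. The formula, the capital figures, and the correlation values are assumptions for illustration; the card itself does not specify them.

```python
from math import sqrt


def aggregate_capital(standalone, correlation):
    """Aggregate standalone capital amounts with a linear correlation matrix.

    Computes sqrt(sum over i, j of rho_ij * c_i * c_j).
    """
    total = 0.0
    for i, c_i in enumerate(standalone):
        for j, c_j in enumerate(standalone):
            total += correlation[i][j] * c_i * c_j
    return sqrt(total)


standalone = [30.0, 40.0]  # hypothetical capital for two risks

independent = [[1.0, 0.0], [0.0, 1.0]]
normal_times = [[1.0, 0.25], [0.25, 1.0]]  # assumed matrix for typical percentiles
stressed = [[1.0, 0.5], [0.5, 1.0]]        # assumed matrix for tail percentiles
fully_dependent = [[1.0, 1.0], [1.0, 1.0]]

print(aggregate_capital(standalone, independent))      # 50.0
print(aggregate_capital(standalone, normal_times))
print(aggregate_capital(standalone, stressed))
print(aggregate_capital(standalone, fully_dependent))  # 70.0, the simple sum
```

Using a different matrix for tail percentiles, as in the `stressed` case, raises the aggregate figure and reduces the diversification benefit.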

Terms, statistics, properties, management (55 cards)