Extra questions i mess up 2 weeks before exam Flashcards
(287 cards)
At the beginning of the year, Investor A purchased one share of a stock for $84. Investor B made the same investment, but borrowed half the necessary funds at an annual rate of 2.0%. At the end of the year, during which the stock paid no dividends, both investors sold their shares for $82. If inflation was -1.25% over the holding period, which of the following return measures for this investment was most likely highest?
A
Investor A’s real return
B
Investor A’s nominal return
C
Investor B’s nominal leveraged return
A
Investor A’s real return
Note that the nominal return is approximately equal to the sum of the real return and inflation. If inflation is negative, the real return is greater than the nominal return.
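A quick way to check the ranking is to compute all three returns from the numbers in the question. The sketch below uses the exact relation (1 + nominal) = (1 + real) * (1 + inflation) and assumes Investor B repays the loan plus one year of interest out of the sale proceeds; the variable names are illustrative only.

```python
# Inputs taken from the question.
purchase_price = 84.0
sale_price = 82.0
inflation = -0.0125
borrow_rate = 0.02
borrowed_fraction = 0.5

# Investor A: unleveraged nominal return.
nominal_return = sale_price / purchase_price - 1

# Investor A: real return, from (1 + nominal) = (1 + real) * (1 + inflation).
real_return = (1 + nominal_return) / (1 + inflation) - 1

# Investor B: return on own equity after repaying the loan plus interest.
debt = borrowed_fraction * purchase_price
equity = purchase_price - debt
leveraged_return = (sale_price - debt * (1 + borrow_rate)) / equity - 1

print(f"A nominal:   {nominal_return:.2%}")
print(f"A real:      {real_return:.2%}")
print(f"B leveraged: {leveraged_return:.2%}")
```

All three returns are negative, but the real return is the least negative because deflation raises purchasing power, and leverage magnifies the loss for Investor B.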
in a test concerning the value of a population variance, what is the appropriate test statistic to use?
what is the formula?
Chi square test
chi square = ((n - 1) * sample variance) / (hypothesized population variance), with n - 1 degrees of freedom
Data curation is most accurately described as the process of:
A
identifying and correcting for data errors.
B
formatting and summarizing data in graphical form.
C
transforming data into a format that can be used for analysis.
A
identifying and correcting for data errors.
The objective of the data curation process is to ensure high quality, accurate data.
Any errors in a dataset are identified and appropriate action is taken. For example, it may be necessary to make adjustments for missing data that was unavailable or had to be removed.
An analyst collects a sample of 12 monthly return datapoints that have been drawn from a larger population. Wanting to reduce the bias of the expected value based on this small sample size, the analyst decides to resample the data using the jackknife method. Which of the following statements regarding this resampling process is most accurate?
A
Each repetition will include 11 observations
B
The process will be completed in 11 repetitions
C
The sample for each of the 12 repetitions will be drawn with replacement
A
Each repetition will include 11 observations
The Jackknife resampling method repeatedly samples, leaving out one observation at a time. For the 12 monthly observations in the original sample from this example, the jackknife method will require 12 repetitions, each with 11 observations. The process is done without replacement, meaning that, after an observation has been excluded from one repetition, it is no longer eligible to be excluded from any subsequent repetitions.
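The leave-one-out procedure can be sketched in a few lines. The 12 monthly returns below are made-up numbers for illustration, and `jackknife_samples` is a hypothetical helper name, not a library function.

```python
from statistics import mean

# Hypothetical sample of 12 monthly returns (illustrative values only).
monthly_returns = [0.012, -0.004, 0.021, 0.007, -0.015, 0.009,
                   0.003, 0.018, -0.008, 0.011, 0.005, -0.002]


def jackknife_samples(data):
    """Return one resample per observation, each leaving that observation out."""
    return [data[:i] + data[i + 1:] for i in range(len(data))]


resamples = jackknife_samples(monthly_returns)
jackknife_means = [mean(sample) for sample in resamples]

print(len(resamples))                      # number of repetitions
print({len(sample) for sample in resamples})  # size(s) of each resample
```

Because each observation is dropped exactly once, the result is deterministic: rerunning the code gives the same resamples every time.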
In analyzing the volatility of mutual fund returns, your null hypothesis is that the standard deviation of monthly returns for Fund A is greater than 2.0%. Assuming that returns are normally distributed, this hypothesis would most likely be tested using:
A
a t-test.
B
an F-test.
C
a chi-square test.
C
a chi-square test.
For hypotheses regarding the variance of a single normally distributed population, a chi-square test is used.
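As a rough sketch of how the statistic is computed: the chi-square test statistic for a single variance is (n - 1) times the sample variance divided by the hypothesized variance, with n - 1 degrees of freedom. The 2.0% hypothesized standard deviation comes from the question; the sample size and sample standard deviation below are assumed values for illustration.

```python
# Hypothesized standard deviation from the question.
hypothesized_sd = 0.02

# Assumed sample inputs (not from the source): 24 months, sample sd of 2.5%.
n = 24
sample_sd = 0.025

degrees_of_freedom = n - 1
chi_square = degrees_of_freedom * sample_sd ** 2 / hypothesized_sd ** 2

print(f"chi-square = {chi_square:.4f} with {degrees_of_freedom} degrees of freedom")
```

The statistic would then be compared with the critical value from a chi-square table for 23 degrees of freedom at the chosen significance level.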
how to find the money-weighted return
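Set the present value of all cash flows equal to zero and solve for r (the internal rate of return). Example with cash flows at t = 0, 1, and 2:
Initial cash flow + (net cash flow at t = 1)/(1 + r) + (final cash flow)/(1 + r)^2 = 0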
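The money-weighted return is the rate r that makes the present value of all cash flows sum to zero, so it usually has to be found numerically. The sketch below uses simple bisection; the cash flows are made-up numbers (outflows negative, inflows positive), and `npv` and `money_weighted_return` are hypothetical helper names.

```python
def npv(rate, cash_flows):
    """Present value of cash flows occurring at t = 0, 1, 2, ..."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def money_weighted_return(cash_flows, lo=-0.99, hi=1.0, tol=1e-10):
    """Find the rate where NPV = 0 by bisection (assumes one root in [lo, hi])."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign over the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid                # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid   # root lies in the upper half
    return (lo + hi) / 2


# Hypothetical example: invest 100 now, add 50 after one year, receive 165 after two.
flows = [-100.0, -50.0, 165.0]
mwr = money_weighted_return(flows)
print(f"money-weighted return = {mwr:.4%}")
```

On an exam the same equation is normally solved with the calculator's cash flow (IRR) function rather than by hand.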
In a linear regression with one independent variable, how is the correlation (r) calculated?
r is calculated as the square root of the coefficient of determination (R^2), taking the same sign as the slope coefficient
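A small numerical check makes the relationship concrete. In the sketch below (made-up data with a negative relationship), r recovered from R^2 matches the directly computed correlation only once it is given the sign of the slope, since a square root alone is never negative.

```python
from math import copysign, sqrt

# Hypothetical data with a downward-sloping relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.8, 8.1, 6.2, 3.9, 2.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

# Ordinary least squares fit and coefficient of determination.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
r_squared = 1 - sse / syy

# Correlation recovered from R^2 (sign taken from the slope) vs. computed directly.
r_from_r_squared = copysign(sqrt(r_squared), slope)
r_direct = sxy / sqrt(sxx * syy)

print(r_from_r_squared, r_direct)
```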
If a distribution is positively skewed, which of the following measures of central tendency most likely has the lowest value?
A
Mean
B
Mode
C
Median
B
Mode
For a positively skewed distribution, the mode (high point of the curve) is smaller than the median (50th percentile), which is smaller than the mean (average value).
The mode is small because of the high concentration of small data points. The mean is high because of the fat right tail.
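The ordering can be seen on a tiny made-up dataset with a long right tail; the values below are illustrative only.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: many small values, a few large ones.
data = [1, 1, 1, 2, 2, 3, 4, 8, 15]

data_mode = mode(data)
data_median = median(data)
data_mean = mean(data)

print(data_mode, data_median, data_mean)
```

The two large observations pull the mean up without moving the mode or the median much.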
Data generated by individuals are typically:
A
structured.
B
unstructured.
C
semi-structured.
B
unstructured.
Data generated by individuals are typically produced in the form of video, photos, text files, or audio files, which tend to be unstructured as they cannot be easily organized into tables.
By contrast, business process data tend to be structured.
Semi-structured data have both structured and unstructured qualities.
Text analytics is appropriate for application to:
A
economic trend analysis.
B
large, structured datasets.
C
public but not private information.
A
economic trend analysis.
A is correct. Through the text analytics application of NLP, models using NLP analysis might incorporate non-traditional information to evaluate what people are saying—via their preferences, opinions, likes, or dislikes—in the attempt to identify trends and short-term indicators about a company, a stock, or an economic event that might have a bearing on future performance.
Which of the following is most likely the “fourth V” that has become an increasingly important characteristic of Big Data as large datasets have been increasingly relied upon for predictive purposes?
A
Value
B
Veracity
C
Volatility
B
Veracity
Traditionally, Big Data datasets have been defined by three key characteristics: volume, velocity, and variety.
A fourth defining characteristic, veracity, has become increasingly important more recently, as the reliability and credibility of data sources are essential when using Big Data as the basis for predictions and inferences.
All else equal, is specifying a smaller significance level in a hypothesis test likely to increase the probability of a Type I error and a Type II error?
A
Type I error: No ; Type II error: No
B
Type I error: No ; Type II error: Yes
C
Type I error: Yes ; Type II error: No
B
Type I error: No ; Type II error: Yes
Specifying a smaller significance level decreases the probability of a Type I error (rejecting a true null hypothesis) but increases the probability of a Type II error (not rejecting a false null hypothesis).
As the level of significance decreases, the null hypothesis is less frequently rejected.
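The trade-off can be illustrated numerically. The sketch below assumes a one-sided z-test with known variance and a true mean that sits 2 standard errors above the null value (an assumed figure, not from the card); `type_ii_probability` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Assumed standardized distance between the true mean and the null mean.
effect = 2.0


def type_ii_probability(alpha, effect):
    """P(fail to reject H0 | H0 is false) for a one-sided z-test."""
    critical_value = std_normal.inv_cdf(1 - alpha)
    return std_normal.cdf(critical_value - effect)


beta_at_5_percent = type_ii_probability(0.05, effect)
beta_at_1_percent = type_ii_probability(0.01, effect)

print(f"alpha = 5%: P(Type II) = {beta_at_5_percent:.3f}")
print(f"alpha = 1%: P(Type II) = {beta_at_1_percent:.3f}")
```

Lowering alpha pushes the critical value further out, so a false null is rejected less often and the Type II probability rises.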
Covariance formulas
Covariance (A, B) = rho * s1 * s2, where rho is the correlation between A and B and s1, s2 are their standard deviations
Covariance (A, B) = E[(A - E(A)) * (B - E(B))]
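Both formulas give the same number, which can be checked on a small probability-weighted example. The scenario returns and probabilities below are made up for illustration.

```python
from math import sqrt

# Hypothetical scenarios: probability, return on A, return on B.
probs = [0.3, 0.5, 0.2]
returns_a = [0.10, 0.02, -0.04]
returns_b = [0.06, 0.03, -0.03]

# Expected values.
exp_a = sum(p * a for p, a in zip(probs, returns_a))
exp_b = sum(p * b for p, b in zip(probs, returns_b))

# Covariance from the expectation definition.
cov = sum(p * (a - exp_a) * (b - exp_b)
          for p, a, b in zip(probs, returns_a, returns_b))

# Standard deviations and correlation.
sd_a = sqrt(sum(p * (a - exp_a) ** 2 for p, a in zip(probs, returns_a)))
sd_b = sqrt(sum(p * (b - exp_b) ** 2 for p, b in zip(probs, returns_b)))
corr = cov / (sd_a * sd_b)

# Covariance rebuilt from correlation and the two standard deviations.
cov_from_corr = corr * sd_a * sd_b

print(cov, cov_from_corr)
```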
Which of the following statements is most likely correct regarding the chi-square test of independence?
A
The test has a one-sided rejection region
B
The null hypothesis is that the two groups are dependent
C
If there are two categories, each with three levels or groups, there are six degrees of freedom
A
The test has a one-sided rejection region
A is correct. The test statistic comprises squared differences between the observed and expected values, so the test involves only one side, the right side.
The best approach for creating a stratified random sample of a population involves:
A
drawing an equal number of simple random samples from each subpopulation.
B
selecting every kth member of the population until the desired sample size is reached.
C
drawing simple random samples from each subpopulation in sizes proportional to the relative size of each subpopulation.
C
drawing simple random samples from each subpopulation in sizes proportional to the relative size of each subpopulation.
C is correct. Stratified random sampling involves dividing a population into subpopulations based on one or more classification criteria.
Then, simple random samples are drawn from each subpopulation in sizes proportional to the relative size of each subpopulation.
These samples are then pooled to form a stratified random sample.
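Proportional allocation can be sketched as follows. The strata (large-, mid-, and small-cap stocks) and their sizes are assumed for illustration, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random


def stratified_sample(strata, sample_size, seed=0):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the population, then pool the results.
    Note: rounding may make the pooled size differ slightly from sample_size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    pooled = []
    for members in strata.values():
        k = round(sample_size * len(members) / total)
        pooled.extend(rng.sample(members, k))  # sampling without replacement
    return pooled


# Hypothetical population of 1,000 stocks split into three subpopulations.
strata = {
    "large": [f"large-{i}" for i in range(600)],
    "mid": [f"mid-{i}" for i in range(300)],
    "small": [f"small-{i}" for i in range(100)],
}

sample = stratified_sample(strata, sample_size=50)
counts = {name: sum(1 for s in sample if s.startswith(name)) for name in strata}
print(counts)
```

With a 60/30/10 population split, a sample of 50 draws 30, 15, and 5 members from the three strata.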
Which of the following tests of a hypothesis concerning the population mean is most appropriate?
A
A z-test if the population variance is unknown and the sample is small
B
A z-test if the population is normally distributed with a known variance
C
A t-test if the population is non-normally distributed with unknown variance and a small sample
B
A z-test if the population is normally distributed with a known variance
B is correct. The z-test is theoretically the correct test to use in those limited cases when testing the population mean of a normally distributed population with known variance.
A distribution with fat tails is most accurately described as:
A
leptokurtic.
B
platykurtic.
C
mesokurtic.
A
leptokurtic.
If you flip a coin twice and count the number of heads, the outcome is a random variable. How many possible outcomes can that random variable have?
A
2
B
3
C
4
B
3
If you flip the coin twice, there are only four possible scenarios: HH, HT, TH, and TT.
Given these possible scenarios, there are three possible outcomes for the random variable, which is the total number of heads:
            Heads  Tails
Outcome 1   0      2
Outcome 2   1      1
Outcome 3   2      0
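The distinction between scenarios and outcomes of the random variable can be checked by brute-force enumeration; this is only an illustrative sketch.

```python
from itertools import product

# Every ordered result of two coin flips.
scenarios = list(product("HT", repeat=2))

# The random variable is the number of heads in each scenario.
head_counts = sorted({flips.count("H") for flips in scenarios})

print(len(scenarios), scenarios)
print(len(head_counts), head_counts)
```

HT and TH are different scenarios but map to the same value of the random variable (one head), which is why four scenarios collapse to three outcomes.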
An analyst observes the benchmark Indian NIFTY 50 stock index trading at a forward price-to-earnings ratio of 15. The index’s expected dividend payout ratio in the next year is 50 percent, and the index’s required return is 7.50 percent. If the analyst believes that the NIFTY 50 index dividends will grow at a constant rate of 4.50 percent in the future, which of the following statements is correct?
A
The analyst should view the NIFTY 50 as overpriced.
B
The analyst should view the NIFTY 50 as underpriced.
C
The analyst should view the NIFTY 50 as fairly priced.
B
The analyst should view the NIFTY 50 as underpriced.
15 < 50% / (7.5% - 4.5%) = 16.67
The above inequality implies that the analyst should view the NIFTY 50 as priced too low. The fundamental inputs into the equation imply a forward price to earnings ratio of 16.67 rather than 15.
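The comparison uses the justified forward P/E from the constant-growth model: payout ratio divided by (required return minus growth rate). A minimal sketch with the inputs from the question (variable names are illustrative):

```python
# Inputs from the question.
observed_forward_pe = 15.0
payout_ratio = 0.50
required_return = 0.075
growth_rate = 0.045

# Justified forward P/E under constant dividend growth.
justified_forward_pe = payout_ratio / (required_return - growth_rate)

if observed_forward_pe < justified_forward_pe:
    verdict = "underpriced"
elif observed_forward_pe > justified_forward_pe:
    verdict = "overpriced"
else:
    verdict = "fairly priced"

print(f"justified forward P/E = {justified_forward_pe:.2f} -> {verdict}")
```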
When working with sample data taken from a population that is believed to be non-normally distributed, the most appropriate statistic to measure the strength of a linear relationship between two variables is the:
A
bivariate correlation coefficient.
B
Pearson correlation coefficient.
C
Spearman rank correlation coefficient.
C
Spearman rank correlation coefficient.
The Spearman rank correlation coefficient is used for non-parametric tests of correlation when working with data from a population that is believed to deviate significantly from an assumed normal distribution.
The Pearson (or bivariate) correlation coefficient is used to perform parametric tests if the population data is assumed to be normally distributed.
The kurtosis of the normal distribution is closest to:
A
0
B
1
C
3
C
3
The kurtosis of the normal distribution is 3.
Its excess kurtosis is zero.
The collection and transformation of data into a format that can be used by the analytical process is most accurately described as:
A
data capture.
B
data storage.
C
data curation.
A
data capture.
Data capture is the method of collecting and transforming data to be analyzed by low-latency systems.
Data curation is the process of cleaning data to ensure its quality and accuracy.
The data storage process involves designing databases so that data can be recorded, archived, and accessed.
Compared with bootstrap resampling, jackknife resampling:
A
is done with replacement.
B
usually requires that the number of repetitions is equal to the sample size.
C
produces dissimilar results for every run because resamples are randomly drawn.
B
usually requires that the number of repetitions is equal to the sample size.
B is correct. For a sample of size n, jackknife resampling usually requires n repetitions. In contrast, with bootstrap resampling, we are left to determine how many repetitions are appropriate.
A machine learning model that has been underfit will most likely:
A
treat noise in a training dataset as true parameters.
B
fail to recognize true relationships in a training dataset.
C
identify relationships in a training dataset that are not found in the validation dataset.
B
fail to recognize true relationships in a training dataset.
Underfit models are overly simplistic and can fail to identify true relationships that are present in a training dataset.
By contrast, a model is described as overfit when it treats noise in the training dataset as true parameters, but it is unable to find the same relationships in the validation dataset.