Week 1 Flashcards
(48 cards)
a) A parameter describes a particular characteristic of the entire population of UK
adults. A statistic, on the other hand, describes a particular characteristic of a sample
from the population. Because 54% is based only on a sample from the UK population, it
is a statistic.
b) It is very unlikely that the new sample will contain exactly the same number of Remainers as the old sample. However, the sample size is reasonably large and we should not expect the difference between the two sample proportions to be large.
a) Categorical, nominal. The responses can only be grouped into categories. The
responses are nominal because it is not possible to rank the responses.
b) Categorical, ordinal. The responses can be grouped into categories. The
measurements are ordinal because the ratings of customer service can be ranked.
c) Numerical, continuous. The outcomes correspond to actual numbers where
differences between any two values are quantitatively meaningful. The measurement is continuous because time can take any value within a given interval.
c) Indicate whether each variable in the study is numerical or categorical. If numerical,
identify it as continuous or discrete. If it is categorical, give the level of measurement.
a) Each row of the data matrix represents a participant in the survey.
b) There were 1,691 participants in the survey.
c) Sex: Categorical, nominal. The responses can only be grouped into categories. The responses
are nominal because it is not possible to rank the responses.
Age: Numerical, continuous. The outcomes correspond to actual numbers and the difference
between any two ages is quantitatively meaningful. Age is continuous because it can take
any value within a given range of possible ages. In this survey, however, ages are recorded
as whole numbers and are reported as discrete variables. Even though age is reported as a
discrete variable, the units are small enough that we would treat this as a continuous
variable.
grossIncome: Categorical, ordinal. The concept of income is continuous, but in this survey it
is reported as a categorical variable. It is ordinal because the different income categories
can be ranked.
Smoke: Categorical, nominal. The responses can only be grouped into categories. The
responses are nominal because it is not possible to rank the responses.
amtWeekends: Numerical, discrete. The outcomes correspond to actual numbers. The
responses are discrete because the number of cigarettes smoked can only be a whole number.
amtWeekdays: Numerical, discrete. The outcomes correspond to actual numbers. The
responses are discrete because the number of cigarettes smoked can only be a whole number.
Similar to Age, we might treat the last two variables as continuous in practice.
The two variables are positively associated. Countries in which a higher
percentage of the population have access to the internet also tend to have higher life
expectancies.
No, that is not a reasonable conclusion. Omitted third variables, such as level
of economic development, likely drive both internet use and life expectancy.
a) The distribution is right skewed with potential outliers on the positive end. We
should expect that these positive outliers will pull the mean above the median.
b) The distribution is somewhat symmetric and has few, if any, extreme
observations, therefore we should expect that the mean and median will be similar.
c) Most Edinburgh undergraduate students are around 20 years old, and the
majority are European and will have spent almost all of their 240 months in Europe. Very
few students will have spent more than 300 or so months in Europe, but some of the
overseas students (particularly freshers) will have lived 0 months in Europe (at least
initially). The minority of observations clustered at the extreme lower end of the
distribution should pull the mean down below the median.
d) The distribution would be right skewed. Most employees would make
something on the order of the median salary, but we would anticipate that upper
management makes much more. The distribution would have a long right tail, and we
should expect the mean to be greater than the median.
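A small sketch of the salary case in (d). The salary figures are made up for illustration (they are not from the exercise); the last value stands in for upper management.

```python
from statistics import mean, median

# Hypothetical annual salaries in thousands; not data from the exercise.
# The final value plays the role of a highly paid manager.
salaries = [28, 30, 31, 33, 35, 36, 40, 150]

salary_mean = mean(salaries)      # pulled upward by the one large salary
salary_median = median(salaries)  # stays close to the typical salary

print("mean:", salary_mean)
print("median:", salary_median)
```

With a long right tail the mean ends up above the median, which is the pattern described in (a) and (d).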
Since about 35% of the data is in the first bar, Q1 (the 25th percentile) must lie there, between 0 and 10.
The median is in the second bar, between 10 and 20. Q3 appears to be in the fifth bar, between
40 and 50.
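The reasoning above amounts to accumulating the bar percentages until the 25%, 50% and 75% marks are reached. A rough sketch follows. Only the 35% for the first bar comes from the answer; the remaining bar heights, the number of bars and the helper name bin_containing are assumptions chosen to match the stated result.

```python
from itertools import accumulate

# Bar edges in steps of 10, as implied by the answer.
bin_edges = [0, 10, 20, 30, 40, 50, 60, 70, 80]
# Percentage of observations in each bar. Only the first value (35) is
# taken from the answer; the others are hypothetical.
bar_percent = [35, 20, 8, 7, 10, 8, 7, 5]

def bin_containing(percentile):
    """Return (low, high) edges of the first bar whose cumulative
    percentage reaches the requested percentile."""
    for i, cumulative in enumerate(accumulate(bar_percent)):
        if cumulative >= percentile:
            return bin_edges[i], bin_edges[i + 1]
    raise ValueError("percentile must be between 0 and 100")

q1_bin = bin_containing(25)
median_bin = bin_containing(50)
q3_bin = bin_containing(75)
print(q1_bin, median_bin, q3_bin)
```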
Note that when estimating the population
variance and standard deviation from a sample, we use n - 1 rather than n in the
denominator.
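A minimal sketch of the two denominators, using a made-up sample (the values are not from the exercise). Python's statistics module uses the sample formula in variance and stdev, and the population formula in pvariance and pstdev.

```python
from math import isclose
from statistics import pvariance, variance

# Hypothetical sample, chosen so the arithmetic is easy to follow.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)
x_bar = sum(sample) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)

sample_var = sum_sq_dev / (n - 1)  # estimate of the population variance
population_var = sum_sq_dev / n    # variance if the data were the whole population

# The library functions should agree with the hand calculation.
assert isclose(sample_var, variance(sample))
assert isclose(population_var, pvariance(sample))

print(sample_var, population_var)
```

Dividing by the smaller number always gives the larger result, so the sample variance is slightly bigger than the population formula applied to the same data.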
mean = 2%
standard dev = 0.79
a) Distribution (2) has a higher mean since 20 > 13, and a higher standard
deviation since 20 is much further from the rest of the data than 13.
b) Distribution (1) has a higher mean since -20 > -40, and distribution (2) has a
higher standard deviation since -40 is farther away from the rest of the data than -20.
c) Distribution (2) has a higher mean since all values in this distribution are
higher than those in distribution (1), but both distributions have the same standard
deviation since they are equally variable around their respective means.
d) Both distributions have the same mean since they are both centred at 300, but
distribution (2) has a higher standard deviation since the observations are farther from
the mean than in distribution (1).
Because each stock price has a different mean, it is misleading to simply compare
standard deviations. In order to meaningfully compare the relative volatility of stock prices,
we should calculate the coefficient of variation, which is calculated as the ratio of the standard
deviation to the mean multiplied by 100. For Ford, Honda, and Toyota, the coefficients of
variation are 49.05, 26.80, and 29.68, respectively. Therefore, Ford stock prices are the most
volatile.
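The calculation can be sketched as below. The card does not give the underlying means and standard deviations, so the two stocks here ("A" and "B") and their numbers are purely hypothetical, as is the function name.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

# Hypothetical stocks: B has the larger standard deviation,
# but A is more volatile relative to its own mean price.
cv_a = coefficient_of_variation(std_dev=5.0, mean=20.0)
cv_b = coefficient_of_variation(std_dev=8.0, mean=80.0)

print(cv_a, cv_b)
```

The example shows why comparing raw standard deviations can mislead when the means differ.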
The mean height is 61.52 inches with a standard deviation of 4.58 inches. Use this
information to determine if the heights approximately follow the "68-95-almost all" empirical
rule.
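One way to carry out this check is to build the intervals at 1, 2 and 3 standard deviations from the mean and count the share of heights inside each. The mean and standard deviation below are from the card; the heights themselves are not reproduced here, so the short list in the sketch is a hypothetical stand-in, and the function names are assumptions.

```python
MEAN = 61.52  # inches, from the card
SD = 4.58     # inches, from the card

def interval(k):
    """Interval within k standard deviations of the mean, rounded to 2 d.p."""
    return round(MEAN - k * SD, 2), round(MEAN + k * SD, 2)

def share_within(heights, k):
    """Proportion of observations within k standard deviations of the mean."""
    low, high = MEAN - k * SD, MEAN + k * SD
    return sum(low <= h <= high for h in heights) / len(heights)

for k in (1, 2, 3):
    print(k, interval(k))

# Hypothetical heights, only to show how the function is called.
example_heights = [56, 60, 62, 64, 70]
print(share_within(example_heights, 1))
```

If the shares for the real data come out near 68%, 95% and almost 100%, the heights approximately follow the rule.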
What is systematic sampling?
Every jth item in a list of the population is selected. The method assumes the order of the population list has no connection to the subject being studied.
If there is a hidden pattern or link between the list order and the study topic, the sample risks being biased.
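A minimal sketch of picking every jth item, assuming the population is held in a Python list and the starting position is chosen at random among the first j items (a common convention, not something the card states). The function name and the population of 20 numbered units are hypothetical.

```python
import random

def systematic_sample(population, j, start=None):
    """Select every j-th item from the list, beginning at index `start`.

    If no start is given, one is drawn at random from the first j positions.
    """
    if start is None:
        start = random.randrange(j)
    return population[start::j]

# Hypothetical population list of 20 numbered units.
units = list(range(1, 21))
chosen = systematic_sample(units, j=5, start=2)
print(chosen)
```

If the list order had a cycle of length 5 related to the study topic, every selected unit would fall at the same point in the cycle, which is how a hidden pattern can bias the sample.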