Lesson 1 Flashcards

1
Q

State three different widespread problems with data. For each problem, give a 1-sentence definition.

A

faulty: data is recorded and processed by humans, so errors are inevitable

incomplete: data may be incomplete (e.g. missing answers or participants)

censored: data may be censored and therefore impossible to access

survey-based: based on surveys

2
Q

Briefly (≤ 3 sentences) explain the term replication crisis.

A

Replication crisis: many scientific results cannot be reproduced, which casts doubt on the reliability of the data used in that research.
This raises doubts about the validity and reliability of many published scientific studies.

3
Q

Briefly (≤ 2 sentences) state what is regulated in the Dickey Amendment.

A

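The Dickey Amendment (1996) prohibits the Centers for Disease Control and Prevention (CDC) from using federal funds to advocate or promote gun control. In practice, this largely halted federally funded research on deaths and injuries by firearm.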

4
Q

Fill in the blank boxes and the blank arrow descriptions denoted (1), (2), ...
in the graph for the data value chain in the table below.

A

Learn the diagram.

5
Q

What does the term “GIGO” stand for in data science? Briefly (≤ 2 sentences) explain it.

A

“GIGO” stands for “Garbage In, Garbage Out” in data science. It means that if low-quality or inaccurate data is used as input for a computer program or analysis, the output or results will also be of low quality and accuracy. In other words, the quality of the output depends on the quality of the input data.
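A minimal sketch of GIGO, assuming hypothetical height measurements in which one value was entered in millimetres instead of centimetres. All numbers are made up for illustration.

```python
from statistics import mean

# Hypothetical body heights in cm (illustrative values, not real data).
clean = [170.0, 165.0, 180.0, 175.0]

# Same data, but one value was typed in mm instead of cm ("garbage in").
garbage = [170.0, 165.0, 1800.0, 175.0]

clean_mean = mean(clean)      # plausible average height
garbage_mean = mean(garbage)  # distorted by the single faulty entry ("garbage out")

print(clean_mean, garbage_mean)
```

The analysis step (the mean) is identical in both cases; only the input quality differs.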

6
Q

For each of the following definitions of random variables, state whether it is a
correct definition of a random variable or not. If you answer “not”, briefly (≤ 2 sentences)
state which rule for random variables has been violated.

a Updown. This variable is −1, if the return of a specific index is < 0, and +1, if the
return of the same index is > 0.

b Continent. This variable is 1, if a certain place is located in Europe, 2 for North
America, 3 for South America, 4 for Asia, 5 for Australia and 6 for Antarctica.

c MaleSon. This variable is 1, if the eldest child of a person is male and 0 if the
eldest child of a person is female.

d FirmSize. This variable is “S” if a company has less than 100 employees and a
turnover of less than 10 million CHF. It is “M”, if only one of the above conditions
is true. It is “L”, if the company has 100 employees or more and a turnover of ≥ 10
million CHF.

A

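a. Not correct: a return of exactly 0 is not covered, so ∪ Bi ≠ Ω.

b. Not correct: Africa is missing.

c. Not correct: the case of a person without children is not covered.

d. Not correct: “S” is a label, so no real number is assigned.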
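The problem with Updown in (a) can be sketched as a function that is not defined for every state of nature. The names `updown` and `is_defined` and the error handling are illustrative assumptions, not part of the exercise.

```python
def updown(index_return: float) -> int:
    """Updown as defined in (a): -1 for a negative return, +1 for a positive one."""
    if index_return < 0:
        return -1
    if index_return > 0:
        return 1
    # A return of exactly 0 is a possible state of nature, but the
    # definition assigns no value to it, so the mapping is incomplete.
    raise ValueError("Updown is undefined for a return of exactly 0")


def is_defined(index_return: float) -> bool:
    """Report whether Updown assigns a value to this return."""
    try:
        updown(index_return)
        return True
    except ValueError:
        return False
```

Calling `is_defined(0.0)` returns `False`, which is the gap named in answer (a).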

7
Q

Why is data useful?

A

New data set → research and business opportunities

8
Q

Why is data problematic?

A

Recorded, processed, transferred and converted by humans
→ inevitable errors

widespread problems

replication crisis

9
Q

What does it mean that data is (not) “given”?

A

“Given” means we cannot change it. Data is not simply given, though → consider collecting/deriving it yourself.

E.g., in Y = a + bX
you can change a and b to improve the fit to the data, but you cannot change X and Y.
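As a sketch of the example with Y = a + bX: the observations X and Y stay fixed, and only a and b are chosen, here by ordinary least squares (an assumption; the card does not name a fitting method). The function name `fit_line` and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the squared errors of y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance and variance (the common factor 1/n cancels out).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical observations: these are "given" and are not modified.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(a, b)
```

The function only reads `xs` and `ys`; the adjustable quantities are the returned `a` and `b`.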

10
Q

Explain briefly what “responsible data use” means.

A

Check data: Compare to other sources, literature, check internal consistency

Reproducibility: keep an audit trail for any data usage

Share data (verification+insight)

Copyright and privacy
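Two of the points above, the internal consistency check and the audit trail, can be sketched as follows. The records, field names, and helper functions are hypothetical; recording a SHA-256 digest is just one possible way to note which version of the data was used.

```python
import hashlib
import json

# Hypothetical records: employees per firm, split across two sites.
records = [
    {"firm": "A", "site_1": 40, "site_2": 60, "total": 100},
    {"firm": "B", "site_1": 10, "site_2": 15, "total": 30},  # parts do not add up
]


def inconsistent(rows):
    """Return the firms whose parts do not add up to the reported total."""
    return [r["firm"] for r in rows if r["site_1"] + r["site_2"] != r["total"]]


def fingerprint(rows):
    """SHA-256 digest of the data, to record in an audit trail."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


bad_firms = inconsistent(records)
digest = fingerprint(records)
print(bad_firms, digest)
```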

11
Q

What is data?

A

Data
= collection of realizations of random variables
= collection of measurements of a property of an entity/individual

12
Q

For each of the following numeric random variables, specify the statistical
type of variable. If you think that a variable could be of two different types, briefly
(1 sentence for each case) state under which circumstances it would be of one type and under
which circumstances it would be of another type.

  • Left-Handedness
  • Shoe size
  • Firm size
  • iPhone model
A
  • Left-Handedness: indicator
  • Shoe size: categorical, ordered, but with an unclear relation between the values
  • Firm size: discrete if measured by the number of employees; continuous (ratio) if measured by profit.
    It can also be categorical, ordered, but with an unclear relation between the values.
  • iPhone model: categorical, unordered, if we compare the variants of one generation (e.g. the iPhone 12 as Pro, Pro Max, Plus or normal); categorical, ordered, if we compare generations: iPhone 10, 11, 12, 13, 14…
13
Q

The type of random variable is necessary for many tasks involving random variables.

Name three such tasks.

A

Design of data structure or database

Design optimal simulation scheme (e.g. Monte Carlo)

Data visualization

Definition and interpretation of statistical model
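How the type of variable drives a simulation scheme can be sketched as follows: an indicator variable is drawn with a Bernoulli trial, a continuous one from a continuous distribution. The probability 0.1 and the normal distribution with mean 0 and standard deviation 1 are made-up assumptions, as are the function names.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def draw_indicator(p):
    """Indicator variable: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def draw_continuous(mu, sigma):
    """Continuous variable: one draw from a normal distribution."""
    return rng.gauss(mu, sigma)


n = 10_000
left_handed = [draw_indicator(0.1) for _ in range(n)]    # hypothetical p = 0.1
returns = [draw_continuous(0.0, 1.0) for _ in range(n)]  # hypothetical N(0, 1)

share_left = sum(left_handed) / n  # Monte Carlo estimate of p
mean_return = sum(returns) / n     # Monte Carlo estimate of mu
print(share_left, mean_return)
```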

14
Q

What is a random variable?

A

Function assigning a real number to every state of nature

Sample space Ω. Set of all possible (relevant) events.

States of nature. Each possible outcome = state of nature (ωi).
Finite (i ∈ {1, 2, ..., N}) or infinite (i ∈ ℕ) size.

Partition of Ω. Collection of subsets P = {B1, ..., Bn} that together cover Ω and do not overlap.
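These definitions can be sketched with a single die roll. The sample space, the variable name `is_even`, and the grouping step are illustrative assumptions.

```python
# Sample space: the six states of nature of one die roll.
omega = {1, 2, 3, 4, 5, 6}


def is_even(state):
    """Random variable: assigns the real number 1 to even rolls and 0 to odd rolls."""
    return 1 if state % 2 == 0 else 0


# Every state of nature gets exactly one real number.
values = {state: is_even(state) for state in omega}

# Grouping the states by their assigned value yields a partition of omega.
partition = {}
for state, value in values.items():
    partition.setdefault(value, set()).add(state)

print(values)
print(partition)
```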

15
Q

What are the 2 “pizza slicing rules”?

A

1) Don’t forget a part (or B1 ∪ B2 ∪ ... ∪ Bn = Ω)

2) Don’t count a part twice (or Bi ∩ Bj = ∅ for all i ≠ j)
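Both rules can be checked mechanically. The sketch below uses a hypothetical helper `is_partition` and a made-up three-element sample space for the sign of a return.

```python
from itertools import combinations


def is_partition(blocks, omega):
    """Check both pizza slicing rules for a list of sets."""
    # Rule 1: no part is forgotten (the union of all blocks equals omega).
    covers_everything = set().union(*blocks) == omega
    # Rule 2: no part is counted twice (blocks are pairwise disjoint).
    no_overlap = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    return covers_everything and no_overlap


omega = {"negative", "zero", "positive"}

good = [{"negative"}, {"zero"}, {"positive"}]
forgets_zero = [{"negative"}, {"positive"}]                   # violates rule 1
counts_twice = [{"negative", "zero"}, {"zero", "positive"}]   # violates rule 2

print(is_partition(good, omega))
print(is_partition(forgets_zero, omega))
print(is_partition(counts_twice, omega))
```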
