PART II: Python For Basic Data Analysis Flashcards

1
Q

What are the functions you have to import to plot/show a graph?

A

From pylab (or matplotlib.pyplot) and numpy:
plot, show, xlabel, title, legend, xlim, etc.
- import data from a file using loadtxt
- use linspace to get x values and then calculate y
- assign x and y manually with lists
- can also use errorbar() to plot points with error bars
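
A minimal sketch of the linspace route, assuming the `matplotlib.pyplot` interface (pylab exposes the same function names). The sine curve and the labels are illustrative choices, and the `Agg` backend is an assumption made only so the script also runs on a machine without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

# Get x values with linspace, then calculate y from them
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x)

# "b-" means blue, solid line; the label is picked up by legend()
(line,) = plt.plot(x, y, "b-", label="sin(x)")
plt.xlabel("x")
plt.title("Example plot")
plt.xlim(0.0, 2.0 * np.pi)
plt.legend()
plt.show()  # under Agg this only warns; with a GUI backend it opens the window
```

To plot measured data instead, x and y could come from `np.loadtxt` or from hand-written lists; the plotting calls stay the same.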

2
Q

How can we modify the lines and colours on a plot?

A

Colours: r,g,b,c,m,y,k,w

Line styles: o (circle markers, points only), - (solid), -- (dashed), : (dotted)

3
Q

Scatter plots

A
Use scatter() from pylab
Use it when you don’t want to connect the dots
4
Q

Density plots

A
  • for chi^2 mostly
  • use imshow() from pylab
  • by default the y axis runs top to bottom, the x axis left to right
  • flip the y axis with origin="lower"
  • parameter extent=[xl,xu,yl,yu] gives the range of x and y values
  • aspect=# specifies the aspect ratio of the x and y axes
  • colorbar() shows the range of colours to help read the density plot
  • different colour schemes also exist (e.g. gray(), hot(); spectral() in older versions)
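
The options above can be sketched as follows. The grid of values `Z = X**2 + Y**2` and the axis limits are made-up illustration data, and the `Agg` backend is again an assumption so the script runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to open a window
import matplotlib.pyplot as plt

# Hypothetical 2D data: z = x^2 + y^2 on a rectangular grid
xl, xu, yl, yu = -2.0, 2.0, -1.0, 1.0
x = np.linspace(xl, xu, 200)
y = np.linspace(yl, yu, 100)
X, Y = np.meshgrid(x, y)   # rows follow y, columns follow x
Z = X**2 + Y**2

# origin="lower" puts the first row at the bottom; extent labels the axes
image = plt.imshow(Z, origin="lower", extent=[xl, xu, yl, yu],
                   aspect=1.0, cmap="gray")
plt.colorbar()             # colour scale for reading off the values
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```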
5
Q

What kinds of errors or uncertainties are there?

A

Systematic: shifts every measurement the same way (a consistent bias)
Random: scatters measurements both ways at random

6
Q

What is discrepancy?

A

The difference between two measured results for the same quantity

7
Q

When is discrepancy significant?

A

If it’s larger than both error ranges combined (as in, there’s no overlap of error bars)

8
Q

When are the two measurements not consistent?

A

If the discrepancy is significant

Which means if it’s larger than both error ranges combined

9
Q

Error propagation: sums, differences, products, quotients, how???

A

Sums and differences: absolute errors add in quadrature (square root of the sum of squares)
Products and quotients: RELATIVE errors add in quadrature
Follow BEDMAS (propagate errors in the same order as the operations)
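
A short sketch of the two rules, assuming the errors are independent (uncorrelated). The function names and the example numbers are hypothetical.

```python
import math

def sum_error(da, db):
    """Error on a + b or a - b: absolute errors add in quadrature."""
    return math.sqrt(da**2 + db**2)

def product_error(a, da, b, db):
    """Error on a * b: relative errors add in quadrature."""
    relative = math.sqrt((da / a)**2 + (db / b)**2)
    return abs(a * b) * relative

def quotient_error(a, da, b, db):
    """Error on a / b: same relative rule as for products."""
    relative = math.sqrt((da / a)**2 + (db / b)**2)
    return abs(a / b) * relative

# Example: errors 0.3 and 0.4 on a sum combine to 0.5
total_err = sum_error(0.3, 0.4)
# Example: sides 10.0 +/- 0.3 and 5.0 +/- 0.2 give an area of 50 +/- 2.5
area_err = product_error(10.0, 0.3, 5.0, 0.2)
```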

10
Q

Error propagation: two correlated or dependent conditions

A

Condition with lowest error is considered in error calculations

11
Q

What is probability?

A

The chance of getting one of a subset of N outcomes out of a total of T equally likely possible outcomes:
P = N/T

12
Q

Properties of probability?

A
  • 0 ≤ p ≤ 1
13
Q

What do we do if we have two independent conditions? (Probability)

A

Neither can affect the probability of the other and so probabilities are multiplied

14
Q

How do we represent statistical significance?

A
  • we use n*sig (corresponds to the probability of getting a result more than n standard deviations away from the mean in a Gaussian distribution)
  • 1sig = 0.317
  • 2sig = 0.0455
  • outcomes are significant if the p-value is less than or equal to the sig level
  • assume two-sided probabilities
15
Q

Define p-value

A

Probability of getting data at least as extreme as ours if the null hypothesis is true
Higher = data are consistent with “nothing happening”
Lower = results differ more from the control/null hypothesis and are more significant

16
Q

How do we calculate mean/average?

A

List: sum(x)/len(x)
Array: x.sum()/x.shape[0], or mean(x) (from numpy)

17
Q

What is the median?

A

The middle value of a sorted set of values: numpy.median() (it sorts the values for you)

18
Q

Mode

A

Most common value of x

Have to count the occurrences of each value
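
The mean, median and mode can be computed as in the sketch below. The sample `values` is made up, and `collections.Counter` is one possible way to count occurrences, not necessarily the one used in the course.

```python
from collections import Counter
import numpy as np

values = [2, 3, 3, 5, 7, 3, 12]   # hypothetical sample

# Mean of a list: sum / len
mean_list = sum(values) / len(values)

# Mean of an array: sum / number of elements
arr = np.array(values)
mean_array = arr.sum() / arr.shape[0]

# Median: numpy sorts internally, so the input order does not matter
median = np.median(arr)

# Mode: count each value and take the most common one
counts = Counter(values)
mode, mode_count = counts.most_common(1)[0]
```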

19
Q

Formula for variance

A

sig^2 = (1/N) * sum_{i=1..N} (x_i - xavg)^2

20
Q

What is variance?

A

A measure of the range/spread of values: the average squared distance from the mean (its square root, sig, is how far from the average we usually are)

21
Q

Error in the mean: what is it, what’s the formula?

A

Measure of how reliable your mean value is
Delta = sig/sqrt(N)
You want more N to reduce it
You’d have to increase N a lot though, since it’s sqrt (4 times the data only halves the error)
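
A small sketch of the variance and error-in-the-mean formulas, using the 1/N convention from the variance card. The function names and data are hypothetical.

```python
import math

def variance(xs):
    """sig^2 = (1/N) * sum of (x_i - xavg)^2."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean)**2 for x in xs) / n

def error_in_mean(xs):
    """delta = sig / sqrt(N)."""
    return math.sqrt(variance(xs)) / math.sqrt(len(xs))

data = [4.0, 6.0, 4.0, 6.0]           # mean 5, variance 1
small_sample_error = error_in_mean(data)
# Four times as many points with the same spread: the error halves
large_sample_error = error_in_mean(data * 4)
```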

22
Q

Formula for normal/Gaussian distribution

A

p(x) = 1/(sig*sqrt(2*pi)) * exp(-(x-xavg)^2/(2*sig^2))

23
Q

Central limit theorem

A

Distribution of sample means approaches a normal distribution as sample size gets larger

24
Q

How do we calculate the probability of a given result for a normal distribution?

A

Integrate the distribution over the range you care about
But the Gaussian integral has no closed form, so
We have erfc and erf
P(|x-xavg| > n*sig) = erfc(n/sqrt(2))
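
The erfc relation can be checked with the standard library. The helper names below are hypothetical.

```python
import math

def prob_beyond(n):
    """Two-sided probability of a result more than n sigma from the mean."""
    return math.erfc(n / math.sqrt(2))

def prob_within(n):
    """Probability of a result within n sigma of the mean."""
    return math.erf(n / math.sqrt(2))

one_sigma = prob_beyond(1)   # about 0.317
two_sigma = prob_beyond(2)   # about 0.0455
```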

25
Q

When do you use binomial distribution?

A

When you want the probability of a discrete event occurring a given number of times in a fixed number of independent trials, with no rate given

26
Q

Binomial distribution: what do the variables mean?

A

K is the number of times the event you’re looking for occurs
N is the total number of trials
P is the probability of the event occurring in a single trial
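
With those variables, the standard binomial formula is P(k) = C(N,k) * p^k * (1-p)^(N-k). A minimal sketch, with a hypothetical function name and a coin-toss example:

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Example: exactly 2 heads in 4 fair coin tosses = 6/16
two_heads = binomial_pmf(2, 4, 0.5)
```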

27
Q

When do you use poisson?

A

Probability of a number of discrete events occurring in a fixed interval

The average rate is given (as lambda)
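
The standard Poisson formula is P(k) = lambda^k * exp(-lambda) / k!. A minimal sketch, where the function name and the rate of 2 events per interval are made up:

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected number is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Example: expected rate of 2 events per interval
none_seen = poisson_pmf(0, 2.0)    # equals exp(-2)
three_seen = poisson_pmf(3, 2.0)
```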

28
Q

When you’re picking out errors in code, what should you look for?

A

Float values

Syntax errors

29
Q

What are the largest/smallest numbers in Python for floats and what happens if you exceed them?

A
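+/- 10^308 (about 1.8 x 10^308)
Exceeds max: inf (overflow)
Smaller in magnitude than the smallest representable float: 0 (underflow)
Some operations with inf give nan (e.g. inf - inf)
Limited precision (about 16 significant digits) due to rounding error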
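
These limits can be seen directly in a short sketch. The variable names are illustrative.

```python
import math
import sys

largest = sys.float_info.max          # about 1.8e308

overflow = largest * 10               # too big: becomes inf
underflow = 1e-308 / 1e100            # too small: becomes 0.0
not_a_number = overflow - overflow    # inf - inf is nan

# Rounding error: 0.1 and 0.2 are not stored exactly in binary
rounding = 0.1 + 0.2
is_exact = (rounding == 0.3)              # False
is_close = math.isclose(rounding, 0.3)    # True: compare with a tolerance
```

Not every overflow turns into inf quietly: some operations, such as `10.0 ** 400`, raise `OverflowError` instead.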
30
Q

Precision for int values in Python?

A

Arbitrary precision (as long as needed)

31
Q

What is maximum likelihood?

A

A way of finding the most probable model that reproduces the data you obtained
Method of fitting a curve

32
Q

How good is a model?

A

Take the log of the probability (likelihood): log is monotonic, so the maximum stays in the same place

For Gaussian errors, maximising it is the same as minimising chi^2

33
Q

How do we test our models?

A

Guess better parameter values

Use least-squares test to find lowest values of chi^2 (mathematically or density plot)

34
Q

How do we run the chi^2 process?

A

Import functions and data
Set different ranges of a,b using set intervals to test
Create array to store all chi^2 values for a,b
Create for loop to calc/store all chi^2
Use density plot or solve mathematically to find lowest chi^2
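
The steps above can be sketched as a grid search. The straight-line model `a*x + b`, the data, and the parameter ranges are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical data lying exactly on y = 2x + 1, with unit errors
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
sigma = np.ones_like(x)

def model(x, a, b):
    """Assumed model: a straight line."""
    return a * x + b

# Ranges of a and b to test, in steps of 0.1
a_values = np.linspace(0.0, 4.0, 41)
b_values = np.linspace(-1.0, 3.0, 41)

# Array to store chi^2 for every (a, b) pair
chi2 = np.zeros((len(a_values), len(b_values)))

# Loop over the grid, calculating and storing chi^2
for i, a in enumerate(a_values):
    for j, b in enumerate(b_values):
        chi2[i, j] = np.sum(((y - model(x, a, b)) / sigma) ** 2)

# Locate the lowest chi^2 on the grid
i_best, j_best = np.unravel_index(np.argmin(chi2), chi2.shape)
a_best, b_best = a_values[i_best], b_values[j_best]
```

The `chi2` array could also be passed to imshow() to find the minimum by eye on a density plot.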

35
Q

Solving chi^2 with math

A

Use chi^2 equation & plug in model for m
Differentiate chi^2 with respect to a, set =0
Solve for a
That looks simple but it’s ugly
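
For a one-parameter model the algebra stays manageable. The sketch below assumes the model m = a*x (a line through the origin), which is an illustrative choice; the derivation is in the docstring and the function name is hypothetical.

```python
def best_slope(xs, ys, sigmas):
    """Analytic least-squares slope for the assumed model m = a*x.

    chi^2 = sum(((y_i - a*x_i) / sig_i)^2)
    d(chi^2)/da = -2 * sum(x_i * (y_i - a*x_i) / sig_i^2) = 0
    => a = sum(x_i*y_i / sig_i^2) / sum(x_i^2 / sig_i^2)
    """
    numerator = sum(x * y / s**2 for x, y, s in zip(xs, ys, sigmas))
    denominator = sum(x * x / s**2 for x, s in zip(xs, sigmas))
    return numerator / denominator

# Example: points lying exactly on y = 2x recover a = 2
slope = best_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0])
```

With two parameters the same procedure gives two simultaneous equations, which is where it gets ugly.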

36
Q

What do you do if it isn’t a linear model?

A

Use a density plot