test 1 Flashcards

(27 cards)

1
Q

== means

A

“are the 2 things equal to each other?” or “is equal to”

2
Q

!= means

A

is not equal to

3
Q

>= means

A

is greater than or equal to

4
Q

<= means

A

is less than or equal to

5
Q

> means

A

is greater than

6
Q

< means

A

is less than

7
Q

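logical values in R: what number stands for true and what stands for false?

A

TRUE = 1
FALSE = 0

(R prints TRUE or FALSE when you ask it a comparison; in arithmetic such as sum(), each TRUE counts as 1 and each FALSE counts as 0)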
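
A quick way to see the 1/0 behavior for yourself. This is only a sketch; the vector `answers` is made up for illustration.

```r
# Made-up logical vector (not from the course data)
answers <- c(TRUE, FALSE, TRUE, TRUE)

as.numeric(answers)   # converts TRUE/FALSE to 1/0: 1 0 1 1
sum(answers)          # adds the 1s, so it counts the TRUEs: 3
mean(answers)         # proportion of TRUEs: 0.75
```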

8
Q

what are the ways I can ask this question and what type of responses would I get?

are any of the x greater than or equal to 40 AND less than 60?

A

x >= 40 & x < 60 - returns a logical vector with TRUE or FALSE for each element of x

sum(x >= 40 & x < 60) - wrapping it in sum() counts how many elements are TRUE

which(x >= 40 & x < 60) - returns the positions of the elements for which the statement is TRUE; not used as often, but still useful
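
The three forms above can be tried on a small vector. This is a sketch with made-up values for `x`; `any()` is a base R function not mentioned on the card, added here because it answers the literal "are any of them" question.

```r
x <- c(12, 40, 55, 59, 60, 73, 41)   # made-up values

in_range <- x >= 40 & x < 60   # one TRUE/FALSE per element
in_range          # FALSE TRUE TRUE TRUE FALSE FALSE TRUE
any(in_range)     # TRUE: at least one element is in the range
sum(in_range)     # 4: how many elements are in the range
which(in_range)   # 2 3 4 7: positions of those elements
```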

9
Q

what does $ do?

A
  • extracts a column from a data frame by name (df.name$column)
  • can also be used to add a new column
10
Q

how to add another column to a data frame?

A

df.name$column <- c(…)
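
A small sketch of both uses of $. The data frame `df.plants` and its columns are hypothetical names chosen for illustration.

```r
# Hypothetical data frame with two columns
df.plants <- data.frame(plot = c("a", "b", "c"),
                        seedlings = c(0, 3, 1))

df.plants$seedlings                        # extract a column: 0 3 1
df.plants$height_cm <- c(2.5, 7.1, 4.0)    # add a new column (one value per row)
names(df.plants)                           # "plot" "seedlings" "height_cm"
```

The new column needs one value per row (or a length that recycles evenly), otherwise R stops with an error.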

11
Q

if you have already defined “seedlings,” how can you extract the number of times the count “0” was observed?

A

sum(seedlings == 0)

sum() counts the TRUE values; without it, seedlings == 0 just returns TRUE or FALSE for each element

12
Q

if you defined seedlings, and now you want to see how many times each count occurred, what would you do?

A

df.seedlings <- data.frame(seedlings = c(0,1,2,3,4,5), freq = c(sum(seedlings==0), sum(seedlings==1), sum(seedlings==2), sum(seedlings==3), sum(seedlings==4), sum(seedlings==5)))

first create the data frame with 2 columns: one for the seedling counts and one for the frequency. then use the sum command to count how many observations were equal to 0, 1, 2, 3, 4, and 5
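
The same table can be built without typing sum() six times. This is a sketch on a made-up `seedlings` vector; it uses sapply() (covered on a later card) and, as an alternative, base R's table().

```r
seedlings <- c(0, 2, 1, 0, 5, 3, 0, 1, 2, 2)   # made-up counts
counts <- 0:5                                   # the values to tally

# One sum() per value, done by sapply() instead of by hand
df.seedlings <- data.frame(
  seedlings = counts,
  freq = sapply(counts, function(k) sum(seedlings == k))
)
df.seedlings$freq   # 3 2 3 1 0 1

# table() gives the same tally; factor(levels = ...) keeps the zero for 4
table(factor(seedlings, levels = counts))
```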

13
Q

you’ve defined x, now you want to get the 3rd and 4th elements of x, how?

all but the 3rd and 4th elements?

A

x[c(3,4)]

use square brackets to index and c() to combine the positions you want

x[-c(3,4)]

put the minus sign outside of the c(); it drops those positions

14
Q

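defined x, how to get only the elements in even positions (2nd, 4th, ...)?

to get the elements in reverse?

A

x[seq(2, length(x), by=2)]

x[10:1] (if x has 10 elements; x[length(x):1] or rev(x) works for any length)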
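
The indexing answers from the last two cards, tried on a made-up 10-element `x`. The values are shuffled on purpose so that "even positions" and "even values" give different results; the `%%` line is an extra not on the card.

```r
x <- c(5, 8, 2, 7, 4, 9, 6, 1, 10, 3)   # made-up values

x[c(3, 4)]                      # 3rd and 4th elements: 2 7
x[-c(3, 4)]                     # everything except those: 5 8 4 9 6 1 10 3
x[seq(2, length(x), by = 2)]    # elements in even positions: 8 7 9 1 3
x[x %% 2 == 0]                  # elements with even values: 8 2 4 6 10
rev(x)                          # reversed: 3 10 1 6 9 4 2 8 5
```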

15
Q

if x is defined, how to get the sum of the first and last element if there are 10 elements? the product of second and ninth?

A

x[1] + x[10]

x[2] * x[9]

16
Q

how to get square of each element in x?

A

x^2

17
Q

what does the sample command do and how to use it?

A

draws a random sample from a set of values, used like this:

sample(1:100, 25, replace=FALSE)

1:100 is the set of values to draw from

25 is how many values you want

replace=FALSE (the default) means no value can be drawn more than once; replace=TRUE allows repeats

18
Q

what does set.seed() do?

A

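sets the starting point of R's random number generator so random results are reproducible; calling set.seed() with the same number before each sample() call makes sample1 and sample2 identical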
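
A sketch of how the two commands work together. The seed value 42 is arbitrary (any number works), and `sample1` / `sample2` are just the names used on the card.

```r
set.seed(42)                                    # arbitrary seed value
sample1 <- sample(1:100, 25, replace = FALSE)

set.seed(42)                                    # same seed again
sample2 <- sample(1:100, 25, replace = FALSE)

identical(sample1, sample2)   # TRUE: same seed, same draws
anyDuplicated(sample1)        # 0: replace = FALSE means no repeated values
```

Without the second set.seed() call, sample2 would continue from wherever the generator left off and would almost certainly differ from sample1.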

19
Q

with the cdc file we used, how would you find how many of the subjects are male?

A

sum(gender == "m")

(this assumes the cdc data frame has been attached; otherwise use sum(cdc$gender == "m"))

20
Q

list 2 ways to find the median age of smokers vs nonsmokers

A

many ways to find the median using the median command, here are 2 (shown for smokers; use smoke100 == 0 for nonsmokers):

  • median(cdc[smoke100==1,]$age)
  • smokers <- subset(cdc, smoke100 == 1)
    smokers_median <- median(smokers$age)
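
Both approaches, sketched on a tiny stand-in data frame. `cdc_toy` and its values are made up; only the column names `age` and `smoke100` come from the cards, and the 1/0 coding for smokers/nonsmokers is assumed from how the cards use it.

```r
# Made-up stand-in for the cdc data (assumes 1 = smoker, 0 = nonsmoker)
cdc_toy <- data.frame(age = c(23, 45, 31, 60, 52, 38),
                      smoke100 = c(1, 0, 1, 0, 1, 0))

# Way 1: row filter inside [ , ], then pull the age column
median(cdc_toy[cdc_toy$smoke100 == 1, ]$age)   # smokers: 31

# Way 2: subset() first, then take the median
nonsmokers <- subset(cdc_toy, smoke100 == 0)
median(nonsmokers$age)                         # nonsmokers: 45
```

Note the double == inside subset(); with a single = the condition is not treated as a filter.
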
21
Q

what does the ~ (tilde) mean in box plots?

A

it builds a formula: the variable on the left is described by (split up by) the variable on the right. many R functions accept formulas

like:
boxplot(age ~ smoke100, data = cdc)

tells R to show the distribution of ages for smokers and nonsmokers separately, so you can compare the two age distributions visually
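
A sketch of what the formula does, using a made-up stand-in data frame `cdc_toy` (only the column names come from the card). Here `plot = FALSE` just suppresses the drawing so the numbers behind the boxes can be inspected; drop it to see the actual plot.

```r
# Made-up stand-in for the cdc data
cdc_toy <- data.frame(age = c(23, 45, 31, 60, 52, 38),
                      smoke100 = c(1, 0, 1, 0, 1, 0))

# age is split into one group per value of smoke100
bp <- boxplot(age ~ smoke100, data = cdc_toy, plot = FALSE)

bp$names        # "0" "1": one box per group
bp$stats[3, ]   # row 3 holds each group's median: 45 31
```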

22
Q

what is the comma used for in [smoke100==1,]?

A

used to separate row indices from column indices. In this case, we’re leaving the column part empty, meaning we’re selecting all columns

23
Q

sapply()

A

"simplify apply": applies the same function to each element of a vector or list and simplifies the results into a vector (or matrix) when possible

24
Q

how to solve this using sapply function:

Make a cumulative distribution of the ages of the males or females in the dataset by finding the number of subjects not older than each age in the sequence seq(from = 0,to = 100,by = 5). Use barplot() to make and attach a plot of your results.

A

df.males <- data.frame(Age = seq(0,100,by = 5), Cumulative_Frequency = sapply(seq(0,100,by = 5), function(X) sum(age[gender == "m"] <= X))) # males age cumulative distribution
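
The same sapply() pattern on a few made-up subjects, so the counts can be checked by hand. The vectors `age` and `gender` here are invented stand-ins for the course data.

```r
# Made-up stand-ins for the cdc columns
age    <- c(22, 35, 47, 51, 18, 64, 29, 73)
gender <- c("m", "f", "m", "m", "f", "m", "f", "m")

ages_m  <- age[gender == "m"]          # 22 47 51 64 73
cutoffs <- seq(from = 0, to = 100, by = 5)

# For each cutoff, count the males who are not older than it
cum_freq <- sapply(cutoffs, function(a) sum(ages_m <= a))
names(cum_freq) <- cutoffs

cum_freq[c("20", "25", "50", "100")]   # 0 1 2 5

# barplot(cum_freq, xlab = "Age", ylab = "Cumulative frequency") would draw it
```

The counts can only stay level or go up as the cutoff increases, and the last one equals the number of males.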

25
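Q

what does adding type = 1 inside of the quantile function do?

A

switches to the empirical method: the result is always one of the actual observed values instead of a number interpolated between two observations (the default, type = 7, interpolates)
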
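
A small sketch of the difference, using a made-up vector `h` of four heights.

```r
h <- c(150, 160, 170, 180)   # made-up heights

quantile(h, probs = 0.5)             # default (type = 7) interpolates: 165
quantile(h, probs = 0.5, type = 1)   # type = 1 returns an observed value: 160
```

With type = 1 the answer is always a value that appears in the data, which is why it can look like "rounding" to a nearby observation.
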
26
Q

how to solve this problem: What fraction of men have heights greater than the 90th percentile of height among women? What fraction of women have heights less than the 10th percentile of height among men?

A

quantile(height[gender == "f"], probs = 0.9, type = 1) # 90th percentile of height among females

sum(height[gender == "m"] > quantile(height[gender == "f"], probs = 0.9, type = 1)) / sum(gender == "m") # fraction of males taller than the 90th percentile of female heights

the second question follows the same pattern with the genders swapped, probs = 0.1, and < instead of >
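
Both fractions, worked on ten made-up subjects so the answers can be checked by hand. The vectors `height` and `gender` are invented stand-ins for the course data, and the helper names `q90_f` and `q10_m` are just for readability.

```r
# Made-up stand-ins: five males, then five females
height <- c(70, 72, 68, 75, 66, 64, 62, 67, 65, 71)
gender <- c("m", "m", "m", "m", "m", "f", "f", "f", "f", "f")

# 90th percentile of female heights, then the fraction of males above it
q90_f <- quantile(height[gender == "f"], probs = 0.9, type = 1)     # 71
sum(height[gender == "m"] > q90_f) / sum(gender == "m")              # 0.4

# 10th percentile of male heights, then the fraction of females below it
q10_m <- quantile(height[gender == "m"], probs = 0.1, type = 1)     # 66
sum(height[gender == "f"] < q10_m) / sum(gender == "f")              # 0.6
```
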
27