WEEK 3 Flashcards

(35 cards)

1
Q

INDEXING

A

R lets us subset one vector using another: a logical vector (or a vector of positions) placed inside square brackets picks out the matching entries.

2
Q

INDEXING EXAMPLE PROGRAM

A

murders$rate <- murders$total / murders$population * 100000

index <- murders$rate <= 0.71
murders$state[index]

3
Q

THE SUM FUNCTION

A

The function sum returns the sum of the entries of a vector, and logical vectors get coerced to numeric with TRUE coded as 1 and FALSE as 0.
Thus we can count the states with a rate of at most 0.71 using:
sum(murders$rate <= 0.71)
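
A minimal sketch of the coercion described above, using a made-up vector of rates rather than the real murders data (the numbers and the names low and n_low are only illustrative):

```r
# Toy rates per 100,000 (made-up numbers, not the real murders data)
rate <- c(0.32, 2.80, 0.71, 5.10, 0.52)

# The comparison gives a logical vector: TRUE FALSE TRUE FALSE TRUE
low <- rate <= 0.71

# sum() treats TRUE as 1 and FALSE as 0, so this counts the TRUE entries
n_low <- sum(low)
n_low
```

Here n_low is 3, because three of the five toy rates are at most 0.71.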

4
Q

LOGICAL OPERATOR PROGRAMMING EXAMPLE

A

west <- murders$region == "West"
safe <- murders$rate < 1
index <- west & safe
murders$state[index]
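
To see how & combines two conditions entry by entry, here is a small sketch on made-up vectors (the values are hypothetical, not taken from the murders data):

```r
# Made-up regions and rates for four hypothetical states
region <- c("West", "South", "West", "Northeast")
rate   <- c(0.6, 0.4, 2.1, 0.9)

west <- region == "West"   # TRUE FALSE TRUE FALSE
safe <- rate < 1           # TRUE TRUE FALSE TRUE

# & is TRUE only where both conditions are TRUE
index <- west & safe
index
```

Only the first entry satisfies both conditions, so only the first element of index is TRUE.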

5
Q

WHICH FUNCTION

A

The which function converts a logical vector into the indexes (positions) of its TRUE entries, which helps us find a specific entry.

Example:
index <- which(murders$state == "California")
murders$rate[index]
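
A short sketch of the difference between a logical vector and the positions that which returns, assuming two made-up vectors in place of the murders columns:

```r
# Hypothetical states and rates (illustrative values only)
state <- c("Alabama", "California", "Texas")
rate  <- c(2.8, 3.4, 3.2)

is_ca <- state == "California"   # logical vector: FALSE TRUE FALSE
pos   <- which(is_ca)            # position of the TRUE entry: 2

rate[pos]                        # the rate stored at that position
```

Both rate[is_ca] and rate[pos] select the same entry; which is mainly useful when the position itself is needed.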

6
Q

MATCH

A

This function tells us which indexes of a second vector match each of the entries of a first vector.

Example:
index <- match(c("California", "New York", "Florida"), murders$state)
index

7
Q

%in%

A

If rather than an index we want a logical that tells us whether or not each element of a first vector is in a second, we can use the function %in%.
c("Boston", "Dakota", "Washington") %in% murders$state
#> [1] FALSE FALSE TRUE
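
As a rough side-by-side of match and %in%, the sketch below uses a short made-up vector of state names instead of murders$state (the names state, wanted, pos and found are illustrative):

```r
# A small stand-in for murders$state (illustrative only)
state  <- c("Alabama", "California", "Florida", "New York", "Washington")
wanted <- c("New York", "Florida", "Boston")

# match gives the position of each wanted entry in state, NA if absent
pos <- match(wanted, state)      # 4 3 NA

# %in% gives a logical: is each wanted entry present in state?
found <- wanted %in% state       # TRUE TRUE FALSE

# Drop the NA before using the positions as an index
state[pos[!is.na(pos)]]
```

match answers "where is it?" while %in% answers "is it there?"; an entry that is missing shows up as NA in the first case and FALSE in the second.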

8
Q

PLOT

A

The plot function can be used to make scatterplots.

Example:
x <- murders$population / 10^6
y <- murders$total
plot(x, y)

Also, using with to avoid repeating murders$:

with(murders, plot(population / 10^6, total))

9
Q

HISTOGRAM

A

Histograms are a powerful graphical summary of a list of numbers that gives you a general overview of the values you have.

They are made with the hist() function, which takes a numeric vector.

10
Q

BOXPLOT

A

They provide a more terse summary than histograms, but they are easier to stack with other boxplots.

murders$rate <- with(murders, total / population * 100000)
boxplot(rate~region, data = murders)

11
Q

DPLYR

A

dplyr is the tidyverse package for manipulating data tables. Load it with:
library(dplyr)

12
Q

MUTATE FUNCTION

A

This function is used to change the data table by adding new columns or modifying existing ones. It does not add rows.

13
Q

FILTER FUNCTION

A

This is used to filter the data: it keeps only the rows of a data table that satisfy a condition.

14
Q

How do you select specific columns in a data table?

A

By using the select function.

15
Q

EXAMPLE FOR MUTATE - ADD A NEW COLUMN CALLED RATE IN MURDERS DATA TABLE

A

murders <- mutate(murders, rate = total / population * 100000)

16
Q

Filter the states with a murder rate less than or equal to 0.71

A

filter(murders, rate <= 0.71)

17
Q

Select only state, region and rate, assign the result to an object called new_table, and show the states with a murder rate of at most 0.71.

A

new_table <- select(murders, state, region, rate)
filter(new_table, rate<= 0.71)

18
Q

Select only state, region and rate and show the states with a murder rate of at most 0.71, in a single line of code.

A

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)

19
Q

MUTATE FUNCTION

A

The mutate function is used to add a column to a data set. mutate takes the data frame as its first argument and the name and values of the new column as its second argument, written as name = values.

20
Q

ADD MURDER RATE USING MUTATE FUNCTION

A

library(dplyr)
library(dslabs)
data("murders")
murders <- mutate(murders, rate = total / population * 100000)

21
Q

Filter function to filter data

A

The filter function takes the data table as the first argument and the conditional statement as the second. It keeps only the rows for which the condition is TRUE.

22
Q

Selecting columns with select

A

new_table <- select(murders, state, region, rate)
filter(new_table, rate <= 0.71)

The first line keeps only the state, region and rate columns of the murders data set; the second line then filters its rows.

23
Q

The pipe operator %>%

A

murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
In general, the pipe sends the result of the left side of the pipe to be the first argument of the function on the right side of the pipe.
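
One way to check the "first argument" rule is to compare a piped call with the equivalent nested call. The sketch below assumes dplyr is installed and uses a made-up three-row table called toy rather than murders:

```r
library(dplyr)

# Made-up table standing in for murders (illustrative values only)
toy <- data.frame(
  state  = c("A", "B", "C"),
  region = c("West", "South", "West"),
  rate   = c(0.5, 3.0, 0.9)
)

# Nested form: the result of select() is the first argument of filter()
nested <- filter(select(toy, state, rate), rate <= 0.71)

# Piped form: the same steps, read left to right
piped <- toy %>% select(state, rate) %>% filter(rate <= 0.71)

identical(nested, piped)
```

The two forms give the same one-row table; the pipe only changes how the code reads.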

24
Q

summarize() function

A

The main purpose is to create a new summary table.
Example:
s <- heights %>%
  filter(sex == "Female") %>%
  summarize(average = mean(height), standard_deviation = sd(height))
This takes our original data table as input, filters it to keep only females, and then produces a new summarized table with just the average and the standard deviation of heights.

25
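Q

pull() function

A

us_murder_rate <- murders %>%
  summarize(rate = sum(total) / sum(population) * 100000) %>%
  pull(rate)
summarize returns a data frame, even when it holds a single number. pull extracts the named column, so the resulting value is numeric, not a data frame.
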
26
Q

group_by()

A

heights %>%
  group_by(sex) %>%
  summarize(average = mean(height), standard_deviation = sd(height))
After group_by, the summarize function applies the summarization to each group separately.

27
Q

arrange()

A

murders %>% arrange(rate)
Note that the default behavior is to order in ascending order. In dplyr, the function desc transforms a vector so that it is in descending order.
Example:
murders %>% arrange(desc(rate))

28
Q

Nested sorting with arrange

A

Here we order by region, then within region we order by murder rate:
murders %>% arrange(region, rate)

29
Q

What are tibbles?

A

A tibble (class tbl) is the tidyverse's version of the data frame. The function group_by, and summarize applied to grouped data, return this type of data frame. The group_by function returns a special kind of tbl, the grouped_df.

30
Q

Do tibbles display better?

A

The print method for tibbles is more readable than that of a data frame. To see this, print as_tibble(murders) and compare it with printing murders.

31
Q

Are subsets of tibbles tibbles?

A

If you subset the columns of a data frame, you may get back an object that is not a data frame, such as a vector or scalar. With tibbles this does not happen:
class(as_tibble(murders)[,4])
If you want to access the vector that defines a column, and not get back a data frame, you need to use the accessor $:
class(as_tibble(murders)$population)

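
The contrast can be tried on a tiny made-up table. This sketch assumes dplyr is installed (it re-exports as_tibble); the names df and tb are only illustrative:

```r
library(dplyr)

# Tiny made-up data frame (illustrative only)
df <- data.frame(a = 1:3, b = 4:6)
tb <- as_tibble(df)

class(df[, 1])   # a plain integer vector: the data frame structure is dropped
class(tb[, 1])   # still a tibble with one column
class(tb$a)      # $ returns the underlying vector
```

Selecting a single column with [ , ] simplifies a data frame to a vector but leaves a tibble as a tibble; $ gives the vector in both cases.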
32
Q

How do you create a tibble using tibble?

A

To create a data frame in the tibble format, use the tibble function:
grades <- tibble(names = c("John", "Juan", "Jean", "Yao"),
                 exam_1 = c(95, 80, 90, 85),
                 exam_2 = c(90, 85, 85, 90))

33
Q

How do you convert a regular data frame into a tibble?

A

To convert a regular data frame to a tibble, you can use the as_tibble function.
Example:
as_tibble(grades) %>% class()

34
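Q

The dot operator

A

rates <- filter(murders, region == "South") %>%
  mutate(rate = total / population * 10^5) %>%
  .$rate
median(rates)
Inside a pipe, the dot stands for the data passed in from the left side, so .$rate extracts the rate column as a vector.
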
35
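Q

The do function

A

heights %>% group_by(sex) %>% do(my_summary(.))
do applies a function that returns a data frame (here my_summary, which must be defined beforehand) to each group; the dot stands for that group's data.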