WEEK 3 Flashcards
(35 cards)
INDEXING
In R, we can use one vector to index (subset) another. A logical vector keeps the entries where it is TRUE.
INDEXING EXAMPLE PROGRAM
murders$rate <- murders$total / murders$population * 100000
ind <- murders$rate <= 0.71
murders$state[ind]
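A minimal sketch of the same indexing idea on made-up data. The vectors state and rate below are hypothetical toy values, not the real murders table:

```r
# Hypothetical toy data (not the real murders table)
state <- c("A", "B", "C", "D")
rate  <- c(0.5, 2.1, 0.7, 3.4)

# Logical vector: TRUE where the condition holds
ind <- rate <= 0.71
ind
#> [1]  TRUE FALSE  TRUE FALSE

# Indexing with a logical vector keeps only the TRUE positions
low_states <- state[ind]
low_states
#> [1] "A" "C"
```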
THE SUM FUNCTION
The function sum returns the sum of the entries of a vector. Logical vectors get coerced to numeric, with TRUE coded as 1 and FALSE as 0.
Thus we can count the states with a rate of 0.71 or less using:
sum(murders$rate <= 0.71)
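The coercion of TRUE to 1 and FALSE to 0 can be checked on a small logical vector. The vector ind here is a hypothetical toy example:

```r
# Hypothetical logical vector
ind <- c(TRUE, FALSE, TRUE, FALSE)

# Each TRUE counts as 1, each FALSE as 0
n_true <- sum(ind)
n_true
#> [1] 2

# The same coercion makes mean() return the proportion of TRUE entries
prop_true <- mean(ind)
prop_true
#> [1] 0.5
```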
LOGICAL OPERATOR PROGRAMMING EXAMPLE
west <- murders$region == "West"
safe <- murders$rate < 1
index <- west & safe
murders$state[index]
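A small sketch of how & combines two logical vectors element by element. The vectors state, region and rate are hypothetical toy values:

```r
# Hypothetical toy data
state  <- c("A", "B", "C", "D")
region <- c("West", "South", "West", "Northeast")
rate   <- c(0.8, 0.5, 2.0, 0.9)

west <- region == "West"   # TRUE FALSE TRUE FALSE
safe <- rate < 1           # TRUE TRUE FALSE TRUE

# & is TRUE only where both conditions are TRUE
index <- west & safe
index
#> [1]  TRUE FALSE FALSE FALSE

picked <- state[index]
picked
#> [1] "A"
```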
WHICH FUNCTION
The function which converts a logical vector into the indexes of its TRUE entries, which helps us look up a specific entry.
example
ind <- which(murders$state == "California")
murders$rate[ind]
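To see what which returns, here is a sketch on hypothetical toy vectors (safe and rate are made up for illustration):

```r
# Hypothetical toy data
safe <- c(FALSE, TRUE, FALSE, TRUE)
rate <- c(3.2, 0.6, 2.8, 0.5)

# which() gives the positions of the TRUE entries
pos <- which(safe)
pos
#> [1] 2 4

# Indexing by position gives the same result as indexing by the logical vector
by_pos     <- rate[pos]
by_logical <- rate[safe]
by_pos
#> [1] 0.6 0.5
```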
MATCH
This function tells us which indexes of a second vector match each of the entries of a first vector.
example
ind <- match(c("California", "New York", "Florida"), murders$state)
ind
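A sketch of match on a hypothetical toy vector, including what happens when an entry has no match (state and wanted are made-up names):

```r
# Hypothetical toy data
state  <- c("A", "B", "C", "D")
wanted <- c("C", "A", "Z")

# For each entry of wanted, the position of its first match in state
pos <- match(wanted, state)
pos
#> [1]  3  1 NA

# Entries with no match ("Z" here) come back as NA
found <- state[pos]
found
#> [1] "C" "A" NA
```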
%in%
If rather than an index we want a logical that tells us whether or not each element of a first vector is in a second, we can use the function %in%.
c("Boston", "Dakota", "Washington") %in% murders$state
#> [1] FALSE FALSE TRUE
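The sketch below, on a hypothetical toy vector state, shows %in% next to match and which so the three can be compared:

```r
# Hypothetical toy data
state  <- c("A", "B", "C", "D")
wanted <- c("A", "Z")

# Logical answer: is each entry of wanted present in state?
present <- wanted %in% state
present
#> [1]  TRUE FALSE

# The same answer expressed through match(): a match exists when the result is not NA
via_match <- !is.na(match(wanted, state))

# Combining which() and %in% gives positions in state of the entries we care about
pos <- which(state %in% c("A", "C"))
pos
#> [1] 1 3
```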
PLOT
The plot function can be used to make scatterplots.
EXAMPLE
x <- murders$population / 10^6
y <- murders$total
plot(x, y)
ALSO
with(murders, plot(population / 10^6, total))
HISTOGRAM
Histograms are a powerful graphical summary of a list of numbers that gives you a general overview of the values you have.
hist(murders$rate)
BOXPLOT
Boxplots provide a more terse summary than histograms, but they are easier to stack with other boxplots.
murders$rate <- with(murders, total / population * 100000)
boxplot(rate~region, data = murders)
DPLYR
library(dplyr)
MUTATE FUNCTION
This function changes the data table by adding new columns or modifying existing ones. It does not add rows.
FILTER FUNCTION
This keeps only the rows of the data table that satisfy a condition.
How to select a specific column in a data table?
By using the select function.
EXAMPLE FOR MUTATE - ADD A NEW COLUMN CALLED RATE IN MURDERS DATA TABLE
murders <- mutate(murders, rate = total / population * 100000)
Filter the states with a murder rate of 0.71 or less
filter(murders, rate <= 0.71)
Select only state, region and rate, assign the result to an object called new_table, and show the states with a murder rate of 0.71 or less?
new_table <- select(murders, state, region, rate)
filter(new_table, rate<= 0.71)
Select only state, region and rate and show the states with a murder rate of 0.71 or less in a single line of code?
murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
MUTATE FUNCTION
The mutate function is used to add a column to a dataset. mutate takes the data frame as its first argument, and the name and values of the new column as the second argument, using the convention name = values.
ADD MURDER RATE USING MUTATE FUNCTION
library(dslabs)
library(dplyr)
data("murders")
murders <- mutate(murders, rate = total / population * 100000)
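A minimal sketch of mutate on a made-up table, assuming dplyr is installed. The data frame toy and its values are hypothetical:

```r
library(dplyr)

# Hypothetical toy table (not the real murders data)
toy <- data.frame(
  state      = c("A", "B"),
  total      = c(10, 50),
  population = c(1e6, 2e6)
)

# Inside mutate, column names can be used directly without toy$
toy <- mutate(toy, rate = total / population * 100000)

# The table now has a fourth column called rate
names(toy)
#> [1] "state"      "total"      "population" "rate"
```

Here rate works out to 1 per 100,000 for state A and 2.5 per 100,000 for state B.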
Filter function to filter data
The filter function takes the data table as the first argument and then the conditional statement as the second.
Selecting columns with select
new_table <- select(murders, state, region, rate)
filter(new_table, rate <= 0.71)
This selects only the state, region and rate columns of the murders dataset, then keeps the rows with a rate of 0.71 or less.
The pipe function
murders %>% select(state, region, rate) %>% filter(rate <= 0.71)
In general, the pipe sends the result of the left side of the pipe to be the first argument of the function on the right side of the pipe.
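One way to check this rule is to write the same steps as nested calls and as a pipe and compare the results. The sketch assumes dplyr is installed, and the data frame toy is hypothetical:

```r
library(dplyr)

# Hypothetical toy table
toy <- data.frame(
  state  = c("A", "B", "C"),
  region = c("West", "South", "West"),
  rate   = c(0.5, 2.1, 0.7)
)

# Nested calls: the result of select() is the first argument of filter()
nested <- filter(select(toy, state, rate), rate <= 0.71)

# Piped version: each result is passed along as the first argument of the next call
piped <- toy %>% select(state, rate) %>% filter(rate <= 0.71)

# Both versions keep the same two rows
piped$state
#> [1] "A" "C"
```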
summarize() function
The main purpose is to create a new summary table.
example:
s <- heights %>%
  filter(sex == "Female") %>%
  summarize(average = mean(height), standard_deviation = sd(height))
This takes our original data table as input, filters it to keep only females, and then produces a new summarized table with just the average and the standard deviation of heights.
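The same pattern can be tried on a made-up table small enough to check by hand. This sketch assumes dplyr is installed. The data frame toy_heights and its values are hypothetical, not the real heights data:

```r
library(dplyr)

# Hypothetical toy table
toy_heights <- data.frame(
  sex    = c("Female", "Male", "Female", "Male"),
  height = c(64, 70, 66, 72)
)

s <- toy_heights %>%
  filter(sex == "Female") %>%
  summarize(average = mean(height), standard_deviation = sd(height))

# The result is a data frame with one row and one column per summary
s$average
#> [1] 65

# Columns of the summary table are accessed like any other data frame column
s$standard_deviation
```

With only the two female heights 64 and 66, the standard deviation equals the square root of 2.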