R Flashcards

1
Q

What is the difference between a categorical variable and a continuous variable?

A

A categorical variable can take only a limited number of categories, whereas a continuous variable can take any of an infinite number of values within a range.

For example: sex is a categorical variable because it is limited to ‘Male’ or ‘Female’.

2
Q

What is R?

A

R is an open-source language and environment for statistical computing and analysis.

3
Q

Can you write and explain some of the most common syntax in R?

A

# (comment): as in many other languages, # can be used to introduce a line of comments. This tells the interpreter not to process the line, so it can be used to make code more readable by reminding future readers what blocks of code are intended to do.

"" (quotes): quotes operate as one might expect; they denote a string (character) data type in R.
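
As a minimal sketch of both pieces of syntax (the variable names greeting and also_ok are made up for illustration):

```r
# A comment: R ignores everything from # to the end of the line.
greeting <- "Hello, R"   # double quotes create a character string
also_ok  <- 'Hello, R'   # single quotes create the same kind of string

class(greeting)               # "character"
identical(greeting, also_ok)  # TRUE
```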

4
Q

How do you list the preloaded datasets in R?

A

To view a list of preloaded datasets in R, simply type data() into the console and hit enter.
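
In a script, where no console listing is shown, the same information can be pulled out of the value that data() returns. A small sketch, assuming only the datasets package that ships with R (the variable name available is illustrative):

```r
# data() returns an object whose $results matrix describes each data set.
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Load one of the listed data sets by name and inspect its structure.
data("mtcars")
str(mtcars)
```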

5
Q

What are some advantages of R?

A

Its open-source nature. This qualifies as both an advantage and disadvantage for various reasons, but being open source means it’s widely accessible, free to use, and extensible.

Its package ecosystem. The built-in functionality available via R packages means you don’t have to spend a ton of time reinventing the wheel as a data scientist.

Its graphical and statistical aptitude. By many people’s accounts, R’s graphing capabilities are unmatched.

6
Q

What are the disadvantages of R?

A

Memory and performance. In comparison to Python, R is often said to be the lesser language in terms of memory and performance. This is disputable, and many think it’s no longer relevant as 64-bit systems dominate the marketplace.

Open source. Being open source has its disadvantages as well as its advantages. For one, there's no commercial vendor behind R, so there's no single source for support, and quality control of contributed packages is limited. This also means that sometimes the packages developed for R are not the highest quality.

Security. R was not built with security in mind, so it must rely on external resources to fill these gaps.

7
Q

What are the similarities and differences between R and Python?

A

There are many comparisons to draw between Python and R. They are both free. They both have strong modeling capabilities. Python is generally considered more secure and easier to learn, but R is typically thought to have better visualization tools and libraries. In many jobs, you’ll be expected to use both R and Python, so it’s good to know about both, even if you aren’t fluent in both languages.

8
Q

When is it appropriate to use the “next” statement in R?

A

A data scientist will use next to skip the rest of the current iteration of a loop and move straight on to the next iteration.
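
A minimal sketch of next inside a for loop. The values and the variable names (x, val, kept) are illustrative, not taken from the original card:

```r
x <- 1:5
kept <- integer(0)

for (val in x) {
  if (val == 3) {
    next            # skip the rest of this iteration when val is 3
  }
  kept <- c(kept, val)
}

print(kept)         # 1 2 4 5
```

The loop still visits every element of x; next only abandons the body for the element that meets the condition.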

9
Q

How do you assign a variable in R?

A

Variable assignment in R is a bit different from other languages. Rather than using an = sign, we typically use the arrow operator <-, a less-than sign followed by a minus sign. An equals sign, =, still works, but there are arguments about its readability, and there are cases where it can actually muck up your code, such as inside a function call, where = names an argument rather than assigning a variable.
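
A short sketch of the difference, with illustrative variable names:

```r
x <- 5        # conventional assignment with the arrow operator
y = 10        # an equals sign also assigns at the top level

# Inside a function call, = names an argument instead of assigning.
m1 <- mean(x = c(1, 2, 3))   # passes c(1, 2, 3) as the argument x of mean()
print(x)                     # still 5: the global x was not changed
```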

10
Q

What are the different data types/objects in R?

A

Unlike statically typed languages such as C, R doesn't ask users to declare a data type when assigning a variable. Instead, everything in R is an object. When you assign a variable in R, you bind a name to a data object, and that object's data type determines the data type of the variable. The most commonly used data objects include:

Vectors
Matrices
Lists
Arrays
Factors
Data frames
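
As a rough illustration, each of these objects can be created without declaring a type up front. The names and values below are invented for the sketch:

```r
v  <- c(1.5, 2.5, 3.5)                  # vector
m  <- matrix(1:6, nrow = 2)             # matrix (2 rows, 3 columns)
l  <- list(name = "a", values = v)      # list
a  <- array(1:24, dim = c(2, 3, 4))     # array with three dimensions
f  <- factor(c("low", "high", "low"))   # factor
df <- data.frame(id = 1:3, score = v)   # data frame

# Ask each object what it is; no declarations were needed.
sapply(list(v, m, l, a, f, df), function(obj) class(obj)[1])
```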
11
Q

How do you import data in R?

A

Let's use CSV as an example, as it's a very common data format. Simply make sure the file is saved in CSV format, then use the read.csv() function to import the data and assign the result to a variable such as yourRDateHere.
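
A self-contained sketch of the import step. To avoid assuming any file on disk, it first writes a tiny CSV to a temporary file; the file contents and the name csv_path are made up for illustration:

```r
# Write a small CSV to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ann,90", "Bob,85"), csv_path)

# read.csv() returns a data frame; header = TRUE treats the first line as column names.
yourRDateHere <- read.csv(csv_path, header = TRUE)
str(yourRDateHere)
```

With a real file, csv_path would simply be the path to that file.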

12
Q

How do you install a package in R?

A

There are many ways to install a package in R. Some even include using the GUI. We’re coders, so we’re not going to give those attention.

Type the following into your console and hit enter:

install.packages("package_name")

Followed by:

library(package_name)

It’s that simple. The first command installs the package and the second loads the package into the session.

13
Q

What is the use of with() in R?

A

We use the with() function to write simpler code by evaluating an expression inside a data set, so its columns can be referred to by name without repeating the data set's name. Its syntax looks like this:

with(randomDataSet, expression.test(sample))
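
A concrete sketch using the built-in mtcars data set (the choice of columns and the names r1 and r2 are illustrative):

```r
# Without with(): the data frame name is repeated for every column.
r1 <- mean(mtcars$mpg / mtcars$wt)

# With with(): column names are looked up inside mtcars directly.
r2 <- with(mtcars, mean(mpg / wt))

all.equal(r1, r2)   # TRUE
```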

14
Q

What is the use of by() in R?

A

Like with(), by() can help write DRY (don’t repeat yourself) code.

You can use by() to apply a function to a data frame split by factors. Its usage is something like this:

by(data, factor, function, …)

The data frame plugged into this function is split into data frames (by row) subsetted by the values of factor(s), and a function is then applied to each subset.
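
For instance, a sketch with the built-in iris data set, splitting the four measurement columns by species and averaging each piece (the name means_by_species is illustrative):

```r
# Split the numeric columns of iris by Species, then apply colMeans to each subset.
means_by_species <- by(iris[, 1:4], iris$Species, colMeans)

# One result per factor level; the first level is "setosa".
means_by_species[[1]]
length(means_by_species)   # 3
```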

15
Q

When is it appropriate to use mode()?

A

mode() gets or sets the type, or storage mode, of an object. It is closely related to storage.mode(), though the two can differ: for an integer vector, mode() reports "numeric" while storage.mode() reports "integer".
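
A brief sketch of getting and setting the mode, with illustrative values:

```r
x <- 1L
mode(x)           # "numeric"
storage.mode(x)   # "integer"

y <- c("a", "b")
mode(y)           # "character"

# Assigning to mode() converts the object to the requested mode.
z <- c("1", "2")
mode(z) <- "numeric"
z                 # 1 2
```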

16
Q

What is a factor variable, and why would you use one?

A

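A factor variable is R's representation of a categorical variable; it can be built from either numeric or character string values. The most salient reason to use a factor variable is that R's statistical modeling functions recognize it as categorical and handle it accordingly, rather than treating it as a number or free text. Another reason is that a factor stores each distinct value once, as a level, plus an integer code per element, which can be more memory efficient.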
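
A small sketch showing how a factor is stored (the colour values are made up):

```r
colors <- c("red", "blue", "red", "green", "blue")
f <- factor(colors)

levels(f)        # "blue" "green" "red"  (each distinct value appears once)
as.integer(f)    # 3 1 3 2 1             (integer codes pointing at the levels)
table(f)         # counts per category
```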

17
Q

When is it appropriate to use the which() function?

A

The which() function takes a logical object and returns the indices (positions) of all the elements that are TRUE.

To get a sense of how this works, plug in the built-in letters vector and search for the index of a specific letter using which().
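
A sketch along those lines, plus two extra cases showing that every TRUE position is returned (the variable names are illustrative):

```r
pos_z   <- which(letters == "z")                 # 26
pos_all <- which(c(FALSE, TRUE, TRUE, FALSE))    # 2 3  (every TRUE position, not just the first)
pos_big <- which(c(3, 8, 1, 9) > 5)              # 2 4

pos_z
```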

18
Q

How do you concatenate strings in R?

A

Concatenating strings in R is less than intuitive. You don't use a . operator, nor a + operator, and forget about the & operator. In fact, you don't use an operator at all. Concatenating strings in R requires the use of the paste() function.
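
A minimal sketch of paste() and its no-separator sibling paste0(); the strings and variable names are illustrative:

```r
hello <- "Hello"
world <- "World"

s1 <- paste(hello, world)                 # "Hello World"  (default separator is a space)
s2 <- paste(hello, world, sep = ", ")     # "Hello, World"
s3 <- paste0(hello, world)                # "HelloWorld"   (no separator)
s4 <- paste(c("a", "b"), collapse = "-")  # "a-b"          (collapse a vector into one string)

print(s1)
```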

19
Q

How do you read a CSV file in R?

A

Simply use the read.csv() function and assign the result to a variable such as yourRDateHere.

20
Q

Can you create an R decision tree?

A

A decision tree is a familiar graph for data scientists. It represents choices and results through the graphical form of a tree. To keep things simple, let’s just go over the basics.

Install the party package to get started with making the tree.

install.packages("party")

This gives you access to a fancy new function: ctree(), and, at its most basic, this is all we need to create a tree. First, let’s grab some data from our package; make sure the package is loaded.

library(party)

Now we have access to some new data sets. The strucchange package, which is installed alongside party, includes data on youth homicides in Boston called BostonHomicide. Let's use that one. You can print the data to the screen if you like.

print(BostonHomicide)

Now we’ll create the tree. The usage of ctree() goes something like this:

ctree(formula, data)

We've got our data set. I'll assign it to a variable, inputData, for simplicity, and pass it to ctree() together with a formula.
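
The remaining steps might look roughly like the sketch below. It assumes the party and strucchange packages are already installed, and the formula (homicides modelled from unemploy and season) is a hypothetical choice for illustration, not necessarily the one the original card used:

```r
# Only run if both packages are available (assumption: they were installed earlier).
if (requireNamespace("party", quietly = TRUE) &&
    requireNamespace("strucchange", quietly = TRUE)) {
  library(party)
  data("BostonHomicide", package = "strucchange")
  inputData <- BostonHomicide

  # Hypothetical formula: monthly homicides explained by unemployment and season.
  tree <- ctree(homicides ~ unemploy + season, data = inputData)
  print(tree)
  # plot(tree) would draw the tree in a graphics device.
}
```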

21
Q

Why is R useful for data science?

A

R turns what would otherwise be hours of graphically intensive work into minutes and a few keystrokes. In reality, you probably wouldn't encounter the R language outside the realm of data science or an adjacent field. It's great for linear modeling, nonlinear modeling, time-series analysis, plotting, clustering, and so much more.

Simply put, R is designed for data manipulation and visualization, so it’s natural that it would be used for data science.

22
Q

Describe how R can be used for predictive analysis

A

As a data manipulation and visualization tool, R can most definitely be used for predictive analytics. Using the same sort of decision tree we developed earlier, one could predict how many shootings might occur in 2019 in Boston. R as a whole provides numerous tools and packages for predictive modeling, so it’s the right tool for a data scientist.

23
Q

What are the two types of categorical variables?

A

Nominal categorical variable: a variable whose categories have no implied order.

Ordinal categorical variable: a variable whose categories have a natural ordering, such as low, medium, and high.
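
In R, both kinds can be represented with factor(); a sketch with invented values:

```r
# Nominal: categories with no order.
animal <- factor(c("cat", "dog", "bird"))
is.ordered(animal)     # FALSE

# Ordinal: categories with an explicit order, given by levels.
size <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
size[1] < size[2]      # TRUE, because low comes before high
max(size)              # high
```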

24
Q

What is a data frame?

A

Data frames (two-dimensional objects): can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data types. A data frame has the variables of a data set as columns and the observations as rows.
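
A minimal sketch with one column of each type (the data are made up; stringsAsFactors = FALSE is set explicitly so the name column stays character on older R versions too):

```r
people <- data.frame(
  name   = c("Ann", "Bob", "Cy"),   # character column
  age    = c(31, 45, 27),           # numeric column
  member = c(TRUE, FALSE, TRUE),    # logical column
  stringsAsFactors = FALSE
)

dim(people)                 # 3 3  (3 observations, 3 variables)
sapply(people, class)       # one class per column
people[people$age > 30, ]   # rows are observations, so filter by row
```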

25
Q

What are Vectors?

A

Vectors (one-dimensional arrays): can hold numeric, character or logical values. The elements in a vector all have the same data type.

26
Q

What are Matrices?

A

Matrices (two-dimensional arrays): can hold numeric, character or logical values. The elements in a matrix all have the same data type.
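
The single-type rule for vectors and matrices can be seen in a short sketch (values are illustrative):

```r
nums <- c(1, 2, 3)
class(nums)                 # "numeric"

# Mixing types forces one common type: here everything becomes character.
mixed <- c(1, "a", TRUE)
mixed                       # "1" "a" "TRUE"

# A matrix is a vector with two dimensions, so the same rule applies.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)                      # 2 3
typeof(m)                   # "integer"
```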

27
Q

What is a list?

A

A list in R is similar to your to-do list at work or school: the different items on that list most likely differ in length, characteristic, and type of activity that has to be done.

A list in R allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other lists, etc. It is not even required that these objects are related to each other in any way.

You could say that a list is some kind of super data type: you can store practically any piece of information in it!
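
A sketch of a list holding unrelated objects of different types and sizes (the to-do contents are invented):

```r
todo <- list(
  title  = "Monday",                 # a single string
  hours  = c(1.5, 2, 0.5),           # a numeric vector
  grid   = matrix(1:4, nrow = 2),    # a matrix
  nested = list(done = FALSE)        # another list
)

todo$hours          # access an element by name with $
todo[["grid"]]      # or with [[ ]]
todo$nested$done    # lists can hold other lists
length(todo)        # 4
```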