Data analysis with R Programming Flashcards

1
Q

What have you learned so far?

A

  • Use structured thinking to define a problem and ask the right questions.
  • Work with spreadsheets, databases, and tools like SQL to organize and transform data.
  • Clean your data to make sure it has integrity before you analyze it.
  • Create impactful data visualizations to illustrate key points.
  • Craft a compelling story to communicate insights to stakeholders.
2
Q

Computer programming

A

Giving instructions to a computer to perform an action or set of actions.

3
Q

What will you learn?

A
  • Introduction to programming languages.
  • Explore main features and functions.
  • Basic programming concepts in R.
  • How to work with data in R.
  • Clean, transform, visualize, and report data in R.
4
Q

R programming language

A

Used for statistical analysis, visualization, and other data analysis.

5
Q

Programming Languages

A
  • The words and symbols we use to write instructions for computers to follow.
6
Q

Coding

A
  • Writing instructions to the computer in the syntax of a specific programming language.
7
Q

Programming languages

A

- R
- Python
- JavaScript
- SAS
- Scala
- Julia

8
Q

Benefits of using programming languages

A
  • Clarify the steps of your analysis.
  • Save time.
  • Reproduce and share your work.
9
Q

R

A

A programming language frequently used for statistical analysis, visualization, and other data analysis.

10
Q

Open Source

A

Code that is freely available and may be modified and shared by the people who use it.

11
Q

R Benefits

A
  • Accessible
  • Data-centric
  • Open source
  • Community
12
Q

Uses of R

A
  • Reproducing your analysis
  • Processing lots of data
  • Creating data visualizations
13
Q

Integrated Development Environment (IDE)

A

A software application that brings together all the tools you may want to use in a single place.

14
Q

R code known as pipe

A

Helps make a sequence of code easier to work with and read.

15
Q

The Basic concepts of R

A
  • Functions
  • Comments
  • Variables
  • Data types
  • Vectors
  • Pipes
16
Q

Functions (R)

A

A body of reusable code to perform specific tasks in R.

17
Q

Argument (R)

A

Information that a function in R needs in order to run.

18
Q

Variable (R)

A

A representation of a value in R that can be stored for use later during programming.

19
Q

Vector (R)

A

A group of data elements of the same type stored in a sequence in R.

20
Q

Pipe (R)

A

A tool in R for expressing a sequence of multiple operations, represented with "%>%".

21
Q

Pipe (R) example

A

library(dplyr)  # provides %>%, filter(), and arrange()

ToothGrowth %>%
  filter(dose == 0.5) %>%
  arrange(len)

22
Q

Data Structure

A

Data structure is a format for organizing and storing data.

23
Q

Types of atomic vectors

A

- Logical
- Double
- Integer
- Character

24
Q

Logical Vector

A

TRUE/FALSE values

25
Logical vector example
TRUE
26
Integer vector
Positive and negative whole values
27
Integer vector example
3L (the L suffix marks an integer; a bare 3 is stored as a double)
28
Double vector
Decimal values
29
Double vector example
101.175
30
Character vector
String/character values
31
Character vector example
"Coding"
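As a rough illustration of the four vector types on the cards above, the sketch below builds one vector of each type with c() and checks it with typeof(). The variable names are made up for this example.

```r
# One small vector of each atomic type covered on the cards.
flags  <- c(TRUE, FALSE, TRUE)   # logical
counts <- c(3L, -7L, 12L)        # integer (note the L suffix)
prices <- c(101.175, 2.5)        # double
words  <- c("Coding", "R")       # character

typeof(flags)    # "logical"
typeof(counts)   # "integer"
typeof(prices)   # "double"
typeof(words)    # "character"

# A bare whole number is stored as a double unless it carries the L suffix.
typeof(3)        # "double"
```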
32
Data Frames
are the most common way of storing and analyzing data in R.
33
Matrix
is a two-dimensional collection of data elements. This means it has both rows and columns.
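A minimal sketch of a matrix in base R, assuming the values 1 to 6 as placeholder data:

```r
# Fill a matrix with 2 rows, column by column, with the values 1 to 6.
m <- matrix(1:6, nrow = 2)

dim(m)     # 2 3  (rows, columns)
m[2, 3]    # element in row 2, column 3: 6
```

Unlike a data frame, every element of a matrix has the same type.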
34
Operator
A symbol that names the type of operation or calculation to be performed in a formula.
35
Assignment operators
Used to assign values to variables and vectors.
36
Assignment operator Example
sales_1 <- c(67.00, 75.50, 90.00, 54.75)
37
Arithmetic Operators
Used to complete math calculations.
38
Arithmetic operators
+ (addition), - (subtraction), * (multiplication), / (division)
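The assignment and arithmetic operators can be tried together in a short sketch. The sales values come from the assignment card; the variable name total is an assumption for this example.

```r
# Assign a vector of sales figures with the <- operator.
sales_1 <- c(67.00, 75.50, 90.00, 54.75)

# Arithmetic operators work element by element on vectors.
sales_1 + 10    # adds 10 to every value
sales_1 * 2     # doubles every value

total <- sum(sales_1)      # 287.25
total / length(sales_1)    # the mean: 71.8125
```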
39
Function
A body of reusable code for performing specific tasks in R.
40
Argument
Information needed by a function in R in order to run.
41
Comment
Helpful text that describes or explains R code, preceded by #.
42
Variable
A representation of a value in R that can be stored for later use.
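The four terms above (function, argument, comment, variable) can be seen in one place in the following sketch. The function add_tax() and its default rate are hypothetical, chosen only for illustration.

```r
# A comment starts with # and is ignored by R.

# Define a function; price and rate are its arguments.
add_tax <- function(price, rate = 0.2) {
  price * (1 + rate)
}

# Store the results in variables for later use.
final_price <- add_tax(50)               # uses the default rate
reduced     <- add_tax(50, rate = 0.1)   # overrides the default rate
```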
43
Data Types
An attribute that describes a piece of data based on its values, its programming language, or the operations it can perform.
44
Vector
A group of data elements of the same type stored in a one-dimensional sequence in R.
45
Pipe
A tool in R for expressing a sequence of multiple operations, represented with %>%.
46
Packages (R)
Units of reproducible R code
47
Packages include:
- Reusable R functions
- Documentation about the functions
- Sample datasets
- Tests for checking your code
48
CRAN (Comprehensive R Archive Network)
An online archive with R packages, source code, manuals, and documentation.
49
R Packages
Packages offer a helpful combination of code, reusable R functions, descriptive documentation, tests for checking operability, and sample data sets.
50
Tidyverse (R)
A system of packages in R with a common design philosophy for data manipulation, exploration, and visualization.
51
How do conflicts in RStudio happen?
Conflicts happen when packages have functions with the same names as other functions.
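A common real case is dplyr's filter() masking stats::filter() when dplyr is loaded. The sketch below imitates that situation without installing anything, by defining a stand-in filter() of its own (an assumption for illustration), and shows the package::function() form that names the intended function explicitly.

```r
# Defining our own filter() masks stats::filter(), which is attached by default.
filter <- function(x) x[x > 0]

filter(c(-1, 2, 3))    # our version: keeps the positive values

# The package::function() form picks the intended function explicitly.
smoothed <- stats::filter(1:5, rep(1 / 3, 3))   # 3-point moving average
```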
52
8 core tidyverse packages
- ggplot2
- tibble
- tidyr
- readr
- purrr
- dplyr
- stringr
- forcats
53
Conflict notifications
are just one type of message that can show up in the console.
54
Vignette
is documentation that acts as a guide to an R package.
55
Four Packages that are an essential part of the workflow for data analysts:
- ggplot2
- dplyr
- tidyr
- readr
56
ggplot2 (R)
Creates a variety of data visualizations by applying different visual properties to the data variables in R.
57
tidyr (R)
A package used for data cleaning to make tidy data.
58
readr (R)
Used for importing data.
59
dplyr (R)
Offers a consistent set of functions that help you complete some common data manipulation tasks.
60
Factors (R)
Store categorical data in R where the data values are limited and usually based on a finite group like country or year.
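A small sketch of a factor in base R, assuming made-up size categories as the data:

```r
# A character vector with a small, fixed set of possible values.
sizes <- c("small", "large", "medium", "small", "large")

# Convert to a factor, stating the allowed categories and their order.
size_factor <- factor(sizes, levels = c("small", "medium", "large"))

levels(size_factor)   # "small" "medium" "large"
table(size_factor)    # counts per category: 2, 1, 2
```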
61
What you have learned so far
- Learned the fundamentals of R, from variables to vectors and more.
- Explored the different operations in R and saw how they can help you complete calculations.
- Checked out pipes and how they can make your programming more efficient.
- Unpacked packages to find out how they are a big part of what you can do in R.
62
Nested
In programming, describes code that performs a particular function and is contained within code that performs a broader function.
63
Nested function
A function that is completely contained within another function.
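To make the contrast with pipes concrete, here is a sketch that runs the same three steps once as nested calls and once as a pipeline. It uses base R's native pipe |> (R 4.1 or later) so that it runs without extra packages; the %>% operator from the cards chains calls in the same way. The input values are arbitrary.

```r
values <- c(16, 4, 25, 9)

# Nested: read from the inside out (sqrt first, then sort, then head).
nested <- head(sort(sqrt(values)), 2)

# Piped: the same steps, read from top to bottom.
piped <- values |>
  sqrt() |>
  sort() |>
  head(2)

identical(nested, piped)   # TRUE
```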
64
Keyboard shortcuts for inserting pipe operators
- PC/Chromebook: Ctrl+Shift+M
- Mac: Cmd+Shift+M
65
Things to consider when using pipes:
- Add the pipe operator at the end of each line of the piped operation except the last one.
- Check your code after you have programmed your pipe.
- Revisit piped operations to check for parts of your code to fix.
66
Data Frame
A collection of columns
67
Data Frames rules
- Columns should be named.
- Data stored can be many different types, like numeric, factor, or character.
- Each column should contain the same number of data items.
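The rules above can be checked against a small data frame. This is only a sketch; the column names and values are invented for the example.

```r
# Each column is a named vector, and all columns have the same length.
products <- data.frame(
  name     = c("pen", "notebook", "stapler"),   # character
  price    = c(1.50, 3.25, 8.00),               # double
  in_stock = c(TRUE, FALSE, TRUE)               # logical
)

str(products)     # lists each column with its type
nrow(products)    # 3
ncol(products)    # 3
```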
68
In the tidyverse
Tibbles are like streamlined data frames.
69
Tibbles
- Never change the data types of the inputs.
- Never change the names of your variables.
- Never create row names.
- Make printing easier.
70
Tidy data (R)
A way of standardizing the organization of data within R.
71
Tidy data standards
- Variables are organized into columns.
- Observations are organized into rows.
- Each value must have its own cell.
72
.CSV (comma-separated values)
A .csv file is a plain text file that contains a list of data. These files mostly use commas to separate (or delimit) data, but sometimes they use other characters, like semicolons.
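As a quick sketch of what a .csv file looks like on disk, the code below writes a tiny table to a temporary file and reads it back with base R. The column names and values are placeholders; in the tidyverse, readr::read_csv() plays the role of read.csv().

```r
# Write a small data frame to a temporary .csv file, then read it back.
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(9.5, 7.0, 8.25)), path, row.names = FALSE)

readLines(path)            # the raw text: a header line, then one line per row
scores <- read.csv(path)   # parses the comma-delimited text into a data frame
```

For a semicolon-delimited file, read.csv() accepts sep = ";".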
73
.TSV (tab-separated values)
A .tsv file stores a data table in which the columns of data are separated by tabs. For example, a database table or spreadsheet data.
74
.FWF (fixed-width files)
A .fwf file has a specific format that allows textual data to be saved in an organized fashion.
75
.LOG
A .log file is a computer-generated file that records events from operating systems and other software programs.
76
Arithmetic Operators
Let you perform basic math operations like addition, subtraction, multiplication, and division.
77
Relational Operators
Relational operators, also known as comparators, allow you to compare values. Relational operators identify how one R object relates to another. Examples: <, >, <=.
78
Logical operators
allow you to combine logical values. Logical operators return a logical data type or Boolean (TRUE or FALSE).
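Relational and logical operators are easiest to see side by side. A short sketch, with x as an arbitrary example value:

```r
x <- 7

# Relational operators compare values and return TRUE or FALSE.
x > 5      # TRUE
x == 10    # FALSE

# Logical operators combine those results.
x > 5 & x < 10    # AND: TRUE
x < 5 | x > 6     # OR: TRUE
!(x > 5)          # NOT: FALSE
```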
79
Assignment Operators
Let you assign values to variables. Example: <-
80
Organizational functions
Help you sort, filter, and summarize your data.
81
Cleaning functions
Help you preview and rename data so it's easier to work with.
82
Transformational functions
help you separate and combine data, as well as create new variables.
83
Anscombe's quartet
Four datasets that have nearly identical summary statistics but look very different when plotted.
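R ships with these four datasets as the built-in anscombe data frame, so the claim is easy to check. The sketch below compares only the column means; the helper names x_means and y_means are made up for the example.

```r
# anscombe has columns x1..x4 and y1..y4, one x/y pair per dataset.
x_means <- sapply(anscombe[, c("x1", "x2", "x3", "x4")], mean)
y_means <- sapply(anscombe[, c("y1", "y2", "y3", "y4")], mean)

round(x_means, 2)   # the four x means agree
round(y_means, 2)   # the four y means agree to two decimals
```

Plotting each pair (for example plot(anscombe$x1, anscombe$y1)) is what reveals how different the datasets are.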
84
Popular visualization packages in R
- ggplot2
- plotly
- lattice
- rgl
- dygraphs
- leaflet
- highcharter
- patchwork
- gganimate
- ggridges
85
The basics of ggplot2
The ggplot2 package lets you make high-quality, customizable plots of your data. ggplot2 is based on the grammar of graphics, which is a system for describing and building visualizations.
86
Benefits of ggplot2
- Create different types of plots.
- Customize the look and feel of plots.
- Create high-quality visuals.
- Combine data manipulation and visualization.
87
Our focus on core concepts in ggplot2
- Aesthetics
- Geoms
- Facets
- Labels and annotations
88
Aesthetic (R)
A visual property of an object in your plot.
89
Geom (R)
The geometric object used to represent your data.
90
Facets (R)
Let you display smaller groups, or subsets, of your data.
91
Labels and annotations (R)
Let you customize your plot.
92
Mapping (R)
Matching up a specific variable in your dataset with a specific aesthetic.
93
Steps to Create your plot in R programming
1) Start with the ggplot() function and choose a dataset to work with.
2) Add a geom function to display your data.
3) Map the variables you want to plot in the arguments of the aes() function.
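The three steps can be sketched as follows. The built-in mtcars dataset and the chosen variables (wt, mpg, cyl) are assumptions picked for illustration, and the ggplot2 package must be installed.

```r
library(ggplot2)

# Step 1: start with ggplot() and a dataset.
# Step 2: add a geom function to choose how the data is drawn.
# Step 3: map variables to aesthetics inside aes().
p <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg, color = factor(cyl)))

p   # printing the object draws the plot
```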
94
Aesthetics for points
- x
- y
- color
- shape
- size
- alpha
95
Geom functions
- geom_point()
- geom_bar()
- geom_line()
96
Smoothing
Enables the detection of a data trend even when you can't easily notice a trend from the plotted data points.
97
Loess smoothing
Loess smoothing works best for plots with fewer than 1,000 points.
98
Gam smoothing
Gam smoothing, or generalized additive model smoothing, is useful for smoothing plots with a large number of points (more than 1,000).
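A minimal sketch of adding a smoothed trend line with geom_smooth(), assuming the built-in mtcars dataset (32 rows, so loess is the method shown) and an installed ggplot2:

```r
library(ggplot2)

# Scatter plot with a loess trend line drawn over the points.
p_smooth <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# For a large dataset, method = "gam" with formula = y ~ s(x) is the counterpart;
# it relies on the mgcv package.
```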
99
Facet functions
- facet_wrap()
- facet_grid()
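The difference between the two facet functions shows up in a short sketch. It assumes the built-in mtcars dataset and an installed ggplot2; the object names are made up for the example.

```r
library(ggplot2)

base_plot <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))

# facet_wrap(): one panel per value of a single variable.
by_cyl <- base_plot + facet_wrap(~cyl)

# facet_grid(): rows split by one variable, columns by another.
by_cyl_and_gear <- base_plot + facet_grid(cyl ~ gear)
```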
100
To add a title to a chart
Use a label function: labs(title = "Average product rating").
101
Blue and yellow bars
To highlight underperforming products, use an aesthetics function: col = ifelse(x < 2, 'blue', 'yellow').
102
Bar chart
To create the bars on the chart, use a geom function: geom_bar().
103
Trend line
To create a trend line, use a geom function: geom_smooth().
104
Scatter plot chart
To create the scatter plot, use a geom function: geom_point().
105
Compare data
To compare data trends across average ratings, use a facets function: facet_wrap(~`Average Rating`)
106
Axis labels
To label the axes, use an aesthetics function: aes(x = `Average price (USD)`, y = Product)
107
Annotate
To add notes to a document or diagram to explain or comment upon it.
108
R Markdown
A file format for making dynamic documents with R.
109
Course Overview for R markdown
- An overview of R Markdown
- How to install R Markdown in RStudio
- How to create an R Markdown document
- The structure and components of the document
- How to insert and edit pieces of code called chunks in your document
- The process of exporting your documentation
110
Markdown
A syntax for formatting plain text files.
111
Markdown formatting
Wrap text in single underscores (_text_) or single asterisks (*text*).
112
Markdown report output
Text wrapped in a single underscore or asterisk appears in italics.
113
R Notebook
Lets users run your code and show the graphs and charts that visualize the code.
114
R Markdown file formats
- HTML, PDF, and Word documents
- Slide presentation
- Dashboard
115
HTML
The set of markup symbols or codes used to create a webpage.
116
Other notebook options
- Jupyter
- Kaggle
- Google Colab
117
Jupyter notebooks
Documents that contain computer code and rich text elements, such as comments, links, or descriptions of your analysis and results.
118
YAML
A language for data that translates it so it's readable.
119
Code Chunk
Code added in an .Rmd file.
120
Delimiter
A character that indicates the beginning or end of a data item.
121
Code chunk delimiters
```{r } and ```
122
Code chunk keyboard shortcuts
PC/Chromebook: ctrl+alt+I
123
What have we explored so far?
- What R Markdown is
- How to use R Markdown in RStudio to create .Rmd files
- The structure of these files and how to format them to make reports
- What code chunks are and how to include them in your documentation
- How to take all of your analyses and transform them from an .Rmd file into a report
124
Case study
A common way for employers to assess job skills and gain insight into how you approach common data-related challenges.
125
Portfolio
Collection of case studies that can be shared with potential employers.
126
Best Practices for Case studies and Portfolios
1) Make sure your case study answers the questions being asked.
2) Make sure that you are communicating the steps you have taken and the assumptions you have made.
3) The best portfolios are personal, unique, and simple.
4) Make sure your portfolio is relevant and presentable.