Bioinformatics Exam 3 Review Flashcards
Programming
helps with collecting and manipulating data, automating analysis workflows (so others can see exactly what you did), minimizing human error, generating reproducible reports, quickly processing large datasets and repetitive tasks, and visualizing and making sense of the data
Programming language
a language in which letters/symbols form words according to rules, used by humans to formulate instructions for computers to generate some desired output. Compiler and interpreter software translate an instruction formulated in a programming language into executable machine-level operations.
Pathway: Instruction (in your mind) → instruction in programming language → instruction in machine level language → execution of instruction/computation → generated output
Source Code
a set of instructions formulated in a programming language that is readable by humans
Program
a set of instructions stored in a form that can be executed by a computer
Compiler
software that translates source code into a machine-level program that is (usually) optimized for the machine it is compiled for
Time to translate → Slow
Time to execute → Fast
Interpreter
translates source code scripts into machine level operations “on the fly” and executes them line by line
Time to translate → Fast
Time to execute → Slow
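The "line by line" behavior of an interpreter can be sketched in R itself. This is only an illustration, not how R's interpreter is implemented internally: the three-line script and the names `script`, `env`, and `status` are made up for the example. Each line is translated and executed before the next one is looked at, so the first two lines run even though the third one fails.

```r
# Three lines of source code held as plain text (hypothetical example script).
script <- c(
  "x <- 2",
  "y <- x * 10",
  "z <- no_such_object + 1"   # fails: no_such_object was never defined
)

env <- new.env()          # workspace in which the lines are executed
status <- character(0)    # records what happened to each line

for (line in script) {
  outcome <- tryCatch({
    eval(parse(text = line), envir = env)  # translate and run this one line
    "ok"
  }, error = function(e) "error")
  status <- c(status, outcome)
}

print(status)                  # "ok" "ok" "error"
print(get("y", envir = env))   # 20: the second line ran before the failure
```

A compiler, by contrast, translates the whole program first and only then produces something that can be executed.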
1976
Chambers, Becker and Wilks develop the S statistical programming language at Bell Laboratories
Aim: facilitate quick transitions from idea to software
This interpreter-based language made modifying, testing and troubleshooting programs quick and convenient.
1993
Ihaka and Gentleman re-implement S and name it the “R programming language”
1995
The decision is made to make R freely available under the GNU General Public License (but it is not yet officially released)
1997
The R Core Group is founded and takes control of R’s further development. The Comprehensive R Archive Network (CRAN) is launched, enabling sharing and curation of user-developed components that extend R’s capabilities
2000
R version 1.0.0 is released to the general public
2009
New York Times article: “Data Analysts Captivated by R’s Power”, Ashlee Vance
Good description of how R makes a difference → Daryl Pregibon (Google): “it allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems”
2017
a study found that R has shown extreme growth
2019
Another study found that R is the most requested programming language
Comprehensive R Archive Network (CRAN)
a network of FTP and web servers storing versions of code and documentation for R. It serves as the main general-purpose repository for R packages: if you run into a common problem, there is often a pre-made package you can use to solve it.
R
a free, open-source language and environment for statistical computing and graphics; provides tools for statisticians, data miners, data analysts, data scientists and academic researchers
Bioconductor
another free R package repository, dedicated to the analysis of genomic data and biological high-throughput assays; its primary focus is serving the needs of bioinformaticians and biomedical researchers
Packages available: >1800
Mission: accessibility of powerful analysis and visualization tools, reproducible research, rapid development of software components that are both scalable and compatible with each other
Commands in R
R’s interpreter can process 2 forms of these → expressions and assignments. Commands can be separated by line breaks or the “;” character, and the individual components within a command can be arbitrarily separated by spaces and tabs
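A small sketch of the separator rules, using made-up variable names (`a`, `b`, `total_tight`, `total_spaced`):

```r
# Two assignments and one expression on a single line, separated by ";"
a <- 2; b <- 3; a + b

# The same commands separated by line breaks
a <- 2
b <- 3
a + b

# Extra spaces or tabs between components do not change the result
total_tight  <- a+b
total_spaced <- a   +   b
print(total_tight == total_spaced)   # TRUE
```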
Expressions
commands that are evaluated and (optionally) printed, after which their output is lost unless it is assigned to a name; these take some input arguments or values and return some output values
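To see the difference between an expression and an assignment, compare the commands below (`root_plus_one` is just an illustrative name):

```r
# Expression: evaluated and printed, but the value is not kept anywhere
sqrt(16) + 1                    # prints [1] 5

# Assignment: the value is stored under a name and nothing is printed
root_plus_one <- sqrt(16) + 1

# Because it was stored, the value can be reused in later commands
root_plus_one * 2               # prints [1] 10
```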
Operators
generally written as 1 to 3 consecutive special characters; they often handle fundamental, essential programming tasks, and several other operators handle tasks such as logic or comparison
Example: ? opens a help page with documentation and explanations of a function
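A few common operators in action. The variable names are chosen only for this sketch, and the help call is left commented out because it is meant for interactive sessions:

```r
total      <- 7 + 2      # arithmetic: addition (and "<-" is the assignment operator)
remainder  <- 7 %% 2     # modulo: remainder of 7 divided by 2
quotient   <- 7 %/% 2    # integer division, a three-character operator
is_greater <- 7 > 2      # comparison: greater than
is_equal   <- 7 == 2     # comparison: equality test
both       <- (7 > 2) & (2 > 7)   # logical AND
either     <- (7 > 2) | (2 > 7)   # logical OR
negated    <- !(7 > 2)            # logical NOT

print(c(total, remainder, quotient))            # 9 1 3
print(c(is_greater, is_equal, both, either, negated))

# ?mean   # help operator: opens the documentation for mean()
```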
Objects
individual pieces of data that have two major attributes:
Data type: what type of information it contains
Value: the actual information that it contains
NOTE: internally, the value of an object is just a sequence of zeros and ones in the computer's memory; the data type is what tells R how to interpret and display the value of the object.
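The point of the note can be illustrated with one byte that R shows differently depending on the type it is read as. This is a minimal sketch; the names `n`, `s`, and `byte` are illustrative:

```r
n <- 65L                  # an integer object
s <- "A"                  # a character object
print(typeof(n))          # "integer"
print(typeof(s))          # "character"

# One byte (hexadecimal 41, i.e. bits 01000001) interpreted two ways
byte <- as.raw(65)
print(byte)               # 41
print(as.integer(byte))   # 65  -> read as a number
print(rawToChar(byte))    # "A" -> read as text
```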
Scalar and multidimensional data types
the two fundamental classes of data types
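A rough illustration of the distinction, with made-up names. Note that R stores a single value as a vector of length 1, so "scalar" here means an object holding exactly one value:

```r
single_value <- 3.14                 # one value
print(length(single_value))          # 1
print(is.null(dim(single_value)))    # TRUE: no dimensions attached

values <- c(1.5, 2.5, 4)             # several values in one dimension
print(length(values))                # 3

grid <- matrix(1:6, nrow = 2)        # two dimensions: 2 rows, 3 columns
print(dim(grid))                     # 2 3
print(grid[2, 3])                    # 6: the element in row 2, column 3
```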
Character Objects
hold letters, words and text; their values are wrapped in quotation marks
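A short sketch with an illustrative gene name; `gene` and `label` are made-up variable names:

```r
gene <- "BRCA1"                  # quotation marks make this a character object
print(typeof(gene))              # "character"
print(nchar(gene))               # 5 characters

label <- paste("Gene:", gene)    # join pieces of text, separated by a space
print(label)                     # "Gene: BRCA1"

# Quotation marks decide the type: "42" is text, 42 is a number
print(is.character("42"))        # TRUE
print(is.character(42))          # FALSE
```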
Logical Objects
have only two possible values, TRUE and FALSE (yes/no; abbreviated T and F); used when you want to check or remember whether something is true or has happened while a program runs.
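A minimal sketch of logical objects used as flags and as results of checks. The read-length values and the threshold are hypothetical:

```r
finished <- FALSE                     # remember that something has not happened yet
print(typeof(finished))               # "logical"

read_length <- 150                    # hypothetical measurement
passes_filter <- read_length >= 100   # a comparison returns a logical value
print(passes_filter)                  # TRUE

if (passes_filter) {
  finished <- TRUE                    # update the flag once the check succeeds
}
print(finished)                       # TRUE

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
n_passing <- sum(c(150, 80, 120) >= 100)
print(n_passing)                      # 2
```

T and F work as abbreviations by default, but they are ordinary variables that can be overwritten, so spelling out TRUE and FALSE is the safer habit.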