# INTRO+DATASETS Flashcards

What is data science?

The process of building, cleaning, and structuring datasets in order to analyse them and extract meaning.

Process of data science

- Ask an interesting question
- Get the data
- Explore the data
- Model the data
- Visualize and communicate the results

Key principles in data science

- Get data from many sources
- Understand how the data were collected
- Use statistical models
- Understand correlations
- Have good communication skills

What does the discussion of probability include?

- Random experiments that produce a set of possible outcomes (there can be infinitely many outcomes)

Elements of a probability model (describes the uncertainty of an experiment)

- Sample space (symbol Ω, omega): the set that contains all possible outcomes. Outcomes are mutually exclusive and collectively exhaustive. An event is a collection of one or more outcomes, i.e. a subset of the sample space.
- Probability function: P(A) assigns event A a number between 0 and 1. The complement of event A is A^c, with P(A^c) = 1 - P(A).
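
A minimal sketch of a sample space, an event, and the complement rule. The fair six-sided die, the equally-likely assumption, and the names `sample_space` and `prob` are illustrative choices, not part of the notes.

```python
from fractions import Fraction

# Assumed experiment: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of the sample space),
    assuming every outcome is equally likely."""
    event = set(event)
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

even = {2, 4, 6}                 # event A
not_even = sample_space - even   # complement A^c

print(prob(even))                          # 1/2
print(prob(not_even) == 1 - prob(even))    # True
```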

Conditional probability

Probability of event A given that event B has occurred: P(A | B) = P(A ∩ B) / P(B). P(B) is the denominator, so it must be greater than 0.
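
As a rough illustration, conditional probability can be computed by counting outcomes. The fair die and the helper names `p` and `conditional` are assumptions made for this sketch.

```python
from fractions import Fraction

# Assumed experiment: one roll of a fair six-sided die.
outcomes = set(range(1, 7))

def p(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

def conditional(a, b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) is 0."""
    if p(b) == 0:
        raise ZeroDivisionError("P(B) must be greater than 0")
    return p(a & b) / p(b)

rolled_two = {2}
rolled_even = {2, 4, 6}

# Knowing the roll is even narrows the possibilities from six to three.
print(conditional(rolled_two, rolled_even))  # 1/3
```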

Independent events

A and B are independent if the occurrence of B provides no information about A. Equivalently, the probability of the intersection of A and B is P(A ∩ B) = P(A) × P(B).
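
One way to check the product rule is to compare both sides directly. This sketch assumes a fair six-sided die; the events and the function name `independent` are hypothetical examples.

```python
from fractions import Fraction

# Assumed experiment: one roll of a fair six-sided die.
die = set(range(1, 7))

def pr(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & die), len(die))

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return pr(a & b) == pr(a) * pr(b)

is_even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_most_three = {1, 2, 3}

print(independent(is_even, at_most_four))   # True:  1/3 == 1/2 * 2/3
print(independent(is_even, at_most_three))  # False: 1/6 != 1/2 * 1/2
```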

Variable?

A variable is any characteristic observed in a study. It summarises all outcomes of a random process.

Quantitative variable

A variable with numeric values, where there is a meaningful distance between any two data points.

Types of categorical variable

- Ordinal (categories have a natural order)
- Nominal (categories have no natural order)

Types of quantitative variable

- Discrete (possible values are separate numbers)
- Continuous (possible values form an interval)

Distribution of a variable (probability distribution)

List of the possible outcomes and their associated probabilities.

Cumulative probability distribution

Probability that a discrete variable is less than or equal to a particular value.
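
A small sketch of a discrete probability distribution and its cumulative version. The variable (number of heads in two fair coin tosses) and the names `pmf` and `cumulative` are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import accumulate

# Assumed variable: number of heads in two fair coin tosses.
# The distribution lists each possible value with its probability.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# The probabilities of a distribution must sum to 1.
assert sum(pmf.values()) == 1

# Running totals over the sorted values give P(X <= x) for each x.
values = sorted(pmf)
cumulative = dict(zip(values, accumulate(pmf[v] for v in values)))

print(cumulative[1])  # 3/4, the probability of at most one head
```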

Probability density function (used for a continuous variable, as it is impossible to list every value and the probability of each)

A probability density function (PDF) describes how probability is spread over the values of a continuous variable. The probability that the value falls within an interval is the area under the PDF over that interval.

Cumulative distribution function

The cumulative distribution function (CDF) gives the probability that the variable is less than or equal to a particular value.
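
To see how a density and a cumulative distribution function relate, the sketch below computes the probability of an interval in two ways. The standard normal distribution is an assumed example, `area_under_pdf` is a hypothetical helper, and `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

# Assumed continuous variable: standard normal (mean 0, standard deviation 1).
dist = NormalDist(mu=0.0, sigma=1.0)

def area_under_pdf(a, b, steps=10_000):
    """Approximate the area under the density between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Probability of landing in [-1, 1], computed two ways.
by_pdf = area_under_pdf(-1.0, 1.0)        # area under the density
by_cdf = dist.cdf(1.0) - dist.cdf(-1.0)   # P(X <= 1) - P(X <= -1)

print(round(by_pdf, 4), round(by_cdf, 4))  # 0.6827 0.6827
```

The density value at a single point, such as `dist.pdf(0.0)`, is not itself a probability; only areas over intervals are.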

Modal category?

The category with the highest frequency.
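
A quick sketch of finding the modal category by counting. The `colours` data are made up for illustration.

```python
from collections import Counter

# Hypothetical observations of a categorical variable.
colours = ["red", "blue", "red", "green", "blue", "red"]

# Count how often each category occurs, then take the most frequent one.
counts = Counter(colours)
modal_category, frequency = counts.most_common(1)[0]

print(modal_category, frequency)  # red 3
```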