Math Flashcards
(185 cards)
Alternative hypothesis
In statistical hypothesis testing, the null hypothesis and alternative hypothesis are two mutually exclusive statements.
The alternative hypothesis (often denoted Ha or H1) is a statement that contradicts the null hypothesis and usually asserts that the hypothesised effect exists. It represents the researcher’s claim to be tested. The alternative hypothesis suggests that there is a significant effect, relationship, or difference between variables in the population, while the null hypothesis usually states that there is no effect.
Arg Max function
Arg Max (arg max): A mathematical function that returns the input value where a given function achieves its maximum value. In other words, it finds the input that makes the function’s output the highest.
arg maxₓ f(x) = {x | f(x) = maxₓ′ f(x′)}
(where x′ ranges over all possible inputs)
Common Uses:
* Optimization problems
* Machine learning algorithms
* Decision-making (finding the best solution)
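The arg max over a finite set of candidates can be sketched in a few lines (assuming NumPy; the quadratic f and the grid of inputs are hypothetical):

```python
import numpy as np

# Hypothetical example: find the input x (from a grid of candidates)
# where f(x) = -(x - 3)^2 + 5 attains its maximum.
xs = np.linspace(0, 6, 601)          # candidate inputs
fx = -(xs - 3) ** 2 + 5              # function values
best_x = xs[np.argmax(fx)]           # arg max: the input with the highest output
print(best_x)                        # ≈ 3.0
```

Note the distinction: `np.max(fx)` would return the maximum *value* (5), while `np.argmax` returns the *position* of that value, from which we recover the input.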
Average (mean)
Average (Mean): A measure of central tendency representing the typical or central value in a dataset.
Calculation: Sum of all values divided by the total number of values.
Mathematical Formula:
x̄ = (1/n) Σ_{i=1}^n x_i
(Where x̄ is the average, n is the number of data points, and x_i represents each value)
Uses:
* Descriptive statistics
* Data analysis
* Comparing datasets or groups
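The formula translates directly to code; the standard library's `statistics.mean` serves as a cross-check:

```python
import statistics

# Mean as the sum of values divided by their count.
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)
assert mean == statistics.mean(data)  # agrees with the library implementation
print(mean)  # 6.0
```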
Base rate
Refers to the underlying probability of an event occurring in a population, regardless of other factors. It serves as a benchmark for assessing the likelihood of an event. Understanding the base rate is crucial for making accurate predictions and evaluating the performance of predictive models. For example, in medical diagnosis, the base rate might represent the prevalence of a disease within a certain population, providing valuable context for interpreting diagnostic test results.
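The medical-diagnosis example can be made concrete with Bayes' rule (the 1% prevalence and 95% sensitivity/specificity figures below are hypothetical):

```python
# Hypothetical test: 1% base rate, 95% sensitivity, 95% specificity.
# Bayes' rule shows how strongly the base rate drives the probability
# that a positive result actually indicates disease.
base_rate = 0.01
sensitivity = 0.95          # P(positive | disease)
false_pos = 0.05            # P(positive | no disease)
p_positive = sensitivity * base_rate + false_pos * (1 - base_rate)
ppv = sensitivity * base_rate / p_positive   # P(disease | positive)
print(round(ppv, 3))        # ≈ 0.161 — far below the test's 95% accuracy
```

This is the classic base-rate effect: with a rare condition, most positives are false positives despite an accurate test.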
Basis
In linear algebra, a basis is a set of linearly independent vectors that span a vector space, meaning any vector in the space can be expressed as a unique linear combination of basis vectors. Basis vectors form the building blocks for representing and understanding vector spaces, facilitating operations such as vector addition, scalar multiplication, and linear transformations. For example, in Euclidean space, the standard basis consists of orthogonal unit vectors aligned with the coordinate axes (e.g., {(1, 0, 0), (0, 1, 0), (0, 0, 1)} for 3-dimensional space), enabling the representation of any point in the space using coordinates along these axes.
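A quick sketch (assuming NumPy) of the 3-dimensional standard basis: a vector's coordinates are exactly the coefficients of its basis expansion:

```python
import numpy as np

# Any vector in R^3 is a unique linear combination of the standard basis.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
e3 = np.array([0.0, 0.0, 1.0])
v = np.array([2.0, -1.0, 3.0])
reconstructed = 2 * e1 + (-1) * e2 + 3 * e3   # coordinates as coefficients
print(np.array_equal(v, reconstructed))        # True
```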
Bellman Equations
Bellman Equations are a set of recursive equations used in dynamic programming and reinforcement learning to express the value of a decision problem in terms of the values of its subproblems. They provide a way to decompose a complex decision-making process into smaller, more manageable steps.
Application: Bellman Equations are fundamental in reinforcement learning algorithms such as value iteration and Q-learning, where they are used to compute the optimal value function or policy for a given environment.
Example: In a grid world environment where an agent must navigate to a goal while avoiding obstacles, Bellman Equations express the value of each state as the immediate reward plus the discounted value of the subsequent state reached by taking an optimal action.
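A minimal value-iteration sketch on a hypothetical 4-state chain (a 1-D stand-in for the grid world above); the update applied is the Bellman optimality equation V(s) = max_a [R(s, a) + γ·V(s′)]:

```python
# States 0..3 on a line; actions move left (-1) or right (+1);
# reaching the goal state 3 yields reward +1.
gamma = 0.9
n_states = 4
V = [0.0] * n_states

def step(s, a):
    # Deterministic transition, clipped to the chain's ends.
    s2 = min(max(s + a, 0), n_states - 1)
    reward = 1.0 if s2 == n_states - 1 and s != n_states - 1 else 0.0
    return s2, reward

for _ in range(100):                 # iterate until (practically) converged
    for s in range(n_states - 1):    # goal state keeps V = 0
        V[s] = max(r + gamma * V[s2]
                   for s2, r in (step(s, a) for a in (-1, +1)))
print(V)   # values grow toward the goal: ≈ [0.81, 0.9, 1.0, 0.0]
```

Each state's value is the immediate reward plus the discounted value of the best successor — exactly the decomposition the card describes.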
Bernoulli Distribution
A discrete probability distribution that models the outcomes of a binary random experiment. It is characterized by a single parameter, p, representing the probability of success (usually denoted by 1) in a single trial and the probability of failure (denoted by 0) as 1 - p. The distribution is commonly used to model simple events with two possible outcomes, such as success or failure, heads or tails, and yes or no.
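Sampling a Bernoulli variable needs only a uniform draw; with a hypothetical p = 0.3, the empirical mean of many trials approaches p:

```python
import random

random.seed(0)
p = 0.3                             # hypothetical success probability
samples = [1 if random.random() < p else 0 for _ in range(100_000)]
print(sum(samples) / len(samples))  # ≈ 0.3 — empirical mean approaches p
```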
Binomial coefficient formula
The formula calculates the number of ways you can choose a smaller group (k) out of a larger group (n) when the order you pick them in doesn’t matter.
Formula: (n k) = n! / (k! (n-k)!)
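The factorial formula agrees with Python's built-in `math.comb`:

```python
import math

# C(n, k) = n! / (k! (n-k)!); math.comb computes the same quantity directly.
n, k = 5, 2
by_formula = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
assert by_formula == math.comb(n, k)
print(by_formula)  # 10: ways to choose 2 items out of 5, order ignored
```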
Binomial distribution
Binomial Distribution: A discrete probability distribution describing the number of successes in a fixed number of independent trials, each with the same success probability (p). It models binary outcomes (success/failure) in many fields.
Common Notation: B(n, p)
* n: Number of trials
* p: Probability of success on each trial
The probability mass function (PMF) of the binomial distribution gives the probability of observing exactly k successes in n trials:
P(X = k) = (n k) pᵏ (1 - p)ⁿ⁻ᵏ
(where ‘k’ is the number of successes and (n k) is the binomial coefficient)
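The PMF above in code, with a fair-coin example (n = 5 and p = 0.5 are illustrative values):

```python
import math

def binom_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 5 fair coin flips.
print(binom_pmf(3, 5, 0.5))                          # 0.3125
# Sanity check: the PMF sums to 1 over all possible k.
print(sum(binom_pmf(k, 5, 0.5) for k in range(6)))   # 1.0
```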
Block Matrices
Matrices composed of smaller submatrices arranged in a rectangular array.
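A sketch (assuming NumPy), where `np.block` assembles a matrix from submatrices:

```python
import numpy as np

# Build a 4x4 matrix from four 2x2 blocks.
A = np.eye(2)            # 2x2 identity block
B = np.zeros((2, 2))     # 2x2 zero block
M = np.block([[A, B],
              [B, A]])
print(M.shape)           # (4, 4) — here the result is the 4x4 identity
```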
Capital Sigma Notation
Capital sigma (Σ) denotes summation over a collection X = {x_1, x_2, …, x_{n−1}, x_n} or over the attributes of a vector x = [x^(1), x^(2), …, x^(m−1), x^(m)].
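In code, Σ is just a loop (or the built-in `sum`):

```python
# Σ_{i=1}^{n} x_i over a small example collection.
x = [3, 1, 4, 1, 5]
total = sum(x)               # the built-in sum implements Σ
total_explicit = 0
for xi in x:                 # the same summation, written out
    total_explicit += xi
assert total == total_explicit
print(total)  # 14
```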
Cartesian coordinate system
The Cartesian coordinate system, named after the French mathematician René Descartes, provides a geometric framework for specifying the positions of points in a plane or space using ordered pairs or triplets of numbers, respectively. In a two-dimensional Cartesian coordinate system, points are located with reference to two perpendicular axes, usually labeled x and y, intersecting at a point called the origin. The coordinates of a point represent its distances from the axes along the respective directions. The Cartesian coordinate system serves as the foundation for analytic geometry, facilitating the study of geometric shapes, equations, and transformations in mathematical analysis and physics.
Cauchy Distribution
A probability distribution that arises frequently in various areas of mathematics and physics. It is characterized by its symmetric bell-shaped curve and heavy tails, which indicate that extreme values are more likely compared to other symmetric distributions like the normal distribution. The Cauchy distribution has no defined mean or variance due to its heavy tails, making it challenging to work with in statistical analysis. However, it has applications in fields such as physics, finance, and signal processing.
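The heavy tails are easy to see by simulation, using the inverse-CDF transform x = tan(π(U − ½)) for the standard Cauchy (sample size and seed below are arbitrary):

```python
import math
import random

random.seed(1)
# Draw standard Cauchy samples via the inverse CDF.
samples = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(100_000)]
# Heavy tails: enormous outliers appear routinely, unlike a normal sample,
# which is why the sample mean never settles down.
print(max(abs(x) for x in samples) > 1_000)   # True
```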
Central Limit Theorem (CLT)
A key concept in statistics that states that the distribution of sample means from any population approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution. This theorem is crucial in inferential statistics as it allows for the estimation of population parameters and the construction of confidence intervals and hypothesis tests, even when the population distribution is unknown or non-normal. The Central Limit Theorem is widely applied in various fields, including finance, biology, and engineering, where statistical inference is essential.
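A small simulation sketch: sample means of a uniform (decidedly non-normal) population cluster around the population mean with spread σ/√n, as the CLT predicts (the sample sizes here are arbitrary):

```python
import random
import statistics

random.seed(0)

def sample_mean(n):
    # Mean of n draws from Uniform(0, 1): population mean 0.5, sd sqrt(1/12).
    return statistics.mean(random.random() for _ in range(n))

means = [sample_mean(100) for _ in range(2000)]
print(statistics.mean(means))    # ≈ 0.5  (the population mean)
print(statistics.stdev(means))   # ≈ sqrt(1/12)/10 ≈ 0.0289  (σ/√n)
```

Plotting a histogram of `means` would show the familiar bell shape even though the underlying population is flat.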
Central Tendencies
Central tendencies, also known as measures of central tendency, are summary statistics that describe the central location or typical value of a dataset. They provide insights into the distribution of data and help summarize the main features of the dataset. The three main measures of central tendency are the:
- mean (the arithmetic average of all values in the dataset; sensitive to outliers)
- median (the middle value when the values are arranged in ascending or descending order; robust to outliers)
- mode (the most frequently occurring value(s) in the dataset; applicable to both numerical and categorical data)
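Python's `statistics` module computes all three; a hypothetical dataset with one outlier illustrates the sensitivity difference:

```python
import statistics

data = [1, 2, 2, 3, 100]            # 100 is an outlier
print(statistics.mean(data))        # 21.6 — pulled upward by the outlier
print(statistics.median(data))      # 2 — robust to the outlier
print(statistics.mode(data))        # 2 — the most frequent value
```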
Chain rule
Chain Rule (Calculus): A fundamental rule for finding the derivative of composite functions (functions made up of other functions).
States: The derivative of the composite function f(g(x)) is equal to the derivative of the outer function f evaluated at the inner function g(x), multiplied by the derivative of the inner function g. In mathematical notation, this can be expressed as:
d/dx [f(g(x))] = f’(g(x)) * g’(x)
where f’(g(x)) represents the derivative of the outer function f evaluated at g(x), and g’(x) represents the derivative of the inner function g.
the derivative of a sum is the sum of derivatives:
u = w1x1 + w2x2 + b
y = f(u)   (f being the activation function)
dy/dx1 = (dy/du) * (du/dx1) = f’(u) * w1
dy/dw1 = (dy/du) * (du/dw1) = f’(u) * x1
dy/db = (dy/du) * (du/db) = f’(u) * 1
Each coefficient (weight) or bias has its own “chain” within the overall calculation. The derivative of the activation function (f’(u)) is a common factor dictating how much a change anywhere in the input (u) affects the output. This is the core of why we can calculate the contribution of individual weights and biases to the error during backpropagation!
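The derivative dy/dw1 = f′(u) · x1 worked out above can be checked numerically; tanh stands in as a hypothetical activation function, and the weights below are arbitrary:

```python
import math

w1, w2, x1, x2, b = 0.5, -0.3, 2.0, 1.0, 0.1

def y(w1_val):
    # Forward pass: u = w1*x1 + w2*x2 + b, then y = tanh(u).
    u = w1_val * x1 + w2 * x2 + b
    return math.tanh(u)

u = w1 * x1 + w2 * x2 + b
analytic = (1 - math.tanh(u) ** 2) * x1          # chain rule: f'(u) * x1
h = 1e-6
numeric = (y(w1 + h) - y(w1 - h)) / (2 * h)      # central finite difference
print(abs(analytic - numeric) < 1e-6)            # True — they agree
```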
Codomain
In mathematics, the codomain of a function is the set of all possible output values or elements that the function can produce. It is the set of values to which the function maps its domain elements. The codomain represents the entire range of possible outputs of the function, regardless of whether all elements in the codomain are actually attained by the function.
The codomain is distinct from the range, which refers to the set of actual output values produced by the function when evaluated on its domain. In function notation, the codomain is typically denoted as the set Y in the function f: X → Y, where X is the domain of the function and Y is the codomain.
The codomain provides information about the possible outputs of a function and helps define the scope and range of the function’s behavior.
Combinatorics
A branch of mathematics concerned with counting, arranging, and analyzing the combinations and permutations of finite sets of objects. In machine learning and artificial intelligence, combinatorics plays a crucial role in feature engineering, model parameterization, and optimization algorithms.
Concave function
The opposite of a convex function: a function for which the line segment connecting any two points on its graph lies on or below the graph (for a convex function, the segment lies on or above). Concave functions are essential in convex optimization, where maximizing a concave objective is equivalent to minimizing its convex negation, so gradient-based algorithms such as gradient descent can train models, minimize loss functions, and solve constrained optimization problems. Understanding concave functions is crucial for designing efficient optimization algorithms and analyzing the convergence properties of machine learning models.
Conditional distribution
The probability distribution of a random variable given the value or values of another variable. It describes the likelihood of observing certain outcomes of one variable given specific conditions on another variable. Conditional distributions are fundamental for modeling dependencies and relationships between variables in probabilistic models, Bayesian inference, and predictive modeling tasks.
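A minimal sketch with a hypothetical joint distribution over two binary variables: conditioning on Y = 1 keeps the matching entries and renormalises them:

```python
# Hypothetical joint distribution P(X, Y) over two binary variables.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# P(X | Y = 1) = P(X, Y = 1) / P(Y = 1)
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)   # P(Y=1) = 0.6
cond = {x: joint[(x, 1)] / p_y1 for x in (0, 1)}          # renormalised column
print(cond)   # {0: 0.2/0.6, 1: 0.4/0.6}, i.e. ≈ {0: 0.333, 1: 0.667}
```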
Confidence Intervals
Statistical intervals used to estimate the range of plausible values for a population parameter, such as the mean or proportion, based on sample data. They provide a measure of uncertainty around the point estimate and quantify the precision of estimation. Confidence intervals are essential for hypothesis testing, parameter estimation, and assessing the reliability of statistical inference in machine learning and data analysis.
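A sketch of a 95% interval for a mean using the normal approximation (z ≈ 1.96), which is reasonable for moderately large samples; the data below are hypothetical:

```python
import statistics

data = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.1, 4.9, 5.0]
n = len(data)
mean = statistics.mean(data)
se = statistics.stdev(data) / n ** 0.5          # standard error of the mean
lo, hi = mean - 1.96 * se, mean + 1.96 * se     # 95% CI, normal approximation
print((round(lo, 3), round(hi, 3)))             # ≈ (4.887, 5.113)
```

For small samples one would normally widen the interval by using the t distribution's critical value instead of 1.96.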
Continuous
Continuous variables are those that can take any real value within a certain range or interval. They are characterized by an infinite number of possible values and are typically represented by real numbers. Continuous variables are prevalent in data analysis, modeling, and predictive tasks, such as regression analysis, time series forecasting, and density estimation.
Continuous random variable
In contrast to discrete random variables, continuous random variables can take on an infinite number of possible values within a specified range. These values are typically associated with measurements or quantities that can take any value within a certain interval. Continuous random variables are described by probability density functions (PDFs), which indicate the likelihood of observing a value within a given range. Examples of continuous random variables include height, weight, temperature, and time.
Continuous variable
A type of quantitative variable that can take on an infinite number of values within a specified range or interval. Continuous variables are characterized by having an uncountable and infinite number of possible values, including both whole numbers and fractional values. They can take on any value within their range, and the concept of “gaps” between values is not meaningful. Continuous variables are typically represented by real numbers and are subject to arithmetic operations such as addition, subtraction, multiplication, and division.
Examples of continuous variables include measurements such as height, weight, temperature, time, and distance.