Concepts and stuff (M1) Flashcards
Data science
Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract insights and knowledge from structured and unstructured data. It combines techniques from statistics, mathematics, computer science, and domain-specific knowledge to analyze and interpret complex data sets. The primary goal of data science is to uncover hidden patterns, make predictions, and generate actionable insights to support decision-making.
Numeric data
Discrete Data, Continuous Data
SQL
SQL, or Structured Query Language, is a domain-specific programming language used for managing and manipulating relational databases. It provides a standardized way to interact with relational database management systems (RDBMS), allowing users to define, query, update, and manage data.
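The define, update, and query operations mentioned on this card can be sketched with Python's built-in `sqlite3` module. This is only a minimal illustration: the `students` table, its columns, and the sample rows are hypothetical and do not come from the course material.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
# The table name, columns, and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Define: create a table.
cur.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade REAL)"
)

# Manage: insert a few rows using parameter placeholders.
cur.executemany(
    "INSERT INTO students (name, grade) VALUES (?, ?)",
    [("Ana", 8.5), ("Luis", 6.0), ("Marta", 9.2)],
)

# Update: change one student's grade.
cur.execute("UPDATE students SET grade = ? WHERE name = ?", (7.0, "Luis"))

# Query: select students with a grade of at least 7, highest first.
rows = cur.execute(
    "SELECT name, grade FROM students WHERE grade >= ? ORDER BY grade DESC",
    (7.0,),
).fetchall()

conn.commit()
conn.close()

for name, grade in rows:
    print(name, grade)
```

The `?` placeholders keep values separate from the SQL text, which is the usual way to pass parameters with `sqlite3`.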
The 5 V’s of big data
Volume, Velocity, Variety, Value, Veracity
Statistics
It provides methods for making inferences about the characteristics and behavior of populations based on samples taken from them.
Central tendency measures
Mean, Median, Mode
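As a small illustration, the three measures can be computed with Python's standard `statistics` module. The sample values are made up for this sketch.

```python
import statistics

# Hypothetical sample: six observations.
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of values divided by their count: 30 / 6
median = statistics.median(data)  # middle value; with an even count, the average of 3 and 5
mode = statistics.mode(data)      # most frequent value: 3 appears twice

print(mean, median, mode)
```

In this sample the mean (5) is larger than the median (4.0) because the single large value 10 pulls the average upward, while the median is less affected by it.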
Measures of dispersion
Range, Variance, Standard Deviation
Variance
The average of the squared differences between each data point and the mean of the dataset. It measures how much individual data points differ from the mean: a higher variance indicates greater variability, while a lower variance means the data points lie closer to the mean.
Standard deviation
The square root of the variance. Because it is expressed in the original units of the data, it is a more interpretable measure of spread than the variance.
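The dispersion measures on the preceding cards (range, variance, standard deviation) can be worked through on a small sample. This is a sketch using Python's standard `statistics` module with made-up values; it uses the population formulas (dividing by n) and shows the sample variance (dividing by n - 1) for comparison.

```python
import statistics

# Hypothetical sample with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: difference between the largest and smallest value.
data_range = max(data) - min(data)

# Population variance: mean of the squared deviations from the mean.
# Squared deviations: 9, 1, 1, 1, 0, 0, 4, 16 (sum 32), divided by n = 8.
pop_variance = statistics.pvariance(data)

# Population standard deviation: square root of the population variance,
# expressed in the same units as the data.
pop_std = statistics.pstdev(data)

# Sample variance divides by n - 1 instead of n (32 / 7 here).
sample_variance = statistics.variance(data)

print(data_range, pop_variance, pop_std, sample_variance)
```

Which divisor applies depends on whether the data is treated as the whole population or as a sample drawn from it.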
Skewness
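It measures the asymmetry of a distribution around its mean and indicates whether the data is skewed to the left (negative skew, longer left tail) or to the right (positive skew, longer right tail). A symmetric distribution, such as the normal distribution, has zero skewness.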
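One common way to quantify skewness is the moment-based coefficient, the third central moment divided by the second central moment raised to the power 3/2. The sketch below assumes that definition (other definitions and bias corrections exist), and the `skewness` helper and sample lists are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]         # balanced around the mean
right_skewed = [1, 1, 2, 2, 10]     # long tail toward large values
left_skewed = [-10, -2, -2, -1, -1] # long tail toward small values

print(skewness(symmetric))     # zero for symmetric data
print(skewness(right_skewed))  # positive
print(skewness(left_skewed))   # negative
```

The sign of the result gives the direction of the longer tail.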
Machine learning
The primary goal of machine learning is to create systems that can automatically learn and improve from experience without being explicitly programmed for a specific task.
Predictive Models
Neural Network, Support Vector Machine, Random Forest, Bayesian Models, K-Nearest Neighbors
Data Warehouse
Centralized repository for storing and managing large volumes of data from various sources within an organization. It is designed to support business intelligence (BI) and analytical reporting activities.
Data Lake
Centralized repository that stores large volumes of raw data in its native format. Unlike traditional data warehouses, which follow a structured approach, data lakes can store both structured and unstructured data. The concept is often associated with big data and the need to handle large-scale, diverse datasets.