Data Science Terms and Techniques Flashcards by Maika Henry Northrop

Hypothesis

an assumption made about the world that can be tested using data; an educated guess that needs to be validated or disproved by experiment and data

Statistical Inference

a branch of statistics dedicated to drawing conclusions about the world (a whole population) from smaller data samples

Confidence intervals

an interval estimate used to express the degree of uncertainty associated with a sample statistic
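
A minimal sketch of one such interval, assuming a normal approximation (z = 1.96 for roughly 95% coverage). The sample values and the helper name `mean_confidence_interval` are made up for illustration; for a sample this small, a t-distribution multiplier would give a somewhat wider interval.

```python
import math
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean (normal approximation)."""
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * sem, mean + z * sem


# Hypothetical sample of heights in centimetres
heights_cm = [162.0, 170.5, 168.2, 175.1, 159.8, 181.3, 166.4, 172.9]
low, high = mean_confidence_interval(heights_cm)
print(f"sample mean = {statistics.mean(heights_cm):.1f} cm")
print(f"approx. 95% CI = ({low:.1f}, {high:.1f}) cm")
```

A wider interval signals more uncertainty about the sample statistic; larger samples shrink the standard error and so narrow the interval.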

Statistical Significance

an estimate of how likely it is that an observed result arose by random chance alone (usually reported as a p-value); the smaller the number, the stronger the evidence that the observed result reflects a real effect rather than a random fluctuation. A statistically significant effect can still be too small to matter in practice.
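
One way to estimate such a number is a permutation test, sketched below under stated assumptions: the two groups, the function name `permutation_p_value`, and the number of shuffles are all hypothetical choices for illustration, not a prescribed method.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often a random relabelling gives a difference in means
    at least as large as the one actually observed."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend the group labels are meaningless
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical measurements from a control group and a treated group
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
treated = [13.0, 12.9, 13.4, 12.8, 13.1, 13.3]
p_value = permutation_p_value(control, treated)
print(f"estimated p-value: {p_value:.4f}")
```

A small estimate means random relabelling rarely reproduces the observed gap, so chance alone is an unlikely explanation.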

Big Data

a collective term used for technology to analyze large amounts of data to unearth insights, typically into human behavior and patterns

Data Set

a collection of data to be analyzed

Analytics

a collective term for techniques used to analyze data, mostly to draw business insights

Algorithm

a well-defined set of steps to solve a specific problem

Technology Stack

the collective set of tools and programs used in an organization or team

Pre-packaged distribution

a package that bundles the required Python tools and libraries, e.g. NumPy, SciPy, pandas, scikit-learn, Jupyter, Matplotlib, seaborn and statsmodels. In the Python world, Anaconda and Canopy are popular distributions for scientific computing and data science.

Regular Expressions

a technique to quickly search for or substitute complex patterns in strings
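
A short sketch of both uses (search and substitute) with Python's built-in `re` module. The sample text and the deliberately simple email pattern are assumptions for illustration; real-world email matching needs a more careful pattern.

```python
import re

text = "Contact alice@example.com or bob.smith@example.org by 2024-03-15."

# Search: find everything that looks like an email address
email_pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
emails = email_pattern.findall(text)

# Substitute: rewrite ISO dates (YYYY-MM-DD) as DD/MM/YYYY using capture groups
rewritten = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(emails)
print(rewritten)
```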

Jupyter

formerly known as IPython Notebook, this tool enables data scientists to prototype code rapidly and combine it with useful documentation

Raw data

data from original or secondary sources that may be unstructured or corrupted and needs more work performed on it before it can be analyzed

Data Wrangling

process of taking data in its raw form and manipulating it in various ways into a useful form

Messy or Dirty data

data can be messy or dirty in the sense that it might contain values that are invalid, missing, corrupted, inconsistent or non-uniform
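
As a rough illustration, the sketch below wrangles a few messy records into a uniform shape. The records, the `clean_record` helper, and the 0 to 120 age range are hypothetical; they show the kinds of problems listed above (inconsistent formatting, invalid and missing values).

```python
# Hypothetical raw records with stray whitespace, mixed case, and bad ages
raw_records = [
    {"name": " Alice ", "age": "34", "city": "new york"},
    {"name": "BOB", "age": "n/a", "city": "New York"},
    {"name": "carol", "age": "-5", "city": "NEW YORK "},
]


def clean_record(record):
    """Normalise text fields and turn unusable ages into None."""
    name = record["name"].strip().title()
    city = record["city"].strip().title()
    try:
        age = int(record["age"])
    except ValueError:
        age = None  # not a number at all, treat as missing
    if age is not None and not 0 <= age <= 120:
        age = None  # a number, but outside the assumed valid range
    return {"name": name, "age": age, "city": city}


cleaned = [clean_record(r) for r in raw_records]
for row in cleaned:
    print(row)
```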

Storytelling

this highly effective art of communication is not limited to entertainment; it is a crucial skill needed to communicate answers to questions that data scientists ask

Visualization

a picture is worth a thousand words, and this rings especially true when you are trying to present data in an easy-to-understand manner. Data visualization is increasingly used to present the results of data analysis. Think about how data is visualized during election times.

Supervised Learning

algorithms that create a model of the world by looking at labeled examples
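
A tiny sketch of learning from labeled examples, using a one-nearest-neighbour rule as one simple supervised method among many. The points, labels, and the function name `nearest_neighbor_predict` are invented for illustration.

```python
import math


def nearest_neighbor_predict(train_points, train_labels, query):
    """Predict the label of the training point closest to the query."""
    distances = [math.dist(point, query) for point in train_points]
    return train_labels[distances.index(min(distances))]


# Labeled examples: each point comes with the answer we want the model to learn
points = [(1.0, 1.2), (0.8, 0.9), (4.1, 3.9), (3.8, 4.2)]
labels = ["small", "small", "large", "large"]

prediction_a = nearest_neighbor_predict(points, labels, (1.1, 1.0))
prediction_b = nearest_neighbor_predict(points, labels, (4.0, 4.0))
print(prediction_a, prediction_b)
```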

Unsupervised Learning

algorithms that create a model of the world using examples without labels

Bayesian Analysis

algorithms based on Bayes' theorem, which make inferences about the world by combining domain knowledge or assumptions (the prior) with observed evidence
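
A worked sketch of Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E), applied to a diagnostic test. All three input probabilities are made-up numbers chosen for illustration.

```python
def posterior_probability(prior, likelihood, false_positive_rate):
    """P(hypothesis | evidence) via Bayes' theorem for a yes/no hypothesis."""
    # Total probability of seeing the evidence, whether or not the hypothesis holds
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Assumed numbers: 1% of people have the condition (prior, from domain knowledge),
# the test flags 95% of true cases, and wrongly flags 5% of healthy people.
posterior = posterior_probability(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(condition | positive test) = {posterior:.3f}")
```

Under these assumed inputs, a positive result raises the probability well above the 1% prior but leaves it far below certainty, because false positives among the many healthy people outnumber true positives.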

Clustering

a family of unsupervised learning algorithms used to automatically find groups in data sets
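
A minimal sketch of one clustering algorithm, k-means, on one-dimensional data. The values, the starting centers, and the function name `k_means_1d` are assumptions for illustration; library implementations such as the one in scikit-learn handle initialization and convergence more carefully.

```python
def k_means_1d(values, centers, n_iterations=10):
    """Alternate between assigning values to the nearest center
    and moving each center to the mean of its assigned values."""
    for _ in range(n_iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Keep the old center if a cluster ends up empty
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled values that visibly fall into two groups
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, clusters = k_means_1d(values, centers=[0.0, 6.0])
print(centers)
print(clusters)
```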

What does data science entail?

processing, analyzing and visualizing data.

Why is Python a popular language for data science?

Because it can:
handle large datasets
work with common mathematical functions
create powerful data visualizations

What is Python Jupyter Notebook?

It's built around a typical data analysis workflow, which makes it very different from an integrated development environment such as PyCharm, which focuses mostly on working with code. In Jupyter, you work with notebooks that mix plain text, code and code outputs in one view. You can interleave code with Markdown explanations, which lets you easily explore data, create visualizations and share your results.

What is a kernel in Jupyter Notebook?

The kernel is the process that runs the notebook's code, and it determines which programming language the notebook uses. The kernel name is displayed in the top right corner of the notebook. When you run a cell, its code is executed inside the kernel session.

What strategies should one use to address missing data?

1. Remove any rows that contain missing data.
2. Populate the empty fields with a specified value.
3. Populate the empty fields with a calculated value.
4. Use analysis techniques that work with missing data.
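
The first three strategies can be sketched in plain Python as follows. The rows, the `score` field, and the fill value of 0.0 are hypothetical; the mean is only one possible calculated value.

```python
import statistics

# Hypothetical rows where None marks a missing score
rows = [
    {"id": 1, "score": 80.0},
    {"id": 2, "score": None},
    {"id": 3, "score": 90.0},
    {"id": 4, "score": None},
]

# Strategy 1: remove rows that contain missing data
dropped = [row for row in rows if row["score"] is not None]

# Strategy 2: populate empty fields with a specified value
filled_constant = [
    {**row, "score": 0.0 if row["score"] is None else row["score"]} for row in rows
]

# Strategy 3: populate empty fields with a calculated value (here, the mean)
mean_score = statistics.mean([row["score"] for row in dropped])
filled_mean = [
    {**row, "score": mean_score if row["score"] is None else row["score"]}
    for row in rows
]

print(len(dropped), filled_constant[1]["score"], filled_mean[1]["score"])
```

In pandas, `DataFrame.dropna` and `DataFrame.fillna` cover the same three strategies on tabular data.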

Data Science Terms and Techniques Flashcards

(26 cards)