Module 1 Flashcards

(92 cards)

1
Q

_______ is about extracting knowledge from data.

A

Machine Learning

2
Q

It is a research field at the intersection of statistics, artificial intelligence, and computer science.

A

Machine Learning

3
Q

It is also known as predictive analytics or statistical learning.

A

Machine Learning

4
Q

A field of study concerned with giving computers the ability to learn without being explicitly programmed.

A

Machine Learning

5
Q

It is a discipline of Artificial Intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention.

A

Machine Learning

6
Q

True or False.
Machine Learning (excluding deep learning) is a study of learning algorithms.

A

False.
(INCLUDING deep learning)

7
Q

What is Data Science concerned with?

A
  • Collection, preparation, and analysis of data.
  • Leverages AI/ML, research, industry expertise, and statistics to make business decisions.
8
Q

What is Artificial Intelligence concerned with?

A
  • Technology for machines to understand/interpret, learn, and make “intelligent” decisions.
  • Includes Machine Learning among many fields.
9
Q

What is Machine Learning concerned with?

A
  • Algorithms that help machines improve through supervised, unsupervised, and reinforcement learning.
  • Subset of AI and Data Science
10
Q

Beyond that, automation by machine learning can _________ risks caused by fatigue or inattention.

A

Mitigate

11
Q

__________ are better suited than humans for tasks that are routine, repetitive, or tedious.

A

Learning Machines

12
Q

Machines driven by algorithms designed by humans are able to learn __________ and _________ and to fulfill tasks desired by humans.

A

Latent Rules and Inherent Patterns

12
Q

This is a collection of data used in machine learning tasks.

A

Dataset

12
Q

Machine learning methods enable computers to operate ___________ without explicit programming.

A

Autonomously

12
Q

Each data record is called a “_____”.

A

Sample

13
Q

This is the process of creating a model from data.

A

Learning (Training)

13
Q

What are the different types of machine learning?

A
  • Supervised Machine Learning
  • Unsupervised Machine Learning
  • Semi-supervised Machine Learning
  • Reinforcement Learning
13
Q

It is a dataset used in the training process.

A

Training Set

14
Q

The dataset used in the testing process is called “________”, and each sample is called a/an “_______”.

A

Test Set; Test Sample

14
Q

Events or attributes that reflect the performance or nature of a sample in particular aspects are called “_______”.

A

Features

14
Q

In training sets, each sample is referred to as a “_________”.

A

Training Sample

15
Q

This refers to the process of using the model obtained after learning for prediction.

A

Testing

16
Q

What is the Machine Learning Workflow?

A

(PayDay MayDay)
- Project Setup
- Data Preparation
- Modeling
- Deployment

17
Q

What steps are under Project Setup in the Machine Learning Workflow?

A
  • Understand the business goals.
  • Choose the solution to your problem.
18
What steps are under Data Preparation in the Machine Learning Workflow?
(C CED) - Data Collection - Data Cleaning - Feature Engineering - Split the Data
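The "Split the Data" step can be sketched in plain Python. This is only an illustration, not the course's method: `train_test_split` here is a hypothetical helper written for this card (it is not the scikit-learn function of the same name), and the 25% test ratio and toy dataset are assumptions.

```python
import random

def train_test_split(samples, test_ratio=0.25, seed=0):
    """Shuffle a copy of the samples and return (training set, test set)."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset: each sample is a (feature, label) pair.
dataset = [(x, 2 * x + 1) for x in range(20)]
training_set, test_set = train_test_split(dataset)
print(len(training_set), len(test_set))  # 15 5
```

Shuffling before splitting avoids a test set that only contains the last records collected.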
19
What steps are under Modeling in the Machine Learning Workflow?
(Hi There, My Ass) - Hyperparameter tuning - Train your Models - Make predictions - Assess Model Performance
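A minimal sketch of the four Modeling steps, assuming a 1-D k-nearest-neighbours classifier as the model (the deck does not name one). The helper names, the candidate values of `k`, and all data are hypothetical. k-NN has no separate fitting step, so "training" here just means keeping the training set as the model.

```python
from collections import Counter

def knn_predict(training_set, x, k):
    """Make a prediction: majority label among the k training samples nearest to x."""
    neighbours = sorted(training_set, key=lambda s: abs(s[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def accuracy(training_set, eval_set, k):
    """Assess performance: fraction of eval samples predicted correctly."""
    hits = sum(knn_predict(training_set, x, k) == label for x, label in eval_set)
    return hits / len(eval_set)

# Hypothetical (feature, label) samples.
training_set = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
                (6.0, "high"), (6.5, "high"), (7.0, "high")]
validation_set = [(1.2, "low"), (6.8, "high"), (2.4, "low")]
test_set = [(0.8, "low"), (7.2, "high")]

# Hyperparameter tuning: keep the k that scores best on the validation set.
best_k = max([1, 3, 5], key=lambda k: accuracy(training_set, validation_set, k))

# Final assessment on samples that were not used for tuning.
test_accuracy = accuracy(training_set, test_set, best_k)
print(best_k, test_accuracy)
```

Tuning on a validation set and reporting on a separate test set keeps the final score from being flattered by the tuning itself.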
19
What occurs in Phase 1 of Machine Learning?
(PLT) - Processing - Learning - Testing
20
What steps are under Deployment in the Machine Learning Workflow?
(DMI) - Deploy the model - Monitor model performance - Improve your model
21
What occurs in Phase 2 of Machine Learning?
New Data + Trained Model -> Prediction -> Predicted Data
21
How does Machine Learning work? (What are the two phases in Machine Learning?)
Phase 1: Learning Phase 2: Prediction
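The two phases can be sketched in plain Python. This is an illustration under assumptions: the model is a straight line fitted by least squares, and `learn`, `predict`, and the toy data are hypothetical names chosen for this card.

```python
def learn(training_set):
    """Phase 1 (learning): fit y = slope * x + intercept by least squares."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in training_set)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the trained model

def predict(model, new_x):
    """Phase 2 (prediction): new data + trained model -> predicted value."""
    slope, intercept = model
    return slope * new_x + intercept

# Hypothetical training samples that follow y = 2x + 1.
training_set = [(1, 3), (2, 5), (3, 7), (4, 9)]
model = learn(training_set)
prediction = predict(model, 10)
print(model, prediction)  # (2.0, 1.0) 21.0
```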
22
What are some examples of Machine Learning Languages?
PRC - Python - R - C++
23
What are some examples of Big Data Tools?
- MemSQL - Apache Spark
24
What are some examples of General Machine Learning Frameworks?
NSN - Numpy - Scikit-learn - NLTK
25
What are some examples of Data Analysis and Visualization Tools?
PM JWT - Pandas - Matplotlib - Jupyter Notebook - Weka - Tableau
26
What are some examples of ML Frameworks for Neural Network Modelling?
PK CTT - PyTorch - Keras - Caffe2 - TensorFlow & TensorBoard
27
What are the Top Programming languages for Machine Learning?
- Python - R - Java - Julia - Scala - C++ - JavaScript - Lisp - Haskell - Go
28
Why use Python for Data Science?
EEF CSS - Easy-to-read Syntax - Extensive Libraries and Frameworks - Flexibility - Compatibility with other Languages - Strong Community Support - Scalability and Performance
29
This Python library is a collection of functions for scientific computing.
SciPy
29
This Python library is one of the fundamental packages for scientific computing.
NumPy
29
This Python library is a very popular tool and the most prominent Python library for Machine Learning.
Scikit-learn
29
This is an interactive environment for running code in the browser.
Jupyter Notebook
29
This is a Python distribution made for large-scale data processing, predictive analysis, and scientific computing.
Anaconda
30
This Python library is a library for data wrangling and analysis.
Pandas
30
This Python library is used for Machine Learning.
PyTorch/TensorFlow
30
This Python library is the prime scientific plotting library.
Matplotlib
31
This Python library is used in interacting with SQL databases.
SQLModel
32
This Python library is used for Web Crawling.
Scrapy
33
This Python library is used for Deep Learning.
Keras
34
What are the applications for Machine Learning?
AI METH - Automobile - Insurance - Manufacturing - E-commerce - Transportation - Healthcare
35
What are some quality problems that real data possess?
- Incompleteness - Noise - Inconsistency
36
It is an observation that seems to be distant from other observations.
Outlier
37
It is one observation that follows a different logic or generative process than the other observations.
Outlier
38
_________ is the practice of cleaning, altering, and reorganizing raw data prior to processing and analysis.
Preprocessing
39
Data Preprocessing is also known as "__________"
Data Preparation
40
It is an important step before processing to prepare the data for analysis or modeling by cleaning and transforming it.
Data Preprocessing
41
What are the key steps in Data Preprocessing?
PCR TEV - Data Profiling - Data Cleansing - Data Reduction - Data Transformation - Data Evaluation - Data Validation
42
What are the two main categories of Preprocessing?
Data Cleansing and Feature Engineering
43
This is composed of techniques for cleaning messy data.
Data Cleansing
44
This features techniques used by data scientists to organize the data in ways that make it more efficient to train data models and run inferences against them.
Feature Engineering
45
What is/are done during Data Cleansing?
- Identify and sort out missing data - Reduce noisy data - Identify and remove duplicates
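Two of these cleansing tasks (finding missing data and removing duplicates) can be sketched in plain Python. This is an assumption-laden illustration: records are modelled as dictionaries, `None` stands for a missing value, and both helper names are hypothetical. Noise reduction is not shown.

```python
def rows_with_missing(records):
    """Return the indexes of records that contain at least one missing value."""
    return [i for i, r in enumerate(records)
            if any(value is None for value in r.values())]

def remove_duplicates(records):
    """Keep the first occurrence of each record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical raw data with one missing value and one exact duplicate.
records = [
    {"id": 1, "height": 170},
    {"id": 2, "height": None},
    {"id": 1, "height": 170},
]
missing = rows_with_missing(records)
cleaned = remove_duplicates(records)
print(missing, len(cleaned))  # [1] 2
```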
46
What is/are done during Feature Engineering?
- Feature scaling or normalization - Data Reduction - Discretization - Feature Encoding
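Three of these techniques can be sketched in plain Python, assuming min-max scaling for normalization, fixed cut points for discretization, and one-hot encoding for feature encoding. These are common choices, but the deck does not specify them, and all function names and data are hypothetical. Data reduction is not shown.

```python
def min_max_scale(values):
    """Feature scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, edges):
    """Discretization: replace each value with the index of its bin.

    `edges` are ascending cut points; a value equal to an edge goes in the upper bin.
    """
    return [sum(v >= edge for edge in edges) for v in values]

def one_hot_encode(categories):
    """Feature encoding: one 0/1 column per distinct category (sorted order)."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]

ages = [18, 30, 42, 66]
scaled = min_max_scale(ages)        # [0.0, 0.25, 0.5, 1.0]
bins = discretize(ages, [25, 60])   # [0, 1, 1, 2]
encoded = one_hot_encode(["red", "green", "red"])  # columns: green, red
print(scaled, bins, encoded)
```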
47
This is used to understand the main characteristics of the data, discover patterns, spot anomalies, test a hypothesis, or check assumptions.
Exploratory Data Analysis (EDA)
48
This often employs data visualization methods.
Exploratory Data Analysis (EDA)
49
What are the different activities done during Exploratory Data Analysis?
VSAUCE (VSOCH) - Visualization - Summary Statistics - Outlier Detection - Correlation Analysis - Hypothesis Testing
50
This activity in EDA involves creating plots and charts to visualize data distributions and relationships.
Visualization
51
This activity in EDA involves calculating measures like mean, median, variance, and standard deviation.
Summary Statistics
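These measures can be computed with Python's standard `statistics` module. A minimal sketch, assuming a small made-up sample and the population (not sample) forms of variance and standard deviation:

```python
import statistics

# Hypothetical values of one numeric feature.
data = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "variance": statistics.pvariance(data),  # population variance
    "std_dev": statistics.pstdev(data),      # population standard deviation
}
print(summary)
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.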
52
This activity in EDA involves identifying unusual data points.
Outlier Detection
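One common way to flag unusual points is the interquartile range (IQR) rule. The sketch below assumes that rule with the conventional 1.5 multiplier; the deck does not prescribe a method, and `iqr_outliers` and the readings are hypothetical.

```python
import statistics

def iqr_outliers(values, multiplier=1.5):
    """Return values lying more than `multiplier` IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sensor readings; 95 is far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)  # [95]
```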
53
This activity in EDA involves examining relationships between variables.
Correlation Analysis
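A minimal sketch of one such measure, the Pearson correlation coefficient, written in plain Python. Pearson's r is an assumption here (the deck does not name a coefficient), and the variable names and data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

# Hypothetical data: hours studied versus exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson(hours, scores)
print(round(r, 2))  # 0.99
```

Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.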
54
This activity in EDA involves testing initial assumptions about the data.
Hypothesis Testing
55
It is the act of filling in missing values by estimating them.
Imputation
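A minimal sketch of mean imputation, one simple way to estimate missing values. It assumes `None` marks a missing entry; `impute_mean` and the data are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical feature column with two missing entries.
heights = [170, None, 180, None, 160]
imputed = impute_mean(heights)
print(imputed)  # [170, 170, 180, 170, 160]
```

The median is a common alternative fill value when the feature contains outliers.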
56
This refers to the process of taking a trained ML model and making it available for use in real-world applications.
Machine Learning Model Deployment
57
What are the different steps in Machine Learning Model Deployment?
- Training - Validation - Development - Monitoring
58
This involves data preprocessing, feature engineering, and rigorous testing to ensure the model is robust and ready for real-world scenarios.
Training
59
Ensuring that the infrastructure can handle the model's computational requirements is vital; this step requires validation and effective testing for scalability before deploying models.
Validation
60
What are the different steps involved in the Deployment of a model?
DDC SCC
- Defining how to extract or process data in real time.
- Determining the storage required for these processes.
- Collection and predictions of model and data patterns.
- Setting up APIs, tools and other software environments to support and improve predictions.
- Configuring the hardware (cloud or on-prem environments) to help support the ML model.
- Creating a pipeline for continuous training and parameter tuning.
61
What are the best practices for successful ML Model Deployment?
- Choosing the right infrastructure - Effective Versioning and Tracking - Robust Testing and Validation - Implementing Monitoring and Alerting
62
It covers the ethical and moral obligations of sharing, collecting, and using data, focused on ensuring that data is used fairly, for good.
Data Ethics
63
This principle of Data Ethics goes as follows: The first principle of data ethics is that an individual has ownership over their personal information.
Ownership
63
This principle of Data Ethics goes as follows: Just as it’s considered stealing to take an item that doesn’t belong to you, it’s unlawful and unethical to collect someone’s personal data without their consent.
Ownership
63
What are the 5 principles of Data Ethics?
- Ownership - Transparency - Privacy - Intention - Outcomes
63
This principle of Data Ethics goes as follows: In addition to owning their personal information, data subjects have a right to know how you plan to collect, store, and use it.
Transparency
63
This principle of Data Ethics goes as follows: Before collecting data, ask yourself why you need it, what you’ll gain from it, and what changes you’ll be able to make after analysis.
Intention
63
This principle of Data Ethics goes as follows: Another ethical responsibility that comes with handling data is ensuring data subjects’ ___________. Even if a customer gives your company consent to collect, store, and analyze their personally identifiable information (PII), that doesn’t mean they want it publicly available.
Privacy
64
This principle of Data Ethics goes as follows: If your intention is to hurt others, profit from your subjects’ weaknesses, or any other malicious goal, it’s not ethical to collect their data.
Intention
64
This principle of Data Ethics goes as follows: Even when intentions are good, the outcome of data analysis can cause inadvertent harm to individuals or groups of people. This is called a disparate impact.
Outcomes
65
What are the Data Privacy Regulations (New Rules of Data)?
1. Trust over Transactions - This first rule is all about consent.
2. Insight over Identity - Avoid compromising both privacy and security.
3. Flows over Silos - No need to work in silos; rather, CIOs and CDOs can work together to facilitate the flow of insights.
66
What are the Data Subject Rights?
The right to: - Be informed - File a complaint - Damages - Object - Access - Rectify - Erasure or blocking - Data portability
67
It is a set of principles and processes for data collection, management, and use.
Data Governance
67
It ensures data is accurate, consistent, and available while protecting data privacy and security.
Data Governance
68
It is a set of policies, procedures, and standards that implements data governance for an organization.
Data Governance Framework
69
_____________ describes what to do; the ______________ describes how to do it.
Data Governance; Data Governance Framework
69
What are the pillars of Data Governance?
- Ownership and Accountability - Data Quality - Data Protection and Safety - Data Use and Availability - Data Management