MODULE 1 Flashcards

M1S1 - M1S2 (68 cards)

1
Q

Algorithm where samples are used for training.

A

Machine Learning Algorithm

2
Q

It is a research field at the intersection of statistics, artificial intelligence, and computer science.

A

Machine Learning

3
Q

It is the practice of cleaning, altering, and reorganizing raw data prior to processing and analysis.

A

Preprocessing

4
Q

Data that contains inconsistent records.

A

Inconsistency

5
Q

Data that contains incorrect records or exceptions.

A

Noise

6
Q

Creating plots and charts to visualize data distributions and relationships.

A

Visualization

7
Q

T/F
The performance of ML algorithms adaptively improves with an increase in the number of available samples during the ‘training’ processes.

A

FALSE (the correct term is ‘learning’, not ‘training’)

8
Q

T/F
Data reduction is a data cleansing technique.

A

FALSE (data reduction is a feature engineering technique)

9
Q

T/F
Reducing noise in data is a feature engineering technique.

A

FALSE (reducing noise is a data cleansing technique)

10
Q

It covers the ethical and moral obligations of collecting, sharing, and using data, focused on ensuring that data is used fairly, for good.

A

Data Ethics

11
Q

Best Practices for Successful ML Model Deployment

A
  1. Choosing the Right Infrastructure
  2. Effective Versioning and Tracking
  3. Robust Testing and Validation
  4. Implementing Monitoring and Alerting
12
Q

Data: _____________
Learning Algorithms: ______________
Basic Understanding: ______________

A

Experience (E)
Task (T)
Measure (P)

13
Q

Even when intentions are good, the ___________ of data analysis can cause inadvertent harm to individuals or groups of people.

A

Outcome

14
Q

Once deployed, models need to be continuously monitored.

A

Monitoring

15
Q

A field of study concerned with giving computers the ability to learn without being explicitly programmed.

A

Machine Learning

16
Q

It is a collection of data used in machine learning tasks.

A

Dataset

17
Q

Feature Engineering techniques

A
  • Feature scaling or normalization
  • Data reduction
  • Discretization
  • Feature encoding
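
As an illustrative sketch that is not part of the original deck, three of the techniques above can be written in plain Python. The function names, sample values, and bin edges are assumptions made for this example, and data reduction is not shown.

```python
def min_max_scale(values):
    """Feature scaling: map the smallest value to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges):
    """Discretization: replace each number with the index of its bin.

    `edges` are ascending cut points; a value equal to an edge goes to the upper bin.
    """
    return [sum(v >= e for e in edges) for v in values]


def one_hot_encode(labels):
    """Feature encoding: one 0/1 position per distinct category (sorted order)."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


ages = [18, 30, 42, 66]          # made-up numeric feature
colors = ["red", "blue", "red"]  # made-up categorical feature

scaled = min_max_scale(ages)
bins = discretize(ages, edges=[25, 50])
encoded = one_hot_encode(colors)
```

With these inputs, the ages fall into bins 0, 1, 1, and 2, and each color becomes a two-element vector ordered as (blue, red).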
18
Q

Calculating measures like mean, median, variance, and standard deviation.

A

Summary Statistics
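
A minimal sketch of these measures using Python's standard `statistics` module. The sample values are made up, and the population forms (`pvariance`, `pstdev`) are an assumed choice; `variance` and `stdev` would give the sample forms instead.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample values

mean = statistics.mean(scores)
median = statistics.median(scores)
variance = statistics.pvariance(scores)  # population variance
std_dev = statistics.pstdev(scores)      # population standard deviation

print(mean, median, variance, std_dev)
```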

19
Q

Data Cleansing techniques

A
  • Identify and sort out missing data
  • Reduce noisy data
  • Identify and remove duplicates
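
The three techniques above can be sketched in plain Python. The records are hypothetical, dropping incomplete records is only one way to handle missing data, and the median filter is one assumed choice of noise reduction among many.

```python
import statistics

# Hypothetical records: (id, temperature reading); None marks a missing value.
records = [(1, 20.5), (2, None), (3, 21.0), (3, 21.0),
           (4, 95.0), (5, 20.0), (6, 20.5)]

# 1. Identify and sort out missing data: here, drop records with no reading.
complete = [r for r in records if r[1] is not None]

# 2. Identify and remove duplicates, keeping the first occurrence in order.
deduplicated = list(dict.fromkeys(complete))


def median_smooth(values):
    """Replace each interior value with the median of itself and its two neighbours."""
    if len(values) < 3:
        return list(values)
    middle = [statistics.median(values[i - 1:i + 2])
              for i in range(1, len(values) - 1)]
    return [values[0]] + middle + [values[-1]]


# 3. Reduce noisy data: the isolated 95.0 spike is smoothed away.
readings = [value for _, value in deduplicated]
smoothed = median_smooth(readings)
```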
20
Q

It is used to understand the main characteristics of the data, discover patterns, spot anomalies, test a hypothesis, or check assumptions.

A

Exploratory Data Analysis (EDA)

21
Q

The process of creating a model from data is called ___________

A

Learning (training)

22
Q

Rule-based algorithms: Condition
Machine Learning: _________.

A

Model

23
Q

Algorithm where explicit programming is used.

A

Rule-Based Algorithm

24
Q

It refers to the process of using the model obtained after learning for prediction.

A

Testing

25
It is the crucial process of integrating the ML model into its production environment. It is also the most challenging step, because it involves several moving pieces and tools and requires data scientists and ML engineers to collaborate and strategize.
Model Deployment
26
In addition to owning their personal information, data subjects have a right to know how you plan to collect, store, and use it.
Transparency
27
Another ethical responsibility that comes with handling data is ensuring data subjects’ ____________
Privacy
28
Machine Learning is a field of study concerned with giving computers the ability to ________ without being explicitly programmed.
Learn
29
___________ matter. Before collecting data, ask yourself why you need it, what you’ll gain from it, and what changes you’ll be able to make after analysis.
Intention
30
Data Preprocessing Techniques
  • Data Cleansing
  • Feature Engineering
31
Machine Learning Workflow
  1. Project Setup
  2. Data Preparation
  3. Modelling
  4. Deployment
32
Before deployment, models need to be thoroughly trained and evaluated. This involves data preprocessing, feature engineering, and rigorous testing to ensure the model is robust and ready for real-world scenarios.
Training
33
EDA Activities
  • Visualization
  • Summary Statistics
  • Outlier Detection
  • Correlation Analysis
  • Hypothesis Testing
34
Data that contains missing values or lacks attributes.
Incompleteness
35
It is a discipline of artificial intelligence (AI) that provides machines with the ability to automatically learn from data and past experiences while identifying patterns to make predictions with minimal human intervention.
Machine Learning
36
Which data preprocessing task is the most time consuming?
Data cleaning
37
Phases of ML
  1. Learning
  2. Prediction
38
5 Principles of Data Ethics
  • Ownership
  • Transparency
  • Privacy
  • Intention
  • Outcomes
39
It is an observation that seems to be distant from other observations.
Outlier
40
Without good ________, there is no good _________.
data, model
41
Algorithm where the decision-making rules are complex and difficult to describe.
Machine Learning Algorithm
42
Algorithm where rules are automatically learned by the machines.
Machine Learning Algorithm
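
To make the contrast between a hand-written condition and a learned model concrete, here is a toy sketch that is not from the original deck. The spam scenario, the function names, and the training samples are all hypothetical; the "model" is just a single threshold learned from labelled samples.

```python
def rule_based_is_spam(link_count):
    """Rule-based algorithm: a human writes the condition explicitly."""
    return link_count > 5  # threshold chosen by hand (hypothetical)


def learn_threshold(samples):
    """Learning phase: pick the threshold that classifies the most samples correctly.

    `samples` is a list of (feature_value, label) pairs with boolean labels.
    """
    candidates = sorted({value for value, _ in samples})

    def correct(threshold):
        # Number of training samples the rule "value > threshold" gets right.
        return sum((value > threshold) == label for value, label in samples)

    return max(candidates, key=correct)


# Hypothetical training set: (number of links in a message, is it spam?)
training_set = [(0, False), (1, False), (2, False),
                (3, True), (4, True), (7, True)]
threshold = learn_threshold(training_set)


def learned_is_spam(link_count):
    """Prediction phase: apply the learned model to a new sample."""
    return link_count > threshold
```

On this training set the learned threshold differs from the hand-written one, so a message with three links is flagged by the learned model but not by the fixed rule.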
43
Examining relationships between variables.
Correlation Analysis
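
One common way to examine the relationship between two numeric variables is the Pearson correlation coefficient. The sketch below is an assumed illustration, not part of the deck; the function name and the sample values are made up.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the root sums of squared deviations.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return co_deviation / (spread_x * spread_y)


hours_studied = [1, 2, 3, 4, 5]      # made-up values
exam_scores = [52, 55, 61, 64, 70]   # made-up values
r = pearson_correlation(hours_studied, exam_scores)
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates little or no linear relationship.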
44
Important step before processing
To prepare the data for analysis or modeling by cleaning and transforming it.
45
Continuous data is:
Quantitative
46
T/F Machines driven by algorithms designed by humans are able to learn latent rules and inherent patterns and to fulfill tasks desired by humans.
TRUE
47
Steps for Data Preprocessing
  1. Data Profiling
  2. Data Cleansing
  3. Data Reduction
  4. Data Transformation
  5. Data Enrichment
  6. Data Validation
48
Events or attributes that reflect the performance or nature of a sample in a particular aspect are called ____________
Features
49
It is a dataset used in the training process, where each sample is referred to as a training sample.
Training set
50
Also known as predictive analytics or statistical learning.
Machine Learning
51
Rule Complexity : Scale of the Problem. Simple : Small = ____________; Complex : Small = ____________; Simple : Large = ____________; Complex : Large = ____________
  • Simple : Small = Simple Problems
  • Complex : Small = Manual Rules
  • Simple : Large = Rule-Based Algorithm
  • Complex : Large = Machine Learning Algorithm
52
Algorithm where rules can be specified.
Rule-Based Algorithm
53
Each data record is called a __________
Sample
54
Testing initial assumptions about the data.
Hypothesis Testing
55
It is about extracting knowledge from data.
Machine Learning
56
Data Preprocessing is also called ___________
Data Preparation
57
T/F Machine Learning methods enable computers to operate autonomously without explicit programming.
TRUE
58
It refers to the process of taking a trained ML model and making it available for use in real-world applications.
Machine Learning Model Deployment
59
Identifying unusual data points.
Outlier Detection
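
One widely used heuristic for identifying unusual data points is the interquartile range (IQR) rule. The sketch below is an assumed illustration rather than the deck's method; the function name, the multiplier `k=1.5`, and the sample values are choices made for this example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


observations = [10, 12, 11, 13, 12, 11, 45, 12]  # made-up sample values
outliers = iqr_outliers(observations)
```

Here the value 45 sits far from the other observations and is the only one flagged.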
60
T/F Sorting out missing data is a data cleansing technique.
TRUE
61
It is a set of policies, procedures, and standards that implements an organization's data governance.
Data Governance Framework
62
Types of Machine Learning
  • Supervised
  • Unsupervised
  • Semi-Supervised
  • Reinforcement
63
It is a study of learning algorithms.
Machine Learning
64
It is a set of principles and processes for data collection, management, and use. The goal is to ensure that data is accurate, consistent, and available for use, while protecting data privacy and security.
Data Governance
65
Nominal data is:
Qualitative
66
ML models should be able to handle increased loads and continue to deliver results efficiently. Ensuring the infrastructure can handle the model's computational requirements is vital, requiring validation and effective testing for scalability before deploying models.
Validation
67
The first principle of data ethics is that an individual has __________ over their personal information.
Ownership
68
Machine Learning Deployment Model
  • Training
  • Validation
  • Deployment
  • Monitoring