Week 1: Data and KNN Flashcards
(38 cards)
What is the machine learning approach?
Programming an algorithm to learn automatically from data or experience, for example to uncover patterns in data or to build autonomous agents
What should be emphasized in machine learning?
- Predictive performance
- Scalability
- Autonomy
Why might you want to use a learning algorithm?
- Hard to code solution by hand (vision, speech)
- System needs to adapt to a changing environment (spam detection)
- Want the system to perform better than human programmers
- Privacy/fairness (ranking search results)
How does machine learning perform compared to humans?
It may perform better or worse than humans
Define artificial intelligence
- A subfield of CS that refers to computer programs that can solve problems humans are good at
- E.g., vision, natural language
Define machine learning
A subfield of AI focused on learning (tuning parameters) from data
Define neural networks
Parametric model used in ML loosely based on biological neurons
What is deep learning?
Neural networks with multiple layers
What is data science?
An emerging field that applies ML techniques to domain-specific problems
What are some machine learning domains?
- Computer vision
- Speech recognition
- Natural Language Processing
- Recommender systems
- Games
Types of machine learning
- Supervised learning
- Semi-supervised learning
- Reinforcement learning
- Unsupervised learning
What is supervised learning?
- The learner is given labeled examples of the correct behavior
- Predict unknown values of the data using other known data
- Classification (is this A or B?)
- Anomaly detection (is this weird?)
- Regression (how much/how many?)
What is semi-supervised learning?
Utilizes both labeled and unlabeled data
What is reinforcement learning?
Learning system which interacts with the world and learns to maximize a scalar reward signal
What is unsupervised learning?
- No labeled examples, instead looking for interesting patterns in the data
- Find human interpretable and previously unknown patterns that describe the unlabeled data
- Clustering (how is the data organized?)
- Association rule mining (are these related?)
Why is machine learning so powerful nowadays?
- Abundance of data
- Computing power
What are the steps in approaching a machine learning problem?
- Should I use ML on this problem?
- Gather and organize data (pre-processing, cleaning, visualizing)
- Establish a baseline
- Choosing a model
- Optimization
- Hyperparameter search
- Analyze performance and mistakes
- Iterate back to step 4 (choosing a model) or step 2 (gathering and organizing data)
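The "establish a baseline" step above can be illustrated with one common choice, a majority-class baseline that always predicts the most frequent training label. This is only a sketch: the choice of baseline, the function name `majority_class_baseline`, and the toy labels are assumptions for illustration, not something the cards specify.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predictor that ignores its input and always outputs
    the most frequent label seen in training (hypothetical helper)."""
    if not train_labels:
        raise ValueError("need at least one training label")
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return lambda x: most_common_label


# Toy labels, made up for illustration only.
predict = majority_class_baseline(["ham", "ham", "spam"])
print(predict("any input at all"))  # ham
```

Any model chosen in the later steps should at least beat a trivial predictor like this one.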
What is data?
Collection of objects and their attributes
What does an ML training set consist of?
- Inputs (vectors)
- Labels
Why do we use input vectors in machine learning?
- Algorithms need to handle lots of data
- A common strategy is mapping data to another space that is easy to manipulate (Representation)
- Vectors are a good representation since we can do linear algebra
What are regression and classification in a training set?
- Regression: the target (label) t is a real number
- Classification: the target (label) t is an element of a discrete set
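As a rough sketch of the distinction, the same input vectors can be paired with real-valued targets (regression) or with targets from a discrete set (classification). The attribute meanings, values, and variable names below are made up for illustration.

```python
# Each input is a vector of attributes; here (height_cm, weight_kg), invented values.
inputs = [(170.0, 65.0), (182.0, 80.0), (158.0, 52.0)]

# Regression: each target t is a real number.
regression_targets = [29.5, 41.0, 23.2]

# Classification: each target t comes from a discrete set of classes.
classes = {"adult", "child"}
classification_targets = ["adult", "adult", "child"]

# A training set is a collection of (input vector, target) pairs.
regression_set = list(zip(inputs, regression_targets))
classification_set = list(zip(inputs, classification_targets))

print(regression_set[0])      # ((170.0, 65.0), 29.5)
print(classification_set[2])  # ((158.0, 52.0), 'child')
```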
What are the classification metrics for evaluation?
- Accuracy = (# correct predictions) / (# test instances)
- Error = 1 - accuracy = (# incorrect predictions) / (# test instances)
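A minimal sketch of the two metrics, assuming predictions and true labels arrive as equal-length Python lists. The function names `accuracy` and `error_rate` and the example labels are illustrative, not from the cards.

```python
def accuracy(predictions, targets):
    """Fraction of test instances whose prediction equals the true label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("need at least one test instance")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def error_rate(predictions, targets):
    """Fraction of test instances predicted incorrectly (1 - accuracy)."""
    return 1.0 - accuracy(predictions, targets)


# Made-up predictions and labels: 3 of 4 are correct.
preds = ["spam", "ham", "spam", "ham"]
labels = ["spam", "ham", "ham", "ham"]
print(accuracy(preds, labels))    # 0.75
print(error_rate(preds, labels))  # 0.25
```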
What is similarity?
- The simplest method of learning we know
- Classifying according to similar objects you’ve seen
- aka manohorse
What happens when more data points are added to a nearest neighbor classifier?
More complicated boundaries are possible
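To make this concrete, here is a small sketch of a 1-nearest-neighbor classifier. It assumes Euclidean distance as the similarity measure (the cards do not name one), and the function name and toy points are hypothetical. Adding a single training point changes the prediction for a nearby query, which is how extra data reshapes the decision boundary.

```python
import math


def nearest_neighbor_predict(train_inputs, train_labels, query):
    """Return the label of the training point closest to `query`
    (Euclidean distance is an assumption of this sketch)."""
    best_index = min(
        range(len(train_inputs)),
        key=lambda i: math.dist(train_inputs[i], query),
    )
    return train_labels[best_index]


# Two made-up training points: the boundary is a single straight line between them.
train_inputs = [(0.0, 0.0), (1.0, 1.0)]
train_labels = ["A", "B"]
query = (0.4, 0.4)
print(nearest_neighbor_predict(train_inputs, train_labels, query))  # A

# A new labeled point arrives near the query; the boundary bends around it.
train_inputs.append((0.5, 0.5))
train_labels.append("B")
print(nearest_neighbor_predict(train_inputs, train_labels, query))  # B
```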