Chapter 1: ML with Python Flashcards

This is a summary of Chapter 1 of the ML with Python book.

1
Q

How do you start the Jupyter Notebook from the command line?

A

Open a fresh terminal and type: jupyter notebook

2
Q

How does multiple assignment work in Python?

A

In an assignment statement, the right-hand side is always evaluated fully before any variable is actually set. So,

x, y = y, x + y
is different from
x = y
y = x + y

In the first form both new values are computed from the old x and y; in the second form y is computed from the already-updated x. In Python, the right-hand side of an assignment is evaluated before the left-hand side.
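
The difference can be checked with a short sketch. The starting values 1 and 2 and the variable names `simultaneous` and `sequential` are chosen here only for illustration:

```python
# Tuple assignment: the right-hand side is evaluated first, using the old values.
x, y = 1, 2
x, y = y, x + y
simultaneous = (x, y)   # x becomes old y (2), y becomes old x + old y (3)

# Sequential assignment: the second statement sees the already-updated x.
x, y = 1, 2
x = y
y = x + y
sequential = (x, y)     # x becomes 2, then y becomes 2 + 2 = 4

print(simultaneous, sequential)
```

The two results differ only in y, which is exactly the value that depends on whether x was updated first.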

3
Q

Python comes in many flavors, and there are many ways to install it. However, we recommend installing a scientific-computing distribution that comes readily with optimized versions of scientific modules. Which distributions are available?

A
There are several fully featured scientific Python distributions:
	• Anaconda
	• EPD
	• WinPython
4
Q

What are NumPy and NumPy arrays?

A

NumPy is an extension package to Python for multi-dimensional arrays.

5
Q

Why is NumPy useful?

A

It is a memory-efficient container that provides fast numerical operations.
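
As a rough illustration, the sketch below computes the same squares with a Python list and with a NumPy array. The array size of 1000 and the variable names are arbitrary choices for this example:

```python
import numpy as np

values = list(range(1000))
arr = np.arange(1000)

# Pure-Python list comprehension: one interpreter step per element.
squares_list = [v ** 2 for v in values]

# NumPy: a single vectorized operation over the whole array.
squares_arr = arr ** 2

print(squares_arr[:5])           # first five squares: 0, 1, 4, 9, 16
print(arr.itemsize, arr.nbytes)  # bytes per element, total bytes in the data buffer
```

Both approaches give the same numbers; the array stores them in one contiguous typed buffer, which is what makes the vectorized operation possible.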

6
Q

How do you find interactive help in NumPy?

A

In IPython, type np.array? to display the documentation for np.array.

7
Q

What is the recommended way to import NumPy?

A

import numpy as np

8
Q

What is an example of creating an array with NumPy?

A
○ a = np.array([0, 1, 2, 3])
	§ a.ndim gives the number of dimensions of a, which is 1
	§ a.shape gives (4,)
	§ len(a) gives 4
○ 2-D array
	§ b = np.array([[0, 1, 2], [3, 4, 5]])
	§ b.ndim gives 2
	§ b.shape gives (2, 3)
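
A quick way to check these attributes is to run them. This is a minimal sketch that uses only the two arrays from the card:

```python
import numpy as np

# 1-D array
a = np.array([0, 1, 2, 3])
print(a.ndim)    # number of dimensions: 1
print(a.shape)   # (4,)
print(len(a))    # 4

# 2-D array
b = np.array([[0, 1, 2], [3, 4, 5]])
print(b.ndim)    # number of dimensions: 2
print(b.shape)   # (2, 3): 2 rows, 3 columns
print(len(b))    # 2, the size of the first dimension
```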
9
Q

Which functions create arrays?

A
a = np.arange(10)          # 0 .. n-1 (!)
b = np.arange(1, 9, 2)     # start, end (exclusive), step
c = np.linspace(0, 1, 6)   # start, end, num-points
d = np.linspace(0, 1, 5, endpoint=False)
np.zeros()
np.ones()
np.linspace()
np.arange()
np.eye()
np.diag()
np.random.rand(4)    # uniform in [0, 1)
np.random.randn(4)   # Gaussian
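
The sketch below calls each of these functions once. The shapes passed to `np.zeros`, `np.ones`, `np.eye`, and `np.diag` are arbitrary values picked for this example:

```python
import numpy as np

a = np.arange(10)           # 0, 1, ..., 9
b = np.arange(1, 9, 2)      # 1, 3, 5, 7 (the end value is exclusive)
c = np.linspace(0, 1, 6)    # 6 evenly spaced points, both ends included
d = np.linspace(0, 1, 5, endpoint=False)  # 0, 0.2, 0.4, 0.6, 0.8

zeros = np.zeros((2, 2))    # 2x2 array filled with 0.0
ones = np.ones((3, 3))      # 3x3 array filled with 1.0
identity = np.eye(3)        # 3x3 identity matrix
diagonal = np.diag(np.array([1, 2, 3, 4]))  # 4x4 matrix with 1..4 on the diagonal

uniform = np.random.rand(4)    # 4 samples, uniform in [0, 1)
gaussian = np.random.randn(4)  # 4 samples from the standard normal distribution

print(b, c, d, sep="\n")
```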
10
Q

If a is a NumPy array, how can we check its data type?

A

a.dtype

11
Q

Does NumPy auto-detect the data type from the input, yes or no?

A

Yes. NumPy automatically detects the data type from the input.

12
Q

How can you specify the data type of a NumPy array?

A

c = np.array([2, 3, 4], dtype=float)

13
Q

What is the default data type in NumPy?

A

The default data type is floating point (float64), for example for arrays created with np.ones or np.zeros.
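
The data-type behavior from the last few cards can be checked together. This is a small sketch; the variable names are illustrative, and the exact integer width reported for `ints` depends on the platform:

```python
import numpy as np

# The dtype is detected from the input values.
ints = np.array([1, 2, 3])
floats = np.array([1., 2., 3.])
print(ints.dtype)     # an integer type; the width depends on the platform
print(floats.dtype)   # float64

# The dtype can also be requested explicitly.
c = np.array([2, 3, 4], dtype=float)
print(c.dtype)        # float64

# Arrays created without input data default to floating point.
ones = np.ones((3, 3))
print(ones.dtype)     # float64
```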

14
Q

How do you start Python in IPython or the notebook?

A

Start by launching IPython:
$ ipython
Or the notebook:
$ jupyter notebook
(Older IPython versions launched the notebook with $ ipython notebook.)
15
Q

How do you enable interactive plots in Python?

A

Use %matplotlib in IPython, or %matplotlib inline in the notebook. With inline, plots are displayed in the notebook itself rather than in a separate window.

16
Q

What is Matplotlib?

A

Matplotlib is a 2D plotting package.

17
Q

How do you import matplotlib?

A

import matplotlib.pyplot as plt  # the tidy way

18
Q

What is machine learning about?

A

Machine learning is about extracting knowledge from data.

19
Q

Which fields does ML sit at the intersection of, and what are its other names?

A

It is a research field at the intersection of statistics, artificial intelligence, and computer science, and it is also known as predictive analytics or statistical learning.

20
Q

Why Machine Learning?

A

In the early days of “intelligent” applications, many systems used handcoded rules of “if” and “else” decisions to process data or adjust to user input.

21
Q

What are the two major disadvantages of the hand-coded rules used before?

A

• The logic required to make a decision is specific to a single domain and task. Changing the task even slightly might require a rewrite of the whole system.
• Designing rules requires a deep understanding of how a decision should be made by a human expert.

22
Q

What problems can machine learning solve?

A

○ The most successful kinds of machine learning algorithms are those that automate decision-making processes by generalizing from known examples. In this setting, which is known as supervised learning, the user provides the algorithm with pairs of inputs and desired outputs, and the algorithm finds a way to produce the desired output given an input.
○ Machine learning algorithms that learn from input/output pairs are called supervised learning algorithms because a “teacher” provides supervision to the algorithms in the form of the desired outputs for each example that they learn from. While creating a dataset of inputs and outputs is often a laborious manual process, supervised learning algorithms are well understood and their performance is easy to measure.
○ If your application can be formulated as a supervised learning problem, and you are able to create a dataset that includes the desired outcome, machine learning will likely be able to solve your problem.
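
To make the idea of input/output pairs concrete, here is a minimal sketch that uses a deliberately simple one-nearest-neighbor rule written in NumPy. The rule, the toy data, and the function name `predict_one_nearest_neighbor` are assumptions chosen for illustration; they are not taken from the book:

```python
import numpy as np

# Hypothetical training set: inputs (two features each) and desired outputs (labels).
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]])
y_train = np.array([0, 0, 1, 1])


def predict_one_nearest_neighbor(X_train, y_train, x_new):
    """Return the label of the training input that is closest to x_new."""
    distances = np.linalg.norm(X_train - x_new, axis=1)
    return y_train[np.argmin(distances)]


# Generalize from the known examples to inputs that were not in the training set.
pred_low = predict_one_nearest_neighbor(X_train, y_train, np.array([0.9, 1.1]))
pred_high = predict_one_nearest_neighbor(X_train, y_train, np.array([4.1, 4.1]))
print(pred_low, pred_high)
```

The "supervision" here is the `y_train` array: each training input comes with the output the algorithm is expected to reproduce.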

23
Q

While you are building a machine learning solution, which questions should you answer, or at least keep in mind?

A

• What question(s) am I trying to answer? Do I think the data collected can answer that question?
• What is the best way to phrase my question(s) as a machine learning problem?
• Have I collected enough data to represent the problem I want to solve?
• What features of the data did I extract, and will these enable the right predictions?
• How will I measure success in my application?
• How will the machine learning solution interact with other parts of my research or business product?

24
Q

Knowing Your Task and Knowing Your Data

A

It will not be effective to randomly choose an algorithm and throw your data at it. It is necessary to understand what is going on in your dataset before you begin building a model. Each algorithm is different in terms of what kind of data and what problem setting it works best for.

25
Q

What is scikit-learn?

A

○ scikit-learn is an open source project that is constantly being developed and improved.
○ It contains a number of state-of-the-art machine learning algorithms, as well as comprehensive documentation.
○ It is a very popular tool and the most used Python machine learning library, in both industry and academia.

26
Q

How do you install scikit-learn?

A

○ scikit-learn depends on two other Python packages, NumPy and SciPy.
○ For plotting and interactive development, you should also install matplotlib, IPython, and the Jupyter Notebook.

27
Q

What is Anaconda?

A

A Python distribution made for large-scale data processing, predictive analytics, and scientific computing. Anaconda comes with NumPy, SciPy, matplotlib, pandas, IPython, Jupyter Notebook, and scikit-learn. Available on Mac OS, Windows, and Linux, it is a very convenient solution and is the one we suggest for people without an existing installation of the scientific Python packages. Anaconda now also includes the commercial Intel MKL library for free. Using MKL (which is done automatically when Anaconda is installed) can give significant speed improvements for many algorithms in scikit-learn.

28
Q

What is Enthought Canopy?

A

Another Python distribution for scientific computing. This comes with NumPy, SciPy, matplotlib, pandas, and IPython, but the free version does not come with scikit-learn. If you are part of an academic, degree-granting institution, you can request an academic license and get free access to the paid subscription version of Enthought Canopy. Enthought Canopy is available for Python 2.7.x, and works on Mac OS, Windows, and Linux.

29
Q

How do you use pip to install all of these packages?

A

$ pip install numpy scipy matplotlib ipython scikit-learn pandas

30
Q

What are the essential libraries and tools for machine learning with Python?

A

Understanding what scikit-learn is and how to use it is important, but there are a few other libraries that will enhance your experience. Scikit-learn is built on top of the NumPy and SciPy scientific Python libraries. In addition to NumPy and SciPy, we will be using pandas and matplotlib. We will also introduce the Jupyter Notebook, which is a browser-based interactive programming environment. Briefly, here is what you should know about these tools in order to get the most out of scikit-learn

31
Q

What is Jupyter Notebook?

A

The Jupyter Notebook is an interactive environment for running code in the browser. It is a great tool for exploratory data analysis and is widely used by data scientists

32
Q

What is NumPy?

A

NumPy is one of the fundamental packages for scientific computing in Python. It contains functionality for multidimensional arrays

In scikit-learn, the NumPy array is the fundamental data structure. scikit-learn takes in data in the form of NumPy arrays. Any data you’re using will have to be converted to a NumPy array. The core functionality of NumPy is the ndarray class, a multidimensional (n-dimensional) array. All elements of the array must be of the same type.
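
A small sketch of such an array follows. The values and the variable name `x` are arbitrary and only serve as an illustration:

```python
import numpy as np

# A 2-D ndarray with 2 rows and 3 columns; every element has the same type.
x = np.array([[1, 2, 3], [4, 5, 6]])
print("x:\n{}".format(x))
print(type(x).__name__, x.shape, x.dtype)
```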

33
Q

What is SciPy?

A

SciPy is a collection of functions for scientific computing in Python. It provides, among other functionality, advanced linear algebra routines, mathematical function optimization, signal processing, special mathematical functions, and statistical distributions.

34
Q

What is the relationship between scikit-learn and SciPy?

A

scikit-learn draws from SciPy’s collection of functions for implementing its algorithms. The most important part of SciPy for us is scipy.sparse: this provides sparse matrices, which are another representation that is used for data in scikit-learn. Sparse matrices are used whenever we want to store a 2D array that contains mostly zeros.
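
As an illustration, the sketch below stores a 4x4 identity matrix, which is mostly zeros, in two of the scipy.sparse formats. The matrix size and the variable names are assumptions made for this example:

```python
import numpy as np
from scipy import sparse

# A dense 4x4 identity matrix: ones on the diagonal, zeros everywhere else.
eye = np.eye(4)

# Convert it to compressed sparse row (CSR) format; only the nonzero entries are stored.
sparse_matrix = sparse.csr_matrix(eye)
print(sparse_matrix)

# Build the same matrix directly in COO format from (data, (row, col)) triples,
# without ever creating the dense array.
data = np.ones(4)
row_indices = np.arange(4)
col_indices = np.arange(4)
eye_coo = sparse.coo_matrix((data, (row_indices, col_indices)))
print(eye_coo.toarray())
```

Building the sparse representation directly matters in practice, because the dense version of a large, mostly-zero array might not fit in memory.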

35
Q

What is matplotlib?

A

matplotlib is the primary scientific plotting library in Python. It provides functions for making publication-quality visualizations such as line charts, histograms, scatter plots, and so on. Visualizing your data and different aspects of your analysis can give you important insights, and we will be using matplotlib for all our visualizations. When working inside the Jupyter Notebook, you can show figures directly in the browser by using the %matplotlib notebook and %matplotlib inline commands. We recommend using %matplotlib notebook, which provides an interactive environment.
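
A minimal line-chart sketch is shown below. The sine curve, the number of points, and the axis labels are arbitrary choices for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced x values between -10 and 10, and the sine of each.
x = np.linspace(-10, 10, 100)
y = np.sin(x)

# Draw a line chart of y against x, marking each sample with a cross.
fig, ax = plt.subplots()
ax.plot(x, y, marker="x")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
```

In a notebook with one of the %matplotlib commands active, the figure appears with the cell output; in a plain script, call plt.show() afterwards to open it in a window.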

36
Q

What is pandas?

A

pandas is a Python library for data wrangling and analysis. It is built around a data structure called the DataFrame that is modeled after the R DataFrame. Simply put, a pandas DataFrame is a table, similar to an Excel spreadsheet. pandas provides a great range of methods to modify and operate on this table; in particular, it allows SQL-like queries and joins of tables. In contrast to NumPy, which requires that all entries in an array be of the same type, pandas allows each column to have a separate type (for example, integers, dates, floating-point numbers, and strings). Another valuable tool provided by pandas is its ability to ingest from a great variety of file formats and databases, like SQL, Excel files, and comma-separated values (CSV) files. Going into detail about the functionality of pandas is out of the scope of this book. However, Python for Data Analysis by Wes McKinney (O’Reilly, 2012) provides a great guide.
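
The sketch below builds a small DataFrame from a dictionary and runs one SQL-like row selection on it. The people, places, and ages are made-up values used only for illustration:

```python
import pandas as pd

# Hypothetical dataset; each dictionary key becomes a column with its own type.
data = {
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Location": ["New York", "Paris", "Berlin", "London"],
    "Age": [24, 13, 53, 33],
}
data_pandas = pd.DataFrame(data)
print(data_pandas)

# SQL-like query: select all rows whose Age column is greater than 30.
older_than_30 = data_pandas[data_pandas.Age > 30]
print(older_than_30)
```

The Name and Location columns hold strings while Age holds integers, which is the per-column typing that distinguishes a DataFrame from a NumPy array.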

37
Q

What is mglearn?

A

This book comes with accompanying code, which you can find on GitHub. The accompanying code includes not only all the examples shown in this book, but also the mglearn library. This is a library of utility functions we wrote for this book, so that we don’t clutter up our code listings with details of plotting and data loading.

38
Q

Which imports are assumed throughout Chapter 1?

A

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import mglearn