Python for Data Science Flashcards

1
Q

True or False

The IPython Shell is typically used to work with Python interactively

A

TRUE

2
Q

Which file extension is used for Python script files?

A

.py

Python scripts have the extension .py. my_analysis.py is an example of a script name.

3
Q

You need to print the result of adding 3 and 4 inside a script. Which line of code should you write in the script?

A

print(3 + 4)

If you do a calculation in the IPython Shell, the result is immediately printed out. If you do the same thing in the script and run it, this printout will not occur.

In Python 3, you will need print(3 + 4). You need to explicitly include this print() function; otherwise the result will not be printed out when you run the script.

4
Q

Python as a calculator

A

Python is perfectly suited to do basic calculations. Apart from addition, subtraction, multiplication and division, there is also support for more advanced operations such as:

Exponentiation: **. This operator raises the number to its left to the power of the number to its right: for example 4**2 will give 16.

Modulo: %. It returns the remainder of the division of the number to the left by the number on its right, for example 18 % 7 equals 4.
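
A quick sketch of the two operators described above, using the same numbers as the card. The variable names are only for illustration.

```python
# Exponentiation: the left operand raised to the power of the right operand
power = 4 ** 2

# Modulo: the remainder after dividing the left operand by the right operand
remainder = 18 % 7

print(power, remainder)  # 16 4
```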

5
Q

Which line of code creates a variable x with the value 15?

A

x = 15

In Python, variables are used all the time. They make your code reproducible.

You use a single equals sign to create a variable and assign a value to it.

6
Q

What is the value of the variable z after executing these commands?

x = 5

y = 7

z = x + y + 1

A

In the command z = x + y + 1, x has the value 5 and y has the value 7. 5 + 7 + 1 equals 13.

7
Q

You execute the following two lines of Python code:

x = "test"

y = False

What are the data types of x and y?

A

x is a string and y is a Boolean. You can recognize strings by their quotes. Booleans are either True or False.

8
Q

What is a list?

A

A list is a way to give a single name to a collection of values. These values, or elements, can have any type: they can be floats, integers, booleans, or strings, but also more advanced Python types, even lists.

9
Q

Which of the following is a characteristic of a Python list?

A

It is a way to name a collection of values, instead of having to create separate variables for each element.

10
Q

Which three Python data types does this list contain?

x = ["you", 2, "are", "so", True]

A

"you", "are" and "so" are strings. 2 is an integer and True is a boolean.
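
One way to check this yourself is to ask Python for the type of each element. This is a minimal sketch; the names type_names and distinct_types are made up for the example.

```python
x = ["you", 2, "are", "so", True]

# Collect the type name of each element, in order
type_names = [type(element).__name__ for element in x]
print(type_names)

# Reduce to the distinct types present in the list
distinct_types = set(type_names)
print(distinct_types)
```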

11
Q

Which command is invalid Python syntax to create a list x?

A

x = ["this", "is", "a" True "list"]

This command results in a so-called SyntaxError, because the commas before and after True are missing.

12
Q

Create list with different types

A

A list can contain any Python type.

Although it’s not really common, a list can also contain a mix of Python types including strings, floats, booleans, etc.

13
Q

How does Python access elements in a list?

A

By using an index

14
Q

What is slicing?

A

Apart from indexing, there’s also something called slicing, which allows you to select multiple elements from a list, thus creating a new list. You can do this by specifying a range, using a colon.

eg: x[2 : 8]

15
Q

Which pair of symbols do you need to do list subsetting in Python?

A

You use square brackets to subset lists in Python.

16
Q

What Python command should you use to extract the element with index 1 from a Python list x?

A

x[1]

You use square brackets for subsetting. Inside the square brackets, simply put the index of the element you want to access.

17
Q

You want to slice a list x. The general syntax is:

x[begin:end]

A

List slicing is a very powerful technique to extract several list elements from a list at the same time.

In Python, the begin index is included in the slice, the end index is not.
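
The begin-included, end-excluded rule can be sketched on a small example list. The list contents and variable names are made up. Omitting begin or end is standard Python behavior that the card does not mention.

```python
x = ["a", "b", "c", "d", "e"]

first = x[0]        # indexing starts at 0
middle = x[1:3]     # elements at index 1 and 2; index 3 is excluded
from_start = x[:2]  # omitting begin starts the slice at index 0
to_end = x[3:]      # omitting end runs the slice to the end of the list

print(first, middle, from_start, to_end)
```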

18
Q

What does manipulating lists consist of?

A
  1. Changing elements
  2. Adding elements
  3. Removing elements
19
Q

You have a list x that is defined as follows:

x = ["a", "b", "b"]

You need to change the second "b" (the third element) to "c".

Which command should you use?

A

x[2] = "c"

The third element has index 2. You want to replace this element with a string, so you need "c" instead of c.

20
Q

You have a list x that is defined as follows:

x = ["a", "b", "c"]

Which line of Python code do you need to add "d" at the end of the list x?

A

x = x + ["d"]

You basically have to create a single-element list containing "d" and add that to the list x.

Next you have to assign the result of this addition to x again to actually update x for the future.

21
Q

You have a list x that is defined as follows:

x = ["a", "b", "c", "d"]

You decide to remove an element from it by using del:

del(x[3])

How does the list x look after this operation?

A

["a", "b", "c"]

The operation removed the element with index 3, so the fourth element in the list x.
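
The three kinds of list manipulation from the cards above (changing, adding, removing) can be chained in one short sketch. It reuses the lists from the cards and is only an illustration.

```python
x = ["a", "b", "b"]

# Changing: assign a new value to an existing index
x[2] = "c"

# Adding: concatenate a single-element list and reassign the result to x
x = x + ["d"]

# Removing: del is a statement, so "del x[3]" and "del(x[3])" are equivalent
del x[3]

print(x)
```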

22
Q

But what is a function?

A

Simply put, a function is a piece of reusable code, aimed at solving a particular task.

You can call functions instead of having to write code yourself.

23
Q

What is a Python function?

A

A piece of reusable Python code that solves a particular problem.

A function is a block of code that performs a specific, related action. Functions make your code more modular, so that you can reuse code without having to retype it over and over again.

24
Q

What Python command opens up the documentation from inside the IPython Shell for the min function?

A

You can use help(min). Notice that help() is also a function!

25
Q

What are methods?

A

You can think of methods as _functions_ that “belong to” Python objects.

A Python object of type string has methods, such as capitalize and replace, but objects of type float and list also have specific methods, depending on the type.

26
Q

In Python, everything is an object, and each object has specific ________________________

A

methods associated with it.

27
Q

Different objects may have the same methods, but _____________________________

A

depending on the type of the object, the methods behave differently.

eg. index() exists for both strings and lists
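
A small sketch of how index() behaves differently on a string and on a list. The example values are made up.

```python
text = "python"
letters = ["p", "y", "t", "h", "o", "n"]

# On a string, index() searches for a substring and returns where it starts
text_position = text.index("th")

# On a list, index() searches for an element equal to the argument
list_position = letters.index("t")

# "th" is a substring of the string but not an element of the list
try:
    letters.index("th")
    found_in_list = True
except ValueError:
    found_in_list = False

print(text_position, list_position, found_in_list)  # 2 2 False
```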

28
Q

Some methods can change___________

A

the objects they are called on.

29
Q

What is append() in Python?

A

append() is a method, and therefore also a function.

In Python, practically everything is an object. Every Python object can have functions associated with it. These functions are also called methods.

30
Q

You have a string x defined as follows:

x = "monty python says hi!"

Which Python command should you use to capitalize this string x?

A

x.capitalize()

Use the dot notation to call a method on an object, x in this case. Make sure to include the parentheses at the end, even if you don’t pass any additional arguments.

31
Q

How does the list x look after you execute the following two commands?

x = [4, 9, 5, 7]

x.append(6)

A

[4, 9, 5, 7, 6]

If you call append() on a list, you’re actually adding the element to the list you called append() on; there’s no need for an explicit assignment (with the = sign) in this case.
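
To contrast a method that changes its object with one that does not, here is a minimal sketch. The string example and the names result, word and capitalized are additions for illustration.

```python
x = [4, 9, 5, 7]

# append() changes the list in place and returns None,
# which is why no assignment is needed
result = x.append(6)

# capitalize() leaves the original string untouched and returns a new one
word = "monty"
capitalized = word.capitalize()

print(x, result, word, capitalized)
```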

32
Q

What are Packages?

A

You can think of a package as a directory of Python scripts. Each such script is a so-called module. These modules specify functions, methods and new Python types aimed at solving particular problems.

33
Q

Are all packages available in Python by default?

Yes or No

A

No

34
Q

How do you use Python packages?

A

To use Python packages, you’ll first have to install them on your system, and then put code in your script to tell Python that you want to use these packages.

35
Q

What are the main Python packages for:

  • Data Science
  • Data Visualization
  • Machine Learning
A
  • numpy for data science (to efficiently work with arrays)
  • matplotlib for data visualization
  • scikit-learn for machine learning
36
Q

Which tool is a package installation and maintenance system for Python?

A

pip

pip is a very commonly used tool to install and maintain Python packages.

37
Q

Which statement is the most common way to invoke the import machinery?

A

The "import" statement is arguably the easiest way to import packages and modules into Python.

38
Q

You import Numpy as foo as follows:

import numpy as foo

Which Python command that uses the array() function from Numpy is valid if Numpy is imported as foo?

A

foo.array([1, 2, 3])

Because Numpy is imported as foo, you need foo.array(). If Numpy were imported as np, you would need np.array().

39
Q

You want to use Numpy’s array() function.

You need to decide whether to import this function as follows:

from numpy import array

or by importing the entire numpy package:

import numpy

Select the two correct statements about these different import methods.

A
  • The from numpy import array version will make it less clear in the code that you’re using Numpy’s array() function.
  • Using import numpy will require you to use numpy.array(), making it clear that you’re using a Numpy function.

Importing a particular function makes your code shorter, because you don’t need to include the numpy. prefix. However, it becomes less clear that array() is a function from the numpy package.
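
The import styles discussed in the cards above can be compared side by side. This sketch assumes numpy is installed; all three calls build the same array.

```python
import numpy                 # full package name as prefix
import numpy as np           # conventional short alias
from numpy import array      # only the function, no prefix needed

a = numpy.array([1, 2, 3])   # clearly a Numpy function
b = np.array([1, 2, 3])      # same function via the alias
c = array([1, 2, 3])         # shorter, but the origin of array() is less obvious

# All three arrays hold the same values
same = bool((a == b).all() and (b == c).all())
print(same)
```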

40
Q

What is one additional feature of a Numpy array?

A

A Numpy array is pretty similar to a regular Python list, but has one additional feature: you can perform calculations over entire arrays. It’s really easy, and super-fast as well.

41
Q

Can Numpy arrays contain different data types?

A

NO.

A Numpy array can only contain values of a single type. It’s either an array of floats, or an array of booleans, and so on.

42
Q

A Numpy array is another data type in Python.

TRUE or FALSE

A

TRUE.

43
Q

Which Numpy function do you use to create an array?

A

To create a Numpy array, you use the array( ) function.

You typically pass a regular Python list as an input.

44
Q

Which two statements describe the advantage of Numpy Package over regular Python Lists?

A
  • The Numpy Package provides the array, a data type that can be used to do element-wise calculations.
  • Because Numpy arrays can only hold elements of a single type, calculations on Numpy arrays can be carried out way faster than on regular Python lists.

Creating a Numpy array is not necessarily easier, but it is a great solution if you want to carry out element-wise calculations, something that regular Python lists aren’t capable of.

45
Q

What is the resulting Numpy array z after executing the following lines of code?

import numpy as np

x = np.array([1, 2, 3])

y = np.array([3, 2, 1])

z = x + y

A

array([4, 4, 4])

In Numpy, calculations are performed element-wise. The first element of x and the first element of y are added, giving 4. The same happens for the second and third elements of x and y.
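
For contrast, the same + operator on regular Python lists concatenates instead of adding element-wise. A short sketch, assuming numpy is installed; list_sum is a made-up name.

```python
import numpy as np

# With regular lists, + pastes the lists together
list_sum = [1, 2, 3] + [3, 2, 1]

# With Numpy arrays, + adds the elements pairwise
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = x + y

print(list_sum)
print(z)
```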

46
Q

What happens when you put an integer, a Boolean, and a string in the same Numpy array using the array() function?

A

All array elements are converted to strings

Numpy arrays can only hold elements with the same basic type. The string is the most ‘general’ and free form to store data, so all other data types are converted to strings.
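
This type coercion can be observed directly. A minimal sketch, assuming numpy is installed; the values are made up.

```python
import numpy as np

# An integer, a Boolean and a string in one array
mixed = np.array([1, True, "a"])

# Every element has been converted to a string
print(mixed.tolist())

# dtype.kind is "U" for Unicode strings
print(mixed.dtype.kind)
```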

47
Q

For Numpy specifically, you can also use boolean Numpy arrays:

A

TRUE

48
Q

What does .ndarray stand for?

A

N dimensional array

49
Q

What does ‘np_2d.shape’ return?

A

shape is a so-called attribute (not a method) of the np_2d array. It gives you more information about what the data structure looks like: a tuple with the size of the array along each dimension, such as (rows, columns) for a 2D array.

50
Q

You can think of the 2D Numpy array as an _________________________________

A

improved list of lists.

51
Q

What characterizes multi-dimensional Numpy arrays?

A

You can create a 2D Numpy array from a regular list of lists.

Multi-dimensional Numpy arrays are natural extensions of the 1D Numpy array:

They can only hold a single type and can be created from a regular Python list structure.

The number N in these N-dimensional Numpy arrays is not limited.

52
Q

You created the following 2D Numpy array, x:

import numpy as np

x = np.array([["a", "b", "c", "d"],
              ["e", "f", "g", "h"]])

Which command selects the element in the second row and third column?

A

x[1,2]

Apart from element-wise calculations, 2D Numpy arrays also offer more advanced ways of subsetting compared to regular Python lists of lists. To select the second row, use the index 1 before the comma. To select the third column, use the index 2 after the comma.
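
The row-before-comma, column-after-comma rule can be sketched on the array from this card, assuming numpy is installed. The variable names are made up, and the shape line ties back to the earlier card on the shape attribute.

```python
import numpy as np

x = np.array([["a", "b", "c", "d"],
              ["e", "f", "g", "h"]])

element = x[1, 2]       # second row, third column
second_row = x[1, :]    # all columns of the second row
third_column = x[:, 2]  # all rows of the third column
dimensions = x.shape    # (number of rows, number of columns)

print(element, second_row, third_column, dimensions)
```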

53
Q

What does the resulting array z contain after executing the following lines of Python code?

import numpy as np

x = np.array([[1, 2, 3], [1, 2, 3]])

y = np.array([[1, 1, 1], [1, 2, 3]])

z = x - y

A

array([[0, 1, 2],
       [0, 0, 0]])

54
Q

Good Resource for Numpy Arrays

A

http://cs231n.github.io/python-numpy-tutorial/#numpy-arrays

55
Q

What will provide you with a “sanity check” of the data?

A

summarizing statistics

56
Q

Good to Remember

A

Numpy offers many functions to calculate basic statistics, such as np.mean(), np.median() and np.std().

Both the mean and median are interesting statistics to check out before you start your analysis. Visual inspection of your data is practically infeasible if you’re dealing with millions of data points.

57
Q

Select the three statements that hold.

A

Numpy is a great alternative to the regular Python list if you want to do Data Science in Python.

Numpy arrays can only hold elements of the same basic type.

Next to an efficient data structure, Numpy also offers tools to calculate summary statistics and to simulate statistical distributions.

No matter the dimension of the Numpy array, element-wise calculations will always be possible.

58
Q

You are writing code to measure your travel time and weather conditions to work each day.

The data is recorded in a Numpy array where each row specifies the measurements for a single day.

The first column specifies the temperature in Fahrenheit. The second column specifies the amount of travel time in minutes.

The following is a sample of the code.

import numpy as np

x = np.array([[28, 18],

[34, 14], [32, 16],

… [26, 23], [23, 17]])

Which Python command do you use to calculate the average travel time?

A

np.mean(x[:,1])

:,1 inside the square brackets tells Python to get all the rows and the second column. You can then use np.mean() to get the average of the resulting Numpy array.
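
A runnable sketch of this column selection, assuming numpy is installed. The card's data is elided, so the three rows below are a made-up stand-in and not the original measurements.

```python
import numpy as np

# Hypothetical stand-in data: [temperature in Fahrenheit, travel time in minutes]
x = np.array([[28, 18],
              [34, 14],
              [32, 16]])

travel_times = x[:, 1]                    # all rows, second column
average_travel = np.mean(travel_times)    # mean of the travel times
median_travel = np.median(travel_times)   # median as a second sanity check

print(average_travel, median_travel)
```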

59
Q

How do you get an overall hunch of your data set?

A

It’s always a good idea to check both the median and the mean, to get a first hunch for the overall distribution of the entire dataset.

60
Q

The better your data visualizations the better you will be able to _______________________

A

extract insights and share with other people

61
Q

The father of all visualization packages in Python is ____________

A

matplotlib.

62
Q

Inside the matplotlib package, there’s the ____________ subpackage.

A

pyplot

63
Q

What is a scatter plot useful for?

A

A scatter plot is useful to see all the individual data points. Unlike in the line plot, these data points will not be connected by a line.

64
Q

What is a characteristic of data visualization?

A

Visualization is a very powerful tool for exploring your data and reporting results.

Data visualization is useful in different stages of the data analysis pipeline. The type of visualization that is most appropriate depends on the problem at hand.

65
Q

What is the conventional way of importing the pyplot sub-package from the matplotlib package?

A

import matplotlib.pyplot as plt

The general syntax is import package.subpackage as local_name.

66
Q

You are creating a line plot using the following code:

a = [1, 2, 3, 4]

b = [3, 9, 2, 6]

plt.plot(a, b)

plt.show()

Which two options describe the result of your code?

A

The first argument corresponds to the horizontal, x-axis. The second argument is mapped onto the vertical, y-axis.

67
Q

You are modifying the following code that calls the plot() function to create a line plot:

a = [1, 2, 3, 4]

b = [3, 9, 2, 6]

plt.plot(a, b)

plt.show()

What should you change in the code to create a scatter plot instead of a line plot?

A

Change plot() in plt.plot() to scatter()

To create a scatter plot, you’ll need plt.scatter().

68
Q

Good to remember about matplotlib

A

When you have a time scale along the horizontal axis, the line plot is your friend. But in many other cases, when you’re trying to assess if there’s a correlation between two variables, for example, the scatter plot is the better choice.

69
Q

What are the benefits of using a histogram?

A

The histogram is a type of visualization that’s particularly useful to explore your data set. It can help you to get an idea about the distribution.

70
Q

What is a characteristic of a histogram?

A

A histogram is a great tool for getting a first impression about the distribution of your data.

A histogram is useful to display any distribution, and typically consists of non-overlapping bins. The matplotlib package contains functionality to build histograms very easily.

71
Q

You are working with a Python list with 10 different values. You divide the values into 5 equally-sized bins.

How wide will these bins be if the lowest value in your list is 0 and the highest is 20?

A

The range of your values is 20. Dividing these values into 5 equally sized bins will result in bins with width 4.
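
The arithmetic behind this answer, written out as a small sketch. The variable names are made up; the bin edges are added to show where each bin starts and ends.

```python
lowest = 0
highest = 20
n_bins = 5

# Width of each bin: the range of the values divided by the number of bins
bin_width = (highest - lowest) / n_bins

# Edges of the equally sized, non-overlapping bins
bin_edges = [lowest + i * bin_width for i in range(n_bins + 1)]

print(bin_width)
print(bin_edges)
```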

72
Q

You write the following code:

import matplotlib.pyplot as plt

x = [1, 3, 6, 3, 2, 7, 3, 9, 7, 5, 2, 4]

plt.hist(x)

plt.show()

You need to extend the plt.hist() command to specifically set the number of bins to 4. What should you do?

A

Add a second argument to plt.hist():

plt.hist(x, bins = 4)

If you do not specify the number of bins the data has to be divided into, matplotlib chooses a suitable number of bins for you.

Setting the number of bins is as simple as specifying the bins argument appropriately.

73
Q

Why is choosing the right number of bins important in a histogram?

A

The number of bins is pretty important. Too few bins oversimplify reality and hide the details. Too many bins overcomplicate reality and obscure the bigger picture.

74
Q

You are customizing a plot by labelling its axes. You need to do this by using matplotlib.

Which code should you use?

A

xlabel("x-axis title") and ylabel("y-axis title")

To set the axis titles, use the functions xlabel() and ylabel().

75
Q

Which matplotlib function do you use to build a line plot where the area under the graph is colored?

A

fill_between()

76
Q

Typically, you place all customization commands between the plot() call and the show() call, as follows:

import matplotlib.pyplot as plt

x = [1, 2, 3]

y = [4, 5, 6]

plt.plot(x, y)

# customization here

plt.show()

What will happen if you place the customization code after the show() function instead?

import matplotlib.pyplot as plt

x = [1, 2, 3]

y = [4, 5, 6]

plt.plot(x, y)

plt.show()

# customization here

A

Python doesn’t throw an error, but you won’t see your customizations. The show() function displays the plot you’ve built up until then. If the customizations come afterwards, there is no effect on the shown output.

Therefore, you should place all customization commands between the plot() call and the show() call.
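
A sketch of the recommended ordering, assuming matplotlib is installed. The Agg backend, the plot title and the ax handle are additions so the example can run without a display. In an interactive session you would skip the backend line.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 5, 6]

plt.plot(x, y)

# Customizations go after plot() and before show()
plt.xlabel("x-axis title")
plt.ylabel("y-axis title")
plt.title("Example line plot")  # hypothetical title

ax = plt.gca()  # handle on the current axes, only used to inspect the labels

plt.show()  # with the Agg backend nothing is displayed
```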

77
Q

You write the following code:

x = 7

if x > 6:

    print("high")

elif x > 3:

    print("ok")

else:

    print("low")

What will be printed out if you execute the code?

A

high

If your control structures get more advanced, Python can take many different paths through your code.

As soon as Python encounters a condition that is True (x > 6 in this case), the corresponding code is executed and the control structure is abandoned. The elif and else parts are not considered anymore!
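
To see the different paths Python can take, the same control structure can be wrapped in a function and called with several values. The function name classify is hypothetical.

```python
def classify(x):
    """Return the label that the if/elif/else chain selects for x."""
    if x > 6:
        return "high"   # first True condition wins; later branches are skipped
    elif x > 3:
        return "ok"     # only checked when x > 6 is False
    else:
        return "low"    # reached when no condition is True


for value in (7, 5, 2):
    print(value, classify(value))
```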

78
Q

To check if two Python values, or variables, are equal, you can use_____________

A

==

79
Q

To check for inequality, you need ____________________

A

!=

80
Q

In pandas, where do you store data?

A

DataFrame

81
Q

Good to remember about PANDAS

A

You typically don’t build a pandas DataFrame manually. Instead, you import data from an external file that contains all this data.

82
Q

How do you access a column in pandas?

A

To access a column, you typically use square brackets with the column label.

83
Q

How do you access a row in pandas?

A

You’ll want to use loc.

eg. bric.loc["BR"]

84
Q

How is a Pandas DataFrame different from a 2D Numpy array?

A

In Pandas, different columns can contain different types.

Both Pandas and Numpy offer many different ways of subsetting. 2D Numpy arrays can only contain values of the same basic type, a downside compared to Pandas if you’re working on typical Data Science problems.

85
Q

What are two characteristics that describe Pandas DataFrame?

A

The rows correspond to observations.

The columns correspond to variables.

86
Q

Which Pandas function do you use to import data from a comma-separated value (CSV) file into a Pandas DataFrame?

A

read_csv() is the function you need. You can specify a ton of other arguments to customize the way the data is imported.

87
Q

Which technique should you use to select an entire row by its row label when accessing data in a Pandas DataFrame?

A

loc

Square brackets are used to get specific columns from a Pandas DataFrame. iloc is used if you want to select a row based on its position in the DataFrame, and not based on its row label.

88
Q

cars['cars_per_cap']

cars[['cars_per_cap']]

What is the difference between these 2 methods of accessing a column in pandas?

A

The single bracket version gives a Pandas Series, the double bracket version gives a Pandas DataFrame.
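
The column and row selection techniques from the last few cards can be tried on a small DataFrame. This sketch assumes pandas is installed. The cars data and its row labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data: rows are observations, columns are variables
cars = pd.DataFrame(
    {"cars_per_cap": [809, 731, 588],
     "country": ["United States", "Australia", "Japan"]},
    index=["US", "AUS", "JPN"],
)

as_series = cars["cars_per_cap"]    # single brackets: pandas Series
as_frame = cars[["cars_per_cap"]]   # double brackets: pandas DataFrame

row_by_label = cars.loc["AUS"]      # loc selects a row by its label
row_by_position = cars.iloc[1]      # iloc selects a row by its position

print(type(as_series).__name__, type(as_frame).__name__)
print(row_by_label)
```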
