Lecture Notes Flashcards

(484 cards)

1
Q

Define programming.

A

Programming means giving a computer a list of tasks, which it then runs in order to solve a problem.

2
Q

What are some advantages of computer programming?

A
  • Computers don’t get bored - automate repetitive tasks
  • Computers don’t get tired
  • Computers are calculators
  • Computer code is reproducible
3
Q

What can’t computers do?

A
  • Computers are not creative
  • Computers are not ethical
  • Computers only know what you tell them
4
Q

What are some advantages of python?

A
  • High-level language
  • Emphasises readability, making use of white space and indentation
  • Dynamically typed
  • Interpreted language
  • Assigns memory automatically
  • Supports multiple approaches to programming
  • Extensive functionality
  • Portable
  • Open source
  • Very popular
5
Q

What are some disadvantages of python?

A
  • Slower than compiled languages
  • Can be memory-intensive
6
Q

What are the different types of cells in a Jupyter notebook?

A
  • Code cells - interpreted as Python code
  • Markdown cells - for adding formatted text
7
Q

How do you add a comment to a Jupyter notebook?

A

# - in a code cell, Python ignores everything after the # on that line

8
Q

Why are comments important?

A
  • Allow you to keep track of what your code does
  • Avoids repetition and mistakes
  • Easy for other people to follow
9
Q

What steps should you take for debugging?

A
  • Always read error messages carefully
  • Comment your code thoroughly
  • Tell your code to print outputs for intermediate steps
  • Use the internet
10
Q

How do you print in python?

A

print()

Prints whatever is in the brackets.
Useful for displaying results and testing purposes.

11
Q

What does a variable have?

A

A name and a value.

The name is fixed, the value can change.

11
Q

What are the different types of variables in Python?

A
  • Numeric: integers, floats or complex numbers
  • Text: string, always marked by quotation marks
  • Boolean: True or False
  • Sequences: lists or arrays of numbers/letters
12
Q

How do you change the string x = '33.3' to a float?

A

float(x)

12
Q

How do you check the type of a variable?

A

type(x)

12
Q

How do you change a float to an integer?

A

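int(x) - this truncates the float toward zero to give a whole number (it drops the decimal part rather than rounding to the nearest integer)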
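
A minimal sketch of the conversions on the last few cards. The variable names are illustrative only, and round() is a standard built-in that is not mentioned on the cards; it is shown purely for contrast with int().

```python
# Convert a numeric string to a float, then to an integer.
x = '33.3'
x_float = float(x)      # string -> float
x_int = int(x_float)    # float -> int, the fractional part is dropped

# int() truncates toward zero; round() goes to the nearest integer.
truncated = int(2.9)
rounded = round(2.9)
negative = int(-2.9)

print(type(x), type(x_float), type(x_int))
print(truncated, rounded, negative)
```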

13
Q

How do you get an input from the user?

A

variable = input("Enter your name: ")

14
Q

What is an expression?

A

Any group of variables or constants that together result in a value.

15
Q

What are the common symbols used in basic math expressions?

A

+ (add)
- (subtract)
* (multiply)
/ (divide)
% (remainder)
** (raise to the power of)

16
Q

How do you concatenate two strings together?

A

Use the + operator:
String1 + String2
= String1String2

The * operator repeats a string:
String1 * 3
= String1String1String1

17
Q

How is python indexed?

A

Zero-based indexing

18
Q

What is string slicing?

A

Extracting certain characters from a string.

19
Q

How do you access specific parts of a string?

A

Using the index with square bracket notation

  • string[0]
20
Q

Can we change a part of string in place?

A

We can access parts of a string to see their value, but we cannot change them in place - strings are immutable.

21
Q

How do we access a sequence (sub-string) of any length?

A

By specifying a range to slice. Ranges use a : notation eg [1:10]

The slice occurs before each index (eg between 0 and 1, and between 9 and 10), returning the characters at indices 1 to 9.
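
A short sketch of indexing, slicing and immutability together. The example string and variable names are made up for illustration.

```python
text = "Hello, world!"

first = text[0]      # single character at index 0
sub = text[1:10]     # characters at indices 1 to 9 (index 10 is excluded)
start = text[:5]     # omitting the start means "from the beginning"
last = text[-1]      # negative indices count from the end

# Strings are immutable: assigning to an index raises a TypeError.
try:
    text[0] = 'J'
    changed_in_place = True
except TypeError:
    changed_in_place = False

# Build a new string from slices instead.
new_text = 'J' + text[1:]
print(first, sub, start, last, new_text)
```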

22
How can we create a new string with slicing?
We can store our sub-string as a new variable (then this can be manipulated) string2 = string1[:8]
23
What is string splitting?
String splitting is a very useful method for manipulating strings - it involves breaking a string into multiple parts. string.split(' ')
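
As a rough illustration, split() returns a list of the pieces. The sample strings are invented, and join() (the reverse operation) is a standard string method that the cards do not cover.

```python
sentence = "the quick brown fox"
words = sentence.split(' ')      # break on single spaces

csv_row = "1.5,2.5,3.5"
fields = csv_row.split(',')      # break on commas; the pieces are still strings
first_value = float(fields[0])   # convert a piece to a number when needed

rejoined = '-'.join(words)       # glue the pieces back together with '-'
print(words, fields, first_value, rejoined)
```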
24
What is a tuple?
A tuple is a type which holds an arbitrary sequence of items, which can be of different types. They are used to store multiple items in a single variable. Think multiple
25
How can you declare a tuple?
my_tuple = ('A', 'tuple', 'of', 5, 'entries')
26
How can you access a variable in a tuple?
Similar notation as characters in a string my_tuple[0]
27
What is the advantage of a tuple over a list?
Tuples use only a small amount of memory, but once created their items cannot be changed: tuples are immutable, like strings. A list is a similar but more flexible data type. Lists are also comma-separated, but use square brackets.
28
Give examples of immutable data types.
Tuples and strings
29
What is the difference in declaring a list vs a tuple?
Both are comma-separated sequences. Tuples use round brackets (); lists use square brackets []. Tuples are immutable; lists support assignment, so you can access an item and change its value.
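
The difference can be sketched as follows; my_list is a made-up companion to the my_tuple example from the earlier card.

```python
my_tuple = ('A', 'tuple', 'of', 5, 'entries')
my_list = ['A', 'list', 'of', 5, 'entries']

# Lists support assignment.
my_list[1] = 'changed'

# Tuples do not: the assignment raises a TypeError.
try:
    my_tuple[1] = 'changed'
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(my_list, my_tuple, tuple_changed)
```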
30
Lists support assignment - what does that mean?
You can access an item and change its value.
31
How do you access/change items in a list?
list[index]; for a list of lists, list[i][j]
32
How do you get the length of a list?
len(list)
33
How do you compute the sum of values in a list?
sum(list)
34
How do you find the minimum value in a list?
min(list)
35
How do you find the maximum value in a list?
max(list)
36
How do you make a copy of a list?
Store it as another variable copied = list.copy()
37
How do you add an element to a list?
list.append(value)
38
What is the standard indent in python?
Four spaces - can usually tab in most editors
38
What is a dictionary?
A handy way to store and access data. A dictionary is a set of keyword and value pairs. You use the keyword to access the value. The value can be of any type, including another dictionary. dict = { x:y, a:b } A key is usually a string, which needs quotation marks (other immutable values, such as numbers, can also be keys).
38
How do you define a dictionary?
dict = { x:y, a:b } A key is usually a string, which needs quotation marks.
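
A small sketch of defining and using a dictionary. The dictionary name and the numbers are illustrative placeholders, not data from the lectures.

```python
# Keys are strings here, so they need quotation marks.
masses = {"Mercury": 0.33, "Earth": 5.97}

earth_mass = masses["Earth"]   # look a value up by its key
masses["Mars"] = 0.642         # add a new key/value pair
masses["Earth"] = 5.972        # update an existing value

# A value can itself be a dictionary.
planets = {"Earth": {"mass": 5.97, "moons": 1}}
earth_moons = planets["Earth"]["moons"]

print(masses, earth_moons)
```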
38
What is program flow?
Controlling which parts of your code get executed when, in what order, how many times, under what conditions, where to start and stop etc. It is essential to making sure your program actually does what you want it to do. Flow is controlled mainly by using conditional logic and loops.
38
What is the advantage of a dictionary?
We don't need to care about where the value we want is, we just have to remember what we called it. A key is usually a string, which needs quotation marks.
39
What is an if statement?
A block of code which first checks if a specified condition is true, and only in that case will it carry out the task if condition : # body It will only be applied to the indented code which follows the :
40
What is an if-else statement?
If statements only execute if the condition is true. The else statement executes if the condition is false. if condition : # code else : # code
41
What is the elif statement?
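An elif ("else if") branch tests a further condition, and is only checked when the conditions before it were false. If-elif-else: if condition1 : # code elif condition2 : # code else : # code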
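
The if / elif / else pattern from the last three cards might look like this in practice. The function name describe is hypothetical; wrapping the logic in a function just makes it easy to try several inputs.

```python
def describe(n):
    # Only the first branch whose condition is true is executed.
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

print(describe(-3), describe(0), describe(7))
```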
42
What is a loop?
A block of code that will iterate (execute consecutively) multiple times.
43
What is a for loop?
A for loop requires something to iterate over, ie an "iterable" like a list (do something for every item in the list) or a string (do something for every character in the string). for var in iterable : # code eg for i in range(10) : # code
44
Which is the simplest kind of loop?
For loop
45
How do you get a list of integers of length x, starting with 0?
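list(range(x)) - range(x) on its own gives a range object that produces the integers 0 to x-1; wrapping it in list() turns it into a list.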
46
What are the key words used for control in the flow of a loop?
pass - do nothing; continue - stop this iteration of the loop early, and go on to the next one; break - end the loop entirely
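
A minimal sketch of the three keywords inside for loops. The numbers and the list name are arbitrary.

```python
evens_seen = []
for i in range(10):
    if i == 7:
        break        # leave the loop entirely once i reaches 7
    if i % 2 == 1:
        continue     # skip odd numbers and go on to the next iteration
    evens_seen.append(i)

for i in range(3):
    pass             # placeholder: the loop runs but does nothing

print(evens_seen)
```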
47
How do we open a file in python?
Use the open() function, passing the file name and a mode:
  • r - reading only
  • w - writing; if the file exists it is overwritten, otherwise a new file is created
  • a - appending only; if the file doesn't exist, it is created
  • x - creates a new file; if the file exists it fails
  • + - added to another mode (eg r+) to open a file for updating
Syntax: f = open('zen_of_python.txt', 'r')
48
What does "f = open('zen_of_python.txt', 'r')" do?
'r' - opens a file for reading only.
49
What does "f = open('zen_of_python.txt', 'w')" do?
'w' - opens a file for writing. If the file exists, it overwrites it. Otherwise, it creates a new file.
50
What does "f = open('zen_of_python.txt', 'a')" do?
'a' - opens a file for appending only. If the file doesn't exist, it creates the file.
51
What does "f = open('zen_of_python.txt', '+')" do?
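'+' - opens a file for updating (reading and writing). It has to be combined with another mode, eg 'r+'; passing '+' on its own raises a ValueError.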
51
When are changes to a file saved?
When the file is closed. Use the .close() method if not using with/as.
52
What does "f = open('zen_of_python.txt', 'x')" do?
'x' - creates a new file. If the file exists, it fails.
53
What do you have to do once you are finished with a file?
Close it, to release memory used in opening the file. When writing to a file, the changes are not saved until the file is closed. Use the .close() method.
54
What is the basic way to read from a file?
f = open("file_name.txt", "r") then use print(f.read()) or print(f.readline())
55
What arguments does the open function take?
The name of the file you want to look at and the mode with which you want to interact with the file
56
What is the difference between .read(),.readline() and .readlines()?
.read() reads the entire contents of the file; .readline() reads only the next line, and can be called repeatedly until the entire file has been read; .readlines() is the most useful: it reads every line and stores them all in a single list.
57
What happens if you run print(f.read()) twice?
The first output will print the entire contents of the file. The second output will be blank. Once the file object has been read to the end, any subsequent calls return an empty string.
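
The behaviour can be checked with a small sketch. It writes a scratch file in a temporary directory first (the path and contents are hypothetical), so no real data is touched.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "file_name.txt")
with open(path, "w") as f:
    f.write("line one\nline two\n")

with open(path, "r") as f:
    first_read = f.read()    # the whole file
    second_read = f.read()   # empty string: we are already at the end

with open(path, "r") as f:
    first_line = f.readline()
    second_line = f.readline()
    third_line = f.readline()  # empty string once the file is exhausted

print(repr(first_read), repr(second_read))
```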
58
What happens if you try f.read() from a closed file?
Results in an error
59
How do you read each line of a file and store all the lines in a list?
Use .readlines():
f = open("file_name.txt", "r")
lines = f.readlines()
f.close()
print(lines)
The file is closed, but we still have the contents stored in a variable, so we can get the lines we want by indexing.
60
What is the safe way to open files?
We can make sure that files are only open for as long as we need them by using a with statement:
with open("file_name.txt", "r") as f:
    # put file operations in here
    print(f.read())
61
What happens if you try print(f.read()) after a with/as statement?
An error will be produced - the with/as syntax closes the file automatically at the end. This is important for file writing, less important for file reading.
62
How do you write to a file?
with open("file_name.txt", "w") as f:
    f.write("String")
Basic input and output only reads and writes strings. Trying to write a non-string value (eg a number) causes an error and results in an empty file.
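
A sketch of writing and then reading back with the with/as syntax, assuming a scratch file in a temporary directory (the path is hypothetical). It also shows that the file object is closed once the with block ends.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file_name.txt")

with open(path, "w") as f:
    f.write("first line\n")
    f.write(str(42) + "\n")   # numbers must be converted to strings first

with open(path, "r") as f:
    lines = f.readlines()

# After the with block the file is closed, so reading fails.
file_is_closed = f.closed
try:
    f.read()
    read_after_close = "ok"
except ValueError:
    read_after_close = "error"

print(lines, file_is_closed, read_after_close)
```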
63
What happens to the contents when you open a file in write mode?
It erases any previous contents
64
How do you format a string?
Use % placeholders inside the string:
  • %s - string
  • %d - integer
  • %f - float
  • %e - float, but using scientific notation
eg '%f' % length or "This is a %d word %s" % (length, datatype). You can include as many variables as you want by putting several % signs in the string and providing a tuple after the string. The first % (inside the string) indicates that we are writing a variable. The letter that follows indicates what type of variable. The second % sign (after the string) tells your code which variable to write at the first % sign.
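
A few illustrative uses of % formatting. The values are arbitrary, and the "%.2f" precision form is a standard extension of %f that the card does not mention.

```python
length = 4
datatype = "sentence"

one_value = "%d" % length
two_values = "This is a %d word %s" % (length, datatype)

as_float = "%f" % 3.14159            # six decimal places by default
two_decimals = "%.2f" % 3.14159      # limit the number of decimal places
scientific = "%e" % 12345.678

print(one_value, two_values, as_float, two_decimals, scientific)
```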
65
How can you add a tab into the string?
"\t"
66
How can you add a new line into the string?
"\n"
67
What is a JSON file?
A JSON file is structured like a Python dictionary. JavaScript Object Notation (JSON) is a standard text-based format for representing structured data based on JavaScript object syntax. It is commonly used for transmitting data in web applications.
68
How is it best to read CSV or JSON files?
Using specialised modules
69
How do we write JSON?
Using the json module. Use json.dump to write to a file:
import json
# define a dictionary, eg masses
with open("planets.json", "w") as f:
    json.dump(masses, f)
The arguments are the thing you want to dump and the file you want to dump it into.
70
How do we read JSON?
Using the json module. Use json.load to read from a file:
import json
with open("planets.json", "r") as f:
    new_dictionary = json.load(f)
print(new_dictionary)
Printing lets us check that we have successfully read the JSON dictionary.
71
What are the two calls to read and write json?
import json write - json.dump() read - json.load()
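
A round trip with json.dump and json.load might look like the sketch below. The dictionary contents are placeholders and the file lives in a temporary directory rather than next to the notebook.

```python
import json
import os
import tempfile

masses = {'Mercury': 0.33, 'Earth': 5.97}   # illustrative values only
path = os.path.join(tempfile.mkdtemp(), "planets.json")

with open(path, "w") as f:
    json.dump(masses, f)              # what to dump, then where to dump it

with open(path, "r") as f:
    new_dictionary = json.load(f)

with open(path, "r") as f:
    raw_text = f.read()               # the file itself uses double quotes

print(new_dictionary)
print(raw_text)
```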
72
What quotation marks are standard used by JSON?
Double quotes. You can define the dictionary with single quotes - Python doesn't care but JSON does, so they are converted when the file is written, eg so that all keys use "".
73
When might a dictionary be a string?
Dictionaries may be stored as a string if the dictionary is one entry within a larger database
74
How do we turn a dictionary into a string?
Simply add quotation marks around it. You can check the type with print(type(item)). If "" are used inside the string, create the overall string with ' ' - using the same type of quote both around and within the string would end the string early.
75
How can we turn a string into a dictionary?
json.loads() - pronounced "load-S"; the extra s is for string. eg dict = json.loads(string), then print(type(dict)) to check it was successful.
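
For example, with a made-up JSON string (note the single quotes around the whole string and the double quotes inside it):

```python
import json

text = '{"name": "Earth", "moons": 1}'
planet = json.loads(text)     # string -> dictionary

print(type(text), type(planet))
print(planet["name"], planet["moons"])
```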
76
What are the two cases we want to allow code to fail gracefully?
Errors - a fundamental issue where Python cannot understand your code (syntax error). Exceptions - code is written in valid Python syntax, but an operation cannot be completed successfully.
77
What is the syntax used to predict and catch exceptions under some circumstances?
The try/except syntax: try: # code except: # fallback code. The except block prevents the code from crashing and lets you implement an emergency fallback option.
78
Why do you need to be cautious about using a generic except statement?
It will catch all exceptions - even if the error is not what you think it is. You should try to catch specific errors.
79
What is a ValueError exception?
Raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as IndexError
80
How do you extend the exception-handling block with additional steps to execute after the try...except?
try:
    # code
except:
    # code
else:
    # code - do if no exception
finally:
    # code - always do this at the end
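
The order in which the blocks run can be traced with a small sketch. The function to_float and the steps list are hypothetical helpers used only to record which branches executed; the except clause catches the specific ValueError rather than everything.

```python
def to_float(text):
    steps = []
    try:
        value = float(text)
    except ValueError:
        # Runs only if float() could not interpret the string.
        steps.append("except")
        value = None
    else:
        # Runs only if no exception was raised.
        steps.append("else")
    finally:
        # Always runs at the end.
        steps.append("finally")
    return value, steps

print(to_float("33.3"))
print(to_float("abc"))
```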
81
What is the difference between a type error and value error?
Passing arguments of the wrong type (e.g. passing a list when an int is expected) should result in a TypeError , but passing arguments with the wrong value (e.g. a number outside expected boundaries) should result in a ValueError.
82
What is the benefit of using a function?
Functions are re-usable. We often want to do the same operation at different times or with different data.
83
What is a function?
A separate, named block of code for a specific purpose. The code inside a function is "walled off" from the main code.
84
What is required for a function?
Every function has a name, a list of (required and optional) inputs in parentheses, and returns something at the end. def my_function(): return
85
What is the syntax for defining a function?
def my_function(): return You should give your function a meaningful name
86
What are inputs of a function called?
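Arguments (called parameters in the function definition). Arguments passed by name, eg f(x=1), are keyword arguments.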
87
How do you call a function?
Call the function using its name, including the brackets (and any arguments required to be passed in) eg hello_world()
88
If a function requires an argument to be provided, but we don't provide it, what happens?
We get an error message
89
When we call a function and assign it to a variable, what happens? eg sum = my_sum(5, 6)
The variable will be assigned the value returned by the function
90
What are global and local variables?
A global variable is a variable defined in the main body of the code. Any code executed after the variable has been defined is able to "see" the variable. A local variable is a variable defined inside the function or other object. Its value is only accessible within the function or object (ie cannot be accessed outside of the function)
91
If we want to make an input of a function optional, what do we need to do?
Give it a default value:
def my_sum(a, b=1):
    return a + b
- If you provide a value for b, it will overwrite the default
- If you don't provide a value for b, it will use b = 1 as a default
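
Calling the my_sum example in a few different ways shows the default in action:

```python
def my_sum(a, b=1):
    return a + b

with_default = my_sum(5)         # b falls back to 1
with_both = my_sum(5, 6)         # b is overwritten by 6
by_keyword = my_sum(a=2, b=3)    # arguments can also be passed by name

print(with_default, with_both, by_keyword)
```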
92
How do you reverse a list?
list.reverse()
93
How do you declare a function with an arbitrary number of variables?
def arb_function (*nums): # code Within the code, you then loop over nums
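
One possible body for arb_function, assuming we simply want to add up whatever numbers are passed in. Inside the function, nums is a tuple of all the positional arguments.

```python
def arb_function(*nums):
    total = 0
    for n in nums:      # loop over however many values were passed
        total += n
    return total

print(arb_function())
print(arb_function(1, 2, 3))
print(arb_function(10, 20, 30, 40))
```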
94
Why might you want to declare an arbitrary number of variables for a function?
You may not know in advance how much data you will need to work with
95
What do all functions in Python have in common?
All functions in Python return something. If you do not specify a value (or leave out the return statement entirely), the function will return a None value by default. Otherwise it returns the value we specify
96
How many values can you return from a function? What options do you have for these outputs?
You can return more than one value from a function, and return different types. For the output: - Provide the same number of variables as the number of values returned. Each returned value then goes to a separate variable. - Provide a single variable; this will then contain a tuple of the values returned by the function.
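
Both options for receiving the outputs, sketched with a hypothetical helper called min_and_max:

```python
def min_and_max(values):
    # Two values separated by a comma are returned together.
    return min(values), max(values)

# Option 1: one variable per returned value.
smallest, largest = min_and_max([3, 1, 4])

# Option 2: a single variable, which holds the values as a tuple.
both = min_and_max([3, 1, 4])

print(smallest, largest, both, type(both))
```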
97
What does the return statement do?
Returns variables, ends the function call and returns to the main code. Therefore any code in the function after the return will not be executed. This can be convenient if you want to put conditions for what to return.
98
What is a lambda function?
A quick way to make short functions that can be defined in one line. They can take any number of arguments, but can only have one expression. name = lambda vars : code eg doubler = lambda x: x*2
99
How do you define a lambda function?
name = lambda vars : code eg doubler = lambda x: x*2
100
When would it be most appropriate to use a lambda function?
If we need to create a function for temporary use eg within another function.
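
As an illustration of temporary use inside another function call, a lambda can be passed as the key argument of sorted(). The word list is made up; key is a standard parameter of sorted() that the cards do not cover.

```python
doubler = lambda x: x * 2

words = ["banana", "fig", "cherry"]
# Sort by word length; the lambda exists only for this one call.
by_length = sorted(words, key=lambda w: len(w))

print(doubler(4), by_length)
```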
101
How do you add an element to a list?
list.append(i)
102
How do you sort a list?
sorted(list) returns a new sorted list; list.sort() sorts the list in place.
103
What is a programming paradigm?
A paradigm is like a philosophy informing how we write code. Usually there are many different ways to solve a problem with code. Different paradigms help to shape which approach we choose to use. Procedural programming. Object-oriented programming.
104
What are the most common paradigms in python?
Procedural programming - the code is organised as a sequence of instructions (do this, then this). Each block performs a prescribed task to solve the problem. OOP - data are stored as "objects" belonging to pre-defined "classes". These objects have a set of "attributes" stored internally, which can be updated using built-in "methods".
105
What is a class?
A class is like a template designed in advance to handle a particular data structure, with a set of properties called attributes. It also provides implementations of behaviour (member functions or methods). The syntax looks like class_name.function_name()
106
How do you reverse a list?
list.reverse()
107
How do you investigate all of the attributes and functions of a class or object?
dir(x) or print(dir(x))
108
What are alternative names for attributes and methods?
Attributes - properties Methods - functions
109
What do attributes with a double underscore represent?
Special attributes and methods used internally by Python; they are not normally meant to be changed directly.
110
How do you create a new list?
my_list = [x,y,z]
111
What is the relationship of an object and a class?
Any object is an instance of a class, created to represent a particular item of data. An instance ie one specific example
112
What do methods of an object do?
Update the internal state of the object eg reversing the list
113
How can you check the class of an object?
object.__class__
114
How do you create a class?
eg class Animal(): # Can list attributes # Can define functions using def function():
115
How do you create an object? (ie particular instance of the class)
object = Class() Passing in attributes as appropriate. In this case, the attributes would be set to their defaults.
116
What is creating an object (ie a particular instance of the class) called?
Instantiation
117
How do you check the value of an attribute for an object?
object.attribute
118
How do you update the attribute of an object?
object.attribute = value
119
How do you create a new attribute of an object?
object.attribute = value. We can add attributes to individual class instances; doing so does not change the class itself.
120
Why use classes?
Objects store data in a way where it is easy to update and display the internal state of that data, using built-in methods. OOP allows you to put your methods next to the data. Once we have defined useful classes and instantiated objects, an OO code will mainly interact with the data object through its built-in methods.
121
What function do we use when we create a class and we know that we will create many objects from that same class, with shared attributes and want to assign values when creating each object?
The __init__ function
122
What does the __init__ function do?
The python __init__ method is declared within a class and is used to initialise the attributes of an object as soon as the object is formed.
123
How do you use the __init__ function?
class Animal():
    def __init__(self, attribute1, attribute2):
        self.att1 = attribute1
        self.att2 = attribute2
Give the __init__ function a list of arguments; the first argument is always self. This is a special variable which represents the object itself once we have created it (a self-referential thing). __init__ initialises the attributes of the object.
124
What is the self parameter?
The self parameter is a reference to the current instance of the class, and is used to access variables that belong to the class.
125
What does self.x mean?
The attribute "x" belonging to the object "self"
126
When using the __init__ function, what can the other functions defined in the class take as input?
Taking "self" as an input, they can then access any attribute with self.x
127
What is a benefit of using __init__ when we create objects?
The object can be created and attributes defined in one line.
128
What is hierarchical inheritance?
Hierarchical inheritance is a type of inheritance in which multiple classes inherit from a single superclass. "Parent" class - Animal. "Child" class - Cat, Dog etc. Object - your pet. A child class inherits the attributes of its parent class, but we can also add new methods and attributes.
129
How do you create a child class of a parent class?
Put the name of the parent class in the brackets when creating the child. eg class Cat(Animal): # put attributes from the parent class that should be fixed for the child class first # then use super().__init__(attribute1, attribute2) within __init__ for the attributes we want to specify new values for
130
How do we define the __init__ function for a child class?
def __init__(self, attribute1, attribute2): super().__init__(attribute1, attribute2)
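
Putting the last few cards together, a parent and child class might be sketched like this. The attribute names (name, sound, legs) and the speak method are invented for the example.

```python
class Animal():
    def __init__(self, name, sound):
        # self refers to the object being created.
        self.name = name
        self.sound = sound

    def speak(self):
        return "%s says %s" % (self.name, self.sound)


class Cat(Animal):
    def __init__(self, name):
        # Fix the sound for all cats, then let the parent set the attributes.
        super().__init__(name, "meow")
        self.legs = 4     # a new attribute that only the child class has


my_pet = Cat("Tom")       # instantiation
print(my_pet.speak(), my_pet.legs)
print(isinstance(my_pet, Animal))
```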
131
If you define a useful function or class and want to use it in many different codes, instead of copying and pasting the code what can you do?
Make the code into a Python module. This is a python file (extension .py) containing one or more classes or functions. You can then import the class or function from the module easily.
132
What is a python module?
A python file (extension .py) containing one or more classes or functions. You can then import the class or function from the module easily.
133
How do you import a module?
Ensure the .py file is in the same directory as the current notebook. eg for the class Dog from the animals.py module from animals import Dog From this import, you can create Dog objects (can also define and import functions, they don't need to be part of a class, making it very easy to reuse them)
134
What is a python package?
A collection of modules. You can install (download) these packages and then have access to incredibly useful functions.
135
What is NumPy?
Numerical Python - a widely used third-party package (it is not built in, so it has to be installed before it can be imported). A module is a pre-defined collection of functions we can load into our programs. NumPy arrays are multidimensional array objects. We can use NumPy arrays for efficient computations on large data sets.
136
How do we import NumPy?
import numpy as np, then call functions as eg np.sin(0). Alternatives (don't use): import numpy - importing the entire numpy module under its full name; from numpy import sin - importing only a specific function.
137
When might you only import a specific function required, rather than an entire module?
If we only need one or two names and want to call them without the module prefix. (Python still loads the whole module behind the scenes, so little memory is saved.) We would need to know the name of the function/class that we want to import in advance. The specific function is now a global name, so we don't need to specify the module. This could cause issues when there are functions with the same name.
138
How do you call trigonometry functions?
np.sin(), np.cos() and np.tan(). The value passed to these functions should be in radians.
139
How do we investigate the contents of a module?
Import the module first, then use dir(), eg dir(np)
140
What is an array?
A "grid" of values, all of the same dtype ie all floats, or all strings etc.
141
What is the difference between a 1D array and a list?
They look similar at first. - Arrays use less memory, so an array is much more efficient than a list, particularly for a large collection of elements. - Lists are more flexible (can have mixed types)
142
How do you create a list of length x?
my_list = [] for i in range(x): my_list.append(i)
143
How do you print the first x items of a list?
print(my_list[:x])
144
How do your create a Numpy array from a list?
my_array = np.array(my_list)
145
How do you print the first x items of an array?
print(my_array[:x])
146
How do you print the first array item?
print(my_array[0])
147
How do you print the last item of an array?
print(my_array[-1])
148
How do you print the time taken to execute a cell?
%%time
149
Why is there a difference in the time taken for an operation on a list vs an operation on an array?
Lists - operations can only be performed on individual items, so calculations have to be done one at a time. Arrays - the operation is performed on all elements with a single function call - this is much quicker for large data sets.
150
Why are arrays more convenient for many mathematical operations?
You can write one line of code rather than a loop. Eg looping over every element of a 3D array requires three nested for loops. With NumPy, we don't have to worry about the array shape (it is automatically preserved).
151
What are different ways to create an array?
  • From a list: my_array = np.array(my_list)
  • An array of zeros (n = how many elements you want): np.zeros(n)
  • An array of ones: np.ones(n)
  • An array of numbers from a to b, with spacing c: eg np.arange(1, 10, 1)
  • An evenly spaced array from a to b, with c points: eg np.linspace(1, 10, 19)
  • An array of random numbers from 0 to 1 of length n: np.random.random(n)
152
What does np.arange() do?
Creates an array of numbers from a to b with spacing c. Pass in where to start, where to stop and the spacing that you want. It stops before the stop number, ie if you want 1 - 10 use np.arange(1, 11, 1)
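
A quick comparison of np.arange and np.linspace, assuming NumPy is installed. The variable names are illustrative.

```python
import numpy as np

stepped = np.arange(1, 11, 1)      # start, stop (excluded), spacing
spaced = np.linspace(1, 10, 19)    # start, stop (included), number of points

print(stepped)
print(spaced)
print(len(spaced), spaced[1] - spaced[0])
```

With 19 points between 1 and 10 inclusive, the spacing works out as 9 / 18 = 0.5.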
153
What does np.linspace() do?
Creates an evenly spaced array from a to b with c points. State how many points you want. This enables better precision and can be used to control the number of samples that you want.
154
What function creates an array of numbers from a to b with spacing c?
np.arange(a, b, c)
155
What does np.random.random(n) do?
Create an array of random numbers from 0 to 1 of length n
156
How do you create an array of random numbers of length n?
np.random.random(n) This creates an array of random numbers ranging from 0 to 1. Can apply transformations to get it into a range you want.
157
What is each dimension of an array called in NumPy?
An axis
158
How do you initialise a higher-dimension NumPy array?
You need to specify the data along each axis. eg my_2d_array = np.array( [ [1, 2, 3], [4, 5, 6] ] ) This is a nested list. The items are the rows. Items at the same position (the same index) within each sub-list form the columns eg my_3d_array = np.array( [ [ [1,2], [3,4], [5,6]], [ [7,8], [9,10], [11,12] ] ] )
159
How do you select an element from a 2D array?
We need to supply N indexes, where N is the number of axes (dimensions): print(my_2d_array[0,0]) or print(my_2d_array[0][0]). The first index is the row, the second is the column.
160
How do you select a whole row or column from a 2D array?
  • Get the first row, all columns: my_2d_array[0,:]
  • Get all rows, the first column: my_2d_array[:,0]
  • Get all rows, last two columns: my_2d_array[:,1:3]
The slice of the array is itself returned as an array.
161
How do you determine how many dimensions an array has?
my_array.shape gives the size of each dimension, so it tells you both how many dimensions the array has and how large each one is. my_array.ndim gives just the number of dimensions.
162
What happens if you apply a simple expression to an array? eg array * 2
We can do this; the operation is applied to every element in the array. Here we are combining a scalar (constant value) with the array. We can add, subtract, multiply and divide.
163
What happens if we multiply two arrays together?
We can add, subtract, multiply or divide one array with another - but the behaviour is different to scalar mathematical expressions. Each element of the array will operate on the element in the same position in the other array, eg my_array * my_array - the array is squared. If arrays are different shapes/sizes you can get errors or unexpected behaviours.
164
How can you combine arrays?
Combine an array with shape (m,n) with: - an array with shape (1, n) - an array with shape (m, 1), ie a 1D array with the same number of rows or columns as the data. When you add/multiply, it will repeat the 1D array as many times as needed, in order to match the rows/columns in your data. The new array is "broadcast" to the shape of your data.
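
Broadcasting can be sketched with a small (2, 3) array; the numbers are arbitrary and NumPy is assumed to be installed.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])        # shape (2, 3)
row = np.array([[10, 20, 30]])      # shape (1, 3): repeated for every row
col = np.array([[100], [200]])      # shape (2, 1): repeated for every column

plus_row = data + row
plus_col = data + col

print(plus_row)
print(plus_col)
```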
165
What is masking?
Masking is the term used for selecting entries in arrays, e.g. depending on its content. We can apply that mask to our data to retrieve a new array that only contains masked values. We can specify conditions and return sub-sets of the array.
166
What is a mask for getting even numbers?
even_numbers = (my_array % 2 == 0), then my_array[even_numbers]. This tests each element for the condition.
167
What does (my_array %2 == 0) return?
The conditional statement returns an array of Boolean True/False, with the same shape as the array. This can be used as a mask to pick out only the array elements where the condition is true. We mask arrays using square bracket notation, similar to slicing.
168
How do you apply multiple masks at the same time?
Using &, with each condition wrapped in parentheses, eg my_array[(my_array % 2 == 0) & (my_array > 3)]
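
A minimal masking sketch, assuming NumPy is installed. The array and the second condition (greater than 3) are made up for illustration.

```python
import numpy as np

my_array = np.arange(10)

even_numbers = (my_array % 2 == 0)   # Boolean array, same shape as my_array
evens = my_array[even_numbers]       # keep only elements where the mask is True

# Combine two conditions with &; each one needs its own parentheses.
big_evens = my_array[(my_array % 2 == 0) & (my_array > 3)]

print(even_numbers)
print(evens, big_evens)
```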
169
Why do we often work with 2D arrays in data science?
They are good for holding tabular data.
170
How do you get pi in the Jupyter notebook?
import numpy as np, then use np.pi
171
How do you generate data to plot for a sin curve?
x = np.linspace(0, 2*np.pi, 100) y = np.sin(x) Combine the two arrays into a new array data = np.column_stack([x,y])
172
How do you combine two arrays into a new array?
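Using the column_stack() or row_stack() functions (row_stack is an alias of vstack and is deprecated in NumPy 2.0, so vstack is the safer choice). Takes one argument - a list of arrays to stack. The arrays to stack must be the same length as each other.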
173
How can we change the shape of an array?
We transpose the array using .T eg transposed_data = data.T For more complicated manipulation we can use the shape attribute and the reshape method - data.shape to check the current shape - rd = data.reshape(2,100) - instead of 100 rows and 2 columns, reshape to 2 rows and 100 columns
174
How do you transpose data?
data.T The rows are now the columns
175
How do you reshape data?
reshaped_data = data.reshape(2,100) The size of the new array (n rows * n columns) must match the original ie the product of the axis lengths is constant. eg instead of 2 columns and 100 rows we can have 2 rows and 100 columns OR three_d_data = data.reshape(2,50,2) rows, columns, number of elements in each
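A sketch of the size rule, assuming the same (100, 2) sine-curve array built earlier in the deck. It also checks one point that is easy to miss: reshape keeps the elements in their original order, so reshape(2,100) is not the same as transposing.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
data = np.column_stack([x, np.sin(x)])   # shape (100, 2), 200 elements

reshaped = data.reshape(2, 100)          # 2 * 100 == 200 elements
three_d = data.reshape(2, 50, 2)         # 2 * 50 * 2 == 200 elements

# Compare the reshaped array with the transpose
same_as_transpose = np.array_equal(reshaped, data.T)

try:
    data.reshape(3, 100)                 # 300 != 200, so NumPy raises an error
    failed = False
except ValueError:
    failed = True

print(reshaped.shape, three_d.shape, same_as_transpose, failed)
```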
176
How do you calculate the sum of elements of an array?
- OOP data.sum() - Procedural np.sum(data)
177
How do you find the minimum and maximum of an array?
Method approach data.min() data.max()
178
How do you compute statistics on the slices of an array?
Possible because the slice is just another array. eg mean of the first column print(np.mean(data[:,0]))
179
How do you write an array to a file to save for later use?
Using the savetxt() function Required arguments are the name of the file to save to (created if it does not exist, otherwise it will be overwritten by default) and the array to save. We can also specify the format of the data and the character to separate the data. np.savetxt('name.csv', data, fmt='%.4f', delimiter=',')
180
How do you load data from a file?
loadtxt() or genfromtxt() functions Required argument - file name. You can also specify the delimiter and dtype to ensure desired behaviour eg arr = np.genfromtxt("file.csv", delimiter=',', dtype= 'float')
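A minimal save-and-reload round trip, assuming a small made-up array. The temporary folder is only there so the sketch leaves no files behind.

```python
import os
import tempfile
import numpy as np

data = np.column_stack([np.arange(5), np.arange(5) ** 2]).astype(float)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "name.csv")   # hypothetical file name
    np.savetxt(path, data, fmt="%.4f", delimiter=",")
    loaded = np.genfromtxt(path, delimiter=",", dtype="float")

print(loaded.shape)
```

Note that fmt='%.4f' keeps only four decimal places, so values with more digits would come back rounded.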
181
What is the standard plotting library in python?
Matplotlib
182
What is Matplotlib?
A comprehensive library for creating static, animated and interactive visualisations in Python. It makes easy things easy and hard things possible.
183
How do you import the matplotlib module?
import matplotlib.pyplot as plt
184
What is pyplot?
A set of functions that can be used to create a figure with procedural programming. For better control over plotting, it is recommended to use an OO approach with Matplotlib objects.
185
What are the fundamental objects used in Matplotlib?
Figures - the entire area of the figure Axes - the area for plotting data
186
How do you create the axis and figure for a plot? When should you do this?
fig, ax = plt.subplots() Do this at the start of the plot
187
How do you obtain the size of the figure in pixels?
print(fig)
188
What is the default resolution of the figure in pixels?
The default resolution is 100 pixels per inch.
189
How can you specify the size of the figure?
Using the figsize argument fig, ax = plt.subplots(figsize=(7,5)) Size in inches
190
What condition needs to be met to plot some simple lines?
The points along the lines can be given as a list of x and y coordinates which must be the same length.
191
What is the minimal code for plotting a line graph?
fig, ax = plt.subplots() x = [...] y = [...] ax.plot(x,y) ax.set_xlabel("X") ax.set_ylabel("Y") plt.show() This is the OOP approach
192
How do you set labels on your plot?
ax.set_xlabel("X") ax.set_ylabel("Y") Procedural: plt.xlabel("X") plt.ylabel("Y")
193
How do you display the plot in Jupyter?
plt.show() is used in Python to display the plot, not always needed in Jupyter.
194
What differences are there between the OOP and procedural approaches in plotting?
Procedural approach - we call functions from pyplot, eg plt.xlabel. OOP approach - we call methods on the figure and axes objects; these often start with set_, eg ax.set_xlabel. In the procedural approach, we don't tell pyplot which axes to plot the data on; it uses the current axes (the most recent one).
195
What kind of objects is matplotlib built to handle?
Numpy arrays
196
How do you plot two columns?
ax.plot(data[:,0],data[:,1])
197
What ways can you customise a plot?
- Changing the units - Changing the upper and lower limits on the axes - Changing the axes tick marks - Adding another curve to a figure using the legends - Changing line styles and colours - Add arbitrary text labels - Add a title
198
How can you customise the units of a plot?
Apply conversion in ax.plot. Can do numpy operations directly in the .plot as long as it produces another array. eg ax.plot(data[:,0]/np.pi*180, data[:,1])
199
What is the conversion between degrees and radians?
degrees = radians * 180/pi
200
How can we change the upper and lower limits on the axis?
ax.set_xlim() eg ax.set_xlim(0,360)
201
How can we change the axes tick marks?
ax.set_xticks([...]) ax.set_yticks([...]) Pass in a list of the tick marks you want.
202
How do you add a second curve to a plot?
Use two ax.plot functions in the one plot.
203
How do you distinguish between two curves on the same plot?
Give each curve a label in ax.plot, eg ax.plot(x, y, label='sin(x)'), then add a legend with ax.legend()
204
How do you change the location of a legend?
Using the loc keyword - 'lower', 'center', 'upper' for vertical placement - 'left', 'center', 'right' for horizontal placement eg ax.legend(loc = 'upper center')
205
How do you add a box to the legend?
frameon=True eg ax.legend(loc = 'upper center', frameon=True)
206
How do you alter the thickness and style of a line?
Add linestyle='-' and linewidth=2 to ax.plot Available line styles include '-' (solid), '--' (dashed), ':' (dotted), '-.' (dash-dot) eg ax.plot(data[:,0]/np.pi*180, cosine, label='cos(x)', color='deeppink', linestyle='--', linewidth=2)
207
How do you change the colour of a plotted line?
Add color="" to ax.plot
208
How do you add an arbitrary text label to a plot?
ax.text(120, 1, "Maximum", fontsize=20) Providing the coordinates where you want to write the text and the string you want to put in.
209
How can you customise font size?
fontsize =
210
How do you add a title to a plot?
ax.set_title("Title")
211
How do we display multiple axes on the same figure?
This means showing different information on different panels of a single figure. Using the plt.subplots() function we can specify how many axes in the vertical direction with nrows and the horizontal direction with ncols. fig, axes = plt.subplots(figsize=(8,8), nrows=2, ncols=1) ax1 = axes[0] ax2 = axes[1] now we access using ax1 and ax2 etc.
212
What is the keyword to generate a line graph?
ax.plot()
213
How do you create a scatter plot?
ax.scatter(x, y, marker="o")
214
How do you plot a scatter plot with error bars?
plt.errorbar(x, y, xerr=xerr, yerr=yerr, fmt="o", color="r") Pass the errors by keyword: if given positionally, yerr comes before xerr.
215
In what ways can we customise a scatter plot?
Shape and colour of the points for errorbar Outline: - '.' : point - '+', 'x' : crosses Filled: - 'o' : circle - 's' : square - '^', '<', '>', 'v' : triangles in different directions - 'd', 'D'; 'p', 'P'; 'h', 'H' : different types of diamond, pentagon or hexagon - '*' : star Line plots: - '-', '--', ':' etc eg fmt='s', color='gold', markersize=6, markeredgewidth=2, markeredgecolor='k', ecolor='k'
216
How do you control the shape of the errorbar plot?
fmt (format)
217
How do you create a histogram?
ax.hist(x) Specifying the bins: ax.hist(x, bins=20)
218
What is the default number of bins if not specified?
10 bins (of equal width)
219
What should you consider when choosing the size of your bin?
With finer bins, we can see more detail in the distribution. But if we use too many bins we can overdo it and end up with lots of misleading gaps.
220
How can you further customise a histogram?
- Changing colour - Changing from a filled histogram to an outline - Normalise the histogram to plot the probability density rather than total frequency ax.hist(heights, bins=20, color='teal', histtype='step', density=True)
221
How do we get the values of the bin edges and the numbers in each bin?
The hist function returns these already. counts = ax.hist(x) numbers in each bin - counts[0] boundary edges - counts[1]
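A sketch of unpacking what hist returns, assuming made-up height data. The Agg backend line is an assumption so the example can run outside Jupyter; it is not needed in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(170, 10, size=500)   # made-up sample data

fig, ax = plt.subplots()
# hist returns the numbers in each bin, the bin edges, and the drawn patches
counts, edges, patches = ax.hist(heights, bins=20)

print(len(counts), len(edges))
plt.close(fig)
```

There is always one more edge than there are bins, because each bin needs a left and a right boundary.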
222
What kind of plots are useful for categorical data?
Bar charts and pie charts
223
How do you create a bar chart?
ax.bar(categories, counts, color=bar_colors) These are all lists to be passed in
224
How do you create a pie chart?
ax.pie(counts, labels=categories, colors=bar_colors, autopct='%d') Include optional argument, auto percent to print the percentages - d means it prints as an integer
225
How do you display image data in Matplotlib?
Matplotlib has an easy way to make plots using images (eg a picture or photograph). Data must be provided as a 2D NumPy array. Matplotlib will display the array as a grid of pixels, with the intensity of each pixel determined by the value of the array at that position. image = np.genfromtxt("pixels.txt") fig, ax = plt.subplots(figsize=(8,8)) ax.imshow(image, origin='lower', cmap='Greys_r', vmin=0, vmax=300) - origin determines which way up the image is displayed - cmap sets the colour map used to draw it - vmin and vmax are saturation points (0 or below = fully black, 300 or above = fully white; important for contrast)
226
What do you do if you don't want to show any tick marks on a figure, eg for an image?
ax.set_xticks([]) ax.set_yticks([])
227
How do you "zoom in" on an important part of an image? How do you add a circle to highlight this?
Using array slicing to zoom in ax.imshow(image[80:220,80:220], origin='lower', cmap='Greys_r', vmin=0, vmax=1000) Highlighting key features ax.scatter(70,70,marker='o',s=10000,c='None',edgecolors='r',label='Supernova')
228
How do you save a plot?
Reduce whitespace around your figure: plt.tight_layout(pad=0.5) Save your plot: plt.savefig('image.png')
229
What is Pandas?
Pandas builds on NumPy and introduces a new object called a data frame (or a series if one-dimensional)
230
What is the difference between a dataframe and pandas series?
Pandas series is one-dimensional (more similar to a list rather than a tabular structure)
231
What advantages do data frames provide for data science?
- A data frame looks like a table or spreadsheet, with convenient column and row labels - A data frame includes methods for sorting, filtering and performing complex operations on data - Columns can be of different data types (unlike an array) - Provides some of the functionality of an array
232
How do you load a dataframe from a file?
import pandas as pd df = pd.read_csv("data.csv") When we load the data into Pandas, the first row is assumed to be the column headings. If we wanted to we could override this behaviour by providing a list of column names to an optional keyword, names=.
233
How do you import pandas?
import pandas as pd
234
How do you determine the number of rows in a data frame?
Length len(df)
235
How do you examine the first few rows of a dataframe?
df.head(x) - where x is the number of rows to display
236
If you want to display the dataframe, what could you do?
print(df), but the output isn't nicely formatted. Alternatively, write df on its own as the last line of the cell to display a formatted table.
237
How are rows indexed?
The rows are given numerical indices by default. Sometimes one of the columns in the data is already a convenient index. We can assign this as the index df = pd.read_csv('titanic.csv', index_col='PassengerId')
238
How do you assign an index within the read_csv() function?
index_col="Column Name"
239
What should the first step of any data analysis be?
Clean up the data set - removing unwanted data, missing values or duplicates
240
How do you drop a column?
df = df.drop(columns=["X"]) You can drop multiple columns at once by providing a list of column labels.
241
How might missing values be represented in the dataset?
NaN Not a number value - usually this represents a missing value
242
Check for missing values
df.isna() eg df.isna().head(6)
243
How do we remove rows with NaN values?
df.dropna() eg df = df.dropna(subset=['Age']) If we don't specify a subset of columns to use, it will remove all rows that have a NaN in any column.
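A small sketch of checking for and removing missing values, using a made-up stand-in for the Titanic table (the column values are invented for illustration).

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Titanic table
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, np.nan, 35.0, 41.0],
    "Cabin": [np.nan, "C85", np.nan, "E46"],
})

missing_per_column = df.isna().sum()     # count the NaNs in each column
age_known = df.dropna(subset=["Age"])    # drops only the row with no Age
complete = df.dropna()                   # drops rows with a NaN in any column

print(len(df), len(age_known), len(complete))
```

Without a subset, far more rows can be lost than intended, so it is worth comparing the lengths before and after.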
244
How can we remove any rows that appear more than once in the data set?
df.drop_duplicates() df.drop_duplicates(subset='Ticket')
245
How do we slice data from a pandas data frame?
Using loc[] and iloc[] NB: they use square brackets like in array indexing, not round brackets. loc gets rows (and/or columns) with particular labels. iloc gets rows (and/or columns) at integer locations.
246
How do we get a single column from a data frame?
df['Age'] Slicing syntax
247
How do we get the first row of a data frame?
df.iloc[0] As this is only 1D it displays a series.
248
How do we get rows 100-110 of a dataframe?
df.iloc[100:110] As with list slicing, the end is excluded, so this returns the rows at positions 100 to 109.
249
How do we get rows 100-110 and the first four columns of a dataframe?
df.iloc[100:110, :4] The output shows the row index followed by the first 4 columns; the index is displayed alongside the data but is not counted as a column.
250
Why might loc() be more useful than iloc()?
We may not know the specific index to search for but we do know the column title.
251
How do we return only the "Name" column?
df.loc[:,'Name']
252
How do we retrieve the name, sex and age columns of the first 10 passengers?
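df.loc[:10, 'Name':'Age'] loc slices by label and includes the end label, so this returns rows up to index label 10 and every column from Name to Age. df.loc[:10, ['Name','Age', 'Fare']] NB: you can retrieve non-consecutive rows/columns by providing a list. Therefore df.loc is very flexible.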
253
How do you use loc or iloc to return only the rows or columns where a certain condition is met?
Masking - provide an array of T/F to loc. eg df.loc[(df['Pclass']==1)]
254
How do you check whether values in a data frame column are in a list of possible values?
Using the .isin() method eg df.loc[df['Pclass'].isin([1,2])]
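A sketch of masking with loc, assuming a tiny made-up table with the same column names as the Titanic data.

```python
import pandas as pd

df = pd.DataFrame(
    {"Pclass": [1, 3, 2, 1], "Age": [38, 22, 27, 54]},
    index=pd.Index([1, 2, 3, 4], name="PassengerId"),
)

first_class = df.loc[df["Pclass"] == 1]
upper_classes = df.loc[df["Pclass"].isin([1, 2])]
# Combine masks with & and choose columns in the same call
older_first = df.loc[(df["Pclass"] == 1) & (df["Age"] > 40), ["Age"]]

print(older_first)
```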
255
How do you compute summary statistics for numerical columns in a data frame?
df['Age'].mean() mean / min / max - these calculations automatically ignore NaN values
256
How do you sort values for a column in a data frame?
df['Age'].sort_values()
257
How do you sort entries in a data frame by a particular column?
df.sort_values(by='Age') df.sort_values(by='Age', ascending=False)
258
How can we get the length of a dataframe / column?
len() python function - number of rows df.size pandas property - number of cells ie rows by columns
259
What is the result of adding two columns together?
A series (a 1D data frame) - the original indices are still present. To get only the values, we can access .values - this is a property not a method, so no ()
260
How do we get the values of a 1D dataframe / pandas series?
df.values No brackets, it is a property not a method.
261
How do we get the values of a column?
df["Name"].values
262
How do you add columns together?
df.add() method eg relatives = df['SibSp'].add(df['Parch'], fill_value=0) store it in a new column: df['Relatives'] = df['SibSp'].add(df['Parch'], fill_value=0)
263
What is the advantage of using the df.add() method to add columns, rather than using the + operator?
It will not try to add a NaN if the column has missing values, you can specify what value to use in place of NaNs by including a fill_value This is safer for handling NaN values rather than simple addition
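A side-by-side sketch of + and add() on columns with missing values, using made-up numbers under the Titanic column names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"SibSp": [1, np.nan, 0], "Parch": [0, 2, np.nan]})

with_plus = df["SibSp"] + df["Parch"]                    # NaN wherever either value is missing
with_add = df["SibSp"].add(df["Parch"], fill_value=0)    # a missing value counts as 0

print(with_plus.tolist())
print(with_add.tolist())
```

If both values in a row are missing, add() still returns NaN for that row, since there is nothing to fill against.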
264
What operations can you use on columns so that NaN values can be handled appropriately?
df.add() df.subtract() df.multiply() df.divide()
265
How can you plot data from a data frame?
Matplotlib can naturally understand data frames just like Numpy arrays. You can pass the columns directly to plotting commands. eg ax.hist(df['Age'], bins=30)
266
How do we apply functions to an entire column?
The df.apply() method. Define the function first if required: df["Name"].apply(function_name) This returns a series of the function's results. Quicker: use a lambda function, defined as a temporary function inside apply(), eg df['Name'].apply(lambda x: x.split(',')[0])
267
How do you split a string?
my_string.split(',')
268
How do you get the first/last part of a split string?
my_string.split(',')[0] my_string.split(',')[-1]
269
What is a benefit of using a lambda function in apply()
It is defined as a temporary function and avoids using memory for a function that is used only once
270
How do you group data in a data frame?
df.groupby() classes = df.groupby('Pclass') This returns a grouped (GroupBy) object. Its .groups attribute is a dictionary, where the keys are the groups and each contains a list of row indexes. print(classes.groups) to see the index of the rows belonging to each key.
271
How do you see the keys (ie the groups) from grouped data?
new_groups = df.groupby('Embarked') new_groups.groups.keys()
272
Why is grouping useful?
We can quickly calculate statistics separately on each of the different groups. This allows us to investigate aggregated data rather than working on the whole data frame directly. We can do this with any column of our grouped data using square bracket notation.
273
How do you calculate summary statistics for a grouped data frame?
classes = df.groupby('Pclass') classes['Fare'].mean() This returns a value for each of the group keys
274
How do you determine how many entries fall into each group?
Look at the size of each group - classes.size() NB: for a dataframe object, size is a property (no parentheses) but for a grouped object it is a method and requires parentheses.
275
How do you determine and rank by how many entries fall into each group?
classes.size().sort_values(ascending=False)
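The grouping steps from the last few cards can be sketched together as follows, assuming a small made-up table with Pclass and Fare columns.

```python
import pandas as pd

df = pd.DataFrame({
    "Pclass": [1, 3, 3, 2, 1, 3],
    "Fare": [80.0, 8.0, 7.0, 13.0, 60.0, 9.0],
})

classes = df.groupby("Pclass")

keys = sorted(classes.groups.keys())        # the group labels
mean_fare = classes["Fare"].mean()          # one value per group
ranked = classes.size().sort_values(ascending=False)   # biggest group first
third = classes.get_group(3)                # data frame for a single group

print(mean_fare)
print(ranked)
```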
276
How do you create a dataframe for a specific group?
Use get_group() first = classes.get_group("Group Name") The argument "Group Name" should match one of the keys in classes.groups.keys().
277
How do you make an array of 12 random integers from 40 to 100?
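data = np.random.randint(low=40, high=100, size=12) This will make a 1D array of 12 elements, with shape (12,). NB: high is exclusive, so this gives integers from 40 to 99; use high=101 to include 100.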
278
How do you convert a Numpy array into a dataframe?
Use the pd.DataFrame function df = pd.DataFrame(data) where data is a NumPy array. The shape and content are preserved, but the rows and columns now have explicit names. By default these are the NumPy row and column indices (0, 1, 2, ...).
279
How do you retrieve a column from a data frame?
Using familiar square bracket notation with the name of the column df["Name"] or df[1] or more explicitly (better for more complex selections) df.loc[:, "Name"]
280
When creating a data frame from an array, rather than using default indices, how can we create memorable column headings or row indices?
Use the index= and columns= keyword arguments eg df = pd.DataFrame(data, index=['Matt','Jonathan','Fiona','Deepak'], columns=['DSA8001','DSA8002','DSA8003'])
281
How do we create a dataframe directly from a dictionary?
data_dict = { "Module 1": {"Matt":80, "John":60}, "Module 2": {"Matt":70, "John":63}, "Module 3": {"Matt":82, "John":76}, } df = pd.DataFrame(data_dict) Outer keys define the column headings ie the modules will be the columns. Each nested dictionary defines one column, and its keys (the student names) become the row labels.
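A quick check of which keys end up as columns and which as rows, using the same dictionary.

```python
import pandas as pd

data_dict = {
    "Module 1": {"Matt": 80, "John": 60},
    "Module 2": {"Matt": 70, "John": 63},
    "Module 3": {"Matt": 82, "John": 76},
}
df = pd.DataFrame(data_dict)

print(df.shape)            # (number of rows, number of columns)
print(list(df.columns))    # taken from the outer keys
print(list(df.index))      # taken from the inner keys
```

To have the outer keys as rows instead, the result can be transposed with df.T.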
282
Why do we not need to specify labels when creating a data frame directly from a dictionary?
Pandas will use the dictionary keys
283
How can you add a column to an existing dataframe?
Insert method df.insert(loc=1, column="Name", value=data) The length of the array data needs to match the number of rows in the df.
284
How do you add a new row to a data frame?
Use the concat function, which concatenates a new data frame to the end of the current one. First build the new row as its own data frame: new_row = pd.DataFrame(new_student, index=['New Student'], columns=['DSA8001','DSA8002','DSA8003','DSA8021']) then join it on: df = pd.concat([df, new_row]) May need to reshape the data to be added, using reshape. Concatenation only works well if the column labels match. Otherwise it will fill the gaps with NaN and may convert existing data (NaN is not an integer, so integer columns may be converted to float).
285
How do you save a data frame to a file?
to_csv() - typically we save as a CSV file using this built-in dataframe method df.to_csv("file_name.csv")
286
What might happen if we repeatedly read, edit and save CSV files with Pandas?
When you open a data frame with read_csv(), it adds a numerical index by default, and to_csv() writes that index out as an extra column. If we read and save repeatedly, we add another index column each time. Best to specify a particular column to use for the row indices when reading in the CSV: df = pd.read_csv("file.csv", index_col=0)
287
How should you read in a data frame from a file?
df = pd.read_csv("file.csv", index_col=0)
288
Some columns contain JSON data, how is this formatted?
JSON is a string, formatted like a dictionary. It is very flexible for dataframe columns that need to contain complex information.
289
How is complex information stored in a data frame column?
JSON data - can be stored as a dictionary
290
Before working with complex data stored in a column in JSON format, what do we need to do?
import json
291
How do you create a data frame with JSON in a column?
eg df = pd.DataFrame(index=['Matt','Jonathan','Fiona','Deepak'], columns=['module_scores'])
292
What function is used to retrieve information from a data frame with JSON data?
json.loads(x) eg getting the "DSA8002" column info. df['module_scores'].apply(lambda x: json.loads(x)['DSA8002'])
293
What does json.loads do?
The json.loads() method can be used to parse a valid JSON string and convert it into a Python Dictionary
294
How do you get the value of a specific row and column from JSON data in a data frame?
apply() returns a series of the extracted values, which can be indexed like any other series eg df['module_scores'].apply(lambda x: json.loads(x)['DSA8002']).loc['Matt']
295
How do we convert a column to a date time dtype?
pd.to_datetime df['datetime'] = pd.to_datetime(df['datetime'])
296
How do we check the data type of a column?
df['datetime'].dtypes
297
How do we extract hours/years etc. from a date time object? (from the timestamp)
df['hour'] = df['datetime'].dt.hour or dt.year etc.
298
How do we calculate eg the total sales for each category in a data frame?
Group by and then sum spend_by_hour = df.groupby('hour') spend_by_hour['total'].sum()
299
When would data frame merging be more useful?
If two data frames contain only some columns in common, it is often more useful to merge rather than concatenate.
300
What is the theory of merging two databases?
We find the columns in common between the two databases and return a set of rows with those columns.
301
What is merging a pandas data frame equivalent to?
JOIN statements in SQL.
302
What are JOIN statements in SQL equivalent to in pandas data frame?
Merging
303
What function is used to merge data frames?
pd.merge(df1, df2, on="Column", how="left")
304
What are the different ways data frames can be merged?
- Left join - keep everything in the left table and what's in the right table if available - Right join - keep everything in the right table and what's in the left table if available - Inner join - return only entries that are present in both tables - Outer merge - returns all rows across both tables
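A sketch comparing the four merge types, assuming two made-up tables that share an "ID" column (the table and column names are illustrative).

```python
import pandas as pd

# Hypothetical tables sharing an "ID" column
left = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Ann", "Bob", "Cat"]})
right = pd.DataFrame({"ID": [2, 3, 4], "Score": [55, 70, 90]})

# Number of rows returned by each type of merge
sizes = {}
for how in ["left", "right", "inner", "outer"]:
    merged = pd.merge(left, right, on="ID", how=how)
    sizes[how] = len(merged)

outer = pd.merge(left, right, on="ID", how="outer")
n_missing = int(outer.isna().sum().sum())   # NaNs appear where one table has no match

print(sizes, n_missing)
```

Only IDs 2 and 3 appear in both tables, so the inner merge is the smallest and the outer merge the largest.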
305
What is the opposite of the inner join?
The outer join - returns everything across both tables
306
Which type of merge is most likely to have lots of NaNs?
The outer merge / full join in SQL
307
Are data frames static?
No, new data can be inserted by adding rows or columns
308
What is SQL?
Structured Query Language Used as a tool to search relational databases. Can search, filter, group or combine databases to return entries matching certain criteria. Most popular language to manage relational databases
309
What is a relational database?
- Data are stored as a table or tables with rows (records or tuples) and columns (attributes) - Each record has a unique key - Each table represents a particular type of data eg one table to store information on customers, another to store products
310
What is the advantage of SQL?
It is written closer to natural language, so queries can be constructed more intuitively.
311
What are the different data types in SQL?
Numeric - eg INTEGER, FLOAT(p) with p digits of precision String types - CHARACTER (L) with fixed length L, or VARCHAR(L) with a maximum length L DATE, TIME BOOLEAN (True, False)
312
Why are we able to use SQL to perform queries on pandas data frames?
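Pandas data frames are tabular, with rows and columns like the tables of a relational database, so a tool such as pandasql can treat each data frame as a table and run SQL on it.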
313
Before performing SQL queries directly in Python/Pandas, what do we need to do?
import pandas as pd import pandasql as ps
314
What is pandasql?
A handy python module to query pandas data frames
315
If you don't have pandasql, what should you do?
!pip install pandasql
316
What is the general syntax for writing and executing an SQL command in the Jupyter notebook?
query = ''' ''' ps.sqldf(query)
317
What is a simple query to fetch all data?
query = ''' SELECT * FROM dataframe ''' ps.sqldf(query)
318
What are SQL queries composed of?
Combination of "clauses" with the names of tables and/or columns
319
What clause returns entries of interest?
SELECT
320
What does the SELECT clause do?
Returns entries
321
What does the FROM clause do?
Tells SQL which table to select the columns from
322
Why are SQL clauses written in capital letters?
They are not case sensitive. Writing in capitals helps differentiate them from the names of tables etc.
323
What is returned when we use PandaSQL?
A Pandas DataFrame - which is very convenient for further database operations.
324
How do you select a specific column from a database?
query = ''' SELECT "Column Name1", "Column Name2" FROM dataframe ''' ps.sqldf(query)
325
How do we return entries that have a specific attribute?
Use conditional searches using the WHERE clause. query = ''' SELECT * FROM dataframe WHERE city = "Belfast" ''' ps.sqldf(query) NB: single = sign, not == NB: Need to put string of interest in different quotation marks to overall string query
326
What conditions can we apply with WHERE?
- =, <, <=, >, >= - BETWEEN X AND Y - number in a specified range - IN ('X', 'Y') - values in a given list - LIKE '%YZ%' - value matches a given pattern YZ where % is used to represent free text before and/or after the pattern
327
How can we apply multiple condition at the same time in SQL?
Use the AND clause
328
How can we sort data by column values in SQL?
query = ''' SELECT * FROM dataframe ORDER BY column DESC ''' ps.sqldf(query) Can specify ASC (ascending, the default) or DESC (descending)
329
How can we sort data by multiple column values?
query = ''' SELECT * FROM dataframe ORDER BY column DESC, column2 ASC ''' ps.sqldf(query) The ordering is applied one after the other
330
How do we modify the SQL query so that we only return a small number of rows?
The LIMIT clause. This is similar to the head() function in Pandas. query = ''' SELECT * FROM dataframe LIMIT 5 ''' ps.sqldf(query)
331
What kind of data aggregation computing statistics are usually performed in SQL?
- COUNT - returns the number of records - MIN, MAX - returns the smallest/largest entries in a column - SUM - sum of entries in a column - AVG - average of entries in a column
332
How do you find out eg how many women are in a database using SQL?
query = ''' SELECT COUNT(*) FROM dataframe WHERE Gender = "Female" ''' ps.sqldf(query) NB: you would get the same answer whether you count the whole database or a single column - the number of rows will be the same either way.
333
How do we use simple expressions in SQL to return a calculation?
query = ''' SELECT Number, Income/1000 AS [Income ($k)] FROM dataframe ''' ps.sqldf(query) NB: be aware of integer division here. Because the column and the divisor are both integers, the result is cut down to an integer. Divide by 1000.0 instead to keep the decimals.
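A sketch of the integer-division behaviour, run with Python's built-in sqlite3 module rather than pandasql. This assumes the query is executed by SQLite, which is what pandasql uses by default; the table contents are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE citizens (Number INTEGER, Income INTEGER)")
con.execute("INSERT INTO citizens VALUES (1, 45500)")

query = '''
SELECT Income/1000 AS [Income ($k)],
       Income/1000.0 AS [Income exact]
FROM citizens
'''
row = con.execute(query).fetchone()
con.close()

print(row)
```

The first value loses its decimals; the second keeps them because one side of the division is a float.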
334
How do we name a created column in SQL?
Using the AS clause It should come right after the column or expression of interest. Need to include square brackets if you want the column name to have a space.
335
What clause allows us to return different values depending on the content of a column?
The CASE clause query = ''' SELECT Number, Gender, Age, City, CASE WHEN City='New York City' THEN 'North' WHEN City='Dallas' THEN 'South' END AS Region FROM citizens ''' ps.sqldf(query)
336
What does the CASE clause do?
Allows us to return different values depending on the content of a column. CASE and END start and end the logical criteria. AS specifies the column name to store the result.
337
How do we calculate summary statistics based on a categorical column?
Using the GROUP BY clause query = ''' SELECT Age, COUNT(*) FROM citizens GROUP BY Age ''' ps.sqldf(query)
338
When do we use GROUP BY in SQL queries?
To calculate aggregated statistics (counts, averages etc.). You can't display columns from grouped tables, you just see the first record in each group. Only the column used for grouping and aggregated statistics should be included with the SELECT clause.
339
How do you group by multiple columns?
Select the two columns you want to group by, and select the aggregate statistic for the third column. The order typed is the order they are grouped by query = ''' SELECT Gender, Age, AVG(Income) FROM citizens GROUP BY Age, Gender ''' ps.sqldf(query)
340
When combining WHERE and GROUP BY clauses, what order should they be stated in?
Order matters Need to apply the WHERE clause before grouping the data, so that undesired rows are not included in the grouping stage.
341
What clause do you need to apply conditions to grouped data?
HAVING clause
342
How do you use the HAVING clause?
query = ''' SELECT Gender, Age, AVG(Income) FROM citizens GROUP BY Gender, Age HAVING Age > 30 ''' ps.sqldf(query) "Group by gender and age, but show me only the ones having ages over 30"
343
Why is the HAVING clause necessary?
It allows us to perform more complex conditions, by applying criteria to the statistics of each group. Eg you only want groups with an average income over X. We need to do the grouping first before we can apply the condition.
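A sketch of WHERE, GROUP BY and HAVING in one query, again using the built-in sqlite3 module as a stand-in for pandasql (an assumption) and made-up rows. WHERE filters individual rows before grouping; HAVING filters the groups afterwards using an aggregated statistic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE citizens (Gender TEXT, Age INTEGER, Income INTEGER)")
con.executemany(
    "INSERT INTO citizens VALUES (?, ?, ?)",
    [("Female", 25, 30000), ("Female", 25, 34000),
     ("Male", 40, 60000), ("Male", 40, 70000),
     ("Female", 40, 20000)],
)

query = '''
SELECT Age, AVG(Income) AS mean_income
FROM citizens
WHERE Income > 25000
GROUP BY Age
HAVING AVG(Income) > 40000
'''
rows = con.execute(query).fetchall()
con.close()

print(rows)
```

The 20000 income is removed by WHERE before the averages are calculated, and the age-25 group is then removed by HAVING because its average is too low.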
344
What is the operation to join two tables in SQL?
JOIN The first table is always the left table The second table is always the right table Joined based on a common column
345
If you do not specify the JOIN type in an SQL query, which type of JOIN is automatically performed?
Inner join ie only returns records present in both tables.
346
What information do you need to provide in the SQL query when performing a JOIN?
You must provide the column to use for the join, otherwise the entire right table will be repeated for every record in the left table. Specify columns to join on using ON ON citizens.Number = welfare.IDNum It is best to specify which table the column is coming from, to avoid any ambiguity if there is a column with the same name in each table.
347
How do you write a JOIN query in SQL?
query = ''' SELECT * FROM citizens JOIN welfare ON citizens.Number = welfare.IDNum ''' ps.sqldf(query)
348
What can be handy to do in complex queries?
Give each table a shorthand name, query = ''' SELECT * FROM citizens c JOIN welfare w ON c.Number = w.IDNum ''' ps.sqldf(query)
349
When is an SQL sub-query used?
When it takes more than one query to get what we want. Avoids hardcoding, which is not very efficient.
350
How do you specify a sub-query?
Specified using round brackets. The table or value the sub-query returns feeds directly into the overall query. query = ''' SELECT * FROM citizens WHERE Income > (SELECT MAX(Income)/2 FROM citizens) ''' ps.sqldf(query)
351
When might you need to use a sub query?
When you need to compare a column against a list of values. Eg finding all citizens belonging to age groups with average incomes over 55000
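A sketch of both kinds of sub-query: one returning a single value, one returning a list used with IN. It uses the built-in sqlite3 module in place of pandasql (an assumption), and the names and incomes are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE citizens (Name TEXT, Age INTEGER, Income INTEGER)")
con.executemany(
    "INSERT INTO citizens VALUES (?, ?, ?)",
    [("Ann", 25, 30000), ("Bob", 25, 34000),
     ("Cat", 40, 60000), ("Dan", 40, 70000),
     ("Eve", 60, 50000)],
)

# Sub-query returning one value: half of the largest income
above_half_max = '''
SELECT Name FROM citizens
WHERE Income > (SELECT MAX(Income)/2 FROM citizens)
ORDER BY Name
'''

# Sub-query returning a list: ages whose average income is over 55000
in_rich_age_groups = '''
SELECT Name FROM citizens
WHERE Age IN (SELECT Age FROM citizens GROUP BY Age HAVING AVG(Income) > 55000)
ORDER BY Name
'''

names_1 = [r[0] for r in con.execute(above_half_max)]
names_2 = [r[0] for r in con.execute(in_rich_age_groups)]
con.close()

print(names_1, names_2)
```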
352
What clause is used to convert a character string to JSON?
JSON_EXTRACT JSON_EXTRACT(table.column, "$[x].key") AS new_column Specifying which element of the JSON list we want to extract (0 is the first, 1 is the second etc.) The key we want is optional
353
What is the syntax for an SQL query using JSON_EXTRACT?
query = ''' SELECT title, JSON_EXTRACT(credits.cast, "$[0].name") AS starring FROM credits ''' ps.sqldf(query)
354
If data has a very large dynamic range, what is it good to do?
Look at the logarithm of the data (convert to powers of 10)
355
How do you convert an axes to the logarithmic scale?
using np.log10() function create new columns with the logarithmic conversions and then plot these. OR plot directly with plt.xscale('log')
356
How can you identify trends?
Looking for the slope of the relationship. Using np.polyfit() to fit a polynomial to data.
357
How do you use the polyfit function?
It takes an x array, y array and a degree (first = linear, second = quadratic etc.) f = np.polyfit(x=df['log_galaxy_mass'], y=df['log_bh_mass'], deg=1) f[0] is the slope f[1] is the intercept We can then plot this using dummy data x_arr = np.arange(x1,x2,0.1) y_mod = f[0] * x_arr + f[1] plt.plot(x_arr, y_mod, color='r')
358
How do you quantify the goodness of fit of the polynomial fitted?
Calculating the Mean Squared Error - the average squared difference between model and data. MSE = mean((data - model)**2) Calculate the predicted y values: linear_prediction = f[0] * df['log_galaxy_mass'].values + f[1] Calculate the MSE: mse = np.mean( (linear_prediction - df['log_bh_mass'])**2 )
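The fit and the MSE calculation can be sketched end to end as follows. The data here is made up (points lying exactly on y = 2x + 1), not the galaxy data set from the notes, so the fitted slope and intercept are easy to check.

```python
import numpy as np

# Made-up data lying exactly on y = 2x + 1 (not the data set from the notes).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

f = np.polyfit(x, y, deg=1)          # f[0] is the slope, f[1] is the intercept
y_model = f[0] * x + f[1]            # predicted y values from the fitted line
mse = np.mean((y - y_model) ** 2)    # mean of the squared residuals

print(f, mse)
```

With noisy real data the MSE will be larger than here; it is only meaningful relative to the spread of the y values.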
359
When is the MSE good for a data set?
When the range of the y axis is several times bigger than 1 - so the fit is doing better than random
360
What is SciPy?
A Python scientific module which provides algorithms for many mathematical problems. We use it for correlation in this module (does bigger x really mean bigger y)
361
How is correlation determined?
Using a statistic called Spearman's Rank Correlation coefficient from scipy.stats import spearmanr print(spearmanr(df['log_galaxy_mass'], df['log_bh_mass']))
362
How do you convert a list of strings to a single string?
" ".join(x)
363
How do you remove an item from a list?
list.remove(item)
364
How do you insert an item into a list?
list.insert(position, item)
365
How do you add something to the end of a list?
list.append(item)
366
How do you investigate the names of the keys in a dictionary?
dict.keys()
367
What is XeY shorthand for?
XeY is short-hand in Python for “X times 10 to the power of Y”
368
What does 1e6 represent?
1 x 1 000 000
369
How do you write a dictionary to a file?
Within the with open as - json.dump(data, f)
370
How do you read in a dictionary within a file?
Within the with open as - json.load(f)
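A small round-trip sketch of writing a dictionary to a file and reading it back. The dictionary contents and the temporary file path are hypothetical. Note that json.dump writes to a file object, whereas json.dumps returns a string.

```python
import json
import os
import tempfile

data = {"name": "Ada", "scores": [1, 2, 3]}  # hypothetical dictionary

# Hypothetical file location inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.json")

with open(path, "w") as f:
    json.dump(data, f)       # write the dictionary to the open file

with open(path) as f:
    loaded = json.load(f)    # read it back as a dictionary

print(loaded == data)
```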
371
How do you check in a condition that a variable is of a certain type?
isinstance(string1, str)
372
What syntax for raising an exception can be used in a function?
def paint(self, colour): try: if isinstance(colour, str): self.colour = colour else: raise TypeError("Paint should be provided as a string") except TypeError as err: print(err, "- the colour remains", self.colour)
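Laid out with indentation, the pattern might look like the sketch below. The Car class is a hypothetical host for the method, and the exception is bound with "as err" so that the message is printed rather than the exception class.

```python
class Car:
    """Hypothetical class used only to host the paint method."""

    def __init__(self, colour):
        self.colour = colour

    def paint(self, colour):
        try:
            if isinstance(colour, str):
                self.colour = colour
            else:
                raise TypeError("Paint should be provided as a string")
        except TypeError as err:
            # err carries the message passed to TypeError above
            print(err, "- the colour remains", self.colour)


car = Car("red")
car.paint(42)       # wrong type: the message is printed, colour is unchanged
colour_after_bad_call = car.colour
car.paint("blue")   # valid input: the colour is updated
```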
373
How can you remove an item from a list?
del list[0]
374
How does python store a list?
In a simplified sense, the list is stored somewhere in your computer's memory, and x stores the address of that list (where it is in memory). This means that x does not actually contain all the list elements; rather, it contains a reference to the list.
375
How can you create a new list from an original list, so that it is passed by value rather than reference?
y = list(x) - this is a more explicit copy of the list rather than y = x
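The difference between the two assignments can be seen in a short sketch (the list contents are arbitrary). Note that list(x) makes a shallow copy: it is a new list, but any nested objects inside it would still be shared.

```python
x = ["a", "b", "c"]
y = x          # y refers to the same list object as x
z = list(x)    # z is a new list containing the same elements

y[0] = "changed"   # visible through x as well, because x and y are one list
z[1] = "other"     # only affects z

print(x)
print(z)
```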
376
How do you find the maximum value of a list?
max(list)
377
How can you round a value?
Round function round(value, precision)
378
How can you look at python documentation?
help(function_name)
379
How do you find the length of a list?
len(list)
380
How do you sort a list?
sorted(list, reverse=False) returns a new sorted list; list.sort(reverse=False) sorts the list in place
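Python offers two ways to sort a list: the built-in sorted() function and the list.sort() method. A short sketch with arbitrary values shows how they differ.

```python
values = [3, 1, 2]

descending = sorted(values, reverse=True)  # returns a new list; values is unchanged
unchanged_copy = list(values)              # snapshot to show values was not modified

values.sort()                              # sorts in place and returns None

print(descending, unchanged_copy, values)
```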
381
How do you get the index of a specific item in a list?
list.index(item)
382
How do you count the number of times an element appears in a list/string?
list.count(element)
383
How do you capitalise the first letter of a string?
string.capitalize()
384
How do you replace part of a string with a different part?
string.replace("x","y")
385
How do you convert an entire string to all caps?
string.upper()
386
How do you reverse the order of a list?
list.reverse() - this changes the list it is called on
387
What is the NumPy array an alternative to?
The Python list
388
How do you create a NumPy array from a list?
np_array = np.array(list) Assumes the list contains elements of the same type
389
In a NumPy array, how are True and False treated?
As 1 and 0
390
How do you investigate the size of a numpy array?
array.shape
391
How can you subset a single element from a 2D NumPy array?
array[0][2] or array[0,2]
392
How can you get the mean of a column of a 2D NumPy array?
np.mean(dataset[:,0])
393
How can you check if two columns of a 2D NumPy array are correlated?
np.corrcoef(dataset[:,0], dataset[:,1]) correlation coefficient
394
How can you calculate the standard deviation of a NumPy array column?
np.std(dataset[:,0])
395
How do you generate random data points from a normal distribution?
data = np.round(np.random.normal(1.75, 0.2, 5000), 2) mean = 1.75, std = 0.2, 5000 samples
396
What package do we use for data visualisation?
Matplotlib import matplotlib.pyplot as plt
397
When is it appropriate to plot a line graph?
When time is on the x-axis
398
In a scatter plot, how do you set the size of the points?
s=numpy array
399
How can you add grid lines to your plot?
plt.grid(True)
400
How do you look at the keys of a dictionary?
dict.keys()
401
What type of values can dictionary keys be?
immutable objects
402
What are examples of immutable object types that can be used as dictionary keys?
Strings, Booleans, integers and floats
403
How can you check if a key is already in a dictionary?
"key" in dictionary - see if it returns True or False
404
How can you delete a value from a dictionary?
del(dictionary["key"])
405
How can you manually check if two arrays are compatible for broadcasting?
np.broadcast_to(array, shape) - raises a ValueError if the array cannot be broadcast to that shape
406
How do you find the maximum value of a numpy array?
np.max(array)
407
How do you find the index of the maximum value of a numpy array?
np.argmax(array)
408
How do you transform all values in a numpy array to positive?
np.absolute(array)
409
How do you find the base 10 logarithm of 1000?
np.log10(1000)
410
How do you find the exponential of 1?
np.exp(1)
411
What kinds of mathematical functions can you access with numpy?
np.sin(x) np.cos(x) np.pi
412
How do you count the number of occurrences of eg a City in a database?
Group by city then find the size eg home_team = matches.groupby("Home Team Name").size()
413
How do you make a dataframe from a dictionary and change the names of the indexes?
pd.DataFrame(dictionary) Indexes automatically given df.index = [list_of_strings]
414
How do you select a column from a data frame and keep it in a data frame (rather than a pandas series)?
Use double square brackets df[["column"]]
415
How do you select multiple columns from a data frame by name?
df[["column1", "column2"]] OR df.loc[:, ["column1", "column2"]]
416
To carry out the slicing function my_array[rows, columns] on pandas data frames what do we need?
loc and iloc
417
How can you only select certain columns and certain rows of a data frame?
df.loc[["row1","row2"], ["col1","col2"]]
418
How do you apply multiple logical operators to a NumPy array / pandas series?
np.logical_and(array > 1, array < 5) array[np.logical_and(array > 1, array < 5)] np.logical_and() np.logical_or() np.logical_not()
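As a quick illustration with an arbitrary array, np.logical_and builds a Boolean mask that can then be used to subset the array.

```python
import numpy as np

array = np.array([0, 2, 4, 6])

# Element-wise "and" of the two comparisons gives a Boolean mask.
mask = np.logical_and(array > 1, array < 5)

# Indexing with the mask keeps only the elements where the mask is True.
selected = array[mask]

print(mask)
print(selected)
```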
419
How do you write a for loop to include access to the index?
for index, var in enumerate(seq): expression
420
How do you loop over a dictionary to access both key and value?
for key, value in dictionary.items(): expression
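Both looping patterns, enumerate() for index access and .items() for dictionaries, can be sketched together. The list and dictionary contents are made up for illustration.

```python
seq = ["a", "b", "c"]
indexed = []
for index, var in enumerate(seq):   # index starts at 0 by default
    indexed.append((index, var))

capitals = {"France": "Paris", "Japan": "Tokyo"}  # hypothetical dictionary
lines = []
for key, value in capitals.items():  # .items() yields (key, value) pairs
    lines.append(f"{key}: {value}")

print(indexed)
print(lines)
```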
421
How do you loop over an array to get each element?
To get every element of an array, you can use a NumPy function called nditer (ND iter) for val in np.nditer(array): print(val)
422
When looping over a dataframe - what does the following print out? for val in dataframe: print(val)
Prints out the column names
423
How do you iterate over the rows of a data frame?
In pandas, you need to explicitly say that you want to iterate over the rows. .iterrows() generates the label of each row and the row's data. for label, row in dataframe.iterrows(): print(label) print(row) dataframe.loc[label, "country_name_length"] = len(row["country"]) Can also select a specific column, eg print(row["column_name"]), or (as shown) create a new column. But this is inefficient - use .apply instead, eg dataframe["country_name_length"] = dataframe["country"].apply(len)
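A small sketch comparing the row loop with .apply on a made-up data frame. The country values, row labels and the two new column names are hypothetical.

```python
import pandas as pd

# Hypothetical data frame with row labels.
dataframe = pd.DataFrame({"country": ["Chile", "Norway"]}, index=["CL", "NO"])

# Row-by-row version: iterrows() yields (label, row) pairs.
for label, row in dataframe.iterrows():
    dataframe.loc[label, "length_loop"] = len(row["country"])

# Element-wise version: apply len to every value in the column at once.
dataframe["length_apply"] = dataframe["country"].apply(len)

print(dataframe)
```

Both columns hold the same lengths, but the .apply version avoids building a Series object for every row.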
424
How can you create a column that contains a calculation based on another column?
Use .apply(function) eg dataframe["country_name_length"] = dataframe["country"].apply(len)
425
What does .apply() do?
Allows you to apply a function on a particular column in an element-wise fashion.
426
How do you generate random numbers, ensuring reproducibility?
Using a seed - generate pseudo random numbers np.random.seed(123) np.random.rand()
427
How do you randomly generate a 0 or 1?
np.random.randint(0,2) This simulates a coin toss
428
How can you simulate a dice throw?
np.random.randint(1,7)
429
In functions with eg subtracting, how can you account for the fact you can't have a negative number?
x = max(0, calculated_value) this ensures x never goes below zero
430
How do you transpose a 2D NumPy array?
np.transpose(array)
431
How do you add a description of a defined function?
Use docstrings - placed inside triple double quotation marks def function(parameters): """ """
432
How do you change the value of a global parameter inside a function?
use keyword global global name
433
In a nested function, how can you change the value in an enclosing scope?
nonlocal keyword
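A minimal sketch of nonlocal in a nested function. The make_counter and increment names are hypothetical, chosen only for illustration.

```python
def make_counter():
    count = 0  # variable in the enclosing scope

    def increment():
        nonlocal count   # rebind count in make_counter instead of creating a local
        count += 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
print(first, second)
```

Without the nonlocal line, count += 1 would raise an UnboundLocalError, because assignment would make count local to increment.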
434
How do you allow for passing multiple arguments into a function?
*args
435
How do you allow for passing multiple keyword arguments into a function?
**kwargs This turns the keyword arguments (identifier-value pairs) into a dictionary within the function body. We can then loop over the key-value pairs stored in the dictionary kwargs: for key, value in kwargs.items(): print(key, value)
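Both flexible-argument forms can be sketched side by side. The function names add_all and describe are hypothetical.

```python
def add_all(*args):
    """Sum any number of positional arguments (args is a tuple)."""
    total = 0
    for number in args:
        total += number
    return total


def describe(**kwargs):
    """Format any number of keyword arguments (kwargs is a dictionary)."""
    return [f"{key}: {value}" for key, value in kwargs.items()]


total = add_all(1, 2, 3)
described = describe(name="Ada", role="engineer")
print(total, described)
```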
436
How do we apply a lambda function to all elements of a list? How do we print results of this lambda function?
We need to use map() to apply the lambda function to all elements of the sequence: result = map(lambda x: x + 1, my_list) It returns a map object; convert it to a list using list(result), which can then be printed
437
How can you filter out elements of a list which don't meet certain criteria?
result = filter(lambda x: len(x) > 6, list)
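A short sketch of map() and filter() with lambda functions, using made-up lists. Both return lazy objects, so they are wrapped in list() to see the results.

```python
numbers = [1, 2, 3]
doubled = list(map(lambda x: x * 2, numbers))   # apply the lambda to every element

words = ["python", "dataframe", "numpy", "matplotlib"]
# filter keeps only the elements for which the lambda returns True
long_words = list(filter(lambda x: len(x) > 6, words))

print(doubled)
print(long_words)
```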
438
What kind of error is thrown when an operation or function is applied to an object of an inappropriate type?
TypeError
439
When should we raise an error (instead of catching in an except)?
When we don't want our function to work for a particular set of values, eg we don't want to take the square root of negative numbers. Using an if statement, we can raise a ValueError when the user passes the function a negative number: if x < 0: raise ValueError("X must be non-negative")
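As a sketch, the check could sit at the top of a square-root function. The safe_sqrt name is hypothetical; the caller can then decide how to handle the error.

```python
import math


def safe_sqrt(x):
    """Return the square root of x, refusing negative input."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.sqrt(x)


root = safe_sqrt(9)

try:
    safe_sqrt(-4)
except ValueError as err:
    message = str(err)   # the text passed to ValueError above

print(root, message)
```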
440
In an SQL query, how do you count unique values?
COUNT (DISTINCT "column_name")
441
How do you determine the number of rows in a data frame?
len(df)
442
How can you quickly inspect a data frame?
df.info()
443
What does df.describe() do?
The describe() method computes some summary statistics for numerical columns like mean and median
444
What are the components of a data frame that you can access?
df.values - a 2D NumPy array df.columns - column labels df.index - row labels
445
How can you sort a data frame by multiple column values?
df.sort_values([col1, col2], ascending=[True,False])
446
How do you select multiple columns from a data frame?
Need double square brackets df[["col1", "col2"]]
447
How do you compare dates in a logical comparison?
The dates are in quotes, written as year, month then day This is the international standard date format
448
How can you filter a dataframe on multiple options of a categorical variable?
Using .isin() dogs["colour"].isin(["Black", "Brown"])
449
What method allows to calculate custom summary statistics?
Aggregate .agg() def function(column): return column.quantile(0.3) df["column"].agg(function) Can be used on multiple columns - pass in list ["col1","col2"] Agg itself can also take a list of functions to apply at the same time Can use .agg for the IQR
450
How can you calculate the cumulative sum of a column?
Calling .cumsum() on a column returns not just one number, but a number for each row of the data frame df["column"].cumsum() Can also have .cummax(), .cummin(), .cumprod() These all return an entire column of a dataframe, rather than a single number
451
When counting in a dataframe, how do you ensure you only count each "thing" once?
use .drop_duplicates() eg df.drop_duplicates(subset=["col1", "col2"])
452
After subsetting, how can you count the number of values in a table?
To count the dogs of each breed, we subset the breed column and use the value_counts() method Can do .value_counts(sort=True)
453
How can you turn counts into proportions of the total?
df["column"].value_counts(normalize=True)
454
How can you calculate the mean weight of each colour of dog?
dogs.groupby("colour")["weight"].mean()
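A runnable sketch of the grouped mean on a made-up dogs data frame. The weights are invented for illustration; only the column names come from the card.

```python
import pandas as pd

# Hypothetical data; only the column names come from the notes.
dogs = pd.DataFrame({
    "colour": ["Black", "Brown", "Black", "Brown"],
    "weight": [20.0, 30.0, 24.0, 26.0],
})

# Split the rows by colour, select the weight column, then average each group.
mean_weight = dogs.groupby("colour")["weight"].mean()
print(mean_weight)
```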
455
What does the .agg method allow you to do?
Pass in multiple summary statistics at once to calculate df["column"].agg([np.min, np.max, np.sum])
456
What are pivot tables?
A way of calculating grouped summary statistics: .pivot_table() df.pivot_table(values="col", index="colour") - The values argument is the column that you want to summarise - The index argument is the column that you want to group by. Automatically calculates the mean; if you want another statistic, use aggfunc: df.pivot_table(values="col", index="colour", aggfunc=np.median) To group by more than one variable, pass in columns: df.pivot_table(values="col", index="colour", columns="breed", fill_value=0, margins=True)
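The pivot table calls can be tried on a small made-up data frame. The data is hypothetical, and this sketch leaves out margins=True to keep the output small.

```python
import pandas as pd

# Hypothetical data for illustration.
dogs = pd.DataFrame({
    "colour": ["Black", "Black", "Brown", "Brown"],
    "breed": ["Lab", "Poodle", "Lab", "Lab"],
    "weight": [20.0, 10.0, 30.0, 26.0],
})

# Mean weight per colour (mean is the default statistic).
mean_by_colour = dogs.pivot_table(values="weight", index="colour")

# Mean weight per colour and breed; combinations with no dogs are filled with 0.
by_colour_and_breed = dogs.pivot_table(
    values="weight", index="colour", columns="breed", fill_value=0
)

print(mean_by_colour)
print(by_colour_and_breed)
```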
457
How do you set the index of a data frame?
df.set_index("column") can include multiple columns df.set_index(["col1", "col2"])
458
How do you reset the index of a dataframe?
df.reset_index() to get rid of it completely df.reset_index(drop=True)
459
How can you subset a data frame with row labels?
.loc df.loc[[item1, item2]]
460
How do you subset rows at the outer level of an index vs the inner level, when there are two indexes?
Outer - df.loc[[item1, item2]] - with a list Inner - df.loc[[(outeritem1, inneritem1), (outeritem2, inneritem2)]] - with a list of tuples
461
How can you sort values by their index?
.sort_index() For multiple indexes: by default, it sorts all index levels from outer to inner, in ascending order. You can control this: df.sort_index(level=[inner, outer], ascending=[True, False])
462
What does slicing do?
Selects consecutive elements from objects
463
If a column contains a date type, how can you access the different elements of the date?
df["columns"].dt.year /.dt.month etc
464
What is the simple way to plot?
eg df["column"].hist() avg_weight_by_breed.plot(kind="bar")
465
How do you rotate axis labels by 45 degrees?
pass in rot=45
466
How can you investigate if there are any missing values in your dataset?
Represented by NaN df.isna().any() - tells you if there are any missing values in each column df.isna().sum() - tells you how many missing values are in each column
467
What can you do with missing values in a dataframe?
Drop - df.dropna() Fill with 0 - df.fillna(0)
468
How do you convert a data frame to a CSV file?
df.to_csv("new filename.csv")
469
How do you find the value in column 1 based on a condition in columns 2?
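journalsBSA.loc[journalsBSA["Rank"].idxmin(), "Title"] - idxmin() returns the row label of the smallest Rank, and .loc then selects the Title in that row. (journalsBSA.iloc[journalsBSA["Rank"].idxmin()].loc["Rank"] is not correct: it returns the Rank itself rather than the Title.)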
470
How do you change the range of the data shown on the axis?
Change the axis limits - ax.set_ylim()
471
What are the steps of calculating the MSE?
determine the y values based on the predicted model and compare to actual values in table MSE = np.mean( (predicted_y - df["column"])**2)
472
How do you count the number of occurrences in a grouped data frame?
phys_groups.size().sort_values(ascending=False)
473
In databases, what are rows and columns referred as?
In the world of databases, rows are referred to as records Columns are referred to as fields
474
What SQL query do you use to only return unique values?
SELECT DISTINCT column1, column2 FROM dataframe
475
What does the distinct key word do?
return the unique combinations of multiple field values
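A sketch of DISTINCT and COUNT(DISTINCT ...) using Python's built-in sqlite3 module. The matches table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (home TEXT, city TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [("A", "Rome"), ("A", "Rome"), ("B", "Rome"), ("B", "Oslo")],
)

# DISTINCT over two fields returns each unique combination once.
unique_pairs = conn.execute(
    "SELECT DISTINCT home, city FROM matches ORDER BY home, city"
).fetchall()

# COUNT(DISTINCT ...) counts the unique values of a single field.
n_cities = conn.execute("SELECT COUNT(DISTINCT city) FROM matches").fetchone()[0]

print(unique_pairs, n_cities)
conn.close()
```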
476
What is an SQL view?
A view is a virtual table that is the result of a saved SQL SELECT statement. Creating a view does not return a result set; once created, the view can be queried like any other table.
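Creating and querying a view can be sketched with Python's built-in sqlite3 module. The high_earners view name, the sample rows and the 40000 threshold are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citizens (Name TEXT, Income REAL)")
conn.executemany(
    "INSERT INTO citizens VALUES (?, ?)",
    [("Ana", 20000), ("Ben", 45000), ("Cat", 80000)],
)

# Creating the view saves the SELECT statement; no rows are returned here.
conn.execute(
    "CREATE VIEW high_earners AS "
    "SELECT Name, Income FROM citizens WHERE Income > 40000"
)

# The view can now be queried like an ordinary table.
rows = conn.execute("SELECT Name FROM high_earners ORDER BY Name").fetchall()
print(rows)
conn.close()
```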