Python Flashcards

(639 cards)

1
Q

for loop in Python

A

for i in range(a, b, x):
    print(i)

a: start of range (inclusive)
b: end of range (exclusive)
x: the step, i.e., how much to add to i in each iteration of the loop. A negative step counts backwards.

You can use in-place operators to increase or decrease a variable by any amount.
+=
-=

Notes:
-The body of a for-loop must be indented; otherwise you’ll get an IndentationError (a kind of syntax error).
-Every line in the body of the loop must be indented in the same way; we use the “4 spaces” convention. In most editors, pressing the Tab key inserts 4 spaces.
-Whitespace matters in Python.

example: 2 arguments
for i in range(0, 10):
    print(i)

output:
0
1
2
3
4
5
6
7
8
9

example: 3 arguments
for i in range(3, 0, -1):
    print(i)

output:
3
2
1

example: in-place operators:

number_of_enemies = 10
number_of_enemies += 2
# number_of_enemies is now 12

number_of_enemies = 10
number_of_enemies -= 2
# number_of_enemies is now 8

def sum_of_numbers(start, end):
    total = 0
    for i in range(start, end):
        total += i
    return total
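
One quick way to check what a given range() call will produce, without writing a loop, is to collect its values with list(). A minimal sketch (the variable names are only illustrative):

```python
# list() gathers every value that range() would hand to a for loop.
evens = list(range(0, 10, 2))      # start 0, stop before 10, step 2
countdown = list(range(5, 0, -1))  # a negative step counts backwards
empty = list(range(3, 3))          # start equal to stop yields nothing

print(evens)
print(countdown)
print(empty)
```

Because the end of the range is exclusive, 10 never appears in evens and 0 never appears in countdown.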

2
Q

While Loop

A

A loop that continues while a condition remains True.

Hardcoded example:
while 1:
    print("1 evaluates to True")

prints (over and over, because the condition never becomes False):
# 1 evaluates to True

While-loop condition as a comparison or variable (more common than hardcoded):
num = 0
while num < 3:
    num += 1
    print(num)

prints:
# 1
# 2
# 3
# loop stops when num >= 3

3
Q

continue vs break

A

continue: use to skip to the next iteration in a loop

example: calculate square roots but skip the negative numbers.

numbers = [16, -4, -9, 36, 0, 49]

for number in numbers:
    if number < 0:
        continue  # Skip negatives to avoid complex numbers
    print(f"The square root of {number} is {number ** 0.5}.")

break: exit the loop entirely

Example:
for n in range(42):
    print(f"{n} * {n} = {n * n}")
    if n * n > 150:
        break

Left alone, this loop would run from 0 all the way to 41, but it exits early. Once n * n is greater than 150 (first true at n = 13, since 13 * 13 = 169), the break statement executes and stops the loop.

4
Q

Practice Problem: We need a timer to count down to the start of PvP matches in Fantasy Quest.

  1. Write a loop that counts down from 10 to 1. At each iteration, print the number with an ellipsis:
    10…
    9…
    8…
    etc.
  2. However, when i is 1, print “Fight!” additionally:
    1…Fight!
A

def countdown_to_start():
    for i in range(10, 0, -1):
        if i == 1:
            print(f"{i}…Fight!")
        else:
            print(f"{i}…")

5
Q

In loops, Python returns the last value in the running accumulation

A

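Example:
def sum_of_numbers(start, end):
    total = 0
    for i in range(start, end):
        total += i
    return total

Calling sum_of_numbers(0, 5): total starts at 0, and total += i runs once for each i in 0, 1, 2, 3, 4.
So
0+0=0
0+1=1
1+2=3
3+3=6
6+4=10

Python returns 10 (as in 6+4=10). The return statement runs once, after the loop has finished, so it returns the last value in the running accumulation.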
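
To see every step of the accumulation rather than only the final value, the totals can be recorded as the loop runs. This is a small sketch; running_totals and history are hypothetical names introduced here for illustration:

```python
def running_totals(start, end):
    """Return every intermediate total, not just the final one."""
    total = 0
    history = []
    for i in range(start, end):
        total += i
        history.append(total)  # record the total after this iteration
    return history

totals = running_totals(0, 5)
print(totals)      # each step of the accumulation
print(totals[-1])  # the value sum_of_numbers(0, 5) would return
```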

6
Q

List

A

Some languages call lists arrays, but in Python, we just call them lists.

Strings, numbers, and boolean values can be stored in lists.

Examples:
inventory = ["Iron Breastplate", "Healing Potion", "Leather Scraps"]

flower_types = [
    "daffodil",
    "rose",
    "chrysanthemum",
]

*You can declare the list across multiple lines if you want. This is advised when there are so many items that it’s hard to read them all on one line of code.

7
Q

Counting in Programming

A

We start counting at 0 as opposed to 1.

Example:
names = ["Bob", "Lane", "Alice", "Breanna"]
Q: Which index refers to the second item in the list?
A: 1

8
Q

How to access items in a list

A

Example: Access the “Leather Scraps” item in the below list.

def get_leather_scraps():
    inventory = [
        "Healing Potion",
        "Leather Scraps",
        "Iron Helmet",
        "Bread",
        "Shortsword",
    ]

    item_index = 1

    return inventory[item_index]

9
Q

Give an example of when Python will raise an IndexError

A

If you try to access inventory[1] when the list has fewer than 2 items.
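
A minimal sketch of that situation, assuming a one-item inventory list made up for illustration. The try/except is only there so the example can show the error without stopping the program:

```python
inventory = ["Healing Potion"]  # only index 0 exists

try:
    second_item = inventory[1]  # index 1 is out of range
except IndexError as error:
    message = str(error)
    second_item = None

print(message)
```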

10
Q

Python evaluates “and” from left to right. Give an example of when this is something that you need to keep in mind.

A

Example:
Both of the following options look equivalent, but only option 2 is safe. Why?

Option 1: if inventory[1] == "Iron Ore" and len(inventory) >= 2:
Option 2: if len(inventory) >= 2 and inventory[1] == "Iron Ore":

In Python, and evaluates left to right and short-circuits. Option 1 evaluates inventory[1] first. If the list has fewer than 2 items, that access raises IndexError before Python ever checks len(inventory) >= 2. So it can crash.

The safe version puts the length check first: len(inventory) >= 2 and inventory[1] == "Iron Ore". If the list is too short, the second part isn’t evaluated at all, preventing the out-of-range access.

So it’s about avoiding an exception by guarding the index access.
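
The guard can be wrapped in a small function to try it on lists of different lengths. This is a sketch; has_iron_ore_in_second_slot is a hypothetical helper name, not part of the original exercise:

```python
def has_iron_ore_in_second_slot(inventory):
    # The length check runs first. If it is False, `and` short-circuits
    # and inventory[1] is never evaluated, so no IndexError can occur.
    return len(inventory) >= 2 and inventory[1] == "Iron Ore"

print(has_iron_ore_in_second_slot([]))
print(has_iron_ore_in_second_slot(["Bread"]))
print(has_iron_ore_in_second_slot(["Bread", "Iron Ore"]))
```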

11
Q

Condition Flow

A

The order in which a sequence of checks is evaluated.

Example:
Full solution code: if len(inventory) >= 2 and inventory[1] == "Iron Ore":
Condition flow:
First check length.
Then, only if safe, check the value.
Then, only if both checks pass, perform the assignment.

12
Q

appending in python

A

append() adds a value to the end of a list.

It’s common to create an empty list and then fill it with values using a loop.

Example:
cards = []
cards.append("nvidia")
cards.append("amd")
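
The example above appends two values by hand. A sketch of the loop version of the same pattern, using a made-up squares list:

```python
squares = []
for n in range(1, 6):
    squares.append(n * n)  # add one value per iteration

print(squares)
```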

13
Q

When to use if, while, or break

A

You use if when you want to execute code only if a certain condition is met.

A while loop is used when you want to repeat a block of code as long as a certain condition remains true. It’s often used when you don’t know in advance how many times you’ll need to repeat something.

The break statement allows you to immediately exit a loop (either for or while) even if the loop’s normal termination condition hasn’t been met. You use break when you find what you’re looking for or hit a critical condition that means there’s no point in continuing the loop.

14
Q

No-Index Syntax

A

If you only need to access the value of each item in a list, and don’t need to update the item, you can loop over the values directly.

That is, you can use the no-index syntax when you don’t need to know the index, just the value.

“To not need the index” means you never use the numeric position (0, 1, 2, …) of each element during the loop.

example:

trees = ['oak', 'pine', 'maple']
for tree in trees:
    print(tree)

output:
oak
pine
maple

15
Q

Practice Question: If you don’t need the index, which way of writing for loops is considered “cleaner”?

for item in items

for i in range(0, len(items))

A

for item in items

16
Q

When do you use the no-index syntax vs index syntax?

A

Use the no-index syntax when you just read/process each value.

Examples:
Q: You need to log each username as-is. Do you need the index?
A: No, use
for user in users

Q: When appending doubled values to a new list (i.e., not mutating the original), which do you use?
a) for n in nums
b) for i in range(len(nums))
A: a

Use the index syntax when you need the numeric position (0, 1, 2, …) of each element during the loop to:
-modify items by index
-skip/replace based on position
-look ahead/behind using i

Examples:
Q: You must prefix each username with its position like “0:alice”. Which of the following loop styles is correct?
a) for user in users
b) for i, user in enumerate(users)
A: b

Q: You need to double every number in a list in place (i.e., mutate the list). Which loop?
a) for n in nums
b) for i in range(len(nums))
A: b
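
Both index-based answers can be sketched in a few lines. The lists below are made up for illustration; the last loop shows why the no-index form cannot mutate the list:

```python
users = ["alice", "bob"]
labeled = []
for i, user in enumerate(users):  # enumerate yields (index, value) pairs
    labeled.append(f"{i}:{user}")

nums = [1, 2, 3]
for i in range(len(nums)):
    nums[i] = nums[i] * 2  # assigning by index mutates the list

values = [1, 2, 3]
for n in values:
    n = n * 2  # only rebinds the loop variable; values is unchanged

print(labeled, nums, values)
```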

17
Q

Why does indentation matter in python?

A

In Python, indentation defines block scope. A return inside the loop’s block ends the function immediately during the first iteration that reaches it.

Conceptually:

Inside loop block = “do this each iteration”
After loop block = “do this once when the loop is done”

Example:
Inputs: [1, 200, 300, 4, 5]

def find_max(nums):
    max_so_far = float("-inf")
    for num in nums:
        if num > max_so_far:
            max_so_far = num
    return max_so_far

output: 300

BUT

def find_max(nums):
    max_so_far = float("-inf")
    for num in nums:
        if num > max_so_far:
            max_so_far = num
        return max_so_far

output: 1 BECAUSE return max_so_far is inside the loop’s block and ends the function during the first iteration (num = 1) that reaches it.

18
Q

%

A

modulo operator. % can be used to find the remainder of a division operation.

Divide by 2 using the modulo to determine if a number is even or odd.
If you get zero, it’s even (i.e., no remainder).
If you get a number that is not zero, it’s odd (i.e., it has a remainder).

Example:
8 % 3
output=2

Because 3 goes into 8 two times (equaling 6) and 8-6 is 2. So, there is a remainder of 2.
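
The even/odd check described above can be sketched as a one-line function (is_even is a hypothetical name chosen for this example):

```python
def is_even(number):
    # A remainder of 0 after dividing by 2 means the number is even.
    return number % 2 == 0

remainder = 8 % 3  # 3 goes into 8 twice (6), leaving 2

print(is_even(8), is_even(7), remainder)
```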

19
Q

Slicing Lists

A

Slicing returns a section of a list: all items from start up to (but not including) the index noted in stop.

Syntax:
my_list[ start : stop : step ]

start: where to start the slice. Inclusive, so the item at start is included.
stop: where to stop the slice. Exclusive, so the item at stop is not included.
step: increment. If negative, the slice runs backwards, so the items come out in reverse order.

Syntax to Omit Sections:
my_list[ : stop]
ex: numbers[ : 3] means get all items from the start up to but not including index 3

my_list[start : ]
ex: numbers[3 : ] means get all items from index 3 to the end.

my_list[ : : step]
ex:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

numbers[ : : 2]

output: [0, 2, 4, 6, 8]

Other examples:

del strongholds[0]
# Deletes the item at index 0 (the first element).
# After this, all remaining items shift left by one index.

del strongholds[-2:]
# Slices from the second-to-last element to the end.
# With a positive (default) step, [-2:] means “take the last two elements.”
# Deleting that slice removes both final items, regardless of list length.

del strongholds[-1 : 2 : -1]
# Deletes the slice that would read the list backward from the last item down to (but not including) index 2.
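
The three del statements can be tried side by side on copies of one list. This is a sketch; the stronghold names are placeholders invented for the example, and each deletion starts from a fresh copy so the results do not build on each other:

```python
base = ["Ash", "Birch", "Cedar", "Dunn", "Elm", "Fir"]

without_first = base.copy()
del without_first[0]              # removes index 0

without_last_two = base.copy()
del without_last_two[-2:]         # removes the final two items

without_tail = base.copy()
del without_tail[-1:2:-1]         # removes indexes 5, 4, 3 (stops before 2)

reversed_copy = base[::-1]        # a negative step reads the list backwards

print(without_first)
print(without_last_two)
print(without_tail)
print(reversed_copy)
```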

20
Q

“True” vs True

“False” vs False

A

With “” is a string.

Without “” is a boolean value.

21
Q

Tuples

A

Tuples are collections of data that are ordered and unchangeable. You can think of a tuple as a List with a fixed size. Tuples are created with round brackets:

my_tuple = ("this is a tuple", 45, True)
print(my_tuple[0])
# this is a tuple
print(my_tuple[1])
# 45
print(my_tuple[2])
# True

While it’s typically considered bad practice to store items of different types in a List, it’s not a problem with Tuples. Because they have a fixed size, it’s easy to keep track of which indexes store which types of data.

Tuples are often used to store very small groups (like 2 or 3 items) of data. For example, you might use a tuple to store a dog’s name and age.

dog = ("Fido", 4)

There is a special case for creating single-item tuples. You must include a comma so Python knows it’s a tuple and not regular parentheses:

dog = ("Fido",)

Because a tuple is a single value that holds its own data, multiple tuples can be stored within a list. Similar to storing other data in lists, each tuple within the list is separated by a comma. When accessing a list of tuples, the first index selects which tuple you want to access, and the second selects a value within that tuple.

my_tuples = [
    ("this is the first tuple in the list", 45, True),
    ("this is the second tuple in the list", 21, False),
]
print(my_tuples[0][0]) # this is the first tuple in the list
print(my_tuples[0][1]) # 45
print(my_tuples[1][0]) # this is the second tuple in the list
print(my_tuples[1][2]) # False

Tuple Unpacking
You can easily assign the values of a tuple to variables using unpacking.

dog = ("Fido", 4)
dog_name, dog_age = dog
print(dog_name)
# Fido
print(dog_age)
# 4
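
Unpacking also works directly in a for loop over a list of tuples, which combines the two ideas above. A sketch, with a second dog made up for illustration:

```python
dogs = [("Fido", 4), ("Rex", 7)]

descriptions = []
for name, age in dogs:  # each tuple is unpacked into name and age
    descriptions.append(f"{name} is {age} years old")

print(descriptions)
```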

22
Q

Exponent

A

**

23
Q

Addition

A

+

24
Q

Division

A

/

25
Check whether two values are equal and receive True or False as a result
==
26
Assign variables
=
27
greater than or equal to
>=
28
Example of a loop
For each number in the following list, print the number:
for number in [1, 2, 3, 4, 5]:
    print(number)
output (one number per line): 1 2 3 4 5
29
Practice Problem: Create a function to define anyone 18 or over as an adult and anyone younger as a minor
def is_adult(age):
    if age >= 18:
        print('adult')
    else:
        print('minor')
Example of using the above function:
is_adult(14)
output: minor
30
installed dependency
A piece of software (e.g., a library, module, or package) that another program or project requires and that is already installed on the system.
31
Some of the factors that can contribute to the speed of a program's execution
compile time, runtime, hardware, installed dependencies, and the efficiency of the code
32
Syntax
structure of the code, words, symbols, placement, and punctuation.
33
Variable
A named container which stores values in a reserved location in the computer's memory
34
Static vs Dynamic variables
Static variables: a value is maintained throughout the entire run of a program.
Dynamic variables: values are determined when a program is run.
Different languages use different types of variables.
35
Object-oriented languages
A programming system that is based around objects, which can contain both data and code that manipulates that data. Modeled around data objects.
Object: An instance of a class; you can think of it as a fundamental building block of Python.
-Focuses on modeling real-world entities as objects, containing data and methods
-Encourages concepts such as inheritance, encapsulation, and polymorphism
-Utilizes classes, objects, and interfaces to structure code
Examples: Java, C++, Python, and Ruby
Polymorphism: the ability to exist in many forms. Real World Example: A boy can be a student, a soccer player, and a writer.
Inheritance: allows classes to inherit common properties from the parent class. Real World Example: There is a class for Vehicle. All vehicles are not the same. They can inherit common properties like color, size, and type from the parent class and create classes like Car, Bus, and Bike.
Encapsulation: Binds data and code together in one unit. Real World Example: A Pill: Mix a few ingredients together and store them in one capsule.
Abstraction: Display only the important information by hiding the implementation process. Real World Example: At an ATM, we can withdraw cash, deposit cash, check our balance, print bills, etc. Even though a number of actions are being performed in the background, we don't see them. We only see the inputs and outputs.
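
The Vehicle example can be sketched in Python roughly as follows. The class bodies are assumptions made for illustration (only the names Vehicle, Car, and Bike come from the card):

```python
class Vehicle:
    def __init__(self, color):
        self.color = color  # data bundled with the object (encapsulation)

    def describe(self):
        return f"A {self.color} vehicle"


class Car(Vehicle):  # Car inherits color and describe() from Vehicle
    def describe(self):  # overriding gives Car its own form (polymorphism)
        return f"A {self.color} car"


class Bike(Vehicle):
    pass  # uses the inherited describe() unchanged


# The same call works on every kind of Vehicle.
descriptions = [v.describe() for v in (Car("red"), Bike("blue"))]
print(descriptions)
```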
36
Functional languages
Modeled around functions
-Relies on pure functions, emphasizing computation without side effects
-Promotes immutability and the avoidance of mutable state
-Supports higher-order functions, recursion, and declarative programming
Examples: Haskell, Lisp, Scala, and functional features in languages like JavaScript
Real World Example: I want to change this creature from a horse to a giraffe.
-Lengthen neck
-Lengthen legs
-Apply spots
-Give the creature a black tongue
-Remove horse tail
Note: Functional programming is a form of declarative programming, which describes the logic of computation while de-emphasizing the order of execution. Each item can be run in any order to produce the same result.
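
Python is not a purely functional language, but it supports the style. A small sketch of a pure function passed to a higher-order function (double and the tuple contents are illustrative only):

```python
def double(n):
    return n * 2  # pure: the result depends only on n, nothing else changes


numbers = (1, 2, 3)                    # a tuple, so the input cannot be mutated
doubled = tuple(map(double, numbers))  # map is a higher-order function

print(numbers, doubled)
```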
37
Imperative languages
Modeled around code statements that can alter the state of the program itself
-Works by changing program state through a sequence of commands
-Uses control structures like loops and conditional statements for execution flow
-Emphasizes mutable data and explicit steps for task completion
Examples: C, Python
Real World Example: I want to park my car.
1. Note the initial state of the garage door.
2. Stop car in driveway.
3. If the garage door is closed, open garage door, remember new state; otherwise continue.
4. Pull car into garage.
5. Close garage door.
Note: Imperative programming is procedural. State and order are important.
38
Recursion
Recursion: A technique that allows your computer to break down a task into smaller and smaller pieces until it's easy to solve. A function may call other functions, including itself. A function that calls itself is known as a recursive function.
Real World Examples:
Problem: You need to sort 100 documents with names into alphabetical order.
Solution: You place the documents into piles sorted by the first letter, then sort each pile, and then combine the stacks.
Problem: You need to collect feedback from everyone in the company.
Solution: The top boss tells top executives to collect feedback from everyone in the company. Each executive gathers his/her direct reports and tells them to gather feedback from their direct reports. And on down the line. People with no direct reports (the leaf nodes in the tree) give their feedback. The feedback travels back up the tree with each manager adding his/her own feedback. Eventually, all the feedback makes it back up to the top boss.
Software Engineering Use: Recursive algorithms can be used for sorting and searching data structures such as linked lists, binary trees, and graphs. They are also often used for string manipulation tasks (e.g., checking if two strings are anagrams or finding the longest common subsequence between two strings).
Example: Recursion is closely related to one of the primary reasons we write programs in the first place: the ability to write out a set of steps and do them over and over again until a certain condition has been met. For example, I want to count from 0 to 9 (you know, the hard stuff).
x = 0
while x < 10:
    print("I'm on step {}".format(x))
    x += 1
Output:
I'm on step 0
I'm on step 1
I'm on step 2
I'm on step 3
I'm on step 4
I'm on step 5
I'm on step 6
I'm on step 7
I'm on step 8
I'm on step 9
Strictly speaking, this while loop is iteration, not recursion, because nothing in it calls itself. It takes an input and modifies it until the stop condition (x == 10) has been reached. The first thing the loop does is evaluate whether the stop condition has been met. If not, it runs the code beneath. If so, it moves to the next block of code, if applicable. A recursive solution has the same two ingredients, a repeated step and a stop condition (the base case), but the repetition comes from a function calling itself. If you’ve done even basic introductory programming, you’ve probably already dealt with a fair amount of this kind of repetition.
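
For comparison, here is a sketch of the same count written as an actual recursive function, that is, one that calls itself. count_up and its parameters are hypothetical names introduced for this example:

```python
def count_up(x, stop):
    """Return the step messages from x up to (not including) stop."""
    if x >= stop:  # base case: the stop condition has been met
        return []
    # recursive case: handle x, then let the function call itself on x + 1
    return [f"I'm on step {x}"] + count_up(x + 1, stop)


steps = count_up(0, 10)
for line in steps:
    print(line)
```

Each call handles one step and hands the rest of the work to a smaller call, which is the "smaller and smaller pieces" idea from the definition.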
39
4 most commonly used programming languages for data analysis
Python, R, Java, and C++
40
Jupyter Notebook
Open-source web app for creating and sharing documents containing live code, mathematical formulas, visuals, and text
41
terminal-based text editor
A terminal-based editor (also known as a command-line editor or console editor) is a text editor that runs within a text-only command-line interface (CLI) or terminal, rather than a graphical user interface (GUI)
42
Cells (Jupyter)
The modular code input/output fields into which Jupyter Notebooks are partitioned.
43
Markdown text
Allows you to write formatted text in a coding environment or plain text editor. You can add comments, annotations, and explanations using markdown.
44
Plain Text Editor
a software program for creating and editing files that contain only unformatted characters
45
computational notebook
A shareable document that combines computer code, plain language descriptions, data, rich visuals like 3-D models, charts, graphs, and figures, and interactive controls.
Example: JupyterLab: a feature-rich notebook application and editing environment.
A notebook, along with an editor like JupyterLab, provides a fast, interactive environment for prototyping and explaining code, exploring and visualizing data, and sharing ideas with others.
46
Object:
A concrete, self-contained entity in memory that holds data and potentially methods that operate on that data. Everything in Python is an object, including numbers, strings, lists, functions, and even classes themselves. Each object has a unique identity (i.e., memory address accessible via id()), a type (accessible via type()), and a value. Objects can be mutable or immutable.
Mutable: their internal state can be changed after creation.
Immutable: their internal state cannot be changed after creation.
47
Class
A class is an object's data type that bundles data and functionality together.
Example:
magic = 'HOCUS POCUS'
print(type(magic))
output: <class 'str'>
Because this variable (magic) belongs to the string class, it behaves in a certain way and has a lot of built-in functionality reserved for strings.
Python Data Types:
-Numeric: Integer, Float, Complex Number
-Dictionary
-Boolean
-Set
-Sequence Type: String, Tuple, List
Dictionary: a collection of data values, used to store data values like a map. It holds key: value pairs. Each key is separated from its value by a colon; each key-value pair is separated from the next by a comma. Example: d1 = dict({1: 'Geeks', 2: 'For', 3: 'Geeks'})
Set: unordered collection of data types that is iterable, mutable, and has no duplicate elements.
Sequence Type: an ordered collection of items
Note:
int (Integer): Represents whole numbers without a fractional part (e.g., 5, -10). They can be of arbitrary size.
float (Floating-point): Represents real numbers with a fractional part, typically implemented as 64-bit double-precision floating points (e.g., 3.14, -0.5, 1.2e5). They have a limited precision.
complex (Complex Number): Represents numbers with both a real and an imaginary part (e.g., 1 + 2j, -3j, 4.5 + 0.1j). The imaginary part is denoted by the suffix j or J.
List: an ordered collection that allows duplicate elements
Set: an unordered collection that enforces unique elements
48
.swapcase()
Swaps the case of every letter in a string.
Example:
magic = 'HOCUS POCUS'
magic = magic.swapcase()
magic
output: 'hocus pocus'
49
.replace()
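Replaces every occurrence of one substring with another. The match is case-sensitive, so 'cus' does not match 'CUS'.
Example:
magic = 'hocus pocus'
magic = magic.replace('cus', 'key')
magic
output: 'hokey pokey'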
50
.split()
Splits a string into a list of substrings.
Example:
magic = 'hokey pokey'
magic = magic.split()
magic
output: ['hokey', 'pokey']
With no argument, .split() tells Python to “split on any run of whitespace”.
51
Method
A function that belongs to a class and typically performs an action or operation.
Examples of Methods: .swapcase(), .replace(), .split()
52
Jupyter Notebook, hot key to get a list of methods for any given class
Type a dot (.) after the object, then press the Tab key
53
Dot Notation, Jupyter
How to access the methods and attributes that belong to an instance of a class
54
Core Python Classes
Integers, Floats, Strings, Booleans, Lists, Dictionaries, Tuples, Sets, Frozensets, Functions, Ranges, None
55
Attribute
A value associated with an object or class which is referenced by name using dot notation.
*An attribute is a characteristic of an object
Example: You have a data frame with 8 rows and 3 columns with the column headers: planet, radius_km, and moons.
planets.shape
output: (8, 3) (because 8 rows and 3 columns)
planets.columns
output: Index(['planet', 'radius_km', 'moons'], dtype='object')
56
Value
The value of an object is the specific data or content that the object holds. It is what you get when you interact with an object in operations, comparisons, or output.
Examples:
-an integer object may have the value 5
-a string object may have the value "hello"
-a list object may have the value [1, 2, 3]
57
Function vs Method
Function: A function is a standalone block of code that is not associated with any particular class or object. It is defined independently and can be called directly by its name. Functions typically take input arguments, process them, and may return a value.
Python Example:
def greet(name):
    return f"Hello, {name}!"

message = greet("Alice")
print(message)
Method: A method is a function that is defined inside a class and is associated with instances (objects) of that class. Methods are called on an object using dot notation (e.g., object.method()) and have access to the object's data through the self parameter (which refers to the instance itself).
*A method is an action or operation
Python Example:
class Dog:
    def __init__(self, name, breed):
        self.name = name
        self.breed = breed

    def bark(self):
        return f"{self.name} says Woof!"

my_dog = Dog("Buddy", "Golden Retriever")
print(my_dog.bark())
58
Keyword
A special word that is reserved for a specific purpose and that can only be used for that purpose.
Examples: in, not, or, for, while, return
*Never use keywords when naming variables
Keywords change color in most coding environments. Rule of thumb: if you're naming a variable and the word changes color, don't use that word. Python won't let you use a keyword for a variable name. You'll get an error.
59
Variable Naming Conventions
Don't use reserved keywords like or, in, if, else, etc.
Don't use the names of built-in functions such as print, str, etc. Python allows it, but the variable then hides the built-in function.
60
Naming restrictions & conventions for variables
-Only include letters, numbers, and underscores
-Must start with a letter or underscore
-Case-sensitive
-Cannot include parentheses
-Names cannot contain spaces
-Names may be a mixture of upper and lower case characters
-Names can't start with a number but may contain numbers after the first character
-Variable names and function names should be written in snake_case, which means that all letters are lowercase and words are separated using an underscore
Descriptive names are better than cryptic abbreviations because they help other programmers (and you) read and interpret your code. For example, student_name is better than sn. It may feel excessive when you write it, but when you return to your code you’ll find it much easier to understand.
61
Operators
Symbols that perform operations on objects and values.
Examples:
+ Addition
- Subtraction
* Multiplication
/ Division
** Exponentiation
% Modulo (returns the remainder after a division). Example: 10 % 3 = 1
// Floor division (divides the first operand by the second operand and rounds the result down to the nearest integer). Examples:
7 // 3 # 2 (an integer, rounded down from 2.333)
-7 // 3 # -3 (an integer, rounded down from -2.333)
> Greater than (returns a Boolean of whether the left operand is greater than the right operand)
< Less than (returns a Boolean of whether the left operand is less than the right operand)
== Equality (returns a Boolean of whether the left operand is equal to the right operand)
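
A short sketch that runs several of these operators together. The variable names are illustrative; the last line checks how floor division and modulo relate:

```python
quotient = 7 // 3       # floor division rounds down
remainder = 7 % 3       # what is left over after the division
negative_floor = -7 // 3  # rounds down, i.e. toward negative infinity
true_division = 7 / 2   # / always produces a float
power = 2 ** 10

# Floor division and modulo fit together like this:
rebuilt = 3 * quotient + remainder

print(quotient, remainder, negative_floor, true_division, power, rebuilt)
```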
62
Expressions
A combination of numbers, symbols, and variables that computes and returns a result upon evaluation.
Example: [1, 2, 3] + [2, 4, 6] evaluates to [1, 2, 3, 2, 4, 6]
63
Functions
A group of related statements that performs a task and returns a value.
def to_celsius(x):
    '''Convert Fahrenheit to Celsius'''
    return (x - 32) * 5 / 9

to_celsius(75)
output: approximately 23.89
64
Conditional Statements
Sections of code that direct program execution based on specified conditions:
number = -4
if number > 0:
    print('Number is positive.')
elif number == 0:
    print('Number is zero.')
else:
    print('Number is negative.')
65
Strings
A sequence of characters and punctuation that contains textual information. Enclosed in either single or double quotation marks.
66
Immutable Data Type
A data type whose values can never be altered or updated after they are created (e.g., strings and tuples)
67
Integer
A data type used to represent whole numbers without fractions.
68
Float
A data type that represents numbers that contain decimals
69
Implicit Conversion/Typecasting
Python converts one data type to another without user involvement
70
Explicit Conversion
The user converts the data type of an object to a required type, using a function such as int(), float(), or str()
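
The two kinds of conversion can be sketched side by side. The values are made up for illustration:

```python
# Implicit: Python converts the int 3 to a float on its own.
total = 3 + 0.5

# Explicit: the programmer asks for the conversion.
tickets = int("59")           # string to integer
label = str(59) + " tickets"  # integer to string
truncated = int(59.9)         # float to integer drops the fraction

print(total, tickets, label, truncated)
```

Note that int() on a float truncates rather than rounds.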
71
Argument
Information given to a function inside its parentheses when the function is called
72
Practice: Complete the curse function. It accepts a weapon_damage parameter and returns two values:
-The lesser_cursed damage: reduce the input weapon_damage from 100% to 50% (50% reduction).
-The greater_cursed damage: reduce the input weapon_damage from 100% to 25% (75% reduction).
def curse(weapon_damage):
    pass
def curse(weapon_damage):
    lesser_cursed = weapon_damage * 0.5
    greater_cursed = weapon_damage * 0.25
    return lesser_cursed, greater_cursed
73
Stack Trace or Traceback
An error message that the Python interpreter prints to the console when it encounters certain problems. They're most common when you try to run invalid code. The purpose of a "trace" is to show us the path that the Python interpreter took through our code before it encountered the error, which can help us figure out what went wrong.
Example:
PythonError: Traceback (most recent call last):
  File "", line 17, in
  File "", line 1, in
  File "/home/pyodide/main.py", line 3
    msg = f"You have {strength} strength, {wisdom} wisdom, and {dexterity} dexterity for a total of {total} stats.
    ^
IndentationError: unindent does not match any outer indentation level
Reading it piece by piece:
PythonError: Traceback (most recent call last):
This is a standard header that's just letting us know that a Python traceback is what we're looking at.
File "", line 17, in and File "", line 1, in
This is the start of the "trace". These strange "" and "" files don't really exist; the Python interpreter is letting us know about them because they have to do with how your code is executed in a virtual browser-based environment.
File "/home/pyodide/main.py", line 3
Now we're getting to the real meat of the error message! In this case, the interpreter was executing the code in the main.py file, and it got to line 3 before it encountered the error.
msg = f"You have {strength} strength, {wisdom} wisdom, and {dexterity} dexterity for a total of {total} stats.
This is the line of code that caused the error.
IndentationError: unindent does not match any outer indentation level
This is the type of error that was raised. In this case, it's an IndentationError, which means that the Python interpreter was expecting a certain amount of indentation (whitespace at the beginning of the line) but it didn't get what it was expecting.
Don't be fooled! The proper amount of indentation in Python is 4 spaces (or one Tab keystroke). In this case, line 2 is actually indented 6 spaces, which is why the interpreter is confused. Fix line 2. Run the code again. You should see another error; this time the last few lines are something like:
    msg = f"You have {strength} strength, {wisdom} wisdom, and {dexterity} dexterity for a total of {total} stats.
    ^
SyntaxError: unterminated string literal (detected at line 3)
Now we have a SyntaxError, which is just a more general type of error related to invalid code. Take a close look at line 3 and fix the problem.
74
exponents
**
5**0=1
5**1=5
5**2=25
5**3=125
75
+= -= *= /=
In-place operators. In many languages, incrementing a number variable by 1 is written with ++. Python has no ++ operator; we use the += in-place operator instead.
star_rating = 4
star_rating += 1
# star_rating is now 5
Other Operators
The other in-place operators work similarly:
star_rating = 4
star_rating -= 1
# star_rating is now 3
star_rating = 4
star_rating *= 2
# star_rating is now 8
star_rating = 4
star_rating /= 2
# star_rating is now 2.0
76
Scientific Notation
A way of expressing numbers that are too large or too small to conveniently write normally. In a nutshell, the number following the e specifies how many places to move the decimal to the right for a positive number, or to the left for a negative number. You can add the letter e or E followed by a positive or negative integer to specify that you're using scientific notation.
print(16e3)
# Prints 16000.0
print(7.1e-2)
# Prints 0.071
77
Underscores for Readability
Python also allows you to represent large numbers in the decimal format using underscores as the delimiter instead of commas to make it easier to read. num = 16_000 print(num) # Prints 16000 num = 16_000_000 print(num) # Prints 16000000
78
Logical and operator vs or operator
The logical and operator requires that both inputs are True to return True. The logical or operator only requires that at least one input is True to return True. For example: True and True == True True and False == False False and False == False True or True == True True or False == True False or False == False
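A quick way to check these rules is to print every combination of inputs. This is only a minimal sketch; the names a, b, and rows are chosen here for illustration.

```python
# Collect and print the result of `and` and `or` for every pair of inputs.
rows = []
for a in (True, False):
    for b in (True, False):
        rows.append((a, b, a and b, a or b))
        print(a, b, "| and:", a and b, "| or:", a or b)
```

Only the first row (True, True) gives True for and, while only the last row (False, False) gives False for or.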
79
not operator
the not operator reverses the result. It returns False if the input was True and vice-versa. print(not True) # Prints: False print(not False) # Prints: True
80
How do you write an integer in Python using binary syntax?
You can write an integer in Python using binary syntax using the 0b prefix: print(0b0001) # Prints 1 print(0b0101) # Prints 5
81
Bitwise Operation
Example: 0101 & 0111 = 0101 A 1 in binary is the same as True, while 0 is False. So really a bitwise operation is just a bunch of logical operations that are completed in tandem by column. Ampersand & is the bitwise AND operator in Python. "AND" is the name of the bitwise operation, while ampersand & is the symbol for that operation. For example, 5 & 7 = 5, while 5 & 2 = 0. 0101 is 5 0010 is 2 0101 & 0010 = 0000
82
int()
Python is able to convert some data types from one to another. Using the int() function on a variable that stores data as a float will convert the data from a floating point number to an integer. int() returns a new value rather than changing the variable, so assign the result back to keep it. adult_tickets_sold = 59.0 type(adult_tickets_sold) output: float adult_tickets_sold = int(adult_tickets_sold) type(adult_tickets_sold) output: int
83
Function
A body of reusable code for performing specific processes or tasks
84
def
A keyword that defines a function at the start of the function block
85
return
a reserved keyword in Python that makes a function produce new results, which are saved for later use. Unlike print, return lets us store a value in a variable. Example: def area_triangle(base, height): return base * height / 2
86
Reusability
Defining code once and using it many times without having to rewrite it.
Example: The below script uses the len function, which returns the length of an object.
name = "Marisol"
number = len(name)*9
print("Hello " + name + ". Your lucky number is " + str(number))
name = "Ardashir"
number = len(name)*9
print("Hello " + name + ". Your lucky number is " + str(number))
As there is duplicated code, it's best practice to rewrite it and move all the duplicated code into just one function:
def lucky_number(name):
    number = len(name)*9
    print("Hello " + name + ". Your lucky number is " + str(number))
87
*Modularity
The ability to write code in separate components that work together and can be reused for other programs.
88
Refactoring
The process of restructuring code while maintaining its original functionality Example: Code Before Refactoring: #Using the Force formula F=ma while True: mass = int(input("Enter the mass value: ")) if mass > 0: break while True: acceleration = int(input("Enter the acceleration: ")) if acceleration > 0: break print("The Force is", mass * acceleration) Notice how the two blocks look nearly identical. What if we’re calculating the “Work done” by a person or machine? Most likely, we’d copy and paste the while loop, changing the variable name and input prompt. Now let's use a function to refactor this. Code After Refactoring: #REFACTORED code def input_positive_integer(prompt): while True: input_value = int(input(prompt)) if input_value > 0: return input_value mass = input_positive_integer("Enter the mass: ") acceleration = input_positive_integer("Enter the acceleration: ") print("The Force is", mass * acceleration) Now that our code is simpler to understand, we can start to save some lines of code. What's more, in the future, we'll be able to effortlessly insert our function into another script by just calling out the function, saving us time.
89
Self-documenting code
Code written in a way that is readable and makes its purpose clear
90
Algorithm
A set of instructions for solving a problem or accomplishing a task
91
Docstring
A string at the beginning of a function's body that summarizes the function's behavior and explains its arguments and return values The docstring should be in the form of a command. It should summarize the function's behavior and explain its argument and return values. It should be indented four spaces from the definition statement. Example: ''' Calculate the number of kilograms of grass needed for a border around a square fountain. '''
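To show where a docstring sits, here is a minimal sketch that reuses the area_triangle function from the return card. The wording of the docstring itself is an assumption written for this illustration.

```python
def area_triangle(base, height):
    """
    Calculate the area of a triangle.

    base: length of the base
    height: height measured at a right angle to the base
    Returns the area as a number.
    """
    return base * height / 2


print(area_triangle(4, 3))     # 6.0
print(area_triangle.__doc__)   # the docstring is stored on the function
```

Calling help(area_triangle) would display the same text.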
92
print vs return
a return statement is like your brother going to the market and bringing you back a bag of potatoes. A print statement is like your brother going to the market, coming home, and telling you what kind of potatoes were for sale.
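The same difference in code, as a minimal sketch. The function names add_and_return and add_and_print are made up for this illustration.

```python
def add_and_return(a, b):
    # The caller gets the value back and can store or reuse it.
    return a + b


def add_and_print(a, b):
    # The value is only shown on screen; the function itself returns None.
    print(a + b)


total = add_and_return(2, 3)
nothing = add_and_print(2, 3)   # prints 5

print(total)     # 5
print(nothing)   # None
```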
93
Functions vs Methods
Functions and methods are very similar, but there are a few key differences. Methods are a specific type of function. They are functions that belong to a class. This means that you can use them (or "call" them) by using dot notation. Method example: my_string = "The eagles filled the sky." my_string.split() The split method is a function that belongs to the string class. It splits strings on their whitespaces. Standalone functions do not belong to a particular class and can often be used on multiple classes. Function example: >>> sum([6, 3]) 9
94
Logical Operators
Operators that connect multiple statements together and perform more complex comparisons. Python Logical Operators: and or not
95
Comparators
Operators that compare two values and produce Boolean values (True/False). Python Comparators: Greater than > Greater than or equal to >= Less than < Less than or equal to <= Equal to == Not equal to != *If you try to compare data types that aren’t compatible, like checking if a string is greater than an integer, Python will throw a TypeError
96
Arithmetic Operators
Addition + Subtraction - Division / Modulo (remainder of division) % Exponentiation ** Floor division (the number of times the denominator can go into the numerator) //
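A short sketch of each operator on the same pair of numbers. The values 7 and 2 are arbitrary choices for this illustration.

```python
a, b = 7, 2

print(a + b)    # 9
print(a - b)    # 5
print(a / b)    # 3.5  (division always gives a float)
print(a % b)    # 1    (remainder)
print(a ** b)   # 49   (7 squared)
print(a // b)   # 3    (floor division)
```

Floor division and modulo fit together: (a // b) * b + (a % b) gives back a.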
97
Branching
The ability of a program to alter its execution sequence Branching uses if statements based on certain conditions Example: def is_even(number): if number % 2 == 0: return True return False Summary: An if statement branches the execution based on a specific condition being True and the else statement sets a piece of code to run only when the condition of the if statement is false.
98
if
A reserved keyword that sets up a condition in Python
99
if statements/conditional statements
Example: def hint_username(username): if len(username) < 8: print('Invalid username. Must be at least 8 characters.')
100
else
A reserved keyword that executes when preceding conditions evaluate as False Example 1: def hint_username(username): if len(username) < 8: print('Invalid username. Must be at least 8 characters long.') else: print('Valid username.') Example 2: def hint_username(username): if len(username) < 8: print('Invalid username. Must be at least 8 characters long.') else: if len(username) > 15: print('Invalid username. Cannot exceed 15 characters.') else: print('Valid username.')
101
Modulo
% An operator that returns the remainder when one number is divided by another. Even numbers are all multiples of 2, so if you divide by 2, the remainder will always be 0 if the number is even. Examples: 5 % 2 output: 1 11 % 3 output: 2 10 % 2 output: 0
102
Example of if statement
def is_even(number): if number % 2 == 0: return True return False *You don't have to put an else statement there. You can but you don't have to do that. When the if statement evaluates to true, the code indented beneath it will be executed. If it evaluates to false, nothing beneath it will execute. But keep in mind, this technique can only be used when returning a value inside of the if statement.
103
elif
a reserved keyword that checks a subsequent condition when the previous conditions are not true. elif allows us to handle an unlimited number of comparison cases. It also helps us avoid unnecessary nesting.
Example:
def hint_username(username):
    if len(username) < 8:
        print("Invalid username. Must be at least 8 characters long.")
    elif len(username) > 15:
        print("Invalid username. Cannot exceed 15 characters.")
    else:
        print("Valid username.")
*Notes:
-elif is used to specify an alternative condition to check if the first condition is false.
-You can have any number of elif statements in your code.
-The else statement is used to specify what code to execute if both the if statement and any subsequent elif statements are false.
Step 1: The function checks whether the username is less than 8 characters long. If yes, it prints "Invalid username. Must be at least 8 characters long." If that's not the case (the username has at least 8 characters), it goes to step 2.
Step 2: The function then checks to see if the username has more than 15 characters. If yes, it prints the invalid message. If no, it goes to step 3.
Step 3: The function prints "Valid username."
104
Omitting else
Often, you'll find that there is no need to use an else statement, because it is superfluous in the logical context of your code. Consider this example: def greater_than_ten(x): if x > 10: return True else: return False print(greater_than_ten(15)) print(greater_than_ten(2)) The above code is the same as the below code: def greater_than_ten(x): if x > 10: return True return False print(greater_than_ten(15)) print(greater_than_ten(2))
105
Key Notes about Conditional Statements in Python
The elif and else statements are optional. You can have an if statement by itself. You can have multiple elif statements. You can only have one else statement, and only at the end of your logic block. The conditions must be an expression that evaluates to a Boolean value (True or False). Indentation matters! The code associated with each conditional statement must be indented below it. The typical convention for data professionals is to indent four spaces. Indentation mistakes are one of the most common causes of unexpected code behavior.
106
while loop
A loop that instructs your computer to continuously execute your code based on the value of a condition. Operates similarly to branching if statements, BUT the body of the block in the while loop can be executed multiple times instead of just once. While loops are useful because they allow you to perform an action or evaluation repeatedly until a given condition or requirement is met, and then they stop. Example: x = 0 while x < 5: print('Not there yet, x = ' + str(x)) x = x + 1 print('x = ' + str(x)) Output: Not there yet, x = 0 Not there yet, x = 1 Not there yet, x = 2 Not there yet, x = 3 Not there yet, x = 4 x = 5 *The last line is x = 5 because the final print statement runs only after the loop ends, which first happens when x is no longer < 5, that is, when x = 5
107
loop
a block of code used to carry out iterations. Loops are used to automate repetitive tasks.

Example:
import random
number = random.randint(1, 25)
number_of_guesses = 0
while number_of_guesses < 5:
    print('Guess a number between 1 and 25: ')
    guess = input()
    guess = int(guess)
    number_of_guesses += 1
    if guess == number:
        break
    elif number_of_guesses == 5:
        break
    else:
        print('Nope! Try again.')
if guess == number:
    print('Correct! You guessed the number in ' + str(number_of_guesses) + ' tries!')
else:
    print('You did not guess the number. The number was ' + str(number) + '.')

The above code WITH comments:
# import the random package so you can generate a random number (this is one of the package's many functions)
import random
# use the random package's random.randint() function to generate a random number between 1 and 25, inclusive
number = random.randint(1, 25)
# create a counter to store the number of loops that are run; start the counter at 0
number_of_guesses = 0
while number_of_guesses < 5:
    print('Guess a number between 1 and 25: ')
    # set a line of code so the user can input their guess
    guess = input()
    # change the user's input to an integer; if they enter five (string) it won't match the integer 5, so all inputs must be changed to integers to get a match
    guess = int(guess)
    # every time the user guesses, increase the number of guesses by 1 so that the user is not allowed to guess more than 5 times
    # note: number_of_guesses = number_of_guesses + 1 can be written as number_of_guesses += 1
    number_of_guesses += 1
    # check if the guess is correct; if it is, the while loop breaks
    if guess == number:
        break
    # if the guess is not right and it's the fifth time they've guessed, the while loop breaks
    elif number_of_guesses == 5:
        break
    else:
        print('Nope! Try again.')
# if the guess is right, the print statement below is activated
if guess == number:
    print('Correct! You guessed the number in ' + str(number_of_guesses) + ' tries!')
# if the guess is incorrect and they've guessed five times, the print statement below is activated
else:
    print('You did not guess the number. The number was ' + str(number) + '.')
108
Iteration
The repeated execution of a set of statements, where one iteration is the single execution of a block of code.
109
Iterable
An object that's looped, or iterated, over
110
Initializing
The action of giving an initial value to a variable. For example: x = 0
111
random package
to use: import random It has many functions, one of which generates a random number.
112
To instantiate a value
Instantiate a variable: assign a specific value to a variable. Example: we instantiate a variable called the number of guesses and assign it a value of 0. It will behave as a counter. number_of_guesses = 0
113
A counter
A counter counts. You can use it as a mechanism to know what iteration of the loop you're on. It keeps count of how many times the program iterates through the loop.
114
What does number_of_guesses += 1 mean?
number_of_guesses = number_of_guesses + 1
115
break
A keyword that lets you escape a loop without triggering any else statement that follows it in the loop.
Example:
import random
number = random.randint(1, 25)
number_of_guesses = 0
while number_of_guesses < 5:
    print('Guess a number between 1 and 25: ')
    guess = input()
    guess = int(guess)
    number_of_guesses += 1
    if guess == number:
        break
    elif number_of_guesses == 5:
        break
    else:
        print('Nope! Try again.')
if guess == number:
    print('Correct! You guessed the number in ' + str(number_of_guesses) + ' tries!')
else:
    print('You did not guess the number. The number was ' + str(number) + '.')
116
Condition
The Boolean expression that is evaluated at the beginning of each iteration of the loop. If the condition is true, the code block executes. After the code block executes, the condition is evaluated again. This process continues until the condition is false, at which point the loop terminates and the program continues with the next statement after the loop.
117
Infinite Loops
If you make a mistake with your logic or syntax, your loop could become an infinite loop that never terminates. In the below example, if x = x*2 were not indented to be in the body of the while loop, the loop would reach the print statement and cycle back to check the conditional statement, which would still be true because the value of x would never change from one.
x = 1
while x < 100:
    print(x)
    x = x*2
To stop an infinite loop, interrupt the kernel.
Press the Stop button at the top of the console
Then go to the Menu Bar > Kernel > Interrupt
Then, while in command mode, press i twice
118
Continue statement
Use the continue statement to skip to the next iteration of the loop without executing the rest of the code in the current iteration. Here's an example:
i = 0
while i < 10:
    if i % 3 != 0:
        print(i)
        i += 1
        continue
    i += 1
This example is a loop that prints all the numbers from zero through 9 that are not divisible by three. For each iteration of the loop, the program:
1. Checks if i is less than 10.
2. If it is, then the program uses the modulo operator to check if i is evenly divisible by three.
3. If it is not, then the program prints i, increments the value of i by one, and then cycles back to the beginning to check that i is less than 10. This happens because of the continue statement. The final i += 1 does not execute, thus avoiding a double incrementation of i.
4. But if step 2 evaluates i as evenly divisible by three, nothing in the if block executes (so there's no print statement) and i is incremented by one.
5. Repeats until i becomes 10.
119
Bitwise and operator
and & 0101 is 5 0111 is 7 0101 & 0111 = 0101 A 1 in binary is the same as True, while 0 is False. So really a bitwise operation is just a bunch of logical operations that are completed in tandem by column. 0 & 0 = 0 1 & 1 = 1 1 & 0 = 0 Ampersand & is the bitwise AND operator in Python. "AND" is the name of the bitwise operation, while ampersand & is the symbol for that operation. For example, 5 & 7 = 5, while 5 & 2 = 0. 0101 is 5 0010 is 2 0101 & 0010 = 0000
120
Bitwise or operator
or | 0101 is 5 0111 is 7 0101 | 0111 = 0111 A 1 in binary is the same as True, while 0 is False. So a bitwise operation is just a bunch of logical operations that are completed in tandem. When two binary numbers are "or"ed together, the result has a 1 in any place where either of the input numbers has a 1 in that place. | is the bitwise "or" operator in Python. 5 | 7 = 7 and 5 | 2 = 7 as well! 0101 is 5 0010 is 2 0101 | 0010 = 0111
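The bitwise AND and OR results above can be checked directly with binary literals. This is a minimal sketch; the variable names five, two, and seven are chosen here for illustration.

```python
five = 0b0101
two = 0b0010
seven = 0b0111

print(five & seven)   # 5
print(five | seven)   # 7
print(five & two)     # 0
print(five | two)     # 7

# format(..., "04b") shows the result as a 4-digit binary string.
print(format(five | two, "04b"))   # 0111
```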
121
Binary notation How to convert a number from binary (base-2) to decimal (base-10)
Places: 128 64 32 16 8 4 2 1
Example: 10110101
To convert to decimal, calculate the product of each place and the binary digit in that place:
128  64  32  16   8   4   2   1
  *   *   *   *   *   *   *   *
  1   0   1   1   0   1   0   1
=
128   0  32  16   0   4   0   1
Then calculate the sum: 128 + 0 + 32 + 16 + 0 + 4 + 0 + 1 = 181
So 10110101 is 181
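The same place-by-place calculation can be sketched in a loop. The names bits, places, and total are chosen here for illustration; int(bits, 2) and the 0b prefix are Python's built-in ways to get the same answer.

```python
bits = "10110101"
places = [128, 64, 32, 16, 8, 4, 2, 1]

# Multiply each place value by its binary digit and add up the products.
total = 0
for place, bit in zip(places, bits):
    total += place * int(bit)

print(total)          # 181
print(int(bits, 2))   # 181
print(0b10110101)     # 181
```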
122
How does return work in an if block
For example, in this code: def show_status(boss_health): if boss_health > 0: print("Ganondorf is alive!") return print("Ganondorf is unalive!") if boss_health is greater than 0, then this will be printed: Ganondorf is alive! Otherwise, this will be printed: Ganondorf is unalive! Without a return in the if block, Ganondorf is unalive would always be printed: def show_status(boss_health): if boss_health > 0: print("Ganondorf is alive!") print("Ganondorf is unalive!") This code could print both messages: Ganondorf is alive! Ganondorf is unalive! When you only want code within an if block to run, use return to exit the function early.
123
If-Else Statements What are the rules for elif and else statements?
An if statement can be followed by zero or more elif (which stands for "else if") statements, which can be followed by zero or one else statements. For example: if score > high_score: print("High score beat!") elif score > second_highest_score: print("You got second place!") elif score > third_highest_score: print("You got third place!") else: print("Better luck next time") Notes: You can't have an elif or an else without an if You can have an else without an elif
124
Example of the continue statement
continue means "go directly to the next iteration of this loop." Whatever else was supposed to happen in the current iteration is skipped. Let's say we want to print all the numbers from 1 to 50, but skip every 7th number. We can use continue to do this, by keeping track of a counter: Remember, `range` is inclusive of the start, but exclusive of the end counter = 0 for number in range(1, 51): counter = counter + 1 if counter == 7: counter = 0 # Reset the counter continue # Skip this number print(number) What we'll see printed are all the numbers from 1 to 50, except for 7, 14, 21, 28, 35, 42, and 49.
125
for loop
A piece of code that iterates over a sequence of values Example: for x in range(5): print(x) output: 0 1 2 3 4
126
range()
A Python function that returns a sequence of numbers starting from zero, increments by 1 by default, and stops before the given number. range function: 1. A range of numbers will start with the value 0 by default 2. The last number generated will be one less than the given value
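A minimal sketch of the one-, two-, and three-argument forms. Wrapping each range in list() is only there to make the generated numbers visible.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]
print(list(range(3, 0, -1)))   # [3, 2, 1]
```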
127
When do you use while loops?
When you want to repeat an action until a Boolean condition changes.
128
for loop vs while loop
For loops are like while loops, but instead of looping continuously until a condition is met, for loops iterate over each element of an iterable sequence, allowing you to perform an action or evaluation with each iteration. For loop use case: Data professionals use for loops to process data. for loop example: for x in range(5): print(x) output: 0 1 2 3 4 while loop example: num = 0 while num < 3: num += 1 print(num) output: 1 2 3 loop stops when num >= 3
129
Nested Loop
A loop inside another loop
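A minimal sketch of a nested loop. The names row, col, and pairs are made up for this illustration.

```python
pairs = []
for row in range(1, 4):        # outer loop: 1, 2, 3
    for col in range(1, 3):    # inner loop runs in full for every row: 1, 2
        pairs.append((row, col))
        print(row, col)
```

The inner loop body runs 3 * 2 = 6 times in total.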
130
accumulator
a variable that stores the results of an operation performed on a set of data.
Example: new_list = [] acts as an accumulator in the below code:
def number_items(items):
    if items == []:
        return []
    new_list = []
    for i in range(len(items)):
        new_item = str(i + 1) + ". " + items[i]
        new_list.append(new_item)
    return new_list
131
What's the boolean value of an empty string? What's the boolean value of a non-empty string?
empty string: False non-empty string: True
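This can be checked with bool(). A minimal sketch; the variable name is an assumption for illustration.

```python
print(bool(""))        # False
print(bool("hello"))   # True
print(bool(" "))       # True: a space is still a character

name = ""
if not name:
    print("No name given")
```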
132
Prime Number
A natural number greater than 1 that has no positive divisors other than 1 and itself. natural number: a positive whole number Examples: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
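One way to test the definition in code, as a minimal sketch. The function name is_prime is made up here, and checking divisors only up to the square root of n is one common approach, not the only one.

```python
def is_prime(n):
    # Numbers below 2 are not prime by definition.
    if n < 2:
        return False
    # If n has a divisor other than 1 and itself, one is <= sqrt(n).
    for divisor in range(2, int(n ** 0.5) + 1):
        if n % divisor == 0:
            return False
    return True


primes = []
for n in range(30):
    if is_prime(n):
        primes.append(n)

print(primes)   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```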
133
How to declare lists in python?
Lists in Python are declared using square brackets, with commas separating each item: inventory = ["Iron Breastplate", "Healing Potion", "Leather Scraps"]
134
What can be stored in a list?
Strings, numbers, booleans, and any other Python object (including other lists)
135
How do you calculate the length of a list?
Use the len() function.
Example:
fruits = ["apple", "banana", "pear"]
length = len(fruits)
# 3
The length of the list is equal to the number of items present. Don't be fooled by the fact that the length is not equal to the index of the last element; in fact, it will always be one greater.
136
.append()
We can add values to the end of a list using the .append() method:
cards = []
cards.append("nvidia")
cards.append("amd")
# the cards list is now ['nvidia', 'amd']

my_list = [0, 1, 1, 2, 3]
variable = 5
my_list.append(variable)
print(my_list)
output: [0, 1, 1, 2, 3, 5]
137
.pop()
Pop removes the last element from a list and returns it for use. For example:
vegetables = ["broccoli", "cabbage", "kale", "tomato"]
last_vegetable = vegetables.pop()
# vegetables = ['broccoli', 'cabbage', 'kale']
# last_vegetable = 'tomato'
While .pop() typically removes the last item of a list, it can also be used to remove an item at a specific index. For example, vegetables.pop(1) would remove "cabbage" from the list. This can be useful when you need to remove items from other positions in your list.
138
Can you access a value in a list without citing the index number?
Yes. If you don't need the index number you can use the following syntax:
trees = ['oak', 'pine', 'maple']
for tree in trees:
    print(tree)
# Prints:
# oak
# pine
# maple
tree, the variable declared using the in keyword, directly accesses the value in the list rather than the index of the value. If we don't need to update the item and only need to access its value, this is a cleaner way to write the code.
139
What do you use a for loop for vs a while loop?
In Python, a for loop is a piece of code that iterates over a sequence of values, such as numbers in a list or characters in a string. A data professional can use a while loop to repeat a specific block of code until a condition is met.
140
Are strings immutable?
Yes. The characters in a string can never be altered or updated in place. But you can concatenate strings, and you can also multiply them; both operations create a new string. Example: danger = "Danger! " danger * 3 output: "Danger! Danger! Danger! "
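A minimal sketch of what immutability means in practice. The variable name louder is an assumption for illustration.

```python
danger = "Danger! "

# Trying to change a character in place raises a TypeError.
try:
    danger[0] = "d"
except TypeError as error:
    print("Cannot change a string in place:", error)

# Multiplying builds a new string; the original is untouched.
louder = danger * 3
print(louder)   # Danger! Danger! Danger! 
print(danger)   # Danger! 
```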
141
How do you include quotation marks in your string?
To include double quotation marks: quote = ' "Thank you for pressing the self-destruct button." ' print(quote) output: "Thank you for pressing the self-destruct button." For single quotation marks, do the same vice versa.
142
Escape Character
A character that changes the typical behavior of the characters that follow it. Example: The typical behavior of quotation marks is to start or end a string, but if you precede each with a backslash, they'll behave as regular punctuation marks. quote = "\"It's dangerous to go alone!\"" print(quote) output: "It's dangerous to go alone!"
143
\n
indicates a new line.
Example:
greeting = "Good day, \nsir."
print(greeting)
output:
Good day,
sir.
Example 2:
newline = "\\n represents a newline in Python."
print(newline)
output:
\n represents a newline in Python.
144
An object is iterable if you can _____________
sequence through all of its values or items.
145
Indexing
A way to refer to individual items within an iterable by their relative position. Indexing can be used on anything iterable, e.g.,: strings lists tuples most other iterable data types
146
index()
A string method that outputs the index number of a character in a string. Note: the index method just returns the first index that matches. Example: pets = "cats and dogs" pets.index("s") output: 3 *If you search for a value that is not there, e.g., z in the above example, you'll get a ValueError because the value was not found. Separately, if you use square brackets with an index number that is past the end of the string, you'll get an IndexError, indicating that the string index is out of range. Example (for a six-character string called name that ends in e): name[5] output: e name[6] output: IndexError
147
How do negative numbers in indexing work?
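Negative indices count backward from the end of the sequence: -1 is the last item, -2 is the second to last, and so on. Examples: sentence = "A man, a plan, a canal, Panama!" sentence[-1] output: "!" sentence[-2] output: "a"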
148
String Slice
A portion of a string, also known as a substring, that can contain more than one character. Examples: color = "orange" color[1:4] output: "ran" fruit = "pineapple" fruit[:4] output: "pine" fruit[4:] output: "apple"
149
In
A keyword that can be used to check whether or not a substring is contained in a string. Example:
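A minimal illustration, using a sentence chosen here only for demonstration:

```python
sentence = "The eagles filled the sky."

print("eagles" in sentence)       # True
print("Eagles" in sentence)       # False: matching is case-sensitive
print("clouds" not in sentence)   # True
```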
150
Which languages use one-based indexing?
R, Julia, and SAS (Meaning the index starts with the value 1.)
151
Is Python a zero-based indexing language?
Yup.
152
How does indexing with lists work?
Examples: my_list = [1, "unladen", "swallow"] print(my_list[1]) output: unladen print(my_list[-1]) output: swallow vs. indexing strings: my_string = "Mississippi half-step" print(my_string[0]) output: M new_string = "pining for the fjords" print(new_string[0:3]) print(new_string[:3]) output: pin pin new_string = "pining for the fjords" print(new_string[6:21]) print(new_string[6:]) print(len(new_string)) output: for the fjords for the fjords 21
153
Slicing
Slicing refers to accessing a range of elements from a sequence. Use square brackets containing two indices separated by a colon.
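A minimal sketch of slicing on a list and on a string. The list letters is made up for this illustration.

```python
letters = ["a", "b", "c", "d", "e"]

print(letters[1:4])   # ['b', 'c', 'd']  (index 1 up to, not including, 4)
print(letters[:2])    # ['a', 'b']       (start defaults to 0)
print(letters[3:])    # ['d', 'e']       (stop defaults to the end)

word = "orange"
print(word[1:4])      # ran
```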
154
Indexing and slicing allow you to _________
access specific elements or parts of a sequence
155
format()
The format() method formats and inserts specific substrings into designated places. This method belongs to the string class. Examples: name = "Manuel" number = 3 print("Hello {}, your lucky number is {}.".format(name, number)) output: Hello Manuel, your lucky number is 3. name = "Manuel" number = 3 print("Hello {name}, your lucky number is {num}.".format(num=number, name=name)) output: Hello Manuel, your lucky number is 3.
156
What does the .2 and f mean in the below script? print("Base price: ${:.2f} USD. \nWith tax: ${:.2f} USD.".format(price, with_tax)) output: Base price: $7.75 USD. With tax: $8.29 USD.
f: float (fixed-point notation) .2: 2 places beyond the decimal Note: If you put .0, only a whole number will print
157
What does the below function do precisely? def to_celsius(x): return (x-32) * 5/9 for x in range(0, 101, 10): print("{:>3} F | {:>6.2f} C".format(x, to_celsius(x)))
>3: right-aligns the Fahrenheit value in a field 3 characters wide. >6.2f: right-aligns the Celsius value in a field 6 characters wide, with 2 places beyond the decimal.
output (first three lines):
  0 F | -17.78 C
 10 F | -12.22 C
 20 F |  -6.67 C
158
What's the conventional maximum length for a single line of Python code?
79 characters Note: Enclosing your string in triple quotes lets you break the string over multiple lines so it's more readable. (Most of us don't have super wide monitors.) Example: x = "values" y = 100 print('''String formatting lets you insert {} into strings. They can even be numbers like {}.'''.format(x, y)) output: String formatting lets you insert values into strings. They can even be numbers like 100 Note: You can use """ or '''
159
Can you include arguments' index numbers within the braces to indicate which arguments get inserted in specific spots?
Yes. var_a = "A" var_b = "B" print("{1}, {0}".format(var_a, var_b)) print("{0}, {1}".format(var_a, var_b)) output: B, A A, B
160
Float Formatting Options
Syntax: {float: .2f} float variable: float colon: : precision: .2 presentation type: f Example: num = 1000.987123 f"{num:.2f}" output: 1000.99 num variable is rounded to two places beyond the decimal.
161
e f %
'e': Scientific notation. For a given precision p, formats the number in scientific notation with the letter ‘e’ separating the coefficient from the exponent. The coefficient has one digit before and p digits after the decimal point, for a total of p + 1 significant digits. With no precision given, e uses a precision of 6 digits after the decimal point for float, and shows all coefficient digits for decimal. 'f': Fixed-point notation. For a given precision p, formats the number as a decimal number with exactly p digits following the decimal point. '%': Percentage. Multiplies the number by 100 and displays in fixed ('f') format, followed by a percent sign. Example: num = 1000.987123 print(f'{num:.3e}') decimal = 0.2497856 print(f'{decimal:.4%}') output: 1.001e+03 24.9786%
162
str.count(sub[, start[, end]])
String method. Return the number of non-overlapping occurrences of substring sub in the range [start , end]. Example: my_string = 'Happy birthday' print(my_string.count('y')) print(my_string.count('y', 2, 7)) output: 2 1
163
str.find(sub)
Return the lowest index in the string where substring sub is found. Return -1 if sub is not found. my_string = 'Happy birthday' my_string.find('birth') output: 6
164
str.join()
Return a string which is the concatenation of the strings in iterable. The separator between elements is the string providing this method. Example: separator_string = ' ' iterable_of_strings = ['Happy', 'birthday', 'to', 'you'] separator_string.join(iterable_of_strings) Output: Happy birthday to you
165
str.partition(sep)
Split the string at the first occurrence of sep , and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings. example: my_string = 'https://www.google.com/' my_string.partition('.') Output: ('https://www', '.', 'google.com/')
166
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced. my_string = 'https://www.google.com/' my_string.replace('google', 'youtube') output: https://www.youtube.com/
167
str.split([sep])
Return a list of the words in the string, using sep (optional) as the delimiter string. If no sep is given, whitespace characters are used as the delimiter. Any number of consecutive whitespaces would indicate a split point, so ' ' (a single whitespace) would split the same way as '  ' (two or more whitespaces). my_string = 'Do you know the muffin man?' my_string.split() Output: ['Do', 'you', 'know', 'the', 'muffin', 'man?'] Example 1: url = 'https://exampleURL1.com/r626c36' protocol = "https:" store_id = url.split(".com/")[-1] print(store_id) output: r626c36
168
Regular Expressions (Regex)
Techniques that advanced data professionals use to modify and process string data. This program will not require you to use regular expressions in your work, but it’s important for you to be aware of the concept. Regex works by matching patterns in Python. It allows you to search for specific patterns of text within a string of text. Regex is used extensively in web scraping, text processing and cleaning, and data analysis. Syntax for a regex: import re pattern = 'regex_pattern' match = re.search(pattern, string) Example: import re my_string = 'Three sad tigers swallowed wheat in a wheat field' re.search('wall', my_string) Output: <_sre.SRE_Match object; span=(18, 22), match='wall'> This example returns a match object that contains information about the search. In this case, it tells you that the substring ‘wall’ does occur in the string from indices 18–22. Example 2: import re my_string = 'Three sad tigers swallowed wheat in a wheat field' re.search('[bms]ad', my_string) output: <_sre.SRE_Match object; span=(6, 9), match='sad'>
169
What is a quick way to see if a number is even?
X % 2 == 0 If true, it's even. If false, it's not. % is the modulo operator. It calculates the remainder after division. Example: 7 % 3 = 1 3 goes into 7 twice and 1 is left over.
170
How do you slice lists?
Syntax: my_list[ start : stop : step ] For example: scores = [50, 70, 30, 20, 90, 10, 50] # Display list print(scores[1:5:2]) # Prints [70, 20] The above reads as "give me a slice of the scores list from index 1, up to but not including 5, skipping every 2nd value". All of the sections are optional.
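Since every part of the slice is optional, a few variations on the same scores list may help. This is a small sketch, not an exhaustive list of slicing forms.

```python
scores = [50, 70, 30, 20, 90, 10, 50]

first_three = scores[:3]     # start omitted: begin at index 0
from_third = scores[3:]      # stop omitted: run to the end
every_other = scores[::2]    # only step given: indices 0, 2, 4, 6
reversed_copy = scores[::-1] # negative step walks backwards

print(first_three)    # [50, 70, 30]
print(from_third)     # [20, 90, 10, 50]
print(every_other)    # [50, 30, 90, 50]
print(reversed_copy)  # [50, 10, 90, 20, 30, 70, 50]
```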
171
How do you check whether or not a value exists in a list?
Use the in keyword to check for presence or not in to check for absence. examples: fruits = ["apple", "orange", "banana"] print("banana" in fruits) # Prints: True fruits = ["apple", "orange", "banana"] print("banana" not in fruits) # Prints: False
172
del
Python has a built-in keyword del that deletes items from objects. In the case of a list, you can delete specific indexes or entire slices. nums = [1, 2, 3, 4, 5, 6, 7, 8, 9] delete the fourth item del nums[3] print(nums) # Output: [1, 2, 3, 5, 6, 7, 8, 9] delete the second item up to (but not including) the fourth item nums = [1, 2, 3, 4, 5, 6, 7, 8, 9] del nums[1:3] print(nums) # Output: [1, 4, 5, 6, 7, 8, 9] delete all elements nums = [1, 2, 3, 4, 5, 6, 7, 8, 9] del nums[:] print(nums) # Output: []
173
Tuples
Tuples are collections of data that are ordered and unchangeable. You can think of a tuple as a List with a fixed size. Tuples are created with round brackets: my_tuple = ("this is a tuple", 45, True) print(my_tuple[0]) # this is a tuple print(my_tuple[1]) # 45 print(my_tuple[2]) # True While it's typically considered bad practice to store items of different types in a List, it's not a problem with Tuples. Because they have a fixed size, it's easy to keep track of which indexes store which types of data. Tuples are often used to store very small groups (like 2 or 3 items) of data. For example, you might use a tuple to store a dog's name and age. dog = ("Fido", 4) There is a special case for creating single-item tuples. You must include a comma so Python knows it's a tuple and not regular parentheses: dog = ("Fido",) Because Tuples hold their data, multiple tuples can be stored within a list. Similar to storing other data in lists, each tuple within the list is separated by a comma. When accessing a list of tuples, the first index selects which tuple you want to access, the second selects a value within that tuple. my_tuples = [ ("this is the first tuple in the list", 45, True), ("this is the second tuple in the list", 21, False) ] print(my_tuples[0][0]) # this is the first tuple in the list print(my_tuples[0][1]) # 45 print(my_tuples[1][0]) # this is the second tuple in the list print(my_tuples[1][2]) # False Tuple Unpacking You can easily assign the values of a tuple to variables using unpacking. dog = ("Fido", 4) dog_name, dog_age = dog print(dog_name) # Fido print(dog_age) # 4 When you return multiple values from a function, you're actually returning a tuple.
174
How to tell if a list has no elements?
len(list_name) == 0 or not list_name
Example: You can use either len(items) == 0 or not items in the below example.
def get_first_item(items):
    if len(items) == 0:
        return "ERROR"
    else:
        return items[0]
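For comparison, here is a sketch of the same function written with not items. It relies on the fact that an empty list is falsy; note that not items is also True for None, which len() would reject with a TypeError.

```python
def get_first_item(items):
    # An empty list is falsy, so "not items" is True when there are no elements.
    if not items:
        return "ERROR"
    return items[0]

empty_result = get_first_item([])
first_result = get_first_item(['sword', 'shield'])

print(empty_result)  # ERROR
print(first_result)  # sword
```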
175
Data Structures
Collections of data values or objects that contain different data types. They can contain data type elements like floats or strings.
176
NumPy
Numerical Python A library commonly used by data professionals, which is known for its high-performance computational power. It's used to rapidly process large quantities of data.
177
pandas
Python Data Analysis Library pandas makes analyzing data in the form of a table with rows and columns easier and more efficient, because it has tools specifically designed for the job.
178
List
A data structure that helps store and manipulate an ordered collection of items
179
Sequence
A positionally ordered collection of items
180
Lists vs Strings
Lists:
-Data structure
-Allow duplicate elements
-Allow indexing and slicing
-Sequence of elements (a float is a data type element)
-Mutable
*Lists are mutable because you can edit them in place without needing to overwrite the whole list (as you must do with strings).
Example:
power = [1.21, 'gigawatts']
power[0] = 2.21
print(power)
output: [2.21, 'gigawatts']
Strings:
-Data type
-Allow duplicate elements
-Allow indexing and slicing
-Sequence of characters
-Immutable
*A string itself can never be changed. The only way to "edit" one is to build a new string and reassign the variable to it. This is why strings are considered immutable.
Example:
power = '1.21'
power = power + ' gigawatts'
print(power)
output: 1.21 gigawatts
181
Mutability
the ability to change the internal state of a data structure after it has been created. Strings and tuples are immutable, while lists are mutable. (Strings have to be reassigned to be altered.)
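One way to see the difference is to give the same object two names. In this sketch (variable names are illustrative), a change to a list is visible through both names, while "changing" a string only rebinds one name to a new object.

```python
nums = [1, 2, 3]
alias = nums          # both names refer to the SAME list
nums.append(4)        # in-place change
print(alias)          # [1, 2, 3, 4]

text = 'abc'
other = text          # both names refer to the same string
text = text + 'd'     # builds a NEW string and rebinds only "text"
print(other)          # abc

try:
    text[0] = 'X'     # strings do not support item assignment
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)  # True
```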
182
Immutability
A data structure or element's values can never be altered or updated
183
append()
Method that adds an element to the end of a list Example: fruits = ['pineapple', 'banana', 'apple', 'melon'] fruits.append('kiwi') print(fruits) output: ['pineapple', 'banana', 'apple', 'melon', 'kiwi']
184
insert()
Method that takes an index as the first parameter and an element as the second parameter, then inserts the element into the list at the given index. Existing elements from that index onward shift one position to the right.
Example:
fruits = ['pineapple', 'banana', 'apple', 'melon']
fruits.insert(1, 'orange')
print(fruits)
output: ['pineapple', 'orange', 'banana', 'apple', 'melon']
185
remove()
A method that removes the first occurrence of an element from a list.
Example:
fruits = ['pineapple', 'banana', 'apple', 'melon']
fruits.remove('banana')
print(fruits)
output: ['pineapple', 'apple', 'melon']
Note: If you try to remove an element that is not in a list, you'll get a value error.
ValueError: list.remove(x): x not in list
186
pop()
A method that extracts an element from a list by removing it at a given index. Example: Remove orange fruits = ['mango', 'pineapple', 'orange', 'apple', 'melon', 'kiwi'] fruits.pop(2) print(fruits) output= ['mango', 'pineapple', 'apple', 'melon', 'kiwi']
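pop() also returns the element it removes, and with no argument it removes the last item. A short sketch using a shortened version of the fruits list:

```python
fruits = ['mango', 'pineapple', 'orange']

removed = fruits.pop(2)   # removes and returns the item at index 2
last = fruits.pop()       # no index: removes and returns the last item

print(removed)  # orange
print(last)     # pineapple
print(fruits)   # ['mango']
```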
187
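How do you replace a list item using its index number?
Assign a new value to that index. This overwrites the existing element rather than adding a new one (use append() or insert() to add).
Example:
fruits = ['mango', 'pineapple', 'orange', 'apple', 'melon', 'kiwi']
fruits[1] = 'mango'
print(fruits)
output: ['mango', 'mango', 'orange', 'apple', 'melon', 'kiwi']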
188
List Operations
Lists can be combined using the addition operator(+) and repeated using the multiplication operator(*) but cannot be subtracted or divided. Examples: Addition Operator (+) num_list = [1, 2, 3] char_list = ['a', 'b', 'c'] num_list + char_list output: [1, 2, 3, 'a', 'b', 'c'] Multiplication Operator(*) list_a = ['a', 'b', 'c'] list_a * 2 output: ['a', 'b', 'c', 'a', 'b', 'c']
189
Indexing and Slicing
Indexing: allows access to individual elements in a list. Example 1: phrase = ['Astra', 'inclinant', 'sed', 'non', 'obligant'] print(phrase[1]) output: inclinant Example 2: phrase = ['Astra', 'inclinant', 'sed', 'non', 'obligant'] print(phrase[-1]) output: obligant Slicing: enables the extraction of a sublist using a range of indices. Example 1: phrase = ['Astra', 'inclinant', 'sed', 'non', 'obligant'] print(phrase[1:4]) output: ['inclinant', 'sed', 'non'] Example 2: phrase = ['Astra', 'inclinant', 'sed', 'non', 'obligant'] print(phrase[:3]) print(phrase[3:]) output: ['Astra', 'inclinant', 'sed'] ['non', 'obligant']
190
List
A data structure in Python that stores an ordered collection of items, which can be of any data type, including integers, floats, strings, and even other lists.
191
List Methods
Python lists come with built-in methods such as append(), insert(), remove(), and pop() that facilitate various operations on lists
192
Mutability
Lists are mutable, meaning their contents can be changed after creation, allowing for dynamic data manipulation. Examples: You can change an individual item in a list by specifying its index and assigning a new value to it. my_list = ['Macduff', 'Malcolm', 'Duncan', 'Banquo'] my_list[2] = 'Macbeth' print(my_list) output: ['Macduff', 'Malcolm', 'Macbeth', 'Banquo'] You can change a slice of a list, too. my_list = ['Macduff', 'Malcolm', 'Macbeth', 'Banquo'] my_list[1:3] = [1, 2, 3, 4] print(my_list) output: ['Macduff', 1, 2, 3, 4, 'Banquo']
193
How do you create a list?
There are 2 main ways: 1) square brackets [] and 2) the list function: list() Examples: List of Strings: list_a = ['olive', 'palm', 'coconut'] print(list_a) output: ['olive', 'palm', 'coconut'] List of Integers: list_b = [8, 6, 7, 5, 3, 0, 8] print(list_b) output: [8, 6, 7, 5, 3, 0, 8] List of Mixed Data Types: list_c = ['Abidjan', 14.2, [1, 2, None], 'Zagreb'] print(list_c) output: ['Abidjan', 14.2, [1, 2, None], 'Zagreb'] *To create an empty list, use empty brackets or the list() function: empty_list_1 = [] empty_list_2 = list()
194
How do you check whether a value is contained in a list?
Use the in operator: Example: num_list = [2, 4, 6] print(5 in num_list) print(5 not in num_list) output: False True
195
clear()
Remove all items. Example: my_list = ['a', 'b', 'c'] my_list.clear() print(my_list) output: []
196
count()
Return the number of times an item occurs in the list. Example: my_list = ['a', 'b', 'c', 'a'] my_list.count('a') output: 2
197
sort()
Sort the list in place, ascending by default. (You can pass a function to decide the sorting criteria.)
Example:
char_list = ['b', 'c', 'a']
num_list = [2, 3, 1]
char_list.sort()
num_list.sort(reverse=True)
print(char_list)
print(num_list)
output:
['a', 'b', 'c']
[3, 2, 1]
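The card mentions that a function can decide the sorting criteria; that is done with the key argument. A small sketch, with an illustrative word list, that also contrasts sort() with the built-in sorted():

```python
words = ['banana', 'kiwi', 'apple']

# sorted() returns a new list and leaves the original alone.
alphabetical = sorted(words)

# sort() changes the list in place; key=len sorts by word length.
words.sort(key=len)

print(alphabetical)  # ['apple', 'banana', 'kiwi']
print(words)         # ['kiwi', 'apple', 'banana']
```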
198
Tuple
Pronounced: Too-pull
An immutable sequence that can contain elements of any data type. Similar to lists but more secure, because they can't be changed easily. They are helpful because they keep data that needs to be processed together in the same structure, and they let you store data of different types inside other data structures.
Use Cases:
-When you need to find some information but keep the data intact, you can use a tuple.
-When you need to store data of different types inside other data structures.
-Using a tuple makes it clear to your team that the sequence of values in it is not intended to be modified.
Example: A tuple that represents someone's full name. The first element of the tuple is their first name, the second their middle initial, and the last their last name.
fullname = ('Masha', 'Z', 'Hopper')
*Note: The position of each element is fixed in a tuple, so you can't add or change elements in place. Doing so will throw an error.
Example:
fullname = ('Masha', 'Z', 'Hopper')
fullname[2] = 'Copper'
print(fullname)
Output: TypeError: 'tuple' object does not support item assignment
*Note: You CAN add a value to the end, but only by reassigning the variable to a new tuple.
Example:
fullname = fullname + ('Jr',)
print(fullname)
output: ('Masha', 'Z', 'Hopper', 'Jr')
*Note: Even though tuples are immutable, they can be separated into different values. This process is known as unpacking a tuple.
Example: We can assign the output of the to_dollars_cents function to distinct variables.
def to_dollars_cents(price):
    '''
    Split price (float) into dollars and cents.
    '''
    dollars = int(price // 1)
    cents = round(price % 1 * 100)
    return dollars, cents

# unpacking the tuple
dollars, cents = to_dollars_cents(6.55)
print(dollars + 1)
print(cents + 1)
output:
7
56
199
tuple():
This function transforms input into tuples. Example: Convert a list to a tuple. (In the below example, the name is initially a list.) fullname = ['Masha', 'Z', 'Hopper'] fullname = tuple(fullname) print(fullname) output: ('Masha', 'Z', 'Hopper') *It's no longer a list so the brackets have been removed. Example: create a tuple mytuple = ("apple", "banana", "cherry") Example: create an empty tuple empty_tuple = () *Note: When a function returns more than one value, it's returning a tuple. Example: def to_dollars_cents(price): ''' Split price (float) into dollars and cents. ''' dollars = int(price // 1) cents = round(price % 1 * 100) return dollars, cents to_dollars_cents(6.55) output: (6, 55) *The return value is a tuple.
200
Example of a Tuple
In the below example, we use a list so that the order of the players can be changed. BUT the players themselves (i.e., player name, age, position) are tuples so that they are more secure/can't be accidentally changed.
team = [
    ('Marta', 20, 'center'),
    ('Ana', 22, 'point guard'),
    ('Gabi', 22, 'shooting guard'),
    ('Luz', 21, 'power forward'),
]
201
Can you extract information from tuples in a list?
Because a list of tuples is iterable (i.e., you can loop over it and retrieve its items one by one), we can extract information from each tuple using a loop. Here team is a list of (name, age, position) tuples.
Example: Write a for loop that unpacks each tuple into 3 separate variables and then prints one of the variables.
for name, age, position in team:
    print(name)
output:
Marta
Ana
Gabi
Luz
Lorena
202
Instantiation
Method of creating data structures. Strings can be created with quotes, lists with brackets, and tuples with parentheses.
Examples: Single, double, or triple quotes to create strings:
empty_str = ''
my_string1 = 'minerals'
my_string2 = "martin"
*Note: Using triple quotes to write a string over multiple lines will insert new lines (\n), including one after the opening quotes and one before the closing quotes if they sit on their own lines.
my_string3 = """
marathon
golfcart
"""
print(my_string3)
Output:

marathon
golfcart

203
Lists
Mutable sequences that can contain mixed data types, created using brackets, and are versatile for storing collections of items and performing operations like sorting and searching.
Examples:
empty_list = []
my_list = [1, 2, 3, 4, 5]
The list() function can be used for instantiation and conversion. Note that this function only works on iterable data types.
print(list('rocks'))
print(list(('stones', 'water', 'underground')))
output:
['r', 'o', 'c', 'k', 's']
['stones', 'water', 'underground']
Lists can contain any data type in any combination. So, a single list can contain strings, integers, floats, tuples, dictionaries, and other lists.
Example:
my_list = [1, 2, 1, 2, 'And through', ['and', 'through']]
print(my_list)
output: [1, 2, 1, 2, 'And through', ['and', 'through']]
Use Cases:
-Storing collections of related items
-Storing collections of items that you want to iterate over: Because lists are ordered, you can easily iterate over their elements using a for loop or list comprehension.
-Sorting and searching: Lists can be sorted and searched, making them useful for situations in which you know you'll need to modify your data.
-Storing results: Lists can be used to store the results of a computation or a series of operations, making them useful in many different programming tasks.
204
Tuples
Immutable sequences that can contain mixed data types, created using parentheses, and are often used for returning multiple values from functions and ensuring data integrity.
Syntax for instantiation:
-parentheses with each element separated by a comma
empty_tuple = ()
my_tuple = (1, 'z')
*Note: When using parentheses to declare a tuple with just a single element, you must use a trailing comma.
Example:
test1 = (1)
test2 = (2,)
print(type(test1))
print(type(test2))
output:
<class 'int'>
<class 'tuple'>
The tuple() function can be used for instantiation and for the conversion of iterable data types.
Example:
empty_tuple = tuple()
my_tuple = tuple([1, 'z'])
Tuples can contain any data type and in any combination. A single tuple can contain strings, integers, floats, lists, dictionaries, and other tuples.
Example:
my_tuple = (1871, 'all', 'mimsy', ('were', 'the'), ['borogroves'])
Tuples are immutable. Once a tuple is created, it cannot be modified. Because tuples are built for data security, Python has only two methods that can be used on them:
-count() returns the number of times a specified value occurs in the tuple.
-index() searches the tuple for a specified value and returns the index of the first occurrence of the value.
Usage:
-Return multiple values from a function.
-Packing and unpacking sequences: You can use tuples to assign values in a single line of code.
-Dictionary keys: Because tuples are immutable, they can be used as dictionary keys (as long as their elements are also immutable), whereas lists cannot.
-Data integrity: Due to their immutability, tuples are a more secure way of storing data because they safeguard against accidental changes.
205
Strings
Immutable sequences of characters--created using single, double, or triple quotes--and primarily used to represent text data. Strings can contain any character--letters, numbers, punctuation marks, spaces--but everything between the opening and closing quotation marks is part of the same single string. Strings are immutable. This means that once a string is created, it cannot be modified. Any operation that appears to modify a string actually creates a new string object. Strings are most commonly used to represent text data.
206
Iterable
An object you can loop over, retrieving its items one by one. Lists, tuples, and strings are all iterable.
207
What methods can be used on tuples?
Because tuples are built for data security, Python has only two methods that can be used on them: -count() returns the number of times a specified value occurs in the tuple. -index() searches the tuple for a specified value and returns the index of the first occurrence of the value.
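A short sketch of both tuple methods, using a made-up tuple of letters:

```python
letters = ('a', 'b', 'a', 'c')

a_count = letters.count('a')   # how many times 'a' occurs
first_a = letters.index('a')   # index of the FIRST 'a'
c_index = letters.index('c')

print(a_count)  # 2
print(first_a)  # 0
print(c_index)  # 3
```

Like list.index(), tuple.index() raises a ValueError if the value is not present.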
208
Examples of tuples
def player_position(players):
    result = []
    for name, age, position in players:
        result.append('Name: {:>19} \nPosition: {:>15}\n'.format(name, position))
    return result

for player in player_position(team):
    print(player)

*Notes:
'{:>19}'.format(name) means put name in a space 19 characters wide and right-align it. If the name is shorter than 19 characters, Python will add spaces on the left so that it lines up nicely. If the name is longer than 19 characters, the field will expand to fit the name.
'{:>15}' works the same way for position: positions will be right-aligned in a width of 15.
*Both of these format fields are just for pretty-printed columns.
209
List Comprehension
Formulaic creation of a new list based on the values in an existing list. A list comprehension functions like a for loop, but is a more concise (and often faster) way to create a new list from an existing list.
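To make the "functions like a for loop" claim concrete, here is a sketch that builds the same list both ways. The numbers list and variable names are illustrative.

```python
numbers = [1, 2, 3, 4, 5]

# Version 1: an explicit for loop that appends to an empty list.
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

# Version 2: the equivalent list comprehension.
squares_comp = [n * n for n in numbers]

print(squares_loop)                  # [1, 4, 9, 16, 25]
print(squares_comp == squares_loop)  # True
```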
210
Please explain the code below, which uses nested loops to create combinations.
# Nested loops can produce the different combinations of pips (dots) in
# a set of dominoes.
for left in range(7):
    for right in range(left, 7):
        print(f"[{left}|{right}]", end=" ")
    print('\n')
A standard set of dominoes contains pairs of numbers from 0 to 6, but without duplicates: [0|0] [0|1] [0|2] … [6|6] You should not print both [2|5] and [5|2] — only one of them. This code prints exactly that set of unique domino tiles. 1. Outer loop: for left in range(7): range(7) produces: 0, 1, 2, 3, 4, 5, 6 So left takes these values one at a time. This loop controls the left side of the domino tile. 2. Inner loop: for right in range(left, 7): The key is that right starts at left, not at 0. Meaning: When left = 0 → right goes 0 → 6 When left = 1 → right goes 1 → 6 When left = 2 → right goes 2 → 6 … When left = 6 → right goes 6 → 6 This prevents duplicates like [3|1] when [1|3] was already printed. 3. The print statement print(f"[{left}|{right}]", end=" ") end=" " means: don’t start a new line after printing. instead, add a space after each tile. So you get a row of tiles horizontally. 4. After each row, print a blank line print('\n') This just creates spacing between rows. Putting it all together (step-by-step) When left = 0: right goes: 0,1,2,3,4,5,6 Prints: [0|0] [0|1] [0|2] [0|3] [0|4] [0|5] [0|6] When left = 1: right goes: 1,2,3,4,5,6 Prints: [1|1] [1|2] [1|3] [1|4] [1|5] [1|6] When left = 2: right goes: 2,3,4,5,6 Prints: [2|2] [2|3] [2|4] [2|5] [2|6] …and so on until: When left = 6: right goes: 6 [6|6] Full output: [0|0] [0|1] [0|2] [0|3] [0|4] [0|5] [0|6] [1|1] [1|2] [1|3] [1|4] [1|5] [1|6] [2|2] [2|3] [2|4] [2|5] [2|6] [3|3] [3|4] [3|5] [3|6] [4|4] [4|5] [4|6] [5|5] [5|6] [6|6] Why this works Because right starts from left, you get only unique combinations: You include [2|5] You skip [5|2] because it would be a duplicate *This is a common pattern in Python for generating combinations without repetition.
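As a cross-check on the explanation above, the same 28 tiles can be collected into a list and compared with the standard library's itertools.combinations_with_replacement, which produces unordered pairs where a value may pair with itself (such as [3|3]). This is a sketch for comparison, not part of the original exercise.

```python
from itertools import combinations_with_replacement

# Collect the tiles from the nested loops instead of printing them.
loop_tiles = []
for left in range(7):
    for right in range(left, 7):   # start at "left" to skip mirrored tiles
        loop_tiles.append((left, right))

# The standard library generates the same pairs in the same order.
library_tiles = list(combinations_with_replacement(range(7), 2))

print(len(loop_tiles))              # 28
print(loop_tiles == library_tiles)  # True
```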
211
enumerate()
The enumerate() function allows iteration over a sequence while keeping track of the index of each element, returning pairs of indices and elements.
212
zip()
A built-in function that does what the name implies: it performs an element-wise combination of sequences. The function returns an iterator that produces tuples containing elements from each of the input sequences.
Notes:
Iterator: an object that enables processing of a collection of items one at a time without needing to assemble the entire collection at once. Use an iterator with loops or with functions that consume iterables, such as list() or tuple().
Example:
cities = ['Paris', 'Lagos', 'Mumbai']
countries = ['France', 'Nigeria', 'India']
places = zip(cities, countries)
print(places)
print(list(places))
output:
<zip object at 0x...>
[('Paris', 'France'), ('Lagos', 'Nigeria'), ('Mumbai', 'India')]
#print(places) shows the type, not the contents. It just means: "You have a zip object stored at this memory location" (the address varies from run to run).
#list(places) forces iteration over the zip object. This pulls items out of the zip iterator, pairs up city + country, and puts them in a new list.
Notes:
It works with two or more iterable objects. The given example zips two sequences, but the zip() function will accept more sequences and apply the same logic.
If the input objects are of unequal length, the resulting iterator will be the same length as the shortest input.
If you give it only one iterable object as an argument, the function will return an iterator that produces tuples containing only one element from that iterable at a time.
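Two behaviors from the notes above are worth seeing in code: zip() stops at the shortest input, and a zip object can only be consumed once. The shortened countries list is an illustrative assumption.

```python
cities = ['Paris', 'Lagos', 'Mumbai']
countries = ['France', 'Nigeria']          # one item shorter on purpose

# zip() stops as soon as the shortest input runs out.
shortest = list(zip(cities, countries))
print(shortest)  # [('Paris', 'France'), ('Lagos', 'Nigeria')]

# A zip object is an iterator: once consumed, it is empty.
pairs = zip(cities, countries)
first_pass = list(pairs)
second_pass = list(pairs)
print(second_pass)  # []
```

If the pairs are needed more than once, store list(zip(...)) in a variable or call zip() again.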
213
List Comprehension
List comprehension provides a concise way to create lists by applying an expression to each element in an iterable, optionally filtering elements. It functions like a for loop, but is a more compact (and often faster) way to create a new list from an existing iterable.
214
What are iterable objects?
In Python, an iterable object (or simply an iterable) is a collection of elements that you can loop (or iterate) through one element at a time. Examples: strings, lists, and tuples An iterable is ordered if you can retrieve its elements in a predictable order An iterable is mutable if you can change which elements it contains
215
* operator is used to do what?
You can unzip an object with the * operator. Example: scientists = [('Nikola', 'Tesla'), ('Charles', 'Darwin'), ('Marie', 'Curie')] given_names, surnames = zip(*scientists) print(given_names) print(surnames) output: ('Nikola', 'Charles', 'Marie') ('Tesla', 'Darwin', 'Curie') Note that this operation unpacks the tuples in the original list element-wise into two tuples, thus separating the data into different variables that can be manipulated further.
216
enumerate()
A built-in Python function that allows you to iterate over a sequence while keeping track of each element’s index. Similar to zip(), it returns an iterator that produces pairs of indices and elements. Example: letters = ['a', 'b', 'c'] for index, letter in enumerate(letters): print(index, letter) output: 0 a 1 b 2 c Note that the default starting index is zero, but you can assign it to whatever you want when you call the enumerate() function. For example: letters = ['a', 'b', 'c'] for index, letter in enumerate(letters, 2): print(index, letter) output: 2 a 3 b 4 c
217
List Comprehension
A concise and efficient way to create a new list based on the values in an existing iterable object. A list comprehension functions like a for loop, but is a more compact way to create the new list.
Syntax:
my_list = [expression for element in iterable if condition]
-expression: an operation or what you want to do with each element in the iterable sequence.
-element: the variable name that you assign to represent each item in an iterable sequence.
-iterable: the iterable sequence.
-condition: any expression that evaluates to True or False. This element is optional and is used to filter elements of the iterable sequence.
Example 1: Add 10 to each number in the list:
numbers = [1, 2, 3, 4, 5]
new_list = [x + 10 for x in numbers]
print(new_list)
output: [11, 12, 13, 14, 15]
Example 2: Extract the first and last letter of each word as a tuple, but only if the word is more than five letters long.
words = ['Emotan', 'Amina', 'Ibeno', 'Sankwala']
new_list = [(word[0], word[-1]) for word in words if len(word) > 5]
print(new_list)
output: [('E', 'n'), ('S', 'a')]
*Note that multiple operations can be performed in the expression component of the list comprehension to result in a list of tuples.
Example 3:
state_names = ["Arizona", "California", "California", "Kentucky", "Louisiana"]
county_names = ["Maricopa", "Alameda", "Sacramento", "Jefferson", "East Baton Rouge"]
state_county_tuples = zip(state_names, county_names)
state_county_lists = [list(i) for i in state_county_tuples]
print(state_county_lists)
output: [['Arizona', 'Maricopa'], ['California', 'Alameda'], ['California', 'Sacramento'], ['Kentucky', 'Jefferson'], ['Louisiana', 'East Baton Rouge']]
Think of the comprehension line in the above script as: "For each item in collection, compute new_thing, and put all those results into a list."
Note: If you use [] in a list comprehension, you're telling Python to create a list.
Example: Create a list of monster names (assuming encounters is a list of dictionaries that each have a "monster" key):
monster_names = [encounter["monster"] for encounter in encounters]
*Explanation: When Python sees square brackets [ ] around a for expression, it knows you're building a list. The comprehension automatically creates an empty list and appends each result of the expression to it. Think of the square brackets as saying "collect all of these into a list."
218
Loop vs List Comprehension Practice Problem: state_names = ["Arizona", "California", "California", "Kentucky", "Louisiana"] county_names = ["Maricopa", "Alameda", "Sacramento", "Jefferson", "East Baton Rouge"] state_county_tuples = zip(state_names, county_names) 1. Write a loop that unpacks each tuple in state_county_tuples and, if the state in the tuple is California, add the corresponding county to a list called ca_counties. 2. Now, use a list comprehension to accomplish the same thing as what you just did. -In a list comprehension, unpack each tuple in state_county_tuples and, if the state in the tuple is California, add the corresponding county to the list comprehension. -Assign the result to a variable called ca_counties. -Print ca_counties.
state_names = ["Arizona", "California", "California", "Kentucky", "Louisiana"]
county_names = ["Maricopa", "Alameda", "Sacramento", "Jefferson", "East Baton Rouge"]
state_county_tuples = zip(state_names, county_names)
Task: Unpack each tuple in state_county_tuples and, if the state in the tuple is California, add the corresponding county to a list called ca_counties.
1. Using a Loop:
ca_counties = []
for state, county in state_county_tuples:
    if state == "California":
        ca_counties.append(county)
print(ca_counties)
Output: ['Alameda', 'Sacramento']
2. Using a List Comprehension:
# A zip object can only be iterated once, so recreate it after the loop above has used it up.
state_county_tuples = zip(state_names, county_names)
ca_counties = [county for state, county in state_county_tuples if state == "California"]
print(ca_counties)
Output: ['Alameda', 'Sacramento']
*Note: The comprehension itself builds the list, so there is no need to create an empty list first or call append() inside it.
219
Dictionary
A data structure that consists of a collection of key-value pairs.
Use Cases:
-Fast lookup of a value by its key, even in large datasets.
-Straightforward way to store data, making it easier for users to find information.
Example: zoo is the dictionary. Pen numbers act as keys. Animals are the values.
zoo = {
    'pen_1': 'penguins',
    'pen_2': 'zebras',
    'pen_3': 'lions',
}
zoo['pen_2']
output: 'zebras'
*Must write it in this fashion. Must search for the key (i.e., the pen in this instance) to get the value (i.e., the animal in this example).
Instantiation Syntax: There are 2 options.
Option 1: Braces
dictionary = {
    'key_1': 'value',
    'key_2': 'value',
}
dictionary['key_1']
Option 2: Use dict(), a function used to create a dictionary
dictionary = dict(
    key_1='value',
    key_2='value',
)
dictionary['key_1']
Example:
zoo = dict(
    pen_1='monkeys',
    pen_2='zebras',
    pen_3='lions',
)
zoo['pen_2']
output: 'zebras'
Note: When the keys are strings, you can type them as keyword arguments. You don't have to use quotation marks to indicate that the keys are strings.
Other Notes:
You can't access a dictionary using a positional index; you'll get an error, because Python treats whatever is in the brackets as a key. In older Python versions, dictionaries were unordered and the order of items could change as you worked with them. In Python versions 3.7 and up, dictionaries retain insertion order.
A dictionary's keys must be immutable. Only immutable types (e.g., integers, floats, tuples, and strings) can be used as keys. Mutable data types (e.g., lists, sets, and other dictionaries) cannot be used.
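The rule about immutable keys can be checked directly. In this sketch (the grid dictionary is a made-up example), a tuple works as a key while a list raises a TypeError, because dictionary keys must be hashable.

```python
# Tuples of immutable values are valid dictionary keys.
grid = {(0, 0): 'origin', (1, 2): 'treasure'}
found = grid[(1, 2)]
print(found)  # treasure

# Lists are mutable (unhashable), so using one as a key fails.
try:
    bad = {[0, 0]: 'origin'}
    list_key_failed = False
except TypeError:
    list_key_failed = True
print(list_key_failed)  # True
```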
220
dict()
function used to create a dictionary. Note: When the keys are strings, you can type them as keyword arguments. You don't have to use quotation marks to indicate that the keys are strings.
221
How do you add a new key-value pair to a dictionary?
Assign a value to a key that doesn't exist yet. (If the key already exists, its value is overwritten instead.)
Example:
Existing Dictionary:
zoo = {
    'pen_1': 'monkeys',
    'pen_2': 'zebras',
    'pen_3': 'lions',
}
Add a key-value pair:
zoo['pen_4'] = 'crocodiles'
zoo
output: {'pen_1': 'monkeys', 'pen_2': 'zebras', 'pen_3': 'lions', 'pen_4': 'crocodiles'}
222
Are a dictionary's keys immutable?
Yes. Only immutable keys (e.g., integers, floats, tuples, and strings) can be used. Mutable data types (e.g., lists, sets, and other dictionaries) cannot be used.
When keys are passed to dict() as keyword arguments, they are typed without quotation marks and must be valid Python names, so they cannot contain whitespace. (Whereas, values can contain whitespace.)
Example:
smallest_countries = dict(
    africa='Seychelles',
    asia='Maldives',
    europe='Vatican City',
    oceania='Nauru',
    north_america='St. Kitts and Nevis',
    south_america='Suriname',
)
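The whitespace restriction applies only to the keyword-argument form of dict(). As a sketch (the key with a space is an illustrative variation on the card's example), string keys written with quotes can contain spaces:

```python
# Keyword arguments must be valid identifiers, so no spaces are allowed here.
by_keywords = dict(north_america='St. Kitts and Nevis')

# Quoted string keys in braces may contain spaces.
by_literal = {'north america': 'St. Kitts and Nevis'}

# dict() also accepts a list of (key, value) pairs, which allows any valid key.
by_pairs = dict([('north america', 'St. Kitts and Nevis')])

print(by_keywords)             # {'north_america': 'St. Kitts and Nevis'}
print(by_literal == by_pairs)  # True
```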
223
Immutable Keys
Integers, Floats, Tuples, Strings
224
Mutable data types
Lists, Sets, dictionaries
225
Are dictionaries ordered?
Not by position. You can't access a dictionary using a positional index; you'll get an error.
Example:
Existing Dictionary:
zoo = {
    'pen_1': 'monkeys',
    'pen_2': 'zebras',
    'pen_3': 'lions',
}
Positional Index:
zoo[2]
output: KeyError: 2
Notes:
-In the above example, Python interprets 2 as a dictionary key, not as an index. And there is no key 2 in the existing dictionary.
-In older Python versions, dictionaries were unordered, so the order of items could change as you worked with them. In Python versions 3.7 and up, dictionaries retain the order in which items were inserted, but items are still accessed by key, not by position.
226
How do you check if a key is in a dictionary?
Use the in keyword. Example: print('pen_1' in zoo) print('pen_7' in zoo) output: True False Note: this only works for keys.
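Because in checks only keys, testing for a value needs values(). A short sketch using the zoo dictionary from the earlier cards:

```python
zoo = {'pen_1': 'monkeys', 'pen_2': 'zebras', 'pen_3': 'lions'}

key_found = 'pen_2' in zoo               # checks the keys
value_as_key = 'zebras' in zoo           # 'zebras' is a value, not a key
value_found = 'zebras' in zoo.values()   # checks the values explicitly

print(key_found)     # True
print(value_as_key)  # False
print(value_found)   # True
```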
227
Practice Problem: You have a list of basketball players that includes their first name, age, and position played. You want to convert this into a dictionary. You could do this manually but that would take far too much time. Write a for loop that will iterate over the team list and assign the position as the key and the name and age as the values in a dictionary called new_team.
new_team = {} for name, age, position in team: if position in new_team: new_team[position].append((name, age)) else: new_team[position] = [(name, age)] new_team output: {'center': [('Marta', 20), ('Sandra', 19)], 'point guard': [('Ana', 22), ('Mari', 18)], 'shooting guard': [('Gabi', 22), ('Esme', 18)], 'power forward': [('Luz', 21), ('Lin', 18)], 'small forward': [('Lorena', 19), ('Sol', 19)]} new_team['point guard'] output: [('Ana', 22), ('Mari', 18)]
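The if/else in the answer above can also be written with the dictionary method setdefault(), which returns the existing list for a key or stores and returns a new empty one. This is an alternative sketch, not the card's original solution; the three-player team list is a shortened, illustrative stand-in for the full roster.

```python
# Shortened stand-in for the full team list of (name, age, position) tuples.
team = [
    ('Marta', 20, 'center'),
    ('Ana', 22, 'point guard'),
    ('Sandra', 19, 'center'),
]

new_team = {}
for name, age, position in team:
    # setdefault returns the list already stored for this position,
    # or stores and returns a new empty list if the key is missing.
    new_team.setdefault(position, []).append((name, age))

print(new_team)
# {'center': [('Marta', 20), ('Sandra', 19)], 'point guard': [('Ana', 22)]}
```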
228
If you run a loop over a dictionary, will you be able to access the keys and the values?
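Not directly. Looping over a dictionary yields only its keys. To get the values as well, look each value up by its key inside the loop, or loop over the dictionary's items() method.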
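A minimal sketch of the three common ways to loop over a dictionary. The zoo dictionary is borrowed from the earlier card; the variable names are only illustrative.

```python
zoo = {'pen_1': 'monkeys', 'pen_2': 'zebras', 'pen_3': 'lions'}

# Looping over the dictionary itself yields only the keys.
keys_seen = [key for key in zoo]

# A key can still be used to look up its value inside the loop.
values_seen = [zoo[key] for key in zoo]

# items() yields (key, value) pairs, so both are available at once.
pairs_seen = [(key, value) for key, value in zoo.items()]

print(keys_seen)
print(values_seen)
print(pairs_seen)
```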
229
keys()
A dictionary method to retrieve only the dictionary's keys Example: new_team dictionary: {'center': [('Marta', 20), ('Sandra', 19)], 'point guard': [('Ana', 22), ('Mari', 18)], 'shooting guard': [('Gabi', 22), ('Esme', 18)], 'power forward': [('Luz', 21), ('Lin', 18)], 'small forward': [('Lorena', 19), ('Sol', 19)]} new_team.keys() output: dict_keys(['center', 'point guard', 'shooting guard', 'power forward', 'small forward'])
230
values()
A dictionary method to retrieve only the dictionary's values Example: new_team dictionary: {'center': [('Marta', 20), ('Sandra', 19)], 'point guard': [('Ana', 22), ('Mari', 18)], 'shooting guard': [('Gabi', 22), ('Esme', 18)], 'power forward': [('Luz', 21), ('Lin', 18)], 'small forward': [('Lorena', 19), ('Sol', 19)]} new_team.values() output: dict_values([[('Marta', 20), ('Sandra', 19)], [('Ana', 22), ('Mari', 18)]]) *Note: didn't list the full output because you get the gist.
231
items()
A dictionary method to retrieve both the dictionary's keys and values.

Example:
for a, b in new_team.items():
    print(a, b)

output:
center [('Marta', 20), ('Sandra', 19)]

*Note: didn't list the full output because you get the gist.
232
How do you create an empty dictionary?
To create an empty dictionary, use empty braces or the dict() function: empty_dict_1 = {} empty_dict_2 = dict()
233
Are a dictionary's values immutable or mutable?
Could be either. Mutable data types: Lists, Sets, dictionaries Immutable data types: Integers, Floats, Tuples, Strings
234
How many values can each key correspond to in a dictionary?
One. Each key can only correspond to a single value.

Example:
invalid_dict = {'numbers': 1, 2, 3}

output:
Error on line 1:
invalid_dict = {'numbers': 1, 2, 3}
^
SyntaxError: invalid syntax

But if you enclose multiple values within another single data structure, you can create a valid dictionary. For example:
valid_dict = {'numbers': [1, 2, 3]}
print(valid_dict)

output:
{'numbers': [1, 2, 3]}
235
del
To delete a key-value pair from a dictionary, use the del keyword: my_dict = {'nums': [1, 2, 3], 'abc': ['a', 'b', 'c'] } del my_dict['abc'] print(my_dict) output: {'nums': [1, 2, 3]} Another Example: names_dict = { "jack": "bronson", "jill": "mcarty", "joe": "denver" } del names_dict["joe"] print(names_dict) # Prints: {'jack': 'bronson', 'jill': 'mcarty'}
236
What is a class in python?
-classes package data with tools to work with it Example: Dictionaries are a core Python class.
237
What is a method?
Methods are functions that belong to a class. Example: Dictionary methods include items(), keys(), values(), etc.
238
What are view objects?
The objects returned by the items(), keys(), and values() methods of the dictionary class are view objects. They provide a dynamic view of the dictionary's entries, which means that when the dictionary changes, the view reflects those changes. Dictionary views can be iterated over to yield their respective data. They also support membership tests.
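A small sketch of what "dynamic view" means in practice. The inventory dictionary and its contents are hypothetical example data.

```python
inventory = {'apples': 3, 'pears': 5}  # hypothetical example data

keys_view = inventory.keys()      # a view object, not a copy
before = list(keys_view)          # snapshot of the keys right now

inventory['plums'] = 7            # change the dictionary afterwards
after = list(keys_view)           # the same view now includes the new key

print(before)
print(after)
print('plums' in keys_view)       # views support membership tests
```

If you need a frozen snapshot instead of a live view, wrapping the view in list() as done for `before` is one simple option.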
239
Set
A data structure in Python that contains only unordered, non-interchangeable elements. Instantiated with the set() function or non-empty braces. Each set element is unique and immutable, but the set itself is mutable. Because sets are mutable, they cannot be used as keys in a dictionary. Because sets are unordered, they cannot be indexed or sliced.

*Note: Mutable simply means that you can add/delete elements & you can change it after creation. (It has nothing to do with the elements in the set being ordered or not.)

Use Cases:
-Storing data in a single row or record
-Frequently used when storing a lot of elements (& you want to be certain that each one is only present once).

Instantiate with the set() function:
Example: Pass a list through the set function
x = set(['foo', 'bar', 'baz', 'foo'])
print(x)

output:
{'bar', 'foo', 'baz'}

Note: The second "foo" is lost, because each element must be unique in sets.

Instantiate with braces:
x = {'foo'}
print(type(x))
y = {}
print(type(y))

output:
<class 'set'>
<class 'dict'>

Note: Empty braces create a dictionary, not a set. Instantiating a set with braces treats what's inside the braces as literals. So, when instantiating a set using only a single string in braces, you get a single element back, and the element is the string itself.

Example:
x = {'foo'}
print(x)

output:
{'foo'}
240
set()
A function that takes an iterable as an argument and returns a new set object: a collection of unique data elements, without duplicates, that is unordered and non-indexable.

*Note: Mutable simply means that you can add/delete elements & you can change it after creation. (It has nothing to do with the elements in the set being ordered or not.)

Set:
-A mutable data type.
-Because it's mutable, this class comes with additional methods to add and remove data from the set.
-It can be applied to any iterable object and will remove duplicate elements from it.
-It is unordered and non-indexable.
-Elements in a set must be hashable; generally this means they must be immutable, which allows for their unique identification.

Use Cases:
-Data professionals compare sets to understand the range of data that they contain, where they intersect, and what items are present in either set but not both
-Useful when cleaning data.

Example: Pass a list through the set function
x = set(['foo', 'bar', 'baz', 'foo'])
print(x)

output:
{'bar', 'foo', 'baz'}

Note: The second "foo" is lost, because each element must be unique in sets.

Example: Pass a tuple through the set function
x = set(('foo', 'bar', 'baz', 'foo'))
print(x)

output:
{'bar', 'foo', 'baz'}

Notes:
-one set of parentheses tells Python we're working with a tuple
-the other set of parentheses is the function call itself, because the set function only takes a single argument.

Example: Pass a string through the set function.
x = set('foo')
print(x)

output:
{'o', 'f'}

Note: You get just a single occurrence of each letter in the string. This is because the set function accepts a single argument and that argument must be iterable. So, the set function splits the string into individual characters and keeps only the unique ones.
241
How do you define an empty set or a new set?
It's best to use the set() function. Curly braces only create a set when they contain at least one element; empty braces {} create an empty dictionary instead.
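A quick check of the point above, using only built-ins. The variable names are illustrative.

```python
empty_braces = {}       # empty braces create a dictionary
empty_set = set()       # set() creates an empty set
non_empty = {'foo'}     # braces with at least one element create a set

print(type(empty_braces).__name__)
print(type(empty_set).__name__)
print(type(non_empty).__name__)
```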
242
intersection()
A method that finds the elements that two sets have in common.

Example:
# Define the sets
set1 = {1, 2, 3, 4, 5, 6}
set2 = {4, 5, 6, 7, 8, 9}

Option 1: call the intersection() method on set1 and pass set2 as the argument.
print(set1.intersection(set2))

Option 2: use the ampersand operator.
print(set1 & set2)

output:
{4, 5, 6}
{4, 5, 6}
243
union()
A method that finds all the elements from both sets.

Example:
# Define the sets
x1 = {'foo', 'bar', 'baz'}
x2 = {'baz', 'qux', 'quux'}
print(x1.union(x2))
print(x1 | x2)

output:
{'quux', 'bar', 'foo', 'qux', 'baz'}
{'quux', 'bar', 'foo', 'qux', 'baz'}
244
difference()
A method that finds the elements present in one set, but not the other.

*Note: You get different answers depending on which set you subtract from the other, just like in math.

Examples:
set1 = {1, 2, 3, 4, 5, 6}
set2 = {4, 5, 6, 7, 8, 9}
print(set1.difference(set2))
print(set1 - set2)

output:
{1, 2, 3}
{1, 2, 3}

print(set2.difference(set1))
print(set2 - set1)

output:
{8, 9, 7}
{8, 9, 7}
245
symmetric_difference()
A method that finds the elements that are in exactly one of the two sets (in either set, but not in both).

Example:
set1 = {1, 2, 3, 4, 5, 6}
set2 = {4, 5, 6, 7, 8, 9}
print(set2.symmetric_difference(set1))
print(set2 ^ set1)

output:
{1, 2, 3, 7, 8, 9}
{1, 2, 3, 7, 8, 9}
246
Hashable elements
Elements in a set must be hashable, which generally means they must be immutable; this allows for their unique identification.
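A short sketch of what happens when an unhashable element is added to a set. The flag name `list_was_added` is just for illustration.

```python
elements = set()
elements.add((1, 2))          # a tuple of immutables is hashable, so this works

try:
    elements.add([1, 2])      # a list is mutable and therefore unhashable
    list_was_added = True
except TypeError as error:
    list_was_added = False
    print(f"TypeError: {error}")

print(elements)
```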
247
Can you create an empty set with braces?
No. Python will understand that as a dictionary.
248
add()
A method that adds a single element to a set. It is available to sets but not frozensets.

Example:
example_d = {'mother', 'hamster', 'father'}
example_d.add('elderberries')
example_d

output:
{'hamster', 'elderberries', 'father', 'mother'}
249
Frozenset
A type of set in Python. They are their own class. They are similar to sets BUT they are immutable. *Note: Immutable does NOT mean ordered. Immutable simply means that you can't add/delete elements & you can't change it after creation. frozenset: -an immutable data type -can be applied to any iterable object and will remove duplicate elements from it -because they're immutable, frozensets can be used as dictionary keys and as elements in other sets. -like sets, they do not maintain insertion order, they do not have an index, and Python is free to store or display their elements in any order. Example: example_e = [1.5, frozenset(['a', 'b', 'c']), 1.5] set(example_e) output: {1.5, frozenset({'c', 'b', 'a'})}
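As a hedged illustration of the "can be used as dictionary keys" point, the sketch below keys a dictionary on unordered pairs. The pair_scores data is made up for the example.

```python
# Hypothetical data: a score stored per unordered pair of players.
pair_scores = {frozenset(['Ana', 'Mari']): 10}

# The same two names in either order produce an equal frozenset,
# so the lookup succeeds regardless of order.
lookup = pair_scores[frozenset(['Mari', 'Ana'])]
print(lookup)

try:
    bad = {set(['Ana', 'Mari']): 10}   # a regular set is unhashable
    set_key_worked = True
except TypeError:
    set_key_worked = False
print(set_key_worked)
```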
250
Do you have to have values to create a dictionary?
No. It's common to create a blank dictionary and then populate it later using dynamic values.

Example:
def get_character_record(name, server, level, rank):
    character_dict = {
        "name": name,
        "server": server,
        "level": level,
        "rank": rank,
        "id": f"{name}#{server}",
    }
    return character_dict

Example:
Note: The syntax is the same as getting data out of a key, just use the assignment operator (=) to give that key a value.
planets = {}
planets["Earth"] = True
planets["Pluto"] = False
print(planets["Pluto"])
# Prints False
251
What happens if you assign a new value to an existing key in a dictionary?
The value updates. Example: planets = { "Pluto": True, } planets["Pluto"] = False print(planets["Pluto"]) # Prints False
252
Can a dictionary have the same key twice?
No. Example: In the below example, jack denver overwrites jack bronson, because jack (the key) can only be listed once. full_names = ["jack bronson", "james mcarty", "jack denver"] names_dict = {} for full_name in full_names: # .split() returns a list of strings # where each string is a single word from the original names = full_name.split() first_name = names[0] last_name = names[1] names_dict[first_name] = last_name print(names_dict) # { # 'jack': 'denver', # 'james': 'mcarty' # } Solution: A plain dictionary can't hold duplicate keys, but you can make each key map to MORE data. For multiple Jacks, you can store a list (or another structure) as the value. names_dict = {} def add_person(first_name, last_name): if first_name not in names_dict: names_dict[first_name] = [] names_dict[first_name].append(last_name) add_person("jack", "bronson") add_person("james", "mcarty") add_person("jack", "denver") print(names_dict) # {'jack': ['bronson', 'denver'], 'james': ['mcarty']}
253
How do you check for the existence of a key in a dictionary?
Use the in keyword. Example: cars = { "ford": "f150", "toyota": "camry" } print("ford" in cars) # Prints: True print("gmc" in cars) # Prints: False
254
dictionaries in python 3.7 vs earlier
As of Python version 3.7, dictionaries are ordered: the items keep the order in which they were inserted, and that order will not change on its own. In Python 3.6 and earlier, dictionaries were unordered, meaning the items had no guaranteed order. Note that ordered does not mean indexable: even in Python 3.7 and later, you can't refer to an item by its position. The takeaway is that if you're on Python 3.7 or later, you'll be able to iterate over dictionaries in the same order every time.
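A minimal sketch, assuming Python 3.7 or later, showing that insertion order is kept while positional indexing is still unavailable. The planets data is only an example.

```python
planets = {}
planets['Mercury'] = 1
planets['Venus'] = 2
planets['Earth'] = 3

order = list(planets)          # keys come back in insertion order (3.7+)
print(order)

# There is no positional indexing on the dictionary itself,
# but you can index a list built from its keys.
third_key = list(planets)[2]
print(third_key)

try:
    planets[2]                 # 2 is treated as a key, not a position
except KeyError as error:
    missing_key = error.args[0]
print(missing_key)
```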
255
Library (or package)
Broadly refers to a reusable collection of code.

Examples:
-matplotlib: a library for creating static, animated, and interactive visualizations in Python.
-Seaborn: a data visualization library based on matplotlib that provides a simpler interface for working with common plots and graphs.
-NumPy (stands for numerical Python): an essential library that contains multi-dimensional array and matrix data structures and functions to manipulate them. (np is the standard alias for numpy.)
-pandas (the name comes from "panel data" and is also a play on "Python data analysis"): a powerful library built on top of NumPy that's used to manipulate and analyze tabular data.
256
Module
A simple Python file containing a collection of functions and global variables.
257
Global Variables
Variables that can be accessed from anywhere in a program or script
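A small sketch of reading and rebinding global variables in a script. All names here (GREETING, counter, greet, increment) are hypothetical.

```python
GREETING = "hello"        # global variable, defined at module level
counter = 0               # another global variable

def greet(name):
    # Reading a global inside a function needs no special syntax.
    return f"{GREETING}, {name}"

def increment():
    # Rebinding a global inside a function requires the global keyword.
    global counter
    counter += 1

increment()
increment()
print(greet("Ana"))
print(counter)
```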
258
Vectorization
Enables operations to be performed on multiple components of a data object at the same time.

Use Case: Data professionals work with large sets of data, and vectorized code saves time because it computes more efficiently.

Examples:
Goal: multiply two lists element by element.
list_a = [1, 2, 3]
list_b = [2, 4, 6]
desired result: list_c = [2, 8, 18]

Without vectorization, using a for loop:
list_c = []
for i in range(len(list_a)):
    list_c.append(list_a[i] * list_b[i])
list_c

output:
[2, 8, 18]

With vectorization, using NumPy to perform this operation as a vectorized computation:
import numpy as np
array_a = np.array(list_a)
array_b = np.array(list_b)
array_a * array_b

output:
array([ 2,  8, 18])

*The vectorized approach (NumPy) is considered easier to read and faster to execute. NumPy arrays also take up less memory space than lists, which is important when working with a lot of data. It's the preferred method.
259
import statement
Uses the import keyword to load an external library, package, module, or function into your computing environment. Once you import something into your notebook, you don't need to import it again unless you restart your notebook.

Examples:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
(*Note: matplotlib is the library and pyplot is a module inside it. pyplot is aliased as plt, and it's accessed from the matplotlib library using the dot.)

from sklearn.metrics import precision_score, recall_score
(*Note: The import statement begins with the from keyword, followed by sklearn.metrics--the scikit-learn library + the metrics module. Next is the import keyword followed by the desired functions. In this case, there are two: precision_score and recall_score.)

Discouraged syntax:
from library.module import *
This imports everything from a particular library or module and allows you to use its functions without any preceding syntax. If you wrote from numpy import *, you'd be able to use all of NumPy's functions without preceding them with numpy or np. NOT recommended because it makes it difficult to track where functions come from, but you may see it.
260
Aliasing
Lets you assign an alternate name--or alias--by which you can refer to something.

Standard aliases (it's advised that you stick with the standard to prevent potential confusion when others read your code):
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
(*Note: matplotlib is the library and pyplot is a module inside it. pyplot is aliased as plt, and it's accessed from the matplotlib library using the dot.)
261
In NumPy, ___ enables operations to be performed on multiple components of a data object at the same time.
vectorization In NumPy, vectorization enables operations to be performed on multiple components of a data object at the same time. Data professionals often work with large datasets, and vectorized code helps them efficiently compute large quantities of data.
262
Commonly used built-in modules
The Python standard library comes with a number of built-in modules relevant to data professional work such as math, datetime, and random. These can be imported without additional installation. In other words, you can import them directly, as long as you have Python installed.

datetime: provides classes for working with dates and times.
import datetime
date = datetime.date(1977, 5, 8)  # assign a date to a variable
print(date)       # print date
print(date.year)  # print only the year

output:
1977-05-08
1977

math: provides access to mathematical functions.
import math
print(math.exp(0))  # e**0

output:
1.0

random: useful for generating pseudo-random numbers (refer to the documentation for an explanation of pseudo-random number generation).
import random
print(random.random())           # 0.0 <= X < 1.0
print(random.choice([1, 2, 3]))  # choose a random element from a sequence
print(random.randint(1, 10))     # a <= X <= b

output (yours will differ):
0.2742648510295501
1
10
263
Module:
Modules are groups of related classes and functions that are typically subcomponents of libraries, allowing for selective importing.
264
Library:
A library is a collection of reusable code modules and their documentation, often bundled into packages for installation and use.
265
Import Statement:
An import statement is used to bring libraries or modules into the working environment, requiring specific syntax with the 'import' keyword.
266
Package:
A package is a bundle of libraries that can be installed and imported into a coding environment, often used interchangeably with the term 'library'.
267
Deprecation
The process by which code becomes obsolete and phased out, often accompanied by warnings in development environments
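One way a library can signal deprecation is with the standard warnings module. The sketch below uses two hypothetical functions, old_mean() and new_mean(), and captures the warning so it can be inspected.

```python
import warnings

def new_mean(values):
    """Hypothetical replacement function."""
    return sum(values) / len(values)

def old_mean(values):
    """Hypothetical function that has been superseded by new_mean()."""
    warnings.warn(
        "old_mean() is deprecated; use new_mean() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_mean(values)

# Capture warnings so the example can show what was raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_mean([1, 2, 3])

print(result)
print(caught[0].category.__name__)
```

The old function keeps working during the phase-out period, which gives users time to switch before it is removed.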
268
Environment
The specific set of tools, libraries, and configurations used to develop and run code, which must be consistent among collaborators.
269
Dynamism
Python's ability to evolve and improve over time, influenced by user feedback and community contributions
270
N-dimensional array (ndarray)
The core data object of NumPy. The ndarray is a vector. Vectors allow many operations to be performed together when the code is executed, resulting in faster run times that require less computer memory.

Notes:
-ndarrays are mutable (i.e., you can change the value of an element; say switch a 4 to a 5--see the example).
-But to change the size of an array, you have to reassign it.
-All elements must be the same data type (or NumPy will try to convert them to the same type).

Example:
import numpy as np
x = np.array([1, 2, 3, 4])
x

output:
array([1, 2, 3, 4])

Example: Mutable, so you can change values.
x[-1] = 5
x

output:
array([1, 2, 3, 5])
271
dtype
A NumPy attribute used to check the data type of the contents of an array.

arr = np.array([1, 2, 3])
arr.dtype

output:
dtype('int64')
272
shape
A NumPy attribute used to check the shape of an array.
Note: A one-dimensional array is neither a row nor a column.

arr = np.array([1, 2, 3])
arr.shape

output:
(3,)

arr.ndim  # ndim is used to check the number of dimensions of an array.

output:
1
273
ndim
A NumPy attribute used to check the number of dimensions of an array.

arr = np.array([1, 2, 3])
arr.ndim

output:
1

arr.shape  # shape is used to check the shape of an array.

output:
(3,)
274
2-D Array
A 2-D array can be created from a list of lists so long as each list is the same length. Think of each list as an individual row so the final array is like a table.

arr_2d = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(arr_2d.shape)
print(arr_2d.ndim)
arr_2d

output:
(4, 2)  # this array has a shape of 4 rows by 2 columns
2       # and this array has 2 dimensions
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])
275
3-D Array
A 2-D array is a list of lists. The 3-D array in this example contains two of these: it's two lists of lists. It can be thought of as 2 tables, each with 2 rows and 3 columns.

arr_3d = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
print(arr_3d.shape)
print(arr_3d.ndim)
arr_3d

output:
(2, 2, 3)
3
array([[[1, 2, 3],
        [3, 4, 5]],

       [[5, 6, 7],
        [7, 8, 9]]])
276
reshape
NumPy method used to change the shape of an array.

Example: Change an array from 4 rows by 2 columns to 2 rows by 4 columns.
Starting array:
array([[1, 2], [3, 4], [5, 6], [7, 8]])

arr_2d = arr_2d.reshape(2, 4)
arr_2d

output:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
277
Common numpy functions and methods
# mean
arr = np.array([1, 2, 3, 4, 5])
np.mean(arr)
output: 3.0

# natural logarithm
np.log(arr)
output: array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791])

# round down to the nearest whole number
np.floor(5.7)
output: 5.0

# round up to the nearest whole number
np.ceil(5.3)
output: 6.0

And many more.
278
NumPy:
NumPy is a powerful library in Python for numerical computing, enabling efficient operations on large datasets through its array structure.
279
Indexing and Slicing in NumPy
Indexing and slicing in NumPy allow for accessing and modifying elements in arrays, similar to Python lists but with support for multi-dimensional arrays.
280
Mutability:
NumPy arrays are mutable, allowing for element modification, but they cannot be resized once created.
281
Array Creation:
Arrays can be created using functions like np.array(), np.zeros(), np.ones(), and np.full(), which allow for flexible initialization.
282
Array Methods
Arrays come with built-in methods such as flatten(), reshape(), and tolist(), along with mathematical methods like max(), mean(), min(), and std().
283
np.array()
This creates an ndarray (n-dimensional array). There is no limit to how many dimensions a NumPy array can have, but arrays with many dimensions can be more difficult to work with. Example of a 1-D array: *Notice that a one-dimensional array is similar to a list. import numpy as np array_1d = np.array([1, 2, 3]) array_1d output: [1 2 3] Example of a 2-D array: *Notice that a two-dimensional array is similar to a table. array_2d = np.array([(1, 2, 3), (4, 5, 6)]) array_2d output: [[1 2 3] [4 5 6]] Example of a 3-D array: *Notice that a three-dimensional array is similar to two tables. array_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) array_3d output: [[[1 2] [3 4]] [[5 6] [7 8]]]
284
np.zeros()
This creates an array of a designated shape that is pre-filled with zeros. Example: np.zeros((3, 2)) output: [[ 0. 0.] [ 0. 0.] [ 0. 0.]] Use Cases: -To initialize an array of a specific size and shape, then fill it with values derived from a calculation -To allocate memory for later use -To perform matrix operations
285
np.ones()
This creates an array of a designated shape that is pre-filled with ones. Example: np.ones((2, 2)) output: [[ 1. 1.] [ 1. 1.]] Use Cases: -To initialize an array of a specific size and shape, then fill it with values derived from a calculation -To allocate memory for later use -To perform matrix operations
286
np.full()
This creates an array of a designated shape that is pre-filled with a specified value.

Example:
np.full((5, 3), 8)

output:
[[8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]
 [8 8 8]]

Use Cases:
-To initialize an array of a specific size and shape, then fill it with values derived from a calculation
-To allocate memory for later use
-To perform matrix operations
287
Common NumPy Array Methods (there are many more)
ndarray.flatten(): This returns a copy of the array collapsed into one dimension.

Example:
array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.flatten()

output:
[[1 2 3]
 [4 5 6]]

[1 2 3 4 5 6]

ndarray.reshape(): This gives a new shape to an array without changing its data.

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.reshape(3, 2)

output:
[[1 2 3]
 [4 5 6]]

[[1 2]
 [3 4]
 [5 6]]

Passing -1 for one dimension of the new shape is a convenience: it tells NumPy to infer that dimension's size from the array's length and the other given dimensions.

array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.reshape(3, -1)

output:
[[1 2 3]
 [4 5 6]]

[[1 2]
 [3 4]
 [5 6]]
288
ndarray.tolist()
This converts an array to a list object. Multidimensional arrays are converted to nested lists. array_2d = np.array([(1, 2, 3), (4, 5, 6)]) print(array_2d) print() array_2d.tolist() output: [[1 2 3] [4 5 6]] [[1, 2, 3], [4, 5, 6]]
289
NumPy Array Mathematical Functions (some of them)
ndarray.max() : returns the maximum value in the array or along a specified axis. ndarray.mean() : returns the mean of all the values in the array or along a specified axis. ndarray.min() : returns the minimum value in the array or along a specified axis. ndarray.std() : returns the standard deviation of all the values in the array or along a specified axis. Example: a = np.array([(1, 2, 3), (4, 5, 6)]) print(a) print() print(a.max()) print(a.mean()) print(a.min()) print(a.std()) output: [[1 2 3] [4 5 6]] 6 3.5 1 1.70782512766
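The "along a specified axis" wording can be made concrete with the same array. This is a sketch assuming NumPy is installed; the result names are illustrative.

```python
import numpy as np

a = np.array([(1, 2, 3), (4, 5, 6)])

col_max = a.max(axis=0)     # collapse the rows: one result per column
row_max = a.max(axis=1)     # collapse the columns: one result per row
row_mean = a.mean(axis=1)   # mean of each row

print(col_max)
print(row_max)
print(row_mean)
```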
290
Array Attributes
ndarray.shape : returns a tuple of the array’s dimensions. ndarray.dtype : returns the data type of the array’s contents. ndarray.size : returns the total number of elements in the array. ndarray.T : returns the array transposed (rows become columns, columns become rows). Example: array_2d = np.array([(1, 2, 3), (4, 5, 6)]) print(array_2d) print() print(array_2d.shape) print(array_2d.dtype) print(array_2d.size) print(array_2d.T) output: [[1 2 3] [4 5 6]] (2, 3) int64 6 [[1 4] [2 5] [3 6]]
291
Indexing and Slicing, NumPy Array
Indexing in NumPy is similar to indexing in Python lists, except multiple indices can be used to access elements in multidimensional arrays.

Example:
a = np.array([(1, 2, 3), (4, 5, 6)])
print(a)
print()          # This isn't necessary. It just prints a blank line to make the output easier to read.
print(a[1])      # row 1 is [4 5 6]
print(a[0, 1])   # row 0 is [1 2 3]; column 1 of that row is 2
print(a[1, 2])   # row 1 is [4 5 6]; column 2 of that row is 6
*Remember that Python counts from 0: 0, 1, 2.

Output:
[[1 2 3]
 [4 5 6]]

[4 5 6]
2
6

Slicing may also be used to access subarrays of a NumPy array:

Example:
a = np.array([(1, 2, 3), (4, 5, 6)])
print(a)
print()
a[:, 1:]

Output:
[[1 2 3]
 [4 5 6]]

[[2 3]
 [5 6]]
292
NumPy Array Operations
NumPy arrays support many operations, including mathematical functions and arithmetic. These include array addition and multiplication, which performs element-wise arithmetic on arrays. Example: a = np.array([(1, 2, 3), (4, 5, 6)]) b = np.array([[1, 2, 3], [1, 2, 3]]) print('a:') print(a) print() print('b:') print(b) print() print('a + b:') print(a + b) print() print('a * b:') print(a * b) output: a: [[1 2 3] [4 5 6]] b: [[1 2 3] [1 2 3]] a + b: [[2 4 6] [5 7 9]] a * b: [[ 1 4 9] [ 4 10 18]]
293
Mutability in NumPy Arrays
NumPy arrays are mutable, but with certain limitations. For instance, an existing element of an array can be changed: a = np.array([(1, 2), (3, 4)]) print(a) print() a[1][1] = 100 a Output: [[1 2] [3 4]] [[ 1 2] [ 3 100]] However, arrays cannot be lengthened or shortened. a = np.array([1, 2, 3]) print(a) print() a[3] = 100 a output: Error on line 5: a[3] = 100 IndexError: index 3 is out of bounds for axis 0 with size 3
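Since an existing array cannot grow in place, a common workaround is to build a new array and reassign the name. The sketch below uses np.append, which returns a new array and leaves the original untouched.

```python
import numpy as np

a = np.array([1, 2, 3])
longer = np.append(a, 100)   # builds and returns a new, longer array

print(a)        # the original array is unchanged
print(longer)   # the new array holds the extra element
```

Writing `a = np.append(a, 100)` would reassign the name `a` to the new array, which is what "reassigning" an array means in practice.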
294
How do NumPy arrays store data in memory?
NumPy arrays work by allocating a contiguous block of memory at the time of instantiation. Most other structures in Python don’t do this; their data is scattered across the system’s memory. This is what makes NumPy arrays so fast; all the data is stored together at a particular address in the system’s memory. Interestingly, this is also what prevents an array from being lengthened or shortened: The abutting memory is occupied by other information. There’s no room for more data at that memory address. However, existing elements of the array can be replaced with new elements. Example: System memory: a b 1 2 3 4 5 x y z np.array([1, 2, 3, 4, 5])
295
pandas, import
import numpy as np
import pandas as pd

*Typically, when using pandas, you import both numpy and pandas, because they are often used together. (You don't have to, though.)

Key Use Case: Manipulation and analysis of tabular data.

NumPy is not as easy to work with as pandas. NumPy requires you to work more abstractly with the data and keep track of what's being done to it even if you can't see it. Pandas provides a simple interface that allows you to display your data as rows and columns so you can always follow exactly what's happening to your data.
296
Tabular Data
Data that is in the form of a table with rows and columns (e.g., a spreadsheet).
297
Load data in pandas
You can load data from different formats. (csv files, excel files, databases, etc.) Example: dataframe = pd.read_csv('https:// /titanic/main/train.csv') dataframe.head(25)
298
data frame
Made up of rows and columns and can contain data of many different data types (e.g., integers, floats, booleans, etc.). The data frame is a core structure of pandas.
299
How many lines of code does checking summary stats with pandas require?
One line of code.

Example:
dataframe.describe()

output (one row per statistic, for every numeric column):
count  # the number of non-null values
mean
std
min
25%
50%
75%    # the 25%, 50%, and 75% rows are the quartiles
max
300
pandas mean, max, min, std, etc.
Examples: dataframe['Age'].mean() dataframe['Age'].max() dataframe['Age'].min() dataframe['Age'].std() dataframe['Pclass'].value_counts() #to check how many passengers were in each class.
301
What does the following line of code using pandas do? dataframe[(dataframe['Age'] > 60) & (dataframe['Pclass'] == 3)]
Selecting only the passengers who are over the age of 60 and in third class.
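To see why that one-liner works, it can help to build the Boolean mask step by step. The miniature table below is hypothetical; only the column names mirror the Titanic example.

```python
import pandas as pd

# Hypothetical miniature table; column names mirror the Titanic example.
dataframe = pd.DataFrame({
    'Age': [22, 65, 70, 61],
    'Pclass': [3, 3, 1, 3],
})

over_60 = dataframe['Age'] > 60          # Boolean Series, one value per row
third_class = dataframe['Pclass'] == 3   # another Boolean Series
mask = over_60 & third_class             # element-wise AND of the two

selected = dataframe[mask]               # keeps only rows where mask is True
print(mask.tolist())
print(selected)
```

The parentheses around each comparison in the original line matter because & binds more tightly than > and ==.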
302
How to add a column to a dataframe.
Example: dataframe['2023_Fare'] = dataframe['Fare'] * 146.14 dataframe
303
How to select rows and columns via indexing using pandas
Example: Select row 2, column 4 (positions start at 0, so these are indices 1 and 3):
dataframe.iloc[1, 3]
304
Example of more complicated pandas calculation
Group data (set of Titanic passengers) by sex and class and then count the rows, sum the fare, and determine the mean cost of fare. fare = dataframe.groupby(['Sex', 'Pclass']).agg({'Fare': ['count', 'sum']}) fare['fare_avg'] = fare['Fare']['sum'] / fare['Fare']['count'] fare
305
Core pandas object classes
DataFrame and Series.

DataFrame: A two-dimensional, labeled data structure with rows and columns (e.g., an Excel spreadsheet).
Series: A one-dimensional, labeled array.

import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=data)
df
306
DataFrame
A two-dimensional, labeled data structure with rows and columns. Used to structure, manipulate, and analyze data in pandas.
307
Series
A one-dimensional, labeled array.

Example using the titanic data set:
titanic.Age

output:
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
Name: Age, Length: 891, dtype: float64

Example: Selecting multiple columns (this returns a DataFrame rather than a Series):
titanic[['Name', 'Age']]

output:
  Name Age
0 Braund, Mr. Owen Harris 22.0
1 Cumings, Mrs. John Bradley 38.0
308
methods vs attributes
Method: A function that belongs to a class. It performs an action on an object. Attribute: A value associated with a class instance. Both use dot notation, but methods use parentheses, while attributes do not.
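A tiny class makes the distinction concrete. The Pen class below is hypothetical and exists only to illustrate the terminology.

```python
class Pen:
    """Hypothetical class used only to illustrate the terminology."""

    def __init__(self, animal, count):
        self.animal = animal      # attribute: a value stored on the instance
        self.count = count        # attribute

    def describe(self):           # method: a function that belongs to the class
        return f"{self.count} {self.animal}"

pen = Pen('zebras', 4)
print(pen.animal)        # attribute access: dot notation, no parentheses
print(pen.describe())    # method call: dot notation with parentheses
```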
309
NaN
"Not a Number" NaN is how null values are represented in pandas.
310
iloc[]
Use iloc[] to select rows or columns by index in pandas. It's a way to indicate in pandas that you want to select by integer-location-based position.

You can get a single row of your dataframe by entering a single integer.
Example:
titanic.iloc[0]

output:
PassengerId 1
Survived 0
Pclass 3
Name Braund, Mr. Owen Harris
Sex male
Age 22.0
SibSp 1

If you enter a list containing a single integer in the iloc brackets, you will get a one-row dataframe for that index.
Example:
titanic.iloc[[0]]

output:
  PassengerId Survived Pclass Name Sex Age SibSp
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1

Access a range of rows by entering indices separated by a colon. It includes every index from the starting index up to (but not including) the last index.
Example:
titanic.iloc[0:3]

output:
  PassengerId Survived Pclass Name Sex Age SibSp
0 1 0 3 Braund, Mr. Owen Harris male 22.0 1
1 2 1 1 Cumings, Mrs. John Bradley female 38.0 1
2 3 1 3 Heikkinen, Miss Lana female 26.0 0

You can select subsets of rows and columns.
Example: Return a data frame of rows 0, 1, and 2 at columns 3 and 4 of the titanic dataset
titanic.iloc[0:3, [3, 4]]

output:
0 Braund, Mr. Owen Harris male
1 Cumings, Mrs. John Bradley female
2 Heikkinen, Miss Lana female

If you want a single column in its entirety, select all rows and then enter the index of the column wanted.
Example:
titanic.iloc[:, [3]]

output:
0 Braund, Mr. Owen Harris
1 Cumings, Mrs. John Bradley
2 Heikkinen, Miss Lana
3 Futnelle, Mrs. Jacques
4 Allen, Mr. William Henry

Get a single value at a particular row and particular column.
Example:
titanic.iloc[0, 3]

output:
'Braund, Mr. Owen Harris'
311
loc[]
Used to select pandas rows and columns by name. Example: Select rows 1, 2, and 3 at just the name column. titanic.loc[1:3, ['Name']] output: 1 Cumings, Mrs. John Bradley 2 Heikkinen, Miss Lana 3 Futnelle, Mrs. Jacques Heath
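One difference from iloc[] is worth seeing side by side: a loc[] slice includes its end label, while an iloc[] slice stops before its end position. The small data frame below is hypothetical and uses the default integer row labels.

```python
import pandas as pd

# Hypothetical data with the default integer row labels 0..3.
df = pd.DataFrame({'Name': ['Ana', 'Mari', 'Gabi', 'Esme'],
                   'Age': [22, 18, 22, 18]})

by_label = df.loc[1:3, ['Name']]      # label-based: rows 1, 2, and 3
by_position = df.iloc[1:3, [0]]       # position-based: rows 1 and 2 only

print(by_label)
print(by_position)
```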
312
How do you add a new column to a data frame?
Example: Add an Age_plus_100 column to the end of the existing data frame titanic['Age_plus_100'] = titanic['Age'] + 100 titanic.head() output: full data frame and then the following column tacked on the end: Age_plus_100 122.0 138.0 126.0 135.0
313
How do you get an index of all the column names in a dataframe using pandas?
Example: Using the titanic dataset:
titanic.columns

output:
Index(["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age", "SibSp"], dtype="object")
314
How do you check the number of rows and columns in a data frame using pandas?
Example: Using titanic dataset: titanic.shape output: (891, 12) 891 rows and 12 columns.
315
How do you get summary information about a data frame using pandas?
Example: Using the titanic dataset:
titanic.info()

output:
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #  Column       Non-Null Count  Dtype
 0  PassengerId  891 non-null    int64
etc.

The output shows the data type contained in each column, the number of non-null values in each column, and the amount of memory used.
316
How do you create a dataframe?
Option 1: Use a dictionary.
import pandas as pd
data = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=data)
df

output:
   col1  col2
0     1     3
1     2     4

*In the above example, each key of the dictionary represents a column name and the values for that key are in a list.

Option 2: Use a NumPy array. This approach makes it possible to name the columns and rows.
import numpy as np
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'],
                   index=['x', 'y', 'z'])
df2

output:
   a  b  c
x  1  2  3
y  4  5  6
z  7  8  9
317
Use pd.DataFrame() function to create a dataframe from a dictionary.
import pandas as pd data = {'col1': [1, 2], 'col2': [3, 4]} df = pd.DataFrame(data=data) df
318
Use pd.DataFrame() function to create a dataframe from a NumPy array.
import numpy as np
import pandas as pd
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'],
                   index=['x', 'y', 'z'])
df2
319
Use the pd.read_csv() function to create a dataframe from a .csv file (from a URL or a filepath).
df3 = pd.read_csv('train.csv') df3.head()
320
Print class of first row Print class of "Name" column
Print class of first row print(type(df3.iloc[0])) Print class of "Name" column print(type(df3['Name']))
321
Create a copy of df3 named 'titanic'. The head() method outputs the first 5 rows of dataframe.
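Create a copy of df3 named 'titanic'. (Plain titanic = df3 would only give the same dataframe a second name, so changes to one would show up in the other.)
titanic = df3.copy()

The head() method outputs the first 5 rows of the dataframe.
titanic.head()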
322
The columns attribute returns an Index object containing the dataframe's columns.
titanic.columns
323
The shape attribute returns the shape of the dataframe (rows, columns).
titanic.shape
324
The info() method returns summary information about the dataframe.
titanic.info()
325
You can select a column by name using brackets.
titanic['Age']
326
You can select a column by name using dot notation, but only when its name contains no spaces or special characters.
titanic.Age
327
Use iloc to return a Series object of the data in row 0.
titanic.iloc[0]
328
Use iloc to return a DataFrame view of the data in row 0.
titanic.iloc[[0]]
329
Use iloc to return a DataFrame view of the data in rows 0, 1, 2.
titanic.iloc[0:3]
330
Use iloc to return a DataFrame view of rows 0-2 at columns 3 and 4.
titanic.iloc[0:3, [3, 4]]
331
Use iloc to return a DataFrame view of all rows at column 3.
titanic.iloc[:, [3]]
332
Use iloc to access value in row 0, column 3.
titanic.iloc[0, 3]
333
Use loc to access values in rows 0-3 at just the Name column.
titanic.loc[0:3, ['Name']]
334
Create a new column in the dataframe containing the value in the Age column + 100.
titanic['Age_plus_100'] = titanic['Age'] + 100 titanic.head()
335
In a pandas DataFrame, the term "axis" refers to
the direction along which an operation is performed. DataFrames are two-dimensional, and thus have two axes:
-Axis 0 (or 'index'): This axis runs vertically, down the rows of the DataFrame. When an operation is performed along axis=0, it moves down each column and collapses the rows, so you get one result per column.
-Axis 1 (or 'columns'): This axis runs horizontally, across the columns of the DataFrame. When an operation is performed along axis=1, it moves across each row and collapses the columns, so you get one result per row.
336
Central Tendency
Central tendency in descriptive statistics is a single value that represents the center or typical point of a dataset, summarizing it by showing where data points tend to cluster. Measures of central tendency = mean (average), median (middle value), and mode (most frequent value).
337
Percentile
A measure in statistics that represents the percentage of data points in a given dataset that fall below a specific value.

Example: You are the fourth tallest person in a group of 20. Sixteen of the 20 people (80%) are shorter than you, so you are at the 80th percentile.
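The arithmetic in the height example can be checked with a few lines of plain Python. This is only a sketch: the heights are invented, and it uses the simple "percentage of values below" definition given on this card rather than any particular library's interpolation rule.

```python
# Hypothetical heights (cm) for a group of 20 people; the values are made up.
heights = list(range(150, 190, 2))   # 20 distinct heights: 150, 152, ..., 188

my_height = sorted(heights)[-4]      # the fourth tallest person
shorter = sum(h < my_height for h in heights)
percentile = 100 * shorter / len(heights)

print(shorter, percentile)           # 16 people are shorter, which is 80.0 percent
```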
338
Common DataFrame attribute: DataFrame.columns
Returns the column labels of the dataframe Example: df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}) df A B 0 1 3 1 2 4 df.columns Index(['A', 'B'], dtype='object')
339
Common DataFrame attribute: DataFrame.dtypes
Returns the data types in the dataframe Example: df = pd.DataFrame({'float': [1.0], 'int': [1], 'datetime': [pd.Timestamp('20180310')], 'string': ['foo']}) df.dtypes float float64 int int64 datetime datetime64[ns] string object dtype: object
340
Common DataFrame attribute: DataFrame.iloc
Accesses a group of rows and columns using integer-based indexing Example: df.iloc[[0]] a b c d 0 1 2 3 4 type(df.iloc[[0]])
341
Common DataFrame attribute: DataFrame.loc
Accesses a group of rows and columns by label(s) or a Boolean array df.loc[['viper', 'sidewinder']] max_speed shield viper 4 5 sidewinder 7 8
342
Common DataFrame attribute: DataFrame.shape
Returns a tuple representing the dimensionality of the dataframe df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}) df.shape (2, 3)
343
Common DataFrame attribute: DataFrame.values
Returns a NumPy representation of the dataframe df = pd.DataFrame({'age': [ 3, 29], 'height': [94, 170], 'weight': [31, 115]}) df age height weight 0 3 94 31 1 29 170 115 df.dtypes age int64 height int64 weight int64 dtype: object df.values array([[ 3, 94, 31], [ 29, 170, 115]])
344
Common DataFrame method: DataFrame.apply
Applies a function over an axis of the dataframe

df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

Notes:
axis=0 (Rows/Index): This axis represents the rows, or the vertical direction. When an operation is performed with axis=0, it applies the function down each column, effectively collapsing the rows. For example, df.sum(axis=0) would calculate the sum of each column. In the context of dropping elements, df.drop(rows, axis=0) would remove specified rows.

axis=1 (Columns): This axis represents the columns, or the horizontal direction. When an operation is performed with axis=1, it applies the function across each row, effectively collapsing the columns. For example, df.sum(axis=1) would calculate the sum of each row. In the context of dropping elements, df.drop(columns, axis=1) would remove specified columns.
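The card does not show how df was built, so the following is a sketch under an assumption: a dataframe of three identical rows [4, 9], chosen because it reproduces the sums shown above (12 and 27 per column, 13 per row).

```python
import numpy as np
import pandas as pd

# Assumed input: three identical rows, so the sums are easy to check by hand.
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

col_sums = df.apply(np.sum, axis=0)  # down each column: A=12, B=27
row_sums = df.apply(np.sum, axis=1)  # across each row: 13, 13, 13

print(col_sums)
print(row_sums)
```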
345
Common DataFrame method: DataFrame.copy
Makes a copy of the dataframe’s indices and data Example: s_copy = s.copy() s_copy a 1 b 2 dtype: int64
346
Common DataFrame method: DataFrame.describe
Returns descriptive statistics of the dataframe, including the minimum, maximum, mean, and percentile values of its numeric columns; the row count; and the data types s = pd.Series([1, 2, 3]) s.describe() count 3.0 mean 2.0 std 1.0 min 1.0 25% 1.5 50% 2.0 75% 2.5 max 3.0 dtype: float64
347
Common DataFrame methods: DataFrame.drop
Drops specified labels from rows or columns Example:df.drop(['B', 'C'], axis=1) A D 0 0 3 1 4 7 2 8 11
348
Common DataFrame methods: DataFrame.groupby
Splits the dataframe, applies a function, and combines the results Example: arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'], ['Captive', 'Wild', 'Captive', 'Wild']] index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type')) df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]}, index=index) df Max Speed Animal Type Falcon Captive 390.0 Wild 350.0 Parrot Captive 30.0 Wild 20.0 df.groupby(level=0).mean() Max Speed Animal Falcon 370.0 Parrot 25.0 df.groupby(level="Type").mean() Max Speed Type Captive 210.0 Wild 185.0
349
Common DataFrame methods: DataFrame.head
Returns the first n rows of the dataframe (default=5) Examples: df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion', 'monkey', 'parrot', 'shark', 'whale', 'zebra']}) df animal 0 alligator 1 bee 2 falcon 3 lion 4 monkey 5 parrot 6 shark 7 whale 8 zebra Viewing the first 5 lines df.head() animal 0 alligator 1 bee 2 falcon 3 lion 4 monkey
350
Common DataFrame methods: DataFrame.info
Returns a concise summary of the dataframe Examples: int_values = [1, 2, 3, 4, 5] text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon'] float_values = [0.0, 0.25, 0.5, 0.75, 1.0] df = pd.DataFrame({"int_col": int_values, "text_col": text_values, "float_col": float_values}) df int_col text_col float_col 0 1 alpha 0.00 1 2 beta 0.25 2 3 gamma 0.50 3 4 delta 0.75 4 5 epsilon 1.00 Prints information of all columns: df.info(verbose=True) RangeIndex: 5 entries, 0 to 4 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 int_col 5 non-null int64 1 text_col 5 non-null object 2 float_col 5 non-null float64 dtypes: float64(1), int64(1), object(1) memory usage: 248.0+ bytes
351
Common DataFrame methods: DataFrame.isna()
Returns a same-sized Boolean dataframe indicating whether each value is null (can also use isnull() as an alias) Example: df = pd.DataFrame(dict(age=[5, 6, np.nan], born=[pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')], name=['Alfred', 'Batman', ''], toy=[None, 'Batmobile', 'Joker'])) df age born name toy 0 5.0 NaT Alfred None 1 6.0 1939-05-27 Batman Batmobile 2 NaN 1940-04-25 Joker df.isna() age born name toy 0 False True False True 1 False False False False 2 True False False False
352
Common DataFrame methods: DataFrame.sort_values
Sorts by the values across a given axis Example: df = pd.DataFrame({ 'col1': ['A', 'A', 'B', np.nan, 'D', 'C'], 'col2': [2, 1, 9, 8, 7, 4], 'col3': [0, 1, 9, 4, 2, 3], 'col4': ['a', 'B', 'c', 'D', 'e', 'F'] }) df col1 col2 col3 col4 0 A 2 0 a 1 A 1 1 B 2 B 9 9 c 3 NaN 8 4 D 4 D 7 2 e 5 C 4 3 F Sort by col1 df.sort_values(by=['col1']) col1 col2 col3 col4 0 A 2 0 a 1 A 1 1 B 2 B 9 9 c 5 C 4 3 F 4 D 7 2 e 3 NaN 8 4 D
353
Common DataFrame methods: DataFrame.value_counts
Returns a series containing counts of unique rows in the dataframe Example: df = pd.DataFrame({'num_legs': [2, 4, 4, 6], 'num_wings': [2, 0, 0, 0]}, index=['falcon', 'dog', 'cat', 'ant']) df num_legs num_wings falcon 2 2 dog 4 0 cat 4 0 ant 6 0 df.value_counts() num_legs num_wings 4 0 2 2 2 1 6 0 1 Name: count, dtype: int64
354
Common DataFrame methods: DataFrame.where
Replaces values in the dataframe where a given condition is false Examples: s = pd.Series(range(5)) s.where(s > 0) 0 NaN 1 1.0 2 2.0 3 3.0 4 4.0 dtype: float64 s.mask(s > 0) 0 0.0 1 NaN 2 NaN 3 NaN 4 NaN dtype: float64
355
356
Common DataFrame methods:
357
How are sets different than lists?
Sets are unordered and can't contain duplicates. They guarantee uniqueness; only one of each value can be in a set. Lists are ordered and can contain duplicates.
358
How do you add values to a set?
.add() Example: fruits = {"apple", "banana", "grape"} fruits.add("pear") print(fruits) # Prints: {'pear', 'banana', 'grape', 'apple'}
359
How do you create an empty set?
Because the empty bracket {} syntax creates an empty dictionary, to create an empty set, you need to use the set() function.

Example:
fruits = set()
fruits.add("pear")
print(fruits)
# Prints: {'pear'}
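A quick way to convince yourself of the {} versus set() distinction is to inspect the types. The variable names here are illustrative only.

```python
empty_braces = {}      # {} creates an empty dictionary
empty_set = set()      # set() creates an empty set

print(type(empty_braces).__name__)  # dict
print(type(empty_set).__name__)     # set

empty_set.add("pear")  # add() exists on sets; a dict has no add() method
print(empty_set)       # {'pear'}
```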
360
Set Iteration
Note: Sets are unordered, so the order of iteration is not guaranteed. Example: fruits = {"apple", "banana", "grape"} for fruit in fruits: print(fruit) # Prints: # banana # grape # apple
361
How do you remove values from a set?
.remove()

Example:
fruits = {"apple", "banana", "grape"}
fruits.remove("apple")
print(fruits)
# Prints: {'banana', 'grape'}
362
How does set subtraction work?
You can use some of the "normal" mathematical operations on sets. For example, you can subtract one set from another. It removes all the values in the second set from the first set. Example: set1 = {"apple", "banana", "grape"} set2 = {"apple", "banana"} set3 = set1 - set2 print(set3) # Prints: {'grape'}
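Subtraction is one of several set operators in Python. The sketch below reuses the sets from this card; the union (|) and intersection (&) lines go beyond what the card covers and are included only for comparison.

```python
set1 = {"apple", "banana", "grape"}
set2 = {"apple", "banana"}

difference = set1 - set2      # values in set1 that are not in set2
union = set1 | set2           # values in either set
intersection = set1 & set2    # values in both sets

print(difference)             # {'grape'}
print(set2 - set1)            # set(), because nothing in set2 is missing from set1
```

Note that subtraction is not symmetric: set1 - set2 and set2 - set1 generally differ.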
363
API (not python specific)
Application Programming Interface Analogy Roles: -The Customer (Client): Represents the software application or user making a request for a service or information. For example, a mobile app wanting to retrieve weather data. -The Menu (API Documentation): Defines what services are available and how to request them. It's like the API's rules and specifications that developers consult to understand how to interact with the API. -The Waiter (API): Acts as the intermediary. The API receives the customer's request, understands it, and then translates it into a format the kitchen (server) can understand. It then takes the kitchen's response and delivers it back to the customer. -The Kitchen (Server/Backend System): Represents the system that holds the data or provides the service requested by the customer. It processes the request and prepares the "meal" (the data or service). -The Meal (Data/Service): Is the response or result delivered back to the customer by the waiter. Process: -The Customer (Client) looks at the Menu (API Documentation) to see what they can order. -The Customer (Client) places an order (sends a request) to the Waiter (API). -The Waiter (API) takes the order to the Kitchen (Server), translating it into a language the chef (server) understands. -The Kitchen (Server) prepares the order (processes the request). -The Waiter (API) brings the prepared Meal (Data/Service) back to the Customer (Client). NOTE: This analogy highlights that the customer (client) doesn't need to know the intricacies of how the kitchen (server) operates; they only need to know how to communicate their needs through the waiter (API). --- What does API stand for? API stands for Application Programming Interface. In the context of APIs, the word Application refers to any software with a distinct function. Interface can be thought of as a contract of service between two applications. This contract defines how the two communicate with each other using requests and responses. 
Their API documentation contains information on how developers are to structure those requests and responses. How do APIs work? API architecture is usually explained in terms of client and server. The application sending the request is called the client, and the application sending the response is called the server. So in the weather example, the bureau’s weather database is the server, and the mobile app is the client. There are four different ways that APIs can work depending on when and why they were created.
364
Errors and Exceptions Type Error
Used to say: "you gave me the wrong type of thing."

Example:
Scenario: You're designing an API:

def get_player_record(player_id):
    if not isinstance(player_id, int):
        # case 1
        ...
    if player_id == 1:
        return {"name": "Slayer", "level": 128}
    if player_id == 2:
        return {"name": "Dorgoth", "level": 300}
    # case 2
    ...

If player_id isn't an int (like "abc" or None), raise an exception type that means "you used this function wrong".

What's the most suitable exception to raise in this instance?
The best exception type in this scenario is a TypeError.
365
Errors & Exceptions Key Error
Used when a key isn't found in a mapping (like a dictionary). Conceptually, in the example below, "no such player with this ID" matches "missing key".

Example:
Scenario: You're designing an API:

def get_player_record(player_id):
    if not isinstance(player_id, int):
        # case 1
        ...
    if player_id == 1:
        return {"name": "Slayer", "level": 128}
    if player_id == 2:
        return {"name": "Dorgoth", "level": 300}
    # case 2
    ...

If it is an int but doesn't exist in your data (like 999), raise an exception type that means "I understood your request, but that player doesn't exist".

What's the most suitable exception to raise in this instance?
The best exception type in this scenario is a KeyError.
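One way the two placeholder cases could be filled in is sketched below. The error messages are assumptions added for illustration; only the function name, the player records, and the choice of exception types come from the cards.

```python
def get_player_record(player_id):
    # Case 1: the caller passed the wrong type of argument.
    if not isinstance(player_id, int):
        raise TypeError("player_id must be an int")  # message is illustrative
    if player_id == 1:
        return {"name": "Slayer", "level": 128}
    if player_id == 2:
        return {"name": "Dorgoth", "level": 300}
    # Case 2: the type is fine, but no such player exists.
    raise KeyError(f"no player with id {player_id}")  # message is illustrative


for candidate in (1, "abc", 999):
    try:
        print(get_player_record(candidate))
    except (TypeError, KeyError) as err:
        print(type(err).__name__, err)
```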
366
Unit testing (not python specific)
Unit testing is the process where you test the smallest functional unit of code. Software testing helps ensure code quality, and it's an integral part of software development. It's best practice to write software as small, functional units then write a unit test for each code unit. You can first write unit tests as code. Then, run that test code automatically every time you make changes in the software code. This way, if a test fails, you can quickly isolate the area of the code that has the bug or error. Unit testing enforces modular thinking paradigms and improves test coverage and quality. Automated unit testing helps ensure you or your developers have more time to concentrate on coding. --- A unit test is a block of code that verifies the accuracy of a smaller, isolated block of application code, typically a function or method. The unit test is designed to check that the block of code runs as expected, according to the developer’s theoretical logic behind it. The unit test is only capable of interacting with the block of code via inputs and captured asserted (true or false) output. A single block of code may also have a set of unit tests, known as test cases. A complete set of test cases cover the full expected behavior of the code block, but it’s not always necessary to define the full set of test cases.
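To make the idea concrete, here is a minimal sketch using Python's built-in unittest module. The unit under test is the sum_of_numbers() function from the for-loop card; the test class name and the chosen test cases are assumptions for illustration.

```python
import unittest


def sum_of_numbers(start, end):
    """Unit under test: sum the integers from start up to (not including) end."""
    total = 0
    for i in range(start, end):
        total += i
    return total


class TestSumOfNumbers(unittest.TestCase):
    def test_small_range(self):
        self.assertEqual(sum_of_numbers(1, 4), 6)   # 1 + 2 + 3

    def test_empty_range(self):
        self.assertEqual(sum_of_numbers(5, 5), 0)   # nothing to add


# Load and run the test cases explicitly so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSumOfNumbers)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Each test method is one test case: it feeds the unit an input and asserts on the captured output.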
367
AWS (not python specific)
(Amazon Web Services) Cloud Allows on-demand access to a vast range of IT services like computing power, storage, databases, analytics, and AI over the internet, allowing businesses to scale, innovate faster, and reduce costs by renting resources instead of owning physical data centers
368
infinity
float("inf") is positive infinity; float("-inf") is negative infinity.
369
*=
Example: Complete the factorial() function. It should return the factorial of a given number.

Script:
def factorial(num):
    integer_list = []
    factorial = 1
    for i in range(1, num+1):
        integer_list.append(i)
    for integer in integer_list:
        factorial *= integer
    return factorial

Explanation:
factorial *= integer is equivalent to factorial = factorial * integer

So it updates the factorial variable by multiplying its current value by integer, then storing the result back into factorial.

Trace through the code with factorial(3) to see how it works:
Start: factorial = 1
First iteration: integer = 1
factorial *= 1 # factorial = factorial * 1 = 1 * 1 = 1
Second iteration: integer = 2
factorial *= 2 # factorial = factorial * 2 = 1 * 2 = 2
Third iteration: integer = 3
factorial *= 3 # factorial = factorial * 3 = 2 * 3 = 6
Return 6

The key insight is that factorial keeps its value between iterations. It's not resetting to 1 each time through the loop. So you're building up a running product:
Start with 1
Multiply by 1 → get 1
Multiply by 2 → get 2
Multiply by 3 → get 6

This is exactly how you calculate 3! = 3 * 2 * 1 = 6
370
Boolean Masking
You can use pandas to do Boolean masking.

Boolean masking: A filtering technique that overlays a Boolean grid onto a dataframe in order to select only the values in the dataframe that align with the True values of the grid.

Example:
# Instantiate a dictionary of planetary data.
data = {'planet': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
        'radius_km': [2440, 6052, 6371, 3390, 69911, 58232, 25362, 24622],
        'moons': [0, 0, 1, 2, 80, 83, 27, 14]
       }

# Use pd.DataFrame() function to convert dictionary to dataframe.
planets = pd.DataFrame(data)
planets

output:
   moons  planet   radius_km
0  0      Mercury  2440
1  0      Venus    6052
2  1      Earth    6371
3  2      Mars     3390
4  80     Jupiter  69911
5  83     Saturn   58232
6  27     Uranus   25362
7  14     Neptune  24622

# Create a Boolean mask of planets with fewer than 20 moons.
mask = planets['moons'] < 20
mask

output:
0  True
1  True
2  True
3  True
4  False
5  False
6  False
7  True
Name: moons, dtype: bool

# Apply the Boolean mask to the dataframe to filter it so it contains
# only the planets with fewer than 20 moons.
planets[mask]

# Define the Boolean mask and apply it in a single line.
planets[planets['moons'] < 20]

Boolean masks don't change the data. Applying one returns a filtered result and leaves the original dataframe untouched.
planets # if you just call planets, you'll get the original data prior to the mask.

# You can assign the filtered dataframe to a named variable.
moons_under_20 = planets[mask]
moons_under_20

output:
   moons  planet   radius_km
0  0      Mercury  2440
1  0      Venus    6052
2  1      Earth    6371
3  2      Mars     3390
7  14     Neptune  24622

Example 2:
# Create a Boolean mask of planets with fewer than 10 moons OR more than 50 moons.
mask = (planets['moons'] < 10) | (planets['moons'] > 50)
mask

# Apply the Boolean mask to filter the data.
planets[mask]

Example 3:
# Create a Boolean mask of planets with more than 20 moons, excluding them if they
# have 80 moons or if their radius is less than 50,000 km.
mask = (planets['moons'] > 20) & ~(planets['moons'] == 80) & ~(planets['radius_km'] < 50000)

# Apply the mask
planets[mask]
371
pandas logical operators & | ~ (tilde)
& : and
| : or
~ : not (the tilde; it may look like an en dash, but it should be a tilde)

Examples of the not operator:
mask = (df['moons'] > 20) & ~(df['moons'] == 80) & ~(df['radius_km'] < 50000)
df[mask]

output:
   moons  planet  radius_km
5  83     Saturn  58232

which is the same result as:
mask = (df['moons'] > 20) & (df['moons'] != 80) & (df['radius_km'] >= 50000)
df[mask]

output:
   moons  planet  radius_km
5  83     Saturn  58232
372
Boolean Mask
A Boolean mask is a method of applying a filter to a dataframe. The mask overlays a Boolean grid over your dataframe in order to select only the values in the dataframe that align with the True values of the grid. To create Boolean comparisons, pandas has its own logical operators. These operators are: & (and) | (or) ~ (not) Each criterion of a multi-conditional selection statement must be enclosed in its own set of parentheses. With practice, making complex selection statements in pandas is possible and efficient.
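The parentheses rule and the three operators can be tried out on the planets data from the Boolean Masking card. This is a minimal sketch; the result variable names (few_moons, extremes, big_and_many) are illustrative assumptions.

```python
import pandas as pd

planets = pd.DataFrame({
    "planet": ["Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"],
    "radius_km": [2440, 6052, 6371, 3390, 69911, 58232, 25362, 24622],
    "moons": [0, 0, 1, 2, 80, 83, 27, 14],
})

# A mask is a Boolean Series with one True/False per row.
mask = planets["moons"] < 20
few_moons = planets[mask]

# Each criterion sits in its own parentheses; | means "or".
extremes = planets[(planets["moons"] < 10) | (planets["moons"] > 50)]

# & means "and", ~ means "not".
big_and_many = planets[
    (planets["moons"] > 20)
    & ~(planets["moons"] == 80)
    & ~(planets["radius_km"] < 50000)
]

print(mask.tolist())
print(big_and_many["planet"].tolist())
```

Without the parentheses, Python's operator precedence would evaluate & and | before the comparisons, which typically raises an error.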
373
groupby()
A pandas DataFrame method that groups rows of the dataframe together based on their values at one or more columns, which allows for further analysis of the groups.

Examples:
import numpy as np
import pandas as pd

# Instantiate a dictionary of planetary data.
data = {'planet': ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'],
        'radius_km': [2440, 6052, 6371, 3390, 69911, 58232, 25362, 24622],
        'moons': [0, 0, 1, 2, 80, 83, 27, 14],
        'type': ['terrestrial', 'terrestrial', 'terrestrial', 'terrestrial', 'gas giant', 'gas giant', 'ice giant', 'ice giant'],
        'rings': ['no', 'no', 'no', 'no', 'yes', 'yes', 'yes', 'yes'],
        'mean_temp_c': [167, 464, 15, -65, -110, -140, -195, -200],
        'magnetic_field': ['yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'yes']
       }

# Use pd.DataFrame() function to convert dictionary to dataframe.
planets = pd.DataFrame(data)
planets

# The groupby() function returns a groupby object.
planets.groupby(['type'])

# Apply the sum() function to the groupby object to get the sum
# of the values in each numerical column for each group.
planets.groupby(['type']).sum(numeric_only=True)

# Apply the sum function to the groupby object and select
# only the 'moons' column.
planets.groupby(['type']).sum(numeric_only=True)[['moons']]

# Group by type and magnetic_field and get the mean of the values
# in the numeric columns for each group.
planets.groupby(['type', 'magnetic_field']).mean(numeric_only=True)

# The agg() calls below select the numeric columns first, because recent
# pandas versions raise an error when asked for the mean of a string column.
numeric_cols = ['radius_km', 'moons', 'mean_temp_c']

# Group by type, then use the agg() function to get the mean and median
# of the values in the numeric columns for each group.
planets.groupby(['type'])[numeric_cols].agg(['mean', 'median'])

# Group by type and magnetic_field, then use the agg() function to get the
# mean and max of the values in the numeric columns for each group.
planets.groupby(['type', 'magnetic_field'])[numeric_cols].agg(['mean', 'max'])

# Define a function that returns the 90th percentile of an array.
def percentile_90(x):
    return x.quantile(0.9)

# Group by type and magnetic_field, then use the agg() function to apply the
# mean and the custom-defined `percentile_90()` function to the numeric
# columns for each group.
planets.groupby(['type', 'magnetic_field'])[numeric_cols].agg(['mean', percentile_90])
374
pandas, numerical aggregations: sum() median() mean() *numeric_only keyword argument
Note: Beginning with pandas v1.5.0, numerical aggregations used with a groupby() operation should have their numeric_only keyword argument set to True if your data also contains non-numerical columns. Otherwise pandas will warn or raise an error, depending on the version and the aggregation.

Example:
planets.groupby(['type']).sum(numeric_only=True)
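A small sketch of what numeric_only=True does in practice: the string column is left out of the result. The four-row planets dataframe here is a trimmed-down assumption based on the planetary data used elsewhere in this deck.

```python
import pandas as pd

# Trimmed-down planets data: one string column besides the grouping key.
planets = pd.DataFrame({
    "planet": ["Earth", "Mars", "Jupiter", "Saturn"],
    "type": ["terrestrial", "terrestrial", "gas giant", "gas giant"],
    "moons": [1, 2, 80, 83],
})

# Only numeric columns are aggregated; 'planet' is dropped from the result.
moon_totals = planets.groupby(["type"]).sum(numeric_only=True)
print(moon_totals)
```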
375
pandas agg()
Short for "aggregate". A pandas groupby method that allows you to apply multiple calculations to groups of data.
376
MultiIndex
MultiIndex is a hierarchical indexing system in pandas that allows for more complex data manipulation and storage in DataFrames.
377
DataFrame
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
378
groupby()
The groupby() function splits data into groups based on specified criteria, allows a function to be applied to each group independently, and then combines the results into a data structure. When applied to a dataframe, the function returns a groupby object. This groupby object serves as the foundation for different data manipulation operations, including:
-aggregation: computing summary statistics for each group
-transformation: applying functions to each group and returning the modified data
-filtration: selecting specific groups based on certain conditions
-iteration: iterating over groups or values

Example:
clothes = pd.DataFrame({'type': ['pants', 'shirt', 'shirt', 'pants', 'shirt', 'pants'],
                        'color': ['red', 'blue', 'green', 'blue', 'green', 'red'],
                        'price_usd': [20, 35, 50, 40, 100, 75],
                        'mass_g': [125, 440, 680, 200, 395, 485]})
clothes

output:
   color  mass_g  price_usd  type
0  red    125     20         pants
1  blue   440     35         shirt
2  green  680     50         shirt
3  blue   200     40         pants
4  green  395     100        shirt
5  red    485     75         pants

grouped = clothes.groupby('type')
print(grouped)
print(type(grouped))

output: a DataFrameGroupBy object, followed by its class (pandas.core.groupby.generic.DataFrameGroupBy)

grouped = clothes.groupby('type')
grouped.mean(numeric_only=True)

output:
       mass_g  price_usd
type
pants  270.0   45.000000
shirt  505.0   61.666667

*Groups may also be created based on multiple columns:
clothes.groupby(['type', 'color']).min()

output:
              mass_g  price_usd
type   color
pants  blue   200     40
       red    125     20
shirt  blue   440     35
       green  395     50
379
What are groupby() and agg()
pandas Essential DataFrame methods that data professionals use to group, aggregate, summarize, and better understand data.
380
pandas, groupby() How do you return the number of observations in each group?
use the size() method. Example: Use the clothes dataset and group by type and color and then return the number of observations: clothes.groupby(['type', 'color']).size() output: type color pants blue 1 red 2 shirt blue 1 green 2 dtype: int64
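The counts above can be reproduced with the clothes data from the groupby() card. This is a minimal sketch; the variable names counts and mean_by_type are illustrative.

```python
import pandas as pd

clothes = pd.DataFrame({
    "type": ["pants", "shirt", "shirt", "pants", "shirt", "pants"],
    "color": ["red", "blue", "green", "blue", "green", "red"],
    "price_usd": [20, 35, 50, 40, 100, 75],
    "mass_g": [125, 440, 680, 200, 395, 485],
})

# Number of observations (rows) in each type/color group.
counts = clothes.groupby(["type", "color"]).size()

# Mean of the numeric columns for each type.
mean_by_type = clothes.groupby("type")[["price_usd", "mass_g"]].mean()

print(counts)
print(mean_by_type)
```

size() counts rows per group, whereas count() counts non-null values per column within each group.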
381
built-in aggregate functions (some require numpy or pandas)
count(): The number of non-null values in each group sum(): The sum of values in each group mean(): The mean of values in each group median(): The median of values in each group min(): The minimum value in each group max(): The maximum value in each group std(): The standard deviation of values in each group var(): The variance of values in each group
382
agg()
Useful when you want to apply multiple functions to a dataframe at the same time. It's a method that belongs to the dataframe class. It stands for "aggregate". Its most important parameters: -func: the function to be applied -axis: the axis over which to apply the function (default=0) Example: clothes output: color mass_g price_usd type 0 red 125 20 pants 1 blue 440 35 shirt 2 green 680 50 shirt 3 blue 200 40 pants 4 green 395 100 shirt 5 red 485 75 pants clothes[['price_usd', 'mass_g']].agg(['sum', 'mean']) output: price_usd mass_g sum 320.00 2325.0 mean 53.33 387.5 Note: The two columns are subset from the dataframe before applying the agg() method. If you don’t subset the relevant columns first, agg() will attempt to apply sum() and mean() to all of the columns, which wouldn’t work because some columns contain strings. (Technically, sum() would work, but it would return something useless because it would just combine all the strings into one long string.) The sum() and mean() functions are entered as strings in a list, without their parentheses. This will work for any built-in aggregation function
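As a runnable version of the example above, the following sketch rebuilds the clothes dataframe and applies agg() to the two numeric columns. The variable name totals is an assumption for illustration.

```python
import pandas as pd

clothes = pd.DataFrame({
    "type": ["pants", "shirt", "shirt", "pants", "shirt", "pants"],
    "color": ["red", "blue", "green", "blue", "green", "red"],
    "price_usd": [20, 35, 50, 40, 100, 75],
    "mass_g": [125, 440, 680, 200, 395, 485],
})

# Subset the numeric columns first, then apply both aggregations at once.
totals = clothes[["price_usd", "mass_g"]].agg(["sum", "mean"])
print(totals)
```

The result has one row per aggregation function and one column per selected column.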
383
MultiIndex
You might have noticed that, when functions are applied to a groupby object, the resulting dataframe has tiered indices. This is an example of MultiIndex. MultiIndex is a hierarchical system of dataframe indexing. It enables you to store and manipulate data with any number of dimensions in lower dimensional data structures such as series and dataframes. This facilitates complex data manipulation. This course will not require any deep knowledge of hierarchical indexing, but it’s helpful to be familiar with it. Consider the following example: grouped = clothes.groupby(['color', 'type']).agg(['mean', 'min']) grouped output: mass_g price_usd mean min mean min color type blue pants 200.0 200 40.0 40 shirt 440.0 440 35.0 35 green shirt 537.5 395 75.0 50 red pants 305.0 125 47.5 20 Note: Notice that color and type are positioned lower than the column names in the output. This indicates that color and type are no longer columns, but named row indices. Similarly, notice that price_usd and mass_g are positioned above mean and min in the output of column names, indicating a hierarchical column index.
384
How do you perform selection on a dataframe with a MultiIndex?
use loc[] selection and put indices in parentheses. Examples: grouped output: mass_g price_usd mean min mean min color type blue pants 200.0 200 40.0 40 shirt 440.0 440 35.0 35 green shirt 537.5 395 75.0 50 red pants 305.0 125 47.5 20 To select a first-level (top) column: grouped.loc[:, 'price_usd'] output: mean min color type blue pants 40.0 40 shirt 35.0 35 green shirt 75.0 50 red pants 47.5 20 To select a second-level (bottom) column: grouped.loc[:, ('price_usd', 'min')] output: color type blue pants 40 shirt 35 green shirt 50 red pants 20 Name: (price_usd, min), dtype: int64 To select first-level (left-most) row: grouped.loc['blue', :] output: mass_g price_usd mean min mean min type pants 200.0 200 40.0 40 shirt 440.0 440 35.0 35 To select a bottom-level (right-most) row: grouped.loc[('green', 'shirt'), :] output: mass_g mean 537.5 min 395.0 price_usd mean 75.0 min 50.0 Name: (green, shirt), dtype: float64 To select individual values: grouped.loc[('blue', 'shirt'), ('mass_g', 'mean')] output: 440.0 If you want to remove the row MultiIndex from a groupby result, include as_index=False as a parameter to your groupby() statement: clothes.groupby(['color', 'type'], as_index=False).mean() output: color type mass_g price_usd 0 blue pants 200.0 40.0 1 blue shirt 440.0 35.0 2 green shirt 537.5 75.0 3 red pants 305.0 47.5 Note: Notice how color and type are no longer row indices, but named columns. The row indices are the standard enumeration beginning from zero.
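The selections above can be run end to end with the clothes data. This is a sketch; the variable names on the left-hand side (top_level, one_column, and so on) are assumptions for illustration.

```python
import pandas as pd

clothes = pd.DataFrame({
    "type": ["pants", "shirt", "shirt", "pants", "shirt", "pants"],
    "color": ["red", "blue", "green", "blue", "green", "red"],
    "price_usd": [20, 35, 50, 40, 100, 75],
    "mass_g": [125, 440, 680, 200, 395, 485],
})

grouped = clothes.groupby(["color", "type"]).agg(["mean", "min"])

top_level = grouped.loc[:, "price_usd"]            # all sub-columns under price_usd
one_column = grouped.loc[:, ("price_usd", "min")]  # tuple picks one bottom-level column
one_row = grouped.loc[("green", "shirt"), :]       # tuple picks one bottom-level row
one_value = grouped.loc[("blue", "shirt"), ("mass_g", "mean")]

# as_index=False keeps color and type as ordinary columns.
flat = clothes.groupby(["color", "type"], as_index=False).mean()

print(one_value)
print(flat)
```

Tuples are what tell loc[] that the labels belong to different levels of the same index rather than being a list of labels at one level.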
385
concat()
A pandas function that combines data either horizontally, by adding new columns to existing rows, or vertically, by adding new rows to existing columns. It works well when you have dataframes that contain identically formatted data and just need to be combined vertically (i.e., you want to add rows/extend axis 0).
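Both directions can be seen in a short sketch. The three tiny dataframes below are assumptions built from a few values of the planets data used elsewhere in this deck.

```python
import pandas as pd

inner = pd.DataFrame({"planet": ["Earth", "Mars"], "moons": [1, 2]})
outer = pd.DataFrame({"planet": ["Jupiter", "Saturn"], "moons": [80, 83]})
radii = pd.DataFrame({"radius_km": [6371, 3390]})

taller = pd.concat([inner, outer], axis=0)  # same columns, more rows
wider = pd.concat([inner, radii], axis=1)   # same rows, more columns

print(taller.shape)  # (4, 2)
print(wider.shape)   # (2, 3)
```

When concatenating along axis 1, rows are matched up by their index labels, so both dataframes here share the default index 0 and 1.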
386
Axis 1 vs Axis 0
Axis 1: runs horizontally across columns (i.e., to add new columns, you want to extend axis 1)
Axis 0: runs vertically over rows (i.e., to add new rows, you want to extend axis 0)

Example: Add rows for Jupiter, Saturn, Uranus, and Neptune to df1, an existing dataframe that holds Mercury, Venus, Earth, and Mars.

# Instantiate a dictionary of planetary data.
data = {'planet': ['Jupiter', 'Saturn', 'Uranus', 'Neptune'],
        'radius_km': [69911, 58232, 25362, 24622],
        'moons': [80, 83, 27, 14],
       }

# Use pd.DataFrame() function to convert dictionary to dataframe.
df2 = pd.DataFrame(data)
df2

output:
   planet   radius_km  moons
0  Jupiter  69911      80
1  Saturn   58232      83
2  Uranus   25362      27
3  Neptune  24622      14

df3 = pd.concat([df1, df2], axis=0)
df3

output:
   planet   radius_km  moons
0  Mercury  2440       0
1  Venus    6052       0
2  Earth    6371       1
3  Mars     3390       2
0  Jupiter  69911      80
1  Saturn   58232      83
2  Uranus   25362      27
3  Neptune  24622      14

*Note that each row retains its index from the original dataframe. To restart the index numbering, use reset_index(drop=True). Without drop=True, the old index is kept as a new column in the dataframe:
df3 = df3.reset_index(drop=True)
df3

output:
   planet   radius_km  moons
0  Mercury  2440       0
1  Venus    6052       0
2  Earth    6371       1
3  Mars     3390       2
4  Jupiter  69911      80
5  Saturn   58232      83
6  Uranus   25362      27
7  Neptune  24622      14
387
merge()
To add data to a dataframe horizontally (i.e., to add columns/extend axis 1), use merge() A pandas function that joins two dataframes together; it only combines data by extending along axis one horizontally. Note: Both datasets must share keys in order to connect Example:
388
Keys
The shared points of reference between different dataframes--what to match on
389
Joins available for dataframes
"inner join": only the keys that appear in both dataframes are included.
"outer join": all keys from both dataframes are included.
"left join": all keys in the left dataframe are included, even if they aren't in the right dataframe.
"right join": all keys in the right dataframe are included, even if they aren't in the left dataframe.

Use pd.merge() to combine dataframes.

# Inner merge retains only keys that appear in both dataframes.
inner = pd.merge(df3, df4, on='planet', how='inner')
inner

# Outer merge retains all keys from both dataframes.
outer = pd.merge(df3, df4, on='planet', how='outer')
outer

# Left merge retains all keys that appear in the left dataframe.
left = pd.merge(df3, df4, on='planet', how='left')
left

# Right merge retains all keys that appear in the right dataframe.
right = pd.merge(df3, df4, on='planet', how='right')
right

Note: If a cell is missing information, it will show NaN (Not a Number).
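A minimal runnable sketch of the four join types. The card does not show what df3 and df4 contain, so the planets and the 'moons' and 'rings' columns below are made-up assumptions chosen so that each frame has one key the other lacks.

```python
import pandas as pd

# Hypothetical data: both frames share the key column 'planet',
# but each has one planet the other does not.
df3 = pd.DataFrame({'planet': ['Earth', 'Mars', 'Jupiter'],
                    'moons': [1, 2, 80]})
df4 = pd.DataFrame({'planet': ['Mars', 'Jupiter', 'Saturn'],
                    'rings': [False, True, True]})

inner = pd.merge(df3, df4, on='planet', how='inner')  # keys found in both
outer = pd.merge(df3, df4, on='planet', how='outer')  # keys found in either
left = pd.merge(df3, df4, on='planet', how='left')    # every key from df3
right = pd.merge(df3, df4, on='planet', how='right')  # every key from df4

# Row counts differ by join type.
print(len(inner), len(outer), len(left), len(right))
# In the outer result, cells with no matching data are NaN.
print(outer)
```

Comparing the row counts before and after a merge is a quick way to check which join you actually got.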
390
pandas
A powerful python library built on top of NumPy that’s used to manipulate and analyze tabular data
391
NumPy
An essential library that contains multidimensional array and matrix data structures and functions to manipulate them
392
Data tidying
Structuring datasets to facilitate analysis
393
Tidy dataset
-Easy to manipulate, model, and visualize -Each variable is a column -Each observation is a row -Each type of observational unit is a table
394
Unit Testing
The process where you test the smallest functional unit of code. Note: If you're a programmer, you'll spend a lot of time writing your own unit tests. When you're doing your own testing, you'll usually have a library for writing and running tests. (Though it is okay to write tests from scratch sometimes.)
395
test-driven development (TDD)
1. Stub out a function
2. Write the tests that expect the correct behavior
3. Run the tests (they should fail)
4. Write the function and keep updating it until it passes the tests

"stub out": create a temporary, minimal placeholder implementation that allows a program to compile and run before the actual, complex code is written.
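The four steps can be sketched in a few lines. This is only an illustration: total_price and run_tests are hypothetical names, and a real project would normally use a test library rather than a hand-rolled checker.

```python
# Step 1: stub out the function so the tests have something to call.
def total_price(prices):
    return None  # placeholder, no real logic yet


# Step 2: write tests that describe the correct behavior.
def run_tests(func):
    """Return True if func passes every check, False otherwise."""
    checks = [
        func([]) == 0,
        func([5]) == 5,
        func([1, 2, 3]) == 6,
    ]
    return all(checks)


# Step 3: run the tests against the stub; they should fail.
stub_passes = run_tests(total_price)

# Step 4: write the real function and re-run until the tests pass.
def total_price(prices):
    return sum(prices)


real_passes = run_tests(total_price)
print(stub_passes, real_passes)
```

Seeing the tests fail first is the point of step 3: it shows the tests can actually detect a wrong implementation.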
396
Pros & Cons to Python
Pros:
-Easy to read and write - Python reads a lot like plain English. Due to its simple syntax, it's a great choice for implementing advanced concepts like AI. This is arguably Python's best feature.
-Popular - According to the Stack Overflow Developer Survey, Python is the 4th most popular coding language in 2025.
-Free - Python, like many languages nowadays, is developed under an open-source license. It's free to install, use, and distribute.
-Portable - Python written for one platform will generally work on any other platform that has a Python interpreter.
-Interpreted - Code can be executed as soon as it's written. Because there is no separate, potentially slow compile step as with Java, C++, or Rust, releasing code to production is typically faster.

Cons (Python may be a poor fit when):
-The code needs to run fast. Python code executes slowly compared with compiled languages, which is why performance-critical applications like PC games usually aren't written in Python.
-The codebase will become large and complex. Due to its dynamic type system, Python code can be harder to keep clean of bugs.
-The application needs to be distributed directly to non-technical users. They would have to install Python in order to run your code, which would be a huge inconvenience.
397
Which version of python should you use?
3! Python 2 and 3 are similar, but Python 3 contains significant changes that are not backward compatible with the 2.x versions.
398
Two main kinds of distinguishable errors in python:
1) Syntax Errors: Python interpreter tells you that your code isn't adhering to proper Python syntax. 2) Exceptions: errors detected during execution are called "exceptions" and can be handled by your code. You can even raise your own exceptions when bad things happen in your code.
399
try-except pattern
Python uses a try-except pattern for handling errors.

Example:
try:
    10 / 0
except Exception:
    print("can't divide by zero")

The try block is executed until an exception is raised or it completes, whichever happens first. In this case, a "divide by zero" error is raised because division by zero is impossible. The except block is only executed if an exception is raised in the try block.

If we want to access the data from the exception, we use the following syntax:
try:
    10 / 0
except Exception as e:
    print(e)
# prints "division by zero"

This exposes the exception as data (e in our case) so that the program can handle the exception gracefully without crashing.
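A small sketch of the same pattern wrapped in a function. It catches the specific ZeroDivisionError instead of the broad Exception, which is a design choice made for this example rather than something the card prescribes; safe_divide is a hypothetical name.

```python
def safe_divide(a, b):
    """Return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError as e:
        # e holds the exception object, so its message can be reported
        print(f"could not divide: {e}")
        return None


print(safe_divide(10, 2))
print(safe_divide(10, 0))
```

Catching a narrow exception type keeps unrelated errors (for example, a typo that causes a NameError) from being silently swallowed.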
400
When something in our own code happens that isn't the "happy path", we should ___
raise our own exceptions. For example, if someone passes bad inputs to a function we write, we should not be afraid to raise an exception to let them know they did something wrong.

An error or exception is raised when something bad happens, but as long as our code handles it as users expect it to, it's not a bug. A bug is when code behaves in ways our users don't expect it to.

For example, if a player tries to forge a sword out of a metal bar the game doesn't support, we can use raise to stop that from happening and so prevent a bug. If the game doesn't have certain items, such as a gold sword, then players shouldn't be able to craft a sword from gold bars even though gold bars do exist.

def craft_sword(metal_bar):
    if metal_bar == "bronze":
        return "bronze sword"
    if metal_bar == "iron":
        return "iron sword"
    if metal_bar == "steel":
        return "steel sword"
    raise Exception("invalid metal bar")

We prevent a bug by raising an exception. This exception prevents other developers who use the craft_sword function from creating items that don't exist in our game. raise stops the function where it is and passes the exception up to the caller; if nothing handles it, the program stops.
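A sketch of how a caller might handle the exception raised by craft_sword. Two things here are assumptions rather than part of the card: the lookup dictionary replaces the chain of if statements, and ValueError replaces the plain Exception. try_crafting is a hypothetical helper.

```python
def craft_sword(metal_bar):
    swords = {
        "bronze": "bronze sword",
        "iron": "iron sword",
        "steel": "steel sword",
    }
    if metal_bar not in swords:
        # Assumption: ValueError instead of the card's plain Exception
        raise ValueError(f"invalid metal bar: {metal_bar}")
    return swords[metal_bar]


def try_crafting(metal_bar):
    """Call craft_sword and turn a failure into a readable message."""
    try:
        return craft_sword(metal_bar)
    except ValueError as e:
        return f"cannot craft: {e}"


print(try_crafting("iron"))
print(try_crafting("gold"))
```

The function that detects the bad input raises; the code that knows what to tell the player catches.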
401
Bug vs an error or exception
An error or exception is raised when something bad happens, but as long as our code handles it as users expect it to, it's not a bug. A bug is when code behaves in ways our users don't expect it to.

For example, if a player tries to forge a sword out of a metal bar the game doesn't support, we can use raise to stop that from happening and so prevent a bug. If the game doesn't have certain items, such as a gold sword, then players shouldn't be able to craft a sword from gold bars even though gold bars do exist.

def craft_sword(metal_bar):
    if metal_bar == "bronze":
        return "bronze sword"
    if metal_bar == "iron":
        return "iron sword"
    if metal_bar == "steel":
        return "steel sword"
    raise Exception("invalid metal bar")

We prevent a bug by raising an exception. This exception prevents other developers who use the craft_sword function from creating items that don't exist in our game. raise stops the function where it is and passes the exception up to the caller; if nothing handles it, the program stops.
402
numpy, pandas, .tail
The .tail(n) method returns the last n rows of a DataFrame or Series. By default, it returns the bottom 5 rows.

Example:
import pandas as pd
import numpy as np

# Create a sample DataFrame
df = pd.DataFrame(np.arange(10, 20).reshape(-1, 1), columns=['A'])

# Display the first 3 rows
print("Head (first 3 rows):")
print(df.head(3))

# Display the last 3 rows
print("\nTail (last 3 rows):")
print(df.tail(3))
403
Not python-specific Data Measurement Chart
Unit | Equivalent | Real-Life Example
Bit (b) | Single binary digit (1 or 0) | Answer to a simple yes/no question
Byte (B) | 8 bits | A single text character (e.g., 'A', '?', '5')
Kilobyte (KB) | 1,024 bytes | A single page of plain text, or a simple email without attachments
Megabyte (MB) | 1,024 KB | A three-minute MP3 song, a high-resolution smartphone photo, or a 200-page book of plain text
Gigabyte (GB) | 1,024 MB | A standard-definition movie (around 1.5–2.5 GB), or roughly 1,000 photos
Terabyte (TB) | 1,024 GB | A modern computer hard drive size, or enough space for about 250 full HD movies
Petabyte (PB) | 1,024 TB | The data processing amount for large companies like Google servers, or the human brain's memory capacity
Exabyte (EB) | 1,024 PB | The approximate amount of data that has traversed the internet since its creation
Zettabyte (ZB) | 1,024 EB | An estimate of all the data generated worldwide in a given year
404
Formatting Rules
If a variable is used across the whole function (e.g., function-wide accumulator) → define near the top.
If it only lives inside one loop/section → define it inside that loop/section.

Example:
def analyze_damage(rounds, min_hit):
    # function-wide accumulators
    total_damage_from_big_hits = 0
    max_round_damage = 0
    rounds_with_crit = 0

    # use total_hits only inside this loop
    for round_hits in rounds:
        total_hits = 0
        # ...
405
Practice Problem: ''' Analyze Arena Damage Complete the analyze_damage function. You are analyzing battle logs from an arena. Each battle is split into rounds. Every round is a list of integers, where each integer is the damage of a single hit in that round. You must use loops to walk through this nested list and calculate some statistics. Function Details Implement: def analyze_damage(rounds, min_hit): # your code here rounds is a list of rounds Each round is a list of integers Example: [[3, 10, 5], [0, 0], [8]] min_hit is an integer threshold Return a tuple with three values, in this order: total_damage_from_big_hits Add up all hits in all rounds that are greater than or equal to min_hit. max_round_damage For each round, add up all hits in that round. max_round_damage is the largest of these round totals. If there are no rounds at all (empty list), this should be 0. rounds_with_crit Count how many rounds contain at least one hit that is greater than or equal to 2 * min_hit. A round counts once even if it has multiple such hits. So the function should return: (total_damage_from_big_hits, max_round_damage, rounds_with_crit) '''
def analyze_damage(rounds, min_hit):
    # Function-wide accumulators
    total_damage_from_big_hits = 0
    max_round_damage = 0
    rounds_with_crit = 0

    # 1. total_damage_from_big_hits
    for round_hits in rounds:
        for hit in round_hits:
            if hit >= min_hit:
                total_damage_from_big_hits += hit

    # 2. max_round_damage (an empty round simply totals 0)
    for round_hits in rounds:
        total_hits = 0
        for hit in round_hits:
            total_hits += hit
        if total_hits > max_round_damage:
            max_round_damage = total_hits

    # 3. rounds_with_crit
    for round_hits in rounds:
        for hit in round_hits:
            if hit >= 2 * min_hit:
                rounds_with_crit += 1
                break  # count each round only once

    # Return the three statistics as a tuple
    return (total_damage_from_big_hits, max_round_damage, rounds_with_crit)
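The solution on this card walks the nested list three times, once per statistic. As an alternative sketch (not the card's own method), all three values can be collected in a single pass by tracking a per-round total and a per-round crit flag. The sample call uses the example rounds from the problem statement with an assumed min_hit of 5.

```python
def analyze_damage(rounds, min_hit):
    # Function-wide accumulators
    total_damage_from_big_hits = 0
    max_round_damage = 0
    rounds_with_crit = 0

    for round_hits in rounds:
        # Per-round values live inside the loop
        round_total = 0
        has_crit = False
        for hit in round_hits:
            round_total += hit
            if hit >= min_hit:
                total_damage_from_big_hits += hit
            if hit >= 2 * min_hit:
                has_crit = True
        if round_total > max_round_damage:
            max_round_damage = round_total
        if has_crit:
            rounds_with_crit += 1  # a round counts once, however many crits

    return (total_damage_from_big_hits, max_round_damage, rounds_with_crit)


# Big hits (>= 5): 10 + 5 + 8 = 23; round totals: 18, 0, 8; crits (>= 10): round 1 only
print(analyze_damage([[3, 10, 5], [0, 0], [8]], 5))
```

The boolean flag does the job that break does in the three-pass version: it keeps a round from being counted more than once.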
406
pandas question: When do you use numeric_only=True?
Mental model to keep: Aggregation functions fall into two buckets:

Counts (type-agnostic):
-count
-size

Math-based (numeric only):
-mean
-median
-std
-var

If it's math → select numeric columns first or use numeric_only=True.

Notes:
-numeric_only=True prevents pandas from trying to take medians of strings
-In df.groupby(...).median()['col'], pandas aggregates every column before the column selection happens, so select the column before aggregating: df.groupby(...)['col'].median()
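A short sketch of both options on a made-up dataframe (the 'team', 'player', and 'score' columns are assumptions for illustration).

```python
import pandas as pd

# Hypothetical data: one grouping column, one text column, one numeric column.
df = pd.DataFrame({
    'team': ['a', 'a', 'b', 'b'],
    'player': ['ann', 'bob', 'cat', 'dan'],
    'score': [10, 20, 30, 50],
})

# Option 1: select the numeric column first, then aggregate.
by_selection = df.groupby('team')['score'].mean()

# Option 2: aggregate the whole frame, skipping non-numeric columns.
by_flag = df.groupby('team').mean(numeric_only=True)

# Counts are type-agnostic, so the text column is counted as well.
counts = df.groupby('team').count()

print(by_selection)
print(by_flag)
print(counts)
```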
407
When to use pandas (pd) vs seaborn (sns) vs matplotlib.pyplot (plt)
Use pandas when you’re… -Cleaning: missing values, types, new columns -Reshaping: groupby, agg, pivot_table, melt -Filtering/sorting: boolean masks, query, sort_values -Producing the “plot-ready” dataframe (often 2–3 columns) **Tell: If you’re answering “what should be on the x/y axis?” you’re in pandas. Use seaborn when you want… -A quick, good-looking plot from a dataframe -“Statistical” defaults: confidence intervals, aggregation, distributions -Common chart types with minimal code: -barplot, lineplot, scatterplot -histplot, kdeplot, boxplot, violinplot -heatmap **Tell: If you can describe your plot as “relationship between variables,” seaborn is usually the fastest. Use matplotlib when you need… -Precise formatting and layout control: -figsize, titles, labels, ticks/rotation -axis limits, annotations, text placement -subplots and multi-panel layouts -saving figures (plt.savefig) -Custom plots seaborn doesn’t cover well -Finishing touches (even if seaborn made the chart) **Tell: If you’re saying “I want it to look exactly like…” you’re in matplotlib.
408
N/A or NaN or [blank]
NaN: Not a number All of these things stand for missing data. Missing data (or null values): A value that is not stored for a variable in a set of data.
409
zero in a dataset means what?
A zero (0) could be considered a missing value, but in other datasets, it could be a legitimate data point.
410
What should you do with missing data?
Options:
-Request that the missing values be filled in by the owner of the data. This is the best method if there are large amounts of data missing.
-Delete the missing column(s), row(s), or value(s). This works best if the total count of missing data is relatively low or the values won't impact the business plan. Be careful not to discard data that is not missing at random; deleting values that have been left blank intentionally can skew the results. For example, if you had 100 people answer 10 questions and only 33% answered question 7, you'd be deleting the majority of the data if you deleted the 67% that did not answer that question.
-Create a NaN category. This is a good strategy if the missing data is categorical rather than numerical. For example, you could put all non-responses to a question into a category called "answer not recorded".
-Derive new representative value(s), such as taking the median or average of the values that aren't missing. This is more useful with business plans that call for a predictive value or forecast.

4 most common tactics for deriving values:
-Forward filling
-Backward filling (backfilling)
-Deriving mean values
-Deriving median values
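The four filling tactics listed above can be sketched on a small Series. The numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 8.0])

forward = s.ffill()                   # carry the previous value forward
backward = s.bfill()                  # pull the next value backward
with_mean = s.fillna(s.mean())        # mean of 1, 3, 8 is 4.0
with_median = s.fillna(s.median())    # median of 1, 3, 8 is 3.0

print(pd.DataFrame({
    'original': s,
    'ffill': forward,
    'bfill': backward,
    'mean': with_mean,
    'median': with_median,
}))
```

Forward and backward filling make the most sense when row order carries meaning (for example, time series); mean and median ignore order entirely.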
411
How to identify duplicates in a dataset? That is, how do you identify entire rows in a dataset that have exactly matching values?
Example: print(df) print() #This provides a blank line in between df and identified duplicated results print(df.duplicated()) output: brand style rating 0 Wowyow cistern 4.0 1 Wowyow cistern 4.0 2 Splaysh jug 5.5 3 Splaysh stock 3.3 4 Pipplee stock 3.0 0 False 1 True 2 False 3 False 4 False dtype: bool Note: The duplicated() function will only return ENTIRE rows that have exactly matching values, not just individual matching values found within a column
412
How do you identify duplicates for only one column or a series of columns within a dataframe? How to you specify which of the duplicates to keep as the "original"?
Below is an example of identifying duplicates in only one column (subset) of values and labeling the last duplicates as “false,” so that they are “kept”: print(df) print() print(df.duplicated(subset=['type'], keep='last')) output: color rating type 0 olive 9.0 rinds 1 olive 9.0 rinds 2 gray 4.5 pellets 3 salmon 11.0 pellets 4 salmon 7.0 pellets 0 True 1 False 2 True 3 True 4 False dtype: bool
413
Using pandas, how do you create a new dataframe with all of the duplicate rows removed?
drop_duplicates() Example: df output: brand style rating 0 Wowyow cistern 4.0 1 Wowyow cistern 4.0 2 Splaysh jug 5.5 3 Splaysh stock 3.3 4 Pipplee stock 3.0 df.drop_duplicates() output: brand style rating 0 Wowyow cistern 4.0 2 Splaysh jug 5.5 3 Splaysh stock 3.3 4 Pipplee stock 3.0 Note: drop_duplicates() function as written above will only drop duplicates of exact matches of entire rows of data. If you wish to drop duplicates within a single column, you will need to specify which columns to check for duplicates using the subset keyword argument.
414
How do you drop duplicates within a single column as opposed to dropping duplicates of exact matches of entire rows of data?
you will need to specify which columns to check for duplicates using the subset keyword argument. Example 1: Drop all rows that have duplicate values in the "style" column. print(df) df = df.drop_duplicates(subset='style') print() print(df) output: brand style rating 0 Wowyow cistern 4.0 1 Wowyow cistern 4.0 2 Splaysh jug 5.5 3 Splaysh stock 3.3 4 Pipplee stock 3.0 brand style rating 0 Wowyow cistern 4.0 2 Splaysh jug 5.5 3 Splaysh stock 3.3 Example 2: Drop all rows (except the first occurrence) that have duplicate values in BOTH the "style" and "rating" columns. print(df) df = df.drop_duplicates(subset=['style', 'rating']) print() print(df) output: brand style rating 0 Wowyow cistern 4.0 1 Wowyow cistern 4.0 2 Splaysh jug 5.5 3 Splaysh stock 3.3 4 Pipplee stock 3.0 brand style rating 0 Wowyow cistern 4.0 2 Splaysh jug 5.5 3 Splaysh stock 3.3 4 Pipplee stock 3.0
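A runnable sketch that rebuilds the brand/style/rating table from these cards and applies duplicated() and drop_duplicates() with and without subset. The only addition is the keep='last' call on the 'style' column, which is included to show how keep changes which row is flagged.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Wowyow', 'Wowyow', 'Splaysh', 'Splaysh', 'Pipplee'],
    'style': ['cistern', 'cistern', 'jug', 'stock', 'stock'],
    'rating': [4.0, 4.0, 5.5, 3.3, 3.0],
})

# Flag rows that repeat an earlier row in every column.
full_row_flags = df.duplicated()

# Flag repeats in 'style' only, treating the LAST occurrence as the original.
style_flags_keep_last = df.duplicated(subset=['style'], keep='last')

# Drop exact duplicate rows, then duplicates by column subset.
no_full_dupes = df.drop_duplicates()
one_per_style = df.drop_duplicates(subset='style')
one_per_style_rating = df.drop_duplicates(subset=['style', 'rating'])

print(full_row_flags.tolist())
print(style_flags_keep_last.tolist())
print(one_per_style)
```

Note that the surviving rows keep their original index labels; chain .reset_index(drop=True) if you want them renumbered.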
415
data_frame.describe() gives you what?
count: the number of non-null (non-missing) rows in each column.
mean: the arithmetic average of each column.
std: the standard deviation of each column.
min: the smallest value in a column.
max: the largest value in a column.

25%: first quartile. 25% of observations in this column are less than or equal to this value.
Example: You have a dataset with the number of lightning strikes. The number of strikes: 25% = 3. So, 25% of rows have 3 or fewer strikes.

50%: second quartile (the median). 50% of observations in this column are less than or equal to this value.
Example: You have a dataset with the number of lightning strikes. The number of strikes: 50% = 6. So half the data is less than or equal to 6 strikes, and the other half is greater than or equal to 6 strikes.

75%: third quartile. 75% of observations in this column are less than or equal to this value.
Example: You have a dataset with the number of lightning strikes. The number of strikes: 75% = 21. So, 75% of the data is less than or equal to 21 strikes.

Min (1) Q1(2) Median(6) Q3(21) Max(2211)

Note: So, the quartiles are cut-off values.
416
How do you import a csv using pandas?
pd.read_csv(r'file_name') Example: df = pd.read_csv(r'C:\Users\kasey\Desktop\Cou Files\Go Beyond the Numbers\Module 3\lab1\eda_missing_data_dataset1.csv')
417
What do you do before joining two datasets?
First step: Check the shapes of each dataset. dataset_name.shape Example: df.shape
418
How do you left join two datasets?
Example: df = dataset1 df_zip = dataset2 df_joined = df.merge(df_zip, how='left', on=['date', 'center_point_geom'])
419
What do you do in pycharm if you only see some of the columns in a dataset you're viewing?
pd.set_option('display.max_columns', None)
420
what should you do after joining two datasets?
Check the summary of a dataset/check the descriptive stats. Example: df_joined = joined datasets df_joined.describe()
421
How do you determine how many rows are missing values?
Step 1: Pick a column that has NaNs (in this case, we'll use 'state_code') and determine how many null values it has.

Example:
df_joined = name of joined datasets
df_null_geo = df_joined[pd.isnull(df_joined.state_code)]
df_null_geo.shape

output: 393,830 rows (the first value in the shape tuple)

Step 2: Confirm that df_null_geo is counting only the rows with the missing state_code values.
df_joined.info()

output:
RangeIndex: 717530 entries, 0 to 717529
Data columns (total 10 columns):
 # Column Non-Null Count Dtype
--- ------ -------------- -----
 0 date 717530 non-null object
 1 center_point_geom 717530 non-null object
 2 longitude 717530 non-null float64
 3 latitude 717530 non-null float64
 4 number_of_strikes_x 717530 non-null int64
 5 zip_code 323700 non-null float64
 6 city 323700 non-null object
 7 state 323700 non-null object
 8 state_code 323700 non-null object
 9 number_of_strikes_y 323700 non-null float64
dtypes: float64(4), int64(1), object(5)
memory usage: 54.7+ MB

Takeaway: Some of the columns have a non-null count of 323700. Others have a count of 717530. 717530 - 323700 is 393,830, the same number we got from the isnull check on state_code. So yes, there are 393,830 rows missing data.
422
In pandas, every dataframe has two different things: columns and index. Describe both.
Columns: The data you analyze Index: The labels that identify each row Note: When something becomes the index, it stops being treated as ordinary data and becomes the row label instead.
423
What happens when you do groupby in pandas?
The grouped columns (A, B in the below example) become the index. This happens with ANY groupby aggregation (e.g., sum, mean, count, etc.).

Note: When something becomes the index, it stops being treated as ordinary data and becomes the row label instead.
Columns: The data you analyze
Index: The labels that identify each row

Example:
df = dataframe
A & B: selected columns
df.groupby(['A', 'B']).agg(...)

TAKEAWAY: If you use groupby and want the grouped columns back as ordinary columns, tack on .reset_index() at the end of your code (or pass as_index=False to groupby()).
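A minimal sketch of what happens to the grouping keys, using a made-up dataframe (the 'value' column and the 'total' output name are assumptions).

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': ['p', 'p', 'q'],
    'value': [1, 2, 3],
})

# After groupby + agg, A and B live in the (Multi)Index, not the columns.
grouped = df.groupby(['A', 'B']).agg(total=('value', 'sum'))
print(grouped.index.names, grouped.columns.tolist())

# reset_index() turns the index levels back into ordinary columns.
flat = grouped.reset_index()
print(flat.columns.tolist())

# as_index=False gets the same flat result in one step.
same = df.groupby(['A', 'B'], as_index=False).agg(total=('value', 'sum'))
print(same.columns.tolist())
```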
424
The complete list of actions that cause columns to become the index in pandas
Note: When something becomes the index, it stops being treated as ordinary data and becomes the row label instead.
Columns: The data you analyze
Index: The labels that identify each row

List:
groupby() - grouping keys (columns) become the index; it's a MultiIndex if there is more than one key
set_index()
pivot()
pivot_table()
resample()
stack()
unstack()
xs()

TAKEAWAY: If you want the index to turn back into a column at the end of your code, you must do something like .reset_index() at the end of your code.
425
Why is it necessary to do .reset_index after you use groupby() or some similar function that converts columns to an index?
You'll need the columns to be columns if you:
-try to access the data in those columns again
-merge/join your dataframe with another one
-plot or export
-write "clean" analysis code

Rule of thumb for most data analysts: .reset_index() after groupby().agg() unless you explicitly want a hierarchical index.
426
Indexes vs Columns (put simply)
Think of indexes as labels and columns as data
427
plotly.express How to import & what is it used for?
import plotly.express as px

plotly.express is the high-level interface of the Plotly library: one function call builds a complete figure from a dataframe. Very large dataframes can make a figure slow to render or cause it to break, so you will often want to filter the data down before plotting.

Use Cases:
-Interactive Visualizations: Creates graphs that users can explore directly in their browser or application, unlike static plots.
-Versatile Chart Types: Offers a vast array of charts, from basic bar and scatter plots to complex 3D graphs, statistical charts, and maps.
428
How do you reduce the size of a db to prevent it from breaking?
Filter the dataframe before plotting so that only the rows you need are drawn.

Example:
dataframe: top_missing
columns: latitude, longitude, number_of_strikes_x

import plotly.express as px
fig = px.scatter_geo(top_missing[top_missing.number_of_strikes_x >= 300],  # Input pandas DataFrame, filtered to 300+ strikes
                     lat="latitude",  # DataFrame column with latitude
                     lon="longitude",  # DataFrame column with longitude
                     size="number_of_strikes_x")  # Set marker size by number of strikes
fig.update_layout(
    title_text='Missing data',  # Create a title
)
fig.show()

Note: to narrow in on one part of the geographical map, add geo_scope. See updated code below.

import plotly.express as px
fig = px.scatter_geo(top_missing[top_missing.number_of_strikes_x >= 300],
                     lat="latitude",
                     lon="longitude",
                     size="number_of_strikes_x")
fig.update_layout(
    title_text='Missing data',
    geo_scope='usa'
)
fig.show()
429
Organize your code. One option:
~/workspace/USERNAME/PROJECTNAME
430
non-null count
The total number of data entries for a data column that are not blank.
431
plotly.express
The high-level module of the Plotly plotting library. It speeds up coding by doing a lot of the figure setup for you, so a complete interactive chart takes one function call instead of many lines.

import plotly.express as px
432
BoxPlot: Calculate the 25th percentile, 75th percentile, and IQR
Reminder: IQR is interquartile range. It looks at the spread of the middle 50% of the data (i.e., Q3 - Q1).

Example:
# Calculate 25th percentile of annual strikes
percentile25 = df['number_of_strikes'].quantile(0.25)

# Calculate 75th percentile of annual strikes
percentile75 = df['number_of_strikes'].quantile(0.75)

# Calculate interquartile range
iqr = percentile75 - percentile25

# Calculate upper and lower thresholds for outliers
upper_limit = percentile75 + 1.5 * iqr
lower_limit = percentile25 - 1.5 * iqr

print('Lower limit is: ' + readable_numbers(lower_limit))

Note: readable_numbers is a helper function that is not shown on this card.
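A self-contained sketch of the same calculation on a small made-up Series, with one extra step that uses the limits to pick out the outliers. The data values are assumptions chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical strike counts with one obviously extreme value.
strikes = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])

percentile25 = strikes.quantile(0.25)   # 3.0
percentile75 = strikes.quantile(0.75)   # 7.0
iqr = percentile75 - percentile25       # 4.0

upper_limit = percentile75 + 1.5 * iqr  # 7 + 6 = 13.0
lower_limit = percentile25 - 1.5 * iqr  # 3 - 6 = -3.0

# Anything outside the limits is treated as an outlier.
outliers = strikes[(strikes < lower_limit) | (strikes > upper_limit)]

print(lower_limit, upper_limit)
print(outliers.tolist())
```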
433
with block .read() method
A with block can be used to open a file:
with open(path_to_file) as f:
    # do something with f (the file) here

You can use the .read() method to read the contents of a file into a string.
# f is a file object
file_contents = f.read()

Example:
# opens and reads book
def get_book_text(book_file_path):
    with open(book_file_path) as f:
        read_book = f.read()
    return read_book

# prints/displays book text
def main():
    text = get_book_text("frankenstein.txt")
    print(text)

main()

Note: Python needs to read the book before it can print it.
The first section of code opens the file on disk (frankenstein.txt), reads all of its contents into memory as a string, and returns that string to whoever CALLS it. Think of it as "go fetch the book and hand it to me."
The second section of code calls get_book_text, receives the returned string, and prints that string to the screen. Think of it as "take the book handed to me and display it."
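A runnable sketch of the same read pattern. Because frankenstein.txt may not exist on your machine, this version first writes a small throwaway file; the file name, its contents, and the explicit encoding argument are assumptions added for the example.

```python
import os
import tempfile


def get_book_text(book_file_path):
    """Return the full contents of the file as one string."""
    with open(book_file_path, encoding="utf-8") as f:
        return f.read()


# Create a small temporary file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("sample book text")
    # Leaving the inner with block closed the file automatically.
    file_is_closed = f.closed
    text = get_book_text(path)

print(text)
print(file_is_closed)
```

The main benefit of with is that the file is closed for you, even if an exception is raised while reading.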
434
What does .apply() do in the following code: def str_to_num(x): x = x.strip('$B') x = int(x) return x df_companies['Valuation'].apply(str_to_num)
It tells Python to take each individual value in the Valuation column, pass it one at a time into str_to_num(), and collect the results into a new Series. This is necessary as .strip() is a string method, not a series method. As the Valuation column is a column and not a string, we need .apply. Remember: *Functions you write usually work on ONE value. *apply() is how you scale them to a whole column.
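A runnable sketch of the .apply() call from this card. The valuation strings are made up, and the last line shows a vectorized alternative using the .str accessor, which is an addition for comparison rather than part of the card.

```python
import pandas as pd


def str_to_num(x):
    # Works on ONE string: remove '$' and 'B' from both ends, then convert
    x = x.strip('$B')
    return int(x)


# Hypothetical data in the same format as the Valuation column.
df_companies = pd.DataFrame({'Valuation': ['$180B', '$100B', '$4B']})

# apply() calls str_to_num once per value and collects the results.
converted = df_companies['Valuation'].apply(str_to_num)

# Alternative: the .str accessor applies string methods to a whole Series.
vectorized = df_companies['Valuation'].str.strip('$B').astype(int)

print(converted.tolist())
print(vectorized.tolist())
```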
435
.isna() vs any()
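isna() asks, for each value: Is this value missing?
If yes, the value is missing: True
If no, the value is not missing: False

any() asks, for a whole row or column: does at least one value count as something?
If yes, at least one value counts as something (e.g., 'a', 10): True
If no, none of the values count as something (i.e., only NaN, 0, False): False

The two are often chained: df.isna().any() reports, for each column, whether any value in it is missing.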
436
df: A B C 0 0 a 10 1 False 0 1 2 NaN NaN NaN axis = 0 vs axis = 1
axis = 0 goes down each column
A = 0, False, NaN
B = a, 0, NaN
C = 10, 1, NaN

axis = 1 goes across each row
0 = 0, a, 10
1 = False, 0, 1
2 = NaN, NaN, NaN
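A sketch that rebuilds the small dataframe from this card and chains isna() with any() along each axis, to show the difference between "down each column" and "across each row".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [0, False, np.nan],
    'B': ['a', 0, np.nan],
    'C': [10, 1, np.nan],
})

missing = df.isna()  # True only where the value is NaN

# axis=0: look down each column, one answer per column.
any_missing_per_column = missing.any(axis=0)

# axis=1: look across each row, one answer per row.
any_missing_per_row = missing.any(axis=1)

print(any_missing_per_column)
print(any_missing_per_row)
```

Values such as 0 and False are not missing, so only the last row is flagged.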
437
Documentation String or Docstring
A string placed as the first statement inside a function, method, class, or module that explains to others who use your code what it does. Writing docstrings is good documentation practice in Python.
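A minimal sketch of a docstring and one way to read it back. area_of_rectangle is a hypothetical function used only for illustration.

```python
def area_of_rectangle(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height


# The docstring is stored on the function and can be read at runtime.
print(area_of_rectangle.__doc__)
```

The built-in help() function displays the same text, which is why a good docstring is useful to anyone calling your code.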
438
df.info
A pandas dataframe method that returns a concise summary of the dataframe, including a non-null count, which helps you know the number of missing values. Example: df.info()
439
pd.isna() or pd.isnull
a pandas function that returns a same-size Boolean array indicating whether each value is null. Note that this function also exists as a dataframe method Example: pd.isnull(df) output: Planet radius_km moons 0 false false true 1 false false true So: NaN = true any other value = false
440
pd.notna() or pd.notnull()
A pandas function that returns a same-sized Boolean array indicating whether each value is NOT null. Note: this function also exists as a dataframe method So: NaN = False Any value = true
441
df.fillna()
A pandas dataframe method that fills in missing values with a specified value or method. Example: df.fillna(2) output: All NaN values will be replaced with "2.0"
442
df.replace()
A pandas dataframe method that replaces specified values with other specified values. Note: This can also be applied to pandas series. Example: df.replace('Aves', 'bird') output: all instances of "Aves" will be replaced with "bird"
443
df.dropna()
A pandas dataframe method that removes rows or columns that contain missing values, depending on the axis you specify.

Example:
Original df:
  animal class color legs
0 NaN Aves red 2
1 gecko Reptilia NaN 4

code:
print(df.dropna(axis=1))

output:
  class legs
0 Aves 2
1 Reptilia 4
444
df.describe()
A pandas dataframe method that returns general statistics (e.g., count, mean, std, min, max, 25%, 50%, 75%) about the dataframe, which can help identify outliers.
445
standard deviation
A standard deviation (or σ) is a measure of how dispersed the data is in relation to the mean.
446
sns.boxplot()
A seaborn function that generates a boxplot. By default, data points more than 1.5x the interquartile range below Q1 or above Q3 are drawn as outliers.
447
df.astype()
A pandas dataframe method that allows you to encode its data as a specified dtype. Note: this method can also be used on series objects.
448
Series Object
In Python, a Series object is a primary data structure within the pandas library that represents a one-dimensional labeled array. It can be thought of as a single column in a spreadsheet or a single variable in a database table.
449
df.astype()
A pandas dataframe method that allows you to encode its data as a specified dtype. Note: this method can also be used on series objects.

Example:
df['class'] = df['class'].astype('category')
print(df.dtypes)

output: class will now have a data type of category
450
dataframe
A DataFrame is a fundamental 2-dimensional data structure, like a spreadsheet or SQL table, organizing data into labeled rows and columns
451
pd.get_dummies()
A function that converts categorical values into new binary columns--one for each category
451
Series.cat.codes
A pandas series attribute that returns the numeric category codes of the series.

Example: Series.cat.codes converts categorical string labels into numerical codes (integers), mapping each unique category to an index, which is useful for machine learning. For example, a categorical Series with ['red', 'blue', 'red'] becomes [1, 0, 1], because 'blue' is the first category (0) and 'red' the second (1).
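A short sketch contrasting the two encodings on the red/blue example: cat.codes gives one integer per value, while pd.get_dummies() gives one binary column per category.

```python
import pandas as pd

colors = pd.Series(['red', 'blue', 'red'])

# .cat is only available on categorical data, so convert first.
as_category = colors.astype('category')
print(as_category.cat.categories.tolist())  # categories are sorted: blue, red

# One integer code per value, based on category position.
codes = as_category.cat.codes

# One binary column per category.
dummies = pd.get_dummies(colors)

print(codes.tolist())
print(dummies)
```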
452
LabelEncoder()
A transformer from scikit-learn's sklearn.preprocessing module that encodes specified categories or labels with numeric codes. Note: when building predictive models, it should only be used on target variables (i.e., y data)
453
Input validation
The practice of thoroughly analyzing and double-checking to make sure data is complete, error-free, and high-quality
454
Questions to ask while validating data
-Are all entries in the same format? (e.g., if using ages: thirty-five and 35 are diff formats) -Are all entries in the same range (e.g., if using financial data, are all numbers in billions of euros or thousands?) -Are the applicable data entries expressed in the same data type (e.g., if you have dates is it month-day-year?)
455
Joining
The process of augmenting data by adding values from other datasets
456
Input validation
The practice of thoroughly analyzing and double-checking to make sure data is complete, error-free, and high-quality
457
set()
Built-in python function The set() function in Python creates a set object, which is an unordered collection of unique and hashable elements. It is primarily used for removing duplicates from a sequence, performing fast membership tests, and executing mathematical set operations like union and intersection.
458
What's a good way to remove duplicates from a sequence?
set() Built-in python function The set() function in Python creates a set object, which is an unordered collection of unique and hashable elements. It is primarily used for removing duplicates from a sequence, performing fast membership tests, and executing mathematical set operations like union and intersection.
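A brief sketch of removing duplicates with set(). Because a set is unordered, the example also shows dict.fromkeys() as one way to keep the original order; that alternative is an addition, not something this card covers. The names list is made up.

```python
names = ['ann', 'bob', 'ann', 'cat', 'bob']

# set() keeps one copy of each element but does not preserve order.
unique_names = set(names)

# dict keys are unique and keep insertion order (Python 3.7+).
ordered_unique = list(dict.fromkeys(names))

print(unique_names)
print(ordered_unique)
print('cat' in unique_names)  # fast membership test
```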
459
pd.qcut
pandas. Quantile-based discretization function. Discretize variable into equal-sized buckets based on rank or based on sample quantiles. For example 1000 values for 10 quantiles would produce a Categorical object indicating quantile membership for each data point. pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise') Parameters : x 1d ndarray or Series Input Numpy array or pandas Series object to be discretized. q int or list-like of float Number of quantiles. 10 for deciles, 4 for quartiles, etc. Alternately array of quantiles, e.g. [0, .25, .5, .75, 1.] for quartiles. labels array or False, default None Used as labels for the resulting bins. Must be of the same length as the resulting bins. If False, return only integer indicators of the bins. If True, raises an error. retbins bool, optional Whether to return the (bins, labels) or not. Can be useful if bins is given as a scalar. precision int, optional The precision at which to store and display the bins labels. duplicates {default ‘raise’, ‘drop’}, optional If bin edges are not unique, raise ValueError or drop non-uniques. Example: Convert numerical data to categorical data Sometimes, you'll want to simplify a numeric column by converting it to a categorical column. To do this, one common approach is to break the range of possible values into a defined number of equally sized bins and assign each bin a name. In the next step, you'll practice this process. Create a High Valuation column The data in the Valuation column represents how much money (in billions, USD) each company is valued at. Use the Valuation column to create a new column called High Valuation. For each company, the value in this column should be low if the company is in the bottom 50% of company valuations and high if the company is in the top 50%. 
# Create new `High Valuation` column
# Use qcut to divide Valuation into 'high' and 'low' Valuation groups
companies['High Valuation'] = pd.qcut(companies['Valuation'], 2, labels=['low', 'high'])
460
What is python used for?
Backend Development Automation Machine Learning DevOps Cloud Engineering
461
DevOps
It's how developers operate: code > test > deploy and maintain the pipelines > interact with end users and create bug reports, bug fixes, and feature requests. *Sometimes different people handle these steps, but often, on smaller teams, one dev does all of them. Common image: an infinity symbol with plan > code > build > test > release > deploy > operate > monitor > plan (and the loop repeats).
Plan (dev team): Use Jira/Asana/YouTrack to plan out your project.
Code (dev team): This is the stage where building the pipeline for the whole process begins. You use Git for version control (hosted on GitHub, GitLab, or Bitbucket). Pull requests (PRs) are built in, along with code review tools for reviewing those PRs. You write the code in IDEs and code editors.
Build (dev team): When you push code to a version control system, it has to build. Code gets compiled (Java, for example, is compiled to bytecode) or, for an interpreted language, prepared for execution. Build tools like Gradle and Maven (Apache Maven) come into play; they build the software. Automated tests kick in here: unit tests, integration tests, and system tests. The idea is to ensure that the code base works as expected and that its parts work together.
Test (QA): Once the automated tests pass, QA tries to break it. They use it as a user would and then also try to break it, for example through unusual workflows. The goal is for them to find the issue before the actual user does.
Release: Deploy: Operate: Monitor: (then, based on the feedback, you restart with the plan step and fix bugs or add features, etc.)
462
Cloud Engineering
463
Machine Learning
464
Automation
465
Backend Development
466
Pull Request (PRs)
GitHub pull requests (PRs) are a built-in collaboration feature of the GitHub platform. They serve as a mechanism to propose, discuss, review, and merge code changes from a feature branch into a target branch.
467
CI/CD
Continuous Integration/Continuous Deployment
468
how do you access an element in a list that is one position before the current index?
Example: Use the list items: items[i - 1]
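A minimal sketch of the idea, assuming a hypothetical list named items. The loop starts at index 1 so that i - 1 never becomes -1, which Python would treat as the last element:

```python
items = [3, 8, 15, 4]

# Start at 1 so that items[i - 1] is always a real "previous" element.
# At i = 0, items[i - 1] would be items[-1], i.e. the LAST element.
differences = []
for i in range(1, len(items)):
    previous = items[i - 1]
    differences.append(items[i] - previous)

print(differences)  # [5, 7, -11]
```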
469
Good rule of thumb for thinking about outer loop, inner loop, and variable placement
Outer Loop: Represents the "overall" task (e.g., "For every player..."). Inner Loop: Represents the "sub-task" for that specific item (e.g., "...compare them to everyone else"). Variable Placement: If a variable should start fresh for every "overall" task, it belongs inside the outer loop. If it should track something across the entire function, it belongs outside both loops.
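The rule of thumb above can be sketched as follows. The function name count_beaten_players and the scores list are made up for illustration:

```python
def count_beaten_players(scores):
    """For each player, count how many other players have a lower score."""
    results = []                      # tracks the whole function: outside both loops
    for i in range(len(scores)):      # outer loop: "for every player..."
        beaten = 0                    # starts fresh per player: inside the outer loop
        for j in range(len(scores)):  # inner loop: "...compare them to everyone else"
            if i != j and scores[i] > scores[j]:
                beaten += 1
        results.append(beaten)
    return results

print(count_beaten_players([10, 30, 20]))  # [0, 2, 1]
```

If beaten were created outside both loops instead, the count would carry over from one player to the next.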
470
Nested Loops:
Any problem that requires comparing every item in a list to every other item in that same list
471
Matrix Traversal:
This involves lists of lists (like a grid or a game board). You use one loop for the rows and another for the columns.
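A small sketch of row-then-column traversal, assuming a hypothetical grid of strings where "W" marks water:

```python
grid = ["GGW", "GWW", "GGG"]

water_cells = []
for row in range(len(grid)):            # one loop for the rows
    for col in range(len(grid[row])):   # another loop for the columns
        if grid[row][col] == "W":
            water_cells.append((row, col))

print(water_cells)  # [(0, 2), (1, 1), (1, 2)]
```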
472
Frequency Counting:
Problems that ask you to count occurrences of specific items within a collection.
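One common way to do this with a plain dictionary is dict.get with a default of 0. The function name count_items is just an illustrative choice:

```python
def count_items(items):
    """Return a dictionary mapping each item to how many times it appears."""
    counts = {}
    for item in items:
        # get() returns 0 when the item has not been seen yet
        counts[item] = counts.get(item, 0) + 1
    return counts

print(count_items(["slime", "goblin", "slime"]))  # {'slime': 2, 'goblin': 1}
```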
473
Find Pairs:
Challenges that ask you to find two numbers that sum to a specific target.
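A hedged sketch of one approach: remember the numbers seen so far in a set, and for each new number check whether its complement has already appeared. The name find_pair is an assumption for this example, and a nested loop over all pairs would also work (just slower):

```python
def find_pair(numbers, target):
    """Return the first pair (a, b) with a + b == target, or None."""
    seen = set()
    for number in numbers:
        complement = target - number
        if complement in seen:      # fast membership test on a set
            return (complement, number)
        seen.add(number)
    return None

print(find_pair([2, 7, 11, 15], 9))  # (2, 7)
print(find_pair([1, 2, 3], 100))     # None
```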
474
Type Error vs Value Error
TypeError: A TypeError is raised when you apply an operation or function to an object whose type does not support it.
Example:
1. Adding a string to an integer
age = 25
message = "I am " + age + " years old."
Output: TypeError: can only concatenate str (not "int") to str

ValueError: A ValueError is raised when a function receives an argument of the right type but with an inappropriate value. That is, the data type is correct, but the data itself is not.
Examples:
1. Converting a non-numeric string to an integer or a float
num = int("hello")
You can't convert the string "hello" to an integer, even with the int() function.
2. Passing a negative number to math.sqrt()
import math
result = math.sqrt(-16)
A negative value passed to the function results in a ValueError because math.sqrt() works with real numbers, and a negative number has no real square root.
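To see the difference side by side, here is a small sketch. The helper describe_failure is hypothetical; it just runs a callable and reports which exception it raised:

```python
import math

def describe_failure(action):
    """Run a zero-argument callable and report which exception it raised."""
    try:
        action()
    except TypeError:
        return "TypeError"
    except ValueError:
        return "ValueError"
    return "no error"

print(describe_failure(lambda: "I am " + 25))       # TypeError: wrong type for +
print(describe_failure(lambda: int("hello")))       # ValueError: str is fine, content is not
print(describe_failure(lambda: math.sqrt(-16)))     # ValueError: number is fine, value is not
print(describe_failure(lambda: "I am " + str(25)))  # no error
```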
475
Element-to-element comparisons
involve checking corresponding items in sequences (lists, arrays) or adjacent items, rather than comparing containers as a single entity
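A short sketch contrasting the three cases, using two made-up lists a and b:

```python
a = [1, 2, 3, 4]
b = [1, 5, 3, 0]

# Comparing the containers as a single entity gives one answer.
print(a == b)  # False

# Comparing corresponding items gives one answer per position.
matches = [x == y for x, y in zip(a, b)]
print(matches)  # [True, False, True, False]

# Comparing adjacent items within one list.
increasing = [a[i] < a[i + 1] for i in range(len(a) - 1)]
print(increasing)  # [True, True, True]
```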
476
in code we usually use: cols to mean
“how many column positions exist in each row” (the width), even if there’s only 1 row Example: grid = ["GGW", "GWW", "GGG"] len(grid) # 3 rows grid[0] # "GGW" (first row) len(grid[0]) # 3 columns Why len(grid[0]) = columns (this is the tricky part) Now we ask: len(grid[0]) Step-by-step: grid[0] → "GGW" "GGW" is a string len("GGW") counts characters There are 3 characters "G" "G" "W" ^ ^ ^ col0 col1 col2 So: 3 column positions
477
Mental model for working with boards:
Whenever you're working with 2D lists: board → the whole grid board[i] → one row board[i][j] → one specific cell
478
How do you get the first and last item of a list?
Indexing Example: List is called items items[0] items[-1]
479
defaultdict
A defaultdict is a specialized dictionary found in Python's built-in collections module. It is designed to handle the exact pattern you just implemented: checking if a key exists before trying to update its value. In a standard dictionary, if you try to access a key that doesn't exist, Python raises a KeyError. With a defaultdict, you provide a "factory function" when you create it. If you try to access a missing key, the defaultdict calls that function to create a default value for you automatically. For counting, we use the int function as the factory, because calling int() returns 0. Here is how your count_monsters function would look using a defaultdict: from collections import defaultdict def count_monsters(encounters): # We tell it to use 'int' (which defaults to 0) for missing keys counts = defaultdict(int) for monster in encounters: # If the monster isn't in counts yet, # it's automatically added with a value of 0 # then incremented to 1. counts[monster] += 1 return dict(counts) By using defaultdict(int), you eliminate the need for the if/else block entirely. It makes the code much cleaner when you are aggregating data into a dictionary. Common factory functions include: int: Defaults to 0 (useful for counters). list: Defaults to [] (useful for grouping items). set: Defaults to set() (useful for finding unique items in groups). Note: Here's the code in the example WITHOUT the defaultdict. (Both work but defaultdict is more concise.) def count_monsters(encounters): monster_dict = {} for encounter in encounters: if encounter not in monster_dict: monster_dict[encounter] = 1 else: monster_dict[encounter] = monster_dict[encounter] + 1 return monster_dict
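The list factory mentioned above can be sketched like this. The function name group_by_type and the sample data are invented for illustration:

```python
from collections import defaultdict

def group_by_type(monsters):
    """Group (name, kind) pairs into a dictionary of kind -> list of names."""
    groups = defaultdict(list)
    for name, kind in monsters:
        # A missing kind automatically starts as an empty list.
        groups[kind].append(name)
    return dict(groups)

print(group_by_type([("Grok", "orc"), ("Zip", "goblin"), ("Thrag", "orc")]))
# {'orc': ['Grok', 'Thrag'], 'goblin': ['Zip']}
```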
480
How does python look for variables (LEGB rule)
Python searches for variables in this order: L → E → G → B
L (Local): inside the current function
E (Enclosing): outer function
G (Global): module level
B (Built-in): Python built-ins
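A minimal sketch of the lookup order, using a made-up variable called label that exists at several levels:

```python
label = "global"            # G: module level

def outer():
    label = "enclosing"     # E: enclosing (outer) function

    def inner():
        label = "local"     # L: inside the current function
        return label

    def inner_without_local():
        return label        # no local label, so the enclosing one is found

    return inner(), inner_without_local()

def no_enclosing():
    return label            # no local or enclosing label, so the global is found

print(outer())          # ('local', 'enclosing')
print(no_enclosing())   # global
print(len("abc"))       # 3, because len is found in the built-in scope (B)
```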
481
What nonlocal tells Python
First: how Python normally looks for variables (LEGB rule). Python searches for variables in this order: L → E → G → B
L (Local): inside the current function
E (Enclosing): outer function
G (Global): module level
B (Built-in): Python built-ins

nonlocal tells Python: "When I modify this variable, go to the Enclosing scope, not Local."

Example:
def create_counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

my_counter = create_counter()

On the first call to my_counter(), it goes to count = 0 and increments it by 1 (i.e., 0 + 1), so count becomes 1. On the second call, count was 1 and becomes 2.

VERSUS the same code without nonlocal:
def create_counter():
    count = 0
    def counter():
        count += 1
        return count
    return counter

Python assumes count must be a local variable inside counter() because you're assigning to it. But the local count doesn't exist yet, so Python raises an UnboundLocalError when counter() is called.

Example 2: Give me a play-by-play of the below code.
Code:
def make_wrapper(prefix, suffix):
    def content(text):
        nonlocal prefix
        nonlocal suffix
        return prefix + text + suffix
    return content

def make_counter(start, step):
    def increase():
        nonlocal start
        nonlocal step
        temp = start
        start += step
        return temp
    return increase

Note: content only reads prefix and suffix, so its nonlocal statements are optional. increase reassigns start, so nonlocal start is required there.

Test results:
Testing make_wrapper... Input: prefix: < suffix: > text: orc. Expected: <orc> Actual: <orc> Pass
Testing make_wrapper... Input: prefix: (empty) suffix: ! text: Wow. Expected: Wow! Actual: Wow! Pass
Testing make_wrapper... Input: prefix: Hi, suffix: ! text: Alice. Expected: Hi, Alice! Actual: Hi, Alice! Pass
Testing make_counter... Input: start: 0 step: 1. Expected sequence: [0, 1, 2] Actual sequence: [0, 1, 2] Pass
Testing make_counter... Input: start: -5 step: 10. Expected sequence: [-5, 5, 15] Actual sequence: [-5, 5, 15] Pass
Testing make_counter... Input: start: 100 step: -25. Expected sequence: [100, 75, 50] Actual sequence: [100, 75, 50] Pass

Play-by-Play
This is the magic of closures. It's one of the most powerful concepts in programming. The reason increase doesn't always return the original input is that the inner function has a "memory" of the environment where it was created. When you call make_counter(0, 1), Python creates a little "bubble" (a scope) where start is 0 and step is 1. Your increase function lives inside that bubble. Here is the play-by-play:
First call to increase(): It looks at start in its "bubble" and sees 0. It saves 0 into temp. Because of nonlocal start, it reaches out and changes the start inside that bubble to 1. It returns 0.
Second call to increase(): It looks at start in the same bubble. But wait! start is no longer 0. It was changed to 1 during the last call! It saves 1 into temp. It reaches out and changes start in the bubble to 2. It returns 1.
If you didn't have the nonlocal keyword, the inner function wouldn't be allowed to change the start variable in the outer bubble. By using nonlocal, you are telling the inner function: "When I say start, I don't want a new local version; I want you to modify the one that belongs to the make_counter function." The start variable "lives on" inside that bubble as long as the increase function exists, even after make_counter has finished running!
482
Clean Code
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler
Clean code does NOT:
- Make your programs run faster
- Make your programs function correctly
- Only occur in object-oriented programming
Clean code DOES:
- Make code easier to work with
- Make it easier to find and fix bugs
- Make the development process faster
- Help us retain our sanity
483
Object-Oriented Programming
Object-Oriented Programming, or "OOP", is a pattern for (allegedly) writing clean and maintainable code.
484
DRY code
Another "rule of thumb" for writing maintainable code is "Don't Repeat Yourself" (DRY). It means that, when possible, you should avoid writing the same code in multiple places. Repeating code can be bad because: If you need to change it, you have to change it in multiple places If you forget to change it in one place, you'll have a bug It's more work to write it over and over again
485
re-factor
A fancy programming word for "rewrote". For example: "I refactored the code so it..."
486
Class
A special type in an object-oriented language like Python. It's a bit like a dictionary in that it usually stores name-value pairs. Example: Define a new class called "Soldier" # with three properties: health, armor, damage class Soldier: health = 5 armor = 3 damage = 2 Just like a string, integer, or float, a class is a TYPE, but instead of being a built-in type, classes are custom types that you define
487
Object
An instance of a class (i.e., an example of a class). "Instance" means "example of". For example, "Lane is an instance of a human" or "My Chemical Romance is an instance of a band".
Example:
health = 50
# health is an instance of an integer type
aragorn = Soldier()
# aragorn is an instance of the Soldier class type

Each new instance of a class is an "object".
Example:
class Archer:
    health = 40
    arrows = 10

# Create several instances of the Archer class
legolas = Archer()
bard = Archer()

Note: A class is a bit like a dictionary in that it usually stores name-value pairs.
# Define a new class called "Soldier"
class Soldier:
    health = 5
    armor = 3
    damage = 2
Just like a string, integer, or float, a class is a TYPE, but instead of being a built-in type, classes are custom types that you define.
488
Method
A function that's tied directly to a class and has access to its properties. Methods are defined within the class declaration. Their first parameter is always the instance of the class that the method is being called on. By convention, it's called "self", and because self is a reference to the object, you can use it to read and update the properties of the object.
Example:
class Soldier:
    health = 5

    # This is a method that reduces the
    # health of the soldier
    def take_damage(self, damage):
        self.health -= damage

soldier_one = Soldier()
soldier_one.take_damage(2)
print(soldier_one.health)
# prints "3"

soldier_two = Soldier()
soldier_two.take_damage(1)
print(soldier_two.health)
# prints "4"

Note: methods are called directly on an object instance using the dot operator: my_object.my_method()
489
What is self?
Example: class Wall: armor = 10 height = 5 def fortify (self): self.armor *= 2 Think of the Wall class as a blueprint for a wall. You can use that blueprint to build many different walls in your game world. Some might be tall, some might be short, and some might have more armor than others. When you call a method like fortify(), the computer needs to know which specific wall you are talking about. self is the way Python keeps track of that specific wall. self.armor refers to the armor of the specific wall you are currently "touching." If you had two different walls: Wall A: armor = 10 Wall B: armor = 50 When you call Wall B.fortify(), self points to Wall B. It ignores Wall A's armor entirely and only doubles the value for the second wall.
490
mutate
update
491
"getter" method
A common kind of method: one that returns a calculated value based on the properties of the object.
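A minimal sketch of a getter, assuming a hypothetical Rectangle class that is not part of the original lesson:

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    # Getter: calculates a value from the object's properties
    # without changing them.
    def get_area(self):
        return self.width * self.height

room = Rectangle(3, 4)
print(room.get_area())  # 12
```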
492
Does a method receive the object it was called on as its first parameter?
Yes.
class Soldier:
    health = 100

    def take_damage(self, damage, multiplier):
        # "self" is dalinar in the first example
        damage = damage * multiplier
        self.health -= damage

dalinar = Soldier()
# "dalinar" is passed implicitly as the first argument, "self"
# "damage" and "multiplier" are passed explicitly as arguments
# 20 and 2, respectively
dalinar.take_damage(20, 2)
print(dalinar.health)
# 60

adolin = Soldier()
# Again, "adolin" is passed implicitly as the first argument, "self"
# "damage" and "multiplier" are passed explicitly as arguments
adolin.take_damage(10, 3)
print(adolin.health)
# 70

A method can operate on data that is contained within the class. In other words, you won't always see all the "outputs" in the return statement because the method might just mutate the object's properties directly.
493
The OOP Debate
OOP: Object-Oriented Programming Because functions are more explicit, some developers argue that functional programming is better than object-oriented programming. Neither paradigm is "better" (I'm required to say this as an educator). The best developers learn and understand both styles and use them as they see fit.
494
helm_of_mordune.enchant(mana, power) which value is automatically passed as self to the enchant method?
helm_of_mordune In Python, when you call a method on an object using the dot notation, the object itself is automatically passed as the first argument to that method. This is what becomes the self parameter inside the method's definition. In the example helm_of_mordune.enchant(mana, power): helm_of_mordune is the instance (the specific object). Because the method enchant is being called on this object, Python implicitly passes helm_of_mordune as the first argument. mana and power are explicit arguments. They are passed into the second and third parameters defined in the method signature.
495
It's rare in the real world to see a class that defines properties like this (as we did): class Soldier: name = "Legolas" armor = 2 num_weapons = 2 A constructor is (usually) better. What is a constructor?
It's a specific method on a class called __init__ that is called automatically when you create a new instance of a class. So, using a constructor, the code from above would look like this: class Soldier: def __init__(self): self.name = "Legolas" self.armor = 2 self.num_weapons = 2 Not only is this safer (we'll talk about why later), but it also allows us to make the starting property values configurable: class Soldier: def __init__(self, name, armor, num_weapons): self.name = name self.armor = armor self.num_weapons = num_weapons soldier_one = Soldier("Legolas", 2, 10) print(soldier_one.name) # prints "Legolas" print(soldier_one.armor) # prints "2" print(soldier_one.num_weapons) # prints "10" soldier_two = Soldier("Gimli", 5, 1) print(soldier_two.name) # prints "Gimli" print(soldier_two.armor) # prints "5" print(soldier_two.num_weapons) # prints "1"
496
instantiation
The creation of an instance. An object is an "instance" (an "example") of a class.
Example:
Class: sword
Object: Lane's sword (steel)
Object: Allan's sword (wood)
The creation of Lane's sword (i.e., an instance of the sword class) could also be called an instantiation. So, we could say the instantiation of Lane's sword...
497
How do you create a wall class using a constructor?
Note: you can think of a class as a "blueprint" or a "template" for an object. class Wall: def __init__(self, depth, height, width): self.depth = depth self.height = height self.width = width
498
Instance Variable vs Class Variable
Instance variables vary from object to object and are declared in the constructor. They're more common. Example: class Wall: def __init__(self): self.height = 10 # instance variable (per object) south_wall = Wall() south_wall.height = 20 # only updates this instance of a wall print(south_wall.height) # prints "20" north_wall = Wall() print(north_wall.height) # prints "10" Class variables are shared between instances of the same class and are declared at the top level of a class definition. They're less common. Note: In other languages, they're often called "static variables" Example: class Wall: height = 10 # class variable (shared across all instances) south_wall = Wall() print(south_wall.height) # prints "10" Wall.height = 20 # updates all instances of a Wall print(south_wall.height) # prints "20"
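One subtlety worth a sketch: assigning through an instance (south_wall.height = 20) creates an instance variable on that one object, which then hides the class variable for that object only. The wall names here are illustrative:

```python
class Wall:
    height = 10  # class variable

north_wall = Wall()
south_wall = Wall()

south_wall.height = 20   # creates an instance variable on south_wall only
Wall.height = 30         # updates the class variable

print(north_wall.height)  # 30 (still reads the shared class variable)
print(south_wall.height)  # 20 (its own instance variable takes priority)
```

So an update to the class variable is visible through every instance that has not been given its own value for that name.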
499
Which is better: instance variable or class variable?
Instance Variables. You'll typically want to avoid class variables. Like global variables, class variables are usually a bad idea, because they make it hard to keep track of which parts of your program are making updates.
500
Encapsulation
Encapsulation is the practice of hiding complexity inside a "black box" so that it's easier to focus on the problem at hand. The simplest example of encapsulation is a function. The caller of a function doesn't need to worry too much about what happens inside; they just need to understand the inputs and outputs.
# who even knows how this function works???
# I sure don't, I just call it and assume
# it calculates the acceleration correctly
acceleration = calc_acceleration(initial_speed, final_speed, time)
Example: To use the calc_acceleration function, we don't need to think about each line of code inside the function definition. We just need to know that if we give it the inputs initial_speed, final_speed, and time, it will produce the correct acceleration as an output.
** Real life example: It's kinda like how the inner workings of your car's power steering are hidden from you. You don't need to be an expert on hydraulic systems to turn the wheel from side to side.
501
public properties and methods
By default, all properties and methods in a class are public. That means that you can access them with the . operator:
wall.height = 10
print(wall.height)
# 10
Note: In the context of Object-Oriented Programming, properties (often called "attributes") are variables that belong to an instance of a class. They represent the data or state of an object. In your Wizard class, you have several properties: name, health, mana, __stamina, __intelligence.
Note: public means that any code outside of the class can see and change them using the dot . operator. For example, because health is public, you could do this:
merlin = Wizard("Merlin", 10, 10)
print(merlin.health) # This works because health is public
merlin.health = 0 # This also works
However, because you prefixed stamina with __, it is now private. If you tried to run print(merlin.__stamina) from outside the class, Python would raise an AttributeError.
502
Properties Data Members Attributes
Other names for properties: Data Members Attributes In the context of Object-Oriented Programming, properties (often called "attributes" or "data members") are variables that belong to an instance of a class. They represent the data or state of an object. In your Wizard class, you have several properties: name health mana __stamina __intelligence
503
What are private data members?
Private data members are a way to encapsulate logic and data within a class definition. To make a property or method private, just prefix it with two underscores:
class Wall:
    def __init__(self, armor, magic_resistance):
        self.__armor = armor
        self.__magic_resistance = magic_resistance

    def get_defense(self):
        return self.__armor + self.__magic_resistance

front_wall = Wall(10, 20)
# This results in an error
print(front_wall.__armor)
# This works
print(front_wall.get_defense())
# 30

We do this to make it easier to use our class. Now when another developer (or even we ourselves) uses the Wall class, they don't need to think about how armor and magic_resistance affect the defense of a Wall. In fact, we don't even allow them to access armor and magic_resistance directly, because we made them private with __. They simply call the public get_defense() method and know that the correct value will be returned.
Note: public means that any code outside of the class can see and change them using the dot . operator. For example, because health is public, you could do this:
merlin = Wizard("Merlin", 10, 10)
print(merlin.health) # This works because health is public
merlin.health = 0 # This also works
However, because you prefixed stamina with __, it is now private. If you tried to run print(merlin.__stamina) from outside the class, Python would raise an AttributeError.
504
Data member
"Data member" is simply another term for a property or attribute. It refers to any variable that is "a member of" a class and holds data related to that class. In the programming world, we often use these terms interchangeably:
- Data member
- Property
- Attribute
- Instance variable (when it belongs to a specific object, like self.name)
Think of a class as having two types of members:
- Data Members: These store information (e.g., self.health = 100).
- Method Members: These store behavior or actions (e.g., def cast_spell(self):).
When the lesson refers to private data members, it is specifically talking about those variables you've prefixed with __, like self.__stamina. They are "members" of the Wizard class that hold "data," but they are hidden from the outside world.
505
Does encapsulation and the concept of private and public members have to do with security.
No. Encapsulation and the concepts of private and public members have NOTHING to do with security. This really confused me as a new developer. Just as the casing on your computer hides its inner workings but doesn't stop you from opening the case and looking inside, encapsulation doesn't stop anyone from knowing how your code works, it just puts it all in one easy to find place. Encapsulation is about organization, not security. Encapsulation is like storing folders in an unlocked filing cabinet. The cabinet doesn't stop anyone from peeking inside, but it does keep everything tidy and easy to find.
506
What is a "side effect"
many methods in Object-Oriented Programming (OOP) are designed to perform "side effects"—meaning they modify the state of the object itself rather than giving a result back. Think of it like this: A function that returns a value is like asking a wizard, "What is your current health?" and the wizard tells you "980." A function that modifies state (like the below example ) is like the wizard actually getting hit. The wizard doesn't need to shout his new health total to the world; his health bar simply drops. If a function doesn't have a return statement, Python automatically returns None. In the below example, that's perfectly fine because the "work" was already done when you updated self.health and self.mana Example: class Wizard: def __init__(self, name, stamina, intelligence): self.name = name self.__stamina = stamina self.__intelligence = intelligence self.mana = self.__intelligence * 10 self.health = self.__stamina * 100 # don't touch above this line def get_fireballed(self, fireball_damage): d = fireball_damage - self.__stamina self.health = self.health - d def drink_mana_potion(self, potion_mana): potion_mana = potion_mana + self.__intelligence self.mana = self.mana + potion_mana
507
Prefixing methods and properties with a double underscore is ______
Prefixing methods and properties with a double underscore is a strong suggestion to the users of your class that they shouldn't be touching that stuff. If a developer wants to break convention, there are ways to get around the double underscore rule. Note: Python is a very dynamic language, which makes it difficult for the interpreter to enforce some of the safeguards that languages like Go do. That's why encapsulation in Python is achieved mostly by convention rather than by force.
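One of those ways around the rule is Python's name mangling: inside a class, a name like __armor is stored as _ClassName__armor. A minimal sketch, using a simplified Wall class for illustration:

```python
class Wall:
    def __init__(self, armor):
        self.__armor = armor  # stored on the object as _Wall__armor

front_wall = Wall(10)

# Direct access from outside the class fails...
try:
    front_wall.__armor
    direct_access_worked = True
except AttributeError:
    direct_access_worked = False

print(direct_access_worked)     # False

# ...but the mangled name is still reachable, so this is convention, not security.
print(front_wall._Wall__armor)  # 10
```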
508
You get the following error: Exception: Wizard.get_fireballed() takes 2 positional arguments but 3 were given when you submit the below code. Why? class Wizard: def __init__(self, name, stamina, intelligence): self.name = name self.__stamina = stamina self.__intelligence = intelligence self.mana = self.__intelligence * 10 self.health = self.__stamina * 100 def cast_fireball(self, target, fireball_cost, fireball_damage): if self.mana < fireball_cost: raise Exception(f"{self.name} cannot cast fireball") else: self.mana -= fireball_cost self.get_fireballed(target, fireball_damage) # this is the problematic line.
the "three arguments" mystery! This is a classic Python trick that catches many apprentices. In Python, when you call a method on an object like self.get_fireballed(...), Python automatically passes the object itself as the first argument. This becomes the self parameter in the function definition. So, when you write: self.get_fireballed(target, fireball_damage) Python actually sends: self (automatically) target fireball_damage That's three! But look at the definition of get_fireballed: def get_fireballed(self, fireball_damage): It only expects two! Furthermore, the assignment says to call get_fireballed on the target wizard. If you use self.get_fireballed, the wizard is effectively hitting themselves with their own fireball! How would you change that line to call the method on the target object instead of self? Answer: target.get_fireballed(fireball_damage)
509
Why does def is_alive(self): return self.health > 0 work the same as def is_alive(self): if self.health > 0: return True else: return False ?
In Python, any comparison using operators like >, <, or == already results in a boolean value (True or False). Think of it like this: 1. When Python sees self.health > 0, it asks: "Is the health greater than zero?" 2. If the health is 500, the answer is True. 3. If the health is 0, the answer is False. The longer version does this: "If the answer to 'Is health > 0' is True, then return True. Otherwise, return False." The shorter version does this: "Return the answer to 'Is health > 0'."
510
Guard Clauses or Early Returns
In the industry, skipping the else when you've already used a return or raise is often considered best practice. This pattern is frequently called Guard Clauses or Early Returns. Example: Use this: def withdraw(self, amount): if amount <= 0: raise ValueError("cannot withdraw zero or negative funds") if self.__balance < amount: raise ValueError("insufficient funds") self.__balance -= amount as opposed to this: def withdraw(self, amount): if amount <= 0: raise ValueError("cannot withdraw zero or negative funds") elif self.__balance < amount: raise ValueError("insufficient funds") else: self.__balance -= amount Why skip the else? When you use a raise or return, the function execution stops right there. There is no physical way for the computer to reach the code below it if the condition is met. Because of this, the else becomes redundant. Here are the primary reasons wizards prefer the "solution" style: Reduced Indentation: Every else block adds another level of indentation. In complex functions, this can lead to "arrow code" that marches off the right side of the screen. Keeping the main logic at the lowest level of indentation makes it easier to read. Mental Overhead: When a reader sees an else, they have to keep the original if condition in their head to understand when that block runs. With a guard clause, you "clear" the error cases out of the way first, allowing the reader to focus on the "happy path" ( the main logic) without distraction. Linear Logic: It makes the function read like a checklist: -Is the amount valid? No? Raise error. -Is there enough money? No? Raise error. -Okay, all clear! Now do the actual work. Note: the purest form of the Guard clause pattern leaves only independent if statements (no elif or else) Example: Use this: def cast_fireball(mana, spell_known): if mana < 10: return "Out of mana!" if not spell_known: return "You don't know that spell!" return "Fireball cast!" 
Instead of this; def cast_fireball(mana, spell_known): if mana < 10: return "Out of mana!" elif not spell_known: return "You don't know that spell!" return "Fireball cast!" By using independent if blocks, you treat each requirement (mana and knowledge) as a separate "gate" that must be passed before the main logic can run. It is a subtle difference, but it is the hallmark of a clean, professional codebase.
511
Abstraction vs. Encapsulation
Abstraction is about creating a simple interface for complex behavior. It focuses on what's exposed (public). Encapsulation is about hiding internal state. It focuses on tucking away the implementation details (private). Abstraction is more about reducing complexity, encapsulation is more about maintaining the integrity of system internals. Put in another way: Encapsulation = hide/protect internals (state + implementation) behind a boundary. Abstraction = offer a simpler “what you can do” interface that ignores messy details. Note: Boot.dev creator said, In my personal opinion, it's a bit of a distinction without a difference... but "abstraction" is a more broadly used term, and in my view at least, it's also a more general term for "making something easier to use by adding a layer on top". Example 1: import random attack_damage = random.randrange(5) Generating random numbers is a really hard problem. The operating system uses the physical hardware of the computer to create a seed for the randomness. However, the developers of the random library have abstracted that complexity away and encapsulated it within the simple randrange function. We just say "I want a random number from 0 to 4" and the library does it. When writing libraries for other developers to use, getting the abstractions right is critical because changing them later can be disastrous. Imagine if the maintainers of the random module changed the input parameters to the randrange function! It would break code used by thousands of applications around the world. Example 2: Bank account (both) Encapsulation: balance is private; you can’t set it directly. Abstraction: you get simple actions like deposit() / withdraw() without caring how overdrafts, logging, or fraud checks work. Example 3: Car controls (abstraction) vs engine internals (encapsulation) Abstraction: steering wheel + pedals are a simple interface: steer/go/stop. 
Encapsulation: you can’t (and shouldn’t) directly tweak fuel injection timing while driving; the engine management system protects those details. Example 4: Your Human class in a game Encapsulation: __pos_x, __pos_y, __speed are private so other code can’t accidentally corrupt them. Abstraction: coworkers use move_left() / move_up() and don’t need to know “position is stored as x/y and speed is added/subtracted.”
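The bank account example above can be sketched in a few lines. This is a hypothetical illustration only: the class name BankAccount, the get_balance() method, and the validation rules are assumptions, not part of the original card.

```python
class BankAccount:
    def __init__(self, opening_balance=0):
        # Encapsulation: the balance is private and cannot be set directly
        self.__balance = opening_balance

    # Abstraction: callers get simple actions and ignore the internal checks
    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    def withdraw(self, amount):
        if amount > self.__balance:
            raise ValueError("insufficient funds")
        self.__balance -= amount

    def get_balance(self):
        return self.__balance


account = BankAccount(100)
account.deposit(50)
account.withdraw(30)
print(account.get_balance())  # 120
```

Other code only ever touches deposit(), withdraw(), and get_balance(), so the way the balance is stored could change later without breaking callers.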
512
defaultdict
defaultdict works like a normal dictionary, but it automatically creates a default value when you access a key that does not exist yet. For counting, you usually use defaultdict(int) because int() returns 0, so every new key starts at 0.
Example:
from collections import defaultdict
counts = defaultdict(int)
counts["slime"] += 1 # counts["slime"] was 0, now 1
counts["slime"] += 1 # now 2
counts["goblin"] += 1 # counts["goblin"] is now 1
# counts is now {"slime": 2, "goblin": 1}
Note: In the above example, defaultdict is like a blueprint. counts is the actual dictionary you built from that blueprint.
Note: defaultdict(int) creates a dictionary-like container, not a number. You can't do counts += 1 on it. Instead, you use it to count things by their keys, as in the example above.
513
What's a class in object-oriented programming?
Classes in object-oriented programming are all about grouping data and behavior together in one place: an object. Object-oriented programmers tend to think about programming as a modeling problem. They think: "How can I write a Human class that holds the data and simulates the behavior of a real human?" To provide some contrast, when functional programmers aren't busy writing white papers, they tend to think of their code as inputs and outputs, and how those inputs and outputs transition the world from one state to the next: "My game has 7 humans in it. When one takes a step, what's the next state of the game?"
514
random vs "pseudorandom"
In the world of programming, most "randomness" is actually what we call "pseudorandom." Computers use a starting value known as a seed to kick off a mathematical formula that generates a sequence of numbers. If you start with the same seed, you will always get the exact same sequence of "random" numbers.
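A minimal sketch of the seed behavior, using Python's standard random module. The helper name roll_dice and the seed value 42 are arbitrary choices for illustration.

```python
import random


def roll_dice(seed, count=5):
    """Return `count` pseudorandom die rolls generated from the given seed."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    return [rng.randint(1, 6) for _ in range(count)]


first = roll_dice(42)
second = roll_dice(42)
print(first == second)  # True: same seed, same sequence
```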
515
Why is it more common to pull from the end of a Python list? In the deck-of-cards example, why treat the last card in the list as the top of the deck?
Efficiency. In Python, pop() from the end of a list is an O(1) operation, meaning it's very fast regardless of how many cards are in the deck. However, pop(0) is an O(n) operation because after the first item is removed, every other item in the list has to be shifted one position to the left to fill the gap. Treating the end of the list as the "top" of the deck is a common optimization.
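A small sketch of the two calls side by side. The deck contents are made up for illustration.

```python
deck = ["2H", "3H", "4H", "5H"]  # hypothetical deck; the last item is the "top"

top_card = deck.pop()      # O(1): nothing else has to move
bottom_card = deck.pop(0)  # O(n): every remaining card shifts one slot left

print(top_card)     # 5H
print(bottom_card)  # 2H
print(deck)         # ['3H', '4H']
```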
516
whenever you see self.something inside a class method, think of it as a permanent piece of data that the object carries around in its pocket. Without the self., it is just___
a temporary note that the function scribbles down and then tosses in the bin once it is finished. Example: class Wizard: def __init__(self, name): self.name = name def cast_spell(self, spell_name): power_level = 10 print(f"{self.name} casts {spell_name} with {power_level} power!") merlin = Wizard("Merlin") merlin.cast_spell("Fireball") Keeping the above note about tossing after finished, which of the following lines would cause an attribute error after the code runs? print(merlin.name) print(merlin.spell_name) print(merlin.power_level) answer: 2 & 3. Option 2 (spell_name): This is a parameter passed to the cast_spell method. It only exists while that specific function is running. Since it wasn't saved as self.spell_name, the merlin object doesn't know what it is once the function finishes. Option 3 (power_level): This is a local variable defined inside the method. Just like spell_name, it only exists within the "scope" of the function and is not "attached" to the object because you didn't use self.. Only Option 1 (name) works because you explicitly saved it to the object using self.name = name in the __init__ method.
517
Inheritance
Non-OOP languages like Go and Rust support encapsulation and abstraction... almost every language does. Inheritance, on the other hand, is typically unique to class-based languages like Python, Java, and Ruby. Inheritance allows a "child" class to inherit properties and methods from a "parent" class. It's a way to share code between classes. For example, say we have this Aircraft class:
class Aircraft:
    def __init__(self, height, speed):
        self.height = height
        self.speed = speed
    def fly_up(self):
        self.height += self.speed
And say we want to also model more specific kinds of aircraft. We could create a more specific Helicopter class like this:
class Helicopter:
    def __init__(self, height, speed):
        self.height = height
        self.speed = speed
        self.direction = 0
    def fly_up(self):
        self.height += self.speed
    def rotate(self):
        self.direction += 90
Trouble is, we've written a lot of the same code twice... wouldn't it be nice if a Helicopter could just take all the behavior from an Aircraft, and then just add its own unique behavior on top of that? Well, it can! We'll just make Helicopter a child class of Aircraft:
class Helicopter(Aircraft):
    def __init__(self, height, speed):
        super().__init__(height, speed)
        self.direction = 0
    def rotate(self):
        self.direction += 90
By adding Aircraft in parentheses after Helicopter, we're saying "make Helicopter a child class of Aircraft". Now Helicopter inherits all the properties and methods of Aircraft! The built-in super() function returns a proxy of the parent class, meaning we can use it to call the parent class's constructor and other methods. So the Helicopter's constructor says "first, call the Aircraft constructor, and then additionally set the direction property". Now, say we want to create a Jet class. Again, because all jets are aircraft, we can inherit from Aircraft again. One parent class can have as many child classes as you want.
class Jet(Aircraft):
    def __init__(self, speed):
        # Jets always start on the ground
        super().__init__(0, speed)
    def go_supersonic(self):
        self.speed *= 2
518
When a child class inherits from a parent, it inherits ___
When a child class inherits from a parent, it inherits everything. If you only want to share some functionality, inheritance should not be the tool you use. Just share some functions, or maybe make a new parent class that both classes can inherit from
519
A good child class is a strict ___of its parent class.
subset. In mathematics, a set A is a subset of a set B if and only if all elements of A are also elements of B. (Image: small circle A within large circle B.) Be careful! New programmers often get carried away. You should never think to yourself: "Well most wizards are elves... so I'll just have Wizard inherit from Elf"
520
How to ignore a value
The most common "best practice" for ignoring invalid input is using a Guard Clause with an early return: you exit the function immediately. It's often considered cleaner because it avoids unnecessary math and makes the "happy path" of the function more obvious:
def add_xp(self, amount):
    # Guard clause: if the data is bad, just stop here
    if amount < 0:
        return
    # This part only runs if the data is valid
    self.__xp += amount
Using return tells the computer: "This input is invalid for this operation, so don't do anything at all."
521
Name mangling & "cheating"
When you prefix an attribute with two underscores (like __health), Python performs something called name mangling. It internally renames that variable to include the class name, like _Player__health. This makes it difficult (though not impossible) for outside code to accidentally access or change that data. It's Python's way of saying, "This is internal to the class; keep out!" In contrast, if you use a single underscore (like _health), it's just a convention. Python won't stop outside code from accessing it, but other programmers will see the underscore and know they shouldn't be touching it directly. Here is what that error looks like: hero = Player("Arthur", 100) This will raise: AttributeError: 'Player' object has no attribute '__health' print(hero.__health) Technically, you could still "cheat" like this, but you shouldn't! print(hero._Player__health) The "Cheat" In languages like Java or C++, "private" means the compiler will strictly forbid you from accessing that data. If you try, the code won't even run. Python is different. It follows a philosophy often called "We are all consenting adults here." Why the “cheat” works When you use double underscores (like __health), Python doesn't actually lock the data in a vault. Instead, it performs Name Mangling. It automatically renames the variable behind the scenes to: _ClassName__attributeName So, __health inside the Player class becomes _Player__health. The "cheat" works because that is the actual name of the variable in the object's memory dictionary! Python doesn't stop you from using that name; it just makes it intentionally annoying to find. Why does Python do this? There are two main reasons: Avoiding Name Clashes: If you create a subclass (like class Mage(Player):) and it also defines its own __health, name mangling ensures the Mage health doesn't accidentally overwrite the Player health. They would be stored as _Mage__health and _Player__health. 
Internal Protection (Not Security): It’s meant to protect the class from accidental misuse by outside code. It's like putting a "Staff Only" sign on a door. It doesn't physically lock the door with a deadbolt, but it tells everyone else they aren't supposed to be in there. If a programmer goes out of their way to type hero._Player__health, they are essentially saying, "I know I shouldn't be doing this, but I'm doing it anyway." Python trusts the programmer to know the risks!
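A runnable sketch of the behavior described above. The Player constructor signature (name, health) is an assumption inferred from the call Player("Arthur", 100); the original class definition is not shown on this card.

```python
class Player:
    def __init__(self, name, health):
        self.name = name
        self.__health = health  # stored on the instance as _Player__health


hero = Player("Arthur", 100)

try:
    print(hero.__health)  # no mangling happens outside the class body
except AttributeError as err:
    error_message = str(err)
    print(error_message)

print(hero._Player__health)             # 100 (works, but you shouldn't do this)
print("_Player__health" in vars(hero))  # True: the mangled name is the real one
```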
522
.get()
The .get() dictionary method is used to retrieve a value for a specific key. Syntax: dict.get(key, default_value) Key: The key you want to search for in the dictionary. Default Value (optional): The value to return if the key is not found. If omitted, it returns None by default instead of raising a KeyError.
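A short illustration of the three cases (key found, key missing with no default, key missing with a default). The inventory dictionary is hypothetical data made up for this sketch.

```python
inventory = {"potion": 3, "sword": 1}  # hypothetical data

found = inventory.get("potion")            # key exists
missing = inventory.get("shield")          # key missing, no default given
with_default = inventory.get("shield", 0)  # key missing, default supplied

print(found)         # 3
print(missing)       # None
print(with_default)  # 0
```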
523
Game map as a cartesian plane x_1 y_1 x_2 y_2
(x2, y2) and (x1, y1) x_1 is the left wall (minimum x). y_1 is the bottom wall or floor (minimum y). x_2 is the right wall (maximum x). y_2 is the top wall or ceiling (maximum y)
524
isinstance(x, SomeClass)
isinstance(x, SomeClass) is a built-in Python function that answers: “Is x an object whose type is SomeClass (or a subclass of it)?” So in this lesson, other is an object like Hero("Bob") or Merchant("Cara"). To see whether that object is a Hero, you do: isinstance(other, Hero)
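A minimal sketch of that check. Hero and Merchant come from the card; the shared Character parent class is an assumption added here only to show the "or a subclass of it" part.

```python
class Character:  # hypothetical shared parent
    def __init__(self, name):
        self.name = name


class Hero(Character):
    pass


class Merchant(Character):
    pass


bob = Hero("Bob")
cara = Merchant("Cara")

print(isinstance(bob, Hero))       # True
print(isinstance(cara, Hero))      # False
print(isinstance(bob, Character))  # True: instances of subclasses count too
```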
525
Abstraction vs Encapsulation
-Abstraction is about creating a simple interface for complex behavior. It focuses on what's exposed (public). -Encapsulation is about hiding internal state. It focuses on tucking away the implementation details (private). -Abstraction focuses on exposing essential features while hiding complexity -Encapsulation focuses on bundling data with methods and restricting direct access to implementation details
526
Why do we need to use target.get_name when trying to get the target's name in a class that inherits from a class that has self.__name = name?
We need to use the get_name() method because the name attribute in the Human class is defined with a double underscore: self.__name. In Python, prefixing an attribute with __ triggers name mangling. This is a way to make an attribute "private," meaning it is intended to be accessed only from within the class where it was defined. If you tried to access it directly like this: target_name = target.__name Python would raise an AttributeError. It does this to protect the internal state of the object and ensure that outside code doesn't accidentally mess with data it shouldn't touch. By providing a public method like get_name(), the Human class says: "I will keep my name data private, but if you ask nicely through this specific method, I will give you a copy of it." This concept is known as encapsulation. It allows you to change how the name is stored internally later on (perhaps you want to capitalize it or log every time it's accessed) without breaking any code that relies on get_name().
527
How do you call on a parent method?
super().method_name() Example: super().get_trip_cost(distance, food_price) *distance & food_price are the arguments for this method
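A sketch of a child method that builds on the parent's version of the same method. Only the method name get_trip_cost and its two arguments come from the card; the class names Traveler and Noble, the cost formula, and the doubling are all hypothetical.

```python
class Traveler:
    def get_trip_cost(self, distance, food_price):
        return distance * food_price


class Noble(Traveler):
    def get_trip_cost(self, distance, food_price):
        # Reuse the parent's calculation, then adjust it
        base_cost = super().get_trip_cost(distance, food_price)
        return base_cost * 2  # hypothetical surcharge


print(Traveler().get_trip_cost(10, 3))  # 30
print(Noble().get_trip_cost(10, 3))     # 60
```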
528
How do you construct a new instance of a class?
instance = ClassName(arguments) Example: Create a new instance of the sword class that is an iron sword: return Sword("iron") full code: def __add__(self, other): if self.sword_type == "bronze" and other.sword_type == "bronze": return Sword("iron") raise Exception("cannot craft")
529
Do custom classes have built-in support for operators?
No. You must add your own support. Example: class Point: def __init__(self, x, y): self.x = x self.y = y def __add__(self, other): x = self.x + other.x y = self.y + other.y return Point(x, y) p1 = Point(4, 5) p2 = Point(2, 3) p3 = p1 + p2 # p3 is (6, 8)
530
What is operator overloading? How do the operators translate into method names?
The practice of defining custom behavior for standard Python operators. Custom classes don't have built-in support for operators, so you must add your own support.
Operation             Operator   Method
Addition              +          __add__
Subtraction           -          __sub__
Multiplication        *          __mul__
Power                 **         __pow__
Division              /          __truediv__
Floor Division        //         __floordiv__
Remainder (modulo)    %          __mod__
Bitwise Left Shift    <<         __lshift__
Bitwise Right Shift   >>         __rshift__
Bitwise AND           &          __and__
Bitwise OR            |          __or__
Bitwise XOR           ^          __xor__
Bitwise NOT           ~          __invert__
531
Script
If you quit from the Python interpreter and enter it again, the definitions you have made (functions and variables) are lost. Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a script.
532
Module
If you quit from the Python interpreter and enter it again, the definitions you have made (functions and variables) are lost. Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a script. As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program. To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module
533
How do you import functions from other modules?
from module_name import my_function
# or import everything:
from module_name import *
534
Delta time
The Greek letter delta (Δ) is often used to represent a change in a value in mathematics. In game development, we use "delta time" to represent the amount of time that has passed since the last frame was drawn. This value is useful to decouple the game's speed from the speed at which it's being drawn to the screen. If your computer's CPU speeds up (its speed is not constant, like a sprinter running as fast as they can), the asteroids shouldn't also speed up. Conversely, if your computer slows down, the asteroids shouldn't also slow down: they may just move less smoothly.
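A minimal sketch of why scaling movement by delta time keeps speed independent of frame rate. The function name update_position and the numbers are made up for illustration; a real game loop would get dt from its framework's clock.

```python
def update_position(position, speed, dt):
    """Move by speed (units per second) scaled by dt (seconds since last frame)."""
    return position + speed * dt


speed = 100  # units per second

# Simulate one second of movement at two different frame rates.
fast = 0.0
for _ in range(60):  # 60 frames, each 1/60 of a second
    fast = update_position(fast, speed, 1 / 60)

slow = 0.0
for _ in range(30):  # 30 frames, each 1/30 of a second
    slow = update_position(slow, speed, 1 / 30)

# Both end up about 100 units along, despite different frame counts.
print(round(fast, 6), round(slow, 6))
```

The slower loop takes bigger steps per frame, so the object moves less smoothly but covers the same distance per second.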
535
536
FPS
Frames per second
537
Functional vs Imperative (or procedural) programming
Functional programming is a style (or "paradigm" if you're pretentious) of programming where we compose functions instead of mutating state (updating the value of variables). Functional programming is more about declaring what you want to happen, rather than how you want it to happen. Imperative (or procedural) programming declares both the what and the how. Example of imperative code: car = create_car() car.add_gas(10) car.clean_windows() Example of functional code: return clean_windows(add_gas(create_car()))
538
Static Typing
A statically-typed language is a language (such as Java, C, or C++) where variable types are known at compile time. In most of these languages, types must be expressly indicated by the programmer; in other cases (such as OCaml), type inference allows the programmer to not indicate their variable types.
539
pattern matching
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a match."
540
How do you add a string to a tuple
Tuples are immutable, so you must create a brand new tuple with the string (wrapped in a single-element tuple) added to the end.
Example:
document: a string
documents: the current tuple of strings
def add_prefix(document, documents):
    prefix = f"{len(documents)}. "
    new_doc = prefix + document
    # new_doc is a string; (new_doc,) wraps it in a single-element tuple
    # so it can be tacked on to documents. docs is the new tuple.
    docs = documents + (new_doc,)
    return docs
541
Functions or Classes?
Should I Use Functions or Classes? Here's my rule of thumb: If you're unsure, default to functions. I find myself reaching for classes when I need something long-lived and stateful that would be easier to model if I could share behavior and data structure via inheritance. This is often the case for: Video games Simulations GUIs The difference is: Classes encourage you to think about the world as a hierarchical collection of objects. Objects bundle behavior, data, and state together in a way that draws boundaries between instances of things, like chess pieces on a board. Functions encourage you to think about the world as a series of data transformations. Functions take data as input and return a transformed output. For example, a function might take the entire state of a chess board and a move as inputs, and return the new state of the board as output.
542
.replace(old, new)
Use to replace all occurrences of a substring in a string. It returns a new string; the original is unchanged.
Example:
line = "Friends don't lie."
line.replace('.', '')
output: Friends don't lie
543
.upper() vs .capitalize()
.upper() capitalizes every letter in a string. .capitalize() capitalizes the first letter and lowercases the rest.
Example:
line = "You can't spell America without Erica"
line.upper()
output: YOU CAN'T SPELL AMERICA WITHOUT ERICA
line.capitalize()
output: You can't spell america without erica
544
.strip() vs .lstrip() vs .rstrip()
.strip() removes whitespace from the beginning and end of a string. .lstrip() removes whitespace from the left. .rstrip() removes whitespace from the right.
Example:
line = "   She's our friend   "
line.strip()
output: "She's our friend"
545
When working in a language that supports ideas from both FP and OOP (like __, __, or __) the best developers are the ones who can use ______________.
When working in a language that supports ideas from both FP and OOP (like Python, JavaScript, or Go) the best developers are the ones who can use the best ideas from both paradigms effectively and appropriately
546
Ternary Expressions
Ternaries are a great way to reduce a series of statements, like an if/else block, to a single expression. Syntax: value_a if condition else value_b Example: result = number / 2 if number % 2 == 0 else (number * 3) + 1 vs result = 0 if number % 2 == 0: result = number / 2 else: result = (number * 3) + 1
547
Common ways to iterate over a dictionary
Just keys: for name in user_ages: print(name) Keys explicitly: for name in user_ages.keys(): print(name) Values: for age in user_ages.values(): print(age) Key–value pairs: for name, age in user_ages.items(): print(name, age) You can build up a list while you loop: adults = [] for name, age in user_ages.items(): if age >= 18: adults.append(name)
548
First-class function
A function that is treated like any other value Example: def square(x): return x * x Assign function to a variable f = square print(f(5)) # 25
549
Higher-order function
A function that accepts another function as an argument or returns a function Example: def square(x): return x * x def my_map(func, arg_list): result = [] for i in arg_list: result.append(func(i)) return result squares = my_map(square, [1, 2, 3, 4, 5]) print(squares) # [1, 4, 9, 16, 25]
550
map
map() applies a function to every item in an iterable.
Example:
Without map:
def square(x):
    return x * x
nums = [1, 2, 3, 4, 5]
squared_nums = []
for num in nums:
    num_squared = square(num)
    squared_nums.append(num_squared)
print(squared_nums) # [1, 4, 9, 16, 25]
With map:
def square(x):
    return x * x
nums = [1, 2, 3, 4, 5]
squared_nums = map(square, nums)
print(list(squared_nums)) # [1, 4, 9, 16, 25]
Note: map() returns a "map object", so the list() type constructor is needed to convert it back into a standard list.
551
Filter()
Syntax: filter(function, iterable) The Python filter() function extracts elements from an iterable (like a list, tuple, or set) that satisfy a specific condition defined by a function. It returns a filter object (an iterator), which is memory-efficient and must typically be converted to a list or other sequence to view its contents Note: You need to wrap it in list(). Why? Think of it like this: filter: A map showing where the berries are in the forest. list(): Actually going into the forest and putting the berries in your basket. Example: def get_affordable_items(menu, max_price): return list(filter(lambda item: item[1] <= max_price, menu))
552
isalpha()
The Python isalpha() method is a built-in string method that checks if all characters in a string are alphabetic and that the string is not empty. It returns a boolean value, True or False. def filter_valid_gamer_tags(tags): return list(filter(lambda tag: len(tag) >= 4 and len(tag) <= 10 and " " not in tag and tag[0].isalpha(), tags))
553
anonymous functions
Anonymous functions have no name, and in Python, they're called lambda functions after lambda calculus. Here's a lambda function that takes a single argument x and returns the result of x + 1: lambda x: x + 1 Notice that the expression x + 1 is returned automatically, no need for a return statement. Compare that to how you'd normally write a function: def add_one(x): return x + 1 Because functions are just values, we can assign the function to a variable named add_one: add_one = lambda x: x + 1 print(add_one(2)) # 3 Lambda functions might look scary, but they're still just functions. Because they simply return the result of an expression, they're often used for small, simple evaluations. Here's an example that uses a lambda to get a value from a dictionary: get_age = lambda name: { "lane": 29, "hunter": 69, "allan": 17 }.get(name, "not found") print(get_age("lane")) # 29
554
In programming, "state" refers to
the data stored in memory at a specific moment in time. When we use loops, we are constantly changing (mutating) that state.
555
Why do we aim for "stateless" code
state: the data stored in memory at a specific moment in time. When we use loops, we are constantly changing (mutating) that state. 1. The Burden of “Mental Tracking” In the imperative loop example: result = 1 for i in range(1, n + 1): result *= i To understand what this code is doing at any given second, you have to track the "moving parts" in your head. You have to ask: "What is the value of i right now? And what was result updated to in the previous step?" As programs get larger, tracking these shifting variables becomes a major source of bugs. 2. Predictability When you use reduce, you are describing what the result is, rather than how to step-by-step calculate it: functools.reduce(lambda x, y: x * y, range(1, n + 1)) The logic is contained within a pure function (the lambda). Because there are no variables being reassigned (no x = x * y lines), the code is much easier to reason about. You don't have to worry about a variable being accidentally changed by another part of the loop or a different function. 3. Debugging Stateful mutations are often where bugs hide. If result ends up with the wrong value in a long loop, you have to find exactly which iteration caused the mistake. With functional composition, you are building a "pipeline." If the output is wrong, you usually just need to check the logic of the small, isolated functions within that pipeline. By avoiding stateful mutations, we make our code more "declarative," which leads to fewer side effects and a more robust codebase!
556
Pure function
A pure function always returns the same output for the same input. The two rules of a pure function are: -Deterministic — given the same input, it always produces the same output -No side effects — it doesn't modify anything outside itself (no changing global variables, no I/O, no mutation of the input, etc.) In short: pure functions don't do anything with anything that exists outside of their scope. Why do we use functions that aren't pure? A program with no side effects is useless. It makes your computer heat up but it has no results. It can't print to console, can't access the internet, can't update a database, etc.
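The two rules can be seen side by side in a small sketch. The function names and the global variable total are hypothetical, chosen only for this illustration.

```python
total = 0  # state that lives outside both functions


def add(x, y):
    """Pure: the result depends only on the arguments, and nothing else changes."""
    return x + y


def add_to_total(x):
    """Impure: reads and modifies a variable outside its own scope."""
    global total
    total += x
    return total


print(add(2, 3), add(2, 3))              # 5 5 (same input, same output)
print(add_to_total(2), add_to_total(2))  # 2 4 (same input, different output)
```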
557
Benefits of Pure Functions
Pure functions have a lot of benefits. Whenever possible, good developers try to use pure functions instead of impure functions. Remember, pure functions: Return the same result if given the same arguments. They are deterministic. Do not change the external state of the program. For example, they do not change any variables outside of their scope. Do not perform any I/O operations (like reading from disk, accessing the internet, or writing to the console). These properties result in pure functions being easier to test, debug, and think about.
558
Reference vs Value
When you pass a value into a function as an argument, one of two things can happen: It's passed by reference: The function has access to the original value and can change it. It's passed by value: The function only has access to a copy. Changes to the copy within the function don't affect the original. There is a bit more nuance, but this explanation mostly works. These types are passed by reference: Lists Dictionaries Sets These types are passed by value: Integers Floats Strings Booleans Tuples Most collection types are passed by reference (except for tuples) and most primitive types are passed by value. Example of Pass by Reference (Mutable) def modify_list(inner_lst): inner_lst.append(4) # the original "outer_lst" is updated # because inner_lst is a reference to the original outer_lst = [1, 2, 3] modify_list(outer_lst) # outer_lst = [1, 2, 3, 4] Example of Pass by Value (Immutable) def attempt_to_modify(inner_num): inner_num += 1 # the original "outer_num" is not updated # because inner_num is a copy of the original outer_num = 1 attempt_to_modify(outer_num) # outer_num = 1
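The "bit more nuance" can be sketched as follows. Strictly speaking, Python hands the function a reference to the same object in every case; what differs is whether the function mutates that object in place or simply rebinds its local name to a new object. The function names rebind and mutate are made up for this illustration.

```python
def rebind(items):
    # Builds a brand new list and points the local name at it;
    # the caller's list is untouched.
    items = items + [4]


def mutate(items):
    # Changes the caller's list in place.
    items.append(4)


outer = [1, 2, 3]

rebind(outer)
print(outer)  # [1, 2, 3]

mutate(outer)
print(outer)  # [1, 2, 3, 4]
```

Immutable types such as integers, strings, and tuples have no in-place operations, which is why they behave as if they were copied.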
559
Pass by Reference Impurity
Because certain types in Python are passed by reference, we can mutate values that we didn't intend to. This is a form of function impurity. Remember, a pure function should have no side effects. It shouldn't modify anything outside of its scope, including its inputs. It should return new copies of inputs instead of changing them. Pure Function def remove_format(default_formats, old_format): new_formats = default_formats.copy() new_formats[old_format] = False return new_formats Impure Function def remove_format(default_formats, old_format): default_formats[old_format] = False return default_formats Why Do We Care? One of the biggest differences between good and great developers is how often they incorporate pure functions into their code. Pure functions are easier to read, easier to reason about, easier to test, and easier to combine. Even if you're working in an imperative language like Python, you can (and should) write pure functions whenever reasonable. There's nothing worse than trying to debug a program where the order of function calls needs to be juuuuust right because they all read and modify the same global variable.
560
i/o
The term "i/o" stands for input/output. In the context of writing programs, i/o refers to anything in our code that interacts with the "outside world". "Outside world" just means anything that's not stored in our application's memory (like variables). Examples of I/O Reading from or writing to a file on the hard drive Accessing the internet Reading from or writing to a database Even simply printing to the console is considered i/o!
561
why is returning not a side effect but printing is?
Printing reaches out to the console (an external system) to display text. That's an interaction with the "outside world" - a side effect. Something beyond your function's own memory is changed as a result. Returning just hands a value back to the caller, staying entirely within the program's memory. Nothing external is touched. The caller can then decide what to do with that value - store it, print it, pass it elsewhere, or ignore it. A useful way to think about it: a function with no side effects is like a vending machine. You put something in, you get something out, and the machine doesn't call anyone or write anything down in the process. print, on the other hand, is like the machine announcing your purchase over a loudspeaker - it reaches out into the world. This is why pure functions (no side effects) are easier to test: you can simply check what they return without worrying about what they might be doing to the outside world.
562
No-Op
A no-op is an operation that does... nothing. Example: This function performs a useless computation because it doesn't return anything or perform a side effect. It's a no-op. def square(x): x * x
563
memoization
memoization is just caching (storing a copy of) the result of a computation so that we don't have to compute it again in the future. For example, take this simple function: def add(x, y): return x + y A call to add(5, 7) will always evaluate to 12. So, if you think about it, once we know that add(5, 7) can be replaced with 12, we can just store the value 12 somewhere in memory so that we don't have to do the addition operation again in the future. Then, if we need to add(5, 7) again, we can just look up the value 12 instead of doing a (potentially expensive) CPU operation. The slower and more complex the function, the more memoization can help speed things up.
564
Referential Transparency
"Referential transparency" is a fancy way of saying that a function call can be replaced by its would-be return value because it's the same every time. Referentially transparent functions can be safely memoized. For example add(2, 3) can be smartly replaced by the value 5. Note: Pure functions are always referentially transparent.
565
Should I Always Memoize?
No! Memoization is a tradeoff between memory and speed. If your function is fast to execute, it's probably not worth memoizing, because the amount of RAM (memory) your program will need to store the results will go way up. It's also a bunch of extra code to write, so you should only do it if you have a good reason to.
566
What happens if you do "if not dictionary_name"?
In Python, certain values are considered "falsy" - they behave like False in a boolean context. An empty dictionary {} is one of them. So if not bundle: reads as "if bundle is falsy" - which is True when bundle is {}. These are all falsy in Python: {} # empty dict [] # empty list "" # empty string 0 # zero None # nothing
567
Base case
A base case in recursion is the terminating condition that stops a function from calling itself indefinitely. Example: if i == len(word): return is the base case in the below example def print_chars(word, i): if i == len(word): return print(word[i]) print_chars(word, i + 1) print_chars("Hello", 0) # H # e # l # l # o
568
isinstance()
isinstance checks type membership (i.e., does this value belong to this type?) Syntax: isinstance(value, type) Example: def sum_nested_list(lst): size = 0 for i in lst: if isinstance(i, int): size += i if isinstance(i, list): lst_size = sum_nested_list(i) size += lst_size return size
569
Recursion is useful for tree-like structures (e.g., nested dictionaries, file systems, HTML docs, JSON objects). Why?
Because we don't always know how deep they are nested.

Example:
for item in tree:
    for nested_item in item:
        for nested_nested_item in nested_item:
            for nested_nested_nested_item in nested_nested_item:
                # ... WHEN DOES IT END???

Example of recursion used to calculate a nested sum:
In Doc2Doc, users can process files or entire directories. We need to know the total size of those files and directories (measured in bytes). Due to the nested nature of directories, we represent a root directory as a list of lists. Each list represents a directory, and each number represents the size of a file in that directory. For example, here's a directory that contains 2 files at the root level, then a nested directory with its own two files:

root = [
    1,
    2,
    [3, 4]
]
print(sum_nested_list(root))
# 10

Solution:
def sum_nested_list(lst):
    size = 0
    for i in lst:
        if isinstance(i, int):
            size += i
        if isinstance(i, list):
            lst_size = sum_nested_list(i)
            size += lst_size
    return size
570
.extend() List method
Adds the specified list elements (or any iterable) to the end of the current list. syntax: list.extend(iterable) Example: Add the elements of cars to the fruits list: fruits = ['apple', 'banana', 'cherry'] cars = ['Ford', 'BMW', 'Volvo'] fruits.extend(cars)
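A short sketch of what the list looks like afterwards, with append shown for contrast (the `nested` list is a hypothetical addition for illustration):

```python
fruits = ['apple', 'banana', 'cherry']
cars = ['Ford', 'BMW', 'Volvo']

# extend() adds each element of cars individually
fruits.extend(cars)
print(fruits)
# ['apple', 'banana', 'cherry', 'Ford', 'BMW', 'Volvo']

# append() would add the whole list as a single nested element
nested = ['apple']
nested.append(cars)
print(nested)
# ['apple', ['Ford', 'BMW', 'Volvo']]
```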
571
Stacks
Stacks in computing architectures are regions of memory where data is added or removed in a last-in-first-out (LIFO) manner. In most modern computer systems, each thread has a reserved region of memory referred to as its stack. When a function executes, it may add some of its local state data to the top of the stack; when the function exits it is responsible for removing that data from the stack. At a minimum, a thread's stack is used to store the location of a return address provided by the caller in order to allow return statements to return to the correct location. The stack is often used to store variables of fixed length local to the currently active functions. Programmers may further choose to explicitly use the stack to store local data of variable length. If a region of memory lies on the thread's stack, that memory is said to have been allocated on the stack, i.e. stack-based memory allocation (SBMA). This is contrasted with a heap-based memory allocation (HBMA). The SBMA is often closely coupled with a function call stack.
572
Stack Overflow & Recursion
Stack Overflow: Each function call requires a bit of memory. So, if you recurse too deeply, you can run out of "stack" memory, which will crash your program. (This is what the famous website is named after.) If you don't have a solid base case, you can end up in infinite recursion (which will eventually lead to a stack overflow).
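In CPython, the interpreter guards the stack with a recursion limit and raises RecursionError before the process actually crashes. A small sketch (the function names are hypothetical):

```python
def count_down(n):
    # No base case: this function never stops calling itself.
    return count_down(n - 1)


def runs_out_of_stack():
    try:
        count_down(10)
    except RecursionError:
        # Python stopped the runaway recursion for us.
        return True
    return False


print(runs_out_of_stack())  # True
```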
573
Tail call optimization
ECMAScript 6 (JavaScript) specifies tail call optimization, where some function calls can be made without growing the call stack. CPython does not perform tail call optimization, so deep recursion in Python still grows the stack.

Notes: Stack overflow & recursion:
Stack Overflow: Each function call requires a bit of memory. So, if you recurse too deeply, you can run out of "stack" memory, which will crash your program. (This is what the famous website is named after.) If you don't have a solid base case, you can end up in infinite recursion (which will eventually lead to a stack overflow).
574
Performant Code
Performant coding refers to writing software that executes efficiently, using minimal time and resources (CPU, memory, network) to meet required speed and scalability goals.
575
Higher-Order Functions vs Function Transformation
A higher-order function is a function that does at least one of the following:
-Takes one or more functions as arguments.
-Returns a function as its result.

"Function transformation" is just a more concise way to describe a specific type of higher-order function. It's when a function takes a function (or functions) as input and returns a new function.

Example of a function transformation:
def multiply(x, y):
    return x * y

def add(x, y):
    return x + y

def self_math(math_func):
    # inner_func is defined inside self_math.
    # It can only be referenced directly
    # inside self_math's scope. However, it is then
    # returned and can be captured into a new variable
    # like square_func or double_func, and called that way
    def inner_func(x):
        return math_func(x, x)
    return inner_func

square_func = self_math(multiply)
double_func = self_math(add)

print(square_func(5))
# 25
print(double_func(5))
# 10
576
Closure
A closure is a function that references variables from outside its own function body. The function definition and its environment are bundled together into a single entity. Put simply, a closure is just a function that keeps track of some values from the place where it was defined, no matter where it is executed later on.

Example:
The concatter() function returns a function called doc_builder (yay higher-order functions!) that has a reference to an enclosed doc value.

def concatter():
    doc = ""
    def doc_builder(word):
        # "nonlocal" tells Python to use the 'doc'
        # variable from the enclosing scope
        nonlocal doc
        doc += word + " "
        return doc
    return doc_builder

# save the returned 'doc_builder' function
# to the new function 'harry_potter_aggregator'
harry_potter_aggregator = concatter()
harry_potter_aggregator("Mr.")
harry_potter_aggregator("and")
harry_potter_aggregator("Mrs.")
harry_potter_aggregator("Dursley")
harry_potter_aggregator("of")
harry_potter_aggregator("number")
harry_potter_aggregator("four,")
harry_potter_aggregator("Privet")
print(harry_potter_aggregator("Drive"))
# Mr. and Mrs. Dursley of number four, Privet Drive

When concatter() is called, it creates a new "stateful" function that remembers the value of its internal doc variable. Each successive call to harry_potter_aggregator appends to that same doc.
577
nonlocal
Python has a keyword called nonlocal that's required to modify a variable from an enclosing scope. Most programming languages don't require this keyword, but Python does.

Example:
def concatter():
    doc = ""
    def doc_builder(word):
        # "nonlocal" tells Python to use the 'doc'
        # variable from the enclosing scope
        nonlocal doc
        doc += word + " "
        return doc
    return doc_builder

# save the returned 'doc_builder' function
harry_potter_aggregator = concatter()
harry_potter_aggregator("Mr.")
harry_potter_aggregator("and")
harry_potter_aggregator("Mrs.")
harry_potter_aggregator("Dursley")
harry_potter_aggregator("of")
harry_potter_aggregator("number")
harry_potter_aggregator("four,")
harry_potter_aggregator("Privet")
print(harry_potter_aggregator("Drive"))
# Mr. and Mrs. Dursley of number four, Privet Drive
578
Closure
A closure is a function that references variables from outside its own function body. The function definition and its environment are bundled together into a single entity. Put simply, a closure is just a function that keeps track of some values from the place where it was defined, no matter where it is executed later on. Example: def word_count_aggregator(): count = 0 def increment_count(doc): nonlocal count count += len(doc.split()) return count return increment_count The whole point of a closure is that it's stateful. It's a function that "remembers" the values from the enclosing scope even after the enclosing scope has finished executing. That means that in many cases, closures are not pure functions. They can mutate state outside of their scope and have side effects.
579
When to use the nonlocal keyword? And when not to use the nonlocal keyword?
You only need the nonlocal keyword if you are reassigning a variable instead of modifying its contents (which you must do to change immutable values such as strings and integers). When not to use the nonlocal keyword: when the variable is mutable (such as a list, dictionary or set), and you are modifying its contents rather than reassigning the variable.
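The two cases can be sketched side by side. The function names here (make_counter, make_logger) are hypothetical and only illustrate the rule above:

```python
def make_counter():
    count = 0

    def increment():
        # Required: "count += 1" reassigns the name 'count',
        # and integers are immutable.
        nonlocal count
        count += 1
        return count

    return increment


def make_logger():
    entries = []

    def log(message):
        # No nonlocal needed: append() mutates the existing list
        # in place; the name 'entries' is never reassigned.
        entries.append(message)
        return entries

    return log


counter = make_counter()
counter()
print(counter())  # 2

logger = make_logger()
logger("a")
print(logger("b"))  # ['a', 'b']
```

Without the nonlocal line, calling increment() would raise UnboundLocalError, because Python would treat count as a new local variable.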
580
currying
Function currying is a specific kind of function transformation where we translate a single function that accepts multiple arguments into multiple functions that each accept a single argument. This is a "normal" 3-argument function: box_volume(3, 4, 5) This is a "curried" series of functions that does the same thing: box_volume(3)(4)(5)
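One way the curried form could be written, as a sketch. The card uses the name box_volume for both versions; curried_box_volume is a hypothetical name used here so both can coexist:

```python
def box_volume(length, width, height):
    # The "normal" 3-argument function
    return length * width * height


def curried_box_volume(length):
    # Each function accepts one argument and returns the next function
    def with_width(width):
        def with_height(height):
            return length * width * height
        return with_height
    return with_width


print(box_volume(3, 4, 5))          # 60
print(curried_box_volume(3)(4)(5))  # 60
```

Each inner function is a closure: it remembers the arguments supplied to the outer calls.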
581
desugaring
Language processors, including compilers and static analyzers, often expand sugared constructs into their more verbose equivalents before processing, a process sometimes called "desugaring".

Note: What are language processors? Compilers and interpreters turn programs written in high-level languages into something a computer can execute, and assemblers translate programs written in assembly language into machine code. Compilation happens in several stages, and tools such as static analyzers help programmers catch errors before the code runs.
582
syntactic sugar
"Syntactic sugar" just means "a more convenient syntax". In computer science, syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an alternative style that some may prefer. Syntactic sugar is usually a shorthand for a common operation that could also be expressed in an alternate, more verbose, form: The programmer has a choice of whether to use the shorter form or the longer form, but will usually use the shorter form since it is shorter and easier to type and read. For example, in the Python programming language it is possible to get a list element at a given index using the syntax list_variable.__getitem__(index), but this is frequently shortened to list_variable[index] which could be considered simpler and easier to read, despite having identical behavior. Similarly, list_variable.__setitem__(index, value) is frequently shortened to list_variable[index] = value. Example: It's Just Syntactic Sugar Python decorators are just another (sometimes simpler) way of writing a higher-order function. These two pieces of code are identical: With Decorator @vowel_counter def process_doc(doc): print(f"Document: {doc}") process_doc("Something wicked this way comes") Without Decorator def process_doc(doc): print(f"Document: {doc}") process_doc = vowel_counter(process_doc) process_doc("Something wicked this way comes") Note: A Python decorator is just syntactic sugar for higher-order functions. "Syntactic sugar" just means "a more convenient syntax".
583
**kwargs
What is **kwargs?
**kwargs lets you pass any number of named arguments (key=value pairs) to a function when you don't know in advance how many there will be. The ** is the special syntax; kwargs is just the conventional name (you could call it anything).

Breaking down the example
When you call greet_me(name="yasoob"), Python packages that into a dictionary behind the scenes:
{"name": "yasoob"}
So inside the function, kwargs is just a regular dictionary you can loop over, check keys, etc.

A more vivid example
def describe_person(**kwargs):
    for key, value in kwargs.items():
        print(f"Your {key} is {value}")

describe_person(name="Alice", age=30, city="Paris")

Output:
# Your name is Alice
# Your age is 30
# Your city is Paris

You passed 3 named arguments, and the function handled all of them without you defining name, age, or city as parameters.

Why is this useful?
Without **kwargs, you'd have to pre-define every possible argument:
# Rigid: only works with exactly these 3 arguments
def describe_person(name, age, city):
    ...

With **kwargs, the function is flexible: it accepts whatever named arguments you throw at it.

The key mental model
What you write when calling | What Python gives the function
greet_me(name="yasoob") | kwargs = {"name": "yasoob"}
greet_me(a=1, b=2, c=3) | kwargs = {"a": 1, "b": 2, "c": 3}

It simply collects all the name=value pairs into a dictionary for you to use inside the function.

TL;DR: **kwargs collects keyword (named) arguments into a dictionary.
584
*args
*args: catching extra positional arguments
When you call a function, positional arguments are the ones passed by position, without a name:
add(1, 2, 3) # 1, 2, 3 are positional

*args collects any number of these into a tuple inside the function:
def add_numbers(*args):
    total = 0
    for num in args:
        total += num
    return total

add_numbers(1, 2, 3) # returns 6
add_numbers(10, 20) # returns 30
add_numbers(1, 2, 3, 4, 5) # returns 15

TL;DR: *args catches extra positional arguments as a tuple.
585
*args vs **kwargs
A good way to remember it: *args catches extra positional arguments as a tuple, and **kwargs catches extra named arguments as a dictionary. Notes: it is not necessary to use the names args and kwargs. Only the asterisks (* and **) are necessary. You could have also written *var and **vars. Writing *args and **kwargs is just a convention.
586
Positional Arguments
Positional arguments are the ones you're already familiar with, where the order of the arguments matters. Like this:
def sub(a, b):
    return a - b

# a=3, b=2
res = sub(3, 2)
# res = 1
587
Keyword Arguments
Keyword arguments are passed in by name. Order does not matter. Like this:
def sub(a, b):
    return a - b

res = sub(b=3, a=2)
# res = -1
res = sub(a=3, b=2)
# res = 1
588
sub(b=3, 2) does not work. Why?
Any positional arguments must come before keyword arguments. When Python reads arguments in a function call, it processes them left to right. If it sees a keyword argument like b=3, it assumes all remaining arguments will also be keyword arguments. Then when it hits a bare 2, it doesn't know what to do with it - is it a positional argument? Which parameter does it belong to? Python simply forbids this ambiguity entirely. Positional arguments must always come first. The correct way to call sub with those values would be either: sub(2, 3) # both positional sub(a=2, b=3) # both keyword sub(2, b=3) # positional first, then keyword All three are valid. The rule is just: once you start using keyword arguments, you cannot go back to positional ones.
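Because the rule is enforced when the code is parsed, before anything runs, it can be checked with the built-in compile(). This is only a sketch; is_valid_call is a hypothetical helper, and sub does not even need to be defined for the check to work:

```python
def is_valid_call(source):
    # compile() parses the expression without executing it, so a
    # misplaced positional argument shows up as a SyntaxError here.
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(is_valid_call("sub(b=3, 2)"))    # False
print(is_valid_call("sub(2, b=3)"))    # True
print(is_valid_call("sub(a=2, b=3)"))  # True
```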
589
Variadic
Variadic describes a function, method, or macro in programming that accepts a variable number of arguments, rather than a fixed number. Example: variadic function
590
*
The * symbol is the unpacking operator. When you use it in a function call, it tells Python: "Take this collection and unpack its contents so they are passed as individual arguments." Example: -cleaned is a list -so python unpacks the list into individual elements and applies the function func to them. return func(*cleaned)
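A small sketch of the same pattern. The helper names add3 and apply, and the "drop None values" cleaning step, are assumptions made for illustration:

```python
def add3(a, b, c):
    return a + b + c


cleaned = [1, 2, 3]
# *cleaned unpacks the list, so this is the same as add3(1, 2, 3)
print(add3(*cleaned))  # 6


def apply(func, values):
    # Hypothetical cleaning step: drop missing values, then pass the
    # remaining items to func as individual positional arguments.
    cleaned = [v for v in values if v is not None]
    return func(*cleaned)


print(apply(max, [3, None, 9]))  # 9
```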
591
parameters vs arguments
Parameters are the variables defined in a function's declaration or definition. They are placeholders for the data the function is designed to receive. Arguments are the actual values or expressions supplied to the function when it is called or invoked.

Code example: consider the following Python function definition and call:
def add(num1, num2): # num1 and num2 are parameters
    return num1 + num2

result = add(4, 3) # 4 and 3 are arguments
592
First-class functions:
Most modern languages (JavaScript, Python, Go, C#, Java, etc.) treat functions as "first-class citizens," meaning you can pass them as arguments, return them from other functions, and assign them to variables.
593
Higher-order functions:
If a language has first-class functions, it naturally supports higher-order functions (functions that take or return other functions).
594
Function Currying
While more prevalent in functional languages like Haskell, the ability to transform a function that takes multiple arguments into a series of functions that each take a single argument is a mathematical concept applicable in almost any language that supports closures.
595
lru_cache
from functools import lru_cache @lru_cache() The situations where you'd actually reach for it are fairly specific: a function that is called repeatedly with the same inputs AND is slow due to computation, I/O, or deep recursion.
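A sketch of a case that fits that description: a deeply recursive function that keeps asking for the same inputs. The fibonacci function is an illustrative choice, not something the card prescribes:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # maxsize=None means the cache never evicts entries
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


print(fibonacci(50))                    # 12586269025
print(fibonacci.cache_info().hits > 0)  # True
```

Without the decorator, fibonacci(50) would make billions of repeated calls; with it, each value of n is computed once.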
596
Escaped
"Escaped" means replacing special characters with a safe alternative representation. In HTML, the < character has special meaning - it starts a tag. If you want to display a literal < on a webpage without the browser interpreting it as HTML, you replace it with <. That substitution is called "escaping" the character. So if your input is: hello After escaping, it becomes: <b>hello</b>
597
enum
If you're trying to represent a fixed set of values (but not store additional data within them) enums are the way to go. Let's say we have a Color variable that we want to restrict to only three possible values: RED GREEN BLUE We could use a plain-old string to represent these values, but that's annoying because we have to remember all the "valid" values and defensively check for invalid ones all over our codebase. Instead, we can use an Enum: from enum import Enum Color = Enum('Color', ['RED', 'GREEN', 'BLUE']) print(Color.RED) # this works, prints 'Color.RED' print(Color.TEAL) # this raises an exception There is also a class-based syntax for creating enums: from enum import Enum class Color(Enum): RED = 1 GREEN = 2 BLUE = 3 print(Color.RED) # this works, prints 'Color.RED' print(Color.TEAL) # this raises an exception Now Color is a sum type! At least, as close as we can get in Python.
598
Algebraic Data Types
Product Type: can have many (often infinite) combinations Sum Type: have a fixed number of possible values.
599
Product Type
Product types can have many (often infinite) combinations. This Python object is an example of a product type:
man.studies_finance = True
man.has_trust_fund = False

The total number of combinations a man can have is 4, the product of 2 * 2:
studies_finance | has_trust_fund
True | True
True | False
False | True
False | False

If we add a third attribute, perhaps a has_blue_eyes boolean, the total number of possibilities multiplies again, to 8!
studies_finance | has_trust_fund | has_blue_eyes
True | True | True
True | True | False
True | False | True
True | False | False
False | True | True
False | True | False
False | False | True
False | False | False
600
Sum Type
Sum types have a fixed number of possible values.

Example: let's pretend that we live in a world where there are really only three types of people that our program cares about:
Dateable
Undateable
Maybe dateable

We can reduce the number of cases our code needs to handle by using an (admittedly fake Pythonic) sum type with only 3 possible types:
class Person:
    def __init__(self, name):
        self.name = name

class Dateable(Person):
    pass

class MaybeDateable(Person):
    pass

class Undateable(Person):
    pass

Then we can use the isinstance built-in function to check if a Person is an instance of one of the subclasses. It's a clunky way to represent sum types, but hey, it's Python.
def respond_to_text(guy_at_bar):
    if isinstance(guy_at_bar, Dateable):
        return f"Hey {guy_at_bar.name}, I'd love to go out with you!"
    elif isinstance(guy_at_bar, MaybeDateable):
        return f"Hey {guy_at_bar.name}, I'm busy but let's hang out sometime later."
    elif isinstance(guy_at_bar, Undateable):
        return "Have you tried being rich?"
    else:
        raise ValueError("invalid person type")
601
Does python support sum types as well as statically typed languages like Rust?
No. Python does not support sum types as well as some statically typed languages do. Python does not enforce your types before your code runs. That's why we need this line here to raise an Exception if a color is invalid:
602
603
API
An API (Application Programming Interface) is a defined way for two pieces of software to communicate with each other. Think of it like a restaurant menu: you (the client) don't go into the kitchen and cook your own food. Instead, you place an order using the menu (the API), and the kitchen (the server) handles the details and sends back your meal (the response). In the context of this lesson, your Python script is the client, and Google's Gemini service is the server. You send it a prompt via the Gemini API, and it sends back a response -- along with metadata like token counts.
604
Token
Tokens are the fundamental unit of text that LLMs work with. What is a token? A token is a chunk of text -- not quite a word, not quite a character, but somewhere in between. The model breaks all text into these chunks before processing it. For example: "cat" might be 1 token "unbelievable" might be 3 tokens ("un", "believ", "able") A space and punctuation often count as their own tokens too As a rough rule of thumb, 1 token is about 4 characters, or roughly 0.75 words in English. How are they used? When you send a prompt to an LLM: Your prompt is broken into tokens (input tokens) The model processes those tokens The model generates a response, one token at a time (output tokens) The model has a "context window" -- a maximum number of tokens it can hold in memory at once. This includes both your prompt and its response. Why charge by token? Processing tokens costs real compute resources -- memory, GPU time, electricity. Charging by token is a fair, granular way to bill for actual usage. Input and output tokens are often priced differently too, since generating tokens (output) is typically more expensive than reading them (input).
605
Sandbox
A sandbox is a restricted environment where code can only access a limited, pre-approved set of resources. The idea is to contain what a program (or LLM agent, etc.) is allowed to do. TL;DR: a sandbox defines what is allowed.
606
Escaping the Sandbox
Find a way to access resources outside the permitted boundaries of a sandbox. TL;DR: a sandbox defines what is allowed. And the escape is any attempt--intentional or accidental--to go beyond it. Note: A sandbox is a restricted environment where code can only access a limited, pre-approved set of resources. The idea is to contain what a program (or LLM agent, etc.) is allowed to do.
607
os.path operations
os.path is a module for working with file system paths in a cross-platform way.

import os

# Join path segments safely (handles slashes for you)
os.path.join("calculator", "pkg", "calculator.py")
# -> "calculator/pkg/calculator.py"

# Convert a relative path to an absolute one
os.path.abspath("calculator")
# -> "/home/user/project/calculator"

# Normalize a path (resolves ".." and redundant slashes)
os.path.normpath("/home/user/../user/project")
# -> "/home/user/project"

# Check if a path is an existing file
os.path.isfile("calculator/main.py")
# True or False

# Check if a path is an existing directory
os.path.isdir("calculator/pkg")
# True or False

# Find the common root of two paths (used for sandboxing)
os.path.commonpath(["/project/calculator", "/project/calculator/main.py"])
# -> "/project/calculator"
608
os.path.join()
import os

# Join path segments safely (handles slashes for you)
os.path.join("calculator", "pkg", "calculator.py")
# -> "calculator/pkg/calculator.py"
609
os.path.abspath()
import os

# Convert a relative path to an absolute one
os.path.abspath("calculator")
# -> "/home/user/project/calculator"
610
os.path.normpath()
import os

# Normalize a path (resolves ".." and redundant slashes)
os.path.normpath("/home/user/../user/project")
# -> "/home/user/project"
611
os.path.isfile()
import os

# Check if a path is an existing file
os.path.isfile("calculator/main.py")
# True or False
612
os.path.isdir()
import os

# Check if a path is an existing directory
os.path.isdir("calculator/pkg")
# True or False
613
os.path.commonpath()
import os

# Find the common root of two paths (used for sandboxing)
os.path.commonpath(["/project/calculator", "/project/calculator/main.py"])
# -> "/project/calculator"
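A sketch of how commonpath might be used for a sandbox check. The helper is_inside is hypothetical, the outputs shown assume POSIX-style paths, and symlinks are not resolved here:

```python
import os


def is_inside(working_dir, target):
    # Resolve both paths to absolute, normalized form first, so that
    # ".." segments cannot sneak outside the working directory.
    root = os.path.abspath(working_dir)
    candidate = os.path.normpath(os.path.join(root, target))
    # The target is inside the sandbox only if the common root of the
    # two paths is the sandbox directory itself.
    return os.path.commonpath([root, candidate]) == root


print(is_inside("/project/calculator", "main.py"))     # True
print(is_inside("/project/calculator", "../secrets"))  # False
```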
614
with open(...) as f
open() gives you a file object you can read from or write to. The with statement ensures the file is automatically closed when the block ends, even if an error occurs. with open("notes.txt", "r") as f: content = f.read() # file is closed here automatically The second argument to open() is the mode: "r" — read (default) "w" — write (overwrites the file) "a" — append Common methods on the file object: f.read() # read entire file as a string f.read(1000) # read up to 1000 characters f.readlines() # read all lines into a list f.write("hi") # write a string (mode must be "w" or "a")
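A short sketch that exercises the three modes in turn. It writes to a temporary directory (an assumption made so the example does not touch real files):

```python
import os
import tempfile

# Hypothetical scratch location for the example file
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:      # "w" creates or overwrites the file
    f.write("first line\n")

with open(path, "a") as f:      # "a" adds to the end
    f.write("second line\n")

with open(path, "r") as f:      # "r" reads
    lines = f.readlines()

print(lines)     # ['first line\n', 'second line\n']
print(f.closed)  # True: the with block closed the file
```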
615
how do you calculate a logarithm in python?
There isn't a language-level operator to calculate a logarithm, but we can import the math library and use the math.log() function. import math print(f"Logarithm base 2 of 16 is: {math.log(16, 2)}") # Logarithm base 2 of 16 is: 4.0
616
"Big O" analysis
(pronounced "Big Oh", not "Big Zero") is one way to compare the practicality of algorithms by classifying their time complexity. Big O is a characterization of algorithms according to their worst-case growth rates We write Big-O notation like this: O(formula) Types of algorithms: O(1) - constant The execution time does not change regardless of how much data you have. Example: Accessing an element in an array by its index. def get_first_item(items): return items[0] Whether the list has 10 items or 10 million, looking up index 0 takes the same amount of time. O(log n) - Logarithmic Time The time grows linearly while the input size grows exponentially. Each step typically cuts the remaining work in half. Example: Finding a word in a physical dictionary or a sorted list using Binary Search. If you search for a name in a phone book by opening it in the middle and discarding the half that doesn't contain the name, you are performing an O(log n) operation. O(n) - Linear Time The time it takes to run is directly proportional to the size of the input. Example: Searching for a specific value in an unsorted list. def find_item(items, target): for item in items: if item == target: return True return False If you have 10 items, you might check 10 times. If you have 100 items, you might check 100 times. O(n log n) - Linearithmic This happens when you perform an O(log n) operation for every item in your input of size n. Example: Efficient sorting algorithms like Merge Sort or Quick Sort. To visualize the difference: If n is 1,000,000: log n is approximately 20 operations. n log n is approximately 1,000,000 * 20, or 20,000,000 operations. O(n^2) - Squared (Quadratic) Time The time grows at the square of the input size. This often happens when you have nested loops over the same data. Example: Comparing every item in a list to every other item in the same list. 
def find_duplicates(items): for i in range(len(items)): for j in range(i + 1, len(items)): if items[i] == items[j]: return True return False O(2^n) - Exponential Time The time required doubles with every single addition to the input data. Example: A recursive calculation of Fibonacci numbers without optimization. def fibonacci(n): if n <= 1: return n return fibonacci(n - 1) + fibonacci(n - 2) Calculating fibonacci(20) takes significantly longer than fibonacci(10) because the number of recursive calls explodes. O(n!) - Factorial Time The time grows by the product of all integers up to n. This is one of the "slowest" growth rates. Example: The Traveling Salesperson Problem solved via brute force. If you have 10 cities and want to find the shortest possible route that visits every city by checking every possible permutation, you are dealing with factorial growth. Adding just one more city makes the problem massively more difficult.
617
What is the difference between O(log n) and O(n log n) ?
The difference between O(log n) and O(n log n) is a matter of how many times you perform a logarithmic operation. O(log n) - Logarithmic Think of this as "halving" the work. You have a large pile of data, and with each step, you discard half of it until you find what you need. Example: Finding a specific page in a book by opening it in the middle, then opening the middle of the remaining half, and so on. O(n log n) - Linearithmic This happens when you perform an O(log n) operation for every item in your input of size n. Example: Efficient sorting algorithms like Merge Sort or Quick Sort. To visualize the difference: If n is 1,000,000: log n is approximately 20 operations. n log n is approximately 1,000,000 * 20, or 20,000,000 operations. As you can see, O(n log n) is significantly slower than O(log n), but it is still much faster than O(n^2) (which would be 1,000,000,000,000 operations!).
618
Diff between O(n^2) and O(2^n)?
While both are "slow" as input sizes grow, the difference in how quickly they explode is vast.

O(n^2) - Polynomial (Quadratic)
The exponent is constant (2), and the base is the input size (n). Example: Nested loops. If n = 10, you do 100 operations. If n = 100, you do 10,000 operations. It grows significantly, but it is often still manageable for smaller datasets in modern computing.

O(2^n) - Exponential
The base is constant (2), and the exponent is the input size (n). Example: Trying every possible combination of a password (Brute Force). If n = 10, you do 1,024 operations. If n = 100, you do approximately 1,267,650,600,228,229,401,496,703,205,376 operations.

To put that in perspective: At n = 100, an O(n^2) algorithm finishes in a fraction of a second on a laptop. At n = 100, an O(2^n) algorithm would need roughly 40,000 years even on a supercomputer performing 10^18 operations per second.

In short: O(n^2) is "slow," but O(2^n) is "impossible" for large inputs.
619
List the common Big O categories, ordered from the slowest (worst performance) to the fastest (best performance):
Here are the common Big O categories, ordered from the slowest (worst performance) to the fastest (best performance):
O(n!) - Factorial (Slowest)
O(2^n) - Exponential
O(n^2) - Quadratic
O(n log n) - Linearithmic
O(n) - Linear
O(log n) - Logarithmic
O(1) - Constant (Fastest)

To visualize this, imagine you have an input size of n = 100:
O(1) would take 1 operation.
O(n) would take 100 operations.
O(n^2) would take 10,000 operations.
O(2^n) would take about 1.27 x 10^30 operations, which is tens of thousands of years of work even at 10^18 operations per second!
620
O(1)
O(1) - constant The execution time does not change regardless of how much data you have. Example: Accessing an element in an array by its index. def get_first_item(items): return items[0] Whether the list has 10 items or 10 million, looking up index 0 takes the same amount of time.
621
O(log n)
O(log n) - Logarithmic Time The time grows linearly while the input size grows exponentially. Each step typically cuts the remaining work in half. Example: Finding a word in a physical dictionary or a sorted list using Binary Search. If you search for a name in a phone book by opening it in the middle and discarding the half that doesn't contain the name, you are performing an O(log n) operation.
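A sketch of binary search that also counts its steps, to make the halving visible. Returning the step count is an addition for illustration only:

```python
def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            low = mid + 1    # discard the lower half
        else:
            high = mid - 1   # discard the upper half
    return -1, steps         # not found


items = list(range(1_000_000))
index, steps = binary_search(items, 999_999)
print(index)        # 999999
print(steps <= 20)  # True
```

One million items need at most 20 comparisons, because 2^20 is just over one million.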
622
O(n)
O(n) - Linear Time The time it takes to run is directly proportional to the size of the input. Example: Searching for a specific value in an unsorted list. def find_item(items, target): for item in items: if item == target: return True return False If you have 10 items, you might check 10 times. If you have 100 items, you might check 100 times.
623
O(n log n)
O(n log n) - Linearithmic This happens when you perform an O(log n) operation for every item in your input of size n. Example: Efficient sorting algorithms like Merge Sort or Quick Sort. To visualize the difference: If n is 1,000,000: log n is approximately 20 operations. n log n is approximately 1,000,000 * 20, or 20,000,000 operations.
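A minimal merge sort sketch, one common way to write it rather than a canonical implementation. The list is halved about log n times, and each level does O(n) work merging:

```python
def merge_sort(nums):
    # Base case: a list of 0 or 1 items is already sorted
    if len(nums) <= 1:
        return nums

    mid = len(nums) // 2
    left = merge_sort(nums[:mid])
    right = merge_sort(nums[mid:])

    # Merge the two sorted halves in a single linear pass
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```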
624
O(n^2)
O(n^2) - Squared (Quadratic) Time The time grows at the square of the input size. This often happens when you have nested loops over the same data. Example: Comparing every item in a list to every other item in the same list. def find_duplicates(items): for i in range(len(items)): for j in range(i + 1, len(items)): if items[i] == items[j]: return True return False
625
O(2^n)
O(2^n) - Exponential Time The time required doubles with every single addition to the input data. Example: A recursive calculation of Fibonacci numbers without optimization. def fibonacci(n): if n <= 1: return n return fibonacci(n - 1) + fibonacci(n - 2) Calculating fibonacci(20) takes significantly longer than fibonacci(10) because the number of recursive calls explodes.
626
O(n!)
O(n!) - Factorial Time The time grows by the product of all integers up to n. This is one of the "slowest" growth rates. Example: The Traveling Salesperson Problem solved via brute force. If you have 10 cities and want to find the shortest possible route that visits every city by checking every possible permutation, you are dealing with factorial growth. Adding just one more city makes the problem massively more difficult.
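A brute-force version of this search might look like the sketch below. It assumes cities are (x, y) points and that distance is the straight line between them; the name shortest_route and the fixed starting city are choices made for illustration.

```python
import math
from itertools import permutations


def shortest_route(cities):
    """Try every ordering of the cities and return (best_route, best_length)."""
    start, rest = cities[0], cities[1:]
    best_route, best_length = None, math.inf
    # Fixing the start city still leaves (n - 1)! orderings to check.
    for order in permutations(rest):
        route = [start, *order, start]
        length = sum(math.dist(a, b) for a, b in zip(route, route[1:]))
        if length < best_length:
            best_route, best_length = route, length
    return best_route, best_length


square = [(0, 0), (0, 1), (1, 1), (1, 0)]
route, length = shortest_route(square)
print(route, length)
```

With 10 cities the loop runs 9! = 362,880 times; with 11 cities it runs 10! = 3,628,800 times, which is the "one more city" jump described above.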
627
The main takeaways from Big O analysis
*Constant O(1): Instant, no matter how much data you have.
Example: Accessing an element in an array by its index.

def get_first_item(items):
    return items[0]

Whether the list has 10 items or 10 million, looking up index 0 takes the same amount of time.

*Linear O(n): You have a single loop through the data.
Example: Searching for a specific value in an unsorted list.

def find_item(items, target):
    for item in items:
        if item == target:
            return True
    return False

If you have 10 items, you might check 10 times. If you have 100 items, you might check 100 times.

*Quadratic O(n^2): You have nested loops (a loop inside a loop).
Example: Comparing every item in a list to every other item in the same list.

def find_duplicates(items):
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

*Here are the common Big O categories, ordered from the slowest (worst performance) to the fastest (best performance):
O(n!) - Factorial (Slowest)
O(2^n) - Exponential
O(n^2) - Quadratic
O(n log n) - Linearithmic
O(n) - Linear
O(log n) - Logarithmic
O(1) - Constant (Fastest)
628
Bubble Sort
O(n^2)
Bubble sort is a very basic sorting algorithm named for the way elements "bubble up" to the top of the list.
Bubble sort repeatedly steps through a list and compares adjacent elements, swapping them if they are out of order. It continues to loop over the list until the whole list is completely sorted.
Bubble sort is famous for how easy it is to write and understand. However, it's one of the slowest sorting algorithms, and as a result is almost never used in practice.
Example: While our avocado toast influencers were happy with our search functionality, now they want to be able to sort all their followers by follower count. Bubble sort is a straightforward sorting algorithm that we can implement quickly, so let's do that!

def bubble_sort(nums):
    swapping = True
    end = len(nums)
    # Keep making passes until a full pass finishes with no swaps
    while swapping:
        swapping = False
        for i in range(1, end):
            if nums[i-1] > nums[i]:
                nums[i-1], nums[i] = nums[i], nums[i-1]
                swapping = True
        # The largest remaining element is now in place, so shrink the range
        end -= 1
    return nums
629
Merge Sort: When is merge sort a good idea?
O(n*log(n))
Merge sort is a recursive sorting algorithm, and it's quite a bit faster than bubble sort. It's a divide and conquer algorithm:
Divide: divide the large problem into smaller problems, and recursively solve the smaller problems
Conquer: combine the results of the smaller problems to solve the large problem

In merge sort we:
-Divide the array into two (equal) halves (divide)
-Recursively sort the two halves
-Merge the two halves to form a sorted array (conquer)

Here's a tiny example:
List: [5, 2, 4, 1]
Merge sort does this:
Split into halves: [5, 2] [4, 1]
Split again: [5] [2] [4] [1]
Merge sorted pieces: [2, 5] [1, 4]
Merge again: [1, 2, 4, 5]

So the idea is:
-break the list into tiny pieces
-sort while merging those pieces back together
That's why it's fast, but it needs extra memory for the split parts.

When is a merge sort a good idea? When you need a fast sorting algorithm and memory isn't an issue.

Example: Our LockedIn influencers are complaining that when they sort their followers by follower count, it gets really slow if they have more than 1,000 followers (because we're using Bubble Sort). Let's speed it up for them with merge sort.

def merge_sort(nums):
    # A list of 0 or 1 items is already sorted
    if len(nums) < 2:
        return nums
    mid = len(nums) // 2
    sorted_left_side = merge_sort(nums[:mid])
    sorted_right_side = merge_sort(nums[mid:])
    return merge(sorted_left_side, sorted_right_side)

def merge(first, second):
    final = []
    i = 0
    j = 0
    # Take the smaller front item until one list runs out
    while i < len(first) and j < len(second):
        if first[i] <= second[j]:
            final.append(first[i])
            i += 1
        else:
            final.append(second[j])
            j += 1
    # Copy over whatever is left in either list
    while i < len(first):
        final.append(first[i])
        i += 1
    while j < len(second):
        final.append(second[j])
        j += 1
    return final
630
O(n^2)
Both insertion sort and bubble sort are O(n^2) in the worst case. They share a similar structure: nested loops over the list. However, there are some practical differences:
Insertion sort tends to be faster in practice because it usually does fewer comparisons. It shifts elements into place and can stop early for each element once it finds the right spot. On a nearly sorted list, it approaches O(n).
Bubble sort repeatedly walks through the entire list swapping adjacent elements, which generally results in more total operations even though the Big O classification is the same.
Think of it this way: Big O describes the growth rate as input size increases, but it hides constant factors. Two O(n^2) algorithms can have very different real-world performance. Insertion sort's constants are typically smaller, which is why it's often preferred for small or nearly-sorted datasets.
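Counting comparisons is one way to see the practical difference on a nearly sorted list. The two functions below are instrumented copies of swap-based bubble sort and insertion sort; the names and the returned (sorted_list, comparisons) pair are assumptions for this sketch, and the input is just one hand-picked case.

```python
def bubble_sort_comparisons(nums):
    """Bubble sort a copy of nums; return (sorted_list, comparisons)."""
    nums = list(nums)
    comparisons = 0
    swapping = True
    end = len(nums)
    while swapping:
        swapping = False
        for i in range(1, end):
            comparisons += 1
            if nums[i - 1] > nums[i]:
                nums[i - 1], nums[i] = nums[i], nums[i - 1]
                swapping = True
        end -= 1
    return nums, comparisons


def insertion_sort_comparisons(nums):
    """Insertion sort a copy of nums; return (sorted_list, comparisons)."""
    nums = list(nums)
    comparisons = 0
    for i in range(1, len(nums)):
        j = i
        while j > 0:
            comparisons += 1
            if nums[j - 1] <= nums[j]:
                break                     # found the right spot, stop early
            nums[j - 1], nums[j] = nums[j], nums[j - 1]
            j -= 1
    return nums, comparisons


# Sorted except for one small value stuck at the end
nearly_sorted = list(range(1, 100)) + [0]
print(bubble_sort_comparisons(nearly_sorted)[1])     # 4950
print(insertion_sort_comparisons(nearly_sorted)[1])  # 197
```

On this input bubble sort needs a full pass for every position the small value has to travel, while insertion sort does one comparison per element until it reaches the misplaced value.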
631
Insertion Sort
O(n^2) (like bubble sort)
Think of picking up a hand of playing cards one at a time. Each time you draw a new card, you slide it into the correct position among the cards you're already holding. You don't re-sort your whole hand; you just scan backward from the right until you find where the new card belongs, shift everything over, and insert it.
It's much less efficient on large lists than merge sort because it's O(n^2), but it's actually faster (not in Big O terms, but due to smaller constants) than merge sort on small lists.

Bubble Sort vs Insertion Sort
Think of it this way: Big O describes the growth rate as input size increases, but it hides constant factors. Two O(n^2) algorithms can have very different real-world performance. Insertion sort's constants are typically smaller, which is why it's often preferred for small or nearly-sorted datasets.

Example: Our influencers want to sort their affiliate deals by revenue. None of our users have more than a couple hundred affiliate deals, so we don't need an n * log(n) algorithm like merge sort. In fact, insertion_sort can be faster than merge_sort, and uses less of our server's memory.

def insertion_sort(nums):
    for i in range(1, len(nums)):
        j = i
        # Walk the new item left until the item before it is not larger
        while j > 0 and nums[j-1] > nums[j]:
            nums[j], nums[j-1] = nums[j-1], nums[j]
            j -= 1
    return nums
632
Why Use Insertion Sort?
Fast: for very small data sets (even faster than merge sort and quick sort, which we'll cover later)
Adaptive: Faster for partially sorted data sets
Stable: Does not change the relative order of elements with equal keys
In-Place: Only requires a constant amount of memory
Online: Can sort a list as it receives it
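The last property on this card, sorting a list as it is received, can be sketched like this. The helper insert_sorted is hypothetical; it performs a single insertion-sort step on a list that is already sorted.

```python
def insert_sorted(sorted_nums, value):
    """Add value to an already sorted list, keeping the list sorted."""
    sorted_nums.append(value)
    j = len(sorted_nums) - 1
    # Same inner loop as insertion sort: walk the new value left into place
    while j > 0 and sorted_nums[j - 1] > sorted_nums[j]:
        sorted_nums[j], sorted_nums[j - 1] = sorted_nums[j - 1], sorted_nums[j]
        j -= 1


received = []
for value in [5, 2, 4, 1]:       # values arriving one at a time
    insert_sorted(received, value)
    print(received)              # sorted after every arrival
```

The list is fully sorted after each new value, so there is no need to wait for all of the data before sorting begins.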
633
Should insertion sort or merge sort be used for the following scenarios? -data that is very small -data that is nearly sorted -larger data that is not nearly sorted
"insertion sort" for data that is very small "insertion sort" for data that is nearly sorted "merge sort" for larger data that is not nearly sorted
634
Quick Sort
Divide:
-Select a pivot element that will preferably end up close to the center of the sorted pack
-Move everything onto the "greater than" or "less than" side of the pivot
-The pivot is now in its final position
-Recursively repeat the operation on both sides of the pivot
Conquer:
-The array is sorted after all elements have been through the pivot operation

Concept: pick a pivot, partition elements around it, recurse on both halves. That conceptual understanding is what sticks long-term.

Quick sort is an efficient sorting algorithm that's widely used in production sorting implementations. Like merge sort, quick sort is a recursive divide and conquer algorithm.

Real-Life Example: Imagine you're a teacher sorting a line of students by height. You pick one student (the pivot) and have them step forward. You tell everyone shorter to go to the left, and everyone taller to go to the right. That pivot student is now in their correct spot -- they don't need to move again. You then point to the left group and say "do the same thing among yourselves," and then the right group. Each sub-group picks their own pivot, splits again, and so on. Eventually every group is just one student standing alone, and the whole line is sorted.

Actual Example: We now have two sorting algorithms on our LockedIn backend! It is a bit annoying to maintain both in the codebase. Quicksort is fast on large datasets just like merge sort, but is also lighter on memory usage. Let's use quick sort for both follower count and influencer revenue sorting!
def quick_sort(nums, low, high):
    if low < high:
        # Partition the input list using the partition function and store the returned "middle" index
        mid = partition(nums, low, high)
        # Recursively call quick_sort on the left side of the partition
        quick_sort(nums, low, mid-1)
        # Recursively call quick_sort on the right side of the partition
        quick_sort(nums, mid+1, high)

def partition(nums, low, high):
    pivot = nums[high]
    # Set i to the index before low
    i = low - 1
    for j in range(low, high):
        if nums[j] < pivot:
            i += 1
            nums[i], nums[j] = nums[j], nums[i]
    # Put the pivot just after the last smaller element
    nums[i + 1], nums[high] = nums[high], nums[i + 1]
    return i + 1

The key parallels to the real-life example:
Picking the pivot = choosing nums[high]
Students shuffling left/right = the partition loop with i and j
"Do the same thing among yourselves" = the two recursive quick_sort calls
One student standing alone = the base case where low < high is False

The magic is that every time you do a round, at least one student (the pivot) lands in their final position permanently. So progress is always being made, even though it might look chaotic in the middle.

Time Complexity:
Best / Average Case: O(n log n)
Each partition step does O(n) work -- it scans through all elements once to split them around the pivot. Then the list is (ideally) split roughly in half, giving us log n levels of recursion.
Level 0: [ n ] -> n comparisons
Level 1: [ n/2 ] [ n/2 ] -> n comparisons total
Level 2: [n/4][n/4] [n/4][n/4] -> n comparisons total
...
log n levels deep
So it's n work per level x log n levels = O(n log n).

Worst Case: O(n^2)
This happens when the pivot is consistently the smallest or largest element -- like if the list is already sorted and you always pick the last element as pivot. Instead of splitting in half, you get:
Level 0: [ n ] -> n comparisons
Level 1: [ n-1 ] -> n-1 comparisons
Level 2: [ n-2 ] -> n-2 comparisons
...
n levels deep
That's n + (n-1) + (n-2) + ... + 1 = O(n^2). Each partition only peels off one element instead of splitting evenly.

Space Complexity: O(log n)
Unlike merge sort, which needs O(n) extra space for temporary arrays, quick sort sorts in-place. The only extra memory is the recursive call stack, which is O(log n) deep in the average case (and O(n) deep in the worst case). This lighter memory footprint is exactly why the lesson mentions quick sort being "lighter on memory usage" than merge sort.
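The gap between the average and worst case can be measured by counting comparisons. The sketch below uses instrumented copies of quick sort with a last-element pivot; the names quick_sort_count and partition_count, the list size of 200, and the fixed random seed are all assumptions chosen for illustration.

```python
import random


def quick_sort_count(nums, low, high):
    """Sort nums[low..high] in place; return the number of comparisons made."""
    if low >= high:
        return 0
    mid, comparisons = partition_count(nums, low, high)
    comparisons += quick_sort_count(nums, low, mid - 1)
    comparisons += quick_sort_count(nums, mid + 1, high)
    return comparisons


def partition_count(nums, low, high):
    """Partition around nums[high]; return (pivot_index, comparisons)."""
    pivot = nums[high]
    i = low - 1
    comparisons = 0
    for j in range(low, high):
        comparisons += 1
        if nums[j] < pivot:
            i += 1
            nums[i], nums[j] = nums[j], nums[i]
    nums[i + 1], nums[high] = nums[high], nums[i + 1]
    return i + 1, comparisons


n = 200
already_sorted = list(range(n))
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)     # fixed seed so runs are repeatable

worst = quick_sort_count(already_sorted, 0, n - 1)
typical = quick_sort_count(shuffled, 0, n - 1)
print(worst, typical)
```

On the already sorted list the count is n(n-1)/2 = 19,900, since every partition peels off only the pivot. The shuffled list needs far fewer comparisons. Keep n small when trying the sorted case, because the recursion goes n levels deep and Python limits recursion depth.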
635
*(job interview q) Describe the following algorithms conceptually: -Quick Sort -Insertion Sort -Merge Sort -Bubble Sort
Quick Sort: Imagine you're a teacher sorting a line of students by height. You pick one student (the pivot) and have them step forward. You tell everyone shorter to go to the left, and everyone taller to go to the right. That pivot student is now in their correct spot -- they don't need to move again. You then point to the left group and say "do the same thing among yourselves," and then the right group. Each sub-group picks their own pivot, splits again, and so on. Eventually every group is just one student standing alone, and the whole line is sorted.

Insertion Sort: Think of picking up a hand of playing cards one at a time. Each time you draw a new card, you slide it into the correct position among the cards you're already holding. You don't re-sort your whole hand -- you just scan backward from the right until you find where the new card belongs, shift everything over, and insert it.

Merge Sort: Imagine you're a teacher sorting a huge pile of exams by score. You split the pile in half and hand each half to a helper. Each helper splits their pile in half again and hands those off to more helpers. This keeps going until each helper is holding just one exam -- which is trivially "sorted." Now the helpers start merging back up. Two helpers sit together, each with their sorted pile, and combine them by comparing the top exam from each pile and placing the lower score first. This merging bubbles all the way back up until you have one fully sorted pile.

Bubble Sort: Imagine you're organizing a bookshelf by height, but you can only compare two neighboring books at a time. You start at the left end. You look at the first two books -- if the left one is taller, you swap them. Then you move one spot right and compare the next pair. You do this all the way to the end of the shelf. By the time you reach the right side, the tallest book has "bubbled" to the far right, like a bubble rising to the surface. You walk back to the left and do it again. This time the second tallest bubbles into place. You keep making passes until you walk the whole shelf without swapping anything -- that means it's sorted. That's why it's called bubble sort -- the largest elements gradually float to the top (right side) with each pass.
636
Why Use Quick Sort? Pros and Cons
Pros:
Very fast: At least it is in the average case
In-Place: Saves on memory, doesn't need to do a lot of copying and allocating
Cons:
Typically unstable: changes the relative order of elements with equal keys
Recursive: can incur a performance penalty in some implementations
Pivot sensitivity: if the pivot is poorly chosen, it can lead to poor performance
637
What is a stable sorting algorithm?
A stable sorting algorithm keeps items with equal keys in the same relative order after sorting.
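A short sketch of what this looks like in practice, using Python's built-in sorted, which is documented as stable. The follower names and counts are made up for illustration.

```python
# (name, follower_count) pairs in their original order
followers = [("ana", 30), ("bo", 10), ("cy", 30), ("di", 10)]

# Sort by follower count only; names are not part of the key
by_count = sorted(followers, key=lambda pair: pair[1])

print(by_count)
# [('bo', 10), ('di', 10), ('ana', 30), ('cy', 30)]
```

"bo" still comes before "di", and "ana" before "cy", because items with equal counts keep the order they had in the input. An unstable sort would be free to swap them.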
638