Python II Flashcards by heinz meins

Which of the following statements accurately describe Lambda functions in Python? a. They are a concept from functional programming, originating from Alonzo Church’s λ-calculus. b. They can have several arguments, but are limited to a single expression. c. Every Lambda function can be rewritten using a regular def function definition. d. They are primarily used for defining functions that will be called multiple times throughout a large codebase.

a, b, c

Consider the following Python code snippet:
    my_lambda = lambda a: a + 10
    print(my_lambda(5))
Which of the following regular function definitions is equivalent to my_lambda? a. def my_lambda(a): return a + 10 b. def my_lambda(a): print(a + 10) c. def my_lambda(): return 10 d. def my_lambda(a): return a - 10

Lambda functions are often used as arguments to higher-order functions. Which of the following built-in Python functions or modules can directly benefit from the use of a Lambda function as an argument, as shown in the lecture slides? a. map() b. filter() c. reduce() (from functools) d. sum()

a, b, c
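
As a quick illustration of options a to c, the sketch below passes lambdas to map(), filter(), and functools.reduce(). The list `numbers` is made-up sample data, not taken from the lecture slides.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]  # hypothetical sample data

doubled = list(map(lambda n: n * 2, numbers))        # apply the lambda to every element
evens = list(filter(lambda n: n % 2 == 0, numbers))  # keep elements where the lambda is truthy
total = reduce(lambda acc, n: acc + n, numbers)      # fold the list into a single value

print(doubled, evens, total)
```

sum() takes an iterable and an optional start value rather than a function argument, which is why it does not belong in this group.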

Given the list people = ['Alice', 'Bob', 'Charlie'], which of the following uses of sort() with a lambda function would sort the list by the length of the person’s name? a. people.sort(key=lambda x: x) b. people.sort(key=lambda x: x) c. people.sort(key=lambda x: len(x)) d. people.sort(key=lambda x: len(x))

When working with complex objects, such as instances of a custom class, lambda functions can be particularly useful for sorting or finding minimum/maximum values. Consider the Person class from the slides:
    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age
    jack = Person('Jack', 30)
    agnes = Person('Agnes', 28)
    students = [jack, agnes]
Which of the following correctly uses a lambda function to find the person with the minimum age in the students list? a. min_age = min(students, key=lambda s: s.name) b. min_age = min(students, key=lambda s: s.age) c. min_age = min(students.age) d. min_age = min(students, key=lambda s: len(s.name))

Which of the following Python constructs are considered “Callable Objects” as discussed in the lectures? a. Functions, such as print() b. Methods, such as some_list.sort() c. Classes (their constructors), such as my_dog = Dog() d. Instances of a class that implement the __call__ magic method

a, b, c, d

Consider the Logger class implementation from the slides:
    class Logger:
        def __init__(self, prefix):
            self.prefix = prefix
        def __call__(self, message):
            print(f'{self.prefix}: {message}')
    info_logger = Logger('INFO')
    debug_logger = Logger('DEBUG')
Which of the following statements are true regarding this code? a. info_logger and debug_logger are instances of the Logger class that can be called like functions. b. The __call__ method allows instances of Logger to store a state (e.g., prefix) that is reused across calls without being passed as an argument. c. Calling info_logger('System boot') would print 'INFO: System boot'. d. The Logger class is an example of a decorator.

a, b, c
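
A runnable variant of the Logger card, as a minimal sketch: the class is the one from the question, except that __call__ also returns the formatted line. That return value is an addition made here only so the behaviour is easy to check. The built-in callable() shows which objects can be called.

```python
class Logger:
    def __init__(self, prefix):
        self.prefix = prefix  # state stored once, reused on every call

    def __call__(self, message):
        line = f"{self.prefix}: {message}"
        print(line)
        return line  # not in the original class; added for illustration


info_logger = Logger("INFO")
result = info_logger("System boot")  # the instance is called like a function

# Functions, classes, and instances with __call__ are callable; a plain string is not.
print(callable(print), callable(Logger), callable(info_logger), callable("text"))
```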

Which of the following statements about decorators in Python are correct? a. Decorators are functions that take one or more functions as arguments and return a new function that modifies the behavior of the original. b. The @ syntax is syntactic sugar for manually wrapping a function with a decorator. c. Decorators implemented as classes can be useful for maintaining state between calls to the decorated function. d. The functools.wraps decorator is essential for preserving the __name__ and __doc__ attributes of the original function.

a, b, c, d
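
The points about the @ syntax and functools.wraps can be sketched as follows. The decorator `count_calls` and the function `greet` are hypothetical names chosen for this example, not taken from the lecture.

```python
from functools import wraps


def count_calls(func):
    """Hypothetical decorator that counts how often func is called."""
    @wraps(func)  # copies __name__, __doc__, etc. from func onto wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls  # equivalent to: greet = count_calls(greet)
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}"


greet("Ada")
greet("Bob")
print(greet.calls, greet.__name__, greet.__doc__)
```

Without @wraps, greet.__name__ would report 'wrapper' and the docstring would be lost.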

Consider the functools.cache decorator. Which of the following accurately describes its behavior when applied to a recursive function like fibonacci(n)?
    from functools import cache
    @cache
    def fibonacci(n):
        # … (implementation for fibonacci) …
a. It stores the return values of the function based on its arguments. b. For repeated calls with the same arguments, it returns the cached result without re-executing the function’s body. c. It is useful for optimizing functions with potentially expensive computations for repeated inputs. d. It automatically detects and resolves infinite recursion.

a, b, c
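
A minimal sketch of the caching behaviour, assuming the usual recursive definition of fibonacci (the card elides the body, so the implementation below is an assumption). cache_info() reports how many calls were answered from the cache.

```python
from functools import cache


@cache
def fibonacci(n):
    # Assumed implementation: fibonacci(0) = 0, fibonacci(1) = 1
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


print(fibonacci(30))
print(fibonacci.cache_info())  # hits count the calls that skipped the function body
```

Each distinct argument is computed once, so the number of cache misses grows linearly with n instead of exponentially. The cache does not protect against missing base cases: infinite recursion would still raise RecursionError.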

The functools.singledispatch decorator allows a function to behave differently based on the type of its first argument. Consider the example:
    from functools import singledispatch
    @singledispatch
    def display(x) -> None:
        raise NotImplementedError
    @display.register
    def _(x: list | tuple):
        for i, item in enumerate(x):
            print(f'Item {i}: {item}')
    @display.register
    def _(x: dict):
        for key, value in x.items():
            print(f'{key}: {value}')
    my_list = [1, 2, 3]
    my_dict = {'a': 1, 'b': 2}
Which of the following will correctly execute and produce output? a. display(my_list) b. display(my_dict) c. display(10) (assuming no other registrations for int) d. display('hello') (assuming no other registrations for str)

a, b (c and d would raise NotImplementedError unless int or str were registered.)

Asynchronous functions in Python, using async and await, are primarily designed for which type of concurrency? a. CPU-bound tasks, leveraging multiple CPU cores in parallel. b. I/O-bound tasks, allowing the program to perform other operations while waiting for I/O to complete. c. Simultaneously executing heavy mathematical computations on multiple GPUs. d. Distributing computations across a cluster of machines.

Consider the sequential vs. asynchronous image download example from the UE slides. Why does the asynchronous version significantly reduce the total execution time compared to the sequential one, especially for a large number of images? a. The asyncio.sleep() function is much faster than time.sleep(). b. Asynchronous functions automatically download images in parallel across multiple CPU cores. c. The await keyword signals to the event loop that the function is waiting for an I/O operation (network request) to complete, allowing other tasks to run in the meantime instead of idling. d. aiohttp automatically compresses image files before downloading, reducing transfer size.
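
The effect can be sketched without any network access. In the example below, asyncio.sleep() stands in for a download, and the coroutine name `fake_download` and the delays are assumptions made for illustration; this is not the slide example itself.

```python
import asyncio
import time


async def fake_download(name, delay):
    # asyncio.sleep simulates waiting on a network request; no real I/O happens.
    await asyncio.sleep(delay)  # control returns to the event loop while waiting
    return f"{name} done"


async def main():
    start = time.perf_counter()
    # gather schedules all three coroutines so their waiting periods overlap
    results = await asyncio.gather(
        fake_download("img1", 0.2),
        fake_download("img2", 0.2),
        fake_download("img3", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

Run sequentially, the three waits would add up to about 0.6 seconds. Here they overlap on a single thread, so the total is close to the longest single wait.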

Which of the following statements are true about image data representation? a. Grayscale 2D images are typically represented as 2D arrays, with each pixel carrying brightness information. b. RGB 2D images are represented as 3D arrays, with two spatial dimensions and one dimension for color channels (red, green, blue). c. The Alpha channel in RGBA images typically represents transparency information. d. Image data is exclusively stored as vector graphics.

a, b, c

Which of the following image file formats uses lossless compression and is vector-based, making it suitable for line plots and neural network architecture depictions without loss of resolution when zooming? a. JPEG b. PNG c. SVG d. GIF

When working with matplotlib, which of the following components are correctly described? a. A ‘Figure’ represents the window you are plotting in, and it can be saved to image files. b. ‘Axes’ refer to the x-axis and y-axis lines on a plot. c. An ‘Axes’ object is what you plot on, and a Figure can contain multiple Axes. d. Matplotlib only supports plotting in an interactive mode where plots are shown immediately.

a, c

Consider the following Matplotlib code snippet for creating a basic line plot:
    import matplotlib.pyplot as plt
    import numpy as np
    t = np.arange(0, 100)
    fig, ax = plt.subplots()
    ax.plot(t)
    ax.set_xlabel('Label for x')
    ax.set_ylabel('Label for y')
    fig.suptitle('Title of the figure')
    plt.show()
Which of the following statements are correct about this code? a. plt.subplots() returns a Figure object (fig) and an Axes object (ax). b. ax.plot(t) adds a line to the Axes object. c. fig.suptitle() sets a super-title for the entire figure window. d. This example demonstrates the pyplot style, where pyplot implicitly manages figures and axes.

a, b, c (d is incorrect: the snippet uses the object-oriented style. Calling plt.plot(t); plt.show() without fig, ax = plt.subplots() would be the pyplot style.)

Given the following Matplotlib code for creating multiple subplots:
    fig, ax = plt.subplots(2, 3)  # ax is a 2x3 array of Axes
    ax[0, 0].plot(t)
    ax[0, 1].plot(t)
    ax[1, 0].plot(-t)
    ax[1, 2].plot(t, label='data t')
    ax[1, 2].plot(-t, label='data -t')
    ax[1, 2].legend()
    fig.tight_layout()
    plt.show()
Which of the following statements are true? a. The code creates a figure with 6 subplots arranged in 2 rows and 3 columns. b. The legend will appear on ax[1, 2] because legend() is called on that specific Axes object after plots with labels are added to it. c. fig.tight_layout() adjusts subplot parameters for a tight layout, preventing labels from being clipped. d. It’s not possible to plot multiple lines on the same Axes object (e.g., ax[1,2] plotting t and -t).

a, b, c

When displaying scalar image data (e.g., a 2D grayscale array) with ax.imshow() in Matplotlib, how is color typically assigned to values by default? a. By randomly assigning colors to pixels. b. Using a colormap that maps values to colors. c. The image is displayed in its original colors, regardless of it being scalar data. d. Only black and white are used to represent the lowest and highest values, respectively.

Which of the following plotting libraries is known for offering interactive plots and is particularly useful for web applications like Dash or Shiny? a. Matplotlib b. Seaborn c. Plotly d. Altair

Which of the following statements are true regarding the use of Seaborn? a. Seaborn is built on top of Matplotlib, acting as a high-level interface. b. It is particularly optimized to work well with Pandas DataFrames. c. sns.set_theme() applies a default aesthetic theme to plots. d. Seaborn’s pairplot() is useful for visualizing pairwise relationships in a dataset.

a, b, c, d

Which of the following statements correctly describe the fundamental data structures in Pandas? a. A Series is a 1D ordered data structure where every element has an index/label, and typically contains a single data type. b. A DataFrame is a 2D data structure for tabular data, comparable to a spreadsheet or an SQL table. c. A DataFrame can be thought of as multiple Series (columns) that share the same index. d. Unlike NumPy arrays, a DataFrame can support different data types in its columns.

a, b, c, d

Consider a Pandas DataFrame df with a custom index like Index(['a', 'b', 'c'], dtype='object'). Which of the following methods would reset the index to the default integer indexing (from 0 to n-1) and avoid adding the old index as a new column? a. df.reset_index() b. df.reset_index(inplace=True) c. df.reset_index(drop=True) d. df.set_index(range(len(df)))

When indexing and slicing a Pandas DataFrame, loc and iloc are commonly used. Which of the following statements correctly distinguish loc and iloc? a. df.loc selects a row using its label. b. df.iloc selects a row using its integer position. c. When slicing rows using labels with df.loc, the upper bound is inclusive. d. When slicing rows using integer indices with df.iloc, the upper bound is inclusive.

a, b, c (d is incorrect, iloc upper bound is exclusive)
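
The inclusive versus exclusive upper bound can be checked on a small example. The DataFrame below is made-up data, assuming pandas is installed.

```python
import pandas as pd

# Hypothetical four-row frame with string labels as index
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

by_label = df.loc["a":"c"]   # label slice: the row labelled "c" is included
by_position = df.iloc[0:2]   # position slice: position 2 is excluded

print(list(by_label.index), list(by_position.index))
```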

Which of the following Pandas methods are used for inspecting or cleaning data, as covered in the lecture? a. head() and tail() to check the first or last few rows. b. describe() to get basic statistics about numerical variables. c. isna() to check for missing values. d. dropna() to remove rows with missing values or fillna() to replace them. e. duplicated() and drop_duplicates() to find and remove duplicate rows. f. astype() to adjust data types of columns. g. replace() to change values. h. str.strip() to clean up strings within columns.

a, b, c, d, e, f, g, h

Regarding data types in Pandas, which of the following are true? a. Missing values typically show up as NaN (Not a Number) by default. b. Pandas offers nullable Data Types (e.g., Int64, Float64, boolean) that can contain missing values, which show up as NA (Not Available). c. datetime64 is a data type for date and time, with NaT (Not a Time) representing missing values. d. NumPy has strong string support, so object dtype is always the preferred way to store strings in Pandas.

a, b, c (d is incorrect: NumPy has limited string support; object is the default dtype for strings, but converting to the string dtype is preferred).

Consider the following DataFrame df_p and a boolean mask created for filtering:
    # Assume df_p contains a 'bill_length_mm' column
    # and 'species' column as shown in LV 30
    # Example: df_p.loc[df_p['bill_length_mm'] > 50, 'species'].value_counts()
Which of the following methods can be used to create boolean masks for filtering a DataFrame? a. Direct comparison operators (e.g., df_p['bill_length_mm'] > 50). b. The isin() method. c. df_p.filter() with a lambda function. d. df_p.where() with a condition.

a, b

Which of the following statements about aggregating values in Pandas DataFrames are correct? a. Simple aggregations like calculating the mean can be done by calling df.mean(), which returns column means. b. The agg() method allows applying multiple aggregation functions, including lambdas. c. agg() is particularly useful when combined with groupby(). d. pivot_table() can be used to summarize data and allows specifying index, columns, values, and aggfunc.

a, b, c, d

Regarding 'Tidy Data' principles, which of the following characteristics define tidy data? a. Each variable is a column. b. Each observation is a row. c. Each value is a cell. d. Data is always stored in a wide format.

a, b, c (d is incorrect: tidy data is often in a 'long' format for statistical analysis).

When performing arithmetic operations (e.g., add(), sub(), mul(), div()) between Pandas DataFrames or Series, what is a crucial aspect for alignment? a. Only integer indices are used for matching elements. b. The dimensions must exactly match, otherwise an error is raised. c. Indices and column names are a crucial component, and non-matching positions are filled with NaN. d. The axis parameter specifies whether column names or indices should be used for matching.

Shiny for Python allows developers to create interactive web applications. Which of the following are core motivations for using a framework like Shiny? a. Incorporating interactive user interface (UI) elements into an application. b. Efficiently distributing Python applications. c. Automating the deployment of machine learning models to production environments. d. Replacing the need for Python scripts and Jupyter Notebooks entirely.

a, b

Shiny for Python comes in two flavors: Shiny Core and Shiny Express. Which of the following statements accurately describe their relationship and differences? a. Shiny Core is essentially the Python version of the R package. b. Shiny Express is a newer version specifically developed for Python in 2024. c. For complex and extensive apps, Shiny Core generally offers more functionality than Express. d. Shiny Core uses nested function calls for structuring UI container components, while Shiny Express uses context managers with the with statement.

a, b, c, d

In Shiny Express, how does one typically define input elements (e.g., a dropdown menu or a file upload button) and access their values? a. Input elements are defined using functions in the ui module, such as ui.input_select(). b. Each input element must have a unique id. c. The values of input elements are accessed via methods of the input object, using the same name as their id (e.g., input.selectBox1()). d. Input values are accessed directly as Python variables without needing the input object.

a, b, c

Which of the following decorators or functions are used in Shiny to implement reactivity and manage reactive dependencies? a. @render.* (e.g., @render.text) to generate UI elements based on a function's return value. b. @reactive.calc for computations whose return value depends on inputs and can be reused to avoid duplicate code. c. @reactive.effect for functions that do not return anything but are used for reactive side effects (e.g., logging). d. @reactive.event to ensure a reactive function only responds to specific listed dependencies. e. reactive.value() to create an object that other reactive functions can depend upon, similar to an input.

a, b, c, d, e

When using reactive.value() in Shiny to store a state, such as a list of entered values, why do the lecture notes explicitly state that append might not trigger reactivity as expected, and that one should instead create a new list and assign it? a. Python's append method modifies the list in-place, and in-place changes to mutable objects stored in reactive.value do not trigger a re-evaluation of dependent reactive functions. b. reactive.value() only accepts immutable objects. c. It's a performance optimization to force a new list creation. d. The append method is not available for reactive.value objects.

Shiny for Python provides various layout options beyond basic sequential arrangement. Which of the following layout components are mentioned in the lecture slides? a. Sidebar layout. b. Navbars and Tabs. c. Panels/Cards. d. ui.layout_columns for multi-column layouts.

a, b, c, d

Consider the following Shiny Express code snippet for dynamically updating choices in a select input:
    import pandas as pd
    from shiny import reactive
    from shiny.express import input, render, ui
    ui.input_file(id='myFile', label='Upload a csv file', accept='.csv')
    ui.input_selectize(id='mySelect', label='Choose a column', choices=[], selected='', multiple=True)
    @reactive.effect
    @reactive.event(input.myFile)
    def update_choices():
        file_info = input.myFile()
        if file_info:
            df = pd.read_csv(file_info[0]['datapath'])
            choices = df.columns.values.tolist()
            ui.update_selectize('mySelect', choices=choices)
Which of the following statements are true about this dynamic UI behavior? a. The mySelect dropdown starts with an empty list of choices. b. When a file is uploaded to myFile, the update_choices function is triggered. c. ui.update_selectize() is used to dynamically change the available choices in the select input. d. The datapath key in file_info provides the name of the uploaded file as provided by the browser.

a, b, c (d is incorrect: datapath is the path to a temporary file, while name is the browser-provided name)

Which of the following frameworks for Python web applications are discussed in the lecture slides as alternatives or comparisons to Shiny? a. Streamlit b. Dash c. Tkinter d. Flask

a, b, d

PyTorch provides several advantages for machine learning projects compared to purely using NumPy. Which of the following are highlighted as key benefits of PyTorch? a. It offers a Pythonic API that feels similar to working with NumPy. b. It provides a flexible and dynamic computational graph with automatic differentiation support (autograd). c. It has excellent GPU support, allowing data to be moved to VRAM for parallel computations. d. It has a huge ecosystem (e.g., TorchVision, TorchAudio) and a strong community.

a, b, c, d

Tensors are the primary data structure in PyTorch. Which of the following are valid ways to create a PyTorch tensor, as demonstrated in the lecture notes? a. From a Python list: torch.tensor([]) b. From a NumPy array: torch.tensor(np.array()) c. Using convenience functions like torch.zeros(), torch.ones(), torch.rand(). d. Using ones_like() or randn_like() to create a new tensor with the same properties as an existing one, optionally overriding data type or device.

a, b, c, d

Which of the following statements about PyTorch tensor properties are correct? a. x.dtype returns the data type of the tensor (e.g., torch.float32, torch.int64). b. x.shape returns the dimensions of the tensor. c. x.requires_grad indicates whether gradients need to be computed for the tensor during backpropagation. d. x.is_leaf indicates if the tensor is a leaf tensor in the computational graph, meaning it was created by the user and not as a result of an operation.

a, b, c, d

Consider the following tensor operations:
    x = torch.rand(2, 3)  # Example tensor
    y_1 = x @ x.T         # Operation 1
    y_2 = x * x           # Operation 2
Which of the following statements are true about the operations shown? a. Operation 1 (x @ x.T) performs matrix multiplication. b. Operation 2 (x * x) performs element-wise multiplication. c. x.matmul(x.T) is an alternative way to perform matrix multiplication. d. In-place operations, indicated by a trailing underscore (e.g., add_()), are generally discouraged when using autograd.

a, b, c, d

In the context of gradient-based optimization for neural networks, what is the purpose of a 'loss function'? a. It computes the derivative (gradient) of the model output with respect to the input. b. It quantifies the difference between the model's output (prediction) and the true target value. c. It aims to maximize the distance between the prediction and the target during training. d. A high loss value indicates that the model's prediction is very close to the target.

When minimizing the loss of a neural network using gradient-based optimization, which of the following steps are performed iteratively? a. Compute the derivative (gradient) of the model loss with respect to the current model weights. b. Change the weights a little bit in the direction of the steepest ascent. c. Change the weights a little bit (scaled by the learning rate η) into the direction of the steepest descent. d. Directly compute the global minimum of the loss function.

a, c
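
The two iterated steps can be sketched in plain Python on a one-parameter toy problem. The quadratic loss, the starting weight, and the learning rate below are assumptions chosen for illustration; a real network would obtain the gradient via autograd.

```python
def loss(w):
    # Toy loss with its minimum at w = 3 (hypothetical example)
    return (w - 3.0) ** 2


def grad(w):
    # Analytic derivative of the toy loss with respect to w
    return 2.0 * (w - 3.0)


w = 0.0    # initial weight
eta = 0.1  # learning rate

for step in range(100):
    g = grad(w)      # step a: gradient of the loss at the current weight
    w = w - eta * g  # step c: small move in the direction of steepest descent

print(round(w, 4), loss(w))
```

Flipping the sign of the update (w + eta * g) would implement option b and move away from the minimum.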

Regarding torch.utils.data.Dataset and torch.utils.data.DataLoader, which of the following statements are correct? a. A custom Dataset class should derive from Dataset and implement __getitem__() to return one sample and __len__() to return the total number of samples. b. DataLoader extracts minibatches of samples from a Dataset instance. c. DataLoader supports shuffling and multiprocessing, though multiprocessing might affect determinism. d. It is generally recommended to disable shuffling in the validation and test sets.

a, b, c, d

Consider the following Python code snippet for defining a neural network module:
    import torch.nn as nn
    class MyNetwork(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, kernel_size=3)
            self.relu = nn.ReLU()
            self.linear = nn.Linear(32 * 26 * 26, 10)  # assuming 28x28 input, no padding/pooling here
            self.flatten = nn.Flatten()
        def forward(self, x):
            x = self.conv1(x)
            x = self.relu(x)
            x = self.flatten(x)
            x = self.linear(x)
            return x
Which of the following statements about torch.nn.Module and this implementation are correct? a. nn.Module is the base class for all neural network modules in PyTorch. b. The __init__() method is used to define the layers and architecture of the model. c. The forward() method specifies how the input data flows through the network layers to produce an output. d. When my_model = MyNetwork(); output = my_model(input_data) is called, it implicitly executes the forward method.

a, b, c, d

Which of the following statements are true about the automatic parameter registration within torch.nn.Module? a. When a torch.nn.Parameter instance is assigned as a module attribute, it is automatically registered as a trainable parameter. b. When another torch.nn.Module instance (a submodule) is assigned as an attribute, its own parameters are automatically registered as part of the parent module. c. A standard Python list containing nn.Module instances will automatically register all parameters within those modules. d. To move a model and its parameters to a specific device (e.g., GPU), you can use model.to(device=...).

a, b, d (c is incorrect, as seen in the old exam, a plain Python list of modules will be ignored for automatic registration).

Regarding different types of Neural Networks, which of the following statements are correct? a. Fully-connected Feed-Forward Neural Networks (FFNNs) typically flatten multi-dimensional inputs into a single vector. b. Convolutional Neural Networks (CNNs) use weight kernels that are convolved along an input tensor. c. 2D CNNs are commonly used for images, where kernels are convolved over two spatial dimensions. d. Recurrent Neural Networks (RNNs) apply the same weights to each input in a sequence, having access to output/hidden state from previous inputs.

a, b, c, d

Which of the following activation functions are discussed in the lecture as commonly used in neural networks? a. Rectified Linear Unit (ReLU), known for being computationally inexpensive and avoiding vanishing gradients. b. Scaled Exponential Linear Unit (SELU), which has self-normalizing properties. c. Sigmoid, which outputs values in the range (0, 1) but can be susceptible to vanishing gradients. d. Mean Squared Error (MSE).

a, b, c (d is a loss function, not an activation function)

PyTorch's autograd system is crucial for efficient neural network training. What is its primary function? a. Automatically optimizing hyperparameters for the model. b. Keeping track of the computational graph of operations to automatically compute gradients. c. Performing forward passes through the network. d. Saving memory by not storing intermediate computation results.

When performing weight updates in a PyTorch training loop using an optimizer (e.g., torch.optim.SGD), which of the following sequence of steps is the correct order to update the model parameters after computing the loss for a batch? a. Compute loss -> Reset gradients -> Perform weight update -> Compute gradients b. Compute loss -> Compute gradients (loss.backward()) -> Perform weight update (optimizer.step()) -> Reset gradients (optimizer.zero_grad()) c. Reset gradients -> Compute loss -> Compute gradients -> Perform weight update d. Compute loss -> Perform weight update -> Compute gradients -> Reset gradients

b (This matches old exam questions 11 & 13 from June 2023, and 13 & 25 from Dec 2024, 10 & 23 from Sep 2023)

Which of the following are common loss functions suitable for different machine learning tasks, as covered in the lecture? a. Mean-squared error (torch.nn.MSELoss) for regression tasks. b. Cross entropy (torch.nn.CrossEntropyLoss or torch.nn.BCEWithLogitsLoss) for classification tasks. c. ReLU (Rectified Linear Unit). d. Adam Optimizer.

a, b (c is an activation function, d is an optimizer)

Early stopping is a technique used during neural network training. Which of the following statements about early stopping are accurate? a. It checks the model loss on a validation set at regular intervals. b. It can help counter overfitting by stopping training when validation loss stops improving. c. It typically saves the model with the best validation loss as the final model. d. It guarantees to find the globally optimal model parameters.

a, b, c
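
A framework-free sketch of the idea, assuming a patience-based stopping rule (the lecture may use a different criterion). The function `early_stopping`, the `patience` parameter, and the simulated validation losses are all hypothetical.

```python
def early_stopping(val_losses, patience=2):
    """Return (best_epoch, best_loss, stopped_at) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    stopped_at = -1
    epochs_without_improvement = 0
    for epoch, val_loss in enumerate(val_losses):
        stopped_at = epoch
        if val_loss < best_loss:
            # In a real training loop, the model checkpoint would be saved here.
            best_loss, best_epoch = val_loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss has stopped improving
    return best_epoch, best_loss, stopped_at


simulated = [0.9, 0.7, 0.6, 0.65, 0.7, 0.5]  # made-up validation losses per epoch
best_epoch, best_loss, stopped_at = early_stopping(simulated)
print(best_epoch, best_loss, stopped_at)
```

Training stops at epoch 4 and keeps the epoch-2 model, so the lower loss at epoch 5 is never reached. This is one way to see why option d does not hold.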

Which of the following techniques are discussed as forms of regularization to counter overfitting in neural networks? a. Dropout (randomly dropping out features or inputs). b. Weight penalty terms (e.g., L1 or L2 penalties added to the loss). c. Adding random noise to inputs or features. d. Increasing the complexity of the model significantly.

a, b, c

When monitoring a model during training, which of the following aspects are useful to track, as suggested in the lecture? a. Histograms of weights, gradients, and activations. b. Line plots of loss and regularization terms for both training and validation sets. c. TensorBoard for development and CSV files for final evaluation. d. Whether the gradients and weights are changing and if they are reasonable values.

a, b, c, d

For a binary classification task where samples belong to either a positive (P) or negative (N) class, a Confusion Matrix lists specific entries. Which of the following definitions are correct? a. True Positive (TP): Number of samples correctly classified as positive. b. False Positive (FP): Number of samples wrongly classified as positive (Type I error, false alarm). c. True Negative (TN): Number of samples correctly classified as negative. d. False Negative (FN): Number of samples wrongly classified as negative (Type II error, miss).

a, b, c, d

Consider the following Confusion Matrix:
                       Predicted Positive   Predicted Negative
    Actual Positive            40                   10
    Actual Negative            35                   15
Which of the following statements are correct based on this matrix? a. There are 40 True Positives (TP). b. There are 15 True Negatives (TN). c. There are 35 False Positives (FP). d. The total number of samples is 100.

a, b, c, d

Using the confusion matrix from the previous question:
                       Predicted Positive   Predicted Negative
    Actual Positive            40                   10
    Actual Negative            35                   15
Calculate the True Positive Rate (TPR), also known as Sensitivity or Recall. a. TPR = 40 / (40 + 35) = 0.5333 b. TPR = 40 / (40 + 10) = 0.8 c. TPR = 15 / (15 + 35) = 0.3 d. TPR = (40 + 15) / (40 + 10 + 35 + 15) = 0.55

Accuracy (ACC) is a common evaluation metric, calculated as (TP + TN) / (TP + TN + FP + FN). However, it can be misleading, especially with imbalanced classes. Which of the following metrics are better suited for evaluating models on imbalanced datasets? a. Balanced Accuracy (BACC). b. Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). c. F1-score. d. Mean Absolute Error (MAE).

a, b, c (d is for regression)
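
To see why accuracy can mislead, the sketch below computes ACC, BACC, and F1 from confusion-matrix counts using the standard definitions. The helper `classification_metrics` is a hypothetical name, and the second, imbalanced matrix is invented for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    tpr = tp / (tp + fn)        # sensitivity / recall
    tnr = tn / (tn + fp)        # specificity
    precision = tp / (tp + fp)
    return {
        "acc": (tp + tn) / (tp + fn + fp + tn),
        "bacc": (tpr + tnr) / 2,
        "f1": 2 * precision * tpr / (precision + tpr),
    }


# Matrix from the confusion-matrix cards: TP=40, FN=10, FP=35, TN=15
balanced = classification_metrics(tp=40, fn=10, fp=35, tn=15)

# Hypothetical imbalanced data: 10 positives versus 990 negatives
imbalanced = classification_metrics(tp=5, fn=5, fp=10, tn=980)

print(balanced)
print(imbalanced)
```

On the imbalanced example accuracy is 0.985 although only half of the positives are found; balanced accuracy (about 0.745) and F1 (0.4) make that weakness visible.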

What does the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) represent? a. The proportion of correctly classified samples. b. The probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one. c. The tradeoff between precision and recall. d. The sensitivity of the model at a fixed threshold of 0.5.
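
The ranking interpretation in option b can be computed directly by comparing every positive score with every negative score. This is a minimal sketch; the function name `auc_by_ranking` and the scores are made up, and ties are counted as half a win, which is the usual convention.

```python
def auc_by_ranking(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive is ranked higher."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


pos = [0.9, 0.8, 0.4]  # hypothetical classifier scores for positive samples
neg = [0.7, 0.3, 0.2]  # hypothetical classifier scores for negative samples

auc = auc_by_ranking(pos, neg)
print(auc)  # 8 of the 9 pairs are ranked correctly
```

No threshold appears anywhere in the computation, which is why AUC is independent of a fixed cut-off such as 0.5.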

For regression models, which of the following are common measures to evaluate performance? a. Mean Absolute Error (MAE). b. Mean Squared Error (MSE). c. Root Mean Squared Error (RMSE). d. Cross Entropy.

a, b, c (d is for classification)
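
The three regression measures follow directly from the prediction errors. A small sketch with made-up targets and predictions (the helper name `regression_metrics` is hypothetical):

```python
import math


def regression_metrics(y_true, y_pred):
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    rmse = math.sqrt(mse)                            # same unit as the target
    return mae, mse, rmse


# Hypothetical data: the errors are 1, 0, -2, -1
mae, mse, rmse = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.0, 5.0, 4.0, 8.0])
print(mae, mse, rmse)
```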

When performing hyperparameter optimization, the lecture suggests using a 'Hold-Out Test Set' strategy. Which of the following describes this strategy? a. Randomly partitioning a dataset into training, validation, and test sets. b. Using the training set to train a model, the test set for independent performance estimation, and the validation set for hyperparameter optimization. c. Obtaining a performance estimate that is always perfectly reliable, even with a small test set. d. Using the test set for training and the training set for evaluation.

a, b (c is incorrect: 'Performance estimate not necessarily reliable if test set is small')

Cross-validation (CV) is an alternative evaluation strategy, especially useful for small datasets. Which of the following statements about CV are true? a. It splits the dataset into n disjoint folds. b. In n-fold CV, n-1 folds are used as the training set, and the remaining fold is used as the validation set, repeating n times. c. It provides a better estimate of generalization capability by averaging over n estimated risks on validation sets. d. The test set is typically kept separate from the cross-validation folds.

a, b, c, d
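The fold mechanics can be sketched in a few lines of plain Python. This is an illustrative helper, not the lecture's implementation: the name `n_fold_splits` is hypothetical, no shuffling is done, and the indices are assumed to cover only the data that remains after a test set has been held out.

```python
def n_fold_splits(num_samples, n):
    """Return n (train_indices, val_indices) pairs over disjoint folds."""
    indices = list(range(num_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [num_samples // n + (1 if i < num_samples % n else 0)
                  for i in range(n)]
    splits = []
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]                   # one fold for validation
        train = indices[:start] + indices[start + size:]    # the other n-1 folds
        splits.append((train, val))
        start += size
    return splits


splits = n_fold_splits(10, 3)
for train, val in splits:
    print("train:", train, "val:", val)
```

Each sample lands in exactly one validation fold, so averaging the n validation risks uses every sample once for evaluation.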

Which of the following are the two main options for using pre-trained models in transfer learning, as discussed in the lecture? a. Training a model from scratch with a very large dataset. b. Feature extraction, where the backbone of the pre-trained model is used, and only a new classification head (or last layer) is trained. c. Fine-tuning, where specific parts or the entire pre-trained model are trained on the new dataset. d. Manually engineering new features from the raw data.

b, c

When fine-tuning a pre-trained model, what is a common strategy to adapt it to a new, smaller dataset, especially to save computational resources? a. Training all layers of the pre-trained model simultaneously. b. Freezing the parameters of the model backbone (setting requires_grad = False) and only training a newly added classifier head or last layer. c. Increasing the learning rate for all layers of the model. d. Applying aggressive data augmentation only to the test set.

Which of the following statements about lambda functions in Python are true? a. A lambda function can contain multiple expressions, separated by commas. b. A lambda function must be assigned to a variable before it can be used. c. Any function defined with `def` can be rewritten as a lambda function. d. Lambda functions can accept multiple arguments.

d. Lambda functions can accept multiple arguments.

Consider the following list of file names: `files = ['f1', 'f3', 'f2', 'f12', 'f22']`. Which of the following code snippets will correctly sort these files numerically (i.e., `f1, f2, f3, f12, f22`)? a. `sorted(files)` b. `sorted(files, key=lambda name: name[1:])` c. `sorted(files, key=lambda name: int(name[1:]))` d. `files.sort(key=lambda name: int(name.strip('f')))`

c. `sorted(files, key=lambda name: int(name[1:]))`, d. `files.sort(key=lambda name: int(name.strip('f')))`
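A quick way to check the four options is to run them side by side. The variable names below are only for illustration:

```python
files = ['f1', 'f3', 'f2', 'f12', 'f22']

# Options a and b compare strings, so 'f12' sorts before 'f2'.
lexicographic = sorted(files)
string_key = sorted(files, key=lambda name: name[1:])

# Option c converts the numeric part to int before comparing.
numeric = sorted(files, key=lambda name: int(name[1:]))

# Option d sorts in place (list.sort returns None), so work on a copy here.
in_place = list(files)
in_place.sort(key=lambda name: int(name.strip('f')))

print(lexicographic)
print(string_key)
print(numeric)
print(in_place)
```

Note that `strip('f')` removes the character from both ends of the string; that is harmless for these names but worth keeping in mind for others.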

What is the output of the following code snippet?
```python
from functools import reduce

data = [1, 2, 3, 4, 5]
result = reduce(lambda x, y: x + y*2, data)
print(result)
```
a. 15 b. 29 c. 25 d. The code will raise a `TypeError`.

b. 29
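The intermediate values can be made visible with `itertools.accumulate`, which applies the same two-argument function but keeps every running result. This is just a sketch for tracing the reduction; the name `step` is introduced here for illustration.

```python
from functools import reduce
from itertools import accumulate

data = [1, 2, 3, 4, 5]
step = lambda x, y: x + y * 2

# The first element is taken as the initial accumulator and is NOT doubled.
running = list(accumulate(data, step))
result = reduce(step, data)

print(running)  # running accumulator after each element
print(result)   # final value, equal to running[-1]
```

The trace is 1, then 1 + 2*2 = 5, 5 + 3*2 = 11, 11 + 4*2 = 19, and finally 19 + 5*2 = 29.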

Given the `Pet` class from the lecture materials and the following list:
```python
class Pet:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return self.name

pets = [Pet('Fido', 3), Pet('Rex', 5), Pet('Buddy', 2)]
```
Which of the following expressions will correctly identify the oldest pet? a. `max(pets)` b. `max(pets, key=lambda pet: pet.age)` c. `sorted(pets, key=lambda pet: pet.age)[-1]` d. `pets.sort(key=lambda x: x.age)`

b. `max(pets, key=lambda pet: pet.age)`, c. `sorted(pets, key=lambda pet: pet.age)[-1]`

Which of the following are considered callable objects in Python by default? a. An instance of a class that does not define a `__call__` method. b. A method of a class instance. c. A class itself. d. A lambda function.

b. A method of a class instance. c. A class itself. d. A lambda function.
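The built-in `callable()` can be used to verify each case. The class `Plain` below is a hypothetical stand-in for "a class that does not define `__call__`":

```python
class Plain:
    def method(self):
        return "hi"


obj = Plain()

checks = {
    "instance without __call__": callable(obj),
    "bound method": callable(obj.method),
    "class": callable(Plain),
    "lambda": callable(lambda: None),
}

for label, is_callable in checks.items():
    print(f"{label}: {is_callable}")
```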

Consider the following class:
```python
class Multiplier:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return self.factor * value
```
What will be the output of the following code?
```python
double = Multiplier(2)
result = double(10)
print(result)
```
a. The code will raise a `TypeError` because `double` is an object, not a function. b. 20 c. 10 d. `None`

b. 20

What is the primary advantage of making a class instance callable by implementing the `__call__` method, as opposed to using a standard instance method? a. It is the only way to pass arguments to an object. b. It provides a more function-like syntax and is useful for creating objects that behave like functions but need to maintain an internal state. c. It improves performance by bypassing standard method dispatch. d. It allows the object to be used in `for` loops.

b. It provides a more function-like syntax and is useful for creating objects that behave like functions but need to maintain an internal state.
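As an illustration of "function-like but stateful", here is a small sketch of a callable object that remembers what it has seen. The class `RunningMean` is a hypothetical example, not from the lecture.

```python
class RunningMean:
    """Callable object that keeps a running mean across calls."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def __call__(self, value):
        # State persists on the instance between calls.
        self.total += value
        self.count += 1
        return self.total / self.count


mean = RunningMean()
outputs = [mean(10), mean(20), mean(30)]
print(outputs)
```

The object is used exactly like a function, yet each call depends on the values passed in earlier calls.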

What is the primary purpose of using `*args` and `**kwargs` in a decorator's wrapper function? a. To make the decorator run faster. b. To allow the decorator to be applied to any function, regardless of its parameters. c. To automatically cache the results of the function. d. To ensure the decorator can only be used with functions that have no arguments.

b. To allow the decorator to be applied to any function, regardless of its parameters.
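A minimal sketch of such a generic wrapper, applied to two functions with different signatures. The decorator name `log_calls` and the `calls` attribute are made up for this example.

```python
from functools import wraps


def log_calls(func):
    """Record the arguments of every call, whatever the signature."""
    calls = []

    @wraps(func)
    def wrapper(*args, **kwargs):
        calls.append((args, kwargs))
        # Forward positional and keyword arguments unchanged.
        return func(*args, **kwargs)

    wrapper.calls = calls  # expose the log for inspection
    return wrapper


@log_calls
def add(a, b):
    return a + b


@log_calls
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}"


print(add(1, 2))
print(greet("Ada", greeting="Hi"))
print(add.calls, greet.calls)
```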

Consider the following code:
```python
from functools import wraps

def my_decorator(func):
    # @wraps(func)  # This line is commented out
    def wrapper():
        """This is the wrapper's docstring."""
        return func()
    return wrapper

@my_decorator
def original_function():
    """This is the original docstring."""
    pass

print(original_function.__name__)
print(original_function.__doc__)
```
What will be printed to the console? a. `original_function` and `This is the original docstring.` b. `wrapper` and `This is the wrapper's docstring.` c. `wrapper` and `This is the original docstring.` d. `original_function` and `This is the wrapper's docstring.`

b. `wrapper` and `This is the wrapper's docstring.`
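For comparison, this sketch puts a decorator without `functools.wraps` next to one that uses it. The decorator and function names are hypothetical.

```python
from functools import wraps


def plain_decorator(func):
    def wrapper():
        """Wrapper docstring."""
        return func()
    return wrapper


def wrapping_decorator(func):
    @wraps(func)  # copies __name__, __doc__, etc. from func onto wrapper
    def wrapper():
        """Wrapper docstring."""
        return func()
    return wrapper


@plain_decorator
def first():
    """First docstring."""


@wrapping_decorator
def second():
    """Second docstring."""


print(first.__name__, "|", first.__doc__)
print(second.__name__, "|", second.__doc__)
```

With `@wraps`, the decorated function reports its own name and docstring, and the original function stays reachable through `second.__wrapped__`.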

You need a decorator that logs the time it takes for a function to execute. This decorator should also accept an argument to specify the logging level (e.g., 'INFO', 'DEBUG'). How many levels of nested functions would be required to implement this decorator correctly? a. One (`wrapper`). b. Two (`decorator` and `wrapper`). c. Three (an outer function to accept the argument, a `decorator` function, and a `wrapper` function). d. It can be done with a single-level class decorator.

c. Three (an outer function to accept the argument, a `decorator` function, and a `wrapper` function).
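The three levels could look like the sketch below. It uses `print` instead of a real logger to stay self-contained, and the names `timed` and `square` are assumptions for illustration.

```python
import time
from functools import wraps


def timed(level="INFO"):          # level 1: accepts the decorator argument
    def decorator(func):          # level 2: receives the function to decorate
        @wraps(func)
        def wrapper(*args, **kwargs):  # level 3: runs on every call
            start = time.perf_counter()
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print(f"[{level}] {func.__name__} took {elapsed:.6f}s")
            return result
        return wrapper
    return decorator


@timed(level="DEBUG")
def square(x):
    return x * x


value = square(4)
print(value)
```

`@timed(level="DEBUG")` first calls `timed`, which returns `decorator`; that returned function is then applied to `square`.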

What is the output of the following code?
```python
from functools import cache

@cache
def factorial(n):
    print(f"Calculating {n}!")
    return n * factorial(n-1) if n > 1 else 1

factorial(3)
factorial(4)
```
a. `Calculating 3!`, `Calculating 2!`, `Calculating 1!`, `Calculating 4!`, `Calculating 3!`, `Calculating 2!`, `Calculating 1!` b. `Calculating 3!`, `Calculating 2!`, `Calculating 1!`, `Calculating 4!` c. `Calculating 3!`, `Calculating 2!`, `Calculating 1!`, `Calculating 4!`, `Calculating 3!` d. The code will result in an infinite recursion error.

b. `Calculating 3!`, `Calculating 2!`, `Calculating 1!`, `Calculating 4!`

Which of the following statements correctly describe the `@singledispatch` decorator? a. It allows a function to have different implementations based on the number of arguments it receives. b. It allows a function to have different implementations based on the type of its first argument. c. It requires the `multipledispatch` library to be installed. d. When used on an instance method, it dispatches based on the type of `self`.

b. It allows a function to have different implementations based on the type of its first argument.
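A short sketch of dispatch on the first argument's type, using `functools.singledispatch` from the standard library. The function `describe` and its return strings are invented for the example.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback used when no more specific implementation is registered.
    return f"object: {value!r}"


@describe.register
def _(value: int):
    return f"int: {value}"


@describe.register
def _(value: list):
    return f"list of {len(value)} items"


print(describe(3))
print(describe([1, 2]))
print(describe("x"))
```

Only the type of the first argument is considered; additional arguments do not influence which implementation runs.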

Which of the following statements best describes the primary use case for asynchronous functions in Python? a. To perform CPU-intensive calculations faster by using multiple CPU cores. b. To improve the performance of I/O-bound operations by allowing the program to execute other tasks while waiting for I/O to complete. c. To simplify the syntax of complex, nested function calls. d. To guarantee that tasks are executed in a specific, sequential order.

b. To improve the performance of I/O-bound operations by allowing the program to execute other tasks while waiting for I/O to complete.

What is the role of the `await` keyword in an `async def` function? a. It defines the start of an asynchronous function block. b. It pauses the execution of the current coroutine, yields control to the event loop, and waits for an awaitable object to complete before resuming. c. It immediately executes a function in a separate thread. d. It is syntactic sugar for creating a `threading.Thread` object.

b. It pauses the execution of the current coroutine, yields control to the event loop, and waits for an awaitable object to complete before resuming.

Consider the following asynchronous program:
```python
import asyncio
import time

async def task_a():
    print("Task A starting")
    await asyncio.sleep(2)
    print("Task A finished")

async def task_b():
    print("Task B starting")
    await asyncio.sleep(1)
    print("Task B finished")

async def main():
    await asyncio.gather(task_a(), task_b())

asyncio.run(main())
```
What will be the order of the printed messages? a. `Task A starting`, `Task A finished`, `Task B starting`, `Task B finished` b. `Task A starting`, `Task B starting`, `Task A finished`, `Task B finished` c. `Task A starting`, `Task B starting`, `Task B finished`, `Task A finished` d. The order is non-deterministic and can vary between runs.

c. `Task A starting`, `Task B starting`, `Task B finished`, `Task A finished`
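The ordering can be checked without reading console output by recording events in a list. This is a scaled-down variant of the card's program: the delays are shortened to 0.2 s and 0.1 s, and the helper `task` is a hypothetical generalization of `task_a` and `task_b`.

```python
import asyncio


async def task(name, delay, events):
    events.append(f"Task {name} starting")
    await asyncio.sleep(delay)  # yields control to the event loop
    events.append(f"Task {name} finished")


async def main():
    events = []
    # Both coroutines are scheduled; A runs first until its first await.
    await asyncio.gather(task("A", 0.2, events), task("B", 0.1, events))
    return events


events = asyncio.run(main())
print(events)
```

Both tasks start before either finishes, and the one with the shorter sleep finishes first.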

Which of the following are true about the `asyncio` event loop? a. It is responsible for scheduling and executing coroutines. b. A Python program can have multiple independent event loops running simultaneously in the same thread. c. `asyncio.run()` is a high-level function that creates an event loop, runs a coroutine, and closes the loop. d. The event loop uses multiprocessing to run tasks on different CPU cores.

a. It is responsible for scheduling and executing coroutines. c. `asyncio.run()` is a high-level function that creates an event loop, runs a coroutine, and closes the loop.

Why is it problematic to define PyTorch torch.nn.Module submodules (like torch.nn.Linear) directly within the forward method of a custom nn.Module class, as opposed to defining them in the __init__ method? A. Submodules defined in forward are automatically registered as parameters, leading to redundant memory usage. B. Defining submodules in forward prevents their parameters from being properly registered with the main nn.Module, making them untrainable and re-instantiated in every forward pass. C. The forward method is only for operations, not for module instantiation, which would cause a TypeError. D. This practice would lead to a static computational graph, preventing dynamic changes during training.

What is the fundamental difference between Python's asyncio (asynchronous programming) and true parallel processing (e.g., using the multiprocessing module) in terms of CPU utilization? A. asyncio executes tasks on multiple CPU cores simultaneously, while parallel processing switches rapidly between tasks on a single core. B. asyncio is suitable for CPU-bound tasks, whereas parallel processing is best for I/O-bound tasks. C. asyncio allows concurrent execution of tasks on a single CPU core, yielding control during I/O operations, while parallel processing runs tasks simultaneously on multiple CPU cores. D. Parallel processing uses async/await keywords, while asyncio uses separate processes to achieve parallelism.

What is a primary advantage of using Pandas' nullable data types (e.g., Int64, Float64, string) compared to standard NumPy data types when dealing with real-world datasets that often contain missing values? A. Nullable types automatically impute missing values with the mean or median, simplifying preprocessing. B. Standard NumPy types do not support any representation for missing values, requiring external libraries. C. Pandas' nullable types provide a dedicated NA indicator for missing values, which is distinct from numerical NaN and allows for more consistent type handling across different data types (e.g., integers, booleans, strings). D. Nullable types reduce memory footprint significantly compared to NumPy's default types for similar data.

According to the provided materials, which set of three characteristics accurately defines "Tidy Data"? A. Each dataset is a file, each variable is a row, and each observation is a cell. B. Each value is a cell, each row is a variable, and each column is an observation. C. Each variable is a column, each observation is a row, and each value is a cell. D. Each table is normalized, each primary key is unique, and each foreign key is indexed.

Consider the advanced use of Python decorators. Which statement correctly describes how a decorator implemented as a class manages state and identifies a standard library decorator for caching? A. A decorator class manages state by passing mutable objects as arguments to the decorated function; functools.lru_cache is used for caching. B. A decorator implemented as a class maintains state through instance attributes of the decorator object, which persists across calls; functools.cache provides memoization. C. State management in decorator classes is achieved by dynamically modifying the decorated function's __globals__ dictionary; functools.property enables caching. D. Decorator classes inherently reset their state with each function call; functools.partial is the recommended caching decorator.

In PyTorch, when configuring a torch.utils.data.DataLoader for training with shuffle=True and num_workers > 0, what is crucial for ensuring the reproducibility of minibatch samples and reproducible dataset splits? A. Setting torch.backends.cudnn.deterministic = False and using torch.utils.data.ConcatDataset for splitting. B. Disabling shuffle in the DataLoader and manually indexing the dataset for splits. C. Implementing careful seed management for both the random number generators and the DataLoader, and utilizing torch.utils.data.Subset to create fixed training, validation, and test splits. D. Setting pin_memory=True in the DataLoader and storing the entire dataset in memory before splitting.

When choosing a loss function in PyTorch for classification, why are torch.nn.BCEWithLogitsLoss and torch.nn.CrossEntropyLoss often preferred over using torch.nn.BCELoss or torch.nn.NLLLoss (Negative Log Likelihood Loss) combined with explicit activation functions (like torch.nn.Sigmoid or torch.nn.LogSoftmax) in the model's output layer? A. BCEWithLogitsLoss and CrossEntropyLoss compute gradients much faster due to simplified internal graph structures. B. These combined loss functions inherently integrate the activation function, leading to improved numerical stability and preventing issues like vanishing/exploding gradients during backpropagation. C. They automatically perform data normalization on the output logits, which is essential for accurate gradient calculations. D. They are compatible with a wider range of optimizers than the separate activation + loss function approach.

For a highly imbalanced binary classification problem (e.g., fraud detection), why is accuracy an unreliable primary evaluation metric, and which set of metrics provides a more informative assessment of a model's performance? A. Accuracy is problematic because it is sensitive to outliers; MAE and RMSE are better. B. Accuracy can be misleading because a high score can be achieved by simply predicting the majority class; Recall, Precision, F1-score, and ROC AUC are more suitable. C. Accuracy is too computationally expensive for large datasets; computational time and memory usage are preferred. D. Accuracy only measures True Positives; True Negatives and False Positives should be used instead.

In a binary classification task, after a model outputs probabilities (or logits), why and how might one determine an "optimal" classification threshold that deviates from the default 0.5? A. The threshold is adjusted to minimize computational cost; this is typically done by lowering it to 0.1. B. Adjusting the threshold is crucial to balance the model's precision and recall according to specific application goals (e.g., maximizing recall in fraud detection), often guided by analysis of the ROC or Precision-Recall curve. C. Optimal thresholding is an outdated technique, as modern models inherently learn the best threshold during training. D. The threshold should always be set to the mean of the predicted probabilities to ensure unbiased classification.

What is a significant advantage of PyTorch's dynamic computational graph (the "define-by-run" approach) provided by autograd, especially when compared to static graph frameworks? A. Dynamic graphs allow for faster deployment to production environments because they are pre-compiled. B. They enforce stricter model architecture definitions, leading to more robust and less error-prone code. C. The dynamic nature allows for flexible model design, easier debugging, and conditional execution paths during forward passes, as the graph is built on-the-fly. D. Dynamic graphs automatically perform hyperparameter tuning, reducing the need for manual optimization.

Describe the distinct mechanisms by which L2 penalty (weight decay) and Dropout act as regularization techniques in neural networks to mitigate overfitting. A. L2 penalty randomly sets a fraction of neuron activations to zero during training, while Dropout adds a penalty proportional to the squared magnitude of weights to the loss function. B. L2 penalty encourages smaller weights by adding a penalty to the loss function, thereby reducing model complexity, while Dropout randomly deactivates neurons during training, preventing complex co-adaptations and forcing the network to learn more robust features. C. Both L2 penalty and Dropout directly modify the network's architecture by removing layers during training to reduce overfitting. D. L2 penalty is a form of data augmentation, while Dropout is a technique for early stopping based on validation loss.

In transfer learning, when would fine-tuning the entire pre-trained model (Option 2) typically be a more advantageous strategy compared to using it solely as a feature extractor with a new classifier head (Option 1)? A. When the target dataset is very small, and its domain is very similar to the original training data of the pre-trained model. B. When computational resources are extremely limited, and rapid prototyping is the main goal. C. When the target dataset is large, its domain is significantly different from the original training data, and Option 1 (feature extraction) does not achieve sufficient performance. D. Fine-tuning is always preferred over feature extraction, as it guarantees better performance and faster convergence.

If a Pandas DataFrame has a non-unique index label (e.g., multiple rows share the same index value), what is the behavior of the .loc[] accessor when attempting to select data using that non-unique label? A. .loc[] will raise a KeyError because index labels must be unique for selection. B. .loc[] will return only the first row that matches the non-unique label. C. .loc[] will return all rows that match the non-unique label, potentially as a DataFrame or Series. D. .loc[] will automatically re-index the DataFrame to ensure uniqueness before selection.

What is the primary purpose of the collate_fn argument in torch.utils.data.DataLoader, and in which scenario would a custom collate_fn be essential? A. Its primary purpose is to apply data augmentation; it's essential when transformations are too complex for torchvision.transforms. B. It defines how individual samples are fetched from the dataset; a custom one is needed for very large datasets stored on disk. C. It specifies how individual samples from the Dataset are combined and structured into a minibatch; it is essential when samples within a batch have varying shapes (e.g., different image sizes or sequence lengths) and require custom padding or processing before stacking into a single tensor. D. Its purpose is to handle multi-threaded data loading; it's essential when num_workers > 0.

When implementing a Python decorator, why is it considered crucial to use functools.wraps (or manually copy relevant attributes) on the wrapper function, and what problem does it solve? A. It improves the runtime performance of the decorated function by caching its results. B. It ensures that the decorated function's original metadata (__name__, __doc__, __module__, etc.) is preserved, which is important for introspection, debugging, and documentation. C. It automatically handles error logging for the decorated function, directing exceptions to a specified file. D. It allows the decorator to accept arguments dynamically, without needing an outer function.

Python II Flashcards

(100 cards)