Programming Flashcards

1
Q

Advanced indexing (PyTorch)

A

A powerful feature that allows users to access and manipulate specific elements or subsets of a tensor using advanced indexing techniques. This includes boolean masking, integer array indexing, and using tensor indices to select elements along specific dimensions.
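A minimal sketch of these techniques (t is a hypothetical tensor):

import torch
t = torch.arange(12).reshape(3, 4)    # 3x4 tensor
t[t > 5]                     # boolean masking: tensor([ 6,  7,  8,  9, 10, 11])
t[[0, 2]]                    # integer array indexing: rows 0 and 2
t[:, torch.tensor([1, 3])]   # tensor indices along dimension 1: columns 1 and 3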

2
Q

Application Programming Interface (API)

A

Set of rules and protocols that allows different software applications to communicate and interact with each other. APIs define the methods and data formats that applications can use to request and exchange information. They facilitate the development of software by providing a standardized way for developers to access functionality or services provided by other applications, libraries, or platforms.

3
Q

Array

A

A data structure that stores a collection of elements of the same data type in contiguous memory locations. Arrays offer efficient access to elements using index-based retrieval and support various operations such as insertion, deletion, and traversal. They are fundamental in programming and are used to represent vectors, matrices, and multidimensional data structures in languages like Python, Java, and C++.

Arrays in Python are data structures that store collections of elements of the same data type in contiguous memory locations. Unlike traditional arrays in languages like C or Java, Python arrays are implemented using the array module or the more versatile numpy library. Arrays in Python provide efficient access to elements through index-based retrieval and support various operations such as insertion, deletion, and traversal. They are commonly used to represent vectors, matrices, and multidimensional data structures.

Arrays are created by calling a constructor from the array module, not with bare square brackets, since [] is reserved for lists:

import array
my_array = array.array('i', [0, 4, 0, 9, 1])   # 'i' = signed int type code

4
Q

Assert

A

Programming construct used to test assumptions or conditions within code. It evaluates a Boolean expression and throws an exception or raises an error if the condition is false, indicating a violation of expected behavior. Assert statements are commonly employed in unit testing to validate program correctness and identify errors early in the development process.
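A minimal sketch:

def divide(a, b):
    assert b != 0, "divisor must be non-zero"
    return a / b

divide(10, 2)   # 5.0
divide(1, 0)    # raises AssertionError: divisor must be non-zero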

5
Q

Attribute

A

Represents a piece of data associated with an object. Attributes describe the state of an object. For example, in a Car class, color could be an attribute representing the color of the car.

We access an attribute like this:
object.attribute

6
Q

Autograd (PyTorch)

A

Built-in automatic differentiation engine in PyTorch and the heart of its deep learning capabilities. It allows you to efficiently compute gradients (rates of change) for any operation performed on tensors. Here’s a breakdown:

Automatic Differentiation: Imagine building a complex mathematical equation with tensors. Normally, calculating the gradients for each variable involved would be tedious and error-prone. Autograd automates this process.
Computational Graph: When you perform operations on tensors with requires_grad=True, PyTorch creates a computational graph behind the scenes. This graph tracks all the operations performed, essentially showing how each tensor depends on others.
Backpropagation: During training, when you calculate a loss function (how well your model performs), autograd uses the computational graph to efficiently backpropagate the error. It starts from the loss and works backward through the graph, calculating the gradients for each tensor involved.
Optimizer: These gradients are then used by an optimizer (like SGD) to update the weights and biases in your neural network, allowing it to learn and improve its predictions.
In simpler terms: Autograd acts like a magical bookkeeper, meticulously tracking every step in your calculations and then efficiently calculating the gradients you need to train your neural network effectively.
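A minimal sketch of autograd in action:

import torch
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x    # operations are recorded in a computational graph
y.backward()          # backpropagate from y
print(x.grad)         # dy/dx = 2x + 3 = tensor(7.)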

7
Q

Boolean

A

Named after the mathematician George Boole, refers to a data type or algebraic system that represents two possible values: True and False. In Boolean algebra, these values are typically denoted as 1 for True and 0 for False. Boolean values are fundamental in computer science for logical operations, decision-making, and binary state representation. In Python, boolean values are represented by the bool type, and logical operations such as AND, OR, and NOT are performed using the keywords and, or, and not, respectively.
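For example:

a, b = True, False
a and b     # False
a or b      # True
not a       # False
int(True)   # 1 -- booleans behave as 1 and 0 in arithmetic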

8
Q

Buffer

A

A temporary storage area in computer memory used to hold data temporarily during input/output operations or between different processes. In the context of neural networks, a buffer refers to a temporary storage area used to hold intermediate or temporary data during the forward and backward passes of the training process. Buffers are commonly used to store activations, gradients, and other intermediate computations at different layers of the network. During the forward pass, input data is propagated through the network, and intermediate results are stored in buffers for subsequent computation. During the backward pass (backpropagation), gradients are computed with respect to the loss function, and intermediate gradients are stored in buffers to update the network parameters (weights and biases) through optimization algorithms such as gradient descent. Buffers play a crucial role in managing data flow and optimizing memory usage in neural network implementations, especially for large-scale models with many layers and parameters.

9
Q

Casting

A

Casting in Python, with functions like int(), float(), and str(), ensures data type compatibility and facilitates manipulation. Explicit conversion is common, while implicit casting occurs, such as in arithmetic operations. Handling errors, like incompatible type conversions, is essential for smooth execution. Python provides built-in functions for type conversion, allowing seamless transition between different data types. Care should be taken to ensure data integrity and prevent runtime issues. Overall, casting is a fundamental aspect of Python programming, enabling flexibility and versatility in data processing tasks.
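For example:

int("42")    # 42 (explicit conversion)
float(3)     # 3.0
str(2.5)     # '2.5'
3 / 2        # 1.5 -- implicit: dividing ints yields a float
int("abc")   # raises ValueError -- incompatible conversion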

10
Q

Class

A

A class is a blueprint for creating objects with specific attributes and behaviors. It encapsulates data (attributes) and behavior (methods) into a cohesive unit, promoting code organization and reusability. Objects are instances of classes, created using the class’s constructor method. Classes support inheritance, allowing subclasses to inherit attributes and methods from their superclass. This enables hierarchical organization of code and facilitates code reuse and modularity, essential principles in object-oriented programming.

Think of an object as a blueprint (class) brought to life. For example, a “Car” blueprint has properties (color, make, model) and behaviors (accelerate, brake, turn). A specific car you see on the street is an object—an instance of the “Car” class.
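A minimal sketch of the Car example:

class Car:
    def __init__(self, color, make):
        self.color = color        # attributes: state
        self.make = make

    def accelerate(self):         # method: behavior
        print(f"The {self.color} {self.make} accelerates")

my_car = Car("red", "Toyota")     # my_car is an object: an instance of Car
my_car.accelerate()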

11
Q

Code interpreter

A

A software component responsible for executing code statements or instructions interactively. Interpreters translate and execute code directly, line by line, without the need for compilation. In machine learning, code interpreters facilitate rapid prototyping, debugging, and experimentation with algorithms and models, enhancing the development workflow and productivity of practitioners.

12
Q

Code layouts

A

Code layout refers to the organization and structure of code within a file or project. It encompasses various aspects such as indentation, spacing, and commenting styles, which significantly affect code readability and maintainability. An effective code layout enhances collaboration among developers and reduces the likelihood of introducing errors during code modifications. Properly structured code layouts adhere to consistent conventions and principles, making it easier for developers to understand, debug, and extend the codebase over time.

13
Q

Command line argparse

A

A module in Python’s standard library that facilitates the parsing of command-line arguments passed to Python scripts. It provides a user-friendly interface for creating powerful and flexible command-line interfaces. By defining arguments, options, and their corresponding actions, developers can effortlessly handle user inputs from the command line. Command line argparse simplifies argument parsing by automatically generating help messages and error handling mechanisms. It supports a wide range of argument types and validation rules, making it suitable for building robust and interactive command-line applications.
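A minimal sketch (script and argument names are illustrative):

import argparse

parser = argparse.ArgumentParser(description="Train a model")
parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
parser.add_argument("--verbose", action="store_true", help="print progress")
args = parser.parse_args()
print(args.epochs, args.verbose)
# run as: python train.py --epochs 5 --verbose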

14
Q

Command lines

A

Command lines serve as the primary interface for users to interact with computer programs by entering text commands into a terminal or command prompt. These commands typically instruct the operating system to execute specific actions or run programs. Command lines provide a versatile and efficient means of performing tasks such as file manipulation, system configuration, and program execution. Users can leverage command lines to navigate file systems, install software packages, manage processes, and automate repetitive tasks through scripting. Despite the prevalence of graphical user interfaces (GUIs), command lines remain indispensable for advanced users and system administrators due to their flexibility and scripting capabilities.

Example: Running Python scripts or executing system commands using the terminal or command prompt.

15
Q

Compiler

A

A software tool that translates source code written in a high-level programming language into machine-readable binary code or executable files. Compilers analyze, optimize, and transform source code into an efficient form that can be executed on a target platform. In machine learning and artificial intelligence, compilers are used to optimize and accelerate code execution, particularly for performance-critical tasks such as training deep neural networks and executing inference on edge devices.

16
Q

Comprehensions

A

Comprehensions are concise and expressive syntax constructs in programming languages, such as list comprehensions, dictionary comprehensions, and set comprehensions. Comprehensions enable developers to create new data structures by iterating over existing ones and applying transformations or filters in a single line of code.

even_numbers = [number for number in numbers if number % 2 == 0]

17
Q

Conditional breakpoint

A

A debugging feature that allows developers to pause program execution at specific points in the code only when certain conditions are met. Unlike regular breakpoints, which halt execution unconditionally, conditional breakpoints provide more flexibility by allowing developers to specify criteria for triggering the breakpoint. Common use cases include debugging loops, conditional branches, or complex logic where developers need to inspect variables or evaluate expressions under specific conditions. By setting conditional breakpoints, developers can streamline the debugging process and focus their attention on relevant code paths, thereby accelerating the identification and resolution of software bugs.
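IDEs and debuggers let you attach the condition to the breakpoint itself; the same effect can be sketched in plain Python:

items = [1, 2, None, 4]   # hypothetical data
for i, item in enumerate(items):
    if item is None:      # the condition
        breakpoint()      # drops into the debugger only when the condition holds
    # ... process item ...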

18
Q

Context managers

A

Objects that enable the management of resources within a block of code by automatically allocating and releasing them. They are typically used with the with statement, which ensures that the necessary setup and teardown actions are performed in a predictable and consistent manner. Context managers abstract away resource management complexities and help prevent resource leaks or conflicts by encapsulating resource-related logic within context manager objects. Common examples of context managers include file handles (open()), database connections, and locks. By using context managers, developers can write cleaner, more robust code that is easier to read, understand, and maintain.
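For example, with a file handle:

with open("data.txt", "w") as f:   # setup: file opened
    f.write("hello")               # body
# teardown: file closed automatically, even if an exception occurred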

19
Q

Control Flow

A

Control flow refers to the order in which instructions or statements are executed in a program or algorithm. In coding, control flow structures, such as loops, conditional statements, and function calls, govern the flow of execution and decision-making in algorithms and models. Control flow mechanisms enable the implementation of complex logic, iteration, and branching behavior in code, facilitating algorithmic design and problem-solving strategies.
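For example:

for n in range(4):        # loop
    if n % 2 == 0:        # conditional branch
        print(n, "is even")
    else:
        print(n, "is odd")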

20
Q

Data Frame

A

A two-dimensional labeled data structure used for storing and manipulating tabular data in programming languages like Python (Pandas), R, and Julia. It consists of rows and columns, where each column can be of a different data type (e.g., numerical, categorical, or text).
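A minimal pandas sketch:

import pandas as pd

df = pd.DataFrame({
    "name": ["Ada", "Bob"],   # text column
    "age": [36, 42],          # numerical column
})
print(df["age"].mean())       # 39.0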

21
Q

DataLoader library

A

Utility in PyTorch (torch.utils.data.DataLoader) for loading data from a Dataset and assembling individual samples into batches, with optional shuffling and parallel loading.
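A minimal sketch:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)
for inputs, labels in loader:   # each iteration yields one batch
    print(inputs.shape)         # torch.Size([16, 3]) (the last batch may be smaller)
    break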

22
Q

Debugging

A

Process of identifying, isolating, and resolving errors, or bugs, in computer programs. It plays a crucial role in software development by ensuring that programs behave as intended and meet the specified requirements. Debugging techniques range from simple print statements and logging to sophisticated debugging tools and techniques provided by integrated development environments (IDEs). Developers use debugging to trace the execution flow, inspect variable values, analyze stack traces, and identify the root causes of software defects. Effective debugging requires a systematic approach, critical thinking skills, and a deep understanding of the programming language and environment.

23
Q

Decorators

A

Higher-order functions in Python that modify or enhance the behavior of other functions or methods without altering their core implementation. They achieve this by wrapping the target function with additional functionality, such as logging, caching, authentication, or error handling. Decorators are commonly used to enforce cross-cutting concerns, such as security policies or performance optimizations, across multiple functions within a codebase. They promote code reuse, modularity, and separation of concerns by allowing developers to encapsulate common functionalities within reusable decorator functions. Decorators are a powerful tool in Python’s arsenal, enabling developers to write clean, concise, and expressive code with minimal boilerplate.
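A minimal sketch of a timing decorator:

import functools
import time

def timed(fn):
    @functools.wraps(fn)                  # preserve the wrapped function's metadata
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

@timed
def slow_add(a, b):
    time.sleep(0.1)
    return a + b

slow_add(1, 2)   # prints the timing, returns 3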

24
Q

Dictionary

A

Dictionaries are sometimes found in other languages as “associative memories” or “associative arrays”. A dictionary is a set of key: value pairs, with the requirement that the keys are unique (within one dictionary).

Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().

A pair of braces creates an empty dictionary: {}.
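For example:

person = {"name": "Ada", "born": 1815}   # key: value pairs
person["name"]             # 'Ada' -- indexed by key, not by position
person["field"] = "math"   # add a new key: value pair
"born" in person           # True -- membership tests keys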

25
Q

Enumerate

A

The enumerate() method in Python is a built-in function used to iterate over a sequence (such as a list, tuple, or string) while also keeping track of the index or position of each item. It returns an iterator that yields pairs of (index, value) tuples, where index represents the index of the item in the sequence and value represents the corresponding item.

It is commonly used in for loops when you need to access both the index and the value of each item in a sequence simultaneously. It simplifies code by eliminating the need to manually manage index variables. It is also handy for constructing dictionaries or other data structures where you need both keys and values from a sequence.

You can specify a custom starting index for counting by providing the start parameter. For example, enumerate(sequence, start=1) will start counting from 1 instead of 0. The enumerate() function returns an enumerate object, which is an iterator. You can convert it to a list or tuple if needed using the list() or tuple() functions, respectively.
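For example:

fruits = ["apple", "banana", "cherry"]
for i, fruit in enumerate(fruits, start=1):
    print(i, fruit)   # 1 apple / 2 banana / 3 cherry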

By contrast, range() is used when you need to iterate over a sequence of numbers, typically for controlling the number of iterations in a loop or generating index values for accessing elements in a sequence.

26
Q

Evaluation mode in NN

A

An operational state, or setting, in neural network frameworks (like PyTorch or TensorFlow) in which the model is used solely for making predictions or inference on new, unseen data, without updating its parameters. In this mode, the network doesn’t learn from the input data; instead, it applies the learned parameters to produce output. For example, dropout layers that are active during training are disabled in evaluation mode. Evaluation mode is typically used during model evaluation, testing, or deployment phases.

model.eval() # Switches the model to evaluation mode
model.train() # Switches back to training mode

27
Q

Exception handling

A

Programming paradigm that focuses on managing and responding to errors, or exceptions, that occur during program execution. Exceptions represent abnormal or unexpected conditions that disrupt the normal flow of the program and require special handling to ensure graceful recovery or termination. Python provides robust support for exception handling through the try, except, finally, and raise keywords, allowing developers to intercept, handle, and propagate exceptions as needed. By implementing effective exception handling strategies, developers can improve the reliability, resilience, and maintainability of their software applications. Common exception handling techniques include logging errors, retrying failed operations, and providing informative error messages to users.
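A minimal sketch:

def parse_number(text):
    try:
        return int(text)
    except ValueError:
        print(f"could not parse {text!r}")   # informative error message
        raise                                # propagate to the caller
    finally:
        print("done parsing")                # runs whether or not an error occurred

parse_number("42")    # 42
parse_number("abc")   # prints the message, re-raises ValueError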

28
Q

Float

A

Floats: the decimal point can “float”, i.e. move before or after any number of digits. A floating-point number is a data type used to represent decimal numbers with a fractional component. Floats are typically stored using a fixed number of bits in memory, allowing them to represent a wide range of values, but with limited precision.

Floats are used to represent real numbers in applications where precision is required, such as scientific computing, numerical simulations, and machine learning. However, due to the finite precision of float representation, arithmetic operations on floats may introduce rounding errors, leading to numerical instability in certain computations. Techniques like double precision and arbitrary-precision arithmetic are used to mitigate these issues in critical applications.

7.6

29
Q

Generators

A

Generators in Python are functions or expressions that enable the creation of iterators in a memory-efficient and lazy-evaluation manner. Unlike traditional functions that compute and return all values at once, generators produce values on-the-fly, one at a time, as they are requested by the consumer. This lazy evaluation strategy conserves memory and improves performance, especially when dealing with large or infinite sequences of data. Generators are implemented using the yield keyword, which suspends the function’s execution and yields a value to the caller. By leveraging generators, developers can write concise, expressive code for processing data streams, generating sequences, and implementing custom iterators with minimal overhead.
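A minimal sketch:

def countdown(n):
    while n > 0:
        yield n   # suspend here; resume on the next request
        n -= 1

for i in countdown(3):
    print(i)      # 3, 2, 1 -- values produced one at a time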

30
Q

Iloc vs loc

A

In pandas, iloc and loc are methods used for selection and indexing in DataFrame objects. While both methods enable data access based on row and column labels, they differ in their indexing conventions. iloc stands for integer location and is used for selecting rows and columns by their integer position within the DataFrame. In contrast, loc stands for label location and is used for selecting rows and columns by their index or column labels.
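For example:

import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30]}, index=["x", "y", "z"])
df.iloc[0]         # first row, by integer position
df.loc["y"]        # row with index label 'y'
df.loc["x", "a"]   # value at row label 'x', column label 'a' -> 10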

31
Q

In what parts/files is a NN model saved/exported?

A

A neural network model is typically saved in two parts:

Model Architecture:
- Structure: the configuration of layers (types, numbers, connections), activation functions, optimizer used, input/output shapes, etc. This blueprint defines the model itself.
- Format: can be a computational graph representation (TensorFlow), or a more declarative, serialization-friendly format.

Learned Weights:
- Values: the numerical values of the weights for each connection within the neural network. These are meticulously adjusted during training and are crucial for the model’s ability to make predictions.
- Format: often stored in binary files optimized for fast loading.

What might sometimes be included:
- Optimizer state: information like momentum values, learning rate schedules, etc. Less essential for basic inference, but useful when you want to resume training later.
- Metadata: things like the original training dataset’s information, preprocessing steps, or class labels, sometimes included for easier model management later.

How it’s saved:
- TensorFlow: SavedModel format (a whole directory structure), or HDF5 files.
- PyTorch: typically torch.save(), which pickles the model objects.
- Keras: HDF5 files are common, or you can integrate TensorFlow’s SavedModel.
- Serialization: saving the model in platform-independent formats like ONNX (Open Neural Network Exchange) for deployment across different frameworks.

It’s very common to save a neural network model in two separate files:

Architecture file:
- Contains the model’s structure: layer types, connections, activation functions, etc.
- Usually a text-based format like JSON or YAML for human readability.

Weights file:
- Contains the learned weights and biases (numerical values) for all the connections in the network.
- Often a binary file optimized for efficient loading and computation.

Why two files?
- Flexibility: separating architecture from weights allows you to load the same architecture and initialize it with different weight sets (e.g., fine-tuning a pre-trained model, experimenting with random initializations).
- Transferability: you might reuse the model architecture with weights trained on a different dataset.

File 1: Model architecture
- JSON (JavaScript Object Notation): a hierarchical, text-based format widely used due to its simplicity and human readability. Libraries often provide easy ways to define and save the model structure as JSON.
- YAML (YAML Ain’t Markup Language): similar to JSON but often considered slightly more human-friendly for configuration files.
- Protocol Buffers: a language-neutral, platform-neutral serialization mechanism by Google. Can be more efficient than JSON or YAML in some cases.
- Library-specific formats: TensorFlow (part of SavedModel, or its own structure within an HDF5 file); PyTorch (often a Python pickle).

File 2: Learned weights
- HDF5 (Hierarchical Data Format 5): a common standard for storing scientific data. Allows organizing weights and related metadata within a single file.
- NumPy arrays (.npy): a simple format for storing raw numerical arrays, often used for individual weight matrices.
- Library-specific: TensorFlow checkpoints or SavedModel variables; PyTorch pickle files.

Important notes:
- Cross-framework formats: formats like ONNX aim to represent the model in a way that’s portable between different deep learning frameworks.
- Compression: weights files can sometimes be compressed to save space.
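A minimal PyTorch sketch of the weights-only pattern (nn.Linear here is a stand-in for a real network):

import torch
import torch.nn as nn

model = nn.Linear(3, 1)                           # architecture is rebuilt in code
torch.save(model.state_dict(), "weights.pt")      # weights file (binary)
model.load_state_dict(torch.load("weights.pt"))   # load weights into the rebuilt architecture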

32
Q

Inheritance

A

Inheritance is a core concept in object-oriented programming that enables code reuse and specialization. A subclass inherits attributes and methods from its superclass, allowing it to extend or modify the superclass’s behavior. This promotes modularity and scalability by facilitating hierarchical organization of classes. Inheritance supports the “is-a” relationship, where a subclass is a specialized version of its superclass. It fosters polymorphism, enabling objects of different subclasses to be treated uniformly based on their common superclass.
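A minimal sketch:

class Vehicle:
    def describe(self):
        return "a vehicle"

class Car(Vehicle):         # Car "is-a" Vehicle
    def describe(self):     # override: specialize inherited behavior
        return "a car"

print(Car().describe())             # a car
print(isinstance(Car(), Vehicle))   # True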

33
Q

Intermediate data representations (PyTorch)

A

Intermediate data representations refer to the transformed versions of your input data as it flows through the different layers of the neural network. Here’s why they’re important:

Hierarchical Learning:
Each layer of a neural network extracts increasingly complex and abstract features from the previous layer’s output.
Early layers might focus on basic edges and patterns. Later layers can build representations of objects, textures, or even higher-level concepts.
Debugging and Understanding:
Examining intermediate representations can help you understand what the network has learned at different stages.
This can be helpful for identifying training problems, diagnosing bottlenecks, or even interpreting how a model makes its decisions.

Not Input, Not Output: It’s the data in its transformed states between the layers of a neural network. It’s neither the raw input nor the model’s final prediction.
Progressive Transformation: Each layer takes its input data, applies weights, biases, and activation functions, and produces a new modified representation. This modified representation is the intermediate data for that layer.
Increasingly Abstract: As data moves deeper into the network:
Early layers: Intermediate data captures low-level features (lines, edges, basic colors).
Later layers: Intermediate data represents more complex concepts and patterns (shapes, objects, or task-specific information).
Why the Term “Representation” Matters

Reframing the Data: Intermediate data isn’t just modified numbers; it’s the evolving way the neural network “understands” or re-represents the input to suit the task.
The Key to Learning: The network’s ability to learn lies in how it successfully modifies these intermediate representations into ones that are highly useful for the final output.
Example: Facial Recognition

Input: Raw pixel values of a face image.
Early Layers: Intermediate data might highlight edges, lines, and color gradients.
Middle Layers: Intermediate data might capture parts of a face (eyes, nose, mouth shapes).
Later Layers: Intermediate data could represent higher-level concepts related to facial identity.
Output: The final classification of the person’s identity.

34
Q

Integer

A

Integers (often shortened to ‘int’) represent whole numbers – both positive and negative – without any fractional components. Integers cannot have a decimal point. (3.14 is not an integer, it’s a floating-point number).

35
Q

Lambda

A

A keyword used to create anonymous functions, also known as lambda functions. These are small, inline functions that are defined using a single expression. Lambda functions are often used when you need a simple function for a short period of time and don’t want to define a full-fledged named function. Lambda functions are often used in conjunction with higher-order functions like map(), filter(), and reduce(), or as arguments to functions that accept other functions as arguments, such as sorted(). They provide a concise and convenient way to define small, single-use functions without the need for a formal function definition.

The syntax for a lambda function is:
variable = lambda arguments: expression

square = lambda x: x ** 2
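For example, as a sort key:

pairs = [(1, "b"), (2, "a")]
sorted(pairs, key=lambda p: p[1])   # [(2, 'a'), (1, 'b')]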

36
Q

Library

A

A collection of reusable code modules or functions that extend the capabilities of a programming language. Libraries provide pre-written code for common tasks, allowing developers to save time and effort by leveraging existing solutions.

37
Q

List

A

Lists are ordered, mutable collections of items. They can hold different data types (integers, strings, floats, even other lists). Lists are defined using square brackets [] and elements are separated by commas. The order of the items you place into a list is preserved.

my_list = [item1, item2, item3]

Key Operations:
Indexing: Access elements by position (e.g., my_list[0])
Slicing: Extract sub-lists (e.g., my_list[1:3])
Adding items: append(), insert(), extend()
Removing items: remove(), pop(), del

38
Q

List comprehension

A

A concise and expressive syntax in Python for creating lists by iterating over an iterable and applying an expression to each element. It provides a more compact alternative to traditional loops for generating lists with specific transformations or filtering criteria. List comprehension consists of an expression followed by a for clause, optionally accompanied by additional if clauses to filter elements based on certain conditions. This compact syntax not only enhances code readability but also improves performance by avoiding the overhead associated with manual loop construction. List comprehension is a powerful feature of Python’s functional programming paradigm, enabling developers to write elegant and efficient code for list generation, transformation, and filtering tasks.
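For example:

squares = [n ** 2 for n in range(10) if n % 2 == 0]   # [0, 4, 16, 36, 64]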

39
Q

Lists and tuples vs arrays and tensors

A

Python lists or tuples of numbers are collections of Python objects that are individually allocated in memory. PyTorch tensors or NumPy arrays, on the other hand, are views over (typically) contiguous memory blocks containing unboxed C numeric types rather than Python objects.

40
Q

Matplotlib

A

Matplotlib is a comprehensive library in Python used for creating static, interactive, and animated visualizations. It offers a wide range of plotting functions and customization options, making it suitable for various data visualization tasks. Matplotlib provides a MATLAB-like interface for generating plots quickly and efficiently, making it accessible to both beginners and advanced users. With Matplotlib, users can create line plots, scatter plots, bar plots, histograms, and more, allowing them to explore and communicate patterns and relationships in their data effectively. Additionally, Matplotlib integrates seamlessly with other Python libraries such as NumPy and Pandas, enabling users to visualize data stored in arrays or DataFrames easily. Whether for exploratory data analysis, presentation-quality graphics, or interactive visualizations in web applications, Matplotlib remains a versatile and essential tool in the Python data science ecosystem.
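A minimal sketch:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [1, 4, 9], label="y = x^2")   # line plot
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()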

41
Q

Method

A

Represents a function associated with a class and objects of that class. Methods define the behavior of objects. For example, in a Car class, drive() could be a method representing the action of driving the car.
In the parenthesis we pass arguments. Arguments are pieces of data that you send into a method when you call it. These inputs allow the method to perform its task with specific information. Think of them as ingredients for a recipe. The method is the recipe, and the arguments are the particular ingredients that influence the outcome.

We call a method like this:
object.method(argument)

Internally, methods in Python are also objects. You can use the callable() function to check whether an object can be called like a function or method.

42
Q

Method vs attribute

A

Attribute: Represents a piece of data associated with an object. Attributes describe the state of an object. For example, in a Car class, color could be an attribute representing the color of the car.
Method: Represents a function associated with an object. Methods define the behavior of an object. For example, in a Car class, drive() could be a method representing the action of driving the car.

We call attributes without parentheses:
Attribute - object.attribute
Method - object.method()

43
Q

Module

A

A module in Python is essentially a file containing Python code (with a .py extension). It provides a collection of reusable functions, classes, and variables. Modules help organize your code into logical units, making it more readable, maintainable, and easier to share with others. Think of them as specialized toolboxes that you can import into your Python projects to access their functionality. For example, the ‘math’ module provides mathematical functions, while the ‘pandas’ module offers data analysis tools.

44
Q

Multidimensional array

A

Multidimensional arrays, often referred to as tensors, are fundamental data structures in machine learning (ML). They extend the concept of traditional arrays into higher dimensions. Imagine a 1D array as a line of numbers, a 2D array as a table with rows and columns, and then extend that further into 3D and beyond. In ML, these multidimensional arrays are used to represent a variety of data:

Images: A color image can be represented as a 3D array (height, width, color channels). For example, a 256x256 RGB image would have a 3D array with dimensions (256, 256, 3).
Text: Sequences of words in a sentence can be represented as a 2D array where each row is a word, and columns represent word embeddings (numerical representations).
Time-series data: Sensor readings over time can be a 2D array (time steps, sensor values), or even higher dimensional if multiple sensors are involved.
Complex Datasets: Many ML datasets have features with multiple aspects. These fit naturally into multidimensional arrays enabling models to learn intricate relationships within the data.

Machine learning algorithms are designed to work with these multidimensional structures, allowing them to detect complex patterns and relationships within the various dimensions.
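For example, with NumPy shapes:

import numpy as np

image = np.zeros((256, 256, 3))   # height, width, color channels
image.ndim    # 3 dimensions
image.shape   # (256, 256, 3)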

45
Q

Multithreading and multiprocessing (lock& pool)

A

Multithreading and multiprocessing are techniques used to achieve concurrency in computer programs, enabling tasks to be executed concurrently for improved performance and responsiveness.

  • Multithreading involves executing multiple threads within a single process, allowing tasks to run concurrently and share the same memory space. However, this concurrency can lead to issues such as race conditions and data inconsistency.
  • Multiprocessing involves executing multiple processes simultaneously, each with its own memory space. This approach avoids issues related to shared memory but may incur higher overhead due to inter-process communication.
  • Locks (or mutexes) are synchronization primitives used to prevent multiple threads from simultaneously accessing shared resources, thus avoiding data corruption and race conditions.
  • Pools (or pools of workers) are used in multiprocessing to manage a group of worker processes that execute tasks concurrently. A pool distributes tasks among available processes and handles process creation, communication, and termination.
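A minimal sketch of a pool:

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":                   # required on platforms that spawn processes
    with Pool(processes=4) as pool:
        print(pool.map(square, range(10)))   # work distributed across 4 workers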
46
Q

Nested lists

A

Nested lists in Python are like multi-level boxes. Just as a box can contain smaller boxes within it, a Python list can contain other lists as its elements. These inner lists can even hold further lists, creating multiple levels of nesting.

my_schedule = [
["Math", "Chemistry"],           # Monday
["English", "History", "Gym"],   # Tuesday
]

Use multiple indices. For example, my_schedule[1][0] would access “English” (Tuesday’s first class). Nested lists are perfect for representing structured data where there are groups within groups, like schedules, hierarchical trees, or game boards.

47
Q

Numeric Data types

A

Numeric Types in Python are designed to store numbers so you can do math! Python offers several types to handle different kinds of numerical data:

Integers (int): Whole numbers without fractions (e.g., 5, -100).
Floating-Point Numbers (float): Numbers with decimal points (e.g., 3.14, -0.25). Great for representing measurements.
Complex Numbers (complex): Numbers with a real and imaginary part (e.g., 2 + 3j). Used in specialized fields like electrical engineering.

Key Points:
Python doesn’t strictly limit integer size; they can get very large.
Floats have precision limits, so be mindful of rounding in sensitive calculations.

48
Q

Numpy

A

NumPy, or Numerical Python, is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. NumPy is widely used in areas such as linear algebra, numerical analysis, statistics, and machine learning.

np.array(): Create an array from a Python list or tuple, which serves as the fundamental data structure for numerical computations in NumPy.
np.zeros(): Create an array filled with zeros of a specified shape.
np.ones(): Create an array filled with ones of a specified shape.
np.random.rand(): Generate an array of random numbers from a uniform distribution over [0, 1] with a specified shape.
np.random.randn(): Generate an array of random numbers from a standard normal distribution (mean=0, variance=1) with a specified shape.
np.arange(): Create an array of evenly spaced values within a specified range.
np.linspace(): Create an array of evenly spaced values over a specified interval.
np.reshape(): Reshape an array into a new shape without changing its data.
np.concatenate(): Concatenate arrays along a specified axis.
np.dot(): Compute the dot product of two arrays, which is used in various mathematical operations such as matrix multiplication.
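For example:

import numpy as np

a = np.arange(6).reshape(2, 3)   # 2x3 array
b = np.ones((3, 2))              # 3x2 array of ones
np.dot(a, b)                     # 2x2 matrix product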

49
Q

Object

A

Objects are instances of classes, created using the class’s constructor method. A Car (class) has properties (color, make, model) and methods (accelerate, brake, turn). Classes support inheritance, allowing subclasses to inherit attributes and methods from their superclass. This enables hierarchical organization of code and facilitates code reuse, modularity, and maintainability. Class names usually start with a capital letter.

50
Q

Object Oriented Programming (OOP)

A

A programming paradigm centered around the concept of objects, which have data as well as behavior attributed to them. OOP focuses on the objects that developers want to manipulate rather than the logic required to manipulate them. This approach to programming is well-suited for programs that are large, complex and actively updated or maintained. The first step in OOP is to collect all of the objects a programmer wants to manipulate and identify how they relate to each other - an exercise known as data modeling.

OOP principles include encapsulation, inheritance, and polymorphism:
- Encapsulation hides the internal state of objects, ensuring data integrity.
- Inheritance allows classes to inherit attributes and methods from other classes, facilitating code reuse.
- Polymorphism enables objects of different classes to be treated uniformly, enhancing flexibility and extensibility in code design.

OOP is an alternative paradigm to Functional Programming (FP), Procedural Programming, Declarative Programming, Data-Oriented Programming

51
Q

Pandas

A

Pandas is a powerful data manipulation library in Python, built on top of NumPy. It offers data structures and functions designed for working with structured or tabular data, such as data frames. pandas provides tools for reading and writing data from various file formats, cleaning and transforming data, and performing complex data analysis tasks.

52
Q

PEP8 & coding best practices

A

PEP8 (Python Enhancement Proposal 8) is a style guide for Python code written by Guido van Rossum, Barry Warsaw, and Nick Coghlan. It outlines coding conventions and best practices to ensure consistency and readability in Python code. PEP8 covers aspects such as naming conventions, indentation, spacing, line length, imports, comments, and programming style. Adhering to PEP8 guidelines improves code maintainability, facilitates collaboration among developers, and enhances code readability.

Naming conventions:
- Use descriptive names for variables, functions, classes, and modules.
- Use lowercase for variable names, separated by underscores (snake_case).
- Use CamelCase for class names.
- Avoid single-character names except for certain common variables like loop indices.

Indentation:
- Use 4 spaces for each level of indentation (no tabs).
- Maintain consistent indentation throughout the codebase.

Spacing:
- Use a space after commas, colons, and semicolons.
- Use a space before and after binary operators (e.g., +, -, *, /).
- Use blank lines to separate logical sections of code.
- Avoid extraneous whitespace at the end of lines.

Line length:
- Limit lines to a maximum of 79 characters to ensure readability without horizontal scrolling.
- For docstrings or comments, limit lines to a maximum of 72 characters.

Imports:
- Import modules individually rather than using wildcard imports (e.g., import module instead of from module import *).
- Group imports in the following order: standard library imports, third-party library imports, and local application/library imports.
- Separate import groups with a blank line.

Comments:
- Use comments to explain the purpose of code sections, particularly complex or non-obvious ones.
- Write comments in complete sentences with proper grammar and punctuation.
- Avoid redundant or obvious comments that merely repeat the code.

Programming style:
- Write code that is clear, concise, and readable.
- Follow consistent naming, indentation, and spacing conventions.
- Break long lines into multiple lines if necessary for readability.
- Use meaningful variable names and avoid abbreviations unless they are widely understood.

53
Q

Polymorphism (python)

A

A key feature of object-oriented programming languages like Python, allowing objects of different types to be treated as instances of a common interface or superclass. In Python, polymorphism is achieved through method overriding and method overloading. Method overriding allows subclasses to provide a specific implementation of a method defined in a superclass, enabling objects of different subclasses to invoke the same method name but execute different code. Method overloading, on the other hand, allows a single method name to be used for multiple implementations with different parameter signatures, providing flexibility in function invocation. Polymorphism enables code reuse, abstraction, and flexibility in Python programming, promoting modular and extensible software design.
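A minimal sketch:

class Dog:
    def speak(self):
        return "woof"

class Cat:
    def speak(self):
        return "meow"

for animal in (Dog(), Cat()):
    print(animal.speak())   # same call, different behavior per class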

54
Q

Primitive Data types

A

Basic building blocks of programming languages, representing simple values such as numbers, characters, and Boolean values. Common primitive data types include integers, floating-point numbers, characters, strings, and Boolean values. In Python, primitive data types are implemented as built-in classes, such as int, float, str, bool, etc.

55
Q

Pytest

A

Pytest is a popular Python testing framework that significantly simplifies the process of writing, organizing, and running unit tests. It simplifies writing and executing tests with intuitive syntax and powerful features like fixtures, parameterized testing, and robust assertion mechanisms. Pytest promotes good coding practices by facilitating the creation of comprehensive test suites, enabling developers to catch and fix bugs early in the development cycle.

PyTest is often used for performing Unit Tests.
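A minimal sketch (save as test_math.py and run pytest):

def add(a, b):
    return a + b

def test_add():   # pytest discovers functions named test_*
    assert add(2, 3) == 5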

56
Q

Python Package

A

A package is a collection of pre-written Python modules (files with .py extension) organized within a directory structure. Packages provide reusable code for tasks like data manipulation (NumPy), machine learning (Scikit-learn), web development (Django), and countless more. Think of them as toolboxes with specialized tools that extend Python’s core capabilities, saving you from having to code common functionalities from scratch.

57
Q

PyTorch

A

PyTorch, developed by Facebook’s AI Research lab, is a versatile open-source machine learning library widely used for deep learning applications. Its core data structure, tensors, enables efficient computation, particularly on GPUs. With its dynamic computation graph feature, PyTorch facilitates building complex neural networks with ease. The library’s autograd module automates gradient computation, crucial for training models via backpropagation. PyTorch offers a rich ecosystem for building, training, and deploying neural networks, providing modules for defining architectures, optimizing models, and deploying them in production environments. Its seamless integration with other Python libraries like NumPy and Pandas enhances its usability, making PyTorch a preferred choice for researchers and practitioners in the deep learning community.

58
Q

Range

A

In Python, the range() function is a workhorse for generating sequences of numbers. By default, it starts at 0, increments by 1, and stops before a specified endpoint, creating a half-open interval. You can customize this behavior by providing up to three arguments: the starting value, the ending value (not included), and the step size. This allows you to iterate through a specific range of numbers, counting up or down by any increment you choose. range() is particularly useful in for loops, where you can use it to loop a certain number of times or iterate over a specific set of numbers within your program’s logic.
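For example:

list(range(5))          # [0, 1, 2, 3, 4]
list(range(2, 10, 2))   # [2, 4, 6, 8] -- start, stop (excluded), step
list(range(5, 0, -1))   # [5, 4, 3, 2, 1] -- counting down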

59
Q

Record (Data frame)

A

In the context of data analysis and database management, a record refers to a collection of related data fields or attributes that describe a single entity or observation. A data frame is a two-dimensional tabular data structure, similar to a spreadsheet or database table, where each row represents a record and each column represents a data field or attribute. Data frames are commonly used in data analysis and manipulation tasks, and they are supported by libraries like pandas in Python.

60
Q

Scikit-Learn

A

Scikit-Learn stands out as a robust Python library for machine learning tasks, offering a comprehensive suite of tools for data preprocessing, supervised, and unsupervised learning. From simple data preprocessing tasks such as normalization and encoding to sophisticated algorithms like SVMs and decision trees, Scikit-Learn provides an accessible interface for practitioners at all levels. Its evaluation and selection tools enable efficient model comparison and hyperparameter tuning. With seamless integration with other Python libraries like Pandas and NumPy, Scikit-Learn streamlines the entire machine learning workflow, making it an indispensable tool for data scientists and machine learning engineers.
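A minimal sketch:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print(clf.score(X_test, y_test))   # accuracy on held-out data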

61
Q

Seaborn

A

Seaborn, built on top of matplotlib, serves as a powerful Python library for statistical data visualization. It offers a plethora of high-level functions for creating attractive and informative plots, ranging from statistical visualizations to categorical data representations. With its emphasis on aesthetics and information-rich plots, Seaborn simplifies the process of visualizing complex data relationships and patterns. Its integration with Pandas DataFrames streamlines data manipulation and visualization tasks, making it an invaluable tool for exploratory data analysis and presentation of results. Moreover, Seaborn’s flexibility in customization and theming allows users to tailor visualizations to their specific needs or preferences, further enhancing its utility in data analysis workflows.

62
Q

Sequential Data types

A

Sequential data types refer to data structures where elements are arranged in a specific order, allowing sequential access. These data types are characterized by their ability to store elements in a linear sequence and support operations such as indexing and iteration.

Lists:
Lists are ordered collections that can contain elements of different data types. Elements can be added, removed, or modified, and duplicate elements are allowed. Lists are mutable, meaning they can be changed after creation.
Example: my_list = [1, 2, 3, 'a', 'b', 'c']

Tuples:
Tuples are similar to lists but are immutable, meaning they cannot be changed after creation. They are typically used to store heterogeneous data, such as coordinates or records.
Example: my_tuple = (1, 2, 3, 'a', 'b', 'c')

Strings:
Strings are sequences of characters, and each character has a specific index. They are immutable, meaning individual characters cannot be changed. Strings support various operations such as slicing, concatenation, and repetition.
Example: my_string = "Hello, World!"

Ranges:
Ranges represent a sequence of numbers and are often used for looping a specific number of times in for loops. Ranges are immutable and support operations like slicing.
Example: my_range = range(0, 10)

Bytes and Bytearrays:
Bytes and bytearrays are used to represent sequences of bytes (integers between 0 and 255). Bytes are immutable, while bytearrays are mutable. They are commonly used for handling binary data.
Example: my_bytes = b'hello'

Arrays (from the array module):
Arrays are similar to lists but are constrained to contain elements of the same numeric data type. They are more memory-efficient than lists for certain operations.
Example: import array; my_array = array.array('i', [1, 2, 3, 4, 5])

63
Q

Sequential layers

A

Type of architecture where layers are arranged sequentially, with each layer feeding its output as input to the next layer. This linear stack of layers is a fundamental building block in many deep learning models. In frameworks like PyTorch, sequential layers can be easily constructed using containers such as torch.nn.Sequential, allowing for a concise and readable definition of the neural network architecture.

It’s like stacking Lego blocks—you arrange layers in a linear, ordered fashion. Data flows through this stack in a straightforward manner. The output of one layer becomes the input of the next. This creates a chain-like structure. When your problem involves a well-defined progression (e.g., image classification, text sentiment analysis), sequential models are highly suitable. Layers can be dense, convolutional or recurrent. Sequential models become less flexible when dealing with complex data flows, multiple inputs or outputs, or the need to share layers within the network.

An alternative to the sequential design is using non-linear structures, for example shared layers, skip connections, and loops within deep learning architectures. These can be built with the Keras Functional API or by subclassing PyTorch’s nn.Module.
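A minimal PyTorch sketch of a sequential stack:

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),   # each layer's output feeds the next
    nn.ReLU(),
    nn.Linear(128, 10),
)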

64
Q

Set (Programming)

A

Set is a collection data type that stores unique elements without any particular order. Sets are commonly used when the existence of an element is more important than the order or frequency of its occurrence. In Python, sets can be created using curly braces {} or the set() constructor function. Set operations such as union, intersection, difference, and subset checking are supported, making sets useful for tasks like removing duplicates or testing membership efficiently.
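For example:

a = {1, 2, 3}
b = {3, 4}
a | b    # union: {1, 2, 3, 4}
a & b    # intersection: {3}
a - b    # difference: {1, 2}
2 in a   # True -- efficient membership test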

65
Q

Squeezing and unsqueezing tensors

A
  • Squeezing: A tensor operation that removes dimensions of size 1 from a tensor.
  • Unsqueezing: A tensor operation that adds dimensions of size 1 to a tensor.

Squeezing tensors involves removing dimensions of size 1, effectively collapsing those dimensions and reducing the rank of the tensor. This operation is useful for eliminating redundant dimensions that do not carry meaningful information. This can be helpful when you need to perform operations that require tensors to have matching dimensions.
Similarly, unsqueezing adds dimensions where necessary, often used for broadcasting operations or aligning tensor shapes for concatenation.
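For example:

import torch

t = torch.zeros(1, 3, 1)
t.squeeze().shape      # torch.Size([3]) -- all size-1 dims removed
t.squeeze(0).shape     # torch.Size([3, 1]) -- only dim 0 removed
t.unsqueeze(0).shape   # torch.Size([1, 1, 3, 1]) -- new size-1 dim added at position 0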

66
Q

Stacking tensors

A

Combining multiple tensors along a new dimension to create a single tensor. It is a fundamental operation in data aggregation (particularly useful when you want to aggregate data from different tensors while maintaining their individual identities), especially in machine learning tasks like batch processing. For example, when training neural networks, stacking individual data samples into batches enables efficient parallel processing.

PyTorch provides the torch.stack() function for stacking tensors along a specified dimension.
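For example:

import torch

a = torch.tensor([1, 2])
b = torch.tensor([3, 4])
torch.stack([a, b]).shape    # torch.Size([2, 2]) -- new dimension 0
torch.stack([a, b], dim=1)   # tensor([[1, 3], [2, 4]])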

67
Q

String

A

A string is a sequence of characters, enclosed within either single quotes ' ' or double quotes " ". Strings are immutable in most programming languages, meaning that once created, their contents cannot be changed. Strings support various operations, such as concatenation, slicing, indexing, and formatting, making them versatile for representing text data in programs.
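For example:

s = "Hello, World!"
s[0]          # 'H' -- indexing
s[7:12]       # 'World' -- slicing
s + "!!"      # concatenation (creates a new string)
f"msg: {s}"   # formatting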

68
Q

TensorFlow

A

TensorFlow is an open-source machine learning framework developed by Google. It provides a comprehensive ecosystem of tools and libraries for building and deploying machine learning models, particularly deep learning models. TensorFlow allows for the creation of computational graphs, automatic differentiation, distributed computing, and integration with hardware accelerators like GPUs and TPUs.

69
Q

Torch Hub

A

A mechanism through which authors can publish a model on GitHub, with or without pretrained weights, and expose it through an interface that PyTorch understands. This makes loading a pretrained model from a third party as easy as loading a TorchVision model.
All it takes for an author to publish a model through the Torch Hub mechanism is to place a file named hubconf.py in the root directory of the GitHub repository. The file has a very simple structure:

dependencies = ['torch', 'math']

def some_entry_fn(*args, **kwargs):
    model = build_some_model(*args, **kwargs)
    return model

def another_entry_fn(*args, **kwargs):
    model = build_another_model(*args, **kwargs)
    return model
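Loading then looks like this (the repository and entry point names are illustrative):

import torch
# 'owner/repo' must contain a hubconf.py exposing some_entry_fn
model = torch.hub.load('owner/repo', 'some_entry_fn')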

70
Q

torch.nn

A

A fundamental module that provides the building blocks for constructing and training neural networks like common layers, architectural components. It offers a rich collection of classes and functions specifically designed for this purpose:

Layers: It contains pre-built layers like convolutional layers, linear layers, pooling layers, and activation functions. These layers are like the Lego bricks you can assemble to create your neural network architecture.

Modules: It provides the base class nn.Module from which you can define custom neural network architectures. This allows you to tailor networks to specific tasks by combining different layers and functionalities.

Utilities: It offers helper functions for common tasks like weight initialization, loss functions, and optimizers. These tools streamline the development and training process of your neural networks.

Overall, torch.nn acts as a comprehensive toolbox for building, training, and deploying neural networks in PyTorch
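A minimal custom module sketch:

import torch.nn as nn

class TinyNet(nn.Module):            # subclass the base Module
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)   # a pre-built layer

    def forward(self, x):            # defines the forward pass
        return self.fc(x)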

71
Q

torch.optim

A

Provides the tools and strategies to steer your neural network towards better performance during the learning process.

Optimizers: It houses a collection of optimization algorithms, each with its own strategy for finding the best weights to improve model performance. These include popular options like stochastic gradient descent (SGD), Adam, RMSprop, and many more.

Updating Weights: During training, torch.optim optimizers use the gradients calculated by autograd (see previous explanation) to iteratively adjust your model’s weights and biases. Each optimizer has its own update rule, which determines how aggressively or cautiously those adjustments are made.

Learning Rate and More: The optimizers also allow you to control hyperparameters like the learning rate (how big of a step to take on each update) and momentum (for smoothing out the updates).
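
A minimal sketch of one training step, using a toy linear model and random data:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.MSELoss()

x, y = torch.randn(8, 4), torch.randn(8, 1)
optimizer.zero_grad()          # clear gradients from the previous step
loss = criterion(model(x), y)
loss.backward()                # autograd computes the gradients
optimizer.step()               # the optimizer updates weights and biases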

72
Q

torch.utils.data

A

Module that provides utilities for loading raw data and turning it into batches of tensors. The torch.utils.data module is like a conveyor belt factory, making data handling and feeding into your neural network a smooth and structured process. Here’s what it does:

Datasets: It includes the Dataset class, which is an abstract framework for representing your data in a way PyTorch can understand. You can create custom datasets to load images, text, or any data structure needed for your task.

DataLoaders: The DataLoader class wraps your dataset, making it easy to handle batching (dividing data into smaller chunks), shuffling (important for training), and even applying data transformations on the fly.

Flexibility: This module gives you control over how data is loaded and processed. You can specify multi-worker loading for speed, pin data to memory, or customize how samples are organized and retrieved.

Think of it this way: torch.utils.data provides the building blocks and customizable tools to streamline the process of getting your data organized and ready for your hungry neural network to learn from.
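
A minimal sketch of a custom Dataset wrapped in a DataLoader:

import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):      # toy dataset of (x, x^2) pairs
    def __len__(self):
        return 100

    def __getitem__(self, idx):
        x = torch.tensor([float(idx)])
        return x, x ** 2

loader = DataLoader(SquaresDataset(), batch_size=16, shuffle=True)
for xb, yb in loader:               # each iteration yields a shuffled batch
    pass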

73
Q

Training mode in NN

A

State in which the network is actively learning from the training data. During training, the model’s parameters (weights and biases) are adjusted iteratively to minimize a predefined loss function. Various operations, such as dropout and batch normalization, behave differently depending on whether the model is in training or evaluation mode. For example, in training mode, dropout layers randomly drop units from the network to prevent overfitting, whereas in evaluation mode, all units are retained. Similarly, batch normalization layers compute batch statistics (mean and variance) during training but use learned parameters during evaluation. PyTorch and TensorFlow, among other deep learning frameworks, provide mechanisms to switch between training and evaluation modes using specific API calls (model.train() and model.eval() in PyTorch).
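
A minimal sketch of switching between the two modes in PyTorch:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.Linear(4, 2))
inputs = torch.randn(3, 4)

model.train()             # training mode: dropout is active
train_out = model(inputs)

model.eval()              # evaluation mode: dropout is disabled
with torch.no_grad():     # also skip gradient tracking during inference
    eval_out = model(inputs)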

74
Q

Tuple

A

A tuple is an ordered collection of elements, similar to a list, but immutable, meaning its contents cannot be modified after creation. Tuples are typically enclosed within parentheses () and can contain elements of different data types. They are often used to represent fixed-size collections of items, such as coordinates, records, or function arguments, and can be unpacked into individual variables.

my_tuple = ('Adam', 7)  # avoid naming it "tuple", which would shadow the built-in type

75
Q

Types of Datasets

A

Structured Data:
Organized: Data neatly fits into rows and columns, like tables.
Examples: Excel spreadsheets, SQL databases, CSV files.
Types:
Numerical: Numbers representing quantities, measurements (e.g., age, height, temperature).
Categorical: Data representing categories or groups (e.g., country, product type, color).

Unstructured Data:
No predefined format: Doesn’t follow a tidy structure.
Examples: Text documents, images, audio, video, social media data.

Semi-Structured Data:
Hybrid: A mix of structure and free-form elements.
Examples: XML, JSON, emails (have some structure but also free text).

Types Based on Properties
Time Series Data: Data where time is a key organizing dimension, with values tracked over time intervals (e.g., stock prices, weather patterns, website traffic).
Cross-Sectional Data: Data collected at a single point in time on multiple subjects (e.g., a survey on customer satisfaction, medical data on a group of patients).
Panel Data (Longitudinal): Combines time series and cross-sectional – data tracked on the same subjects over multiple time points (e.g., sales records across multiple stores over several years).
Geospatial Data: Data tied to geographic locations (e.g., maps, GPS coordinates, population density).

Types Based on Labels
Supervised Learning Datasets: Have a labeled target variable you want to predict (e.g., a dataset with customer information and past purchase history to predict who will buy in the future).
Unsupervised Learning Datasets: No labeled target variable – used for finding patterns and associations (e.g., customer data to identify clusters with similar behavior).
Semi-Supervised Datasets: A mixture of labeled and unlabeled data, often with large amounts of unlabeled data and a smaller set of labeled data.

76
Q

Unit test

A

A software testing methodology where the smallest testable parts of an application, called units, are individually and independently scrutinized for correct behavior. The primary goal is to ensure that each isolated unit of code functions precisely as the developer intended.

PyTest is a popular framework for writing unit tests in Python.
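
A minimal PyTest sketch (the add function is a made-up example):

# test_example.py -- run with: pytest
def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0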

77
Q

Unsqueeze

A

In PyTorch, the unsqueeze function is used to add a new dimension to a tensor at a specified position. This operation increases the tensor’s dimensionality by one. It is particularly useful when dealing with tensors in operations where the number of dimensions matters. For example, if you have a 1D tensor representing a row vector and need to convert it into a column vector, you can use unsqueeze to add a new dimension at position 1. The syntax for unsqueeze in PyTorch is torch.unsqueeze(input, dim).
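
For example:

import torch

row = torch.tensor([1, 2, 3])      # shape: (3,)
col = torch.unsqueeze(row, dim=1)  # shape: (3, 1) -- a column vector
batched = row.unsqueeze(0)         # shape: (1, 3) -- add a batch dimension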

78
Q

Yield

A

A keyword used in generator functions to produce a series of values lazily. When a generator function encounters yield, it temporarily suspends execution, yielding the specified value to the caller while preserving its state. Unlike return, which terminates the function, yield allows the function to resume execution from where it left off, generating values on-demand. This is particularly useful for processing large datasets or infinite sequences efficiently, as it conserves memory by generating values one at a time.
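
A minimal generator example:

def countdown(n):
    while n > 0:
        yield n   # pause here, hand n to the caller, resume on the next iteration
        n -= 1

for value in countdown(3):
    print(value)  # prints 3, 2, 1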

79
Q

What files do you need to run a pretrained model?

A

There isn’t a universal two-file requirement for running a pre-trained model. It depends on the framework (e.g., TensorFlow, PyTorch) and the way the model is saved. However, here are the two common scenarios:

Single File: In many cases, a pre-trained model can be packaged into a single file that contains all the necessary information, including the model architecture, weights, and potentially some configuration details. This file format can vary depending on the framework (e.g., .h5 for Keras, .pt for PyTorch).

Two Files (Model Architecture + Weights): Sometimes, pre-trained models might be stored in two separate files:

Model Architecture File: This file defines the structure of the neural network, specifying the layers, their connections, and activation functions. This could be a text file or a framework-specific format.
Weights File: This file contains the actual learned weights and biases associated with each layer in the model. These are the numerical values that the model uses for making predictions. The format of this file again depends on the framework.

Single .pth or .pt File:

This is the most popular way to store PyTorch models. It packages the model architecture and weights into a single, serialized file, typically with the .pth or .pt extension.
Saving: You can use torch.save(model.state_dict(), 'model.pth') to save the model's state dictionary (the weights and biases).
Loading: You'd use model.load_state_dict(torch.load('model.pth')) to load the weights back into your model's architecture.
Separate Files (less common):

Architecture File (.json, .yaml, etc.): This file defines the model structure using a text-based format like JSON or YAML.
Weights File (.pth, .pt, etc.): This file contains the weights and biases as a serialized PyTorch tensor.
Why the single file approach is dominant:

Convenience: It’s simple to manage and share a single file.
Built-in support: PyTorch’s torch.save and torch.load functions make saving and loading easy with this format.
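
A minimal sketch of the single-file workflow, using a torchvision model as the example architecture:

import torch
from torchvision import models

model = models.resnet18()
torch.save(model.state_dict(), 'model.pth')     # weights only, single file

model = models.resnet18()                       # rebuild the architecture first
model.load_state_dict(torch.load('model.pth'))  # then load the weights into it
model.eval()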

80
Q

Encapsulation (python)

A

A fundamental principle of object-oriented programming. In Python, you bundle data (variables) and the actions you can perform on that data (methods) together inside a class. Encapsulating allows the internal state of an object to be hidden from the outside world and accessed only through well-defined interfaces (methods). You control how the data can be accessed or changed from outside the class. This prevents accidental messing up of the internal workings of your objects. Also, encapsulation keeps your code tidy and organized. Everything related to a specific concept is neatly packaged together.
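
A minimal sketch, using a made-up BankAccount class:

class BankAccount:
    def __init__(self, balance=0):
        self._balance = balance       # internal state, hidden by convention

    def deposit(self, amount):        # well-defined interface to change state
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    @property
    def balance(self):                # read-only access from outside
        return self._balance

account = BankAccount()
account.deposit(100)
print(account.balance)  # 100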

81
Q

Filtering DF (code)

A

import pandas as pd

# Create sample DataFrame
data = {'Fruit': ['Apple', 'Banana', 'Orange', 'Apple', 'Grape'],
        'Color': ['Red', 'Yellow', 'Orange', 'Red', 'Purple'],
        'Price': [1.2, 0.8, 1.5, 1.0, 0.75]}
df = pd.DataFrame(data)

# Method 1: Boolean indexing (filtering by a single value)
filtered_df = df[df['Fruit'] == 'Apple']
print(filtered_df)

# Method 2: Using loc for row filtering plus column selection
filtered_df = df.loc[df['Color'] == 'Red', ['Fruit', 'Price']]  # select specific columns
print(filtered_df)

# Method 3: Using isin() for multiple values
target_fruits = ['Apple', 'Orange']
filtered_df = df[df['Fruit'].isin(target_fruits)]
print(filtered_df)

# Method 4: Using query() for complex conditions
filtered_df = df.query('Price > 1 and Fruit == "Apple"')
print(filtered_df)

82
Q

How can we use a pre-trained model? (If we train it only on our small sample, how does it capture enough weight from that input?)

A

Initial Knowledge: Pre-trained models have been trained on massive datasets, learning general patterns and features relevant to a wide range of tasks (e.g., image recognition, language understanding).
Transferable Features: The lower layers of the network often learn features that are broadly useful (edges, textures in images; common grammatical structures in language). These don’t need to change as much for your specific dataset.

The pre-trained model then only needs to learn the higher-level, task-specific features present in your dataset. In practice, this means the earlier (lower) layers are frozen and only the layers at the end of the network are adjusted.

Feature extraction:
Freeze most of the pre-trained model’s layers.
Use the pre-trained model as a feature extractor, feeding its outputs into a new, smaller classifier on top that you train specifically on your data.

Gradual Unfreezing:
Train on your dataset, then gradually unfreeze a few layers at a time, starting from the top of the pre-trained model, allowing those layers to adjust to your data while preserving previously learned knowledge.

A pre-trained model avoids starting from scratch. This helps prevent overfitting (memorizing your small dataset too closely) and leads to better generalization. The pre-trained weights act as a form of regularization, guiding the model towards patterns that are generally useful.
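
A minimal feature-extraction sketch in PyTorch, assuming a recent torchvision and a hypothetical 10-class task:

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights='DEFAULT')      # load pre-trained weights
for param in model.parameters():                # freeze all pre-trained layers
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # new trainable head for 10 classes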

83
Q

Procedural Programming

A

A programming paradigm centered on breaking a program down into smaller procedures or routines, each designed to perform a specific task. Here’s what you need to know:

Procedure-Centric Approach: Focuses on writing procedures or functions that manipulate data rather than on the data itself. Procedures encapsulate sets of instructions for specific tasks.
Sequential Execution: Procedures are executed sequentially, with control flowing from one procedure to the next. Control flow is managed using conditional statements and looping constructs.
Data and Procedures Separation: Data and procedures are typically separated, with procedures acting on data stored in variables or data structures. Procedures accept input parameters and return output values.
Procedural Abstraction: Emphasizes hiding implementation details within procedures, promoting code reuse, readability, and maintainability.
Examples: Languages like C, Pascal, and Fortran are classic examples of procedural programming languages, where functions are used to define procedures that manipulate data.
Benefits: Procedural programming offers simplicity, ease of understanding, code reusability, and straightforward debugging and maintenance.
Limitations: May lead to code duplication, complexity management challenges for large programs, and limited encapsulation compared to object-oriented programming.

In summary, procedural programming is a programming paradigm focused on modularizing code into procedures or routines, prioritizing simplicity and directness. Understanding its principles and characteristics is essential for effective program design and development.
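
A tiny procedural sketch in Python, with data kept separate from the procedures that act on it:

def compute_total(prices):        # procedure acting on data passed in
    total = 0
    for price in prices:
        total += price
    return total

def apply_discount(total, rate):  # another self-contained procedure
    return total * (1 - rate)

prices = [1.2, 0.8, 1.5]          # data lives outside the procedures
print(apply_discount(compute_total(prices), rate=0.1))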

84
Q

SciPy (library)

A

SciPy is a scientific computing library in Python, built on top of NumPy. It provides a wide range of mathematical functions and numerical algorithms for optimization, integration, interpolation, linear algebra, signal processing, statistics, and more. SciPy is an essential tool for scientific computing, data analysis, and numerical simulations in various fields such as physics, engineering, biology, and finance.
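
Two small examples of what the library covers (integration and optimization):

from scipy import integrate, optimize

# Numerically integrate x^2 over [0, 1] -> approximately 1/3
area, error = integrate.quad(lambda x: x**2, 0, 1)

# Find the minimum of (x - 2)^2, starting the search at x = 0
result = optimize.minimize(lambda x: (x - 2)**2, x0=0)
print(area, result.x)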

85
Q

Statsmodels (library)

A

Python library for estimating statistical models and conducting statistical tests. It provides a wide range of functionality for regression analysis, time series analysis, hypothesis testing, and statistical modeling. Statsmodels includes classes and functions for fitting various types of models, such as linear regression, generalized linear models (GLM), mixed effects models, and time series models. It also offers capabilities for model diagnostics, parameter estimation, confidence intervals, hypothesis tests, and statistical inference. Statsmodels is widely used in academia, research, and industry for analyzing data, testing hypotheses, and building predictive models.
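
A minimal OLS regression sketch with synthetic data:

import numpy as np
import statsmodels.api as sm

x = np.arange(100, dtype=float)
y = 2 * x + 1 + np.random.normal(size=100)  # synthetic linear data with noise

X = sm.add_constant(x)       # add an intercept column
model = sm.OLS(y, X).fit()   # fit ordinary least squares
print(model.summary())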