Advanced Python Flashcards

1
Q

How is memory managed in Python?

A
  • The OS carries out (or denies) requests to read and write memory. It also creates a virtual memory layer that applications (including Python) can access
  • The default Python implementation, CPython, handles memory management for Python code
  • Each Python object (everything is an object) has a C PyObject, with a reference count, used for garbage collection, and a pointer to the actual object in memory
  • Python’s Global Interpreter Lock (GIL) locks the interpreter when a thread is interacting with the shared memory resource
  • An object’s reference count increments when a new reference to it is created (e.g. it is assigned to another variable) and decrements when a reference is removed. If the count drops to 0, the object’s deallocation function is called, which “frees” the memory so that other objects can use it (this is garbage collection, though really it just makes the memory available again).
  • Python uses a portion of the memory for internal use and a portion as a private heap space for object storage
  • Python’s memory manager carves the private heap into arenas, divides each arena into fixed-size pools, and divides each pool into blocks of a single size class
  • A usedpools list tracks, for each size class, the pools that still have space available. When a block of a given size is requested, the allocator checks usedpools for the list of pools serving that size class
  • Pools keep a pointer to their “free” (available for reuse) blocks of memory. As the memory manager frees blocks, they are added to the front of the pool’s freeblock list, and freed blocks are reused before untouched ones
  • Arenas are instead organized into a list called usable_arenas, sorted by the number of free pools available. The fewer free pools an arena has, the closer it sits to the front of the list, so the arenas most full of data are selected first to place new data into.
  • Arenas are the only units that can truly be freed back to the OS (instead of merely overwritten). So arenas that are close to empty should be allowed to become empty, reducing the overall memory footprint of the Python program.
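To make the reference-counting bullet concrete, here is a small CPython-specific sketch using sys.getrefcount (the variable names are illustrative):

```python
import sys

# Every object carries a reference count; sys.getrefcount reports it.
# The reported count is one higher than you might expect, because passing
# the object to getrefcount itself creates a temporary reference.
a = [1, 2, 3]
base = sys.getrefcount(a)

b = a                          # new reference -> count increments
after_assign = sys.getrefcount(a)

del b                          # reference removed -> count decrements
after_del = sys.getrefcount(a)

print(base, after_assign, after_del)
```

Note this behavior is specific to CPython; other implementations may manage memory differently.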
2
Q

How does garbage collection work in Python?

A
  • Reference Counting: Python primarily uses reference counting for garbage collection. When an object’s reference count drops to zero, the object is no longer accessible and its memory can be reclaimed. For example, variables declared inside a function have local scope: they are created when the function is called and destroyed when the function exits, freeing the memory
  • Cycle Detector: Python’s garbage collector includes a cycle detector to identify and clean up reference cycles (groups of objects that reference each other, creating a cycle) that wouldn’t be collected by reference counting alone.
  • Generational Garbage Collection: Python uses a generational approach to garbage collection, dividing objects into three generations (young, middle-aged, and old). New objects start in the youngest generation, and objects that survive multiple garbage collection rounds are promoted to older generations.
  • Thresholds and Tuning: The garbage collector is triggered when the number of objects in a generation exceeds a certain threshold. These thresholds can be manually tuned to optimize garbage collection performance for specific applications.
  • Manual Control: Python provides functions like gc.collect() to manually trigger garbage collection, and gc.set_debug() to debug garbage collection behavior. However, in most cases, the automatic garbage collection process is sufficient.
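The cycle-detector bullet can be demonstrated directly with the gc module (the Node class here is just an illustration):

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

# Build a reference cycle: each object references the other, so their
# reference counts never drop to zero even after the names are deleted.
a, b = Node(), Node()
a.partner, b.partner = b, a
del a, b

# The cycle detector finds and frees the unreachable pair;
# gc.collect() returns the number of unreachable objects it found.
unreachable = gc.collect()
print(unreachable)
```

Reference counting alone would never reclaim these two objects, which is exactly why the cycle detector exists.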
3
Q

What are Python namespaces? Why are they used?

A

A namespace in Python ensures that object names in a program are unique and can be used without any conflict. Python implements these namespaces as dictionaries with ‘name as key’ mapped to a corresponding ‘object as value’. This allows for multiple namespaces to use the same name and map it to a separate object. A few examples of namespaces are as follows:

  • Local Namespace includes local names inside a function. The namespace is created temporarily for a function call and cleared when the function returns.
  • Global Namespace includes the module-level names of the current script, including the names of imported packages/modules being used. It is created when the module is loaded and lasts until the script finishes executing.
  • Built-in Namespace includes built-in functions of core Python and built-in names for various types of exceptions.

The lifecycle of a namespace depends upon the scope of objects they are mapped to. If the scope of an object ends, the lifecycle of that namespace comes to an end. Hence, it isn’t possible to access inner namespace objects from an outer namespace.
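Since namespaces really are dictionaries, they can be inspected directly with globals() and locals() (the names x and func are illustrative):

```python
# globals() and locals() expose the namespace dictionaries directly,
# mapping names (keys) to objects (values).
x = "global value"

def func():
    x = "local value"          # same name, different namespace
    return locals()["x"]

local_x = func()
global_x = globals()["x"]
print(local_x, global_x)
```

The same name x maps to two different objects because it lives in two different namespaces.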

4
Q

What is Scope Resolution in Python?

A

Sometimes objects within the same scope have the same name but function differently. In such cases, scope resolution comes into play in Python automatically. A few examples of such behavior are:

The Python modules ‘math’ and ‘cmath’ have a lot of functions in common - log10(), acos(), exp(), etc. To resolve this ambiguity, it is necessary to prefix them with their respective module name, like math.exp() and cmath.exp().
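For instance, both modules define exp(), and the module prefix resolves which one is meant:

```python
import math
import cmath

# math.exp works on real numbers; cmath.exp works on complex numbers.
print(math.exp(0))    # 1.0
print(cmath.exp(0))   # (1+0j)
```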

5
Q

What are decorators?

A

Decorators in Python are essentially functions that add functionality to an existing function without changing the structure of the function itself. They are applied with the @decorator_name syntax, and when stacked they are applied in a bottom-up fashion.

The beauty of decorators lies in the fact that besides adding functionality to the output of the method, they can even accept arguments for functions and can further modify those arguments before passing them to the function itself. The inner nested function, i.e. the ‘wrapper’ function, plays a significant role here. It is implemented to enforce encapsulation and thus keep itself hidden from the global scope.
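A minimal sketch of the wrapper pattern (shout and greet are hypothetical names):

```python
import functools

# A decorator wraps an existing function in an inner "wrapper" function,
# adding behavior without touching the original function's code.
def shout(func):
    @functools.wraps(func)          # preserve the wrapped function's metadata
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout                               # equivalent to: greet = shout(greet)
def greet(name):
    return f"hello, {name}"

result = greet("world")
print(result)  # HELLO, WORLD
```

Note how wrapper receives the arguments first and could modify them before passing them on, exactly as described above.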

6
Q

What are Dict and List comprehensions?

A

Python comprehensions are syntactic sugar constructs that help build altered and filtered lists, dictionaries, or sets from a given list, dictionary, or set. Using comprehensions saves a lot of time and code that might be considerably more verbose. Examples:

  • Performing mathematical operations on the entire list
  • Performing conditional filtering operations on the entire list
  • Combining multiple lists into one
  • Flattening a multi-dimensional list
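The four bulleted use cases can each be sketched in one line (all variable names are illustrative):

```python
nums = [1, 2, 3, 4, 5]

squares = [n * n for n in nums]                  # math on the whole list
evens = [n for n in nums if n % 2 == 0]          # conditional filtering
pairs = [(a, b) for a in [1, 2] for b in "xy"]   # combining two sequences
matrix = [[1, 2], [3, 4]]
flat = [n for row in matrix for n in row]        # flattening a 2-D list
lengths = {w: len(w) for w in ["hi", "hello"]}   # dict comprehension

print(squares, evens, flat, lengths)
```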
7
Q

What is lambda in Python? Why is it used?

A

A lambda is an anonymous function in Python that can accept any number of arguments but can only contain a single expression. It is generally used in situations requiring an anonymous function for a short time period. Lambda functions can be used in either of two ways:

  1. Assigning lambda functions to a variable, e.g.:

    mul = lambda a, b: a * b
    print(mul(2, 5))  # output => 10

  2. Wrapping lambda functions inside another function:

    def myWrapper(n):
        return lambda a: a * n

    mulFive = myWrapper(5)
    print(mulFive(2))  # output => 10

8
Q

How do you copy an object in Python?

A

In Python, the assignment statement (= operator) does not copy objects. Instead, it creates a binding between the existing object and the target variable name. To create copies of an object in Python, we need to use the copy module. Moreover, there are two ways of creating copies for the given object using the copy module -

Shallow Copy creates a new object whose top-level container is a duplicate of the original, but the values inside are not duplicated: if a value is a reference to another object, only the reference address is copied, so the nested objects are shared between the two copies.

Deep Copy copies all values recursively from source to target object, i.e. it even duplicates the objects referenced by the source object.
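The difference shows up when a nested object is mutated after copying:

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)       # new outer list, shared inner lists
deep = copy.deepcopy(original)      # inner lists duplicated recursively

original[0].append(99)

print(shallow[0])  # [1, 2, 99] -- shallow copy sees the mutation
print(deep[0])     # [1, 2]     -- deep copy is unaffected
```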

9
Q

How are arguments passed in python - by value or by reference?

A

In Python, arguments are passed by assignment (often called “pass by object reference”): the parameter name inside the function is bound to the same object the caller passed. Rebinding the parameter does not affect the caller, but mutating a mutable argument does.

  • Pass by value: a copy of the actual object is passed. Changing the copy does not change the original object.
  • Pass by reference: a reference to the actual object is passed. Changing the value through the new name changes the original object.

Python matches neither model exactly: mutating a mutable argument behaves like pass by reference, while rebinding the parameter behaves like pass by value.
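A short sketch of the two behaviors (rebind and mutate are hypothetical names):

```python
def rebind(lst):
    lst = [0]           # rebinds the local name only; caller is unaffected

def mutate(lst):
    lst.append(0)       # mutates the shared object; caller sees the change

data = [1, 2]
rebind(data)
after_rebind = list(data)   # [1, 2]
mutate(data)
after_mutate = list(data)   # [1, 2, 0]
print(after_rebind, after_mutate)
```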
10
Q

What is pickling and unpickling?

A

Python’s standard library offers serialization out of the box. Serializing an object means transforming it into a format that can be stored, so that it can later be deserialized to recreate the original object. This is where the pickle module comes into play.

Pickling:

Pickling is the name of the serialization process in Python. Any object in Python can be serialized into a byte stream and written out to a file. The pickled representation is compact, but pickle objects can be compressed further. Moreover, pickle keeps track of the objects it has already serialized (so shared and recursive references are preserved), and pickled data is portable across Python versions.

The function used for the above process is pickle.dump().

Unpickling:

Unpickling is the complete inverse of pickling. It deserializes the byte stream to recreate the objects stored in the file and loads the object to memory.
The function used for the above process is pickle.load().

Note: Python has another, more primitive, serialization module called marshal, which exists primarily to support .pyc files and differs significantly from pickle.
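A round trip through pickle.dumps()/pickle.loads() (the in-memory counterparts of dump() and load()) illustrates both halves:

```python
import pickle

data = {"name": "Ada", "scores": [95, 87]}

# Pickling: serialize the object into a byte stream...
blob = pickle.dumps(data)

# ...and unpickling: recreate an equal object from those bytes.
restored = pickle.loads(blob)
print(restored == data, restored is data)
```

dump() and load() work the same way but read and write file objects directly.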

11
Q

What are generators in Python?

A

Generators are functions that produce a sequence of values lazily, one at a time. They offer a different approach to creating iterators: they use the yield keyword rather than return, and calling them returns a generator object.

Instead of computing all the values upfront and storing them in memory, it generates them on-the-fly using a function and the yield keyword.

  • Memory Efficiency: Since generators generate values on-the-fly, they can represent large sequences of data without consuming memory for all the items at once. This is particularly useful when working with large datasets or streams of data.
  • Lazy Evaluation: Generators compute values lazily, meaning they produce values one at a time and only when requested. This can lead to performance improvements in scenarios where not all values are needed.
  • Infinite Sequences: Generators can represent infinite sequences (e.g. an endless counter), since values are produced only on demand.

# generate Fibonacci numbers up to n
def fib(n):
    p, q = 0, 1
    while p < n:
        yield p
        p, q = q, p + q

x = fib(10)  # create a generator object

next(x)  # output => 0
next(x)  # output => 1
next(x)  # output => 1
next(x)  # output => 2
next(x)  # output => 3
next(x)  # output => 5
next(x)  # output => 8
next(x)  # raises StopIteration

for i in fib(10):
    print(i)  # output => 0 1 1 2 3 5 8

12
Q

What are the benefits of generator functions?

A
  • Memory Efficiency: Generators yield one item at a time, making them more memory-efficient than functions that return a list with all the output values, especially for large data sets.
  • Lazy Evaluation: Generators produce values only when requested, allowing you to start using the results immediately without waiting for the entire result set to be generated, leading to better performance.
  • Simplicity and Modularity: Generators can simplify code by eliminating the need for temporary variables and complex loops, and they can be easily chained together to create modular data processing pipelines.
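The modular-pipeline point can be sketched by chaining three small generators (all names are illustrative):

```python
# Generators chain into a lazy pipeline: each stage pulls one item at a
# time from the previous stage, so no intermediate lists are built.
def numbers(n):
    for i in range(n):
        yield i

def squared(items):
    for x in items:
        yield x * x

def only_even(items):
    for x in items:
        if x % 2 == 0:
            yield x

pipeline = only_even(squared(numbers(10)))
result = list(pipeline)
print(result)  # [0, 4, 16, 36, 64]
```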
13
Q

What is PYTHONPATH in Python?

A

PYTHONPATH is an environment variable which you can set to add additional directories where Python will look for modules and packages. This is especially useful in maintaining Python libraries that you do not wish to install in the global default location.
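A quick shell sketch (the ~/mylibs directory is just an example path):

```shell
# Prepend a hypothetical directory of local libraries to the module search path
export PYTHONPATH="$HOME/mylibs:$PYTHONPATH"

# The directories appear in sys.path, which Python searches on import
python3 -c 'import sys; print(sys.path)'
```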

14
Q

What is the use of help() and dir() functions?

A

help() function in Python is used to display the documentation of modules, classes, functions, keywords, etc. If no parameter is passed to the help() function, then an interactive help utility is launched on the console.
dir() function tries to return a valid list of attributes and methods of the object it is called upon. It behaves differently with different objects, as it aims to produce the most relevant data, rather than the complete information.

  • For Modules/Library objects, it returns a list of all attributes, contained in that module.
  • For Class Objects, it returns a list of all valid attributes and base attributes.
  • With no arguments passed, it returns a list of attributes in the current scope.
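A short demonstration of dir() in each of the three cases, using the math module as the example:

```python
import math

attrs = dir(math)                  # all attributes of the math module
print("sqrt" in attrs, "pi" in attrs)

current = dir()                    # no argument: names in the current scope
print("attrs" in current)

# help(math.sqrt) would launch the interactive documentation display;
# the same docstring is available programmatically:
print(math.sqrt.__doc__)
```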
15
Q

What is the difference between .py and .pyc files?

A

.py files contain the source code of a program, whereas .pyc files contain its bytecode, produced by compiling the .py source. .pyc files are not created for every file you run - only for the modules you import (they are cached in the __pycache__ directory).

Before executing a program, the Python interpreter checks for an up-to-date compiled file. If one is present, the virtual machine executes it. If not, the interpreter compiles the .py file to a .pyc file and then the Python virtual machine executes that.

Having the .pyc file saves compilation time.

16
Q

How is Python interpreted?

A

Python as a language is neither interpreted nor compiled; interpreted or compiled is a property of the implementation, not the language. In practice, Python is generally bytecode-interpreted (bytecode being a set of interpreter-readable instructions).

Source code is a file with .py extension.

Python compiles the source code to a set of instructions for a virtual machine. The Python interpreter is an implementation of that virtual machine. This intermediate format is called “bytecode”.

.py source code is first compiled to .pyc bytecode. This bytecode can then be interpreted by the official CPython interpreter, or JIT (just-in-time) compiled by PyPy.

17
Q

What are iterators in Python?

A
  • An iterator is an object.
  • It remembers its state, i.e., where it is during iteration.
  • The __iter__() method initializes an iterator.
  • The __next__() method returns the next item in the iteration and advances to the next element. Upon reaching the end of the iterable, __next__() must raise a StopIteration exception.
  • An iterator is also self-iterable (its __iter__() returns itself).
  • Iterators are the objects with which we iterate over iterable objects like lists, strings, etc.
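The protocol above can be sketched with a minimal hand-written iterator (Countdown is a hypothetical class):

```python
# A minimal iterator: __iter__ returns the object itself, and __next__
# returns items until it raises StopIteration.
class Countdown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self              # iterators are self-iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

result = list(Countdown(3))
print(result)  # [3, 2, 1]
```

Because it implements both methods, list() and for loops can consume it directly.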
18
Q

Explain split() and join() functions in Python?

A
  • You can use split() function to split a string based on a delimiter to a list of strings.
  • You can use join() function to join a list of strings based on a delimiter to give a single string.

string = "This is a string."
string_list = string.split(' ')  # delimiter is the 'space' character, ' '
print(string_list)  # output: ['This', 'is', 'a', 'string.']
print(' '.join(string_list))  # output: This is a string.

19
Q

What do *args and **kwargs mean?

A

*args

  • *args is a special syntax used in the function definition to pass variable-length arguments.
  • “*” means variable length and “args” is the name used by convention. You can use any other name.

def multiply(a, b, *argv):
    mul = a * b
    for num in argv:
        mul *= num
    return mul

print(multiply(1, 2, 3, 4, 5))  # output: 120

**kwargs

  • **kwargs is a special syntax used in the function definition to pass variable-length keyworded arguments.
  • Here, also, “kwargs” is used just by convention. You can use any other name.
  • Keyworded argument means a variable that has a name when passed to a function.
  • Inside the function, kwargs is actually a dictionary mapping the argument names to their values.

def tellArguments(**kwargs):
    for key, value in kwargs.items():
        print(key + ": " + value)

tellArguments(arg1="argument 1", arg2="argument 2", arg3="argument 3")
# output:
# arg1: argument 1
# arg2: argument 2
# arg3: argument 3

20
Q

What is Concurrency? What is the difference between Concurrency and Parallelism?

A

Concurrency is a concept where several tasks are executed in overlapping time periods. It doesn’t necessarily mean tasks are executed simultaneously but can be interleaved or executed in parallel depending on the system.

Concurrency is about dealing with multiple tasks at once (not necessarily simultaneously), while Parallelism is about executing multiple tasks or processes simultaneously, often achieved by using multiple CPU cores.

21
Q

What is a Thread? What is Multithreading?

A

A thread is the smallest unit of execution within a process. It shares the same memory space as other threads within the same process but executes independently.

Multithreading is a type of concurrency where multiple threads run in the same process. It allows for efficient use of CPU time when one thread is waiting for resources (e.g., I/O operations).
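A minimal sketch of threads sharing their process's memory (worker and results are illustrative names):

```python
import threading

results = []

def worker(name):
    # Each thread runs independently, but all of them append to the same
    # list, because threads share their process's memory space.
    results.append(name)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 2, 3]
```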

22
Q

What is the Global Interpreter Lock (GIL) and how does it affect multithreading in Python?

A
  • The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes simultaneously in a single process.
  • The GIL was introduced to simplify memory management and prevent data corruption due to multiple threads accessing Python objects simultaneously.
  • Limitation on Concurrency: The GIL limits the performance benefits of multithreading in CPU-bound Python programs, as only one thread can execute Python code at a time.
  • I/O-bound Tasks: For I/O-bound tasks, the GIL can be released during I/O operations, allowing other threads to run and making multithreading more effective.
  • Alternatives: To achieve true parallelism in Python, consider using multiprocessing (multiple processes) or other concurrency models like asyncio (asynchronous I/O) that are not affected by the GIL.
  • CPython Specific: The GIL is specific to the CPython implementation of Python. Other implementations like Jython or IronPython do not have a GIL.
23
Q

What are the advantages and disadvantages of using threads?

A

Advantages: Efficient use of CPU time, shared memory space, responsiveness in I/O-bound programs.
Disadvantages: Complexity in managing threads, potential for race conditions and deadlocks, limited by GIL in Python.

24
Q

What are some alternatives to threads for achieving concurrency in Python?

A

Alternatives include multiprocessing (using multiple processes), asyncio (asynchronous I/O), and using external services or libraries like Celery for task queues.

  • asyncio: A Python standard library module that provides a framework for writing concurrent, asynchronous code using the async and await syntax, primarily for I/O-bound tasks.
  • Celery: A distributed task queue system for Python that allows for executing tasks asynchronously, potentially across multiple machines, suitable for both I/O-bound and CPU-bound tasks.
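A small asyncio sketch of concurrent I/O waits in a single thread (fetch here just simulates I/O with asyncio.sleep):

```python
import asyncio

async def fetch(name, delay):
    # await suspends this coroutine, letting other coroutines run meanwhile
    await asyncio.sleep(delay)
    return name

async def main():
    # Both simulated "I/O waits" overlap instead of running back to back.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

results = asyncio.run(main())
print(results)  # ['a', 'b'] -- gather preserves argument order
```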
25
Q

What is a Race Condition and what is a deadlock? How can you prevent race conditions and deadlocks in multithreaded programs?

A
  • A race condition occurs when two or more threads access shared data and try to change it simultaneously, leading to unpredictable and incorrect results.
  • A deadlock occurs when two or more threads are waiting for each other to release resources, causing them to be stuck indefinitely.
  • Use synchronization mechanisms like locks, semaphores, or conditions to control access to shared resources.
  • Follow best practices like acquiring locks in a consistent order, using timeouts, or using higher-level constructs like Python’s threading.Lock or threading.Semaphore.
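A sketch of preventing a classic race condition with threading.Lock (the counter increment is the textbook example):

```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:           # without the lock, the read-modify-write of
            counter += 1     # `counter` could interleave between threads

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000 -- always correct while the lock is held
```

Acquiring locks via `with` also releases them automatically, which helps avoid the held-forever locks that lead to deadlocks.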
26
Q

Does python run with a single thread by default?

A

The operating system is capable of handling multiple processes concurrently. It allocates a separate memory space to each process, so that one process cannot read or write another’s space. A thread, on the other hand, can be thought of as a lightweight sub-process within a single program. The threads of a single program share the memory space allocated to it.

Python runs with a single thread by default. When you execute a Python script, it runs in a single process with a single thread of execution. However, Python provides mechanisms to create and manage multiple threads using the threading module.

It’s important to note that due to the Global Interpreter Lock (GIL) in CPython (the standard and most widely-used implementation of Python), even when multiple threads are used, only one thread can execute Python bytecode at a time in a single process. This means that while multithreading in Python can be beneficial for I/O-bound tasks, it may not provide a significant performance boost for CPU-bound tasks. For true parallel execution in such cases, one might consider using the multiprocessing module, which creates separate processes and can fully utilize multiple CPU cores.

27
Q

Discuss logging in python

A

Key Elements of Python Logging:

  1. Log Levels: Python’s logging library provides different severity levels for logs, such as DEBUG, INFO, WARNING, ERROR, and CRITICAL, allowing you to categorize the importance of log messages.
  2. Loggers: The logger is the main entry point into the logging system. You can create different loggers for various parts of an application to control logging behavior at a granular level.
  3. Handlers: Handlers determine where the log messages will be output, such as to the console, a file, or over a network. Multiple handlers can be attached to a single logger.
  4. Formatters: Formatters specify the layout and content of log messages, allowing you to include information like timestamps, log levels, and message content.
  5. Filters: Filters provide finer-grained control over which log records to output, based on conditions other than log levels.
  6. Propagation: Log messages in Python propagate from the logger that generated them up to the root logger, passing through all ancestor loggers in the logger hierarchy.
  7. Configuration: Logging behavior can be configured programmatically using Python code, or externally using configuration files or environment variables.
  8. Thread-Safety: Python’s logging library is thread-safe, meaning it can be used without issue in multi-threaded applications.
  9. Exception Logging: The logging library provides special functions like exception() to log exception information along with stack traces, which is useful for debugging.
  10. Customization: Python logging is highly customizable, allowing you to create custom log levels, handlers, and formatters to suit your specific needs.
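Elements 1-4 and 9 can be sketched together in a few lines ("myapp" is a hypothetical logger name):

```python
import logging

# One logger per component is the usual pattern; the handler and formatter
# control where messages go and what they look like.
logger = logging.getLogger("myapp")
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler()        # write log records to the console
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

logger.debug("low-level detail")
logger.warning("something looks off")
try:
    1 / 0
except ZeroDivisionError:
    logger.exception("division failed")  # logs the traceback as well
```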
28
Q

How is python run?

A
  1. Source Code Parsing and Compilation: Python source code is parsed into an Abstract Syntax Tree (AST) and then compiled into bytecode, a lower-level, platform-independent representation.
  2. Python Virtual Machine (PVM): The compiled bytecode is executed on Python’s built-in Virtual Machine, which interprets the bytecode instructions one by one.
  3. Dynamic Typing: Variables and data types are determined at runtime, with all data being represented as objects that have type, value, and identity.
  4. Standard Libraries and Native Modules: Python includes a rich set of pre-compiled libraries and modules for various tasks, written in C for performance.
  5. Memory Management and Extensibility: Python handles memory allocation and garbage collection automatically, and it can be extended with C and C++ code for optimization.

While Python is an interpreted programming language, Python code actually gets compiled down to more computer-readable instructions called bytecode when you run a program. These instructions are then interpreted by a virtual machine. The cached results of this compilation are the .pyc files and __pycache__ folders.
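The compile-to-bytecode step can be observed with the standard dis module (add is a throwaway example function):

```python
import dis

def add(a, b):
    return a + b

# Print the bytecode instructions the virtual machine will execute.
dis.dis(add)

# The compiled code object itself is attached to the function:
print(add.__code__.co_argcount)  # 2
```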

29
Q

What is the multiprocessing module?

A
  • Process-Based Parallelism: The multiprocessing module provides a way to achieve parallelism using separate processes, bypassing the Global Interpreter Lock (GIL) and taking advantage of multiple CPU cores.
  • Process and Pool: The module offers the Process class to represent individual parallel processes and the Pool class to manage a pool of worker processes for parallel execution of tasks.
  • Inter-Process Communication: Supports various mechanisms for communication between processes, including Queue, Pipe, and Value or Array for shared data.
  • Synchronization: Provides primitives like Lock, Semaphore, and Event to help coordinate and synchronize operations between processes, ensuring data consistency and order of execution.
  • Compatibility with Threading: The API is designed to be similar to the threading module, making it easier for developers to transition from multi-threading to multi-processing when needed.

The multiprocessing module is a powerful tool in Python for parallelizing CPU-bound tasks and can lead to significant performance improvements in suitable applications.
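A minimal Pool sketch (square and the worker count are illustrative; the `__main__` guard is required so that child processes can safely re-import the module):

```python
from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    # Each worker is a separate process with its own interpreter and GIL,
    # so CPU-bound work genuinely runs in parallel across cores.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```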