FA5 + M5 - Sheet1 Flashcards

(138 cards)

1
Q

Which of the following libraries are used for mathematical and statistical operations on multi-dimensional arrays and matrices in Python?

Group of answer choices

Matplotlib

NumPy

Pandas

A

NumPy

2
Q

Which of the following libraries are used for data visualization in Python?

Group of answer choices

NumPy

Matplotlib

SciPy

A

Matplotlib

3
Q

Which of the following libraries are used for deep learning in Python?

Group of answer choices

TensorFlow

Scikit-learn

Keras

A

TensorFlow

4
Q

Which of the following libraries are used for natural language processing in Python?

Group of answer choices

NLTK

Scrapy

Scikit-learn

A

NLTK

5
Q

Which of the following libraries are used for creating spider bots that scan website pages and collect structured data in Python?

Group of answer choices

Scrapy

Pandas

SciPy

A

Scrapy

6
Q

Which of the following libraries are used for object identification, speech recognition, and more in Python?

Group of answer choices

PyTorch

Keras

Dist-keras

A

It should be TensorFlow, but PyTorch is the answer marked correct in Canvas

7
Q

Which of the following libraries are used for reading data, selecting and filtering data, and data manipulation in Python? There are two correct answers among the options; just choose one.

Group of answer choices

PyTorch

Pandas

NumPy

SciPy

A

Pandas
NumPy

8
Q

Which of the following libraries are used for creating interactive and scalable visualizations in a browser using JavaScript widgets in Python? There are two correct answers among the choices; just select one.

Group of answer choices

SciPy

Bokeh

NumPy

Plotly

A

Bokeh
Plotly

9
Q

Which Python libraries are built on NumPy? There are two correct answers among the choices; just select one.

Group of answer choices

Pandas

Seaborn

Scikit-Learn

Matplotlib

A

Pandas
Scikit-Learn

10
Q

Which Python library provides machine learning algorithms?

Group of answer choices

Pandas

Scikit-Learn

NumPy

Matplotlib

A

Scikit-Learn

11
Q

Data Wrangling:

A

SciPy
NumPy
pandas

12
Q

Statistics

A

StatsModels

13
Q

NLP

A

Natural Language Toolkit
SpaCy
gensim

14
Q

Machine Learning

A

scikit-learn
xgboost
lightgbm
catboost
eli5

15
Q

Deep Learning

A

TensorFlow
PyTorch
Keras

16
Q

Distributed Deep Learning

A

dist-keras
elephas
spark-deep-learning

17
Q

Visualization

A

matplotlib
Bokeh
plotly
Seaborn
pydot

18
Q

it is intended for processing large multidimensional arrays and matrices, and an extensive collection of high-level mathematical functions and implemented methods makes it possible to perform various operations with these objects

A

NumPy (numpy.org)

19
Q

it is based on NumPy and therefore extends its capabilities. SciPy's main data structure is again a multidimensional array, implemented by NumPy.

A

SciPy (scipy.org/scipylib)

20
Q

The package contains tools that help with solving linear algebra, probability theory, integral calculus and many more tasks

A

SciPy

21
Q

provides high-level data structures and a vast variety of tools for analysis. The great feature of this package is the ability to translate rather complex operations with data into one or two commands.

A

Pandas (pandas.pydata.org)

22
Q

contains many built-in methods for grouping, filtering, and combining data, as well as the time-series functionality

A

Pandas

23
Q

is a low-level library for creating two-dimensional diagrams and graphs.

A

Matplotlib (matplotlib.org)

24
Q

With its help, you can build diverse charts, from histograms and scatterplots to non-Cartesian coordinates graphs.

A

Matplotlib (matplotlib.org)

25
Moreover, many popular plotting libraries are designed to work in conjunction with ____
matplotlib
26
is essentially a higher-level API based on the matplotlib library.
Seaborn (seaborn.pydata.org)
27
It contains more suitable default settings for processing charts.
Seaborn
28
Also, there is a rich gallery of visualizations including some complex types like time series, jointplots, and violin diagrams
Seaborn
29
is a popular library that allows you to build sophisticated graphics easily.
Plotly (plot.ly/python/)
30
The package is adapted to work in interactive web applications.
Plotly
31
Among its remarkable visualizations are contour graphics, ternary plots, and 3D charts
Plotly
32
The ____ library creates interactive and scalable visualizations in a browser using JavaScript widgets.
Bokeh (bokeh.pydata.org/en/latest/)
33
The library provides a versatile collection of graphs, styling possibilities, interaction abilities in the form of linking plots, adding widgets, and defining callbacks, and many more useful features.
Bokeh
34
is a popular framework for deep and machine learning, developed in Google Brain.
TensorFlow (tensorflow.org)
35
It provides abilities to work with artificial neural networks with multiple data sets.
TensorFlow
36
Among the most popular TensorFlow applications are _____ and more.
object identification, speech recognition,
37
is a large framework that allows you to perform tensor computations with GPU acceleration, create dynamic computational graphs and automatically calculate gradients.
PyTorch (pytorch.org)
38
Above this, ____ offers a rich API for solving applications related to neural networks
PyTorch
39
is a high-level library for working with neural networks, running on top of TensorFlow and Theano and, as a result of newer releases, additional backends.
Keras (keras.io)
40
It simplifies many specific tasks and greatly reduces the amount of monotonous code. However, it may not be suitable for some complicated things.
Keras (keras.io)
41
These packages allow you to train neural networks based on the Keras library directly with the help of Apache Spark
Dist-keras (joerihermans.com/work/distributed-keras/)
42
dist-keras and others are gaining popularity and developing rapidly, and it is very difficult to single out one of the libraries since they are all designed to ______
solve a common task.
43
This Python module based on NumPy and SciPy is one of the best libraries for working with data.
Scikit-learn (scikit-learn.org/stable)
44
It provides algorithms for many standard machine learning and data mining tasks such as clustering, regression, classification, dimensionality reduction, and model selection
Scikit-learn
45
is an extension module that makes several frequent item set mining implementations available as functions.
PyFim
46
In PyFim, currently _______ are available as functions, although the interfaces do not offer all of the options of the command-line program
apriori, eclat, fpgrowth, sam, relim, carpenter, ista, accretion and apriacc
47
Often the results of machine learning model predictions are not entirely clear, and this is the challenge that the ___ library helps to deal with.
eli5
48
it is a package for visualization and debugging machine learning models and tracking the work of an algorithm step by step.
Eli5 (eli5.readthedocs.io/en/latest/)
49
It provides support for scikit-learn, XGBoost, LightGBM, lightning, and sklearn-crfsuite libraries and performs the different tasks for each of them
eli5
50
is a set of libraries, a whole platform for natural language processing.
NLTK (nltk.org)
51
With the help of ____, you can process and analyze text in a variety of ways, tokenize and tag it, extract information, etc.
NLTK
52
is also used for prototyping and building research systems
NLTK
53
is a Python library for robust semantic analysis, topic modeling and vector-space modeling, and is built upon NumPy and SciPy.
Gensim (radimrehurek.com/gensim)
54
Gensim provides an implementation of popular NLP algorithms, such as _____.
word2vec
55
Although gensim has its own models.wrappers.fasttext implementation, the ____ can also be used for efficient learning of word representations.
fasttext library
56
is a library used to create spider bots that scan website pages and collect structured data.
Scrapy (scrapy.org)
57
In addition, Scrapy can extract data from the ___
API
58
The library happens to be very handy due to its extensibility and portability
Scrapy
59
Introduces objects for multi-dimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
NumPy
60
Provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
NumPy
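A minimal sketch of what "vectorization" means in practice. The arrays and variable names are made up for illustration and are not part of the original cards.

```python
import numpy as np

# Hypothetical sample data (not from the cards).
prices = np.array([10.0, 20.0, 30.0, 40.0])
quantities = np.array([1, 2, 3, 4])

# Vectorized arithmetic: one expression operates on every element at once.
totals = prices * quantities

# The equivalent explicit Python loop, shown only for comparison.
totals_loop = np.array([p * q for p, q in zip(prices, quantities)])

# Statistical operations work on the whole array as well.
mean_total = totals.mean()

# A 2x3 matrix [[0, 1, 2], [3, 4, 5]] and the sum down each column.
matrix = np.arange(6).reshape(2, 3)
column_sums = matrix.sum(axis=0)
```

The vectorized form avoids the per-element Python loop, which is where the performance gain usually comes from.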
61
Many other python libraries are built on ____
NumPy
62
adds data structures and tools designed to work with table-like data (Series and DataFrames, similar to data frames in R)
Pandas
63
Provides tools for data manipulation: reshaping, sorting, slicing, aggregation, etc.
Pandas
64
Allows handling of missing data
Pandas
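A short sketch of common ways to handle missing data in pandas. The DataFrame below is a hypothetical example, and filling with the column mean is just one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing salary.
df = pd.DataFrame({
    "rank": ["Prof", "AsstProf", "Prof", "AssocProf"],
    "salary": [120000.0, np.nan, 135000.0, 98000.0],
})

# Count missing values per column.
missing_per_column = df.isna().sum()

# Drop every row that contains at least one missing value.
complete_rows = df.dropna()

# Alternatively, fill the gap (here with the column mean).
filled = df.fillna({"salary": df["salary"].mean()})
```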
65
provides machine learning algorithms: classification, regression, clustering, and model validation
Scikit-Learn
66
Built on NumPy, SciPy, and matplotlib
Scikit-Learn
67
Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats
matplotlib
68
A set of functionalities similar to those of MATLAB
matplotlib
69
Line plots, scatter plots, bar charts, histograms, pie charts etc
matplotlib
70
Relatively low-level; some effort needed to create advanced visualization
matplotlib
71
based on matplotlib. Provides high level interface for drawing attractive statistical graphics
Seaborn
72
Seaborn is similar (in style) to the popular ___ library in R
ggplot2
73
Loading Python Libraries
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns
74
Press ____ to execute jupyter cell
Shift+Enter
75
There are numerous commands to read other data formats:
pd.read_excel('myfile.xlsx', sheet_name='Sheet1', index_col=None, na_values=['NA'])
pd.read_stata('myfile.dta')
76
List first 5 records
df.head()
77
To view the first 10 records
df.iloc[:10] (equivalently, df.head(10))
78
To view the last few records
df.tail(10)
79
The most general dtype. It is assigned to a column that holds mixed types (numbers and strings)
object (string)
80
Whole numbers; 64 refers to the number of bits of memory allocated to hold each value
int64 (int)
81
Numbers with decimals. If a column contains numbers and NaNs, pandas defaults to float64, because NaN is itself a floating-point value
float64 (float)
82
Values meant to hold time data. Look into these for time series experiments
datetime64, timedelta64[ns] (N/A)
83
Check a particular column type
df['salary'].dtype
84
Check types for all the columns
df.dtypes
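A small sketch showing how the dtypes described in the cards above arise. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "service": [5, 12, 20],                 # whole numbers -> int64
    "salary": [98000.0, np.nan, 135000.0],  # NaN forces float64
    "rank": ["AsstProf", "Prof", "Prof"],   # strings -> object (newer pandas may report a dedicated string dtype)
    "mixed": [1, "two", 3.0],               # mixed numbers and strings -> object
})

# dtype of a single column.
salary_dtype = df["salary"].dtype

# dtypes of all columns, returned as a Series indexed by column name.
all_dtypes = df.dtypes
```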
85
list the types of the columns
dtypes
86
list the column names
columns
87
list the row labels and column names
axes
88
number of dimensions
ndim
89
number of elements
size
90
return a tuple representing the dimensionality
shape
91
numpy representation of the data
values
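The attributes from the last several cards can be tried on a tiny DataFrame. This is only a sketch with made-up data; note that attributes take no parentheses, while methods such as head() do.

```python
import pandas as pd

# Hypothetical 3-row, 2-column frame.
df = pd.DataFrame({"service": [5, 12, 20], "salary": [98000, 120000, 135000]})

n_dims = df.ndim                 # number of dimensions
n_elements = df.size             # rows * columns
dimensions = df.shape            # (rows, columns) tuple
column_names = list(df.columns)  # column labels
labels = df.axes                 # [row labels, column labels]
raw = df.values                  # NumPy representation of the data

# A method, by contrast, is called with parentheses.
first_two = df.head(2)
```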
92
Unlike attributes, python methods have ___
parentheses.
93
All attributes and methods can be listed with a ____
dir() function
94
first/last n rows
head( [n] ), tail( [n] )
95
generate descriptive statistics (for numeric columns only)
describe()
96
return max/min values for all numeric columns
max(), min()
97
return mean/median values for all numeric columns
mean(), median()
98
standard deviation
std()
99
returns a random sample of the data frame
sample([n])
100
drop all the records with missing values
dropna()
101
Using “group by” method we can:
Split the data into groups based on some criteria
Calculate statistics (or apply a function) to each group
This is similar to the group_by() function in R's dplyr package
102
group data using rank
df_rank = df.groupby(['rank'])
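A sketch of the split, then calculate pattern using the same groupby call. The data values are invented for illustration.

```python
import pandas as pd

# Hypothetical salary data.
df = pd.DataFrame({
    "rank": ["Prof", "AsstProf", "Prof", "AsstProf"],
    "salary": [120000, 80000, 140000, 90000],
})

# Split: one group per distinct value of "rank".
df_rank = df.groupby(["rank"])

# Calculate: a statistic for each group.
mean_salary = df_rank["salary"].mean()

# Group sizes (number of rows in each group).
group_sizes = df_rank.size()
```

The groupby object itself is lazy; nothing is computed until a statistic such as mean() or size() is requested.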
103
To subset the data we can apply ____
Boolean indexing.
104
To subset the data we can apply Boolean indexing. This indexing is commonly known as a ____
filter
105
Any ____ can be used to subset the data:
Boolean operator
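Boolean indexing (filtering) can be sketched as follows. The column names and thresholds are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "sex": ["Female", "Male", "Female", "Male"],
    "salary": [125000, 90000, 110000, 130000],
})

# A comparison yields a Boolean Series; indexing with it keeps the True rows.
high_paid = df[df["salary"] > 120000]
women = df[df["sex"] == "Female"]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
high_paid_women = df[(df["sex"] == "Female") & (df["salary"] > 120000)]
```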
106
There are a number of ways to subset the Data Frame:
one or more columns
one or more rows
a subset of rows and columns
107
Rows and columns can be selected by their position or label
Slicing
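A brief sketch of selecting by position versus by label. The index labels 10 to 40 are arbitrary and chosen only to make the difference visible.

```python
import pandas as pd

# Hypothetical frame with non-default row labels.
df = pd.DataFrame(
    {
        "rank": ["Prof", "AsstProf", "Prof", "AssocProf"],
        "salary": [120000, 80000, 140000, 100000],
    },
    index=[10, 20, 30, 40],
)

# By position: iloc excludes the end of the slice (rows 0 and 1).
by_position = df.iloc[0:2]

# By label: loc includes the end of the slice (labels 10, 20 and 30).
by_label = df.loc[10:30]

# A subset of rows and columns at the same time.
rows_and_cols = df.loc[20:30, ["salary"]]
```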
108
When selecting one column, it is possible to use a single set of brackets, but the resulting object will be a ____ (not a DataFrame):
Series
109
When we need to select more than one column and/or make the output to be a DataFrame, we should use ____
double brackets:
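The difference between single and double brackets is easy to check. The frame below is a hypothetical example.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "rank": ["Prof", "AsstProf"],
    "salary": [120000, 80000],
})

# Single brackets with one column name return a Series.
one_col_series = df["salary"]

# Double brackets (a list of names) return a DataFrame, even for one column.
one_col_frame = df[["salary"]]
two_cols = df[["rank", "salary"]]
```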
110
When summing the data, missing values will be treated as ___
zero
111
If all values are missing, the sum will be equal to ____
0 in pandas 0.22 and later; NaN in older versions, or when min_count=1 is passed to sum()
112
methods ignore missing values but preserve them in the resulting arrays
cumsum() and cumprod()
113
Missing values in ___ method are excluded (just like in R)
GroupBy
114
Many descriptive statistics methods have ___ option to control if missing data should be excluded. This value is set to True by default (unlike R)
skipna
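A sketch of how missing values behave in sums and cumulative sums, using made-up Series. Note that recent pandas versions return 0 for an all-missing sum unless min_count=1 is passed; older versions returned NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# skipna=True is the default: the NaN is ignored.
total = s.sum()

# With skipna=False the NaN propagates into the result.
total_strict = s.sum(skipna=False)

# cumsum() skips the NaN in the running total but keeps it in its position.
running = s.cumsum()

# All values missing: min_count=1 requires at least one valid value, else NaN.
all_missing = pd.Series([np.nan, np.nan])
all_missing_default = all_missing.sum()
all_missing_strict = all_missing.sum(min_count=1)
```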
115
computing a summary statistic about each group, i.e. compute group sums or means compute group sizes/counts
Aggregation
116
Common aggregation functions:
min, max
count, sum, prod
mean, median, mode, mad (mad was removed in pandas 2.0)
std, var
117
is useful when multiple statistics are computed per column
agg()
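A minimal sketch of computing several statistics per column with agg(). The data and the choice of statistics are illustrative only.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "salary": [100000, 120000, 140000],
    "service": [5, 10, 30],
})

# One row per statistic, one column per original column.
summary = df[["salary", "service"]].agg(["min", "mean", "max"])
```

The result is itself a DataFrame, so a single value can be read with, for example, summary.loc["mean", "salary"].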
118
Basic statistic (count, mean, std, min, quantiles, max)
describe()
119
Minimum and maximum values
min, max
120
Arithmetic average, median, and mode
mean, median, mode
121
Variance and standard deviation
var, std
122
Standard error of mean
sem
123
Sample skewness
skew
124
kurtosis
kurt
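The descriptive-statistics methods listed in the cards above can be called directly on a Series. This sketch uses arbitrary numbers; pandas computes var, std and sem with the sample (n - 1) denominator by default.

```python
import pandas as pd

# Arbitrary sample values.
s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

summary = s.describe()   # count, mean, std, min, quantiles, max
mean = s.mean()
median = s.median()
var = s.var()            # sample variance
std = s.std()            # sample standard deviation
sem = s.sem()            # standard error of the mean: std / sqrt(n)
skew = s.skew()          # sample skewness
kurt = s.kurt()          # kurtosis
```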
125
histogram
displot
126
estimate of central tendency for a numeric variable
barplot
127
similar to boxplot, also shows the probability density of the data
violinplot
128
Scatterplot
jointplot
129
regression plot
regplot
130
Pairplot
pairplot
131
Boxplot
boxplot
132
categorical scatterplot
swarmplot
133
general categorical plot
catplot (called factorplot before seaborn 0.9)
134
both have a number of functions for statistical analysis
statsmodels and scikit-learn
135
mostly used for regular analysis using R-style formulas
statsmodels
136
is more tailored for Machine Learning
scikit-learn
137
statsmodels:
linear regressions
ANOVA tests
hypothesis tests
many more
138
scikit-learn:
k-means
support vector machines
random forests
many more