CAIC 10-12 part 1 Flashcards

(166 cards)

1
Q

How do you launch Jupyter Notebook?

A

Navigate to the MLSALab folder in Terminal or Command Prompt and run the command: jupyter notebook

2
Q

What file do you access after launching Jupyter Notebook?

A

churn.csv

3
Q

What is the purpose of the empty notebook in Jupyter Notebook?

A

To explore data and build models.

4
Q

How do you run code in a Jupyter Notebook cell?

A

Click on the run button on the toolbar.

5
Q

What command is used to install Python packages for data manipulation and visualization?

A

! pip3 install pandas
! pip3 install matplotlib
! pip3 install scikit-learn

6
Q

What function returns basic statistics about the data in a DataFrame?

A

describe()

7
Q

What function is used to plot histograms for selected columns in a DataFrame?

A

hist()

8
Q

What command calculates the correlation between features in a DataFrame?

A

corr()
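
As a minimal sketch of describe() and corr() together, assuming a small made-up DataFrame (the column names and values are hypothetical stand-ins, not the contents of churn.csv):

```python
import pandas as pd

# Hypothetical sample data; the real lab reads churn.csv instead.
df = pd.DataFrame({
    "Age": [25, 32, 47, 51, 62],
    "Balance": [1000.0, 1500.0, 3000.0, 3200.0, 4100.0],
    "Exited": [0, 0, 1, 0, 1],
})

summary = df.describe()   # count, mean, std, min, quartiles, max per numeric column
correlations = df.corr()  # pairwise Pearson correlation between numeric columns

print(summary)
print(correlations)
```

Both calls return new DataFrames, so the results can be indexed like any other table, for example summary.loc["mean", "Age"].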

9
Q

What is the purpose of transforming Geography and Gender values in the dataset?

A

To convert categorical strings to ordinal numbers for ML algorithms.
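
One possible way to do this conversion is sketched below. It assumes pandas category codes and the hypothetical output columns Geography_code and Gender_code; the lab itself may use a different encoding method.

```python
import pandas as pd

# Hypothetical rows; only the Geography and Gender column names come from the cards.
churn_data = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
})

# Map each distinct string to an integer code (categories are sorted alphabetically).
churn_data["Geography_code"] = churn_data["Geography"].astype("category").cat.codes
churn_data["Gender_code"] = churn_data["Gender"].astype("category").cat.codes

print(churn_data)
```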

10
Q

What library is commonly used for machine learning in Python?

A

scikit-learn (imported as sklearn)

11
Q

What command is used to drop unnecessary columns from the dataset?

A

churn_data.drop(columns=['Geography', 'Gender', 'RowNumber', 'Surname'], inplace=True)

12
Q

What is the purpose of splitting the dataset into training and testing sets?

A

To prepare for model training and validation.

13
Q

What function is used to split the dataset into training and testing sets?

A

train_test_split()

14
Q

What method is used to train the model using the random forest algorithm?

A

fit()

15
Q

What function is used to calculate the model accuracy?

A

accuracy_score()
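
The split, fit, and scoring steps from the last few cards can be sketched end to end as follows. The data is synthetic (an assumption made so the example runs on its own), and the parameter values such as test_size=0.25 and n_estimators=50 are illustrative rather than the lab's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn features and labels (assumption).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Compare predictions on unseen rows with the true labels.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
```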

16
Q

What is the main goal of conjoint analysis?

A

To identify customer preferences and evaluate product features.

17
Q

What types of products can conjoint analysis be applied to?

A
  • Consumer goods
  • Electrical goods
  • Life insurance policies
  • Retirement communities
  • Luxury goods
  • Air travel
18
Q

What is the primary method used in conjoint analysis?

A

A research-based statistical method to evaluate different attributes of products.

19
Q

Name the three types of conjoint studies.

A
  • Ranking-based conjoint
  • Rating-based conjoint
  • Choice-based conjoint
20
Q

What is a key feature of choice-based conjoint analysis?

A

Respondents choose between product profiles that are described by the same attributes but combine different levels of those attributes.

21
Q

What are the four steps in designing a conjoint experiment?

A
  • Identifying the type of study
  • Determining the relevant attributes
  • Specifying the attributes’ levels
  • Managing the number of alternatives
22
Q

What is a fractional factorial design approach used for in conjoint analysis?

A

To decrease the number of profiles while ensuring enough data for statistical analysis.

23
Q

What is the overall utility value in conjoint analysis?

A

The evaluation of the entire product.

24
Q

What is the partial utility value in conjoint analysis?

A

The degree of influence of individual elements on the purchase decision.

25
What Python packages are mentioned for data analysis and manipulation?
* pandas * numpy * scipy
26
What method can be scaled to include many more features?
The same method used for analyzing laptop specifications
27
What is the purpose of the Python package pandas?
Data analysis and data manipulation
28
What does the Python package NumPy allow you to use?
Matrices and arrays, as well as mathematical and statistical functions
29
What does the statsmodels Python package provide?
A complement to SciPy for statistical computations, including estimation and inference for statistical models
30
What is the purpose of the Seaborn package?
Effective data visualization
31
What is the first step in the provided code block?
Import the necessary packages and functions, and create a sample DataFrame with simulated data
32
What does 'X' represent in the code?
Predictor variables excluding the score
33
What does 'y' represent in the code?
Target variable, which is the score
34
What is the purpose of creating dummy variables in the analysis?
To encode categorical variables for inclusion in the model
35
What does OLS stand for?
Ordinary Least Squares
36
What does the OLS method minimize?
The sum of squares of the differences between observed and predicted values
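To make the minimization concrete, here is a small sketch that uses NumPy's least-squares solver in place of the statsmodels OLS class the cards refer to. The data is made up from the line y = 1 + 2x, so the fit should recover that intercept and slope.

```python
import numpy as np

# Made-up data generated from y = 1 + 2x (assumption for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

# Design matrix with a column of ones for the intercept.
A = np.column_stack([np.ones_like(x), x])

# lstsq returns the coefficients that minimize the sum of squared residuals.
coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

# Sum of squared differences between observed and predicted values.
residuals = y - A @ coef
sse = float(np.sum(residuals ** 2))
print(intercept, slope, sse)
```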
37
In OLS, what does a linear regression model establish?
The relationship between a dependent variable and at least one independent variable
38
What does an R-squared value of 0.685 indicate?
The model accounts for 68.5% of the variation in the score variable
39
What does the F-statistic evaluate?
Whether a set of variables is jointly statistically significant
40
What does the intercept represent in a linear regression model?
The predicted outcome when all predictor variables are set to 0
41
What does the standard error measure?
The precision of the coefficient estimate; a smaller standard error means a more precise estimate
42
What does a low p-value indicate?
Strong evidence that the coefficient differs from zero, meaning the estimated effect is unlikely to be due to chance alone
43
What is the significance of the condition number in regression analysis?
It measures how sensitive the model's output is to small changes in the input data; a large value can signal multicollinearity
44
What does multicollinearity refer to?
Two or more independent variables that are strongly related to each other
45
What is the purpose of sorting the data_res DataFrame by weight?
To identify the most significant variables in the model
46
What indicates that a variable is not statistically significant?
A p-value above 5%
47
What does the term 'homoscedasticity' refer to?
The errors have constant variance across all values of the predictors
48
What does the Omnibus test assess?
The normality of the distribution of residuals
49
What is the difference between AIC and BIC?
BIC penalizes free parameters more severely than AIC
50
What is the goal of the next code block after OLS modeling?
To read a dataset and transform categorical data into a one-hot vector representation
51
What does the get_dummies function do in pandas?
Transforms categorical data into a one-hot vector representation
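A minimal sketch of get_dummies, assuming a made-up table of product profiles (the color and size columns are hypothetical, not the dataset behind these cards):

```python
import pandas as pd

# Hypothetical product profiles (assumption for illustration).
profiles = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["small", "large", "large"],
})

# Each category value becomes its own indicator column.
encoded = pd.get_dummies(profiles, columns=["color", "size"])
print(encoded.columns.tolist())
```

The new column names follow the pattern originalcolumn_value, which is how the one-hot features show up later in a regression summary.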
52
What is the outcome of the OLS model fit in the consumer goods product example?
It estimates the linear relationship using the one-hot encoded product features
53
What is the main difference between AIC and BIC?
AIC is not consistent, whereas BIC is consistent. ## Footnote AIC is more suited for out-of-sample prediction, while BIC is better for model selection.
54
What does BIC penalize more severely than AIC?
BIC penalizes free parameters more severely than AIC. ## Footnote This makes BIC more stringent in model selection.
55
What is the significance of AIC in model evaluation?
AIC estimates how well a model is likely to predict new data, penalizing extra parameters to limit the risk of overfitting. ## Footnote AIC aims to select the model that best describes an unknown, high-dimensional reality.
56
What does cross-validation help achieve with AIC?
Selecting a model by minimizing AIC is asymptotically equivalent to leave-one-out cross-validation.
57
What type of relationship can be visualized using weights in a regression model?
Factors that are positively and negatively related can be visualized. ## Footnote For example, a weight of 100 grams is positively related while a weight of 400 grams is negatively related.
58
What is the target variable normalized to in the discussed predictive models?
The target variable is normalized from a ranking to a 1 to 10 score.
59
What is Logistic Regression primarily used for?
Logistic Regression is used for classification tasks.
60
What distinguishes binary from multinomial logistic regression?
Binary logistic regression differentiates between two classes, while multinomial logistic regression is used when the target variable has three or more potential values.
61
What is ordinal logistic regression used for?
Ordinal logistic regression is used when the target variable is ordinal, that is, its categories have a meaningful order.
62
What is the primary distinction between linear and logistic regression?
Linear regression is used for regression problems with continuous values, whereas logistic regression is used for classification problems with discrete values.
63
What does MSE stand for?
Mean Squared Error.
64
What does a large MSE indicate about a linear regression model?
It indicates that the model does not accurately predict the data.
65
What is the effect of outliers on MSE?
MSE is sensitive to outliers, as large outlier errors amplify the MSE.
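The effect of a single outlier can be sketched with a hand-written MSE helper and made-up numbers (both are assumptions for illustration; scikit-learn also ships its own mean_squared_error function):

```python
import numpy as np

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

actual = [10, 12, 14, 16]

predicted = [11, 11, 15, 15]          # every error has size 1
mse_clean = mean_squared_error(actual, predicted)

predicted_outlier = [11, 11, 15, 26]  # the last error has size 10
mse_outlier = mean_squared_error(actual, predicted_outlier)

print(mse_clean, mse_outlier)
```

Because errors are squared before averaging, the one error of size 10 contributes 100 to the sum while each error of size 1 contributes only 1.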
66
What is the role of Random Forest in machine learning?
Random Forest is used for both classification and regression and is known for adaptability and user-friendliness.
67
How does Random Forest reduce variance?
By averaging predictions from multiple decision trees constructed from different samples.
68
What is the function of the RandomForestClassifier in scikit-learn?
It is used to fit models to data for classification tasks.
69
What is XGBoost known for?
XGBoost is known for its speed and performance in machine learning tasks.
70
How does XGBoost handle large datasets?
It is parallelizable and can operate on clusters of GPUs or networks of PCs.
71
What is the process of customer segmentation?
The practice of classifying customers into groups based on shared traits.
72
What are common criteria for customer segmentation?
* Age * Gender * Marital status * Location * Life stage
73
What are typical methods for gathering consumer data?
* Face-to-face interviews * Online surveys * Online marketing and web traffic information * Focus groups
74
What is the first stage of understanding customer segments?
To understand the data that will be used.
75
What are some gathering methods in consumer goods?
* Face-to-face interviews with customers * Online surveys * Online marketing and web traffic information * Focus groups ## Footnote These methods help organizations collect data from customers for analysis.
76
What is the first stage in understanding customer segments?
Exploration of the data to check the variables. ## Footnote This involves handling non-structured data and adjusting data types.
77
Which Python packages are used for data analysis?
* Pandas * NumPy * SciPy * Yellowbrick * Matplotlib ## Footnote Each package serves specific functions in data manipulation, statistical computations, and visualization.
78
What command is used to limit the maximum rows displayed in Pandas?
pd.options.display.max_rows = 20 ## Footnote This command helps in controlling the display of data for better readability.
79
What method is used to get a statistical summary of a DataFrame in Pandas?
data.describe() ## Footnote This method provides a statistical overview of the data, including count, mean, and standard deviation.
80
How many missing values are there in the Income column?
26 missing values ## Footnote Missing income values can impact the analysis, necessitating data cleaning.
81
What method is used to parse a date column in Pandas?
pd.to_datetime() ## Footnote This method converts a column to datetime format for easier manipulation.
82
What feature indicates the number of days a customer has been registered?
Customer_For ## Footnote This feature is calculated as the difference between the registration date and the earliest registration date.
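A rough sketch of that calculation, following the footnote's description. The column name Dt_Customer and the dates are assumptions; only Customer_For is named in the cards.

```python
import pandas as pd

# Hypothetical registration dates; "Dt_Customer" is an assumed column name.
data = pd.DataFrame({"Dt_Customer": ["2013-01-15", "2012-08-01", "2014-03-10"]})

# Parse the strings into datetime values.
data["Dt_Customer"] = pd.to_datetime(data["Dt_Customer"])

# Days elapsed since the earliest registration in the dataset.
earliest = data["Dt_Customer"].min()
data["Customer_For"] = (data["Dt_Customer"] - earliest).dt.days

print(data)
```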
83
What is the purpose of feature engineering in data analysis?
To clean and structure the data for better understanding and treatment. ## Footnote This involves creating new features and simplifying existing ones.
84
What is the command to create an Age variable from Year_Birth?
data['Age'] = pd.to_datetime('today').year - data['Year_Birth'] ## Footnote This calculates the current age of customers based on their birth year.
85
What does the Living_With feature represent?
Simplification of marital status to describe the living situation of couples. ## Footnote This feature consolidates various marital status categories into broader terms.
86
How is the total number of children in a household indicated?
Children feature, calculated as the sum of Kidhome and Teenhome. ## Footnote This gives insight into the family structure of customers.
87
What does the Is_Parent feature indicate?
Whether a customer has children, represented as an integer. ## Footnote This feature helps in analyzing spending behavior based on parental status.
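The Children and Is_Parent features from the last two cards could be built roughly like this, assuming a made-up set of households (only the column names come from the cards):

```python
import numpy as np
import pandas as pd

# Hypothetical households; Kidhome and Teenhome are the columns named in the cards.
data = pd.DataFrame({"Kidhome": [0, 1, 2], "Teenhome": [0, 1, 0]})

# Total number of children in the household.
data["Children"] = data["Kidhome"] + data["Teenhome"]

# 1 if the customer has at least one child, otherwise 0.
data["Is_Parent"] = np.where(data["Children"] > 0, 1, 0)

print(data)
```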
88
What method is used to visualize the distribution of Age?
data['Age'].plot.hist(figsize=(12,6)) ## Footnote This command creates a histogram to visualize the age distribution of customers.
89
What is the maximum age cap set to remove outliers?
Age < 99 ## Footnote This helps in filtering out unrealistic age values from the dataset.
90
What is the purpose of creating a correlation matrix?
To analyze relationships between different variables. ## Footnote This helps in identifying patterns and correlations in the data.
91
What method is used to create a correlation matrix?
corr method
92
What type of mask is used to show only the lower triangle of the correlation matrix?
numpy mask
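A sketch of how such a mask can be built, assuming a small made-up DataFrame (the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns standing in for the customer dataset.
data = pd.DataFrame({
    "Income": [30, 45, 60, 80, 95],
    "Spent": [5, 9, 14, 20, 27],
    "Children": [3, 2, 2, 1, 0],
})

corr = data.corr()

# True marks the cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones_like(corr, dtype=bool))

# Hidden cells become NaN, leaving only the lower triangle.
lower_triangle = corr.mask(mask)
print(lower_triangle)
```

The same boolean array can be passed to Seaborn as sns.heatmap(corr, mask=mask), which hides the cells where the mask is True.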
93
What library is used to display the correlation matrix visually?
Seaborn
94
What does a negative correlation between children and expenditure indicate?
As the number of children increases, expenditure decreases
95
What is the purpose of segmenting clients into groups?
To better target different audience subgroups
96
List the benefits of audience segmentation.
  • Creating targeted marketing communication
  • Applying the right pricing options
  • Concentrating on the most lucrative clients
  • Providing better client service
  • Promoting and cross-promoting other goods and services
97
What is the first step in preprocessing data for clustering?
Encoding the categorical variables using a label encoder
98
What scaler is used to normalize the values in the dataset?
StandardScaler
99
What is PCA commonly used for in data analysis?
Dimensionality reduction
100
What does PCA stand for?
Principal Component Analysis
101
Fill in the blank: The basic principle of PCA is to keep as much information as possible while reducing the number of _______.
variables
102
What does explained variance measure in PCA?
How much of a dataset's variability can be attributed to each individual principal component
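Scaling followed by PCA can be sketched as below. The data is random (an assumption so the example is self-contained), and the choice of three components is illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random stand-in data with 5 features (assumption, not the customer dataset).
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 5))

# Standardize each feature to zero mean and unit variance before PCA.
scaled = StandardScaler().fit_transform(features)

pca = PCA(n_components=3)
reduced = pca.fit_transform(scaled)

# Fraction of the total variance captured by each principal component.
ratios = pca.explained_variance_ratio_
print(ratios)
```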
103
What technique is used for clustering in this context?
Agglomerative clustering
104
True or False: Clustering is a supervised method.
False
105
What is the elbow method used for?
To determine the number of clusters to be formed
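The idea behind the elbow method can be sketched with a manual loop over K-means models. This swaps in plain scikit-learn for the Yellowbrick KElbowVisualizer mentioned in these cards, and the points are synthetic (an assumption).

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points scattered around four centers (assumption for illustration).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Inertia is the within-cluster sum of squared distances for each candidate k.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

Plotting these values against k and looking for the point where the curve stops dropping sharply gives the "elbow".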
106
What is the purpose of the KElbowVisualizer in K-means clustering?
To plot the explained variation as a function of the number of clusters
107
How many clusters were determined to be the best choice for the dataset?
Four clusters
108
What does the term 'Clusters' refer to in this context?
Groups formed based on similarities among data points
109
What visualization is used to show the distribution of clusters?
3D scatter plot
110
List the cluster patterns observed in income versus spending.
  • Cluster 0: high spending and average income
  • Cluster 1: high spending and high income
  • Cluster 2: low spending and low income
  • Cluster 3: high spending and low income
111
What feature is created to sum accepted promotions?
TotalProm
112
What type of plot is used to visualize total promotions vs cluster?
Count plot
113
What is the purpose of the Boxen plot in the analysis?
To find the spend distribution per cluster
114
What method is used to visualize the number of deals purchased per type of cluster?
Boxen plot
115
What is a major conclusion drawn about promotional campaigns?
Transactions were successful despite promotions not being widespread
116
What is the purpose of using a Seaborn joint plot in the analysis?
To visualize both the relationships and distributions of different variables. ## Footnote Joint plots are useful for exploring the relationship between pairs of variables in a dataset.
117
What does price elasticity of demand measure?
How much a product’s consumption changes in response to price changes. ## Footnote It indicates the responsiveness of quantity demanded to price changes.
118
Define elastic demand.
A good is elastic if a price adjustment results in a significant shift in either supply or demand.
119
What is inelastic demand?
When a price adjustment does not significantly affect demand or supply.
120
How is price elasticity mathematically defined?
Price elasticity = (% change in quantity demanded) / (% change in price). ## Footnote Each term is a relative (percentage) change, so the result does not depend on the units of price or quantity.
121
What indicates that a good has inelastic demand?
The absolute value of elasticity is less than 1.
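A small worked example, assuming elasticity is computed from percentage changes and using made-up prices and quantities (the function name price_elasticity is hypothetical):

```python
def price_elasticity(q_old, q_new, p_old, p_new):
    """Percentage change in quantity divided by percentage change in price."""
    pct_quantity = (q_new - q_old) / q_old
    pct_price = (p_new - p_old) / p_old
    return pct_quantity / pct_price

# Made-up numbers: price rises 10% (10 -> 11), quantity falls 5% (100 -> 95).
elasticity = price_elasticity(q_old=100, q_new=95, p_old=10, p_new=11)

# Demand is inelastic when the absolute value is below 1.
is_inelastic = abs(elasticity) < 1
print(elasticity, is_inelastic)
```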
122
What happens to revenue when the price elasticity is exactly 1?
Revenue is maximized.
123
Which two types of goods are exceptions to the law of demand and show positive price elasticity?
Veblen goods and Giffen goods.
124
What Python packages are mentioned for data analysis?
* pandas * numpy * statsmodels * matplotlib * seaborn
125
What information does the 'data' DataFrame contain?
Transactional data of items sold by a food truck and some external variables.
126
What command is used to perform a descriptive statistical analysis after dropping ITEM_ID?
data.drop(['ITEM_ID'], axis=1).describe().
127
What type of plot allows us to see the distribution and relationships between variables at a glance?
Pair plot.
128
What variable was used to visualize price differences among categories in a Seaborn histogram plot?
CAT variable.
129
What does a negative price elasticity indicate?
A decrease in quantity demanded when price increases.
130
Fill in the blank: A good with a price elasticity of demand of -2 has a percentage drop in quantity that is ______ as great as its percentage price rise.
twice.
131
What does the joint plot of Family_Size versus Spent reveal about Cluster 1?
Cluster 1 represents small family sizes.
132
What is the focus of the relationship plot created for items in CAT = 2?
The relationship between price and quantity of items sold as part of a combo.
133
What should be analyzed to determine the optimal prices for different items?
Price differences and elasticities for each item.
134
What is the significance of exploring price elasticity for a food truck owner?
To understand how price increases affect demand for hamburgers.
135
What is the primary objective of the analysis conducted on the food truck's data?
To determine how price changes influence the quantity sold.
136
What relationship does the document examine at the item level?
The relationship between price and quantity sold to determine the price that maximizes revenue.
137
What method is used to normalize price and quantity data?
Subtract the mean and divide by the range.
138
What range do normalized prices and quantities fall into?
-1 to 1.
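The normalization described in the last two cards can be sketched as follows, assuming made-up values and hypothetical output columns with a _NORM suffix:

```python
import pandas as pd

# Hypothetical prices and quantities for one item.
test = pd.DataFrame({
    "PRICE": [10.0, 11.0, 12.0, 13.0],
    "QUANTITY": [40, 35, 28, 20],
})

# Subtract the mean, then divide by the range (max minus min).
for column in ["PRICE", "QUANTITY"]:
    spread = test[column].max() - test[column].min()
    test[column + "_NORM"] = (test[column] - test[column].mean()) / spread

print(test)
```

Dividing by the range guarantees that the normalized values span exactly one unit, so they always stay between -1 and 1.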
139
What does a rolling average do to the data?
Softens the curves.
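A brief sketch of that smoothing with pandas, assuming a made-up quantity series and a window of 3 (both illustrative):

```python
import pandas as pd

# Hypothetical daily quantities with a zig-zag pattern.
quantity = pd.Series([10, 30, 20, 40, 30, 50])

# Each value is replaced by the mean of itself and the two values before it.
smoothed = quantity.rolling(window=3).mean()
print(smoothed)
```

The first two entries are NaN because a full window of three values is not yet available.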
140
What correlation is noted between price and temperature?
Negative correlation; people do not prefer certain food combinations in hot weather.
141
What does a positive correlation with school breaks suggest?
Increased product purchases during school breaks.
142
What is the relationship between price and weekend sales?
Negative relationship; less traffic during weekends may lead to lower sales.
143
What is a demand curve?
A graph that depicts the relationship between the price of a good and the quantity demanded.
144
What generally happens to demand as price rises, according to the law of demand?
Demand declines.
145
What are some exceptions to the law of demand?
* Speculative bubbles * Veblen goods * Giffen goods
146
How do demand curves interact with supply curves?
They establish an equilibrium price.
147
What is the first step in finding the demand curve for an item?
Isolate the data for each item into a separate DataFrame.
148
What statistical method is used to analyze the price versus demand relationship?
Ordinary Least Squares (OLS) regression.
149
What does the price elasticity of demand indicate?
How sensitive the quantity demanded is to a change in price.
150
What visualization is used to analyze model performance?
Partial regression plots.
151
What is the purpose of the function create_model_and_find_elasticity?
To create models, determine price elasticity, and return results.
152
What does the code simulate to find the optimal pricing?
It simulates revenue for each possible price.
153
What is the relationship between price and quantity sold when the demand curve is inelastic?
The quantity sold changes relatively little even if the price increases.
154
What does a concave demand curve with negative elasticity indicate?
If the price increases, fewer units will be sold.
155
What does the function find_optimal_price do?
It finds the maximum revenue by determining the optimal price.
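The revenue simulation behind this kind of function can be sketched as below. This is not the deck's find_optimal_price itself: the linear demand curve and its coefficients are assumptions, whereas in the cards they come from an OLS fit.

```python
import numpy as np

# Assumed linear demand curve: quantity = intercept + slope * price.
intercept, slope = 100.0, -5.0

# Simulate quantity and revenue for each candidate price.
prices = np.arange(1.0, 20.5, 0.5)
quantities = intercept + slope * prices
revenues = prices * quantities

# Pick the price with the highest simulated revenue.
best = int(np.argmax(revenues))
optimal_price = float(prices[best])
max_revenue = float(revenues[best])
print(optimal_price, max_revenue)
```

With a negative slope, raising the price first increases revenue and then reduces it, so the maximum sits somewhere inside the price range rather than at either end.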
156
What is the significance of normalizing data variables in analysis?
To visualize the relationship regardless of the unit of measure.
157
What is the formula for calculating the normalized QUANTITY?
test['QUANTITY'] = (test['QUANTITY']-test['QUANTITY'].mean())/(test['QUANTITY'].max()-test['QUANTITY'].min())
158
What is the formula for calculating the normalized REVENUE?
test['R'] = (test['REVENUE']-test['REVENUE'].mean())/(test['REVENUE'].max()-test['REVENUE'].min())
159
What type of plot is generated for Price Elasticity?
A plot with the title 'Price Elasticity - Item' and the item_id.
160
How do you find the index of the maximum REVENUE?
ind = np.where(test['REVENUE'] == test['REVENUE'].max())[0][0]
161
What does values_at_max_profit contain?
A dictionary with keys 'PRICE', 'QUANTITY', and 'REVENUE' containing their respective values at maximum profit.
162
What is the purpose of the optimal_price dictionary?
To store the optimal prices for all items.
163
What is the output of the loop that iterates over the optimal_price dictionary?
It prints the item_id and the corresponding optimal price.
164
What does negative elasticity imply for pricing?
Higher prices lead to lower quantities sold.
165
What is product recommendation?
A filtering system that anticipates and presents goods a user would be interested in buying.
166
Fill in the blank: Product recommendation is used to generate _______ that keep users engaged with your product and service.
recommendations