Compiled Summatives - Sheet1 Flashcards

(98 cards)

1
Q

What is the primary focus of statistics?

Predictive modeling
Data mining
Application of algorithms to inform strategic decisions
Collection, analysis, interpretation, presentation, and organization of data

A

Collection, analysis, interpretation, presentation, and organization of data

2
Q

Which of the following methods is commonly used in statistics to understand data distributions and relationships?

Algorithm application
Data mining
Hypothesis testing and regression analysis
Predictive modeling

A

Hypothesis testing and regression analysis

3
Q

What does analytics emphasize in addition to statistical methods?

Data presentation
Data interpretation
Predictive modeling and data mining
Data collection

A

Predictive modeling and data mining

4
Q

Which of the following best describes the scope of analytics?

Integrates statistical methods with advanced computational techniques
Focuses solely on hypothesis testing
Limited to data collection and presentation
Only involves data organization

A

Integrates statistical methods with advanced computational techniques

5
Q

What is the first step in the data analysis process?

Get actionable information
Extract patterns
Prepare data
Apply machine learning techniques

A

Prepare data

6
Q

Which of the following is not listed as a data source in the chart?

Printed Books
Email
Social Media Posts
Audio

A

Printed Books

7
Q

What does the second step of the process involve?

Finding patterns using algorithms
Making decisions based on information
Collecting raw information
Cleaning and transforming databases

A

Finding patterns using algorithms

8
Q

In which step would you apply machine learning techniques according to this flowchart?

Step 2 - Extract Patterns
None of the above steps explicitly mention applying machine learning techniques
Step 3 - Get Actionable Information
Step 1 - Prepare Data

A

Step 2 - Extract Patterns

9
Q

What outcome does this flowchart suggest as a result of following these steps?

Creation of new databases
Learning how to code in various programming languages
Development of new software programs
Gaining insights or making informed decisions based on analyzed data

A

Gaining insights or making informed decisions based on analyzed data

10
Q

What does transactional data primarily consist of?

Visual representations of data
General summaries of transactions
Structured, detailed information
Unstructured and random information

A

Structured, detailed information

11
Q

Which of the following is an example of transactional data?

Credit card payment
Social media posts
Weather forecasts
Movie reviews

A

Credit card payment

12
Q

What type of information is included in contractual, subscription, or account data?

Social media interactions
General market trends
Information about the type of product combined with customer characteristics
Weather patterns

A

Information about the type of product combined with customer characteristics

13
Q

Which of the following is an example of a product type mentioned in the statement?

Loan
Weather forecast
Movie review
Social media post

A

Loan

14
Q

What is the primary aim of surveys?

To extract sociodemographic and behavioral data from a particular group of people
To organize social events for communities
To entertain a particular group of people
To provide financial assistance to people

A

To extract sociodemographic and behavioral data from a particular group of people

15
Q

Surveys are typically in the form of:

Novels
Music albums
Questionnaires
Art exhibitions

A

Questionnaires

16
Q

Which of the following is NOT an example of unstructured data?

Social media posts
Media files
Sensor data
Spreadsheets

A

Spreadsheets

17
Q

What is unstructured data?

Information that resides in a traditional row-column database
Data that is always textual
Data that is always numerical
Information that does not reside in a traditional row-column database

A

Information that does not reside in a traditional row-column database

18
Q

Which of the following is an example of a purpose for which data poolers gather data?

Marketing and credit risk assessment
Weather forecasting
Event planning
Cooking recipes

A

Marketing and credit risk assessment

19
Q

What is the primary role of data poolers?

To provide financial advice
To gather and sell data for specific purposes
To develop software applications
To create new databases

A

To gather and sell data for specific purposes

20
Q

What is the first phase in the data analytics process?

Business Understanding
Modelling
Data Preparation
Evaluation

A

Business Understanding

21
Q

What is the primary goal of the Business Understanding phase?

Cleaning data for better quality
Evaluating the model
Identifying business tasks
Applying machine learning algorithms

A

Identifying business tasks

22
Q

Which phase involves selecting related data from various databases?

Data Understanding
Deployment
Data Preparation
Modelling

A

Data Understanding

23
Q

Which of the following is NOT a type of database mentioned in the Data Understanding phase?

Relational Databases
Temporal, Sequence or Time-Series Database
Social Media Databases
Data Warehouses

A

Social Media Databases

24
Q

What is another term for Data Preparation?

Data Modelling
Data Preprocessing
Data Transformation
Data Cleaning

A

Data Preprocessing

25
Q

Which of the following activities is NOT part of Data Preparation?

Aggregating data
Filling in missing values
Applying machine learning algorithms
Filtering outliers

A

Applying machine learning algorithms

26
Q

What does Data Transformation involve?

Converting different measurements into a unified numerical scale
Evaluating the model
Selecting related data from databases
Cleaning data for better quality

A

Converting different measurements into a unified numerical scale

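One common way to put different measurements on a unified numerical scale is min-max scaling, which maps each value v to (v - min) / (max - min). The card does not name a specific method, so treat this as an illustrative assumption; `min_max_scale` and the sample measurements are hypothetical.

```python
def min_max_scale(values):
    """Rescale numbers linearly onto [0, 1] via (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical: map all of them to 0.0 to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical measurements recorded in two different units
heights_cm = [150, 165, 180]
weights_kg = [50, 65, 95]

print(min_max_scale(heights_cm))  # [0.0, 0.5, 1.0]
print(min_max_scale(weights_kg))  # both columns now share the same 0-to-1 scale
```

Standardization (subtracting the mean and dividing by the standard deviation) is another common choice when the data have no natural bounds.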
27
Q

Which of the following is an example of categorical values?

Filtered data
Numerical scales
Ordinal values (less, moderate, strong)
Aggregated data

A

Ordinal values (less, moderate, strong)

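Ordinal values such as less, moderate, and strong have a natural order, so a minimal way to turn them into numbers is a lookup table that preserves that order. The mapping `STRENGTH_ORDER` (less = 0, moderate = 1, strong = 2) and the helper `encode_ordinal` are hypothetical choices, not part of the source.

```python
# Hypothetical ordering for the ordinal categories named on the card
STRENGTH_ORDER = {"less": 0, "moderate": 1, "strong": 2}


def encode_ordinal(labels, order=STRENGTH_ORDER):
    """Replace each ordinal label with its integer rank, keeping the natural order."""
    return [order[label] for label in labels]


responses = ["strong", "less", "moderate", "strong"]  # hypothetical survey answers
print(encode_ordinal(responses))  # [2, 0, 1, 2]
```

An unknown label raises a `KeyError` here on purpose: silently guessing a rank for an unexpected category would corrupt the ordering.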
28
Q

What is the primary focus of the Modelling phase?

Applying statistical and machine learning algorithms
Identifying business tasks
Selecting related data
Cleaning data

A

Applying statistical and machine learning algorithms

29
Q

Which phase involves evaluating the performance of the model?

Deployment
Data Preparation
Business Understanding
Evaluation

A

Evaluation

30
Q

What is the final phase in the data analytics process?

Modelling
Deployment
Evaluation
Data Understanding

A

Deployment

31
Q

Which activity is part of the Data Preparation phase?

Identifying relevant data for the problem description
Evaluating the model
Applying machine learning algorithms
Filtering outliers and redundancies

A

Filtering outliers and redundancies

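Two Data Preparation activities from these cards, filling in missing values and filtering outliers, can be sketched in plain Python. This sketch assumes missing values are stored as `None`, fills them with the median, and drops outliers with the common 1.5 × IQR rule; the cards do not prescribe either choice, and `prepare` and the readings are hypothetical.

```python
import statistics


def prepare(values):
    """Fill missing values with the median, then drop outliers using the 1.5 x IQR rule."""
    present = [v for v in values if v is not None]
    fill = statistics.median(present)
    filled = [fill if v is None else v for v in values]

    # Quartiles via statistics.quantiles (default method is "exclusive")
    q1, _, q3 = statistics.quantiles(filled, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in filled if low <= v <= high]


raw = [10, 12, None, 11, 13, 12, 95, 11]  # hypothetical readings; None marks a missing value
print(prepare(raw))  # [10, 12, 12, 11, 13, 12, 11]
```

The median is used for filling because, unlike the mean, it is not dragged upward by the outlier (95) that is removed in the next step.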
32
Q

What type of data can be found in a Temporal, Sequence or Time-Series Database?

Static data
Aggregated data
Time-based data
Categorical data

A

Time-based data

33
Q

Which phase involves selecting the related data from many available databases to correctly describe a given business task?

Data Understanding
Evaluation
Data Preparation
Modelling

A

Data Understanding

34
Q

What is the definition of Mean?

The range of values in a dataset
The average value of a dataset
The middle value in a dataset
The most frequently occurring value in a dataset

A

The average value of a dataset

35
Q

How is the Mean calculated?

By identifying the most frequent value
By summing all values and dividing by the number of values
By subtracting the smallest value from the largest value
By finding the middle value

A

By summing all values and dividing by the number of values

36
Q

What does the Median represent?

The most frequently occurring value in a dataset
The middle value when arranged in order
The difference between the highest and lowest values
The average value of a dataset

A

The middle value when arranged in order

37
Q

Which measure of central tendency can have multiple values?

Median
Mean
Range
Mode

A

Mode

38
Q

What is the primary purpose of measures of central tendency?

Measuring dispersion
Solving equations
Calculating probability
Organizing, summarizing, and visualizing data

A

Organizing, summarizing, and visualizing data

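A minimal sketch of the three measures of central tendency from the cards above, using Python's standard `statistics` module on a hypothetical data set. `statistics.multimode` returns every value tied for most frequent, which is why the mode can have more than one value.

```python
import statistics

data = [4, 9, 11, 13, 15, 15, 30]  # hypothetical sample

# Mean: sum of all values divided by the number of values
mean = sum(data) / len(data)

# Median: middle value once the data are sorted (average of the two middle values if n is even)
median = statistics.median(data)
even_median = statistics.median([1, 2, 3, 4])  # 2.5

# Mode: most frequent value(s); multimode returns every value tied for most frequent
modes = statistics.multimode(data)
tied_modes = statistics.multimode([1, 1, 2, 2, 3])  # [1, 2]

print(round(mean, 2), median, modes)  # 13.86 13 [15]
```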
39
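Q

Formula for the mean of population data

A

μ = Σx / N, where Σx is the sum of all values in the population and N is the population size
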
40
Q

Formula for the mean of sample data

A

x̄ = Σx / n, where Σx is the sum of all values in the sample and n is the sample size

41
Q

What is the midrange of the data set 11, 13, 4, 30, 9, 15?

15
17
16
18

A

17

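The midrange is the average of the smallest and largest values, so the card's answer can be checked directly: (4 + 30) / 2 = 17. `midrange` is a hypothetical helper name.

```python
def midrange(values):
    """Midrange = (smallest value + largest value) / 2."""
    return (min(values) + max(values)) / 2


print(midrange([11, 13, 4, 30, 9, 15]))  # 17.0
```

Because it depends only on the two extremes, the midrange is very sensitive to outliers; here the single value 30 pulls it above every value except 30 itself.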
42
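Q

Median formula for grouped datasets

A

Median = L + ((N/2 − CF) / f) × h, where L is the lower boundary of the median class, N is the total frequency, CF is the cumulative frequency of the classes before the median class, f is the frequency of the median class, and h is the class width
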
43
Q

What does the SUM function do?

Calculates the mean value of a dataset
Adds a range of cells
Returns the median value of a dataset
Returns the maximum value of a dataset

A

Adds a range of cells

44
Q

Which function would you use to calculate the arithmetic average of a range of cells?

SUMIF
AVERAGE
MEDIAN
MAX

A

AVERAGE

45
Q

For finding the smallest value in your data set, which function will you use?

AVERAGE
MIN
SUMIF
MAX

A

MIN

46
Q

Given the class boundaries 50-60, 60-70, 70-80, 80-90, and 90-100 with frequencies 5, 12, 9, 6, and 4 respectively, what is the total frequency (N)?

30
36
45
40

A

36

47
Q

What is the midpoint (d_i) of the class 70-80?

70
80
65
75

A

75

48
Q

Calculate the product of the midpoint and frequency for the class 80-90.

600
570
540
510

A

510

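The grouped-data cards can be checked against the same class table (boundaries 50-60 through 90-100, frequencies 5, 12, 9, 6, 4). The sketch computes N, the class midpoints d_i, and the products f_i × d_i, then, as an extension the cards only hint at, the grouped mean Σ f_i d_i / N and the grouped median from Median = L + ((N/2 - CF) / f) × h, where L is the lower boundary of the median class, CF the cumulative frequency before it, f its frequency, and h the class width. The helper names are hypothetical.

```python
# Class boundaries and frequencies from the grouped-data cards
classes = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
freqs = [5, 12, 9, 6, 4]

N = sum(freqs)  # total frequency
midpoints = [(lower + upper) / 2 for lower, upper in classes]  # d_i
products = [d * f for d, f in zip(midpoints, freqs)]  # f_i * d_i

grouped_mean = sum(products) / N


def grouped_median(classes, freqs):
    """Median = L + ((N/2 - CF) / f) * h for the class containing the N/2-th value."""
    n = sum(freqs)
    cumulative = 0  # CF: frequency of all classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= n / 2:
            return lower + ((n / 2 - cumulative) / f) * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency table")


print(N)                                         # 36
print(midpoints[2])                              # 75.0
print(products[3])                               # 510.0
print(round(grouped_mean, 2))                    # 72.78
print(round(grouped_median(classes, freqs), 2))  # 71.11
```

Here N/2 = 18 first falls in the 70-80 class (cumulative frequency 17 before it, 26 after), so the median is 70 + ((18 - 17) / 9) × 10 ≈ 71.11.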
49
Q

Match each question to its corresponding statistical method.

Which factors move together?
Are there differences in distribution?
Are two populations similar?

A

Which factors move together? → Correlation Coefficient
Are there differences in distribution? → Categorical Distribution
Are two populations similar? → Analysis of Variance (ANOVA, F-Test)

50
Q

What branch of statistics involves using sample data to make conclusions or predictions about a larger population?

Inferential Statistics
Descriptive Statistics
Non-parametric Statistics
Bayesian Statistics

A

Inferential Statistics

51
Q

Which method measures the linear relationship between two numerical variables?

Pearson Correlation Coefficient
Chi-Square Test
ANOVA
T-test

A

Pearson Correlation Coefficient

52
Q

What does the F-Test in ANOVA compare?

Variances within and between groups
Means of two samples
Medians of two samples
Standard deviations of two samples

A

Variances within and between groups

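The F statistic in one-way ANOVA is the ratio of the between-group mean square to the within-group mean square: F = [SS_between / (k - 1)] / [SS_within / (N - k)] for k groups and N observations in total. The sketch computes it in plain Python on three hypothetical groups; `one_way_anova_f` is a hypothetical name, and a library routine such as `scipy.stats.f_oneway` would also return the p-value.

```python
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: spread of each group mean around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each value around its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical scores for three groups
print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```

A large F means the group means differ by much more than the scatter inside each group would suggest.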
53
Q

Which technique is NOT used for hypothesis testing?

Predictive Modeling
Z-test
T-test
Chi-Square Test

A

Predictive Modeling

54
Q

What does the significance level (α) indicate in hypothesis testing?

The probability of rejecting the null hypothesis when it is true.
The maximum allowed sample size.
The minimum sample size for an accurate test.
The variance within a sample.

A

The probability of rejecting the null hypothesis when it is true.

55
Q

What is the significance level (α) commonly set at?

0.05 or 5%
0.10 or 10%
0.01 or 1%
0.20 or 20%

A

0.05 or 5%

56
Q

In the given example, which two categorical variables are being tested for association?

Gender (male/female) and smoking status (smoker/non-smoker)
Age group and education level
Income level and exercise frequency
Ethnicity and diet preference

A

Gender (male/female) and smoking status (smoker/non-smoker)

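Testing two categorical variables for association is typically done with a chi-square test of independence (the card does not name the test, so this is an assumption). The sketch computes the Pearson chi-square statistic for a 2 × 2 table of hypothetical counts, using expected counts of (row total × column total) / grand total, and compares it with 3.841, the table value for 1 degree of freedom at α = 0.05. `chi_square_statistic` is a hypothetical helper.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows = male, female; columns = smoker, non-smoker
observed = [[30, 20],
            [15, 35]]

chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (rows - 1) * (columns - 1)
CRITICAL_VALUE = 3.841  # chi-square table value, dof = 1, alpha = 0.05

print(round(chi2, 2), dof, chi2 > CRITICAL_VALUE)  # 9.09 1 True
```

Since 9.09 exceeds 3.841, these hypothetical counts would lead to rejecting the null hypothesis that gender and smoking status are independent.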
57
Q

A T-test is a statistical test used to determine whether there is a significant difference between sample and population means, or between the means of two samples.

True
False

A

True

58
Q

Use a Z-test when the population standard deviation (σ) is unknown and must be estimated from the sample.

True
False

A

False

59
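Q

What is the formula for the Pearson Correlation Coefficient?

A

r = [nΣxy − (Σx)(Σy)] / √([nΣx² − (Σx)²][nΣy² − (Σy)²]), where n is the number of (x, y) pairs
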
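A plain-Python sketch of the computational form of Pearson's r, r = [nΣxy - (Σx)(Σy)] / √([nΣx² - (Σx)²][nΣy² - (Σy)²]). Perfectly increasing and perfectly decreasing linear data should give 1 and -1; `pearson_r` and the sample data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient via the computational (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

The denominator is zero when either variable is constant, in which case r is undefined and this sketch raises `ZeroDivisionError`.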
60
Q

If you know the population standard deviation and have a large sample size (n > 30), you can use a Z-test for comparing means.

True
False

A

True

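When σ is known, the one-sample z statistic is z = (x̄ - μ₀) / (σ / √n). The sketch uses hypothetical numbers and `statistics.NormalDist` (Python 3.8+) for a two-sided p-value, then compares it with α = 0.05; `one_sample_z_test` is a hypothetical name.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for a mean test when the population SD is known."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: n = 36 observations, known sigma = 12
z, p = one_sample_z_test(sample_mean=104, pop_mean=100, pop_sd=12, n=36)
alpha = 0.05
print(z, round(p, 4), p < alpha)  # 2.0 0.0455 True
```

With p ≈ 0.0455 < 0.05, the null hypothesis μ = 100 would be rejected at the 5% level.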
61
Q

If the population standard deviation is unknown or the sample size is small (n < 30), use a t-test to compare means.

True
False

A

True

62
Q

If the test statistic is greater than the critical t-value, we reject the null hypothesis.

True
False

A

False

63
Q

In the given example, the calculated t-value of -4.22 is less than the critical t-value of -2.821, so we reject the null hypothesis.

True
False

A

True

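The decision rule for a left-tailed test, reject H₀ when the t statistic falls below the negative critical value, can be sketched with a one-sample t-test, t = (x̄ - μ₀) / (s / √n) with n - 1 degrees of freedom. The card's own data are not shown, so the sample below is hypothetical; -2.821 is the t-table value for a one-tailed test at α = 0.01 with 9 degrees of freedom, which matches the 10-value sample used here. `one_sample_t` is a hypothetical helper.

```python
import math
import statistics


def one_sample_t(sample, pop_mean):
    """Return (t, degrees of freedom) with t = (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - pop_mean) / (s / math.sqrt(n)), n - 1


sample = [48, 47, 49, 46, 50, 45, 47, 48, 46, 44]  # hypothetical measurements
t_stat, df = one_sample_t(sample, pop_mean=50)
critical = -2.821  # left-tailed, alpha = 0.01, df = 9 (t-table value)

print(round(t_stat, 3), df, t_stat < critical)  # -5.196 9 True
```

For a left-tailed test the comparison is "less than" the negative critical value, which is why "greater than the critical value" is not a universal rejection rule.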
64
Q

Match the following types of analytics with their corresponding questions.

Descriptive Analytics
Diagnostic Analytics
Predictive Analytics

A

Descriptive Analytics → What has happened, or what is happening now?
Diagnostic Analytics → Why did it happen?
Predictive Analytics → What will likely happen?

65
Q

Which of the following activities are associated with Data Exploration?

A) Data cleaning
B) Data augmentation and transformation
C) Exploratory data analysis
D) Feature selection
E) Identify data dependencies and correlations
F) Identify trends or anomalies in the data

C, E, F
A, B, D
B, D, F
A, C, E

A

C, E, F

66
Q

Which of the following activities are associated with Data Exploration? Choose 3 correct answers.

Identify data dependencies and correlations
Identify trends or anomalies in the data
Exploratory data analysis
Data cleaning
Feature selection
Data augmentation and transformation

A

Identify data dependencies and correlations
Identify trends or anomalies in the data
Exploratory data analysis

67
Q

Which of the following activities are associated with Data Modification?

Data cleaning
Data augmentation and transformation
Exploratory data analysis
Feature selection
Identify data dependencies and correlations
Identify trends or anomalies in the data

A

Data cleaning
Data augmentation and transformation
Feature selection

68
Q

Which process involves removing or correcting errors in the data?

Data cleaning
Data augmentation
Data transformation
Feature selection

A

Data cleaning

69
Q

What is the purpose of Feature Selection?

To reduce the number of variables for modeling
To identify trends in the data
To enhance the data with additional information
To clean the data

A

To reduce the number of variables for modeling

70
Q

Which activity involves adding new data points or modifying existing ones to improve the dataset?

Data augmentation
Data cleaning
Exploratory data analysis
Feature selection

A

Data augmentation

71
Q

Which of the following is NOT typically a part of Data Exploration?

Cleaning the data
Identifying data dependencies
Identifying trends in the data
Exploratory data analysis

A

Cleaning the data

72
Q

Which activity is crucial for understanding the relationships between different variables in a dataset?

Identifying data dependencies and correlations
Data cleaning
Data augmentation
Feature selection

A

Identifying data dependencies and correlations

73
Q

Can you already use the model for prediction purposes?

No, you still need to investigate the model’s goodness-of-fit.
Yes, the model is ready for predictions.

A

No, you still need to investigate the model’s goodness-of-fit.

74
Q

What do you need to prove before using the model for predictions?

Whether your predictors are significant
The model's accuracy

A

Whether your predictors are significant

75
Q

Simple Linear Regression: match each symbol to its meaning.

y
β
x
α
ε

A

y → dependent variable
β → beta coefficient
x → independent variable
α → alpha (intercept)
ε → error term

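For the model y = α + βx + ε, ordinary least squares estimates the slope as β = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and the intercept as α = ȳ - βx̄; the residuals y - (α + βx) estimate the error term ε. The TV ad spend and sales numbers below are hypothetical, and `fit_simple_linear_regression` is a hypothetical helper.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (alpha, beta) and residuals for y = alpha + beta * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx              # slope (beta coefficient)
    alpha = y_bar - beta * x_bar  # intercept
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]  # estimates of the error term
    return alpha, beta, residuals


# Hypothetical TV ad spend (x, independent) and sales (y, dependent)
tv_spend = [1, 2, 3, 4, 5]
sales = [3, 6, 7, 10, 11]
alpha, beta, residuals = fit_simple_linear_regression(tv_spend, sales)
print(round(alpha, 2), round(beta, 2))  # 1.4 2.0
```

With an intercept in the model, least-squares residuals always sum to (approximately) zero, which is a quick sanity check on the fit.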
76
Q

Which of the following methods is best for visualizing the relationship between TV ad spend and sales?

Scatter plot
Line graph
Bar chart
Pie chart

A

Scatter plot

77
Q

What does ANOVA stand for?

Analysis of Variance
Analysis of Variables
Analysis of Values
Analysis of Vectors

A

Analysis of Variance

78
Q

In ANOVA, what does the explained variability represent?

The amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model
The total variation in the response variable
The amount of variation that cannot be explained by the model
The amount of variation attributed to random error

A

The amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model

79
Q

Which part of the variation does ANOVA decompose?

Both explained and unexplained variability
Only the explained variability
Only the unexplained variability
Neither explained nor unexplained variability

A

Both explained and unexplained variability

80
Q

Why is ANOVA used in statistical analysis?

To compare the means of different groups
To measure the central tendency of data
To determine the correlation between variables
To visualize data distributions

A

To compare the means of different groups

81
Q

In multiple regression, what is the purpose of including multiple independent variables?

To improve the prediction accuracy by accounting for more factors
To increase the complexity of the model
To ensure the residuals are normally distributed
To reduce the sample size

A

To improve the prediction accuracy by accounting for more factors

82
Q

Which of the following is a key assumption of linear regression?

The residuals are normally distributed
The relationship between the independent and dependent variables is non-linear
The independent variables are highly correlated
The dependent variable is categorical

A

The residuals are normally distributed

83
Q

Which of the following libraries are used for mathematical and statistical operations on multi-dimensional arrays and matrices in Python?

NumPy
Pandas
Matplotlib

A

NumPy

84
Q

Which of the following libraries are used for data visualization in Python?

Matplotlib
SciPy
NumPy

A

Matplotlib

85
Q

Which of the following libraries are used for sorting, grouping, and rearranging data in Python?

Pandas
NumPy
SciPy
Matplotlib

A

Pandas

86
Q

Which of the following libraries are used for processing large multidimensional arrays and matrices in Python?

SciPy
Pandas
PyTorch

A

SciPy

87
Q

Which of the following libraries are used for deep learning in Python?

TensorFlow
Keras
Scikit-learn

A

TensorFlow

88
Q

Which of the following libraries are used for natural language processing in Python?

NLTK
Scrapy
Scikit-learn

A

NLTK

89
Q

Which of the following libraries are used for data scraping in Python?

Scrapy
Gensim
NLTK
Pandas

A

Scrapy

90
Q

Which of the following libraries are used for efficient learning of word representations in Python?

Gensim
Scrapy
NLTK

A

Gensim

91
Q

Which of the following libraries are used for creating spider bots that scan website pages and collect structured data in Python?

Scrapy
SciPy
Pandas

A

Scrapy

92
Q

Which of the following libraries are used for object identification, speech recognition, and more in Python?

PyTorch
Keras
Dist-keras

A

PyTorch

93
Q

Which of the following libraries are used for reading data, selecting and filtering data, and data manipulation in Python? There are two correct answers among the options; choose one.

NumPy
Pandas
SciPy
PyTorch

A

NumPy, Pandas

94
Q

Which of the following libraries are used for creating two-dimensional diagrams and graphs in Python?

Matplotlib
NumPy
SciPy
Seaborn

A

Matplotlib

95
Q

Which of the following libraries are used for creating interactive and scalable visualizations in a browser using JavaScript widgets in Python?

A

Plotly, Bokeh

96
Q

Which Python libraries are built on NumPy? There are two correct answers among the options; choose one.

Pandas
Scikit-Learn
Seaborn
Matplotlib

A

Pandas, Scikit-Learn

97
Q

Which Python library provides machine learning algorithms?

Scikit-Learn
NumPy
Matplotlib
Pandas

A

Scikit-Learn

98
Q

Which data type in Pandas corresponds to a column with mixed data types?

object
int64
float64
timedelta64[ns]

A

object
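A quick way to see the `object` dtype is to build a small DataFrame in which one column mixes integers, strings, and floats, then inspect `DataFrame.dtypes`. The column names are hypothetical, and this sketch assumes pandas is installed.

```python
import pandas as pd

df = pd.DataFrame({
    "mixed": [1, "two", 3.0],     # int, str and float in one column -> object
    "counts": [1, 2, 3],          # whole numbers only -> an integer dtype (int64)
    "prices": [1.5, 2.5, 3.5],    # decimals only -> float64
})

print(df.dtypes)
```

Because an `object` column stores arbitrary Python objects, vectorized numeric operations on it are slower and may fail; converting with `pd.to_numeric(..., errors="coerce")` is a common cleanup step when the mix is accidental.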