Sir Kyle Flashcards

(55 cards)

1
Q

What is data analytics?
a) The process of collecting and organizing data
b) The process of analyzing data to make decisions
c) The process of creating data
d) The process of deleting outdated data

A

b

2
Q

Why is data analytics important for businesses?
a) It helps in predicting market trends
b) It provides insights for better decision-making
c) It identifies business performance issues
d) All of the above

A

d

2
Q

Which type of analytics predicts future outcomes based on data?
a) Descriptive analytics
b) Diagnostic analytics
c) Predictive analytics
d) Prescriptive analytics

A

c

2
Q

What is descriptive analytics?
a) Analytics that explains what has happened
b) Analytics that predicts what will happen
c) Analytics that determines why something happened
d) Analytics that recommends actions to take

A

a

3
Q

Which of these is an example of structured data?
a) Social media posts
b) Email contents
c) Customer database with names and phone numbers
d) Images stored in a file

A

c

4
Q

What type of data visualization is best suited for showing parts of a whole?
a) Line chart
b) Pie chart
c) Scatter plot
d) Histogram

A

b

5
Q

Big Data refers to datasets that are…
a) Easy to store and manage
b) Too large and complex for traditional data-processing methods
c) Small but require a lot of computation
d) Structured and easy to analyze

A

b

6
Q

What is the purpose of A/B testing in data analytics?
a) To compare two versions of a product or feature to determine which performs better
b) To clean data
c) To automate the analysis process
d) To visualize complex data

A

a
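A minimal sketch of how such a comparison might be scored, assuming each version is judged by its conversion rate. The visitor and conversion counts are hypothetical, and the two-proportion z-test is one common choice among several, not something the card itself prescribes.

```python
import math

def conversion_rate(conversions, visitors):
    """Share of visitors who converted."""
    return conversions / visitors

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / standard_error

# Hypothetical results: version A and version B each shown to 1000 visitors.
rate_a = conversion_rate(100, 1000)
rate_b = conversion_rate(130, 1000)
z = two_proportion_z(100, 1000, 130, 1000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}")
```

As a rule of thumb, an absolute z value above about 1.96 is usually read as a difference that is unlikely to be chance alone at the 5% level.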

6
Q

Which of the following describes prescriptive analytics?
a) Provides insights into why things happened
b) Describes what is happening in real-time
c) Recommends actions based on data analysis
d) Predicts future trends

A

c

7
Q

What is the significance of data governance in analytics?
a) To regulate the storage of data
b) To ensure data privacy, security, and compliance
c) To visualize large datasets
d) To improve the speed of data processing

A

b

8
Q

Which of the following is a type of data analytics?
A) Predictive Analytics
B) Descriptive Analytics
C) Prescriptive Analytics
D) All of the above

A

d

9
Q

What type of data is “Gender” in a dataset?
A) Quantitative
B) Qualitative
C) Continuous
D) Interval

A

b

10
Q

Which chart is most commonly used to show trends over time?
A) Pie Chart
B) Bar Chart
C) Line Chart
D) Scatter Plot

A

c

11
Q

In data cleaning, which process removes duplicate values in a dataset?
A) Normalization
B) Deduplication
C) Data Merging
D) Standardization

A

b

13
Q

Which of the following is NOT a form of data visualization?
a) Bar Chart
b) Line Graph
c) Base Graph
d) Scatter Plot

A

c

14
Q

Which chart is best suited for showing the distribution of data across different categories?
a) Line chart
b) Pie chart
c) Bar chart
d) Scatter plot

A

c

15
Q

When creating a histogram, the X-axis represents:
a) Data frequency
b) Data values or ranges
c) Percentages
d) None of the above

A

b

16
Q

In a histogram, what does the height of each bar represent?
a) The sum of data values in that range
b) The frequency or count of data in a specific range
c) The total data collected
d) The average of the data points in that bin

A

b
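To make the two histogram cards concrete, here is a small sketch that bins values into fixed-width ranges (the X-axis) and counts how many values fall in each (the bar height). The ages and the bin width of 10 are hypothetical, and the helper assumes integer data.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Map each bin's lower edge to the number of values that fall in that bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical integer ages, grouped into ranges of width 10.
ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 58]
bins = histogram_counts(ages, 10)

for lower, count in bins.items():
    # Each '#' is one observation, so the bar length is the frequency.
    print(f"{lower}-{lower + 9}: {'#' * count}")
```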

17
Q

If the bars in a histogram are skewed to the right, what does this indicate about the distribution of the data?
a) Symmetric distribution
b) Positively skewed distribution
c) Negatively skewed distribution
d) Uniform distribution

A

b

17
Q

Which measure of central tendency is most affected by outliers?
a) Mean
b) Median
c) Mode
d) All are equally affected

A

a
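A quick way to see why the mean is the sensitive one: add a single extreme value to a small dataset and compare how far each measure moves. The salary figures below are hypothetical.

```python
import statistics

# Hypothetical salaries, in thousands.
salaries = [30, 32, 35, 36, 40]
with_outlier = salaries + [400]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

With these numbers the mean jumps from 34.6 to 95.5, while the median only moves from 35 to 35.5.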

18
Q

The median is defined as:
a) The average of all values
b) The most frequently occurring value
c) The middle value when data is ordered
d) The range of the dataset

A

c
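The definition can be sketched directly in code. This hand-written `median` helper is for illustration only (Python's standard library already provides `statistics.median`), and it follows the usual convention of averaging the two middle values when the count is even.

```python
def median(values):
    """Middle value of the data once it is sorted."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))     # sorted: 1, 5, 7
print(median([7, 1, 5, 3]))  # sorted: 1, 3, 5, 7
```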

19
Q

When would the median be a better measure of central tendency than the mean?
a) When data is symmetrically distributed
b) When data has outliers or is skewed
c) When data is categorical
d) When data contains repeated values

A

b

19
Q

What does the mean of a dataset represent?
a) The most frequently occurring value
b) The value that divides the data into two equal parts
c) The average of all data points
d) The value with the highest frequency

A

c

20
If the mean and median of a dataset are equal, what type of distribution does the data likely have?
a) Skewed to the left
b) Skewed to the right
c) Relatively symmetric
d) Uniform distribution
c
21
Which measure of central tendency divides the dataset into two equal parts?
a) Mean
b) Median
c) Mode
d) Interquartile range
b
22
In a dataset where the mean is greater than the median, what can you infer about the shape of the distribution?
a) It is symmetric
b) It is positively skewed (right-skewed)
c) It is negatively skewed (left-skewed)
d) It is normally distributed
b
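The mean-versus-median comparison can be turned into a rough check. The `skew_hint` helper below is a hypothetical name, the sample data are made up, and the rule is only a rule of thumb: unusual distributions can break it.

```python
import statistics

def skew_hint(values):
    """Guess the skew direction by comparing the mean with the median."""
    mean = statistics.mean(values)
    med = statistics.median(values)
    if mean > med:
        return "right-skewed (positive)"
    if mean < med:
        return "left-skewed (negative)"
    return "roughly symmetric"

print(skew_hint([1, 2, 2, 3, 12]))     # a long right tail pulls the mean up
print(skew_hint([1, 10, 11, 11, 12]))  # a long left tail pulls the mean down
print(skew_hint([1, 2, 3, 4, 5]))      # mean and median coincide
```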
22
When analyzing income data, which measure of central tendency is typically preferred and why?
a) Mean, because it includes all data values
b) Median, because it is less influenced by extreme outliers
c) Mode, because it represents the most common income level
d) Mean, because it minimizes the impact of variance
b
22
In a dataset with outliers, why might the median be a better measure of central tendency than the mean?
a) The median reflects all values in the dataset
b) The mean is distorted by extreme values, while the median is not
c) The mode is more reliable than the mean
d) The mean and median are always equal
b
23
1. Questions
2. Data Collection
3. Data Cleaning
4. Data Analysis
5. Data Interpretation
Data Analytics Workflow
23
Why are measures of central tendency important for summarizing large datasets?
a) They reduce the complexity of data by providing a single representative value
b) They eliminate the need to analyze individual data points
c) They measure the spread of the data
d) They provide insight into data variability
a
23
In 1965, Intel co-founder ____ predicted that the number of transistors on a chip would double roughly every two years, with a minimal rise in cost.
Gordon Moore
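The doubling described on this card is plain exponential growth, which a few lines of arithmetic can illustrate. The starting count of 1000 is a hypothetical round number, not a historical figure, and `projected_transistors` is an illustrative name.

```python
def projected_transistors(start_count, years, doubling_period=2):
    """Count after `years`, assuming one doubling every `doubling_period` years."""
    return start_count * 2 ** (years / doubling_period)

# Ten years at one doubling every two years is five doublings: 2**5 = 32x.
print(projected_transistors(1000, 10))
```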
24
“I would expect that next year, people will share twice as much information as they share this year, and next year, they will be sharing twice as much as they did the year before”
Mark Zuckerberg
25
A characteristic of members of a population, e.g., market share, revenue, season, Bike_Rentals, temperature, date, weather condition
Variables
25
Observations can be named without particular order or ranking imposed on the data. Words, letters, and even numbers are used to classify the data
Nominal Value
25
Observations of a variable, e.g., 11%, $225M, summer, 985, 23.5˚, 1/12/2011, McDonald's
Data
25
Contains variables and observations, arranged as an array (rows and columns)
Data Set
26
Indicates an actual amount (numerical). The order of, and the difference between, values can be known. Its limitation is that it has no “true zero”.
Interval Level
26
The degree to which all required data is known.
Completeness
26
Describes ranking or order. The difference or ratio between rankings may not always be the same.
Ordinal Value
26
It has the same properties as the interval level. The order and difference can be described. Additionally, it has a true zero and the ratio between two points has a meaning
Ratio Level
26
Accuracy. Ensure your data is close to the true values (real-world objects it represents).
Validity. Whether the data measures what it is supposed to measure.
Completeness. The degree to which all required data is known.
Consistency. Ensure your data is consistent within the same dataset and/or across multiple data sets.
Uniformity. The degree to which the data is specified using the same unit of measure.
DATA QUALITY DIMENSIONS
26
Ensure your data is close to the true values (real-world objects it represents).
Accuracy
26
Whether the data measures what it is supposed to measure.
Validity
27
Right (positively) skewed: the right tail is longer; data values extend to the right.
Skewed to the Right
27
Ensure your data is consistent within the same dataset and/or across multiple data sets.
Consistency
27
Gather data from various sources, such as databases, files, APIs, or surveys. Ensure that the data collected is relevant to your research question or analysis objectives.
Data Collection and Acquisition
27
The degree to which the data is specified using the same unit of measure.
Uniformity
27
Examine the raw data to get a sense of its structure and contents. Check for missing values, outliers, and anomalies that may require attention.
Data Inspection
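A rough sketch of what an inspection pass might look like with only the standard library: count missing fields, then flag values far outside the interquartile range. The sample records, the `None`-means-missing convention, and the 1.5 × IQR cutoff are all assumptions made for illustration.

```python
import statistics

def count_missing(records):
    """Count, per field, how many records have no value (None)."""
    missing = {}
    for rec in records:
        for field, value in rec.items():
            if value is None:
                missing[field] = missing.get(field, 0) + 1
    return missing

def iqr_outliers(values):
    """Values lying more than 1.5 * IQR beyond the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical raw data.
records = [{"age": 34, "city": None}, {"age": None, "city": None}]
readings = [10, 12, 11, 13, 12, 95]

print(count_missing(records))
print(iqr_outliers(readings))
```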
27
Address missing data by deciding whether to fill in missing values or remove records with missing values. Correct any data entry errors, inconsistencies, outliers, or duplicated records. Standardize data formats (e.g., date formats, data types) to ensure consistency.
Data Cleansing
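The three cleansing steps on this card (duplicates, missing values, format standardization) might look like this in a minimal sketch. The records, field names, accepted date formats, and the choice to fill a missing age with a supplied default are hypothetical. Treating `id` as the duplicate key is also an assumption.

```python
from datetime import datetime

# Hypothetical raw records with a duplicate, a missing age, and mixed date formats.
raw = [
    {"id": 1, "name": "Ana", "joined": "2011-01-12", "age": 34},
    {"id": 2, "name": "Ben", "joined": "12/01/2011", "age": None},
    {"id": 1, "name": "Ana", "joined": "2011-01-12", "age": 34},
]

def parse_date(text):
    """Standardize a date string to ISO format, trying each assumed input format."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def cleanse(records, default_age):
    seen = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen:       # drop duplicated records
            continue
        seen.add(rec["id"])
        fixed = dict(rec)
        fixed["joined"] = parse_date(rec["joined"])  # one date format throughout
        if fixed["age"] is None:    # fill in the missing value
            fixed["age"] = default_age
        cleaned.append(fixed)
    return cleaned

clean = cleanse(raw, default_age=34)
print(clean)
```

Whether to fill a gap or drop the record is a judgment call that depends on the analysis; the sketch only shows the fill branch.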
27
Encode categorical variables into numerical format using techniques like one-hot encoding or label encoding. Normalize or scale numerical features if necessary to bring them to a common scale.
Data Transformation
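A small sketch of the techniques named on this card, written with plain Python lists so the mechanics are visible. The helper names and sample values are hypothetical, and min-max scaling is just one of several ways to bring features to a common scale.

```python
def label_encode(values):
    """Replace each category with an integer code (alphabetical order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]

def one_hot(values):
    """Return one 0/1 column per category, plus the column order used."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

def min_max_scale(values):
    """Rescale numbers linearly so the smallest is 0.0 and the largest is 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical data.
seasons = ["summer", "winter", "summer", "spring"]
labels = label_encode(seasons)
encoded, columns = one_hot(seasons)
scaled = min_max_scale([10, 20, 40])

print(columns, encoded)
print(labels, scaled)
```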
27
Combine data from multiple sources if needed, ensuring that there are common identifiers to merge the data correctly.
Data Integration
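Merging on a common identifier can be sketched as a simple inner join. The two tables and the `customer_id` key below are hypothetical; rows without a match on the other side are dropped, which is one merge policy among several.

```python
def inner_join(left, right, key):
    """Combine rows from two sources that share the same value for `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged

# Hypothetical sources sharing the identifier "customer_id".
customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
orders = [{"customer_id": 1, "total": 25.0}, {"customer_id": 3, "total": 9.5}]

merged = inner_join(customers, orders, "customer_id")
print(merged)
```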
27
Create visualizations to explore the data further and identify patterns, relationships, or outliers. Visualization helps in understanding the data's characteristics and guiding further analysis.
Data Visualization
27
Data Creation/Collection
Data Ingestion (ETL)
Data Storage
Data Presentation and Visualization
Data Sharing and Distribution
Data Archiving and Retention
Data Backup and Disaster Recovery
Data Deletion and Disposal
Data Life Cycle
27
Left (negatively) skewed: the left tail is longer; data values extend to the left.
Skewed to the Left