Sir Kyle Flashcards by Jep Serrato

What is data analytics?
a) The process of collecting and organizing data
b) The process of analyzing data to make decisions
c) The process of creating data
d) The process of deleting outdated data

Why is data analytics important for businesses?
a) It helps in predicting market trends
b) It provides insights for better decision-making
c) It identifies business performance issues
d) All of the above

Which type of analytics predicts future outcomes based on data?
a) Descriptive analytics
b) Diagnostic analytics
c) Predictive analytics
d) Prescriptive analytics

What is descriptive analytics?
a) Analytics that explains what has happened
b) Analytics that predicts what will happen
c) Analytics that determines why something happened
d) Analytics that recommends actions to take

Which of these is an example of structured data?
a) Social media posts
b) Email contents
c) Customer database with names and phone numbers
d) Images stored in a file

What type of data visualization is best suited for showing parts of a whole?
a) Line chart
b) Pie chart
c) Scatter plot
d) Histogram

Big Data refers to datasets that are…
a) Easy to store and manage
b) Too large and complex for traditional data-processing methods
c) Small but require a lot of computation
d) Structured and easy to analyze

What is the purpose of A/B testing in data analytics?
a) To compare two versions of a product or feature to determine which performs better
b) To clean data
c) To automate the analysis process
d) To visualize complex data
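
As a minimal sketch of how an A/B comparison can be evaluated, assume each version's result is recorded as conversions out of visitors. The snippet compares the two conversion rates with a two-proportion z-test, which is one common choice rather than the only way to decide which version performs better. The helper name `ab_test` and the counts are hypothetical.

```python
import math

def ab_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of versions A and B (hypothetical helper).

    Returns (rate_a, rate_b, z), where z is the two-proportion z statistic.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate, assuming A and B actually perform the same (null hypothesis)
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return rate_a, rate_b, z

# Made-up experiment: 2,400 visitors saw each version
rate_a, rate_b, z = ab_test(conv_a=120, n_a=2400, conv_b=150, n_b=2400)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z:.2f}")
```

Here B converts better (6.25% vs 5.00%), but z is about 1.88, below the commonly used 1.96 cutoff, so the difference would usually not be called significant at the 5% level.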

Which of the following describes prescriptive analytics?
a) Provides insights into why things happened
b) Describes what is happening in real-time
c) Recommends actions based on data analysis
d) Predicts future trends

What is the significance of data governance in analytics?
a) To regulate the storage of data
b) To ensure data privacy, security, and compliance
c) To visualize large datasets
d) To improve the speed of data processing

Which of the following is a type of data analytics?
a) Predictive Analytics
b) Descriptive Analytics
c) Prescriptive Analytics
d) All of the above

What type of data is “Gender” in a dataset?
a) Quantitative
b) Qualitative
c) Continuous
d) Interval

Which chart is most commonly used to show trends over time?
a) Pie Chart
b) Bar Chart
c) Line Chart
d) Scatter Plot

In data cleaning, which process removes duplicate values in a dataset?
a) Normalization
b) Deduplication
c) Data Merging
d) Standardization
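
A minimal deduplication sketch in plain Python, assuming records are dicts with hashable values. By default two records count as duplicates only when every field matches; passing key fields (for example an ID) dedupes on those fields instead, which is common in practice. The helper name `deduplicate` and the sample records are hypothetical.

```python
def deduplicate(records, key_fields=None):
    """Return records with duplicates removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for rec in records:
        # Compare on the chosen key fields, or on every field when none are given
        fields = key_fields if key_fields else sorted(rec)
        key = tuple((f, rec.get(f)) for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

customers = [
    {"id": 1, "name": "Ana", "phone": "555-0101"},
    {"id": 2, "name": "Ben", "phone": "555-0102"},
    {"id": 1, "name": "Ana", "phone": "555-0101"},  # exact duplicate of the first record
]
print(deduplicate(customers))  # keeps only the records with id 1 and id 2
```

Keeping the first occurrence preserves the original order of the data; whether "first" or "most recent" should win is a per-dataset decision.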

Which of the following is NOT a form of data visualization?
a) Bar Chart
b) Line Graph
c) Base Graph
d) Scatter Plot

Which chart is best suited for showing the distribution of data across different categories?
a) Line chart
b) Pie chart
c) Bar chart
d) Scatter plot

When creating a histogram, the X-axis represents:
a) Data frequency
b) Data values or ranges
c) Percentages
d) None of the above

In a histogram, what does the height of each bar represent?
a) The sum of data values in that range
b) The frequency or count of data in a specific range
c) The total data collected
d) The average of the data points in that bin
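
To make the histogram cards concrete, here is a minimal sketch that groups values into equal-width ranges (bins) and counts how many values fall in each. The ranges are what the x-axis shows and the counts are the bar heights. The bin width, the sample ages, and the helper name `histogram_counts` are assumptions for illustration.

```python
import math
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per equal-width bin; returns [(low, high, count), ...]."""
    if not values:
        return []
    # A value v falls in the bin whose index is floor(v / bin_width)
    counts = Counter(math.floor(v / bin_width) for v in values)
    first, last = min(counts), max(counts)
    # Include empty bins between the first and last so gaps show as zero-height bars
    return [(i * bin_width, (i + 1) * bin_width, counts[i])
            for i in range(first, last + 1)]

ages = [21, 23, 25, 31, 34, 38, 39, 42, 55, 67]
for low, high, count in histogram_counts(ages, bin_width=10):
    print(f"[{low}, {high}): {'#' * count} {count}")
```

For this sample the bins [20, 30) through [60, 70) get counts 3, 4, 1, 1, 1; the thin tail toward older ages makes the distribution slightly right-skewed.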

If a histogram is skewed to the right, what does this indicate about the distribution of the data?
a) Symmetric distribution
b) Positively skewed distribution
c) Negatively skewed distribution
d) Uniform distribution

Which measure of central tendency is most affected by outliers?
a) Mean
b) Median
c) Mode
d) All are equally affected

The median is defined as:
a) The average of all values
b) The most frequently occurring value
c) The middle value when data is ordered
d) The range of the dataset

When would the median be a better measure of central tendency than the mean?
a) When data is symmetrically distributed
b) When data has outliers or is skewed
c) When data is categorical
d) When data contains repeated values

What does the mean of a dataset represent?
a) The most frequently occurring value
b) The value that divides the data into two equal parts
c) The average of all data points
d) The value with the highest frequency

If the mean and median of a dataset are equal, what type of distribution does the data likely have?
a) Skewed to the left
b) Skewed to the right
c) Relatively symmetric
d) Uniform distribution

Which measure of central tendency divides the dataset into two equal parts?
a) Mean
b) Median
c) Mode
d) Interquartile range

In a dataset where the mean is greater than the median, what can you infer about the shape of the distribution?
a) It is symmetric
b) It is positively skewed (right-skewed)
c) It is negatively skewed (left-skewed)
d) It is normally distributed

When analyzing income data, which measure of central tendency is typically preferred and why?
a) Mean, because it includes all data values
b) Median, because it is less influenced by extreme outliers
c) Mode, because it represents the most common income level
d) Mean, because it minimizes the impact of variance

In a dataset with outliers, why might the median be a better measure of central tendency than the mean?
a) The median reflects all values in the dataset
b) The mean is distorted by extreme values, while the median is not
c) The mode is more reliable than the mean
d) The mean and median are always equal
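
A quick worked example with Python's standard `statistics` module shows why the median is usually preferred for data such as incomes: one extreme value pulls the mean far more than the median, and the mean ending up above the median is the signature of a right-skewed distribution. The income figures are made up for illustration.

```python
import statistics

# Hypothetical annual incomes; the added value is an extreme outlier
incomes = [30_000, 32_000, 35_000, 38_000, 40_000]
with_outlier = incomes + [1_000_000]

for label, data in [("without outlier", incomes), ("with outlier", with_outlier)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean={mean:,.0f} median={median:,.0f}")
```

Without the outlier, mean and median are both 35,000. With it, the mean jumps to about 195,833 while the median only moves to 36,500.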

1. Questions
2. Data Collection
3. Data Cleaning
4. Data Analysis
5. Data Interpretation

Data Analytics Workflow

Why are measures of central tendency important for summarizing large datasets?
a) They reduce the complexity of data by providing a single representative value
b) They eliminate the need to analyze individual data points
c) They measure the spread of the data
d) They provide insight into data variability

In 1965, Intel co-founder ____ predicted that the number of transistors on a chip would double roughly every two years, with a minimal rise in cost.

Gordon Moore

“I would expect that next year, people will share twice as much information as they share this year, and next year, they will be sharing twice as much as they did the year before”

Mark Zuckerberg

A characteristic of the members of a population, e.g., market share, revenue, season, Bike_Rentals, temperature, date, weather condition

Variables

Observations are named (labeled) without any particular order or ranking imposed on the data. Words, letters, and even numbers are used to classify the data.

Nominal Level

Observations of a variable, e.g., 11%, $225M, summer, 985, 23.5°, 1/12/2011, McDonald's

Data

Contains variables and observations arranged as an array (rows and columns)

Data Set

Indicates an actual amount (numerical). The order of the values and the difference between them can be known. Its limitation is that it has no “true zero”.

Interval Level

The degree to which all required data is known.

Completeness

Describes ranking or order. The difference or ratio between rankings may not always be the same.

Ordinal Level

It has the same properties as the interval level: the order and the difference can be described. Additionally, it has a true zero, so the ratio between two values is meaningful.

Ratio Level

Accuracy. Ensure your data is close to the true values (the real-world objects it represents).
Validity. The data measures what it is supposed to measure.
Completeness. The degree to which all required data is known.
Consistency. Ensure your data is consistent within the same dataset and/or across multiple datasets.
Uniformity. The degree to which the data is specified using the same unit of measure.

DATA QUALITY DIMENSIONS
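
A minimal sketch of measuring two of these dimensions, assuming records are dicts and that `None` or an empty string marks a missing value. Completeness is computed as the share of non-missing values in a field, and uniformity as a simple check that every value carries the same unit suffix. The field names, the unit convention, and the helper names `completeness` and `is_uniform` are hypothetical.

```python
def completeness(records, field):
    """Fraction of records whose `field` is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records) if records else 0.0

def is_uniform(values, unit):
    """True if every value is written with the same unit suffix, e.g. 'kg'."""
    return all(str(v).strip().endswith(unit) for v in values)

rows = [
    {"name": "Ana", "weight": "60 kg"},
    {"name": "Ben", "weight": "154 lb"},  # different unit: breaks uniformity
    {"name": "", "weight": "72 kg"},      # blank name: lowers completeness
    {"name": "Dee", "weight": None},      # missing weight
]
print(completeness(rows, "name"))  # 0.75
print(is_uniform([r["weight"] for r in rows if r["weight"]], "kg"))  # False
```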

Ensure your data is close to the true values (real-world objects it represents).

Accuracy

The degree to which the data measures what it is supposed to measure.

Validity

Right (positively) skewed: the right tail is longer; values of the data extend farther to the right.

Skewed to the Right

Ensure your data is consistent within the same dataset and/or across multiple data sets.

Consistency

Gather data from various sources, such as databases, files, APIs, or surveys. Ensure that the data collected is relevant to your research question or analysis objectives.

Data Collection and Acquisition

The degree to which the data is specified using the same unit of measure.

Uniformity

Examine the raw data to get a sense of its structure and contents. Check for missing values, outliers, and anomalies that may require attention.

Data Inspection
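
A minimal inspection sketch, assuming a numeric column that may contain `None` for missing entries. It counts the missing values and flags outliers with the 1.5 × IQR rule, a common convention that this card does not itself prescribe. The helper name `inspect_column` and the sample temperatures are hypothetical.

```python
import statistics

def inspect_column(values):
    """Summarize a numeric column: missing count and IQR-based outliers."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    # Quartiles via the standard library (default 'exclusive' method; needs >= 2 values)
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers, "bounds": (low, high)}

temps = [21.5, 22.0, None, 23.1, 22.8, 95.0, 21.9, None, 22.4]
print(inspect_column(temps))
```

For this sample, two readings are missing and 95.0 is flagged as an outlier (likely a data entry error worth checking before analysis).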

Address missing data by deciding whether to fill in missing values or remove records with missing values. Correct any data entry errors, inconsistencies, outliers, or duplicated records. Standardize data formats (e.g., date formats, data types) to ensure consistency.

Data Cleansing
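
Two of the cleansing steps can be sketched in plain Python under stated assumptions: missing numeric values are filled with the median of the column (removing the record is the other option the card mentions), and dates written in a few known formats are standardized to ISO `YYYY-MM-DD`. The accepted formats, the month-first reading of dates like 1/12/2011, and the helper names are assumptions for illustration.

```python
import statistics
from datetime import datetime

# Assumed input formats; "%m/%d/%Y" reads 1/12/2011 as January 12 (month first)
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")

def fill_missing_with_median(values):
    """Replace None with the median of the non-missing values."""
    median = statistics.median([v for v in values if v is not None])
    return [median if v is None else v for v in values]

def standardize_date(text):
    """Parse a date in any of DATE_FORMATS and return it as YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {text!r}")

print(fill_missing_with_median([10, None, 30, 20]))  # [10, 20, 30, 20]
print([standardize_date(d) for d in ["2011-01-12", "1/12/2011", "12 Jan 2011"]])
```

Failing loudly on an unrecognized format, instead of guessing, keeps bad dates from slipping silently into the cleaned data.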

Encode categorical variables into numerical format using techniques like one-hot encoding or label encoding. Normalize or scale numerical features if necessary to bring them to a common scale.

Data Transformation
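
A minimal sketch of both transformations in plain Python: one-hot encoding turns each category into its own 0/1 column, and min-max scaling maps numbers linearly onto the range 0 to 1. Min-max is only one scaling method (z-score standardization is another common one); the helper names and sample values are hypothetical.

```python
def one_hot(values):
    """Map each categorical value to a 0/1 vector, one column per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def min_max_scale(values):
    """Rescale numbers linearly so the minimum becomes 0 and the maximum 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

cols, encoded = one_hot(["summer", "winter", "summer", "spring"])
print(cols)      # ['spring', 'summer', 'winter']
print(encoded)   # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
print(min_max_scale([985, 1200, 1500]))
```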

Combine data from multiple sources if needed, ensuring that there are common identifiers to merge the data correctly.

Data Integration
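
A minimal sketch of integrating two sources on a common identifier, assuming both are lists of dicts that share a `customer_id` field (a hypothetical name) and that IDs are unique in the second source. It behaves like an inner join: only IDs present in both sources are kept.

```python
def merge_on_key(left, right, key):
    """Inner-join two lists of dicts on a shared identifier field."""
    index = {row[key]: row for row in right}  # assumes IDs are unique in `right`
    merged = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            merged.append({**row, **match})  # combine fields from both sources
    return merged

crm = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
sales = [{"customer_id": 1, "revenue": 250}, {"customer_id": 3, "revenue": 90}]
print(merge_on_key(crm, sales, "customer_id"))
# [{'customer_id': 1, 'name': 'Ana', 'revenue': 250}]
```

Customers 2 and 3 are dropped because each appears in only one source; an outer join would keep them with missing fields instead.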

Create visualizations to explore the data further and identify patterns, relationships, or outliers. Visualization helps in understanding the data's characteristics and guiding further analysis.

Data Visualization

Data Creation/Collection
Data Ingestion (ETL)
Data Storage
Data Presentation and Visualization
Data Sharing and Distribution
Data Archiving and Retention
Data Backup and Disaster Recovery
Data Deletion and Disposal

Data Life Cycle

Left (negatively) skewed: the left tail is longer; values of the data extend farther to the left.

Skewed to the Left
