Module 2 Flashcards
What is the data life cycle?
Create, Store, Share, Archive and Destroy
___ refers to the systematic process
of gathering, measuring, and analyzing
information from various sources to get a
complete and accurate picture of an area of
interest.
Data Collection
Different methods of collecting data:
• Interviews
• Questionnaires
• Observations
• Experiments
• Published Sources and Unpublished Sources
___ is the process of gathering,
combining, structuring and organizing data for
use in business intelligence, analytics and data
science applications.
Data preparation
What are the data preparation steps?
Gather (collect) the data, discover, clean and validate the data, enrich the data, and store the data
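The five steps above can be sketched as a small pipeline. This is a minimal illustration, not a standard implementation: the sample records, the field names, the validation rule (age must be a non-negative integer), and the derived field is_adult are all assumptions made up for the example.

```python
# Hypothetical raw records; field names and values are invented for illustration.
RAW_RECORDS = [
    {"id": "1", "name": " Alice ", "age": "34"},
    {"id": "2", "name": "Bob", "age": ""},      # missing age
    {"id": "3", "name": "Cara", "age": "-5"},   # invalid age
]


def gather():
    """Step 1: collect the raw data (here, copy an in-memory list)."""
    return list(RAW_RECORDS)


def discover(records):
    """Step 2: profile the data by counting empty values per field."""
    missing = {}
    for rec in records:
        for field, value in rec.items():
            if value.strip() == "":
                missing[field] = missing.get(field, 0) + 1
    return missing


def clean_and_validate(records):
    """Step 3: trim whitespace, convert types, drop rows that fail checks."""
    cleaned = []
    for rec in records:
        try:
            age = int(rec["age"].strip())
        except ValueError:
            continue  # age is missing or not a number
        if age < 0:
            continue  # assumed rule: age cannot be negative
        cleaned.append(
            {"id": int(rec["id"]), "name": rec["name"].strip(), "age": age}
        )
    return cleaned


def enrich(records):
    """Step 4: add a derived field to each record."""
    return [{**rec, "is_adult": rec["age"] >= 18} for rec in records]


def store(records, storage):
    """Step 5: persist the data (here, append to a list standing in for a database)."""
    storage.extend(records)
    return storage


storage = []
records = gather()
profile = discover(records)
prepared = store(enrich(clean_and_validate(records)), storage)
```

In this sketch, the discover step only reports problems; the clean and validate step is where rows are actually corrected or dropped.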
Collects relevant data to make decisions based on facts and evidence.
Informed Decision-Making
Identifies trends, patterns,
and anomalies to pinpoint problems and
develop solutions.
Problem Solving
Analyzes data to
gain insights into current trends and
anticipate future developments.
Understanding Trends
Identifies areas for
process optimization and resource allocation.
Improving Efficiency
Inspires new ideas and drives
innovation by revealing opportunities and
challenges.
Innovation
Ensures data accuracy and
reliability through cleaning and validation.
Data Quality
Standardizes data
formats and units for consistent and
comparable data.
Data Consistency
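As a rough illustration of standardizing formats and units, the sketch below converts weights to kilograms and dates to ISO 8601 (YYYY-MM-DD). The function names, the supported units, and the list of accepted date formats are assumptions chosen for the example; in particular, it assumes slash-separated dates are written day first.

```python
from datetime import datetime

LB_TO_KG = 0.45359237  # one international pound in kilograms


def to_kilograms(value, unit):
    """Convert a weight to kilograms; only 'kg' and 'lb' are handled here."""
    unit = unit.strip().lower()
    if unit == "kg":
        return float(value)
    if unit == "lb":
        return float(value) * LB_TO_KG
    raise ValueError(f"unsupported unit: {unit!r}")


def to_iso_date(text, known_formats=("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")):
    """Parse a date written in one of the assumed formats and return YYYY-MM-DD."""
    for fmt in known_formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized date format: {text!r}")


weight_kg = to_kilograms(10, "LB")
iso_date = to_iso_date("05/03/2024")
```

Once every source uses the same unit and date format, values from different sources can be compared directly.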
Organizes and
structures data for easy access and analysis.
Data Accessibility
Improves
analysis efficiency by working with
well-prepared data.
Data Analysis Efficiency
Derives accurate and
meaningful insights from clean and
consistent data.
Accurate Insights
___ refers to the process of ensuring that
data is accurate, consistent, and reliable for its
intended use. It involves implementing quality
management techniques to make sure the data
meets the specific needs of an organization in a
particular context.
Data quality
___ contribute to the overall reliability and usefulness of
the data.
Data quality dimensions
The data aligns with reality and is free from errors.
Accuracy
All required data elements are present.
Completeness
Data is formatted and represented consistently across
different sources.
Consistency
The data is relevant to the purpose for which it is collected.
Validity
Each data record is distinct and has a unique identifier.
Uniqueness
The data is relevant to the current time period.
Timeliness
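Some of the dimensions above can be checked mechanically. The sketch below covers completeness, uniqueness, and timeliness only; accuracy and validity usually need a trusted reference or business rules to compare against. The record layout, the required fields, and the 365-day freshness threshold are assumptions for illustration.

```python
from datetime import date, timedelta

REQUIRED_FIELDS = ("id", "email", "updated")  # assumed required fields


def completeness(records):
    """Return the share of records whose required fields are all filled in."""
    complete = sum(
        all(rec.get(field) not in (None, "") for field in REQUIRED_FIELDS)
        for rec in records
    )
    return complete / len(records)


def duplicate_ids(records):
    """Return the identifiers that appear more than once (uniqueness check)."""
    seen, dupes = set(), set()
    for rec in records:
        if rec["id"] in seen:
            dupes.add(rec["id"])
        seen.add(rec["id"])
    return sorted(dupes)


def stale_ids(records, today, max_age_days=365):
    """Return identifiers of records last updated before the cutoff (timeliness check)."""
    cutoff = today - timedelta(days=max_age_days)
    return [rec["id"] for rec in records if rec["updated"] < cutoff]


# Hypothetical sample data.
sample = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": "", "updated": date(2024, 4, 1)},
    {"id": 2, "email": "b@example.com", "updated": date(2021, 1, 1)},
    {"id": 3, "email": "c@example.com", "updated": date(2024, 3, 1)},
]

completeness_ratio = completeness(sample)
repeated = duplicate_ids(sample)
stale = stale_ids(sample, today=date(2024, 6, 1))
```

Each function reports a problem without fixing it, so the results can feed a data quality report before any cleaning is done.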
Involves considering the rights and privacy of
individuals whose data is being collected and
ensuring transparency and fairness in data
handling processes.
Ethical Considerations in Data Collection