What phases are involved in data understanding? (2)
- data acquisition (aka data gathering)
- data preparation
What is data acquisition?
Also known as data gathering, it involves gathering data from different sources and transforming the data into formats that are suitable for analytic solution development.
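Example: a minimal sketch of gathering data from two sources and transforming it into one shared format. The two inline sources, the field names, and the `gather_records` helper are hypothetical and only illustrate the idea.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two different sources.
CSV_SOURCE = "customer_id,age\n1,34\n2,51\n"
JSON_SOURCE = '[{"id": 3, "age": "27"}, {"id": 4, "age": "45"}]'


def gather_records(csv_text, json_text):
    """Return records from both sources in one shared schema."""
    records = []
    # Source 1: CSV values arrive as text, so cast them to integers.
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append(
            {"customer_id": int(row["customer_id"]), "age": int(row["age"])}
        )
    # Source 2: JSON uses a different key name ("id") and stores age as text.
    for item in json.loads(json_text):
        records.append({"customer_id": int(item["id"]), "age": int(item["age"])})
    return records


records = gather_records(CSV_SOURCE, JSON_SOURCE)
print(records)
```

After this step every record has the same keys and types, whichever source it came from.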
What happens when the requirements phase is completed?
The data science team embarks on data acquisition (also called data gathering).
What's data wrangling? (3 actions, 3 results)
The process of gathering, selecting, and transforming data to ensure that it is usable, free of noise and has as little bias as possible to meet defined analytic objectives.
What steps are involved in data wrangling? (3)
- Checking for missing values
- Identifying outliers
- Formatting the data.
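Example: the three steps above can be sketched on a small list of values. The sample data is made up, and the 1.5 x IQR rule used to flag outliers is an assumption chosen for illustration; the source does not name a specific outlier method.

```python
import statistics

# Hypothetical raw values: one is missing (None) and one is extreme.
raw_ages = ["34", "51", None, "27", "45", "38", "41", "29", "400"]

# Step 1: check for missing values.
missing_count = sum(1 for value in raw_ages if value is None)

# Step 3 (done early because the outlier check needs numbers):
# format the data by converting text to integers.
ages = [int(value) for value in raw_ages if value is not None]

# Step 2: identify outliers with the 1.5 * IQR rule (an assumed choice).
q1, _median, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [age for age in ages if age < low or age > high]
clean_ages = [age for age in ages if low <= age <= high]

print("missing:", missing_count)
print("outliers:", outliers)
print("clean:", clean_ages)
```

Whether a flagged outlier is removed, corrected, or kept depends on the analytic objective; here it is only separated out.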
What is data management?
- It's an organization’s way of ________________ (4) data.
- Makes sure that the data housed within an organization is ______________ (2)
- It's an organization’s way of acquiring, storing, securing and processing data.
- Makes sure that the data housed within an organization is accessible and accurate
What group(s) manage data management?
- Managed by the IT team in an organization.
- Business users will participate too
Name the organization responsible for the 11 knowledge areas of data management, then list the areas.
- DAMA International, through its Data Management Body of Knowledge (DAMA-DMBOK)
Who is involved in the data management process?
- Multiple departments
Who's responsible for designing an organization's data management framework?
Data Governance
- Defines how data is accessed and managed within an organization.
- Planning, oversight, and control over the management of data and the use of data and data-related resources.
Data Architecture
the overall structure of data and data-related resources as an integral part of the enterprise architecture
Data Modeling & Design
analysis, design, building, testing, and maintenance (was Data Development in the DAMA-DMBOK 1st edition)
Data Storage & Operations
structured physical data assets storage deployment and management (was Data Operations in the DAMA-DMBOK 1st edition)
Data Security
ensuring privacy, confidentiality and appropriate access
Data Integration & Interoperability
acquisition, extraction, transformation, movement, delivery, replication, federation, virtualization and operational support (a Knowledge Area new in DMBOK2)
Documents & Content
storing, protecting, indexing, and enabling access to data found in unstructured sources (electronic files and physical records), and making this data available for integration and interoperability with structured (database) data.
Describe Reference & Master Data (idea (1), how (2), and results (2))
Idea: Managing shared data
How:
- Standardizing data definitions
- Standardizing the use of data values
Results:
- Reduces redundancy
- Ensures better data quality
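Example: a minimal sketch of how standardizing data values can reduce redundancy. The `COUNTRY_CODES` reference table, the `standardize_country` helper, and the customer records are all hypothetical.

```python
# Hypothetical reference table: every known spelling maps to one standard code.
COUNTRY_CODES = {
    "usa": "US",
    "u.s.": "US",
    "united states": "US",
    "canada": "CA",
    "ca": "CA",
}


def standardize_country(value):
    """Map a free-text country value to its standard code."""
    key = value.strip().lower()
    return COUNTRY_CODES.get(key, "UNKNOWN")


# Hypothetical customer records entered by two different departments.
records = [
    ("Customer A", "USA"),
    ("Customer A", "United States"),
    ("Customer B", "Canada"),
    ("Customer B", "CA "),
]

standardized = [(name, standardize_country(country)) for name, country in records]

# Once values are standardized, exact duplicates can be dropped.
# dict.fromkeys keeps the first occurrence and preserves order.
master = list(dict.fromkeys(standardized))
print(master)
```

Four inconsistent rows collapse into two master records because the shared reference table makes the duplicates identical.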
Data Warehousing & Business Intelligence
managing analytical data processing and enabling access to decision support data for reporting and analysis.
Metadata
collecting, categorizing, maintaining, integrating, controlling, managing, and delivering metadata.
The Data Quality area involves what? (3 actions on 1, 1 action on another)
- Defining, monitoring, maintaining data integrity
- Improving data quality.
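Example: a minimal sketch of defining integrity rules and monitoring them with a simple score. The `RULES` table, the field names, the thresholds, and both helper functions are hypothetical; real rules come from the organization's own data definitions.

```python
# Hypothetical integrity rules: each field maps to a check that
# returns True when the value is valid.
RULES = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}


def find_violations(record):
    """Return the names of the fields in `record` that break a rule."""
    return [
        field for field, is_valid in RULES.items() if not is_valid(record.get(field))
    ]


def quality_score(records):
    """Return the share of records with no rule violations."""
    if not records:
        return 1.0
    valid = sum(1 for record in records if not find_violations(record))
    return valid / len(records)


records = [
    {"customer_id": 1, "email": "a@example.com", "age": 34},
    {"customer_id": 2, "email": "missing-at-sign", "age": 51},
    {"customer_id": 3, "email": "c@example.com", "age": 400},
    {"customer_id": 4, "email": "d@example.com", "age": 29},
]

for record in records:
    print(record["customer_id"], find_violations(record))
print("quality score:", quality_score(records))
```

Tracking the score over time is one way to monitor integrity; fixing the flagged records is the improvement step.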
Your client has an established data management structure in place. This means that:
Your client regards their data as a resource that should be reliable, and should be kept secure.
Correct: Data management ensures that a company is mindful of the security, integrity, and overall quality of their data and data infrastructure.
When setting analytic objectives, it is good practice to define a data statement. This ensures that you have assessed a business's data science readiness. Why is data gathering not conducted during the data science readiness assessment?
Data gathering is best conducted after all analytic requirements are gathered.
Correct: Although analytic objectives have been set prior to this stage, it is important to define the analytic and business requirements to ensure that the right questions are answered and the right data is gathered.
What does data governance impact?
- High level: Impacts the decisions that can be made from the data available to the organization.
- Has positive implications for the: quality, security, and integrity of data
Who should have a data governance strategy?
Any organization that stores and utilizes data
What are some benefits of data governance? (4)
- Accessibility: Provides a reliable and consistent view of enterprise-wide data
- Quality improvement: It ensures that there is a plan for improved quality of data
- Reduces silos: Maps the location of data in the enterprise, reducing the scourge of data silos
- Improves data management overall
What group(s) administer data governance?
Data management team
In the context of data, what is a stakeholder?
An individual or group that makes or is affected by data-driven decisions within an organization
What are Data governance best practices for organizations? (3)
- Data has integrity
- Data-related decisions and controls are transparent and can be audited
- Data is unbiased
What is unbiased data?
General idea: the data represents all members of the population that an organization could serve.
This best practice will positively influence the development of ethical models and algorithms for analytic solutions.
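Example: one rough way to check representation is to compare each group's share in a dataset with its share in the population served. This is only a sketch of one facet of bias; the age groups, the population shares, the 0.15 threshold, and the `representation_gaps` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical share of each group in the population the organization serves.
POPULATION_SHARE = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}


def representation_gaps(sample_groups, population_share):
    """Return sample share minus population share for each group."""
    if not sample_groups:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    # Round to two decimals to avoid floating-point noise in the gaps.
    return {
        group: round(counts[group] / total - share, 2)
        for group, share in population_share.items()
    }


# Hypothetical dataset: the group label of each collected record.
sample = ["18-34"] * 6 + ["35-54"] * 3 + ["55+"] * 1

gaps = representation_gaps(sample, POPULATION_SHARE)

# Flag groups whose share falls well below the population share
# (the 0.15 threshold is an arbitrary illustrative choice).
underrepresented = [group for group, gap in gaps.items() if gap < -0.15]

print(gaps)
print("underrepresented:", underrepresented)
```

A negative gap means the group appears less often in the data than in the population, which is a signal to gather more data for that group before modeling.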