M6 U3 - Data Management - Q2 Flashcards Preview

11637 Foundations of Computational Data Science > M6 U3 - Data Management - Q2 > Flashcards

Flashcards in M6 U3 - Data Management - Q2 Deck (35)

What phases are involved in data understanding? (2)

  • data acquisition (aka data gathering)
  • data preparation


What is data acquisition?

Also known as data gathering, it involves gathering data from different sources and transforming the data into formats that are suitable for analytic solution development.


What happens when the requirements phase is completed?

The data science team begins data acquisition (data gathering).


What's data wrangling? (3 actions, 3 results)

The process of gathering, selecting, and transforming data to ensure that it is usable, free of noise and has as little bias as possible to meet defined analytic objectives.


What steps are involved in data wrangling? (3)

  • Checking for missing values
  • Identifying outliers
  • Formatting the data.
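
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `raw_ages` values are made up, and the 1.5 * IQR rule is one common convention for flagging outliers, not something the deck prescribes.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw_ages = ["34", " 29", None, "41", "38", "250", "31", None, "36"]

# Checking for missing values: count the entries that are absent.
missing_count = sum(1 for value in raw_ages if value is None)

# Formatting the data: strip whitespace and convert the strings to integers.
ages = [int(value.strip()) for value in raw_ages if value is not None]

# Identifying outliers: flag values outside 1.5 * IQR of the quartiles
# (an assumed convention; other rules are equally valid).
q1, _, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [age for age in ages if age < low or age > high]

print(missing_count, outliers)
```

Formatting is done before the outlier check here only because the values must be numeric before they can be compared.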


What is data management?

  • It's an organization’s way of ________________ (4) data.
  • Makes sure that the data housed within an organization is ______________ (2)

  • It's an organization’s way of acquiring, storing, securing and processing data.
  • Makes sure that the data housed within an organization is accessible and accurate



What group(s) handle data management?

  • Managed by the IT team in an organization.
  • Business users will participate too


Name the organization responsible for the 11 knowledge areas of data management, then list the areas.

  • DAMA International, through its Data Management Body of Knowledge (DAMA-DMBOK)


Who is involved in the data management process?

  • Multiple departments


Who's responsible for designing an organization's data management framework?

data architects


Data Governance

  • Defines how data is accessed and managed within an organization. 
  • planning, oversight, and control over management of data and the use of data and data-related resources. 


Data Architecture

the overall structure of data and data-related resources as an integral part of the enterprise architecture


Data Modeling & Design

analysis, design, building, testing, and maintenance (was Data Development in the DAMA-DMBOK 1st edition)


Data Storage & Operations

storage, deployment, and management of structured physical data assets (was Data Operations in the DAMA-DMBOK 1st edition)


Data Security

ensuring privacy, confidentiality and appropriate access


Data Integration & Interoperability

acquisition, extraction, transformation, movement, delivery, replication, federation, virtualization, and operational support (a Knowledge Area new in DMBOK2)


Documents & Content

storing, protecting, indexing, and enabling access to data found in unstructured sources (electronic files and physical records), and making this data available for integration and interoperability with structured (database) data.


Describe Reference & Master Data (idea (1), how (2), and results (2))

Idea: Managing shared data

How:

  • Standardizing data definitions
  • Standardizing the use of data values

Results:

  • Reduce redundancy
  • Ensure better data quality
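
As a rough illustration of standardizing the use of data values, the sketch below maps free-text entries to one canonical code through a shared lookup table. The `COUNTRY_REFERENCE` table, the `standardize_country` helper, and the sample records are all hypothetical.

```python
# Hypothetical reference data: one canonical code per country spelling.
COUNTRY_REFERENCE = {
    "us": "US", "usa": "US", "united states": "US",
    "uk": "GB", "gb": "GB", "united kingdom": "GB",
}

def standardize_country(value):
    """Map a free-text country value to its canonical code, or None if unknown."""
    return COUNTRY_REFERENCE.get(value.strip().lower())

# Hypothetical records entered inconsistently by different departments.
records = ["USA", "united states ", "UK", "Gb", "us"]
standardized = [standardize_country(record) for record in records]

# Redundancy drops once every spelling resolves to a shared value.
distinct_before = len(set(records))
distinct_after = len(set(standardized))
print(distinct_before, distinct_after)
```

Returning `None` for unknown values is a design choice in this sketch; it surfaces entries that the reference data does not yet cover instead of guessing.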


Data Warehousing & Business Intelligence

managing analytical data processing and enabling access to decision support data for reporting and analysis.



Metadata

collecting, categorizing, maintaining, integrating, controlling, managing, and delivering metadata.


The Data Quality area involves what? (3 actions on 1, 1 action on another)

  • Defining, monitoring, maintaining data integrity
  • Improving data quality.


Your client has an established data management structure in place. This means that:

Your client regards their data as a resource that should be reliable, and should be kept secure.


Correct: Data management ensures that a company is mindful of the security, integrity, and overall quality of their data and data infrastructure.


When setting analytic objectives, it is good practice to define a data statement. This ensures that you have assessed a business's data science readiness. Why is data gathering not conducted during the data science readiness assessment?

Data gathering is best conducted after all analytic requirements are gathered.


Correct: Although analytic objectives have been set prior to this stage, it is important to define the analytic and business requirements to ensure that the right questions are answered and the right data is gathered.


What does data governance impact?

  • High level: Impacts the decisions that can be made from the data available to the organization.
  • Has positive implications for the quality, security, and integrity of data


Who should have a data governance strategy?

Any organization that stores and utilizes data


What are some benefits of data governance? (4)

  • Accessibility: Provides a reliable and consistent view of enterprise-wide data
  • Quality improvement: Ensures that there is a plan for improving data quality
  • Reduces silos: Maps the location of data in the enterprise, reducing the scourge of data silos
  • Improves data management overall


What group(s) administer data governance?

Data management team


In the context of data, what is a stakeholder?

An individual or group that makes or is affected by data-driven decisions within an organization


What are Data governance best practices for organizations? (3)

  • Data has integrity
  • Data-related decisions and controls are transparent and can be audited
  • Data is unbiased


What is unbiased data?

General idea: the data represents all members of a population that could be served by an organization.


This best practice will positively influence the development of ethical models and algorithms for analytic solutions.
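
One simple way to probe whether data represents a population is to compare each group's share of the sample with its share of the population. The sketch below assumes made-up population shares, a made-up sample, and an arbitrary 10-point tolerance; it is a basic check, not a complete bias audit.

```python
from collections import Counter

# Hypothetical population shares and a hypothetical collected sample.
population_share = {"urban": 0.55, "suburban": 0.30, "rural": 0.15}
sample = ["urban"] * 70 + ["suburban"] * 28 + ["rural"] * 2

TOLERANCE = 0.10  # assumed threshold for an acceptable gap

counts = Counter(sample)
total = len(sample)

# Gap between each group's share of the sample and of the population.
gaps = {
    group: counts.get(group, 0) / total - share
    for group, share in population_share.items()
}

# Groups that are over- or under-represented beyond the tolerance.
flagged = sorted(group for group, gap in gaps.items() if abs(gap) > TOLERANCE)
print(flagged)
```

In this made-up sample, urban records are over-represented and rural records are under-represented, so both groups are flagged.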