Unit 3 Flashcards
(42 cards)
Primary Key
field that uniquely identifies a record. ex: student id or employee id
Foreign Key
Field that is in a relational table, that relates to a primary key of another table
Schema
Blueprint on how the database is constructed.
Big Data
Data come from everywhere—including smartphone metadata, internet usage records, social media activity, computer usage records, and countless other data sources—to be sifted for patterns and trends.
Four Vs of Big Data
Volume (amount), Variety (Various forms), Veracity (quality and trustworthiness), Velocity (speed).
Data mining/Data discovery
examination of huge sets of data to find patterns, connections, outliers and hidden relationships. It is a BI tool used for decision making.
Structed data
resides in fixed formats, well labeled, easily queried and searched.
unstructured data
is unorganized data(social media/big data) cannot be easily read or or processed by a computer. not stored in rows and columns
semi structured data
in between structured and unstructured data, can be read but takes work. example is email.
Big data tools
ETL and Hadoop
Extract
Once you have determined where your data is coming from and where you want it to reside, you can start extracting. usually comes from CRM or ERP
Transform
Once you have extracted data, they need to be transformed to fit into the database table. This may involve removing decimals and dollar signs from financial transactions so it will fit into the structured data table.
Load
Once data are transformed, they are ready to finally be transferred into the data warehouse and data mart. The more often this is done, the more up-to-date analytic reports can be.
Hadoop
an infrastructure for storing and processing large sets of data across multiple servers. Instead of centralized files in one place like a data warehouse or data mart, Hadoop uses a distributed file system that allows files to be stored on multiple servers
Which restriction applies to the data in the primary field of a database?
The primary key has to be unique
The data are full of missing, misplaced, or duplicate data, which the data analyst needs to remove.
Which process can this data analyst use to remove such data?
Normalization
A data analyst wants to use software to look for useful patterns and hidden relationships in this large set of social media data. Which process can be used to look for these patterns and relationships?
Data Mining
A business analyst wants to use the social media data to create and present business intelligence. She will create visualizations that will be used by the executive team. Which tool is appropriate for creating and presenting this business intelligence?
Tableau
scrubbing the data
the process of transforming data into an accurate, clean, and error-free form
Master Data Management (MDM)
is a methodology or process used to define, organize, and manage all the data of an organization that provides a reference for decision-making. Master data management tools can be used to support master data management by removing duplicates, standardizing data, and incorporating rules to prevent incorrect data from entering the system, thus creating an accurate source of master data
Data Governance
is managing the availability, integrity, and security of the data to ensure that the data remain high quality and valid for data analytics. Policies and procedures are established that define the data governance program, such as who has access, who has update capabilities, when and how backups are made and stored, and who administers the policies to ensure that they are followed
Data Management Process
can be defined as acquiring data, making sure the data is valid, and then storing and processing the data into useable information for a business
data management return on investment
(ROI)
value of having the data available to process to make key decisions. ROI is negatively impacted by improper data management
Benefits of data management
predictive analytics, BI, and data analysis