Chapter 2: Mastering the Data Flashcards
(18 cards)
What is the goal of mastering the data in accounting?
To make sure data is complete, clean, and organized so it can be analyzed effectively.
where can data be found
internal: company systems (ERP)
external: websites, government sources
how can data be stored
flat file: all data in one place (e.g., an Excel sheet)
Relational DB: data stored in linked tables
why RDBMS?
–> reduce redundancy and maintain a single version of the truth
ensures:
- completeness
- **no redundancy**
- business rules enforcement
- communication and integration of business processes
give the attributes of relational databases
(the backbone of how we structure data in a relational database)
primary key
foreign key
composite key
descriptive attributes
what are PK and FK?
PK: UNIQUE IDENTIFIER, ensures each row in a table is unique
FK: creates relationships between 2 tables –> attributes that point to the primary key of another table
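A minimal sketch of PK and FK in practice, using Python's built-in `sqlite3` module. The tables `customers` and `sales_orders` and their columns are hypothetical, chosen only for illustration. It shows that the database itself rejects a duplicate PK and an FK that points to a customer that does not exist.

```python
import sqlite3

# Hypothetical tables for illustration: customers and sales_orders
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is switched on

conn.execute("""
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,  -- PK: unique identifier for each row
        customer_name TEXT NOT NULL         -- descriptive attribute
    )
""")
conn.execute("""
    CREATE TABLE sales_orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)  -- FK to customers
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Acme NV')")
conn.execute("INSERT INTO sales_orders VALUES (100, '2024-01-15', 1)")

# A second row with the same PK is rejected ...
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate')")
    duplicate_pk_rejected = False
except sqlite3.IntegrityError:
    duplicate_pk_rejected = True

# ... and so is an order whose FK points to a customer that does not exist
try:
    conn.execute("INSERT INTO sales_orders VALUES (101, '2024-01-16', 99)")
    orphan_fk_rejected = False
except sqlite3.IntegrityError:
    orphan_fk_rejected = True

print(duplicate_pk_rejected, orphan_fk_rejected)  # True True
```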
what are CK and descriptive attributes
CK: combination of 2 FKs that together form the primary key (only the combination has to be unique), used for line items with much detail
Descriptive attributes: actual business information (everything else)
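A small sketch of a composite key for line items, again with `sqlite3`. The tables `sales_orders`, `products` and `order_lines` are hypothetical. The same order number can appear on several lines and the same product on several orders, but the combination (order_id, product_id) may occur only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE sales_orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE order_lines (
        order_id   INTEGER REFERENCES sales_orders(order_id),  -- FK 1
        product_id INTEGER REFERENCES products(product_id),    -- FK 2
        quantity   INTEGER NOT NULL,                           -- descriptive attribute
        unit_price REAL NOT NULL,                              -- descriptive attribute
        PRIMARY KEY (order_id, product_id)                     -- composite key of the 2 FKs
    );
""")

conn.execute("INSERT INTO sales_orders VALUES (1, '2024-02-01')")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(10, "Paper"), (20, "Toner")])
# Two lines on the same order: order_id repeats, but each combination is unique
conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?)",
                 [(1, 10, 5, 4.0), (1, 20, 1, 60.0)])

# Repeating an existing (order_id, product_id) combination violates the composite key
try:
    conn.execute("INSERT INTO order_lines VALUES (1, 10, 2, 4.0)")
    repeated_line_rejected = False
except sqlite3.IntegrityError:
    repeated_line_rejected = True

line_count = conn.execute("SELECT COUNT(*) FROM order_lines").fetchone()[0]
print(line_count, repeated_line_rejected)  # 2 True
```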
what and why: data dictionary
a document that describes every column (attribute) in the dataset: what it means, whether it is required, and what data it accepts
–> why? to prevent confusion and help people understand the data –> it is really necessary because different people will look at the data, and it creates clarity
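A data dictionary can also be kept in a machine-readable form and used to check rows automatically. The sketch below assumes a hypothetical sales table with three columns; `DATA_DICTIONARY` and `check_row` are illustrative names, not part of any standard tool.

```python
from datetime import date

# Hypothetical data dictionary: for each column, what it means, whether it is required,
# and what type of data it accepts
DATA_DICTIONARY = {
    "order_id":   {"description": "Unique sales order number",     "required": True,  "type": int},
    "order_date": {"description": "Date the order was placed",     "required": True,  "type": date},
    "comment":    {"description": "Free-text remark by sales rep", "required": False, "type": str},
}

def check_row(row):
    """Return a list of problems in one row, judged against the data dictionary."""
    problems = []
    for column, spec in DATA_DICTIONARY.items():
        value = row.get(column)
        if value is None:
            if spec["required"]:
                problems.append(f"{column}: required but missing")
        elif not isinstance(value, spec["type"]):
            problems.append(f"{column}: expected {spec['type'].__name__}")
    # Columns that nobody documented are a warning sign as well
    for column in row:
        if column not in DATA_DICTIONARY:
            problems.append(f"{column}: not documented in data dictionary")
    return problems

print(check_row({"order_id": 1, "order_date": date(2024, 3, 1)}))  # []
print(check_row({"order_id": "1", "region": "EU"}))
```

The second call reports that `order_id` is text instead of a number, that the required `order_date` is missing, and that `region` is not documented.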
ETL PROCESS: what and give the different steps
what? Extract, Transform, Load: the 3 main steps to prepare data for analysis
the 5 steps: DOVCL
Determine purpose and scope
obtain data
validate data
clean data
load data
WHAT AND WHY
ETL process step 1: determine purpose and scope of data request
identify objective, required data, potential risks & how results will be used
purpose: help specify which data we need, what format and by which date
questions before beginning the process:
purpose of data request?
what business problem will it address?
what is the MITIGATION PLAN? = a plan to reduce or prevent risks or problems
WHAT AND WHY
ETL process step 2
obtain the data
collect data from the right sources, either by REQUESTING it or EXTRACTING it yourself
EXTRACTING YOURSELF:
1) identify the tables that contain the info you need
2) which attributes contain the info you need in each table?
3) identify how the tables are related
4) only extract relevant information
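The four extraction steps map naturally onto one SQL query. This is a sketch against a hypothetical source system built in `sqlite3`; the table and column names are assumptions. Only the needed attributes are selected, and the joins follow the PK-FK links between the tables.

```python
import sqlite3

# Hypothetical source system with three related tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, phone TEXT, region TEXT);
    CREATE TABLE sales_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                               order_date TEXT, sales_rep TEXT);
    CREATE TABLE order_lines (order_id INTEGER, product_id INTEGER, quantity INTEGER,
                              unit_price REAL, PRIMARY KEY (order_id, product_id));
    INSERT INTO customers VALUES (1, 'Acme NV', '555-0101', 'EU'), (2, 'Beta LLC', '555-0102', 'US');
    INSERT INTO sales_orders VALUES (100, 1, '2024-01-15', 'Jan'), (101, 2, '2024-01-20', 'Eva');
    INSERT INTO order_lines VALUES (100, 10, 5, 4.0), (100, 20, 1, 60.0), (101, 10, 3, 4.0);
""")

# 1) tables: customers, sales_orders, order_lines
# 2) attributes: region, order_date, quantity and unit_price
# 3) relations: customer_id and order_id (PK-FK links) used in the JOINs
# 4) only relevant info: phone, name and sales_rep are not extracted
query = """
    SELECT c.region, o.order_date, SUM(l.quantity * l.unit_price) AS order_total
    FROM sales_orders AS o
    JOIN customers   AS c ON c.customer_id = o.customer_id
    JOIN order_lines AS l ON l.order_id    = o.order_id
    GROUP BY o.order_id, c.region, o.order_date
    ORDER BY o.order_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('EU', '2024-01-15', 80.0), ('US', '2024-01-20', 12.0)]
```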
WHAT AND WHY
+ HOW DO YOU MAKE SURE DATA IS VALID?
ETL process step 3
validate the data for completeness and integrity
–> check if data is complete, accurate and makes sense
making sure data is valid by:
- comparing the number of records
= how many do we have vs. how many should we have
- comparing descriptive statistics for numeric fields
= look at e.g. min and max for a quick check for strange values
- validating date/time fields
= check whether they are filled in correctly
- comparing string limits for text fields
= is the text not too long?
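The four validation checks can be sketched in plain Python. The records, the expected record count (`EXPECTED_RECORD_COUNT`, e.g. a control total from the source system) and the text limit (`MAX_CUSTOMER_LENGTH`, e.g. from the data dictionary) are all hypothetical assumptions for this example.

```python
from datetime import datetime
from statistics import mean

# Hypothetical extracted records; in practice these come from the obtain step
records = [
    {"invoice_id": "INV-001", "amount": 1200.00, "invoice_date": "2024-01-31", "customer": "Acme NV"},
    {"invoice_id": "INV-002", "amount": -50.00, "invoice_date": "2024-02-30", "customer": "Beta LLC"},
    {"invoice_id": "INV-003", "amount": 980.50, "invoice_date": "2024-03-15",
     "customer": "Gamma Trading Company International Holdings"},
]
EXPECTED_RECORD_COUNT = 3  # assumed control total from the source system
MAX_CUSTOMER_LENGTH = 30   # assumed field limit from the data dictionary

# 1) number of records: how many do we have vs. how many should we have?
count_ok = len(records) == EXPECTED_RECORD_COUNT

# 2) descriptive statistics for a numeric field: min/max/mean to spot strange values
amounts = [r["amount"] for r in records]
stats = {"min": min(amounts), "max": max(amounts), "mean": round(mean(amounts), 2)}

# 3) date fields: does each value parse as a real calendar date?
def is_valid_date(text):
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False

bad_dates = [r["invoice_id"] for r in records if not is_valid_date(r["invoice_date"])]

# 4) string limits for text fields: is any text longer than allowed?
too_long = [r["invoice_id"] for r in records if len(r["customer"]) > MAX_CUSTOMER_LENGTH]

print(count_ok, stats, bad_dates, too_long)
```

Here the count matches, the negative minimum is worth a second look, 30 February is flagged as an invalid date, and the third customer name exceeds the assumed limit.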
WHAT AND WHY
+ HOW DO YOU MAKE SURE DATA IS CONSISTENT AND READY FOR ANALYSIS
ETL process step 4: clean the data
–> fix errors, remove duplicates, make data consistent
data consistent and ready for analysis by doing:
- removing headings and subtotals
- cleaning leading zeros and nonprintable characters
- formatting negative numbers
- correcting inconsistencies
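A sketch of the four cleaning actions on rows from a hypothetical exported report. Two choices here are assumptions: account numbers are stripped of leading zeros so they match a master table stored without them, and "nonprintable" is taken to mean anything outside printable ASCII.

```python
import re

# Hypothetical raw rows from an exported report, including a heading and a subtotal
raw_rows = [
    ["Account", "Description", "Amount"],   # heading row
    ["00410", "Sales\x00 revenue", "1,250.00"],
    ["00520", "returns ", "(300.00)"],
    ["00520", "Returns", "75.00-"],
    ["", "Subtotal", "875.00"],             # subtotal row
]

def clean_amount(text):
    """Turn '(300.00)' or '75.00-' style negatives into a signed float."""
    text = text.replace(",", "").strip()
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])
    if text.endswith("-"):
        return -float(text[:-1])
    return float(text)

cleaned = []
for account, description, amount in raw_rows:
    # Remove headings and subtotals: keep only real transaction rows
    if account == "Account" or description.strip().lower() == "subtotal":
        continue
    # Leading zeros (assumption: the master table stores '410', not '00410')
    account = account.lstrip("0") or "0"
    # Nonprintable characters: drop anything outside printable ASCII
    description = re.sub(r"[^\x20-\x7E]", "", description)
    # Inconsistencies: same spacing and capitalisation for the same value
    description = " ".join(description.split()).capitalize()
    cleaned.append({"account": account, "description": description,
                    "amount": clean_amount(amount)})  # negative numbers formatted as signed floats

print(cleaned)
```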
WHAT AND WHY
ETL process step 5: load data for data analysis
–> put cleaned data in tool so you can analyse it
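A minimal sketch of the load step, with an in-memory `sqlite3` database standing in for whatever analysis tool is actually used. The `gl_lines` table and the cleaned rows are hypothetical. A quick row count after loading confirms nothing was lost on the way in.

```python
import sqlite3

# Hypothetical cleaned rows, in the shape a cleaning step might produce
cleaned = [
    {"account": "410", "description": "Sales revenue", "amount": 1250.0},
    {"account": "520", "description": "Returns", "amount": -300.0},
    {"account": "520", "description": "Returns", "amount": -75.0},
]

conn = sqlite3.connect(":memory:")  # stands in for the analysis tool / target database
conn.execute("CREATE TABLE gl_lines (account TEXT, description TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO gl_lines VALUES (:account, :description, :amount)", cleaned
)
conn.commit()

# After loading: check the row count, then the data is ready for analysis
loaded_count = conn.execute("SELECT COUNT(*) FROM gl_lines").fetchone()[0]
totals = conn.execute(
    "SELECT account, SUM(amount) FROM gl_lines GROUP BY account ORDER BY account"
).fetchall()
print(loaded_count, totals)  # 3 [('410', 1250.0), ('520', -375.0)]
```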
What are the ethical issues encountered in data collection and use?
ensure data is used responsibly and securely
assess whether individuals have the right to restrict access
questions suggested by the Institute of Business Ethics to help businesses protect the privacy of stakeholders
give the questions of the Institute of Business Ethics to help businesses protect the privacy of stakeholders
- how does the company use data?
- does the company send a privacy notice to individuals when personal data is collected?
- does the company have safeguards in place to mitigate the risks of data misuse?
- does the company have appropriate tools to manage the risks of data misuse?
- does the company conduct appropriate due diligence when sharing data with or acquiring data from third parties?
(chatgpt)
ethics in data analytics
What ethical issues can arise in data collection and usage?
Misuse of personal data
No privacy notice
Poor risk control
Data leaks or unauthorized sharing
(chatgpt)
ethics in data analytics
what are good ethical practices in data use?
send PRIVACY NOTICES
assess data risks
use SAFEGUARDS and SECURITY TOOLS
do DUE DILIGENCE with third parties