Chapter 2 Flashcards

(28 cards)

1
Q

What is Data Management?

A

The process that a firm uses to acquire, organise, store, manipulate and distribute data

2
Q

What is Data Wrangling?

A

Process of retrieving, cleansing, integrating, transforming and enriching data to support subsequent data analysis.

Transforms raw data into a format that is more appropriate and easier to analyse

3
Q

Why do we need Data Wrangling?

A

The increasing volume and variety of data compel firms to spend great amounts of time and resources on gathering, cleaning and organising data before performing any analysis.

As the amount of data grows, both the need for data wrangling and its difficulty increase.

4
Q

What are some objectives of Data Wrangling?

A
  • improve data quality
  • reduce the time and effort needed to perform analytics
  • reveal the true intelligence in the data
5
Q

What is a Database?

A

A collection of data logically organised to enable easy retrieval, management and distribution of data.

6
Q

What is a Database Management System (DBMS)?

A

Software application for defining, manipulating and managing data in databases

7
Q

What is a Relational Database?

A

The most common type of database. Data are stored in logically related tables, a model that offers flexibility and ease of data retrieval

8
Q

What is Data Modeling?

A

Process of defining the structure of a database

9
Q

What is an ERD?

A

An entity relationship diagram (ERD) is a graphical representation used to illustrate the structure of data

10
Q

What are the 6 key elements of an ERD?

A
  • entity
  • instance
  • relationship
  • primary key
  • foreign key
  • composite primary key
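
The three kinds of key listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the tables (customer, orders, order_line) and their columns are invented for this card and do not come from the chapter.

```python
import sqlite3

# In-memory database; all table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,      -- primary key: uniquely identifies each instance
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)  -- foreign key
);
CREATE TABLE order_line (
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    product_id  INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)    -- composite primary key
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada')")  # one instance of the customer entity
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.execute("INSERT INTO order_line VALUES (100, 7, 2)")

# A foreign key that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Repeating the same (order_id, product_id) pair violates the composite key.
try:
    conn.execute("INSERT INTO order_line VALUES (100, 7, 5)")
    pk_rejected = False
except sqlite3.IntegrityError:
    pk_rejected = True
```

Here customer to orders is a 1:M relationship: one customer can appear in many orders, but each order points at exactly one customer.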
11
Q

What are the 3 different relationships an ERD can have?

A

1:1 (one-to-one)
1:M (one-to-many)
M:N (many-to-many)

12
Q

How can we retrieve data that is stored in a relational database?

A

By using database queries written in SQL (Structured Query Language): a language for manipulating data in a relational database using relatively simple and intuitive commands
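
A minimal sketch of such a query, run through Python's built-in sqlite3 module. The customer table and its rows are hypothetical and exist only to give the SELECT statement something to retrieve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
INSERT INTO customer VALUES (1, 'Ada', 'Leeds'), (2, 'Ben', 'York'), (3, 'Cy', 'Leeds');
""")

# SELECT chooses columns, WHERE filters rows, ORDER BY sorts the result.
rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY name",
    ("Leeds",),
).fetchall()
names = [name for (name,) in rows]
```

The `?` placeholder passes the city as a parameter instead of pasting it into the SQL text, which is the usual way to supply values from Python.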

13
Q

What is a Data Warehouse?

A

Central repository of data from multiple departments within a firm. Its primary purpose is to support managerial decision making, so data in a data warehouse are organised around subjects such as sales, customers or products that are relevant to business decision making

14
Q

Why should data be integrated from different databases in different departments?

A
  • an extract, transform and load (ETL) process is used
  • it retrieves, reconciles and transforms data into consistent formats
  • it then loads the final data into the data warehouse
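
The ETL steps above can be sketched in a few lines of Python. Everything here is assumed for illustration: the two departmental record layouts, the field names and the spending table are invented, and a real pipeline would normally rely on a dedicated ETL tool.

```python
import sqlite3

# Extract: hypothetical records from two departments that use different formats.
sales_dept = [{"cust": "ada", "amount": "1,200.50", "date": "03/01/2024"}]    # DD/MM/YYYY
support_dept = [{"customer_name": "Ben ", "spend": 80, "day": "2024-01-05"}]  # ISO date

def from_sales(rec):
    """Convert a sales record to (name, amount, ISO date)."""
    day, month, year = rec["date"].split("/")
    return (rec["cust"].strip().title(),
            float(rec["amount"].replace(",", "")),
            f"{year}-{month}-{day}")

def from_support(rec):
    """Convert a support record to (name, amount, ISO date)."""
    return (rec["customer_name"].strip().title(), float(rec["spend"]), rec["day"])

# Transform: reconcile both sources into one consistent format.
clean = [from_sales(r) for r in sales_dept] + [from_support(r) for r in support_dept]

# Load: write the reconciled rows into a (here in-memory) warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE spending (customer TEXT, amount REAL, day TEXT)")
warehouse.executemany("INSERT INTO spending VALUES (?, ?, ?)", clean)
loaded = warehouse.execute("SELECT * FROM spending ORDER BY day").fetchall()
```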
15
Q

What is a Data Mart?

A

Small-scale data warehouse, or subset of a warehouse, that focuses on a specific subject or decision area and conforms to a multidimensional data model, also known as a star schema
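
A rough sketch of what a star schema can look like, again using sqlite3. The fact table (fact_sales) and the two dimension tables are hypothetical names chosen for this example, not taken from the chapter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the subjects; names are illustrative only.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region   TEXT);
-- The fact table sits at the centre of the star and references each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
INSERT INTO dim_store   VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 15.0), (2, 1, 40.0);
""")

# Summarise the facts along one dimension: total revenue per product category.
revenue_by_category = dict(conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f JOIN dim_product AS p USING (product_id)
    GROUP BY p.category
""").fetchall())
```

Swapping dim_product for dim_store in the join would give the same revenue figures broken down by region instead.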

16
Q

What type of database is used for Big Data?

A

NoSQL ("Not only SQL")

Non relational database that supports the storage of a wide range of data types (structured, semi structured or unstructured)

17
Q

What is Data Inspection?

A

Once raw data is extracted from the database, warehouse or mart we have to review and inspect the data to assess the quality and relevance of the information for the analysis.

We also need to count and sort the data to get a better understanding of the data as well as to determine if the data set is complete or if it has any missing values.
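
Counting, sorting and checking for missing values might look like the following in plain Python. The records are made up, and None is assumed to mark a missing value.

```python
# Hypothetical extract; None marks a missing value.
records = [
    {"id": 1, "age": 34,   "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 29,   "income": None},
    {"id": 4, "age": 41,   "income": 48000},
]

# Count the observations.
n_rows = len(records)

# Count missing values per variable.
missing = {
    field: sum(1 for r in records if r[field] is None)
    for field in ("age", "income")
}

# Sort by income (missing values last) to see the range and spot oddities.
by_income = sorted(records, key=lambda r: (r["income"] is None, r["income"] or 0))

# The data set is complete only if no variable has missing values.
is_complete = all(count == 0 for count in missing.values())
```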

18
Q

What is Data Preparation?

A

The step that follows data inspection. Two techniques are examined: handling missing values and subsetting data

19
Q

Why are some values or data missing?

A
  • respondents decline to provide information because of its sensitive nature
  • some items do not apply to every respondent
  • human error, sloppy data collection or equipment failure
20
Q

What are the 2 Strategies for dealing with missing values?

A

Omission and imputation
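
A small sketch of both strategies on one made-up variable. Mean imputation is used here only as one common choice; the card itself does not say which replacement value to use.

```python
from statistics import mean

ages = [34, None, 30, 41]  # hypothetical variable with one missing value

# Omission: drop observations with a missing value.
omitted = [a for a in ages if a is not None]

# Imputation: replace missing values with a reasonable stand-in,
# here the mean of the observed values (one common choice among several).
fill = mean(omitted)
imputed = [fill if a is None else a for a in ages]
```

Omission shrinks the sample, while imputation keeps every observation at the cost of introducing estimated values.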

21
Q

What is Subsetting?

A

Process of extracting portions of a data set that are relevant to the analysis
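
As an illustration, subsetting can mean keeping only some observations, only some variables, or both. The customer records and the choice of the North region below are assumptions for the example.

```python
# Hypothetical data set.
customers = [
    {"id": 1, "region": "North", "age": 34, "spend": 120.0},
    {"id": 2, "region": "South", "age": 51, "spend": 80.0},
    {"id": 3, "region": "North", "age": 27, "spend": 45.5},
]

# Keep only the rows relevant to the analysis (North region)
# and only the variables it needs (id and spend).
subset = [
    {"id": c["id"], "spend": c["spend"]}
    for c in customers
    if c["region"] == "North"
]
```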

22
Q

What is Data Transformation?

A

Data conversion process from one format to another. Performed to meet the requirements of the statistical and data mining techniques used for the analysis

23
Q

What are the 2 ways to transform numerical data?

A

Binning and Mathematical transformation
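
Both approaches in a short sketch. The income values, the bin boundaries (30,000 and 100,000) and the choice of the natural logarithm are all arbitrary assumptions made for the example.

```python
import math

incomes = [18000, 42000, 75000, 130000]  # hypothetical numerical variable

def income_bin(value):
    """Binning: map a number to one of three groups (cut-offs are arbitrary)."""
    if value < 30000:
        return "low"
    if value < 100000:
        return "medium"
    return "high"

bins = [income_bin(x) for x in incomes]

# Mathematical transformation: the natural log compresses large values,
# which is often used for right-skewed variables such as income.
log_incomes = [math.log(x) for x in incomes]
```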

24
Q

Sometimes, nominal and ordinal variables come with too many categories. Which potential problems could this cause?

A
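  • model performance can be pulled down, because a categorical variable needs several parameters (one for each category beyond a reference category) where a numerical variable needs only one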
  • if the variable has categories that rarely occur it can be difficult to capture its impact
  • relatively small samples may not contain any observations in some categories which can cause errors when the analytical model is later applied to a larger data set with observations in all categories
  • if one category dominates in terms of occurrence, the categorical variable will fail to make a positive impact since modelling success is dependent on being able to differentiate among the observations
25
Q

What is Category Reduction?

A

Effective tool for collapsing some of the categories to create fewer, non-overlapping categories

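
One simple way to reduce categories is to collapse rare ones into a single "other" group. The sketch below assumes that approach; the colour data and the MIN_COUNT threshold are invented for illustration.

```python
from collections import Counter

colours = ["red", "blue", "red", "green", "red", "blue", "teal", "plum"]
MIN_COUNT = 2  # arbitrary threshold: categories seen fewer times are collapsed

counts = Counter(colours)

# Keep frequent categories as they are; merge the rare ones into "other".
reduced = [c if counts[c] >= MIN_COUNT else "other" for c in colours]
```

The five original categories become three, and every observation still belongs to exactly one of them.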
26
Q

What is a Dummy Variable?

A

Indicator or binary variable, taking the value 1 or 0, used to describe two categories of a variable

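
A sketch of dummy coding for a made-up region variable. A variable with k categories can be described by k - 1 dummies, with the remaining category serving as the reference; the is_ prefix is just a naming choice for this example.

```python
regions = ["North", "South", "West", "North"]  # hypothetical categorical variable

categories = sorted(set(regions))  # ['North', 'South', 'West']
reference = categories[0]          # 'North' is the reference (all dummies 0)

# One 0/1 dummy per non-reference category.
dummies = [
    {f"is_{c}": int(r == c) for c in categories[1:]}
    for r in regions
]
```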
27
Q

What are Category Scores?

A

Transformation that allows categorical variables to be treated as numerical variables in certain analytical models

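
For an ordinal variable, category scores can be as simple as a lookup table. The rating labels and the scores 1 to 4 below are assumptions; equally spaced scores imply that the steps between categories are the same size, which may not hold for real data.

```python
# Hypothetical ordinal scale mapped to numerical scores.
SCORES = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}

ratings = ["good", "poor", "excellent", "fair", "good"]

# Replace each category with its score so it can be used as a number.
scored = [SCORES[r] for r in ratings]
```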
28
Q

Where can ERDs be used?

A
  • database design
  • database troubleshooting
  • business information systems
  • business process re-engineering
  • education
  • research