SECTION 2: The Data Analytics Lifecycle Flashcards
(59 cards)
What are the six phases in a data analytics project?
Discovery, Data preparation, Model planning, Model execution, Communicate results, Operationalize
The phases may vary in terminology across organizations.
What is the purpose of the discovery phase?
Identify the project’s purpose, define questions of interest, assess resources and constraints, and establish desired outcomes.
What activities are involved in the discovery phase?
- Assessing available resources
- Framing the problem
- Identifying key stakeholders
- Interviewing the analytics sponsor
- Developing the initial hypothesis
- Identifying potential data sources
What is the data preparation phase focused on?
Gathering data from various sources and preparing it for analysis (for example, by extracting, loading, transforming, and cleaning it) with appropriate tools.
What does the model planning phase involve?
Choosing appropriate analytical models based on project objectives and available data.
What occurs during the model execution phase?
Applying chosen models to prepared data, interpreting results, and refining models.
What is the focus of the communicate results phase?
Presenting findings in a meaningful format for various stakeholders.
What is the goal of the operationalize phase?
Implementing insights from the project into real-world applications.
Identify the key roles involved in executing analytic projects.
- End users
- Project sponsors
- Project managers
- Data analysts
- Business intelligence analysts
- Database administrators
- Data engineers
- Data scientists
What is the purpose of the data analytics lifecycle?
Provide a systematic and iterative framework for managing big data challenges and data science projects.
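One way to picture "systematic and iterative" is a small state machine over the six phases: work normally advances one phase at a time, but a team can step back to an earlier phase when it learns something new. The sketch below assumes that any earlier phase may be revisited; the names `Phase` and `next_phase` are hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six lifecycle phases, in their nominal order."""
    DISCOVERY = 1
    DATA_PREPARATION = 2
    MODEL_PLANNING = 3
    MODEL_EXECUTION = 4
    COMMUNICATE_RESULTS = 5
    OPERATIONALIZE = 6


def next_phase(current, revisit=None):
    """Return the phase to work on next, or None when the project is complete.

    Assumption for illustration: the team may return to the current phase or any
    earlier one (for example, from model planning back to data preparation),
    but may not skip ahead.
    """
    if revisit is not None:
        if revisit > current:
            raise ValueError("can only revisit the current or an earlier phase")
        return revisit
    if current == Phase.OPERATIONALIZE:
        return None
    return Phase(current + 1)


if __name__ == "__main__":
    phase = Phase.MODEL_PLANNING
    # Planning reveals a missing variable, so the team returns to data preparation.
    phase = next_phase(phase, revisit=Phase.DATA_PREPARATION)
    print(phase.name)  # prints DATA_PREPARATION
```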
What are the four main characteristics of big data?
- Variety
- Velocity
- Veracity
- Volume
Define ‘variety’ in the context of big data.
Diverse types of data, including structured, semi-structured, and unstructured formats.
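To make the three formats concrete, the sketch below parses one sample of each: a CSV table (structured, fixed columns), a JSON document (semi-structured, self-describing keys), and free text (unstructured, no schema). The helper `parse_by_structure` and the sample inputs are hypothetical; real pipelines would use format-specific tooling.

```python
import csv
import io
import json


def parse_by_structure(raw, kind):
    """Parse one input according to its structure (hypothetical helper)."""
    if kind == "structured":
        # Fixed schema: every row has the same named columns.
        return list(csv.DictReader(io.StringIO(raw)))
    if kind == "semi-structured":
        # Self-describing keys; fields may be nested or optional.
        return json.loads(raw)
    if kind == "unstructured":
        # No schema at all; here we can only derive simple features such as tokens.
        return raw.split()
    raise ValueError(f"unknown kind: {kind}")


if __name__ == "__main__":
    print(parse_by_structure("id,amount\n1,9.99\n2,15.00", "structured"))
    print(parse_by_structure('{"id": 3, "tags": ["new"]}', "semi-structured"))
    print(parse_by_structure("great product, fast delivery", "unstructured"))
```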
What does ‘velocity’ refer to in big data?
The speed at which data is produced, collected, and processed.
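Velocity is usually measured as a rate, such as events per second. The sketch below keeps a sliding window of arrival timestamps and reports the rate over the most recent window. The class name `RateMeter`, the one-second default window, and the assumption that timestamps arrive in order are all illustrative choices.

```python
from collections import deque


class RateMeter:
    """Track how many events arrived in the last `window` seconds (hypothetical helper)."""

    def __init__(self, window=1.0):
        self.window = window
        self.timestamps = deque()

    def record(self, timestamp):
        """Register one event arriving at `timestamp` (in seconds)."""
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def rate(self, now):
        """Events per second over the window ending at `now`."""
        self._evict(now)
        return len(self.timestamps) / self.window

    def _evict(self, now):
        # Drop events that fell out of the window; assumes timestamps arrive in order.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


if __name__ == "__main__":
    meter = RateMeter(window=1.0)
    for t in (0.1, 0.5, 0.9, 1.2):
        meter.record(t)
    print(meter.rate(now=1.2), meter.rate(now=2.0))
```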
What is ‘veracity’ in data analytics?
The accuracy, reliability, and quality of the data collected and analyzed.
What does ‘volume’ mean in big data?
The sheer amount of data generated and handled by businesses, ranging from terabytes to petabytes.
What is the extract, load, transform (ELT) process?
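A data preparation approach in which data is extracted from its sources, loaded into the target system in raw form, and only then transformed inside that system. Because the raw data is preserved, it combines flexibility in how data is transformed with preservation of the original data.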
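The ordering in ELT can be sketched with Python's built-in `sqlite3` module standing in for the target system: raw rows are loaded unchanged into a staging table, and the transformation is a view defined inside the database. The table `raw_sales`, the view `sales_by_region`, the helper `run_elt`, and the sample rows are hypothetical.

```python
import sqlite3


def run_elt(source_rows):
    """Load raw rows into SQLite, then transform them there (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")

    # Load: store the extracted rows unchanged (all as text) in a staging table.
    conn.execute("CREATE TABLE raw_sales (sale_date TEXT, region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", source_rows)

    # Transform: define the cleaned, aggregated result inside the target database.
    # The raw table is left untouched, so this view can be redefined later.
    conn.execute(
        """
        CREATE VIEW sales_by_region AS
        SELECT LOWER(TRIM(region)) AS region,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw_sales
        GROUP BY LOWER(TRIM(region))
        """
    )
    return conn


if __name__ == "__main__":
    # Extract: rows as they might arrive from a source system (hypothetical sample data).
    extracted = [
        ("2024-01-05", "  north ", "120.50"),
        ("2024-01-05", "South", "80"),
        ("2024-01-06", "north", "99.5"),
    ]
    conn = run_elt(extracted)
    for region, total in conn.execute("SELECT region, total FROM sales_by_region ORDER BY region"):
        print(region, total)
```

In a classic ETL pipeline, the trimming, lower-casing, and casting would instead happen before the rows reach the target, and the untransformed values would not be kept there.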
What is data cleaning?
Processes for handling errors, missing data, and other problems in dirty data.
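A minimal cleaning pass over a single column might parse the values, treat impossible ones as errors, and impute the gaps. The sketch assumes an age column where blanks or unparseable entries count as missing, values outside 0 to 120 count as entry errors, and both are replaced by the median of the valid values; `clean_ages` and these rules are hypothetical, and other imputation strategies are equally common.

```python
import statistics


def clean_ages(raw_values, low=0, high=120):
    """Parse ages, reject impossible values, and impute missing ones with the median.

    Raises statistics.StatisticsError if no valid value remains to compute a median.
    """
    parsed = []
    for value in raw_values:
        try:
            age = float(value)
        except (TypeError, ValueError):
            age = None  # missing (None, "") or unparseable ("abc")
        if age is not None and not (low <= age <= high):
            age = None  # out of range (or NaN): treat as an entry error
        parsed.append(age)

    valid = [a for a in parsed if a is not None]
    fill = statistics.median(valid)
    return [fill if a is None else a for a in parsed]


if __name__ == "__main__":
    print(clean_ages(["34", "", None, "250", "29", "abc", "41"]))
```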
What is a corporate data warehouse?
A centralized storage system for a company’s data that is often the ideal location for data mining tasks.
What are the types of qualitative data?
- Nominal
- Ordinal
What are the types of quantitative data?
- Interval
- Ratio
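The four measurement levels differ in which summaries are meaningful: nominal data supports only counting (so the mode), ordinal data adds order (so a median rank), and interval and ratio data support a mean. Only a ratio scale has a true zero, so "twice as much" is meaningful for weight in kilograms but not for temperature in degrees Celsius. The sketch below encodes those rules; `summarize`, `ratio_is_meaningful`, and the three-level `ORDER` mapping are hypothetical.

```python
import statistics

ORDER = {"low": 1, "medium": 2, "high": 3}  # hypothetical ordering for an ordinal scale


def summarize(values, level):
    """Return a central-tendency summary that is meaningful for the given level."""
    if level == "nominal":
        # Categories without order: only counting (and so the mode) makes sense.
        return statistics.mode(values)
    if level == "ordinal":
        # Order matters but gaps do not: take the median rank, map it back to a label.
        ranks = sorted(ORDER[v] for v in values)
        median_rank = statistics.median_low(ranks)
        return next(label for label, rank in ORDER.items() if rank == median_rank)
    if level in ("interval", "ratio"):
        # Differences are meaningful on both scales, so the arithmetic mean is valid.
        return statistics.fmean(values)
    raise ValueError(f"unknown level: {level}")


def ratio_is_meaningful(level):
    """Only a ratio scale has a true zero, so only there is "twice as much" valid."""
    return level == "ratio"


if __name__ == "__main__":
    print(summarize(["red", "blue", "red", "green"], "nominal"))
    print(summarize(["low", "high", "medium", "medium"], "ordinal"))
    print(summarize([20.0, 25.0, 30.0], "interval"))
```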
What is a type I error in hypothesis testing?
Rejection of the null hypothesis when the null hypothesis is true.
What is a type II error in hypothesis testing?
Failure to reject the null hypothesis when the null hypothesis is false (often loosely called accepting a false null hypothesis).
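Both error types can be seen in a simulation of a two-sided z-test of H0: mean = 0. When H0 is true, every rejection is a type I error, so the rejection rate should land near the significance level alpha. When H0 is false, every failure to reject is a type II error. The sketch assumes normally distributed samples with known standard deviation 1, 30 observations per test, alpha = 0.05, and a true mean of 0.3 for the false-H0 case; `rejection_rate` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n=30, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated two-sided z-tests that reject H0: mean = 0.

    Assumes samples come from a normal distribution with known standard deviation 1.
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) * math.sqrt(n)  # (sample mean - 0) / (1 / sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    type_i = rejection_rate(true_mean=0.0)       # H0 true: rejections are type I errors
    type_ii = 1 - rejection_rate(true_mean=0.3)  # H0 false: non-rejections are type II errors
    print(f"type I rate ~ {type_i:.3f}, type II rate ~ {type_ii:.3f}")
```

The type II rate comes out far higher than the type I rate here because a shift of 0.3 standard deviations is hard to detect with only 30 observations; larger samples lower it.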
What is clustering in data analytics?
A technique used to group similar objects or data points together based on their characteristics.
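The card does not name a specific clustering algorithm; k-means is one widely used choice and is shown here only as an example. The sketch alternates an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its points) until the centroids stop moving. Using the first k points as initial centroids is a simplifying assumption that keeps the result deterministic; practical implementations usually use randomized or k-means++ initialization.

```python
import math


def kmeans(points, k, iterations=100):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm (illustrative sketch)."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic deterministic initialization
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (0.5, 1.5), (8.5, 9)]
    centroids, clusters = kmeans(pts, k=2)
    print(centroids)
```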
What are common tools used in the model planning phase?
- R
- SQL Server Analysis Services
- Python
- Apache Spark
- RapidMiner
- KNIME