Describe the major parts of the data science lifecycle
Define Business understanding. Who's involved?
This must include:
- defining business and analytical objectives
- identifying data sources.
Members involved: The client and the data science team are both involved in this step to ensure that the analytic solution meets the business objectives.
Define Data Acquisition
This process involves obtaining data from various sources and may also require setting up a data collection task and infrastructure. Data preparation techniques are employed to ensure the data is useful for analysis.
Define Data Preparation.
This is the process of cleaning and transforming raw data prior to processing and analysis. This needs to be done carefully as assumptions made here may influence, or even limit, the use of the data during analysis.
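As a minimal sketch of what cleaning and transforming raw data can look like, the example below trims whitespace, normalizes case, and converts types. The records, field names, and the list of missing-value tokens are all hypothetical. Note the explicit assumption about missing ages: it is exactly the kind of preparation choice that can later limit how the data is used.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_rows = [
    {"customer_id": " 101 ", "age": "34", "country": "us"},
    {"customer_id": "102", "age": "", "country": "US "},
    {"customer_id": "103", "age": "n/a", "country": "De"},
]

# Assumed set of strings that should be treated as "no value".
MISSING_TOKENS = {"", "n/a", "na", "null"}


def clean_row(row):
    """Trim whitespace, normalize case, and convert types for one record."""
    age_text = row["age"].strip().lower()
    return {
        "customer_id": int(row["customer_id"].strip()),
        # Assumption: unparseable ages become None rather than being imputed,
        # so the decision about missing values is deferred to analysis.
        "age": None if age_text in MISSING_TOKENS else int(age_text),
        "country": row["country"].strip().upper(),
    }


clean_rows = [clean_row(r) for r in raw_rows]
```

Had the sketch filled missing ages with 0 or a mean instead, every later statistic on age would silently inherit that choice.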
Define Data Exploration and Cleaning. (4)
- Identifying variables
- Conducting univariate and multivariate analysis
- Identifying outliers, anomalies and missing values
- Feature creation and selection
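Two of the steps above, univariate analysis and identifying outliers and missing values, can be sketched with the standard library alone. The sample values are made up, and the 1.5 * IQR rule is one common convention assumed here, not the only way to define an outlier.

```python
import statistics

# Hypothetical single-variable sample; None marks a missing value.
order_values = [12.0, 15.5, 14.0, None, 13.5, 250.0, 16.0, None, 14.5]

# Missing values: count them, then analyze only what is present.
present = [v for v in order_values if v is not None]
missing_count = len(order_values) - len(present)

# Univariate summary of the variable.
median_value = statistics.median(present)

# IQR rule (assumed convention): flag points more than 1.5 * IQR
# below the first quartile or above the third quartile.
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in present if v < lower or v > upper]
```

With this sample, two values are missing and 250.0 is the only point flagged as an outlier.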
What's the purpose of Feature Engineering?
It's needed to prepare proper datasets that are compatible with the suitable algorithms, and to improve the performance of models by leveraging domain knowledge to capture the signal of interest in the features.
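A small sketch of the idea: a raw record is turned into numeric features that an algorithm can consume, with domain knowledge deciding which features are worth deriving. The record, the field names, and the assumption that weekend purchases behave differently are all hypothetical.

```python
from datetime import datetime

# Hypothetical raw transaction; field names are illustrative.
raw = {"timestamp": "2024-03-16T22:45:00", "amount": 80.0, "items": 4}


def engineer_features(record):
    """Turn a raw record into numeric features a model can consume."""
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        "hour": ts.hour,
        # Domain assumption: weekend purchases behave differently.
        # weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend": int(ts.weekday() >= 5),
        # Ratio feature; assumes "items" is never zero in this sketch.
        "amount_per_item": record["amount"] / record["items"],
    }


features = engineer_features(raw)
```

The timestamp string itself is unusable by most algorithms, while `hour` and `is_weekend` expose the signal that the domain assumption says matters.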
The ability to match the training performance on unseen test data is referred to as the model's ability to generalize.
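To illustrate, the sketch below compares two toy models on made-up data. Both are perfect on the training set, but only one keeps that performance on unseen points. The data and both models are hypothetical and chosen only to make the gap visible.

```python
# Hypothetical labelled data: (feature, label) pairs.
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
test = [(0, 0), (4, 0), (5, 1), (9, 1)]


def accuracy(predict, data):
    """Fraction of examples for which the prediction equals the label."""
    return sum(predict(x) == y for x, y in data) / len(data)


# Model A memorizes the training pairs and guesses 0 for anything unseen.
lookup = dict(train)


def memorizer(x):
    return lookup.get(x, 0)


# Model B learns one threshold midway between the two training classes.
threshold = (
    max(x for x, y in train if y == 0) + min(x for x, y in train if y == 1)
) / 2


def threshold_model(x):
    return int(x > threshold)


# Generalization gap: training accuracy minus test accuracy.
gap_memorizer = accuracy(memorizer, train) - accuracy(memorizer, test)
gap_threshold = accuracy(threshold_model, train) - accuracy(threshold_model, test)
```

Here the memorizer has a gap of 0.5 while the threshold model has a gap of 0.0, so only the second one generalizes on this toy data.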
At what stage in the DS Lifecycle do you identify the business objectives of a data science project?
The process of transforming raw data into informative properties that represent the business problem you are trying to solve is called:
What are the roles on a typical data science team? (7)
- Data Scientist.
- Data Engineer.
- Solutions Architect.
- Machine Learning (ML) Engineer.
- Data/Business Analyst.
- Software Engineer.
- Domain Experts.
This role involves solving business tasks using machine learning model development and statistical techniques. This individual identifies trends and patterns within the data and makes predictions based on trends. The data scientist will write code to support the data analysis and model building process.
The Data Engineer specializes in data structures and algorithms, as well as in working with data through the operation of databases and other large repositories.
This is a customer facing role that ensures end-to-end customer deployment for company-related data services. The Solutions Architect interacts with clients to design, coordinate, and execute solution prototypes.
- Performs modeling and software engineering tasks.
- This individual spends a considerable amount of time programming and creating ML solutions but must also have strong statistical skills.
- Differs from the data scientist in that she is further removed from the domain side of the project.
- Has data gathering, analysis, and visualization skills.
- Compared to data scientists, they are typically firmly rooted in the business domain and less technically proficient in systems programming and advanced machine learning.
- Like the data scientist, she provides insights from data to inform decision making.
- Develops key performance indicators and utilizes business intelligence and analytics tools.
The Software Engineer handles the alignment between the business objectives and the solution, and is responsible for integrating the implemented data-driven system into the appropriate applications within the enterprise.
Also known as subject matter experts, they are the actors who know the most about the problem on the business side. Their role is to define the framework for the data science project, and hence they are a key participant in the process. The domain expert will translate business needs and characteristics to the data scientists, and eventually assess the solution as successful or not from the perspective of whether it achieved the business objective.
What's involved in modeling?
This multi-step process involves feature engineering, algorithm selection, model training and evaluation.
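The four steps can be sketched end to end in a few lines. Everything here is a hypothetical illustration: the housing rows are made up, the two candidate algorithms (a mean baseline and a one-feature least-squares line) are arbitrary choices, and mean absolute error on a small holdout set stands in for whatever evaluation metric a real project would use.

```python
import statistics

# Hypothetical data: (area_m2, rooms, price). Values are made up.
rows = [
    (50, 2, 150.0), (60, 2, 180.0), (80, 3, 240.0), (100, 4, 300.0),
    (70, 3, 210.0), (90, 3, 270.0),
]
train_rows, holdout_rows = rows[:4], rows[4:]

# 1. Feature engineering: keep area as the single input feature.
train_x = [r[0] for r in train_rows]
train_y = [r[2] for r in train_rows]
hold_x = [r[0] for r in holdout_rows]
hold_y = [r[2] for r in holdout_rows]


# 2. Algorithm selection: two candidate algorithms to compare.
def fit_mean(xs, ys):
    """Baseline that always predicts the mean training target."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with one feature."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_absolute_error(model, xs, ys):
    return statistics.fmean(abs(model(x) - y) for x, y in zip(xs, ys))


# 3. Training and 4. evaluation: fit each candidate, score it on holdout data.
candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    scores[name] = mean_absolute_error(model, hold_x, hold_y)

# Pick the candidate with the lowest holdout error.
best_name = min(scores, key=scores.get)
```

Because the made-up prices are exactly proportional to area, the linear candidate wins here. On real data the comparison is rarely this clean, which is why evaluation on held-out data is part of the process.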