Section 2 Flashcards
____ focus on the benefits and implications of findings, while ____ focus on the business impact, risks, and return on investment
Business users, project sponsors
A situation in which the inputs to the model are outside the range it was trained on, potentially causing inaccurate or invalid outputs
Out-of-bounds operation
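As a rough illustration of an out-of-bounds check, the sketch below records the minimum and maximum of each feature seen in training and flags any new input that falls outside that range. The feature names, the sample values, and the helpers `fit_ranges` and `out_of_bounds` are hypothetical, chosen only for this example.

```python
def fit_ranges(training_rows):
    """Record the (min, max) seen for each feature in the training data."""
    ranges = {}
    for row in training_rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def out_of_bounds(row, ranges):
    """Return the names of features whose values fall outside the training range."""
    flagged = []
    for name, value in row.items():
        lo, hi = ranges[name]
        if value < lo or value > hi:
            flagged.append(name)
    return flagged


# Hypothetical training data with two features
training = [
    {"age": 25, "income": 40000},
    {"age": 60, "income": 120000},
]
ranges = fit_ranges(training)

# The first input sits inside the training range; the second has an unseen age
in_range = out_of_bounds({"age": 40, "income": 50000}, ranges)
outside = out_of_bounds({"age": 75, "income": 50000}, ranges)
```

A deployed model could run a check like this before scoring and route flagged inputs for review instead of trusting the prediction.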
The system where the model is deployed and integrated with existing business processes as opposed to a sandbox or testing environment
Production environment
A small-scale deployment of the model in a live setting, allowing the data science team to manage risk, evaluate performance, and make adjustments before a full-scale deployment
Pilot project
What is data? What is information?
Data is the raw material used by analysts, while information refers to processed or organized data
What order does the data analytics lifecycle follow?
Discovery phase, Data preparation phase, Model planning phase, Model execution phase, Communicate results phase, Operationalize phase
The data analytics team familiarizes themselves with the business domain, examines relevant historical data, and assesses available resources. It also involves framing the business problem as an analytics challenge and formulating initial hypotheses to test and explore the data
Discovery phase
Requires the establishment of an analytic sandbox where the team can work with data and perform analytics throughout the project
Data preparation
The team determines the methods, techniques, and workflow to be used during the subsequent model building phase
Model planning
The team develops datasets for testing, training, and production purposes, builds and executes models based on the planning phase, and evaluates the need for more robust tools or environments for executing models and workflows
Model execution
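One possible way to develop separate training and testing datasets is a shuffled split, sketched below. The helper `split_dataset`, the 25% test fraction, and the fixed seed are assumptions made for illustration, not something the course prescribes.

```python
import random


def split_dataset(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 20 records, represented here by their IDs
rows = list(range(20))
train, test = split_dataset(rows)
```

Holding the test rows out of model building lets the team evaluate the model on data it has not seen.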
Involves determining the project’s success or failure based on the criteria developed in the discovery phase. The team identifies key findings, quantifies the business value, and develops a narrative to summarize and communicate the results to stakeholders
Communicate results
The team delivers reports, briefings, code, and technical documents. A pilot project may be implemented to test the models in a production environment, ensuring that the results are framed effectively and demonstrate clear value to stakeholders
Operationalization
Refers to the vast amount of information collected, stored, and analyzed by businesses and organizations; its unique aspects can differ between organizations and include up to 7 characteristics; however, for this course, we will focus on the main 4: variety, velocity, veracity, and volume
Big data
The diverse types of data, including structured, semi-structured, and unstructured formats; big data comes from numerous sources
Variety
The speed at which data is produced, collected, and processed; in the context of big data, velocity refers to the need for quick analysis and decision-making based on the data gathered
Velocity
The accuracy, reliability, and quality of the data collected and analyzed; ensuring data ____ is essential for gaining valuable insights and making informed decisions
Veracity
The sheer amount of data generated and handled by businesses; big data involves dealing with enormous quantities of data ranging from terabytes to petabytes and beyond, which can be challenging in terms of storage and processing
Volume
By the end of this phase, the project team should have a clear understanding of the business problem and the data available and should be ready to move forward to the analysis phase
Discovery phase
Items necessary for a successful project; can include items such as technology, tools, systems, data, and people
Resources
The process of stating the data analytics problem to be solved
Framing
Involves data mining, which refers to the process of discovering hidden patterns, trends, and insights in large datasets that can then be used by an organization to make informed decisions
Data preparation phase
The extract, load, transform (ELT) process is a key aspect of ____, which combines data transformation flexibility with data preservation
Data preparation
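The ELT idea can be sketched in a few lines: raw data is loaded unchanged first, and transformations run afterward inside the sandbox, so the original records stay available. Everything here is hypothetical, including the sample records, the dictionary standing in for the analytic sandbox, and the function names.

```python
def extract():
    """Pretend to pull raw records from a source system (hypothetical data)."""
    return [
        {"customer": " Alice ", "amount": "120.50"},
        {"customer": "BOB", "amount": "80"},
    ]


def load(sandbox, raw_rows):
    """Store the raw rows unchanged so the original data is preserved."""
    sandbox["raw_sales"] = [dict(row) for row in raw_rows]


def transform(sandbox):
    """Build a cleaned table from the raw one, leaving the raw table intact."""
    sandbox["clean_sales"] = [
        {
            "customer": row["customer"].strip().title(),
            "amount": float(row["amount"]),
        }
        for row in sandbox["raw_sales"]
    ]


sandbox = {}  # stands in for the analytic sandbox
load(sandbox, extract())  # extract, then load the raw data as-is
transform(sandbox)        # transform last, inside the sandbox
```

Because the raw table is kept, the team can rerun or change the transformation later without going back to the source system.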
Programming language and software framework for statistical analysis and graphics, available under the GNU General Public License
R
Emphasizes identifying appropriate models for clustering, classification, or uncovering relationships that correspond with the hypotheses established in the discovery phase
Model planning phase