Section 2 Flashcards

1
Q

____ focus on the benefits and implications of the findings, while ____ focus on the business impact, risks, and return on investment

A

Business users, project sponsors

2
Q

A situation in which the inputs to the model are outside the range it was trained on, potentially causing inaccurate or invalid outputs

A

Out-of-bounds operation
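
A minimal Python sketch of one way to flag out-of-bounds operation, added for illustration. It assumes numeric features and treats the minimum and maximum seen in the training data as the valid range. `fit_ranges`, `out_of_bounds`, and the sample rows are hypothetical, not from a particular library.

```python
from dataclasses import dataclass


@dataclass
class FeatureRange:
    """Observed min/max of one feature in the training data."""
    low: float
    high: float


def fit_ranges(training_rows: list[dict[str, float]]) -> dict[str, FeatureRange]:
    """Record the range of every feature seen during training."""
    ranges: dict[str, FeatureRange] = {}
    for row in training_rows:
        for name, value in row.items():
            r = ranges.get(name)
            if r is None:
                ranges[name] = FeatureRange(value, value)
            else:
                r.low = min(r.low, value)
                r.high = max(r.high, value)
    return ranges


def out_of_bounds(row: dict[str, float], ranges: dict[str, FeatureRange]) -> list[str]:
    """Return names of features whose value lies outside the training range.

    Features never seen in training are ignored by this sketch.
    """
    return [
        name for name, value in row.items()
        if name in ranges and not (ranges[name].low <= value <= ranges[name].high)
    ]


training = [{"age": 25, "income": 40_000}, {"age": 60, "income": 120_000}]
ranges = fit_ranges(training)
print(out_of_bounds({"age": 30, "income": 90_000}, ranges))  # []
print(out_of_bounds({"age": 85, "income": 90_000}, ranges))  # ['age']
```

A real deployment might also watch for categorical values never seen in training; this sketch only checks numeric ranges.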

3
Q

The system where the model is deployed and integrated with existing business processes as opposed to a sandbox or testing environment

A

Production environment

4
Q

A small-scale deployment of the model in a live setting, allowing the data science team to manage risk, evaluate performance, and make adjustments before a full-scale deployment

A

Pilot project

5
Q

What is data? What is information?

A

Data is the raw material used by analysts, while information refers to processed or organized data
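
A small illustration, assuming hypothetical sales records. The raw tuples are data. The per-region totals derived from them are information, because they have been organized to answer a question.

```python
from collections import defaultdict

# Data: individual (hypothetical) sales records with little meaning on their own.
sales = [("east", 120.0), ("west", 80.0), ("east", 45.5), ("west", 20.0)]

# Information: the same data organized into totals that answer "how much per region?"
totals: defaultdict[str, float] = defaultdict(float)
for region, amount in sales:
    totals[region] += amount

print(dict(totals))  # {'east': 165.5, 'west': 100.0}
```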

6
Q

What order does the data analytics lifecycle follow?

A

Discovery phase, Data preparation phase, Model planning phase, Model execution phase, Communicate results phase, Operationalize phase

7
Q

The data analytics team familiarizes itself with the business domain, examines relevant historical data, and assesses available resources. It also involves framing the business problem as an analytics challenge and formulating initial hypotheses to test and explore the data

A

Discovery phase

8
Q

Requires the establishment of an analytic sandbox where the team can work with data and perform analytics throughout the project

A

Data preparation

9
Q

The team determines the methods, techniques, and workflow to be used during the subsequent model building phase

A

Model planning

10
Q

The team develops datasets for testing, training, and production purposes; builds and executes models based on the model planning phase; and evaluates whether more robust tools or environments are needed to execute the models and workflows

A

Model execution

11
Q

Involves determining the project’s success or failure based on the criteria developed in the discovery phase. The team identifies key findings, quantifies the business value, and develops a narrative to summarize and communicate the results to stakeholders

A

Communicate results

12
Q

The team delivers reports, briefings, code, and technical documents. A pilot project may be implemented to test the models in a production environment, ensuring that the results are framed effectively and demonstrate clear value to stakeholders

A

Operationalization

13
Q

Refers to the vast amount of information collected, stored, and analyzed by businesses and organizations; its defining characteristics can differ between organizations and number up to seven; however, this course focuses on the main four: variety, velocity, veracity, and volume

A

Big data

14
Q

The diverse types of data, including structured, semi-structured, and unstructured formats; big data comes from numerous sources

A

Variety

15
Q

The speed at which data is produced, collected, and processed; in the context of big data, it also reflects the need for quick analysis and decision-making based on the data gathered

A

Velocity

16
Q

The accuracy, reliability, and quality of the data collected and analyzed; ensuring data ____ is essential for gaining valuable insights and making informed decisions

A

Veracity

17
Q

The sheer amount of data generated and handled by businesses; big data involves dealing with enormous quantities of data ranging from terabytes to petabytes and beyond, which can be challenging in terms of storage and processing

A

Volume

18
Q

By the end of this phase, the project team should have a clear understanding of the business problem and the data available and should be ready to move forward to the analysis phase

A

Discovery phase

19
Q

Items necessary for a successful project; can include items such as technology, tools, systems, data, and people

A

Resources

20
Q

The process of stating the data analytics problem to be solved

A

Framing

21
Q

Involves data mining, which refers to the process of discovering hidden patterns, trends, and insights in large datasets that an organization can then use to make informed decisions

A

Data preparation phase

22
Q

The extract, load, transform (ELT) process is a key aspect of ____, which combines data transformation flexibility with data preservation

A

Data preparation
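
ELT loads the extracted data into the target system first and transforms it there, so the raw copy is kept alongside the cleaned one. Below is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for an analytic sandbox. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

# Extract: hypothetical raw rows; amounts arrive as text and one row is incomplete.
raw_rows = [
    ("2024-01-03", "east", "120.50"),
    ("2024-01-03", "West", "80"),
    ("2024-01-04", "east", ""),
]

conn = sqlite3.connect(":memory:")  # in-memory database standing in for the sandbox

# Load: store the extracted rows unchanged, so nothing is lost before transformation.
conn.execute("CREATE TABLE raw_sales (sale_date TEXT, region TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# Transform: build a cleaned table inside the database, leaving raw_sales intact.
conn.execute("""
    CREATE TABLE sales AS
    SELECT sale_date,
           LOWER(region) AS region,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE amount <> ''
""")

print(conn.execute("SELECT COUNT(*) FROM raw_sales").fetchone()[0])  # 3
print(conn.execute("SELECT region, amount FROM sales ORDER BY region").fetchall())
# [('east', 120.5), ('west', 80.0)]
```

Because the raw table survives, the team can rerun or revise the transformation later without extracting the data again.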

23
Q

Programming language and software framework for statistical analysis and graphics, available under the GNU General Public License

A

R

24
Q

Emphasizes identifying appropriate models for clustering, classification, or uncovering relationships that correspond with the hypotheses established in the discovery phase

A

Model planning phase

25
Q

Is a technique used in data analytics to group similar objects or data points together based on their characteristics or attributes

A

Clustering
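
A minimal k-means sketch in plain Python. K-means is one common clustering technique, chosen here for illustration; the deck does not name a specific algorithm. The sketch assumes 2-D numeric points and naively seeds the centroids with the first `k` points, whereas production libraries use smarter initialization.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 10) -> list[int]:
    """Return a cluster label (0..k-1) for each point."""
    centroids = list(points[:k])  # naive seeding: first k points (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


points = [(1, 1), (8, 8), (1.5, 2), (8, 9), (0.5, 1.2), (9, 8.5)]
print(kmeans(points, 2))  # [0, 1, 0, 1, 0, 1]
```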

26
Q

This phase includes evaluating the structure of datasets, ensuring analytical techniques align with business objectives, deciding on a single model or a series of techniques, and examining existing approaches to similar problems

A

Model planning phase

27
Q

What tools can contribute to efficient model planning?

A

R, SQL Analysis Services, Python, Apache Spark, RapidMiner, and KNIME

28
Q

Data organized in a specific format or schema, making it easier to analyze

A

Structured data

29
Q

Data that lacks a specific format or structure, often requiring additional processing before analysis

A

Unstructured data

30
Q

Is dedicated to developing datasets for various purposes, such as training, testing, and production. Initially, training data is created for model development, while hold-out data is set aside for model evaluation

A

Model execution phase

31
Q

A separate dataset, also called hold-out data, used to evaluate the model’s performance and accuracy on unseen data

A

Test data
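
A minimal sketch of setting aside hold-out (test) data, assuming the records can be shuffled freely (no time ordering to preserve). `hold_out_split` is a hypothetical helper written for this example, and the fixed seed only makes the split repeatable.

```python
import random


def hold_out_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and return (training data, test data)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))
train, test = hold_out_split(records)
print(len(train), len(test))  # 8 2
```

The model is fitted only on `train`; `test` stays unseen until evaluation.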

32
Q

Includes developing and fitting an analytical model on the training data

A

Model building phase

33
Q

The process of evaluating the technical merits of a model, such as accuracy, comprehensibility, and confidence in predictions

A

Model assessment

34
Q

The percentage of records classified incorrectly, used to measure the accuracy of a model

A

Error rate
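
A minimal sketch of computing error rate from actual and predicted class labels; the label values are made up for the example. Accuracy is the complement: 1 minus the error rate.

```python
def error_rate(actual, predicted):
    """Fraction of records whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


actual = ["yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "yes", "yes"]
print(error_rate(actual, predicted))  # 0.4 (2 of 5 records misclassified)
```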

35
Q

Measure that indicates the change in concentration of a particular class when the model is used to select a group from the general population

A

Lift
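
A minimal sketch of lift, assuming binary labels (1 = member of the class of interest) and that the model "selects" the top fraction of records ranked by score. `lift_at` and the sample scores are hypothetical. In the sample, the base rate is 2 in 10 (20%), and the function assumes at least one positive record.

```python
def lift_at(scores, labels, fraction):
    """Lift of the top `fraction` of records when ranked by model score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_selected = max(1, round(len(ranked) * fraction))
    selected_positives = sum(label for _, label in ranked[:n_selected])
    total_positives = sum(labels)
    # (selected_positives / n_selected) / (total_positives / len(labels)),
    # rearranged into a single division to reduce floating-point rounding.
    return (selected_positives * len(labels)) / (n_selected * total_positives)


scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(lift_at(scores, labels, 0.2))  # 2.5 (50% positives in the top 2 vs 20% overall)
print(lift_at(scores, labels, 0.5))  # 2.0 (40% positives in the top 5 vs 20% overall)
```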

36
Q

A performance measurement for binary response models, comparing the true positive rate with the false positive rate

A

ROC Charts
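
A minimal sketch of the points behind an ROC chart. For each score threshold it computes the false positive rate (x-axis) and the true positive rate (y-axis), then estimates the area under the curve with the trapezoidal rule. It assumes binary labels with at least one positive and one negative; the sample data is made up.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) at every distinct score threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict "positive" for every record scoring at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.6, 0.4], [1, 0, 1, 0])
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

A curve hugging the top-left corner (AUC near 1.0) indicates a strong model; the diagonal (AUC 0.5) is no better than random guessing.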

37
Q

Analysts compare outcomes to success and failure criteria, articulate findings for stakeholders and assess the significance of their results

A

Communicate results phase

38
Q

Individuals or groups interested in the project and its outcomes

A

Stakeholders

39
Q

Measures whether the observed results are likely to have occurred by chance or if they indicate a genuine relationship between variables

A

Statistical significance
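
One way to assess statistical significance is a permutation test, chosen here for illustration; courses often use t-tests or chi-square tests instead. The sketch below enumerates every relabelling of two small, made-up samples and reports the fraction whose difference in means is at least as extreme as the observed one. That fraction is the p-value.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Try every way of splitting the pooled values into groups of the original sizes.
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


treated = [12, 14, 15, 16]
control = [8, 9, 10, 11]
p = permutation_p_value(treated, control)
print(round(p, 4))  # 0.0286 (2 of 70 splits are at least as extreme)
```

A p-value below a chosen threshold such as 0.05 is commonly read as statistically significant. Exhaustive enumeration is only practical for small samples; larger ones are usually handled with random resampling.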

40
Q

This phase marks the first time most analytics teams deploy new analytical methods or models to a production environment

A

Operationalize

41
Q

This approach allows the team to evaluate the model’s performance and make necessary adjustments in a live setting before implementing it across the enterprise

A

Operationalize phase
