M2 U1 - Distilling the Analytic Objective - Q1 Flashcards Preview

11637 Foundations of Computational Data Science > M2 U1 - Distilling the Analytic Objective - Q1 > Flashcards

Flashcards in M2 U1 - Distilling the Analytic Objective - Q1 Deck (22)
Loading flashcards...

At what point do you begin to define analytic objectives?

Once you Set Business Goals


Who is responsible for framing the project’s analytic objective?

Client and data science team collaborate on all aspects of framing the analytic goal, where either side supports the other in understanding its business-related and technical components.


The problem (statement?) underlying an analytic objective should satisfy the following criteria:

  • Solving the problem must be part of a possible solution vision towards the business objective

  • Data can facilitate that solution, and that data either exists or is feasible to gather.

  • The problem must be specific and realistic and add enough value if executed properly.


What should be done before the team commits to fulfilling any analytical expectations on the part of the client?

data science projects/consultations involve a preliminary data survey that informs, or even precedes, longer substantive discussion. This checks for:

1. Readiness of the organization

2. Technical objections relating to the data.


What is the main purpose of the problem statement in the analytic objective?

It focuses on specific steps necessary to achieve the business objective that can benefit from data-driven methods in the project.


Describe a common variant of classification

sequence labeling: where individual data points are not independent but form a series. Typical examples: sentiment analysis of text, labeling images as depicting certain objects (animals, cars, etc.).



What are the ways to characterize a data science task? (7)

  1. Classification: Individual instances in the dataset have a categorical (i.e. non-numerical) label associated with them, or will be labeled as such. The goal of the project is to develop a system capable of categorizing data points using these labels. A common variant of classification is sequence labeling, where individual data points are not independent but form a series. Typical examples: sentiment analysis of text, labeling images as depicting certain objects (animals, cars, etc.).
  2. Regression: Individual instances in the dataset have a numerical label associated with them whose magnitude carries a specific meaning. The goal of the project is to develop a model capable of predicting this target score for individual data instances. Like classification, regression is often applied over dependent series of data points. Typical examples: predicting measurements in medical or demographic data over time
  3. Retrieval & Ranking: The dataset can be thought of as one or more collections of data points and queries. In response to the query, one or more “ideal” data points should be retrieved and presented in a “correct” order. Typical example: search engines
  4. Recommendation: The dataset consists of users and items, as well as information about preferences of users for specific items. The task is to find items to recommend to users towards the maximization of some utility. Typical examples: movie and product recommendations
  5. Clustering: The dataset is assumed to have some latent structure which should be discovered by dividing data points into groups that are close to each other in variants of the feature space. Typical examples: exploratory analyses of unlabeled data
  6. Anomaly Detection: The dataset is assumed to have some latent structure which should be discovered in order to identify instances that do not adhere to the pattern. Typical example: fake customer review detection in online retail data
  7. Domain-specific tasks: Aside from the generic task types explained above, some domains have developed specific task patterns and associated evaluation metrics that should be used if the analytic goal is sufficiently specific. This is particularly true of natural language processing and image analysis. For example, machine translation will commonly be characterized as a text generation task and models will be evaluated using a specialized BLEU score. Similarly, models that segment a part of an image containing a specific object may be evaluated using average precision in conjunction with an “intersection over union” threshold.


Which primary role does the task statement play in the analytic objective?

It characterizes the task for purposes of planning and the nature of the eventual evaluation.


The statement of methods fits what criteria? (3)

  1. It must be precise enough so that the data science team understands it as a concise summary of the technical approach .
  2. At the same time, it should allow for testing different techniques around the main conceptual idea.
  3. The methods should be suitable to produce the target functionality/insight for the task given the available data.


The most general categories of methods one can identify in the methods statement typically include:

  1. Supervised learning methods, which involves learning to predict a target variable (typically through regression or classification) by training on “true” example data points whose target variable has manually been labeled or is available by other means.
  2. Unsupervised learning methods deal with finding patterns in unlabeled data without an explicit prediction target.
  3. Semi-Supervised learning methods encompass hybrid methods that combine supervised and unsupervised learning in different ways.


What's a good beginning strategy when formulating (evaluating?) proposed methods?

Checking your proposed methods against the target functionality or insight by conceptually thinking through its application and explicitly formulating expected results.


The statement of methods is important because:

Similar to the task and problem statement, you are gaining an understanding of the business needs.


Unsupervised learning methods are best applied to

tasks were an outcome is not known.


What are the most important criteria for evaluating data?

  1. the data is presumed to contain patterns that are informative for the analytic objective
  2. allows one to successfully tackle the proposed task and make progress towards solving the problem in a data-driven way.


How does data collection and curation relate to the field of data science?

Data collection and curation is a complex sub-discipline of data science and is equally important as data analysis .


What is a constructive analytical objective?

 states that it is in principle possible to develop a desired functionality from the available methods and data without the need to fully optimize its performance yet. One can think of it as a proof-of-concept or prototyping endeavor.


What's different about analytic objectives in academic settings (as opposed to industry settings)?

In academic settings, the overarching interest may be that of advancing the state of the art in research, and hence the statement may either not include an explicit business objective or state it as a problem solution vision.


What are Benchmarking objectives?

In scenarios where the feasibility of an analytical task has been established, projects may be targeted towards improving over the state of the art in some performance metric by using innovative methods/features/data. This is typically the case if one works on leaderboard-type datasets where there are models.


What are Exploratory objectives?

Exploratory objectives are typically formed when data is available that is related to a problem of interest, but needs to be surveyed before it can be used in projects pursuing constructive or benchmarking objectives.

Seems like this is used to create something toward enabling something else in the future?


What are the elements of a well-framed analytic objective? (6)

  1. Understanding the business objective
  2. Identifying the problem
  3. Focusing on a well defined task
  4. Checking proposed method against the target insight
  5. Proposing data collection and curation methods
  6. Framing the analytic objective


What should an analytic objective state? (3)

An analytic objective should state (1) what specific functionality, insight, or resource is gained from leveraging that data and methods you propose (2) relative to the current situation, and (3) assuming the project is successful as proposed.


What's the template for an analytic objective?

As an incremental step towards business objective O

We work towards solving problem P

by focusing on specific task T

and applying analytic methods in conjunction with data D

to create valuable functionality and/or produce insight I