Part 1: The Tech Lead Flashcards

1
Q

What is meant by data sparsity?

A

Even when the total sample size is large, some types of data may be severely under-represented.

For example, in front-facing camera data collected for autonomous driving, there may be very few samples of yellow traffic lights at intersections simply because they appear less often.
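A quick way to surface sparsity is to count class frequencies and flag classes whose share falls below a threshold. A minimal sketch (the labels, threshold, and function name here are illustrative, not from the source):

```python
from collections import Counter

def sparse_classes(labels, min_fraction=0.01):
    """Return classes whose share of the sample falls below min_fraction."""
    counts = Counter(labels)
    total = len(labels)
    return {cls for cls, n in counts.items() if n / total < min_fraction}

# 1,000 labelled frames: yellow lights are badly under-represented
labels = ["green"] * 600 + ["red"] * 395 + ["yellow"] * 5
print(sparse_classes(labels))  # {'yellow'}
```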

2
Q

What are outliers?

A

Points that differ markedly from other observations and can shift data distributions significantly. For example, web crawlers that follow every link on a page can distort web behavior data when accidentally included in user behavior analysis.
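One common way to flag such points is Tukey's IQR rule. A minimal sketch (the session data and the 1.5 multiplier are illustrative defaults, not from the source):

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# Session lengths in seconds; a crawler's 100,000-second "session" stands out
sessions = [30, 45, 60, 55, 40, 50, 100_000]
print(iqr_outliers(sessions))  # [100000]
```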

3
Q

What are the types and targets of data science projects?

A

Types: batch or real time.
Targets: diagnostic or predictive.

4
Q

In the data science project taxonomy, what does hindsight refer to?

A

Batch x Diagnostic
“What has happened?”
Historical

One standard DS practice is to run an A/B test with a hold-out set composed of customers who do not see an email marketing campaign. We can run the campaign for a few months and assess whether the lack of a marketing campaign impacts long-term engagement.

This kind of hindsight with a long experimentation cycle is not efficient in driving improvements in operations.

5
Q

In the data science project taxonomy, what does insight refer to?

A

Diagnostic x Real Time
“What is happening?”
Near term.

Another practice is to produce a real-time dashboard illustrating trends in long-term engagement. We can follow the decays of long-term engagements across user vintages to detect early trends of success or issues. These trends allow the organization to make business decisions in real time with insights from the dashboards.

6
Q

In the data science project taxonomy, what does *foresight* refer to?

A

Batch x Predictive
“What may happen?”
Inferential

Foresight—Given historical data, we can also build a model predicting long-term engagement using detectable short-term engagement characteristics, such as open rate, click-through rate (CTR), unsubscribe rate, landing page session length, and session frequency. A prediction model can anticipate long-term effects with short-term observation, so we gain the foresight to adjust our email marketing strategies week-to-week.

7
Q

In the data science project taxonomy, what does *intelligence* refer to?

A

Real time x Predictive
“Make it happen”
Influential

Yet more powerful approaches can include real-time analytics on channels such as email to learn the customer segments. We can then prepare sequences of touches on the next best actions (NBAs) to drive long-term engagement for specific segments of users. When we can adapt the content of the next touches based on individual responses in real time, we are beginning to see the intelligence in driving long-term engagements.

8
Q

What are four data characteristics to consider in a data science project?

A
  • Unit of decisioning.
  • Sample size/sparsity/outliers.
  • Sample distribution/imbalance.
  • Data types.
9
Q

What is the data characteristic *unit of decisioning*?

A

The granularity of modelling or analysis, e.g. are we interested in *per employee*, *per business function*, etc.?

10
Q

What is the data characteristic *sample imbalance*?

A

The difference in orders of magnitude between class label counts. Can be addressed by oversampling, undersampling, or synthetic sample generation.
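A minimal sketch of random oversampling (the helper name and the toy fraud data are illustrative; for synthetic sample generation, libraries such as imbalanced-learn offer SMOTE):

```python
import random

def oversample(samples, labels, seed=0):
    """Duplicate random minority-class rows until every class matches the majority count."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        resampled = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        out_x.extend(resampled)
        out_y.extend([y] * target)
    return out_x, out_y

# Toy imbalanced data: 4 "ok" rows vs. 1 "fraud" row
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["ok", "ok", "ok", "ok", "fraud"]
X2, y2 = oversample(X, y)
print(y2.count("ok"), y2.count("fraud"))  # 4 4
```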

11
Q

What is the data characteristic *outliers*?

A

Extreme values that can shift a data distribution.

12
Q

What are the data characteristics *data types*?

A
  • Tabular, image, text, video etc.
  • Time sequenced/series data.
  • Graph data.
13
Q

What is the benefit of feature engineering?

A

Feature engineering allows us to summarize a vast amount of data meaningfully.

14
Q

What is a momentum-based modelling strategy expected to achieve?

A

The model is expected to:
- Capture trends in the environment.
- Abstract away fundamental factors that are not expected to change in a certain time window.
- Predict what would happen if those trends continue.

15
Q

What is required for a foundational modelling strategy?

A

Clear causal mechanisms drive the predictability of the outcome.

16
Q

What are the major stumbling blocks in project execution according to Gartner?

A

Specifying projects from vague requirements and prioritizing them.

Planning and managing a DS project for success.

Striking a balance among hard trade-offs.

17
Q

What does RICE stand for in the priority refinement framework?

A

Reach, Impact, Confidence, Effort.
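The conventional RICE score multiplies the first three factors and divides by effort. A minimal sketch with hypothetical backlog numbers (the project names and values are made up for illustration):

```python
def rice_score(reach, impact, confidence, effort):
    """Conventional RICE priority: (Reach * Impact * Confidence) / Effort."""
    return reach * impact * confidence / effort

# Hypothetical backlog: reach in users/quarter, impact on a 0-3 scale,
# confidence as a probability, effort in person-months
projects = {
    "churn model": rice_score(5000, 2, 0.8, 4),   # 2000.0
    "dashboard": rice_score(200, 1, 0.9, 1),      # 180.0
}
print(max(projects, key=projects.get))  # churn model
```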

18
Q

What does Reach refer to in RICE?

A

Reach refers to the size of the population a data science project can reach and how specifically it can be targeted. There are trade-offs to consider when assessing reach, such as the data available on the populations of interest and the size of those populations.

19
Q

What does Impact refer to in RICE?

A

Impact is the anticipated lift to key operating metrics for the reachable population.

20
Q

What does Confidence refer to in RICE?

A

Confidence refers to the likelihood that the project will produce business impact.

21
Q

When quantifying confidence, what operational risks should one consider?

A
  • What data is available?
  • Of what’s available, what is reliable?
  • Of what’s reliable, what is statistically significant?
  • Of what’s statistically significant, what is predictable?
  • Of what’s predictable, what is implementable?
  • Of what’s implementable, what is ROI positive?
  • Of what is ROI positive, is there a business partner ready to operationalize it to create business value?
22
Q

What does Effort refer to in RICE?

A

The effort required to ensure a data science project's continuing success. Much of this effort is spent not on building the model itself but on the execution phases: proof-of-concept stages, iterative improvement, reviews, and presentations.

23
Q

What is prioritising with alignment to data strategy?

A

This refers to prioritising projects based on their relationships to one another: a project may offer learnings that inform the viability or direction of other, related projects.

24
Q

In the Nine types of data science projects, what does Tracking and Specification definition refer to?

A

Producing a document with measurements to be tracked.

25
Q

In the Nine types of data science projects, what does Monitoring and Rollout refer to?

A

Accept tracking capabilities and interpret A/B test results, producing a recommendation for the launch decision.

26
Q

In the Nine types of data science projects, what does Metrics definition and dashboarding refer to?

A

Define metrics to guide business operations and provide visibility on dashboards.

26
Q

In the Nine types of data science projects, what does Data insights and Deep Dives refer to?

A

Produce a report highlighting insights on a given area of the business/product.

27
Q

In the Nine types of data science projects, what does Modelling and API development refer to?

A

Deploy machine learning models or APIs with A/B testing to assess effectiveness.

28
Q

In the Nine types of data science projects, what does Data enrichment refer to?

A

Improving modelling capability through enriching data features with better accuracy and coverage.

29
Q

In the Nine types of data science projects, what does Data Consistency refer to?

A

Align on a single source of truth across different business units for critical business operations/metrics.

30
Q

In the Nine types of data science projects, what does Infrastructure Improvement refer to?

A

Demonstrate productivity gains for partners or the data science team from improving infrastructure, moving to the cloud, etc.

31
Q

In the Nine types of data science projects, what does Regulatory Compliance refer to?

A

Specify compliant data infrastructure and document processes for peers to follow to obtain legal approval.

32
Q

What are common failure modes of data science projects?

A
  • Customer of the project is not clearly defined
  • Stakeholders are not included in the decision process
  • Project goals and impact are not clarified and aligned to company strategy
  • Affected partners are not informed
  • Value of the project is not clearly defined
  • Delivery mechanism is not defined
  • Metrics of success are not aligned
  • Company strategy changes after project definition
  • Data quality is not sufficient for the success of the project
33
Q

What is Project Motivation when planning a data science project?

A

The project must have a clear customer with a challenge to be resolved. The customer must be able to receive the solution and assess whether or not it has solved the challenge.

34
Q

What is Project Definition when planning a data science project?

A

A project must have clearly defined:
- Inputs: Data it will need.
- Outputs: What resolves the customer's problem, e.g. a risk score from an ML model.
- Metrics for success: The anticipated lift in a business relevant metric.

35
Q

What is Solution Architecture when planning a data science project?

A

This outlines the following:
- Technology choices
- Feature engineering and modelling strategy
- Configurations, tracking, and testing

36
Q

What is Execution Timeline when planning a data science project?

A

The milestones and phases used to execute and deliver a project. This is split into the following components:
- Phases of execution
- Synchronization cadence

37
Q

What is Risk to Anticipate when planning a data science project?

A

New data source—Data availability, correctness, and completeness may be at risk.

Partner re-org—Disruption of prior alignment with partners; re-alignment is required.

New data-driven feature—Tracking bugs, data pipeline issues, population drifts from product feature updates, and integration issues must be discovered and resolved.

Existing product—Feature upgrades can change the meaning of metrics and signals.

Solution architecture dependencies—Technology platform updates can disrupt the operations of DS capabilities.

38
Q

What is one element of a modelling proof of concept related to data?

A

Validating data quality:
Ensuring data availability and correctness.

39
Q

What is one element of the modelling proof of concept related to the model?

A

Producing a simple model, such as a linear regression, to demonstrate the feasibility of the modelling approach.

40
Q

What is one element of the modelling proof of concept with regards to project requirements and outcomes?

A

Defining input and output formats, the metrics that will be checked, and a measure of success.

41
Q

What is the purpose of a modelling proof of concept?

A

The purpose is to remove data risks (availability, correctness, and completeness) before aligning with product and engineering on integration specification.

42
Q

What comes after the modelling PoC?

A

The Product Proof of Concept.

43
Q

What is the purpose of the Product PoC?

A
  • Defining the success criteria.
  • Product and engineering specifications are aligned.
  • Engineering resources are allocated in sprints.
  • Additional input features are developed.
  • Models are refined, and A/B tests are scheduled.
  • The validated learning in the second phase is assessing capability/market fit, as observed from A/B test results.
44
Q

The first product PoC may not produce satisfactory results. What should one do if this occurs?

A

To address the "unknown unknowns" at the time of planning, one to three additional build-measure-learn iterations can be planned to learn, align, build, test, and assess a new data-driven capability.

45
Q

What is a good synchronisation cadence between collaborating teams in a DS project?

A

Weekly syncs.

This creates a project rhythm and keeps the project top of mind for the coordinating teams. Weekly milestones also allow data scientists to break large projects into approachable pieces and facilitate transparency in communicating DS project progress.

46
Q

What are some risks to anticipate that can threaten a DS project?

A

New data source—Data availability, correctness, and completeness may be at risk.

Partner re-org—Disruption of prior alignment with partners; re-alignment is required.

New data-driven feature—Tracking bugs, data pipeline issues, population drifts from product feature updates, and integration issues must be discovered and resolved.

Existing product—Feature upgrades can change the meaning of metrics and signals.

Solution architecture dependencies—Technology platform updates can disrupt the operations of DS capabilities.

47
Q

What three ways do data science projects differ from software/data engineering projects?

A

Project team size—Involves 1–2 data scientists, compared with 3–10 engineers.

Project uncertainty—Data-dependent risks exist on top of engineering risks.

Project value—Demonstrated through A/B tests; feature completion is not enough.

48
Q

What are the conditions needed for a momentum based modelling strategy to be successful?

A
  • A predictable underlying process exists in the domain
  • Quantifiable signals can be timely processed
  • Levers exist for fast response
49
Q

What are some common use cases of a momentum based modelling strategy?

A
  • High-frequency trading (capturing market microstructure and order book dynamics).
  • Recommendation systems (such as recency-frequency-monetization, or RFM, based models).
50
Q

In what areas are foundational modelling strategies used?

A
  • Sales forecasting.
  • Infrastructure load prediction.
  • Financial account balance forecasting.
  • Structural modelling in economics.

These are areas where various cyclical patterns drive the outcome.