Intro to ML Flashcards
(21 cards)
POC to Production Gap
Proof-of-concept to production
“ML model code is 5-10% of ML project code”
refer to the diagram in [D. Sculley et al., NIPS 2015: Hidden Technical Debt in Machine Learning Systems]
ML project lifecycle
“SDMD”
scoping (X->Y) -> data -> modeling -> deployment
Scoping:
* define project [X->Y]
Data:
* define data and establish baseline
* label and organize data
Modeling:
* select and train model
* perform error analysis
Deployment:
* deploy in production
* monitor & maintain system
How do research/academia and production teams differ in how they refine an ML model?
- code (algorithm/model)
- hyperparameters
- data
research/academia:
tend to hold data the same
optimize code and hyperparameters
production team:
tend to hold code the same
optimize data and hyperparameters
Edge devices [definition?]
Edge devices are pieces of equipment that transmit data between the local network and the cloud.
They translate between the protocols, or languages, used by local devices and the protocols used by the cloud, where the data is further processed.
What does MLOps stand for?
MLOps (Machine Learning Operations) is an emerging discipline comprising a set of tools and principles to support progress through the ML project lifecycle.
Concept drift vs. data drift
data drift
[X changes]
e.g. a politician suddenly becomes famous, so the input distribution shifts (a speech system hears that name far more often)
concept drift
[X -> Y] mapping changes
e.g. house sizes stay the same, but prices change (the same X now maps to a different Y)
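A minimal Python sketch of spotting data drift by comparing the live input distribution to a reference sample (feature name and numbers are illustrative, not from the course):

import numpy as np
from scipy.stats import ks_2samp

def input_drift_detected(reference_sample, live_sample, alpha=0.01):
    # flag drift when the live inputs differ significantly from the
    # training-time (reference) distribution
    statistic, p_value = ks_2samp(reference_sample, live_sample)
    return p_value < alpha

# e.g. avg image brightness at training time vs. in production
reference = np.random.normal(loc=0.55, scale=0.05, size=5000)
live = np.random.normal(loc=0.40, scale=0.05, size=500)  # darker images
print(input_drift_detected(reference, live))  # likely True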
real-time vs. batch
speech recognition -> real-time
hospital patient records -> batch
cloud vs. Edge/Browser
edge/browser -> good to have as a fallback, in case the internet is inaccessible or the service is shut down
checklist of things to consider when creating ML software
- real-time or batch
- cloud vs. edge/browser
- compute resources (CPU/GPU/memory)
- latency, throughput (QPS)
- logging
- security and privacy
Throughput (QPS)
Throughput (QPS) - queries per second: the number of requests that are successfully executed/serviced per unit of time. For example, if the throughput is 50/minute, the server successfully executes 50 requests per minute (accepted, processed, and responded to properly).
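A rough sketch of measuring throughput and average latency for a prediction function (predict is a placeholder for your model's inference call; names are illustrative):

import time

def measure_qps(predict, requests, duration_s=10.0):
    start = time.time()
    served, latencies = 0, []
    for request in requests:
        t0 = time.time()
        predict(request)
        latencies.append(time.time() - t0)
        served += 1
        if time.time() - start >= duration_s:
            break
    elapsed = time.time() - start
    return served / elapsed, sum(latencies) / len(latencies)

qps, avg_latency = measure_qps(lambda x: x * 2, range(100_000), duration_s=1.0)
print(f"throughput ~ {qps:.0f} QPS, avg latency ~ {avg_latency * 1000:.3f} ms")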
Common ML deployment cases
- New product/capability
- automate/assist with manual task
- replace previous ML system
Key ideas:
* Gradual ramp up with monitoring
* Rollback
rollback
if the new model doesn't work, go back to the previous working model
gradual ramp-up with monitoring
don't shift all traffic to the new model at once
start with a small fraction of traffic and then ramp up
shadow mode (deployment)
ML system shadows the human and runs in parallel.
ML system’s output not used for any decisions during this phase.
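A minimal sketch of shadow mode, assuming hypothetical ml_model and log_prediction helpers: the model runs in parallel, its output is only logged for later comparison, and the human decision is what gets acted on.

def handle_request(request, human_decision, ml_model, log_prediction):
    shadow_prediction = ml_model(request)
    # log both outputs so they can be compared offline later
    log_prediction(request=request, human=human_decision, shadow=shadow_prediction)
    # only the human decision drives downstream actions in this phase
    return human_decision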
canary deployment
- roll out to small fraction (say 5%) of traffic initially
- monitor system and ramp up traffic gradually
origin:
canary in a coal mine
which refers to how coal miners used canaries to detect gas leaks
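A toy sketch of canary routing (names are illustrative; real systems usually split traffic at the load balancer):

import random

CANARY_FRACTION = 0.05  # start with ~5% of traffic

def route_request(request, old_model, new_model):
    if random.random() < CANARY_FRACTION:
        return new_model(request)  # canary traffic, monitored closely
    return old_model(request)

# if monitoring looks healthy, raise CANARY_FRACTION gradually;
# if problems appear, set it back to 0 (rollback)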
Blue green deployment
blue version = old version
green version = new version
the router switches all traffic from the old (blue) version to the new (green) version at once
benefit:
* easy way to enable rollback
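A toy sketch of the blue/green switch (names are illustrative): both versions stay deployed, one router flag decides which serves traffic, so rollback is just flipping the flag back.

blue_green = {"active": "blue"}  # blue = old version, green = new version

def route(request, blue_model, green_model):
    model = green_model if blue_green["active"] == "green" else blue_model
    return model(request)

def switch_to_green():
    blue_green["active"] = "green"

def rollback_to_blue():
    blue_green["active"] = "blue"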
degrees of automation
human only -> shadow mode -> AI assistance -> partial automation (send to human if algorithm is not sure) -> full automation (only AI)
both AI assistance and partial automation are “human in the loop” deployments
human-in-the-loop deployments are common in factory settings
consumer internet software -> full automation is usually necessary
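A minimal sketch of partial automation (the threshold and helper names are assumptions): the model's answer is used only when it is confident enough, otherwise the case goes to a human.

CONFIDENCE_THRESHOLD = 0.90

def decide(request, model, send_to_human):
    label, confidence = model(request)  # assume model returns (label, confidence)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label  # automate the easy, high-confidence cases
    return send_to_human(request)  # defer uncertain cases to a person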
monitoring: how to think about it
- brainstorm the things that could go wrong
- brainstorm a few statistics/metrics that will detect the problem
- it is ok to use many metrics initially and gradually remove the ones you find not useful
e.g.
software metrics | memory, compute, latency, throughput, server load
input metrics [x] | avg input length, avg input volume, num missing values, avg image brightness
output metrics [y] | # times “” (null) is returned, # times user redoes search, # times user switches to typing (gives up on your speech system), CTR
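A small sketch of computing a few of these input/output metrics from logged speech-search requests (field names are illustrative assumptions):

def compute_metrics(logged_requests):
    n = len(logged_requests)
    return {
        "avg_input_length": sum(r["audio_seconds"] for r in logged_requests) / n,
        "frac_null_output": sum(r["transcript"] == "" for r in logged_requests) / n,
        "frac_switched_to_typing": sum(r["user_typed_instead"] for r in logged_requests) / n,
        "click_through_rate": sum(r["clicked_result"] for r in logged_requests) / n,
    }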
iterative process for ML model deployment
ML model iteration:
ML model/data -> experiment -> error analysis -> [go back]
deployment iteration:
deployment/monitoring -> traffic -> performance analysis -> [go back]
techniques for monitoring
* set thresholds for alarms
* adapt metrics and thresholds over time
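A tiny sketch of threshold-based alarms over such metrics (threshold values are illustrative and should be adapted over time):

THRESHOLDS = {
    "frac_null_output": 0.05,         # alarm if >5% of queries return ""
    "frac_switched_to_typing": 0.10,  # alarm if >10% of users give up on speech
    "p95_latency_ms": 500,
}

def check_alarms(metrics):
    return [name for name, limit in THRESHOLDS.items() if metrics.get(name, 0) > limit]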
VAD?
Voice activity detection