6 - Medical data Flashcards

(20 cards)

1
Q

What are the five main types of medical data?

A

Tabular, unstructured text, signal/time-series, genomic/omics, medical images.

2
Q

What are common challenges in medical data?

A

Privacy, regulation, missing data, noisy labels, biases, domain shifts.

3
Q

What are examples of tabular healthcare data?

A

Age, sex, lab test values, diagnoses, medications, procedures.

4
Q

What are the three types of variables in tabular data?

A

Categorical, ordinal, continuous.

5
Q

What is data/label leakage?

A

Using future or target information in training inputs, violating causality.
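
Example (a minimal sketch, not a full pipeline): one way to avoid leaking future information is to build features only from measurements taken before the prediction time. The event list, feature names, and the `features_before` helper are hypothetical.

```python
from datetime import datetime

# Hypothetical patient events: (timestamp, feature name, value).
events = [
    (datetime(2024, 1, 1, 8), "lactate", 1.9),
    (datetime(2024, 1, 1, 14), "lactate", 3.8),
    (datetime(2024, 1, 2, 9), "sepsis_code", 1.0),  # recorded after the outcome
]

def features_before(events, prediction_time):
    """Keep only measurements taken strictly before the prediction time."""
    latest = {}
    for timestamp, name, value in sorted(events):
        if timestamp < prediction_time:
            latest[name] = value  # later values overwrite earlier ones
    return latest

# Predicting at noon on day 1: the 14:00 lactate and the diagnosis code
# lie in the future and must not enter the model inputs.
features = features_before(events, datetime(2024, 1, 1, 12))
print(features)
```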

6
Q

What is the Clever-Hans effect?

A

Model relies on spurious correlations instead of causal features.

7
Q

What is domain shift?

A

A change in the joint or marginal distributions of the data between the training and test sets.
● Prior shift (p(y)): Change in the probability of observing a certain phenotype (e.g., during pandemic waves).
● Covariate shift (p(x)): Change in the distribution of patient features (e.g., fibrinogen levels varying with the time of year).
● General domain shift (p(y,x)): Change in the joint distribution.
● Concept shift (p(y∣x)): Change in disease definition, diagnostic method, or wet lab procedures (e.g., antigen test replaced by PCR test).
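
Example (a rough sketch on made-up toy cohorts): comparing simple summaries of p(x) and p(y) between training and test data can hint at covariate and prior shift. It says nothing about concept shift, which would require comparing p(y∣x). All values and the `summarize` helper are hypothetical.

```python
# Toy cohorts of (x, y) pairs: x is a lab value, y is a binary diagnosis.
train = [(2.0, 0), (2.5, 0), (3.0, 1), (3.5, 0)]
test = [(3.0, 1), (3.5, 1), (4.0, 1), (4.5, 0)]

def mean(values):
    return sum(values) / len(values)

def summarize(cohort):
    """Summaries of the marginals: mean of x and label prevalence."""
    xs = [x for x, _ in cohort]
    ys = [y for _, y in cohort]
    return {"mean_x": mean(xs), "prevalence": mean(ys)}

train_stats = summarize(train)
test_stats = summarize(test)

# A gap in mean_x suggests covariate shift; a gap in prevalence suggests prior shift.
print(train_stats, test_stats)
```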

8
Q

What is missing-not-at-random (MNAR)?

A

Missingness depends on unobserved data (including the missing value itself), e.g., a patient is too sick to be tested.

9
Q

What are methods of data imputation?

A

Mean or median imputation, GAIN (Generative Adversarial Imputation Nets), MICE (Multiple Imputation by Chained Equations).
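
Example (a minimal sketch of the two simplest strategies only; GAIN and MICE are not shown): missing entries are marked as `None` and filled with the mean or median of the observed values. The `impute` helper and the lab values are hypothetical.

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Fill None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

lab_values = [1.0, None, 2.0, 6.0, None]  # made-up measurements

mean_filled = impute(lab_values, "mean")
median_filled = impute(lab_values, "median")
print(mean_filled, median_filled)
```

The median is less sensitive to the outlying value 6.0, which is why the two strategies fill in different numbers here.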

10
Q

What is federated learning?

A

Training a model across multiple sites without sharing patient data.
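
Example (a toy sketch of one round of parameter averaging in the style of federated averaging, which is one common approach and an assumption here): each site fits a one-parameter model locally and shares only the fitted weight and its sample count, never the patient records. Site data and helper names are hypothetical.

```python
def local_fit(xs, ys):
    """Least-squares slope for y ~ w * x, computed inside one site."""
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return w, len(xs)

def federated_average(updates):
    """Server step: average the site weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Raw data stays at each site; only (weight, sample count) leaves it.
site_a = ([1.0, 2.0], [2.0, 4.0])
site_b = ([1.0, 2.0, 3.0, 1.0, 2.0, 3.0], [4.0, 8.0, 12.0, 4.0, 8.0, 12.0])

updates = [local_fit(*site_a), local_fit(*site_b)]
global_w = federated_average(updates)
print(updates, global_w)
```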

11
Q

Name three ML methods used in medical data.

A

Random forest, gradient boosting, support vector machines.

12
Q

What are TabNet and VIME?

A

Deep learning methods specialized for tabular data that can use self-supervised learning.

13
Q

What are common interpretability methods?

A

ICE plots -> local and model-agnostic; show how a single prediction changes as one feature is varied

SHAP, LIME -> local explanation methods, preferred by clinicians

Gini importance -> built into decision-tree models; measures how much each feature reduces impurity when used in splits

PDPs, permutation importance -> global and model-agnostic
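
Example (a minimal sketch of permutation importance): shuffle one feature column and measure how much accuracy drops. The `predict` function stands in for a fitted model and is hypothetical, as are the data; in practice the drop is usually averaged over several shuffles.

```python
import random

def predict(row):
    """Hypothetical fitted model: uses only feature 0."""
    return 1 if row[0] > 0.5 else 0

def accuracy(rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(rows, labels, feature, rng):
    """Accuracy drop after shuffling one feature column."""
    baseline = accuracy(rows, labels)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return baseline - accuracy(shuffled, labels)

rows = [[0.1, 5.0], [0.9, 1.0], [0.2, 3.0], [0.8, 2.0], [0.3, 4.0], [0.7, 0.0]]
labels = [predict(r) for r in rows]  # toy labels the model fits perfectly

rng = random.Random(0)  # fixed seed for reproducibility
importances = [permutation_importance(rows, labels, f, rng) for f in range(2)]
print(importances)
```

Feature 1 is never used by the model, so shuffling it cannot change the accuracy and its importance is zero.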

14
Q

What makes medical time series data challenging?

A

High-dimensional, irregular sampling, noise, domain shifts.

15
Q

What is time series data in medicine?

A

Sequential measurements like ECG, ICU monitors, lab tests over time.

16
Q

What are time-aware neural models?

A

Neural network architectures designed to handle irregularly sampled, time-dependent data, as is common in healthcare (e.g., vital signs, lab tests, ICU data).
Examples: T-LSTM, GRU-D, Neural ODEs, Temporal Fusion Transformers.
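
Example (a simplified sketch of the input-decay idea behind GRU-D, not the full model): when a value is missing, the last observation is trusted less the longer ago it was measured, and the input falls back toward the feature mean. The decay parameters `w` and `b` are learned in the real model; here they are fixed by hand as an assumption, and the function name is hypothetical.

```python
import math

def decayed_input(last_value, feature_mean, delta, w=0.5, b=0.0):
    """Blend the last observation with the mean, based on time since measurement."""
    gamma = math.exp(-max(0.0, w * delta + b))  # decay factor in (0, 1]
    return gamma * last_value + (1.0 - gamma) * feature_mean

# Heart rate last observed at 110 bpm, population mean 80 bpm (made-up values).
just_measured = decayed_input(110.0, 80.0, delta=0.0)
two_hours_ago = decayed_input(110.0, 80.0, delta=2.0)
long_ago = decayed_input(110.0, 80.0, delta=1000.0)
print(just_measured, two_hours_ago, long_ago)
```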

17
Q

What is unstructured medical data?

A

Clinical notes, reports, and scanned forms; processing them requires NLP.

18
Q

Name a generalist medical language model.

19
Q

What is ClinicalBERT?

A

A BERT language model adapted to the clinical domain by further training on clinical notes.

20
Q

What are NLP tasks you might need when working with unstructured medical data?

A
  • Named Entity Recognition
  • Relation Extraction
  • Document Classification
  • De-identification
  • Summarization
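
Example (a toy sketch of rule-based de-identification only): regular expressions replace dates and record numbers with placeholder tags. The patterns, the `MRN` format, and the note text are hypothetical; real de-identification systems typically combine such rules with trained NER models and cover many more identifier types.

```python
import re

# Hypothetical patterns for two identifier types.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "MRN": re.compile(r"\bMRN[: ]\s*\d+\b"),
}

def deidentify(text):
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Seen on 03/11/2023, MRN: 483920, reports chest pain."
clean_note = deidentify(note)
print(clean_note)
```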