6 - Medical data Flashcards

Question 1

Q

What are the five main types of medical data?

Answer

A

Tabular, unstructured text, signal/time-series, genomic/omics, medical images.

Question 2

Q

What are common challenges in medical data?

Answer

A

Privacy, regulation, missing data, noisy labels, biases, domain shifts.

Question 3

Q

What are examples of tabular healthcare data?

Answer

A

Age, sex, lab test values, diagnoses, medications, procedures.

Question 4

Q

What are the three types of variables in tabular data?

Answer

A

Categorical, ordinal, continuous.

Question 5

Q

What is data/label leakage?

Answer

A

Training inputs contain future or target-derived information that would not be available at prediction time, violating causality.
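
One common guard against leakage is a strict time cutoff: only events recorded before the prediction time may become features. The sketch below is a minimal illustration; the event names, timestamps, and the `features_before` helper are made up for this example.

```python
from datetime import datetime

def features_before(events, prediction_time):
    """Keep only events recorded strictly before the prediction time."""
    return [e for e in events if e["time"] < prediction_time]

# Hypothetical event log for one patient (all values invented).
events = [
    {"time": datetime(2024, 1, 1, 8), "name": "lactate", "value": 1.9},
    {"time": datetime(2024, 1, 1, 12), "name": "creatinine", "value": 1.1},
    # Recorded after the prediction time: treatment given once the
    # diagnosis was known, so using it as a feature would leak the label.
    {"time": datetime(2024, 1, 2, 9), "name": "sepsis_antibiotics", "value": 1.0},
]

prediction_time = datetime(2024, 1, 1, 18)
safe_events = features_before(events, prediction_time)
safe_names = [e["name"] for e in safe_events]
```

Here `safe_names` keeps the two lab values and drops the later treatment record.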

Question 6

Q

What is the Clever-Hans effect?

Answer

A

Model relies on spurious correlations instead of causal features.

Question 7

Q

What is domain shift?

Answer

A

The joint or marginal distributions of the data differ between training and test data.
● Prior shift (p(y)): Change in the probability of observing a certain phenotype (e.g., during pandemic waves).
● Covariate shift (p(x)): Change in the distribution of patient features (e.g., fibrinogen levels varying with the time of year).
● General domain shift (p(y,x)): Change in the joint distribution.
● Concept shift (p(y∣x)): Change in disease definition, diagnostic method, or wet lab procedures (e.g., antigen test replaced by PCR test).
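
A rough way to check for prior and covariate shift is to compare simple summary statistics between the training and test sets. The sketch below uses made-up toy data; the helper names and the choice of a standardized mean difference are assumptions for illustration, not a prescribed test.

```python
from statistics import mean, pstdev

def prevalence(labels):
    """Fraction of positive labels, an estimate of p(y = 1)."""
    return sum(labels) / len(labels)

def standardized_mean_difference(train_values, test_values):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((pstdev(train_values) ** 2 + pstdev(test_values) ** 2) / 2) ** 0.5
    return (mean(test_values) - mean(train_values)) / pooled_sd

# Made-up toy data: binary labels and one feature (e.g., fibrinogen in g/L).
train_labels, test_labels = [0, 0, 0, 1], [0, 1, 1, 1]
train_feature, test_feature = [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]

# A large gap in prevalence hints at prior shift, p(y).
prior_gap = prevalence(test_labels) - prevalence(train_labels)

# A large standardized difference hints at covariate shift, p(x).
feature_gap = standardized_mean_difference(train_feature, test_feature)
```

Concept shift, p(y∣x), cannot be seen from marginals like these; detecting it needs labeled data from the new setting.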

Question 8

Q

What is missing-not-at-random (MNAR)?

Answer

A

Missingness depends on the unobserved values themselves, e.g., a patient is too sick to be tested.
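
A toy illustration of why MNAR matters: if the sickest patients are the ones not tested, statistics computed on the observed values are biased. The values and the threshold below are invented for this sketch.

```python
from statistics import mean

# Hypothetical true lactate values (mmol/L) for five patients.
true_lactate = [1.0, 1.5, 2.0, 4.0, 6.0]

# Assumed MNAR mechanism: patients at or above this value are
# "too sick to test", so their measurement is never recorded.
TOO_SICK_THRESHOLD = 4.0
observed = [v if v < TOO_SICK_THRESHOLD else None for v in true_lactate]

observed_values = [v for v in observed if v is not None]
true_mean = mean(true_lactate)
observed_mean = mean(observed_values)
```

The observed mean (1.5) underestimates the true mean (2.9), and mean imputation from the observed values would carry that bias forward.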

Question 9

Q

What are methods of data imputation?

Answer

A

Mean or median imputation, GAIN (Generative Adversarial Imputation Nets), MICE (Multiple Imputation by Chained Equations).
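
The two simplest options, mean and median imputation, can be sketched in a few lines. The `impute` helper and the values are hypothetical; GAIN and MICE are model-based and are not shown here.

```python
from statistics import mean, median

def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries."""
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]

# Made-up creatinine values with one missing measurement and one outlier.
creatinine = [1.0, None, 3.0, 10.0]

mean_imputed = impute(creatinine, "mean")
median_imputed = impute(creatinine, "median")
```

With the outlier of 10.0 present, the median fill (3.0) is less affected than the mean fill (about 4.67).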

Question 10

Q

What is federated learning?

Answer

A

Training a model across multiple sites without sharing patient data.
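
One widely used aggregation scheme is federated averaging (FedAvg), which the card does not name; it is used here only as an example. Each site trains locally and sends model weights, never patient records, and the server averages them weighted by sample count. Site weights and counts below are made up.

```python
def federated_average(site_updates):
    """Average model weights across sites, weighted by local sample count.

    site_updates: list of (weights, n_samples) pairs, one per site.
    """
    total_samples = sum(n for _, n in site_updates)
    n_params = len(site_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in site_updates) / total_samples
        for i in range(n_params)
    ]

# Hypothetical updates from two hospitals: (local weights, local sample count).
site_updates = [
    ([1.0, 2.0], 100),
    ([3.0, 4.0], 300),
]

global_weights = federated_average(site_updates)
```

The larger site pulls the global weights toward its own values, giving `[2.5, 3.5]` here.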

Question 11

Q

Name three ML methods used in medical data.

Answer

A

Random forest, gradient boosting, support vector machines.

Question 12

Q

What are TabNet and VIME?

Answer

A

Specialized DL methods for tabular data using self-supervised learning.

Question 13

Q

What are common interpretability methods?

Answer

A

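ICE plots -> local & model-agnostic; show how a single prediction changes as you vary one feature

SHAP, LIME -> local explanation methods, often preferred by clinicians; SHAP values can also be aggregated into a global importance ranking

Gini importance -> built into tree-based models; measures how much each feature reduces impurity when used in splits

PDPs, permutation importance -> global & model-agnostic; a PDP shows the average effect of a feature, permutation importance the drop in performance when a feature is shuffled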
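
The link between ICE curves and PDPs can be sketched directly: an ICE curve varies one feature for a single patient, and the PDP is the average of the ICE curves. The `risk_model` below is an invented stand-in for a trained model, and the patients are made up.

```python
def risk_model(age, lactate):
    """Hypothetical risk score; lactate matters more for older patients."""
    slope = 0.10 if age >= 65 else 0.02
    return 0.2 + slope * lactate

def ice_curve(model, patient, grid):
    """Predictions for one patient as lactate is swept over the grid."""
    age, _ = patient  # the patient's own lactate is replaced by grid values
    return [model(age, value) for value in grid]

def partial_dependence(model, patients, grid):
    """PDP: pointwise average of the ICE curves of all patients."""
    curves = [ice_curve(model, p, grid) for p in patients]
    return [sum(column) / len(column) for column in zip(*curves)]

# Made-up patients as (age, lactate) pairs.
patients = [(50, 1.0), (70, 3.0)]
lactate_grid = [1.0, 2.0, 3.0]

ice_curves = [ice_curve(risk_model, p, lactate_grid) for p in patients]
pdp = partial_dependence(risk_model, patients, lactate_grid)
```

The two ICE curves have different slopes, which the averaged PDP hides; this is why ICE plots are useful for spotting interactions.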

Question 14

Q

What makes medical time series data challenging?

Answer

A

High-dimensional, irregular sampling, noise, domain shifts.

Question 15

Q

What is time series data in medicine?

Answer

A

Sequential measurements like ECG, ICU monitors, lab tests over time.

Question 16

Q

What are time-aware neural models?

Answer

A

Neural network architectures designed to handle irregularly sampled, time-dependent data, as is common in healthcare (e.g., vital signs, lab tests, ICU data).
Examples: T-LSTM, GRU-D, Neural ODEs, Temporal Fusion Transformers.
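
As one concrete case, GRU-D handles gaps by decaying the last observed value toward the feature mean as the time since the last measurement grows. The sketch below shows only that input-decay step for a single feature; in the real model the decay parameters are learned, so the fixed `w` and `b` here are placeholder assumptions, as are the heart-rate values.

```python
import math

def decay_factor(delta, w=0.5, b=0.0):
    """gamma = exp(-max(0, w * delta + b)); equals 1 at delta = 0 (for b = 0)."""
    return math.exp(-max(0.0, w * delta + b))

def decayed_input(x, observed, x_last, x_mean, delta, w=0.5, b=0.0):
    """Return x if observed, else blend the last value with the feature mean."""
    if observed:
        return x
    gamma = decay_factor(delta, w, b)
    return gamma * x_last + (1.0 - gamma) * x_mean

# Made-up heart-rate example: last measurement 120 bpm, feature mean 80 bpm.
just_measured = decayed_input(None, False, x_last=120.0, x_mean=80.0, delta=0.0)
short_gap = decayed_input(None, False, x_last=120.0, x_mean=80.0, delta=1.0)
long_gap = decayed_input(None, False, x_last=120.0, x_mean=80.0, delta=100.0)
```

With no gap the last value is kept, and after a long gap the estimate falls back to the mean, so the model trusts stale measurements less.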