3 - Data Science Foundations Flashcards
What is the main focus of Kamala’s role at Stardust Health Insurance?
Ensuring that patients receive the most cost-effective care possible.
What does cost-effective medical care produce?
Desirable outcomes at a reasonable cost.
What is the Pareto principle?
A relationship in which a small percentage of causes produces a large majority of results.
What key demographic information does Kamala consider about the patient population?
Patients’ age and gender distributions, common conditions, and medications.
What percentage of reimbursements went to 20 percent of patients at Stardust?
80 percent.
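The 80/20 relationship above can be checked against any list of per-patient costs. The following is a minimal sketch, not Stardust's actual method: the function name `top_share` and the reimbursement figures are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def top_share(costs, top_fraction=0.2):
    """Return the share of total cost owed to the costliest `top_fraction` of patients."""
    if not costs:
        raise ValueError("costs must be non-empty")
    ranked = sorted(costs, reverse=True)                 # costliest patients first
    n_top = max(1, round(len(ranked) * top_fraction))    # number of patients in the top group
    return sum(ranked[:n_top]) / sum(ranked)

# Hypothetical reimbursements for ten patients (illustrative values only)
reimbursements = [4000, 4000, 250, 250, 250, 250, 250, 250, 250, 250]
share = top_share(reimbursements)
print(f"Top 20% of patients account for {share:.0%} of reimbursements")
```

In this made-up data, two of the ten patients account for 8,000 of the 10,000 total, so the top 20 percent hold 80 percent of the cost.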
Why is health care reimbursement a significant expenditure for Stardust?
It is the company’s largest expenditure.
What is Kamala’s target population for her analysis?
Patients who have received surgery.
What is exploratory data analysis?
An analysis that generates insights into the data set, including limitations, summary statistics, and relationships between variables.
What is included in the claims database used for analysis?
Claims made by all covered patients from 2015 to 2023.
What age group is excluded from the claims data due to privacy concerns?
Clients below age 18.
What is the main table in the claims database called?
The procedures table.
What is imputation in data analysis?
Replacing a missing value with a nonmissing value.
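One simple way to impute a numeric field is to fill each missing entry with the median of the observed values. This is a sketch under assumptions: the helper `impute_median` and the `ages` list are hypothetical, and median filling is only one of several imputation strategies, not necessarily the one used in the source material.

```python
from statistics import median

def impute_median(values):
    """Replace each None in `values` with the median of the non-missing entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute when every value is missing")
    fill = median(observed)
    return [fill if v is None else v for v in values]

# Hypothetical ages with two missing entries (illustrative values only)
ages = [34, None, 51, 47, None, 62]
imputed = impute_median(ages)
print(imputed)
```

The observed ages are 34, 47, 51, and 62, so both gaps are filled with their median, 49.0. The alternative to imputation is exclusion, which would drop the two incomplete records instead.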
True or False: The date of patient encounter is always recorded in the claims database.
False.
What percentage of the time is the ‘Date of patient encounter’ field empty?
17 percent.
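A missing-value rate like the one above is typically found by counting empty entries in a field. The sketch below assumes claims are held as dictionaries. The field name `encounter_date`, the helper `missing_rate`, and the records are all hypothetical and do not reflect the real claims schema.

```python
def missing_rate(records, field):
    """Return the fraction of records whose `field` is absent, None, or empty."""
    if not records:
        raise ValueError("records must be non-empty")
    missing = sum(1 for r in records if not r.get(field))
    return missing / len(records)

# Hypothetical claim records (illustrative values only)
claims = [
    {"claim_id": 1, "encounter_date": "2019-03-02"},
    {"claim_id": 2, "encounter_date": None},
    {"claim_id": 3, "encounter_date": "2020-11-17"},
    {"claim_id": 4, "encounter_date": ""},
    {"claim_id": 5, "encounter_date": "2021-06-30"},
]
rate = missing_rate(claims, "encounter_date")
print(f"encounter_date is empty in {rate:.0%} of records")
```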
What is a common issue with using averages in data analysis?
Averages can be misleading if extreme values are present.
What measure of central tendency is less sensitive to extreme values?
Median.
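The difference in sensitivity is easy to see with a single extreme value. A minimal sketch, assuming a small set of hypothetical claim amounts in which one claim is far larger than the rest:

```python
from statistics import mean, median

# Hypothetical claim amounts; the last one is an extreme value (illustrative only)
claim_amounts = [200, 250, 300, 350, 400, 50000]

avg = mean(claim_amounts)    # pulled upward by the single large claim
mid = median(claim_amounts)  # midpoint of the two middle values, 300 and 350

print(f"mean = {avg:.2f}, median = {mid}")
```

Here the mean is about 8,583 while the median is 325, so the median describes a typical claim far better than the mean does.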
What was the average age of claimants in 2018?
46 years.
What was the average age of claimants in 2019?
50 years.
What is one potential question Kamala could answer with the claims data?
What diagnosis and procedure codes are most costly for the company?
What should Kamala consider regarding missing data in her analysis?
The implications of exclusion or imputation on data quality.
Fill in the blank: The data set covers claims from ______ to 2023.
2015.
What is one limitation of the claims data set according to Kamala?
It does not include data on family history.
What is the significance of the 80/20 rule in Kamala’s analysis?
It highlights that a small percentage of patients accounts for a large portion of costs.
What type of data is stored in a separate table for patient identification?
Name, date of birth, social security number, address, and employer information.