Objective 4 - Predictive Modeling Flashcards Preview

Flashcards in Objective 4 - Predictive Modeling Deck (32):
1

Risk factors that indicate whether a person may have high claims

1. Inherent risk factors, such as age, sex, and race
2. Medical condition-related factors, such as diabetes or cancer
3. Family history (for conditions that are inheritable)
4. Lifestyle risk factors, such as smoking, lack of exercise, and poor nutrition
5. External risk factors, such as industry, location, and education

2

Types of medical management interventions

1. Care coordination (focuses on the system) - includes case management, discharge planning, and in-hospital care coordination
2. Condition management (focuses on the patient) - includes disease management and risk factor management
3. Provider management (focuses on the provider) - includes provider profiling, pay-for-performance, and accountable care organizations

3

Areas where condition-based models are used in healthcare financial applications

1. Program management - identifying high-risk individuals, financial modeling and resource allocation, and program evaluation (eg, calculating savings)
2. Provider or health plan reimbursement - normalizing populations to pay providers or plans for the risks they accept and to evaluate provider effectiveness. Profiling providers to assess quality and efficiency
3. Actuarial and U/W functions - pricing health plans, underwriting groups, and projecting future claims costs

4

Types of predictive models that are not based on medical conditions (traditional "non-condition risk-based" models)

1. Age/sex - rates are established for a group based on the average age/sex factor of the members in the group (works best for large groups w/ age/sex factors close to 1.0)
2. Prior cost - the prior year's claims are used to project future costs (is reasonably accurate for large groups, but not for smaller groups)
3. Combination of age/sex and prior cost - often used for rating smaller groups
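
As an illustration of how the third approach might work, the sketch below blends a manual age/sex rate with the group's prior cost. The credibility weight, the base rate, and all function names are assumptions for illustration; the card does not say how the two are combined.

```python
def group_age_sex_factor(member_factors):
    """Average age/sex factor across the members of a group."""
    return sum(member_factors) / len(member_factors)


def blended_rate(base_rate, age_sex_factor, prior_cost_pmpm, credibility):
    """Blend a manual (age/sex) rate with the group's prior cost.

    credibility is the weight (0 to 1) placed on the group's own prior cost.
    How that weight is set is an assumption left to the user.
    """
    if not 0.0 <= credibility <= 1.0:
        raise ValueError("credibility must be between 0 and 1")
    manual_rate = base_rate * age_sex_factor
    return credibility * prior_cost_pmpm + (1 - credibility) * manual_rate


# Hypothetical small group: four members with illustrative age/sex factors
factors = [0.8, 1.0, 1.2, 1.4]
factor = group_age_sex_factor(factors)           # about 1.1
rate = blended_rate(400.0, factor, 500.0, 0.25)  # about 455.0
print(round(factor, 4), round(rate, 2))
```

A larger group would typically be given a credibility weight closer to 1, which matches the card's point that prior cost is more reliable for large groups.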

5

Sources of data for developing risk factors

1. Claims data - for medical condition-related risk factors such as diabetes or cancer
2. Self-reported data - for lifestyle-related risk factors such as smoking, stress, lack of exercise, poor nutrition, etc. (see separate list of risk factors identified by a health risk assessment)
3. External data - for external risk factors such as industry, geography, education, and income level

6

Risk factors identified by a health risk assessment

1. Personal disease history
2. Family disease history
3. Health screenings and immunizations
4. Alcohol consumption
5. Injury prevention behavior
6. Nutrition
7. Physical activity
8. Skin protection
9. Stress and well-being
10. Tobacco use
11. Weight management
12. Women's health (eg, pregnancy status)
13. General health assessment
14. Functional health status
15. Mental health status

7

Types of data sources for predictive modeling

1. Physician referral/chart (high reliability, low practicality) - medical charts provide the most information, but have serious drawbacks (see separate list)
2. Enrollment (high reliability, high practicality) - can be used to convert claims data into PMPM amounts
3. Claims (medium reliability, high practicality) - usually available to health plans and continually refreshed as events occur. Data quality varies greatly (must check for accuracy). Lots of info is provided in claim forms for hospital (UB04) and professional (CMS 1500) claims.
4. Pharmacy (medium reliability, high practicality) - high quality data that completes quickly. But there is no diagnosis on the claims, and prescriptions that aren't filled won't generate claims
5. Laboratory values (high reliability, low practicality) - can be difficult to obtain, and vendors do not use a standard format
6. Self-reported (low/medium reliability, low practicality) - will become important since members can report info that isn't available elsewhere, but there are drawbacks (see separate list)
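
To show how enrollment data converts claims into PMPM (per member per month) amounts, here is a minimal sketch. The member IDs, months, and dollar amounts are made up for illustration.

```python
def pmpm(total_claims, member_months):
    """Claims cost per member per month."""
    if member_months <= 0:
        raise ValueError("member_months must be positive")
    return total_claims / member_months


# Hypothetical enrollment file: months enrolled per member during the year
months_enrolled = {"A": 12, "B": 6, "C": 12}
# Hypothetical claims file: paid claims per member (member C had no claims)
claims_paid = {"A": 1200.0, "B": 300.0, "C": 0.0}

member_months = sum(months_enrolled.values())                # 30
cost_pmpm = pmpm(sum(claims_paid.values()), member_months)   # 50.0
print(member_months, cost_pmpm)
```

Member C illustrates why enrollment is needed: members with no claims never appear in the claims file but still contribute member months to the denominator.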

8

Drawbacks of using data from medical charts

1. They do not cover OON services or drugs prescribed by OON providers
2. They do not record the patient's compliance with physician orders (such as prescription filling)
3. Transcribing the data and transferring it to a uniform format is time consuming and requires highly-trained staff
4. There is no uniformity in how physicians code conditions and their severity
5. Charts are typically unavailable to the health plan or the actuary

9

Advantages and disadvantages of using diagnosis codes for identifying member conditions

Advantages:
1. Codes are almost always present on medical claims
2. A uniform format exists
3. Codes are useful for identifying conditions
Disadvantages:
1. Usually only the primary and secondary codes are populated in the claims data
2. Coding errors may occur
3. Codes may sometimes be selected to drive maximum reimbursement
4. Different physicians may follow different coding practices

10

Drawbacks of using survey data

1. Surveys must be commissioned, budgeted, and executed in order to generate the data
2. Data isn't updated as medical events occur, so it can become stale unless the survey is updated periodically
3. Response bias can make it dangerous to draw conclusions from survey responses
4. Respondents may submit untruthful answers

11

Questions to answer when building a clinical identification algorithm

A clinical identification algorithm is a set of rules that is applied to a claims data set to identify the conditions present in the population
1. Where are the diagnoses?
2. What is the source of the diagnosis (claims, medical charts, etc.)?
3. If the source is claims, what claims should be considered (inpatient, outpatient, lab, etc.)?
4. If the claim contains more than one diagnosis, how many diagnoses will be considered for identification?
5. Over what time span, and how often, will a diagnosis have to appear in claims for that diagnosis to be incorporated?
6. What procedures may be useful for determining severity of a diagnosis?
7. What prescription drugs may be used to identify conditions?
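
A rough sketch of how answers to questions 3 to 5 might turn into a rule. The claim layout, the parameter names, and the default thresholds are all hypothetical. The ICD-10 prefix E11 (type 2 diabetes) is used only as an example.

```python
from datetime import date


def has_condition(claims, code_prefixes, start, end,
                  min_claims=2, max_dx_positions=2,
                  claim_types=("inpatient", "outpatient")):
    """Return True if enough qualifying claims carry a matching diagnosis.

    claim_types      - which claims are considered (question 3)
    max_dx_positions - how many diagnoses per claim are considered (question 4)
    start, end       - time span; min_claims - required frequency (question 5)
    """
    hits = 0
    for claim in claims:
        if claim["type"] not in claim_types:
            continue
        if not (start <= claim["service_date"] <= end):
            continue
        considered = claim["diagnoses"][:max_dx_positions]
        if any(dx.startswith(p) for dx in considered for p in code_prefixes):
            hits += 1
    return hits >= min_claims


# Hypothetical claims for one member
claims = [
    {"type": "outpatient", "service_date": date(2023, 2, 1),
     "diagnoses": ["E119", "I10"]},
    {"type": "lab", "service_date": date(2023, 3, 1),
     "diagnoses": ["E119"]},
    {"type": "outpatient", "service_date": date(2023, 9, 15),
     "diagnoses": ["I10", "Z00", "E119"]},
    {"type": "inpatient", "service_date": date(2023, 11, 5),
     "diagnoses": ["E1165"]},
]

flag = has_condition(claims, ["E11"], date(2023, 1, 1), date(2023, 12, 31))
print(flag)  # True: two qualifying claims carry an E11 code in the first two positions
```

Raising min_claims or lowering max_dx_positions makes the rule stricter (fewer false positives); loosening them identifies more members at the cost of more false positives.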

12

Challenges when constructing a condition-based model

1. The large # of procedure and drug codes
2. Deciding the severity level at which to recognize the condition
3. The impact of co-morbidities for conditions that are often found together
4. The degree of certainty with which the diagnosis has been identified
5. The extent of the data (claims data will cover all members, but self-reported data will not)
6. The type of benefit design that underlies the data

13

Definitions of sensitivity and specificity

When building clinical identification algorithms, the proper balance between sensitivity and specificity must be found
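1. Sensitivity - the % of members who have a condition that are correctly identified as having it (true positives / (true positives + false negatives))
2. Specificity - the % of members who do not have a condition that are correctly identified as not having it (true negatives / (true negatives + false positives))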
Specificity may be more important for underwriting, while sensitivity may be more important for care management, since clinicians can verify the presence of a condition.
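
A short worked example of the two measures, using made-up counts from comparing an algorithm's flags against verified conditions:

```python
def sensitivity(true_pos, false_neg):
    """Share of members with the condition that the algorithm flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of members without the condition that the algorithm leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical population of 1,000 members, 100 of whom have the condition
tp, fn = 80, 20    # members with the condition: flagged / missed
tn, fp = 880, 20   # members without it: correctly unflagged / wrongly flagged

print(sensitivity(tp, fn))            # 0.8
print(round(specificity(tn, fp), 4))  # 0.9778
```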

14

External sources of clinical identification algorithms

1. HEDIS (from the NCQA) has algorithms for identifying some conditions (eg, asthma, high blood pressure, diabetes)
2. Disease Management Association of America (now Care Continuum Alliance) developed algorithms for identifying chronic diseases
3. Grouper models - commercially-available models that identify member conditions and score them for relative risk and cost
4. Literature - articles will sometimes report the codes that are used for analysis

15

Reasons for using commercially-available grouper models

1. Building algorithms from scratch requires a considerable amount of work
2. Models must be maintained to accommodate new codes, which requires even more work
3. Commercially-available models are accessible to many users. Providers and plans often require that payments be based on a model that is available for review and validation

16

Common features of Medicare prospective payment systems

1. A system of averages - providers cannot expect to make a profit on each case, but efficient providers can make a reasonable return on average
2. Increased complexity - DRGs are more complicated than a system based on per diem payments
3. Relative weights - associated with each patient group to reflect the average resources used by efficient providers
4. Conversion factor (base price) - the dollar amount for a unit of services. Is multiplied by the relative weight to determine payment
5. Outliers - unusual cases that require above-average resources and receive extra payments
6. Updates - the conversion factor and relative weights are adjusted annually to reflect new technologies and changing practice patterns
7. Access and quality - policymakers monitor PPSs and survey patients to ensure that beneficiaries have adequate access to high quality care and that providers are compensated adequately
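
Items 3 to 5 combine into a simple payment calculation, sketched below. The dollar amounts and weights are invented, and the outlier amount is taken as an input because the card does not describe how it is computed.

```python
def pps_payment(relative_weight, conversion_factor, outlier_payment=0.0):
    """Payment = relative weight x conversion factor, plus any outlier payment."""
    return relative_weight * conversion_factor + outlier_payment


# Hypothetical patient group with relative weight 1.5 and a $6,000 base price
print(pps_payment(1.5, 6000.0))           # 9000.0
# Same group, but an unusual case that earns a hypothetical $2,500 extra payment
print(pps_payment(1.5, 6000.0, 2500.0))   # 11500.0
```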

17

Challenges with patient classification systems based on coding systems

1. Need for new DRGs - due to new diseases and new procedures
2. ICD coding - some codes may not be sufficiently precise as diseases and procedures are refined
3. Upcoding - providers may be tempted to exaggerate a patient's secondary diagnoses to get paid more
4. New coding systems - adopting the new ICD-10 systems will be a major challenge for hospitals and CMS

18

Factors for choosing the right predictive model

1. Correlation structure - more complicated models may be needed for data containing correlated variables
2. Purpose of the analysis
3. The nature of the available data
4. Characteristics of the outcome variable (eg, quantitative vs. qualitative, unrestricted vs. truncated, binary choice vs. unrestricted choice)
5. Distribution of the outcome variable (eg, normal vs. skewed)
6. Functional relationship (eg, linear vs. non-linear) - when the equation cannot be transformed into a linear form, iterative processes or a maximum likelihood procedure may be used instead of ordinary regression methods
7. Complex decision model - whether a single equation model is sufficient or a simultaneous equation model is needed (if there is more than one dependent variable)

19

Steps of the data warehousing process

1. Identify which patients to include in the dataset
2. Identify which data elements to merge with the patient list
3. Identify what the data says about the patient (eg, create flags that describe the patient's health and risk status)
4. Attach the derived variables and flags to the patient identifiers to create a picture of the patient history

20

Characteristics for assessing the quality of a model

1. Parsimony - should introduce as few variables as are necessary to produce the desired results
2. Identifiability - if there are more dependent variables than independent equations, then issues such as bias will result
3. Goodness of fit - variations in the outcome variable should be explained to a high degree by the explanatory variables (measured by R^2 and other statistics)
4. Theoretical consistency - results should be consistent with the analyst's prior knowledge of the relationships between variables
5. Predictive power - should predict well when applied to data that was not used in building the model

21

Statistics for determining whether a model is good

1. R^2 - measures how much of the variation in the dependent variable is explained by the variation in the independent variables. A more valid measure may be Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - k - 1), where N = # of observations and k = # of independent variables (excluding the intercept).
2. Regression coefficients - examine the signs of the parameter estimates to ensure they make sense, then determine whether the value of the parameter estimate is statistically significant
3. F-Test - ratio of the variance explained by the model to the unexplained (error) variance
4. Statistics used for logistic models:
a. Hosmer-Lemeshow statistic
b. Somers' D statistic
c. C-statistic
5. Multicollinearity - occurs when a linear relationship exists between the independent variables. May be addressed by removing one of the collinear variables.
6. Heteroscedasticity - occurs when the error terms do not have a constant variance
7. Autocorrelation - occurs when the error terms in the regression function are correlated with one another
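
The Adjusted R^2 formula in item 1 can be checked with a few lines of code. The inputs below are arbitrary illustrative values.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - k - 1)."""
    denom = n_obs - n_predictors - 1
    if denom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / denom


# Hypothetical model: R^2 of 0.50 from 101 observations and 5 predictors
print(round(adjusted_r_squared(0.50, 101, 5), 4))   # 0.4737
# Adding 20 more predictors with no gain in R^2 lowers the adjusted value
print(round(adjusted_r_squared(0.50, 101, 25), 4))  # 0.3333
```

The second call shows the penalty at work: extra variables that do not improve fit reduce Adjusted R^2, which ties back to parsimony.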

22

Re-sampling methods for validating a model

These approaches help test the model's predictive power
1. Bootstrap - the sampling distribution of an estimator is estimated by sampling with replacement from an original sample
2. Jackknife - the estimate of a statistic is systematically re-computed, leaving out one observation at a time from the sample set
3. Cross-validation - subsets of data are held out for use as validating sets
4. Permutation test - a reference distribution is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points
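
A minimal sketch of the first two methods, applied to the mean of a small made-up sample of claim costs. The function names and the choice of the mean as the statistic are assumptions for illustration; in practice the statistic would be a model-fit or prediction-error measure.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples, seed=0):
    """Means of n_resamples samples drawn with replacement from the original."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def jackknife_means(sample):
    """Means re-computed leaving out one observation at a time."""
    return [statistics.mean(sample[:i] + sample[i + 1:]) for i in range(len(sample))]


# Hypothetical annual claim costs for four members
costs = [100.0, 250.0, 400.0, 1250.0]

boot = bootstrap_means(costs, n_resamples=1000, seed=42)
jack = jackknife_means(costs)

print(len(boot), round(statistics.stdev(boot), 2))  # spread of the bootstrap means
print([round(m, 2) for m in jack])                  # one mean per left-out member
```

The jackknife output makes the influence of the single high-cost member visible: the mean drops sharply when that observation is left out.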

23

Factors used in developing risk scores in the CMS-HCC risk model

HCC = hierarchical condition category
1. Demographics - age and gender factors are the starting point. Higher risk scores are assigned to beneficiaries who are eligible for both Medicaid and Medicare.
2. Disabled indicators - a separate set of age and gender factors are used for beneficiaries under age 65 who are eligible for Medicare due to disability
3. Separate models are used for beneficiaries who:
a. Reside in a long-term care institution, or
b. Suffer from end-stage renal disease
4. New enrollees - since no claim history exists, only age and gender factors are used. Separate factors are developed for new enrollees
5. A prospective risk adjustment methodology is used to risk-adjust future payments based on actual historical medical experience
6. Calibration - every 2 yrs, CMS re-calibrates by updating the model weights to reflect new prescription drugs and changes in medical technologies, practice patterns, and provider coding practices
7. Health status risk factors are developed from the beneficiary's diseases (using ICD-9 codes and grouping into HCCs)
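
The sketch below shows one way a risk score could be assembled from these pieces: a demographic factor plus a factor for each HCC, where a more severe category in a hierarchy replaces less severe ones. Every factor value, category name, and table here is a hypothetical placeholder, not a published CMS value, and the separate models for institutional and ESRD beneficiaries are omitted.

```python
# Hypothetical factors for illustration only (not CMS values)
DEMOGRAPHIC_FACTORS = {("F", "65-69"): 0.30, ("M", "65-69"): 0.32}
NEW_ENROLLEE_FACTORS = {("F", "65-69"): 0.50, ("M", "65-69"): 0.55}
HCC_FACTORS = {
    "DIABETES_WITH_COMPLICATIONS": 0.35,
    "DIABETES_NO_COMPLICATIONS": 0.12,
    "CHF": 0.40,
}
# Each hierarchy is listed from most severe to least severe
HIERARCHIES = [["DIABETES_WITH_COMPLICATIONS", "DIABETES_NO_COMPLICATIONS"]]


def apply_hierarchies(hccs):
    """Keep only the most severe category present in each hierarchy."""
    kept = set(hccs)
    for hierarchy in HIERARCHIES:
        present = [h for h in hierarchy if h in kept]
        for lower in present[1:]:
            kept.discard(lower)
    return kept


def risk_score(sex, age_band, hccs, new_enrollee=False):
    """Demographic factor plus HCC factors; new enrollees get demographics only."""
    if new_enrollee:
        return NEW_ENROLLEE_FACTORS[(sex, age_band)]
    score = DEMOGRAPHIC_FACTORS[(sex, age_band)]
    score += sum(HCC_FACTORS[h] for h in apply_hierarchies(hccs))
    return score


hccs = ["DIABETES_NO_COMPLICATIONS", "DIABETES_WITH_COMPLICATIONS", "CHF"]
print(round(risk_score("F", "65-69", hccs), 2))               # 1.05
print(risk_score("F", "65-69", [], new_enrollee=True))        # 0.5
```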

24

Central features of Massachusetts health care reform

1. Establishment of an exchange (purchasing pool)
2. A requirement that all employers establish Section 125 accounts (so employees could pay premiums on a pre-tax basis)
3. Large subsidies for families living below 300% of FPL
4. For those above 300% of FPL, availability of a more limited plan (so insurance would be affordable even outside the subsidy range)
5. A mandate that all individuals must purchase health insurance coverage
6. Funding through use of federal funds previously paid to safety net hospitals or paid for uncompensated care

25

Steps for developing and using predictive models for care management programs

1. Choose a disease or condition - programs should focus on diseases that:
a. Are reasonably prevalent in typical commercial populations
b. Can lead to costly exacerbations if not appropriately treated, and
c. Have treatments that are relatively low cost and within the control of the member
2. Rank conditions based on intervenability (the susceptibility of the condition to external management). Prioritize interventions based on an intervenability score rather than the highest risk scores.
3. Identify the population - construct algorithms to identify members who are at risk
4. Plan the intervention - identify the issue to address and the mechanism by which it will be addressed. Use care management nurses to assess and design a care plan for patients identified by the predictive model.
5. Perform economic modeling of the proposed program - must decide the best population penetration level to achieve the most savings. The Risk Management Economic Model can be used for this
6. Develop the predictive model.
7. Test actual outcomes against predictions, and use this info to modify the model and the program

26

Metrics that should be recognized in the Risk Management Economic Model

1. The # and risk-intensity of members to be targeted - the # must be large enough to produce savings that offset implementation costs, but not so large that marginal costs exceed marginal savings
2. Types of interventions to be used in the program - such as mail or automated outbound dialing
3. The # of nurses and other staff needed for the program, and program costs
4. The methodology for contacting and enrolling members
5. The rules for integrating the program with the rest of the care management system
6. The timing and #s of contacts, enrollments, and interventions
7. The predicted behavior of the target population if there were no intervention, and the predicted effectiveness of the intervention at modifying that behavior
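
Item 1 describes a marginal cost versus marginal savings trade-off. The sketch below is a simplified illustration of that idea, not the Risk Management Economic Model itself: members are ranked by predicted savings and added until net savings stops improving. All inputs and names are hypothetical.

```python
def best_penetration(expected_savings, cost_per_member, fixed_cost):
    """Return (number of members to target, net savings at that level).

    expected_savings - predicted savings per member if the member is enrolled
    cost_per_member  - marginal program cost of each additional member
    fixed_cost       - implementation cost incurred regardless of volume
    Returns (0, 0.0) when no penetration level produces positive net savings.
    """
    ranked = sorted(expected_savings, reverse=True)
    best_n, best_net = 0, 0.0
    running = -fixed_cost
    for n, saving in enumerate(ranked, start=1):
        running += saving - cost_per_member
        if running > best_net:
            best_n, best_net = n, running
    return best_n, best_net


# Hypothetical predicted savings for six candidate members
savings = [5000.0, 3000.0, 1500.0, 800.0, 400.0, 100.0]
print(best_penetration(savings, cost_per_member=1000.0, fixed_cost=2000.0))
# (3, 4500.0): the fourth member would cost more than the savings produced
```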

27

Most common types of health risk

1. Pricing risk - made up of severity and frequency of events (known risk)
2. Underwriting risk - risk that the overall pool will perform worse than expected (unknown risk)

28

Definition of a high-risk member

A member who has a significant probability of experiencing higher-than-average costs in the near future (such as next 12 mos). There is not a consistently successful method for identifying these members.

29

Commercially-available grouper models

1. Johns Hopkins Adjusted Clinical Groups (ACG) System - Case-mix adjustment measure for ambulatory and inpatient diagnoses, based on Aggregated Diagnosis Groups, age, and sex
2. Diagnosis Related Groups (DRGs) - Used by CMS and some commercial payers to ensure consistent reimbursement of hospitals for patients with the same risk profile; accounts for complications and co-morbidities
3. Chronic Illness and Disability Payment System (CDPS) - diagnosis-based risk adjustment model used by states to adjust payments for Medicaid beneficiaries
4. Clinical Risk Groups (CRGs) - identify groups of individuals requiring similar amounts and types of resources; similar to DRGs, but for all care over an extended time period
5. Diagnostic Cost Groups/Hierarchical Condition Category (DCG/HCC) - Developed as a health adjuster for Medicare inpatient and ambulatory care based on age, gender, dual eligible status, disabled status and diagnosis, with a focus on high-cost diagnoses
6. Sightlines DxCG Risk Solutions - uses demographics and claims data (medical & pharmacy) to quantify the illness burden of a population for commercial, Medicare and Medicaid populations; a relative risk score is developed.
7. Episode Treatment Groups (ETGs) - case-mix adjustment and episode-building system used to develop a relative risk score based on complete treatment episodes

30

Drug grouper models

1. Therapeutic class groupers - use these models to group drugs into a hierarchy of therapeutic classes
a. American Hospital Formulary Service (AHFS)
b. Generic Product Identifier (GPI)
2. Drug-based risk adjustment models - infer the member's diagnosis from the therapeutic class of drugs the member uses and generate a relative risk score
a. Medicaid Rx
b. Pharmacy Risk Groups (PRGs)
c. RxGroups (DxCG)

31

Worksheets required for Medicare Advantage bid

1. Bid-specific base period experience and key assumptions for contract year projection
2. Calculates projected allowed costs for the contract year (credibility blended if necessary)
3. Projected cost sharing by medical service category for the contract year
4. Development of net medical costs, including expenses and margin and supplemental benefits
5. Calculates benchmark and evaluates whether the plan realizes a savings or needs to charge a basic member premium
6. Summary of results
7. Pricing for optional supplemental benefit packages

32

CommCare risk sharing arrangements

1. Risk adjustment methodology applied to the medical capitation rate is a form of risk sharing between the plans and the Connector
2. Aggregate risk sharing corridors apply to all health plans. The Connector shares 50% of the risk for claims more than 2% above or below the capitation payment.
3. Specific outlier stop-loss pool that pays for 75% of specific claims above a $150K threshold. Funded by health plans at 1.25% of the capitation rate.
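
A worked sketch of items 2 and 3 under two stated assumptions: the 50% share applies only to the portion of claims outside the 2% corridor, and the stop-loss pool pays 75% of the portion of a claim above $150K. The function names and dollar figures are hypothetical.

```python
def corridor_settlement(capitation, claims, corridor=0.02, share=0.50):
    """Amount the Connector pays the plan (positive) or recoups (negative)."""
    upper = capitation * (1 + corridor)
    lower = capitation * (1 - corridor)
    if claims > upper:
        return share * (claims - upper)
    if claims < lower:
        return -share * (lower - claims)
    return 0.0


def stop_loss_recovery(claim_amount, threshold=150_000.0, coinsurance=0.75):
    """Pool payment for one large claim: 75% of the amount above the threshold."""
    return coinsurance * max(claim_amount - threshold, 0.0)


# Plan paid $1.1M in claims against $1.0M of capitation:
# $80K falls above the 2% corridor, so the Connector pays about $40K
print(round(corridor_settlement(1_000_000.0, 1_100_000.0), 2))
# Claims within 2% of capitation: no settlement
print(corridor_settlement(1_000_000.0, 1_010_000.0))
# A single $250K claim: the pool pays 75% of the $100K excess
print(stop_loss_recovery(250_000.0))   # 75000.0
```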