The Centers for Medicare & Medicaid Services (CMS) hierarchical condition categories (HCC) model, implemented in 2004, is a risk-adjustment model used to adjust Medicare payments to health care plans for the health expenditure risk of their enrollees. It’s intended use is to pay insurance plans appropriately for their expected relative costs. For example, health plans that care for overwhelmingly healthy populations are paid less than those that care for much sicker populations.
The full model includes variable interactions, demographics (e.g., age, gender), and indicator variables for Medicaid enrollment and disabled status. However, this post will only cover using the HCC model to cluster diagnosis codes into meaningful categories. We’ll implement HCCs for ICD 9 codes using the 2014 model, but once ICD 10 is implemented in my environment, I’ll update the post.
To see the different versions of the model, please visit:
- 2014 Model
- 2015 Model
- ICD 10 Mapping
The HCC diagnostic classification system has four components:
- Classify over 14,000 ICD 9 diagnosis codes into 805 diagnostic groups, which represent a well-specified medical condition.
- Diagnosis groups are further aggregated into 189 condition categories, which describe a broader set of diseases. Diseases within a condition category are related clinically and with respect to cost.
- Only use the subset of 189 condition categories that best predict Medicare Part A (inpatient) and Part B (outpatient) medical expenditures. The most recent version (version 22) of the CMS HCC model includes 79 of the disease categories, excluding those that contain diagnoses that are vague/nonspecific (e.g., symptoms), discretionary in medical treatment or coding (e.g., osteoarthritis), not medically significant (e.g., muscle strain), or transitory/definitively treated (e.g., appendicitis). The model also excludes categories that do not empirically add to costs, as well as categories that are fully defined by the presence of procedures or durable medical equipment. This focuses the model on medical problems that are present, rather than services offered.
- Hierarchies are imposed among related condition categories, so that a person is coded for only the most severe manifestation among related diseases. For example, there are four condition categories related to Ischemic Heart Disease; acute myocardial infarction (81), unstable angina & other acute ischemic heart disease (82), angina pectoris (83), and coronary atherosclerosis (84). A patient with an ICD 9 code in condition category 81 is excluded from being coded in categories 82, 83, or 84, even if diagnosis codes in those categories are present.
Load data.table, lubridate, and stringr:
library(data.table)
library(lubridate)
library(stringr)
The function below converts the text in the SAS file V22H79L1.txt into a table with two columns. The first column is the HCC category and the second column is the HCC category label.
HCC_Labels <- function(hcc){
## Set working directory
setwd(hcc)
## Load packages
require(data.table)
require(stringr)
## Read lines of SAS file
labels <- readLines("V22H79L1.TXT")[9:89]
## Combine lines that refer to same HCC
labels <- paste(labels, collapse = " ")
labels <- unlist(tstrsplit(labels, " HCC"))
## Convert to data table
labels <- as.data.table(labels)
## Change column name
setnames(labels, "labels", "HCC")
## Separate each line into two variables
labels[, c("HCC","HCC_Label") := tstrsplit(HCC, "=", fixed = TRUE)]
## Remove white space
Trim <- function(x){str_trim(x, side = "both")}
labels <- labels[, lapply(.SD, Trim)]
## Remove quotations from label
labels[, HCC_Label := gsub('"', "", HCC_Label)]
## Remove letters from HCC variable
labels[, HCC := as.integer(gsub("HCC", "", HCC))]
## Return label table
return(labels)
}
Run the function above and assign the output to a table called labels:
labels <- HCC_Labels(hcc)
This step uses ICD9 codes extracted from your EHR and the F2213L2P.TXT crosswalk file to group ICD 9 codes into the 79 relevant condition categories included in version 22 of the CMS HCC model.
Assign_CCs <- function(ICD9_Codes, hcc){
## Load packages
require(data.table)
require(lubridate)
## Read in crosswalk
setwd(hcc)
crosswalk <- as.data.table(read.table("F2213L2P.TXT"))
########## Clean ICD 9 codes
## Fix column names of ICD9 codes
if(names(ICD9_Codes) != c("Patient_Identifier", "Encounter_Identifier", "ICD_Diagnosis_Code", "Diagnosis_Date"))
setnames(ICD9_Codes, names(ICD9_Codes), c("Patient_Identifier", "Encounter_Identifier", "ICD_Diagnosis_Code", "Diagnosis_Date"))
## Fix dates on ICD 9 codes
if(class(ICD9_Codes$Diagnosis_Date) != "Date")
ICD9_Codes[, c("Diagnosis_Date", "Time", "AM_PM") := tstrsplit(Diagnosis_Date, " ", fixed = TRUE)][, c("Time", "AM_PM") := NULL][, Diagnosis_Date := as.Date(fast_strptime(Diagnosis_Date, format = "%m/%d/%Y"))]
## Remove periods
if(sum(grepl(".", ICD9_Codes$ICD_Diagnosis_Code)) != 0)
ICD9_Codes[, ICD_Diagnosis_Code := gsub("\\.", "", ICD_Diagnosis_Code)]
########## Clean crosswalk
## Change names
setnames(crosswalk, names(crosswalk), c("ICD_Diagnosis_Code", "HCC"))
########## Merge ICD 9 codes with crosswalk
ICD9_Codes <- merge(ICD9_Codes, crosswalk, by = "ICD_Diagnosis_Code")
## Return ICD 9 codes
return(ICD9_Codes)
}
The merge above is an inner join that assigns ICD 9 codes to condition categories. Note that many ICD 9 codes you extract from the EHR will be dropped, because they do not correspond to condition categories that are relevant for the model. For example, when I ran the function above on a sample of ~840,000 ICD 9 codes extracted from the EHR, only ~135,000 are assigned to condition categories. This should emphasize that HCCs should only be used to identify the prevalence of chronic conditions that are defined in the model.
Run the function above and pass the result to a table called ICD9s_CCs:
ICD9s_CCs <- Assign_HCCs(ICD9_Codes, hcc)
Now that the ICD 9 codes are classified by patient and by condition category, you can calculate the presence of the 79 condition categories for each patient:
Collapse_CCs <- function(ICD9s_CCs, labels){
## Load data.table
require(data.table)
## Count diagnosis codes for each patient in each category
cc_by_patient <- ICD9s_CCs[, .(Freq = .N), by = c("Patient_Identifier", "HCC")]
## Only keep categories where patients have more than one diagnosis code
cc_by_patient <- cc_by_patient[Freq >= 2]
## Only keep the columns for patient and HCC
cc_by_patient <- cc_by_patient[, .(Patient_Identifier, HCC)]
########## Convert long and skinny table to long and fat
## Identify all the HCCs for which you have to build out dummy variables
dummies <- as.character(labels$HCC)
## Build out dummy variables for each category
cc_by_patient[,(paste("HCC", dummies, sep = "")):=lapply(dummies,function(x) as.numeric(HCC==as.character(x)))]
## Drop HCC variable
cc_by_patient[, HCC := NULL]
####### Collapse data to 1 row per patient
cc_by_patient <- cc_by_patient[, lapply(.SD, max), by = Patient_Identifier]
## Return data for all 79 CCs for each patient
return(cc_by_patient)
}
Note that the function above takes labels (the output of step 1) and ICD9s_CCs (the output of step 2) as inputs. In return, the function builds a wide table with one row for each patient and 80 columns. The first column is patient identifier and the following 79 columns are for each of the condition categories. A value of 1 indicates presence of a condition and a value of 0 indicates absence of a condition.
Run the function above and assign the output to Condition_Categories:
Condition_Categories <- Collapse_CCs(ICD9s_CCs, labels)
So far, we have been working with condition categories, NOT hierarchical condition categories. In the table built above, Condition_Categories, it’s possible for patients to have “Diabetes without complication” as well as “Diabetes with chronic complications”. We’ll fix this issue by requiring patients to only have the most severe condition within a group of related conditions.
To determine the hierarchical structure of the condition categories, we’ll build a table from the V22H79H1.TXT file. The table will have three columns: Condition Name, HCC, and a list of HCCs to zero if a patient has a given HCC in the second column. For example, a patient with a value of 1 for HCC 8 (“Metastatic Cancer and Acute Leukemia”) has 0 values overwritten for HCC 9 (“Lung and Other Severe Cancers”), HCC 10 (“Lymphoma and Other Cancers”), HCC 11 (“Colorectal, Bladder, and Other Cancers”), and HCC 12 (“Breast, Prostate, and Other Cancers and Tumors”).
HCC_Table <- function(hcc){
## Load packages
require(data.table)
require(stringr)
## Load V22H79H1.txt
setwd(hcc)
Hierarchy <- as.data.table(readLines("V22H79H1.txt")[28:58])
## Change name of first column
setnames(Hierarchy, "V1", "Condition")
## Split out condition categories to zero
Hierarchy[, c("Condition", "HCC", "To_Zero") := tstrsplit(Condition, "%", fixed = TRUE)]
## Strip away extra characters in HCC variable
Hierarchy[, HCC := gsub("=", "HCC", str_extract(HCC, "=[0-9]+"))]
## Strip away extra characters in To_Zero variable
Hierarchy[, To_Zero := gsub("STR|\\)|;| ", "", To_Zero)]
## Add HCC before every number To_Zero
Hierarchy[, To_Zero := gsub("\\(", "HCC", To_Zero)]
Hierarchy[, To_Zero := gsub(",", ",HCC", To_Zero)]
## Strip away extra characters in Condition variable
Hierarchy[, Condition := str_extract(Condition, "[a-zA-Z]+ *[0-9]")]
## Trim away white space from all columns
Trim <- function(x){str_trim(x, side = "both")}
Hierarchy <- Hierarchy[, lapply(.SD, Trim)]
## Return table
return(Hierarchy)
}
Run the function above and assign the output to Hierarchy_Table:
Hierarchy_Table <- HCC_Table(hcc)
The last step is implementing the hierarchical structure of condition categories (output of step 4) on the “Condition_Categories” table (output of step 3), which shows patient-level diagnoses for each of the 79 condition categories.
Hierarchy <- function(Hierarchy_Table, Condition_Categories){
## Get parameters for loops (number of hierarchy conditions and number of patients)
Hierarchy_Count <- nrow(Hierarchy_Table)
Patient_Count <- nrow(Condition_Categories)
## Loop for each hierarchy condition
for (i in 1:Hierarchy_Count){
## Get the name of the column the hierarchy condition is based on and the vector of columns to zero
colname <- Hierarchy_Table$HCC[i]
zero_list <- unlist(strsplit(Hierarchy_Table$To_Zero[i], split=","))
## Loop through all the columns to set to 0
for (k in zero_list){
set(Condition_Categories, which(Condition_Categories[[colname]] == 1), k, 0)
}
## Show progress through conditions
print(i)
}
## Return Hierarchical Condition Categories
return(Condition_Categories)
}
Run the function above and assign the output to HCCs:
HCCs <- Hierarchy(Hierarchy_Table, Condition_Categories)
And there you have it! For a list of patients, you’ve converted ICD9 codes into 79 chronic diseases.