The Clinical Classifications Software (CCS) for diagnosis codes is a software tool developed as part of the Healthcare Cost and Utilization Project (HCUP), a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality.
In this post, we’ll apply the CCS categorization scheme to International Classification of Diseases (ICD) 9 diagnosis codes to identify diabetes and high blood pressure. This scheme collapses over 14,000 diagnosis codes into roughly 250 categories that are more useful in health data analysis.
To get a sense of what the scheme looks like, check out the single-diagnosis appendix, which maps sets of ICD 9 codes to diagnosis categories.
To see how ICD codes are organized, check out this public reference. Note that it’s a hierarchical scheme, where each additional decimal place conveys increasingly granular information.
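The CCS crosswalk lists ICD 9 codes without decimal places (relying on the unique sequence of digits), so you’ll have to remove the decimals from your EHR data extract. The code below assumes the extract is a data.table named data, with the codes stored as character strings:
library(data.table)
data[, ICD_Diagnosis_Code := gsub("\\.", "", ICD_Diagnosis_Code)]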
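To see what the decimal removal does, here is a minimal sketch on a made-up extract. The table name toy and its values are illustrative assumptions, not real patient data. It uses fixed = TRUE, which is an alternative to escaping the dot in a regular expression.

```r
library(data.table)

# Toy extract; column names mirror the post, values are made up for illustration
toy <- data.table(
  Patient_Identifier = c(1, 1, 2),
  ICD_Diagnosis_Code = c("250.00", "401.9", "V58.67")
)

# fixed = TRUE treats "." as a literal character rather than a regex wildcard
toy[, ICD_Diagnosis_Code := gsub(".", "", ICD_Diagnosis_Code, fixed = TRUE)]

print(toy$ICD_Diagnosis_Code)
```

Keeping the codes as character strings matters here: a code such as "008.45" would lose its leading zeros if the column were read as numeric.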
You only need the first three columns from the CCS file and must skip the first line when reading in the table. We’ll also rename the columns so that the code column matches the one in our EHR data:
CCS <- fread("~/$dxref 2015.csv", skip = 1, select = 1:3)
setnames(CCS, names(CCS), c("ICD_Diagnosis_Code", "CCS_CATEGORY", "CCS_CATEGORY_DESCRIPTION"))
The values in the crosswalk file are wrapped in single quotes and padded with extra spaces, both of which must be removed:
library(stringr)
Remove_Quote <- function(x){gsub("'", "", x, fixed = TRUE)}
CCS <- CCS[, lapply(.SD, Remove_Quote)]
Trim <- function(x){str_trim(x, side = "both")}
CCS <- CCS[, lapply(.SD, Trim)]
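The cleaning step can be checked on a couple of rows before running it on the full crosswalk. The sketch below assumes the raw values look like the made-up rows shown (quoted and space-padded). The helper clean_field is a hypothetical name, and it uses base R's trimws in place of str_trim so it runs without stringr.

```r
library(data.table)

# Made-up rows shaped like the raw crosswalk: values wrapped in single quotes
# and padded with spaces (an assumption about the file layout)
raw <- data.table(
  ICD_Diagnosis_Code = c("'25000'", "'4019 '"),
  CCS_CATEGORY = c("'49   '", "'98   '"),
  CCS_CATEGORY_DESCRIPTION = c("'DiabMel no c'", "'HTN'")
)

# Hypothetical helper: drop single quotes, then trim surrounding whitespace
clean_field <- function(x) trimws(gsub("'", "", x, fixed = TRUE))

# With no .SDcols given, .SD covers every column of the table
cleaned <- raw[, lapply(.SD, clean_field)]
print(cleaned)
```

Doing both operations in one function means a single pass over the columns instead of two.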
If you’re not familiar with data.table syntax, note that .SD stands for "Subset of Data": inside the brackets it refers to the table's columns, so lapply(.SD, f) applies f to each of them. You can restrict .SD to particular columns with the .SDcols argument. If you don’t specify any columns, the function is applied to all columns.
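As an illustration of restricting .SD, the sketch below trims only the character columns of a small made-up table and leaves the id column untouched. The table dt and the vector char_cols are hypothetical names introduced for this example.

```r
library(data.table)

# Made-up table with one integer column and two padded character columns
dt <- data.table(
  id = 1:2,
  code = c(" 4019 ", "25000 "),
  label = c(" HTN", "DiabMel no c ")
)

# .SDcols limits .SD to the named columns; (char_cols) := updates them in place
char_cols <- c("code", "label")
dt[, (char_cols) := lapply(.SD, trimws), .SDcols = char_cols]

print(dt)
```

Updating by reference with := also preserves the type of columns you did not touch, whereas applying a string function to every column turns them all into character.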
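With the crosswalk cleaned, left-join it onto the EHR data by ICD code. Setting all.x = TRUE keeps EHR rows whose code has no match in the crosswalk:
data <- merge(data, CCS, by = "ICD_Diagnosis_Code", all.x = TRUE)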
Now, each ICD 9 code you pulled from the EHR is assigned to a diagnosis category.
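Because the merge uses all.x = TRUE, codes that are missing from the crosswalk stay in the data with NA in the CCS columns. It can be worth listing them before going further. The sketch below uses made-up tables (ehr, xwalk) and a deliberately invalid code to show one way to do that.

```r
library(data.table)

# Made-up EHR rows; "XXXXX" stands in for a code absent from the crosswalk
ehr <- data.table(
  Patient_Identifier = c(1, 2, 3),
  ICD_Diagnosis_Code = c("25000", "4019", "XXXXX")
)

# Made-up crosswalk rows in the cleaned format used in the post
xwalk <- data.table(
  ICD_Diagnosis_Code = c("25000", "4019"),
  CCS_CATEGORY = c("49", "98"),
  CCS_CATEGORY_DESCRIPTION = c("DiabMel no c", "HTN")
)

merged <- merge(ehr, xwalk, by = "ICD_Diagnosis_Code", all.x = TRUE)

# Codes that found no CCS category
unmatched <- merged[is.na(CCS_CATEGORY), unique(ICD_Diagnosis_Code)]
print(unmatched)
```

A long list of unmatched codes usually points to a formatting mismatch (leftover decimals, quotes, or spaces) rather than genuinely unknown diagnoses.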
diabetes <- data[CCS_CATEGORY_DESCRIPTION %in% c("DiabMel no c", "DiabMel w/cm")]
diabetes_patients <- diabetes[, .(Diabetes = .N), by = Patient_Identifier][Diabetes >= 2][, Diabetes := 1]
The code above does two things:
- Subset ICD 9 codes that are in diabetes CCS categories (diabetes mellitus with complications and without complications)
- For each patient, count the ICD 9 codes in diabetes CCS categories. Keep only patients with at least 2 such codes and assign the variable “Diabetes” a value of 1.
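The two steps above can be traced on a small made-up table. The patient identifiers and rows below are illustrative assumptions, and the chain is split into intermediate objects (dm, counts, flagged) only to make each step visible.

```r
library(data.table)

# Made-up merged data: patient A has two diabetes codes, B and C have one each
toy <- data.table(
  Patient_Identifier = c("A", "A", "B", "C", "C", "C"),
  CCS_CATEGORY_DESCRIPTION = c("DiabMel no c", "DiabMel w/cm", "DiabMel no c",
                               "HTN", "HTN", "DiabMel no c")
)

# Step 1: keep rows in the two diabetes categories
dm <- toy[CCS_CATEGORY_DESCRIPTION %in% c("DiabMel no c", "DiabMel w/cm")]

# Step 2: count rows per patient, keep those with at least 2, set the flag
counts <- dm[, .(n_codes = .N), by = Patient_Identifier]
flagged <- counts[n_codes >= 2][, Diabetes := 1L]

print(flagged)
```

Only patient A is flagged here. Patient C has three rows in total but only one in a diabetes category, so the count is taken after the subset, not before.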
hypertension <- data[CCS_CATEGORY_DESCRIPTION %in% c("HTN", "Htn complicn")]
htn_patients <- hypertension[, .(HTN = .N), by = Patient_Identifier][HTN >= 2][, HTN := 1]
As with diabetes, the code above does two things:
- Subset ICD 9 codes that are in hypertension CCS categories (hypertension with complications and without complications)
- For each patient, count the ICD 9 codes in hypertension CCS categories. Keep only patients with at least 2 such codes and assign the variable “HTN” a value of 1.
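Since the diabetes and hypertension code follow the same pattern, the rule could be wrapped in a small function. The sketch below is one possible way to do that, not part of the original workflow. The function name flag_condition, its arguments, and the toy data are all hypothetical.

```r
library(data.table)

# Hypothetical helper: return patients with at least `min_codes` rows in the
# given CCS categories, with a 0/1-style flag column named `flag_name`
flag_condition <- function(dt, categories, flag_name, min_codes = 2L) {
  counts <- dt[CCS_CATEGORY_DESCRIPTION %in% categories,
               .(n_codes = .N), by = Patient_Identifier]
  out <- counts[n_codes >= min_codes, .(Patient_Identifier)]
  out[, (flag_name) := 1L]
  out[]
}

# Made-up merged data: A has two hypertension codes, B has one hypertension
# code and two diabetes codes
toy <- data.table(
  Patient_Identifier = c("A", "A", "B", "B", "B"),
  CCS_CATEGORY_DESCRIPTION = c("HTN", "Htn complicn", "HTN",
                               "DiabMel no c", "DiabMel w/cm")
)

htn_flag <- flag_condition(toy, c("HTN", "Htn complicn"), "HTN")
dm_flag <- flag_condition(toy, c("DiabMel no c", "DiabMel w/cm"), "Diabetes")

print(htn_flag)
print(dm_flag)
```

Exposing min_codes as an argument makes the two-code threshold explicit and easy to change if your case definition differs.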