Intro

The Clinical Classifications Software (CCS) for diagnosis codes is a software tool developed as part of the Healthcare Cost and Utilization Project (HCUP), a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality.

In this post, we’ll apply the CCS categorization scheme to International Classification of Diseases (ICD) 9 diagnosis codes to identify patients with diabetes and high blood pressure. This scheme collapses over 14,000 diagnosis codes into roughly 250 categories that are more useful in health data analysis.

To get a sense of what the scheme looks like, check out the single-diagnosis appendix, which maps sets of ICD 9 codes to diagnosis categories.

To see how ICD codes are organized, check out this public reference. Note that it’s a hierarchical scheme, where each additional decimal place conveys increasingly granular information.
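As a rough illustration of that hierarchy, the sketch below splits a few ICD 9 codes into the three-digit category before the decimal point and the detail digits after it. The codes are examples from the diabetes mellitus category (250); the variable names are ours, chosen for illustration.

```r
# Example ICD 9 codes, all within category 250 (diabetes mellitus)
codes <- c("250", "250.0", "250.00", "250.01")

# Digits before the decimal point give the broad category
category <- sub("\\..*$", "", codes)

# Digits after the decimal point (if any) add detail within that category
detail <- ifelse(grepl(".", codes, fixed = TRUE), sub("^[^.]*\\.", "", codes), "")

data.frame(code = codes, category = category, detail = detail)
```

Every code shares the same category, and each extra digit after the decimal point narrows the diagnosis further.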

What you’ll need

  1. Extract a list of patient ICD 9 diagnosis codes from your EHR, typically along with patient identifiers, dates, and encounter identifiers. To identify diagnoses, you’ll have to decide on a window of time to look at. In most cases, we look at two years’ worth of ICD 9 codes, but you may be interested in shorter or longer time periods. In the code below, this data set is called ‘data’.
  2. Download the Single Level CCS zip file from the CCS website and within that folder, pull out the ‘$dxref 2015.csv’ file. In the code below, this data set is called ‘CCS’.
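If it helps to picture the extract, here is a toy stand-in for ‘data’ with a two-year window applied. Patient_Identifier and ICD_Diagnosis_Code are the column names used later in this post; Encounter_Identifier, Diagnosis_Date, and the window dates are hypothetical and will differ in your own EHR extract.

```r
library(data.table)

# Toy stand-in for an EHR extract. Encounter_Identifier and Diagnosis_Date
# are hypothetical column names chosen for illustration.
data <- data.table(
  Patient_Identifier   = c("P1", "P1", "P2", "P3"),
  Encounter_Identifier = c("E1", "E2", "E3", "E4"),
  Diagnosis_Date       = as.Date(c("2014-03-01", "2015-06-15", "2012-01-10", "2015-02-20")),
  ICD_Diagnosis_Code   = c("250.00", "401.9", "250.02", "401.1")
)

# Keep only codes recorded inside a two-year window (dates are arbitrary examples)
window_start <- as.Date("2014-01-01")
window_end   <- as.Date("2015-12-31")
data <- data[Diagnosis_Date >= window_start & Diagnosis_Date <= window_end]
```

Storing the codes as character strings matters: numeric storage would drop leading and trailing zeros, which are meaningful in ICD 9.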

Packages

Load data.table and stringr:

library(data.table)
library(stringr)

Clean your ICD9 codes

The CCS crosswalk maps ICD9 codes without decimal places (relying on the unique sequence of digits), so you’ll have to remove decimals from your EHR data extract:

data[, ICD_Diagnosis_Code := gsub("\\.", "", ICD_Diagnosis_Code)]
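To see the effect on a few made-up rows, the sketch below does the same decimal removal using fixed = TRUE, an alternative to escaping the period that treats "." as a literal character. It assumes the codes are stored as character, so leading zeros are preserved.

```r
library(data.table)

# Toy codes stored as character so leading zeros survive (e.g. "008.45")
data <- data.table(ICD_Diagnosis_Code = c("250.00", "401.9", "008.45", "V58.67"))

# fixed = TRUE matches a literal "." instead of the regex wildcard
data[, ICD_Diagnosis_Code := gsub(".", "", ICD_Diagnosis_Code, fixed = TRUE)]

data$ICD_Diagnosis_Code
```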

Clean CCS crosswalk

You only need the first three columns from the CCS file, and you must skip the first line when reading in the table. We’ll also rename the columns so that the code column matches the one in our EHR diagnosis data:

CCS <- fread("~/$dxref 2015.csv", skip = 1, select = 1:3)
setnames(CCS, names(CCS), c("ICD_Diagnosis_Code", "CCS_CATEGORY", "CCS_CATEGORY_DESCRIPTION"))

The values in the crosswalk file are wrapped in single quotes and padded with extra spaces, both of which must be removed:

# Strip the single quotes that wrap each value in the crosswalk
Remove_Quote <- function(x){gsub("'", "", x, fixed = TRUE)}
CCS <- CCS[, lapply(.SD, Remove_Quote)]

Trim <- function(x){str_trim(x, side = "both")}
CCS <- CCS[, lapply(.SD, Trim)]
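A small mock crosswalk shows what the two cleaning steps do when combined. The quoting and padding of the raw values below are an assumption made for illustration, and Clean is a hypothetical helper name; check a few rows of your own download before relying on it.

```r
library(data.table)
library(stringr)

# Mock rows imitating the raw crosswalk (quoting and padding are assumed)
CCS <- data.table(
  ICD_Diagnosis_Code       = c("'25000'", "'4019 '"),
  CCS_CATEGORY             = c("'49   '", "'98   '"),
  CCS_CATEGORY_DESCRIPTION = c("'DiabMel no c'", "'HTN'")
)

# Strip single quotes, then trim surrounding whitespace, in every column
Clean <- function(x) str_trim(gsub("'", "", x, fixed = TRUE), side = "both")
CCS <- CCS[, lapply(.SD, Clean)]
```

The order matters here: trimming first would leave the padding that sits inside the quotes.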

If you’re not familiar with data.table syntax, note that .SD stands for “Subset of Data”: inside the brackets it refers to the table’s own columns, so lapply(.SD, f) applies f to each of them. You can restrict which columns .SD contains with the .SDcols argument. If you don’t specify any columns, the function is applied to all columns.
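For cases where only some columns should be touched, a minimal sketch using .SDcols might look like this. The table DT and its column names are made up for the example.

```r
library(data.table)
library(stringr)

DT <- data.table(
  id   = 1:2,
  code = c(" 25000 ", " 4019"),
  desc = c("DiabMel no c ", " HTN")
)

# Trim only the character columns; the integer id column is left untouched
char_cols <- c("code", "desc")
DT[, (char_cols) := lapply(.SD, str_trim), .SDcols = char_cols]
```

Using := with .SDcols updates the chosen columns in place rather than building a new table.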

Merge ICD9 codes with CCS crosswalk

data <- merge(data, CCS, by = "ICD_Diagnosis_Code", all.x = TRUE)

Now, each ICD 9 code you pulled from the EHR is assigned to a diagnosis category.
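Because the merge keeps every row of the EHR extract (all.x = TRUE), any code missing from the crosswalk ends up with NA in the CCS columns. A quick check on toy data, with a deliberately fake code "XXXXX", could look like the sketch below; the tables are invented for illustration.

```r
library(data.table)

data <- data.table(
  Patient_Identifier = c("P1", "P2", "P3"),
  ICD_Diagnosis_Code = c("25000", "4019", "XXXXX")  # "XXXXX" is a made-up code
)
CCS <- data.table(
  ICD_Diagnosis_Code       = c("25000", "4019"),
  CCS_CATEGORY             = c("49", "98"),
  CCS_CATEGORY_DESCRIPTION = c("DiabMel no c", "HTN")
)

data <- merge(data, CCS, by = "ICD_Diagnosis_Code", all.x = TRUE)

# Rows with no crosswalk match have NA in the CCS columns
unmatched <- data[is.na(CCS_CATEGORY)]
unmatched[, .N, by = ICD_Diagnosis_Code]
```

A long list of unmatched codes usually points to a formatting mismatch, such as leftover decimals or dropped leading zeros.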

Example: Identifying Diabetes

diabetes <- data[CCS_CATEGORY_DESCRIPTION %in% c("DiabMel no c", "DiabMel w/cm")]
diabetes_patients <- diabetes[, .(Diabetes = .N), by = Patient_Identifier][Diabetes >= 2][, Diabetes := 1]

There are two steps to the code above:
- Subset ICD 9 codes that are in diabetes CCS categories (diabetes mellitus with complications and without complications)
- For each patient, count the number of ICD 9 codes in diabetes CCS categories. For patients with 2 or more codes, assign the variable “Diabetes” a value of 1.
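Since the same two steps apply to any condition, they could be wrapped in a small helper. The function below is a hypothetical sketch, not part of CCS or data.table; flag_condition and its arguments are names we made up, and the toy ‘data’ stands in for the merged table.

```r
library(data.table)

# Hypothetical helper: flag patients with at least min_codes diagnosis rows
# whose CCS description is one of `categories`
flag_condition <- function(dt, categories, flag_name, min_codes = 2) {
  counts <- dt[CCS_CATEGORY_DESCRIPTION %in% categories,
               .(n_codes = .N), by = Patient_Identifier]
  flagged <- counts[n_codes >= min_codes, .(Patient_Identifier)]
  flagged[, (flag_name) := 1L]
  flagged[]
}

# Toy merged data: P1 has two diabetes codes, P2 and P3 have one each
data <- data.table(
  Patient_Identifier       = c("P1", "P1", "P2", "P3", "P3", "P3"),
  CCS_CATEGORY_DESCRIPTION = c("DiabMel no c", "DiabMel w/cm", "DiabMel no c",
                               "HTN", "HTN", "DiabMel no c")
)

diabetes_patients <- flag_condition(data, c("DiabMel no c", "DiabMel w/cm"), "Diabetes")
```

Only P1 meets the two-code threshold here, so diabetes_patients has a single row.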

Example: Identifying Hypertension (High Blood Pressure)

hypertension <- data[CCS_CATEGORY_DESCRIPTION %in% c("HTN", "Htn complicn")]
htn_patients <- hypertension[, .(HTN = .N), by = Patient_Identifier][HTN >= 2][, HTN := 1]

There are two steps to the code above:
- Subset ICD 9 codes that are in hypertension CCS categories (hypertension with complications and without complications)
- For each patient, count the number of ICD 9 codes in hypertension CCS categories. For patients with 2 or more codes, assign the variable “HTN” a value of 1.
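Both flag tables contain only the patients who met the threshold. One way to combine them into a single patient-level table, with 0 for everyone else, is sketched below. The ‘patients’ table is a hypothetical full patient list, and the flag tables are toy versions shaped like the ones built above.

```r
library(data.table)

# Toy outputs shaped like diabetes_patients and htn_patients
diabetes_patients <- data.table(Patient_Identifier = c("P1", "P2"), Diabetes = 1)
htn_patients      <- data.table(Patient_Identifier = c("P2", "P3"), HTN = 1)

# Hypothetical full patient list from the EHR extract
patients <- data.table(Patient_Identifier = c("P1", "P2", "P3", "P4"))

# Left-join each flag onto the full list, then set missing flags to 0
flags <- merge(patients, diabetes_patients, by = "Patient_Identifier", all.x = TRUE)
flags <- merge(flags, htn_patients, by = "Patient_Identifier", all.x = TRUE)
flags[is.na(Diabetes), Diabetes := 0]
flags[is.na(HTN), HTN := 0]
```

Keep in mind that a 0 here means “fewer than 2 qualifying codes in the window”, not a confirmed absence of the condition.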

FAQ: