CPT is a registered trademark of the American Medical Association (AMA). For basic information, please see “Decoding the Codes: CPT and RUC simplified”, which can be found here. The codes and descriptions provide a common language to describe health care services and are used in medical billing.
The AMA hosts a web site to enable users to look up individual CPT codes as well as reimbursement rates for both facility and non-facility sites. You can also find many web pages that will break down CPT codes for a certain specialty (for example, psychotherapy). But what if you have extracted thousands of procedures from your electronic health record and need some description of what the codes are?
The Clinical Classification Software (CCS) for services and procedures is a software tool developed as part of the Healthcare Cost and Utilization Project, a Federal-State-Industry partnership sponsored by the Agency for Healthcare Research and Quality. This software classifies more than 9,000 CPT codes into 244 clinically meaningful categories.
After you download the zipped file, you’ll notice the table (2015_ccs_services_procedures.csv) has CPT codes in ranges in order to adhere to copyright restrictions. However, when trying to report on many thousands of procedures, you will need a crosswalk that assigns a CCS group to each CPT code. We’ll now pivot and walk through a problem that presents similar obstacles.
Let’s build a data table with three columns: range of codes, numerical class assignments, and labels for each class. We’ll do this for a catalogue of products being sold in a store.
########## Build Code Ranges
## Set seed and generate random data
set.seed(1234)
Codes <- data.table(Start = sample(c(100:500), 10, replace = FALSE), Size = sample(c(1:5), 10, replace = TRUE), With_Letter = sample(c(1:3), 10, replace = TRUE), Letter = sample(c("A", "B", "C", "D", "E", "F", "G"), 10, replace = TRUE))
## Build end of code range
Codes[, End := Start + Size]
## Change Start and End to Characters
Codes[, Start := as.character(Start)][, End := as.character(End)]
## For items where With_Letter == 1, place letter before start and end code
Codes[With_Letter == 1, Start := paste(Letter, Start, sep = "")]
Codes[With_Letter == 1, End := paste(Letter, End, sep = "")]
## For items where With_Letter == 3, place letter at end of start and end code
Codes[With_Letter == 3, Start := paste(Start, Letter, sep = "")]
Codes[With_Letter == 3, End := paste(End, Letter, sep = "")]
## Build Ranges
Codes[, Range := paste(Start, End, sep = "-")]
## Insert single quote at beginning and end of range
Codes[, Range := paste("'", Range, "'", sep = "")]
## Remove all columns except for Ranges
Codes[, c("Start", "Size", "With_Letter", "Letter", "End") := NULL]
########## Build numberic class and class labels
Labels <- data.table(Class = c(1:10), Label = c("Soap", "Shampoo", "Conditioner", "Laundry Detergent", "Flour", "Sugar", "Deoderant", "Toothbrush", "Toothpaste", "Floss"))
########## Bind ranges and labels
data <- cbind(Codes, Labels)
########## Print data
data
## Range Class Label
## 1: 'D145-D149' 1 Soap
## 2: 'B348-B351' 2 Shampoo
## 3: 'C343-C345' 3 Conditioner
## 4: 'D499-D504' 4 Laundry Detergent
## 5: 'B441-B443' 5 Flour
## 6: '353F-358F' 6 Sugar
## 7: '103-105' 7 Deoderant
## 8: '191B-193B' 8 Toothbrush
## 9: '361G-362G' 9 Toothpaste
## 10: 'F301-F303' 10 Floss
From above, we have a list of ranges that each categorize items in a store. Let’s say that you have a collection of receipts and on each receipt is an item number that falls within the ranges. If you want to easily classify each item, then you need a one-to-one crosswalk between item codes and categories.
We’ll break this problem into two parts. First, we’ll handle cases where the ranges of codes don’t have letters. Second, we’ll handle cases where the range of codes have a letter at the beginning or a letter at the end.
Finally, let’s say that each code must be 5 characters long. For example, we must append leading zeros to the code for item number A62, so that it appears as “A0062”.
Crosswalk_NoLetters <- function(data){
########## Clean Data
## Subset data to only contain ranges without letters
noletters <- data[!(grep("[A-Z]+", Range))]
## Remove single quote
noletters[, Range := gsub("'", "", Range)]
## Split Range into Start and End
noletters[, c("Start", "End") := tstrsplit(Range, "-", fixed=TRUE)]
## Convert class to numeric
noletters[, Start := as.numeric(Start)][, End := as.numeric(End)]
########## Set up writing out
## Create new table to write codes to
output <- data.table(Code=rep(0,100000), Class=rep(0,100000), Label=rep("empty",100000))
## Set line in output to write to
out_line <- 1L
########## Read through every line in data set
for(i in 1:nrow(noletters)){
## Calculate the number of times to repeat rows
Reps <- (noletters[i,End] - noletters[i,Start] + 1)
## And loop through to write new lines for each code
for(j in 1:Reps){
## The j index writes (Reps) new rows, where the value in the code column of each incremental row increases by 1
set(output,out_line + (j - 1L),1L, noletters[i,Start] + (j - 1L))
set(output,out_line + (j - 1L),2L, noletters[i,Class])
set(output,out_line + (j - 1L),3L, noletters[i,Label])
}
## Bump up the line to print from by the number of Replicate rows written
out_line <- as.integer(out_line + Reps)
}
########## Append leading zeros
## Keep output that contains data
output <- output[Code != 0]
## Convert code to a strong
output[, Code := as.character(Code)]
## Count number of leading zeros to insert
output[, Leading_Zeros := as.integer(5 - nchar(Code))]
## Fill leading zeros in Code column for the rows that require it
for (i in which(output$Leading_Zeros != 0))
set(output, i, 1L, paste(c(rep("0", times = output[i, Leading_Zeros]), output[i, Code]), collapse = ""))
## Drop leading zeros column
output[, Leading_Zeros := NULL]
########## Return output that contains data
return(output)
}
Crosswalk_Letters <- function(data){
########## Clean data
## Subset data to only contain ranges without letters
withletters <- data[grep("[A-Z]+", Range)]
## Remove single quote
withletters[, Range := gsub("'", "", Range)]
## Pull letter at beginning of line
withletters[grep("^[A-Z]", Range), At_Start := substr(Range, 1, 1)]
## Pull letter at end of line
withletters[grep("[A-Z]$", Range), At_End := str_sub(Range, -1)]
## Remove letters from range
withletters[, Range := gsub("[A-Z]", "", Range)]
## Split Range into Start and End
withletters[, c("Start", "End") := tstrsplit(Range, "-", fixed=TRUE)]
## Convert class to numeric
withletters[, Start := as.numeric(Start)][, End := as.numeric(End)]
########## Set up writing out
## Create new table to write codes to
output <- data.table(Code=rep(0,100000), Class=rep(0,100000), Label=rep("empty",100000), At_Start = rep("empty",100000), At_End = rep("empty",100000))
## Set line in output to write to
out_line <- 1L
########## Read through every line in data set
for(i in 1:nrow(withletters)){
## Calculate the number of times to repeat rows
Reps <- (withletters[i,End] - withletters[i,Start] + 1)
## And loop through to write new lines for each code
for(j in 1:Reps){
## The j index writes (Reps) new rows, where the value in the code column of each incremental row increases by 1
set(output,out_line + (j - 1L),1L, withletters[i,Start] + (j - 1L))
set(output,out_line + (j - 1L),2L, withletters[i,Class])
set(output,out_line + (j - 1L),3L, withletters[i,Label])
set(output,out_line + (j - 1L),4L, withletters[i,At_Start])
set(output,out_line + (j - 1L),5L, withletters[i,At_End])
}
## Bump up the line to print from by the number of Replicate rows written
out_line <- as.integer(out_line + Reps)
}
########## Drop output without data
output <- output[Code != 0]
########## Append leading zeros
## Convert code back to string
output[, Code := as.character(Code)]
## Count number of leading zeros to insert
output[, Leading_Zeros := as.integer(4 - nchar(Code))]
## Fill leading zeros in Code column for the rows that require it
for (i in which(output$Leading_Zeros != 0))
set(output, i, 1L, paste(c(rep("0", times = output[i, Leading_Zeros]), output[i, Code]), collapse = ""))
########## Replace letter at beginning or end
## Add letter at end
output[!is.na(At_End), Code := paste(Code, At_End, sep = "")]
## Add letter at start
output[!is.na(At_Start), Code := paste(At_Start, Code, sep = "")]
########## Remove extra columns
output[, c("At_Start", "At_End", "Leading_Zeros") := NULL]
## Return output
return(output)
}
crosswalk <- rbind(Crosswalk_NoLetters(data), Crosswalk_Letters(data))
print(crosswalk)
## Code Class Label
## 1: 00103 7 Deoderant
## 2: 00104 7 Deoderant
## 3: 00105 7 Deoderant
## 4: D0145 1 Soap
## 5: D0146 1 Soap
## 6: D0147 1 Soap
## 7: D0148 1 Soap
## 8: D0149 1 Soap
## 9: B0348 2 Shampoo
## 10: B0349 2 Shampoo
## 11: B0350 2 Shampoo
## 12: B0351 2 Shampoo
## 13: C0343 3 Conditioner
## 14: C0344 3 Conditioner
## 15: C0345 3 Conditioner
## 16: D0499 4 Laundry Detergent
## 17: D0500 4 Laundry Detergent
## 18: D0501 4 Laundry Detergent
## 19: D0502 4 Laundry Detergent
## 20: D0503 4 Laundry Detergent
## 21: D0504 4 Laundry Detergent
## 22: B0441 5 Flour
## 23: B0442 5 Flour
## 24: B0443 5 Flour
## 25: 0353F 6 Sugar
## 26: 0354F 6 Sugar
## 27: 0355F 6 Sugar
## 28: 0356F 6 Sugar
## 29: 0357F 6 Sugar
## 30: 0358F 6 Sugar
## 31: 0191B 8 Toothbrush
## 32: 0192B 8 Toothbrush
## 33: 0193B 8 Toothbrush
## 34: 0361G 9 Toothpaste
## 35: 0362G 9 Toothpaste
## 36: F0301 10 Floss
## 37: F0302 10 Floss
## 38: F0303 10 Floss
## Code Class Label