I put a lot of effort in to my first article to calculate the comorbidities of a patient according to the Charlson & Elixhauser scores. The available scripts were in SAS and Stata, as I started out using SPSS I decided to implement the code in the neat Python plugin that SPSS provides. In this post I’ll provide you with a detailed walk through of my code, and hopefully it will save you some time.
Update: I have created a new, easy-to-use package in R that has been thoroughly tested. You can find it on my github-page here (currently called comorbidities.icd10)
The code
The final code is rather simple, it finds the base dataset with the surgery codes, then it looks for the dataset with the ICD-codes, loops through them and adds the comorbidities to the base dataset:
BEGIN PROGRAM PYTHON .
import re
import spss
import spssdata
import datetime
print "\nStart calculation of charlsons with merge to main database"
base_dataset = FetchBaseDataSet(key_name, ['OPPAR', 'OPPDAT', 'DODDAT', 'DIAGNOS', 'DIAGRP', 'ICD', 'SNITT'])
if (base_dataset == ""):
raise Exception("You must activate the dataset containing the main data")
else:
print "Chosen the '{0}' for base table to merge upon".format(base_dataset)
print "Select file"
file_path = SelectOneFile("*.sav", "Select files with diagnostic variables used for Charlsons")
print "Get all the surgery dates from the base dataset ({0})".format(base_dataset)
all_surgery_dates = GetSurgeryDates(base_dataset, key_name, "OPPDAT", "DATREOP", debug)
print "Import file"
charlson_dataset_name = "ch_dataset_name"
ImportFile(file_path, charlson_dataset_name)
print "Get Charlson end Elixhausers data from file"
all_scores = GetDataFromSet(charlson_dataset_name, key_name, 'INDATUMA', 'UTDATUMA', all_surgery_dates, debug)
spss.Submit("DATASET CLOSE {0} .".format(charlson_dataset_name))
print "Set the Charlson's and Elixhauser's values for {0}:{1} subjects".format(len(all_scores["charlsons"]), len(all_scores["elixhausers"]))
SetScoreValues(base_dataset, key_name, 'OPPDAT', all_scores, debug)
print "Calculation of Charlson's comorbidity index and adding variables finished!"
END PROGRAM .
The above is of course just the main script, the true magic is in the myriad of functions. I have some functions that help me selecting the opened dataset of interest and the files that I want to open and extract data from, note the wx library needs to be installed and loaded, se the collapsed code block below.
def FetchBaseDataSet(key_name, base_dataset_vars):
"""
A function that looks for the dataset the contains the base data defined by the
base variables
"""
if ((key_name in base_dataset_vars) == False):
base_dataset_vars.extend([key_name])
# Get the dataset with the base data
ds_name = ""
tag,err = spssaux.createXmlOutput('Dataset Display',omsid='DatasetDisplay', subtype='Datasets')
dslist = spssaux.getValuesFromXmlWorkspace(tag, 'Datasets')
matching_datasets = []
matching_datasets_key_indexes = []
closest_match = []
for ds in dslist:
spss.Submit("DATASET ACTIVE {0} .".format(ds))
key_index = -1
variable_count = 0
matches = []
for i in range(spss.GetVariableCount()):
var_name = spss.GetVariableName(i)
if (re.match(key_name, var_name)):
key_index = i
if (var_name in base_dataset_vars):
variable_count += 1
matches.append(var_name)
if (len(closest_match) < len(matches)):
closest_match = matches
if (variable_count >= len(base_dataset_vars)):
matching_datasets.append(ds)
matching_datasets_key_indexes.append(key_index)
if (len(matching_datasets) > 1):
app = wx.PySimpleApp()
dlg = wx.SingleChoiceDialog(None, "Multiple possible datasets found", "Choose the one that you want to use as base dataset", matching_datasets)
selection = None
if (dlg.ShowModal() == wx.ID_OK):
selection = dlg.GetSelection()
dlg.Destroy()
app.Destroy()
if (selection == None):
raise Exception("You have to select at least one dataset")
ds_name = matching_datasets[selection]
key_index = matching_datasets_key_indexes[selection]
elif (len(matching_datasets) == 1):
ds_name = matching_datasets[0]
key_index = matching_datasets_key_indexes[0]
else:
missed_vars = []
print len(closest_match), len(base_dataset_vars)
for v in base_dataset_vars:
if ((v in closest_match) == False):
missed_vars.append(v)
print missed_vars
raise Exception("No dataset matching requested variables found in the {0} datasets, closest match didn't have variables: {1}".format(str(dslist), ", ".join(missed_vars)))
if (key_index == -1):
raise Exception("Key {0} not found although {0} seems to be the correct dataset".format(key_name, ds_name))
elif(key_index > 0 and re.match('18.0.2.100', spss.version.version)):
# Code to fix bug in 18.0.3 and prev TODO: Check for fix
spss.Submit("DATASET ACTIVE {0} .".format(ds_name))
spss.Submit("ADD FILES /FILE=* /KEEP {0} all .".format(key_name))
return ds_name
def SelectOneFile(fileWildcard, dialogMessage):
app = wx.PySimpleApp()
dlg = wx.FileDialog(None,
message=dialogMessage,
defaultDir=os.getcwd(),
defaultFile="",
wildcard=fileWildcard,
style=wx.FD_FILE_MUST_EXIST)
if (dlg.ShowModal() == wx.ID_OK):
file_path = dlg.GetPath()
else:
file_path = None
dlg.Destroy()
app.Destroy()
if (file_path == None):
raise ValueError("You have to choose a file")
return file_path
def ImportFile(file_path, ds_name):
import_cmd = u"GET FILE='{0}' .".format(file_path)
spss.Submit(import_cmd)
spss.Submit("CACHE.")
spss.Submit("EXECUTE.")
# Todo, add check for this part
spss.Submit("DATASET NAME {0} .".format(ds_name))
Now here’s the heart of the action is of course the extracting data part, some of the core functions can be found on the comorbidity page.
def GetDataFromSet(ds_name, key_name, in_date_name, out_date_name, surgery_dates, print_debug = False):
"""
Returns dictionary with ["charlsons"/"elixhausers"][key_name -> variable][admission_dates]
where admission_dates is "{0}<>{1}" between admission and discharge date formatted yyyy-mm-dd
Datetime: obj.strftime("%Y-%m-%d")
Here also the furthest back date is set, see days_b4_surgery_2_include variable
"""
# Set the time span x years b4 surgery
days_b4_surgery_2_include = 365.25*5
spss.Submit("DATASET ACTIVE {0} .".format(ds_name))
diagnosis_vars = []
for i in range(spss.GetVariableCount()):
var_name = spss.GetVariableName(i)
if (re.match('bdia', var_name, re.IGNORECASE)):
diagnosis_vars.append(var_name)
elif(var_name.strip().upper() == 'HDIA'):
diagnosis_vars.append(var_name)
if (len(diagnosis_vars) == 0):
raise Exception('Could not find any bdia variables, check that the file is open and active with the bdia variables')
else:
print "Using the variables {0} for calculating charlsons".format(", ".join(diagnosis_vars))
ind = [key_name, in_date_name, out_date_name]
# Due to errors in PAR where some in_date_name are empty use also the strings
# these have the same name, just add A to the end
for vn in diagnosis_vars:
ind.append(vn)
# Can't use cvtDates=(in_date_name, out_date_name, ) due to corrupt values in
# original dataset
with spssdata.Spssdata(indexes=ind, accessType='r') as cr:
all_scores = {"charlsons": {}, "charlsons_icd10": {},
"rs_charlsons": {}, "rs_charlsons_icd10": {},
"elixhausers": {}, "elixhausers_icd10": {}}
for row in cr:
key_var = row[0]
admission_date = False
date_string = row[1].strip()
if (len(date_string) == 8):
year = int(date_string[0:4])
month = int(date_string[4:6])
day = int(date_string[6:8])
try:
admission_date = datetime.datetime(year, month, day)
except Exception:
print "Warning: Faulty date: {0}-{1}-{2}".format(year, month, day)
admission_date = False
discharge_date = False
date_string = row[2].strip()
if (len(date_string) == 8):
year = int(date_string[0:4])
month = int(date_string[4:6])
day = int(date_string[6:8])
try:
discharge_date = datetime.datetime(year, month, day)
except Exception:
print "Warning: Faulty date: {0}-{1}-{2} for key {3}".format(year, month, day, key_var)
discharge_date = False
# This should actually always be true, although it doesn't really
# matter for the software,
# turns out that admission_date and discharge date are sometimes
# faulty in the original data, the string val is Ok but the datetime
# is a little funky, se prev. conversion of datetime
if (surgery_dates.has_key(key_var) and type(admission_date) == datetime.datetime and type(discharge_date) == datetime.datetime):
admission_dates = "{0}<>{1}".format(admission_date.strftime("%Y-%m-%d"), discharge_date.strftime("%Y-%m-%d"))
if (print_debug):
print "\nLooking for match with: {0} for the dates: {1}".format(key_var, admission_dates)
# Loop through all the surgery dates to see if the match the time
# period for the current studied admission
for surgery_date, reop_date in surgery_dates[key_var].iteritems():
time_delta = surgery_date - admission_date
if (time_delta.days >= 0 and time_delta.days <= days_b4_surgery_2_include):
# Check if the current admission is the one where the surgery occurred
same_admission_as_surgery = False
time_delta = surgery_date - discharge_date
if (time_delta.days <= 0):
same_admission_as_surgery = True
# Diseases that can't occur the same admission as the hip surgery admission
# since they were probably not preexisting but occurred at the admission, after surgery
acute_charlson_diseases = ["MI", "CEVD"]
# Slightly different from the Charlsons
acute_elixhausers_diseases = ["FLUID_EL", "B_LOSS"]
charlsons_parts = {}
charlsons_parts_icd10_parts = {}
rs_charlsons_parts = {}
rs_charlsons_icd10_parts = {}
elixhausers_parts = {}
elixhausers_parts_icd10_parts = {}
#Walk through the bdia variables
for i in range(len(ind) - len(diagnosis_vars), len(ind)):
#Empty row is not interesting
icd = row[i].strip()
if (icd != "" and icd != "M161" and icd != "715B"):
# Crucial for swedish ICD 9
icd = TranslateFromSwedishIcd9(icd)
# Check for icd version both for reasons of
# calculation speed and for subsets of only
# icd 10 calculations
# (RS Charlsons is only truly valid for icd 10)
if (IsIcd10Code(icd, surgery_date.year)):
icd_version = 10
else:
icd_version = 9
# Check for matching Charlsons codes
charlsons_ret = FindCharlsonsMatch(icd, icd_version, 2, print_debug)
if (len(charlsons_ret) > 0):
for ch_key, val_not_used in charlsons_ret.iteritems():
if ((ch_key in acute_charlson_diseases and same_admission_as_surgery) == False):
charlsons_parts[ch_key] = GetCharlsonWeight(ch_key)
if (icd_version == 10):
charlsons_parts_icd10_parts[ch_key] = 1
# Check for matching Royal Society Charlsons codes
rs_charlsons_ret = FindRoyalCharlsonsMatch(icd, icd_version, same_admission_as_surgery == False, False, print_debug)
if (len(rs_charlsons_ret) > 0):
for rs_ch_key, val_not_used in rs_charlsons_ret.iteritems():
#rs_charlsons_parts[rs_ch_key] = GetCharlsonWeight(ch_key)
# RCS discourages the weights from 1980:s
rs_charlsons_parts[rs_ch_key] = 1
# RS is originally designed only for ICD 10 codes
# and therefore this part detects wheather it's an
# icd 10 code or not
if (icd_version == 10):
rs_charlsons_icd10_parts[rs_ch_key] = 1
# Check for matching Elixhauser codes
elixhausers_ret = FindElixhausersMatch(icd, icd_version, print_debug)
if (len(elixhausers_ret) > 0):
for elix_key, val_not_used in elixhausers_ret.iteritems():
if ((elix_key in acute_elixhausers_diseases and same_admission_as_surgery) == False):
elixhausers_parts[elix_key] = GetElixhausersWeight(elix_key)
if (icd_version == 10):
elixhausers_parts_icd10_parts[elix_key] = 1
if (print_debug):
print "{0} - {1} admission dates to compare with surgery date = {2}".format(admission_date.strftime("%Y-%m-%d"), discharge_date.strftime("%Y-%m-%d"), surgery_date.strftime("%Y-%m-%d"))
print "Charlson score: {0}".format(sum([i for i in charlsons_parts.values()]))
print "Charlson ICD 10 score: {0}".format(sum([i for i in charlsons_parts_icd10_parts.values()]))
print "RS Charlson score: {0}".format(sum([i for i in rs_charlsons_parts.values()]))
print "RS Charlson ICD 10 score: {0}".format(sum([i for i in rs_charlsons_icd10_parts.values()]))
print "Elixhauser score: {0}".format(sum([i for i in elixhausers_parts.values()]))
print "Elixhauser ICD 10 score: {0}".format(sum([i for i in elixhausers_parts_icd10_parts.values()]))
# Save Charlsons values
if (len(charlsons_parts) > 0):
if (all_scores["charlsons"].has_key(key_var)):
all_scores["charlsons"][key_var][admission_dates] = charlsons_parts
else:
all_scores["charlsons"][key_var] = {admission_dates: charlsons_parts}
if (len(charlsons_parts_icd10_parts) > 0):
if (all_scores["charlsons_icd10"].has_key(key_var)):
all_scores["charlsons_icd10"][key_var][admission_dates] = charlsons_parts_icd10_parts
else:
all_scores["charlsons_icd10"][key_var] = {admission_dates: charlsons_parts_icd10_parts}
# Save Royal Society Charlsons values
if (len(rs_charlsons_parts) > 0):
if (all_scores["rs_charlsons"].has_key(key_var)):
all_scores["rs_charlsons"][key_var][admission_dates] = rs_charlsons_parts
else:
all_scores["rs_charlsons"][key_var] = {admission_dates: rs_charlsons_parts}
# Save Royal Society Charlsons ICD 10 values
if (len(rs_charlsons_icd10_parts) > 0):
if (all_scores["rs_charlsons_icd10"].has_key(key_var)):
all_scores["rs_charlsons_icd10"][key_var][admission_dates] = rs_charlsons_icd10_parts
else:
all_scores["rs_charlsons_icd10"][key_var] = {admission_dates: rs_charlsons_icd10_parts}
# Save Elixhausers values
if (len(elixhausers_parts) > 0):
if (all_scores["elixhausers"].has_key(key_var)):
all_scores["elixhausers"][key_var][admission_dates] = elixhausers_parts
else:
all_scores["elixhausers"][key_var] = {admission_dates: elixhausers_parts}
if (len(elixhausers_parts_icd10_parts) > 0):
if (all_scores["elixhausers_icd10"].has_key(key_var)):
all_scores["elixhausers_icd10"][key_var][admission_dates] = elixhausers_parts_icd10_parts
else:
all_scores["elixhausers_icd10"][key_var] = {admission_dates: elixhausers_parts_icd10_parts}
# If time delta was OK then don't rerun this for the other hip
break;
return all_scores
Then for saving the data into the base dataset I have this function:
def SetScoreValues(ds_name, key_name, surgery_date_name, all_scores, print_debug = False):
charlson_prefix = "Ch_"
charlson_score_var_name = "CCI_"
charlson_icd10_prefix = "Ch_ICD10_"
charlson_icd10_score_var_name = "CCI_ICD10_"
rs_charlson_prefix = "RS_Ch_"
rs_charlson_score_var_name = "RS_CCI_"
rs_charlson_icd10_prefix = "RS_Ch_ICD10_"
rs_charlson_icd10_score_var_name = "RS_CCI_ICD10_"
elixhausers_prefix = "Elx_"
elixhausers_score_var_name = "ElxCI_"
elixhausers_icd10_prefix = "Elx_ICD10_"
elixhausers_icd10_score_var_name = "ElxCI_ICD10_"
# The years is also dependant on GetDataFromSet
years = [1, 2, 5]
spss.Submit("DATASET ACTIVE {0} .".format(ds_name))
# Remove the variable if it already exists from a previous run
# For safety use only in debug mode
if (print_debug or True):
vars_to_delete = ""
for i in range(spss.GetVariableCount()):
var_name = spss.GetVariableName(i).strip()
if (re.match(charlson_prefix, var_name)
or re.match(charlson_icd10_prefix, var_name)
or re.match(rs_charlson_prefix, var_name)
or re.match(rs_charlson_icd10_prefix, var_name)
or re.match(elixhausers_prefix, var_name)
or re.match(elixhausers_icd10_prefix, var_name)
or var_name[0:len(charlson_score_var_name)] == charlson_score_var_name
or var_name[0:len(charlson_icd10_score_var_name)] == charlson_icd10_score_var_name
or var_name[0:len(rs_charlson_score_var_name)] == rs_charlson_score_var_name
or var_name[0:len(rs_charlson_icd10_score_var_name)] == rs_charlson_icd10_score_var_name
or var_name[0:len(elixhausers_score_var_name)] == elixhausers_score_var_name
or var_name[0:len(elixhausers_icd10_score_var_name)] == elixhausers_icd10_score_var_name):
if (vars_to_delete == ""):
vars_to_delete = var_name
else:
vars_to_delete += ", {0}".format(var_name)
if (len(vars_to_delete) > 0):
print "Warning deleting variables: {0}".format(vars_to_delete)
spss.Submit("DELETE VARIABLES {0}.".format(vars_to_delete));
with spssdata.Spssdata(indexes=[key_name, surgery_date_name], cvtDates=(surgery_date_name, ), accessType='w') as cr:
# Prepare variables, old vars should have been deleted
for year in years:
# Prepare Charlsons variables
general_prefix = "{0}{1}yr_".format(charlson_prefix, year)
main_prefix = "{0}{1}yr".format(charlson_score_var_name, year)
vdefs = GetCharlsonVdef(general_prefix, main_prefix)
for vd in vdefs:
cr.append(vd)
general_prefix = "{0}{1}yr_".format(charlson_icd10_prefix, year)
main_prefix = "{0}{1}yr".format(charlson_icd10_score_var_name, year)
vdefs = GetCharlsonVdef(general_prefix, main_prefix)
for vd in vdefs:
cr.append(vd)
# Prepare RS Charlsons variables
general_prefix = "{0}{1}yr_".format(rs_charlson_prefix, year)
main_prefix = "{0}{1}yr".format(rs_charlson_score_var_name, year)
vdefs = GetRCSharlsonVdef(general_prefix, main_prefix)
for vd in vdefs:
cr.append(vd)
# Prepare RS Charlsons Icd 10 variables
general_prefix = "{0}{1}yr_".format(rs_charlson_icd10_prefix, year)
main_prefix = "{0}{1}yr".format(rs_charlson_icd10_score_var_name, year)
vdefs = GetRCSharlsonVdef(general_prefix, main_prefix)
for vd in vdefs:
cr.append(vd)
# Prepare Elixhausers variables
general_prefix = "{0}{1}yr_".format(elixhausers_prefix, year)
main_prefix = "{0}{1}yr".format(elixhausers_score_var_name, year)
vdefs = GetElixhausersVdef(general_prefix, main_prefix)
for vd in vdefs:
cr.append(vd)
general_prefix = "{0}{1}yr_".format(elixhausers_icd10_prefix, year)
main_prefix = "{0}{1}yr".format(elixhausers_icd10_score_var_name, year)
vdefs = GetElixhausersVdef(general_prefix, main_prefix)
for vd in vdefs:
cr.append(vd)
# Commit the prepared variables
cr.commitdict()
for row in cr:
key_var = row[0]
surgery_date = row[1]
if (print_debug):
print "\n-_ Check key: {0} _-".format(key_var)
if (all_scores["charlsons"].has_key(key_var)):
subject_scores = all_scores["charlsons"][key_var]
if (print_debug):
print "* Charlsons *"
print "Scores: {0}".format(subject_scores)
SetCharlsonScoreValues(cr, subject_scores, surgery_date,
charlson_score_var_name, charlson_prefix, years)
else:
for y in years:
cr.setvalue("{0}{1}yr".format(charlson_score_var_name, y), 0)
if (all_scores["charlsons_icd10"].has_key(key_var)):
subject_scores = all_scores["charlsons_icd10"][key_var]
if (print_debug):
print "* Charlsons ICD 10 *"
print "Scores: {0}".format(subject_scores)
SetCharlsonScoreValues(cr, subject_scores, surgery_date,
charlson_icd10_score_var_name, charlson_icd10_prefix, years)
else:
for y in years:
cr.setvalue("{0}{1}yr".format(charlson_icd10_score_var_name, y), 0)
if (all_scores["rs_charlsons"].has_key(key_var)):
subject_scores = all_scores["rs_charlsons"][key_var]
if (print_debug):
print "* RS Charlsons *"
print "Scores: {0}".format(subject_scores)
SetRCSharlsonScoreValues(cr, subject_scores, surgery_date,
rs_charlson_score_var_name, rs_charlson_prefix, years)
else:
for y in years:
cr.setvalue("{0}{1}yr".format(rs_charlson_score_var_name, y), 0)
if (all_scores["rs_charlsons_icd10"].has_key(key_var)):
subject_scores = all_scores["rs_charlsons_icd10"][key_var]
if (print_debug):
print "* RS Charlsons ICD 10*"
print "Scores: {0}".format(subject_scores)
SetRCSharlsonScoreValues(cr, subject_scores, surgery_date,
rs_charlson_icd10_score_var_name, rs_charlson_icd10_prefix, years)
else:
for y in years:
cr.setvalue("{0}{1}yr".format(rs_charlson_icd10_score_var_name, y), 0)
if (all_scores["elixhausers"].has_key(key_var)):
subject_scores = all_scores["elixhausers"][key_var]
if (print_debug):
print "* Elixhausers *"
print "Scores: {0}".format(subject_scores)
SetElixhausersScoreValues(cr, subject_scores, surgery_date,
elixhausers_score_var_name, elixhausers_prefix, years)
else:
for y in years:
cr.setvalue("{0}{1}yr".format(elixhausers_score_var_name, y), 0)
if (all_scores["elixhausers_icd10"].has_key(key_var)):
subject_scores = all_scores["elixhausers_icd10"][key_var]
if (print_debug):
print "* Elixhausers ICD 10 *"
print "Scores: {0}".format(subject_scores)
SetElixhausersScoreValues(cr, subject_scores, surgery_date,
elixhausers_icd10_score_var_name, elixhausers_icd10_prefix, years)
else:
for y in years:
cr.setvalue("{0}{1}yr".format(elixhausers_icd10_score_var_name, y), 0)
# Commit all scores
cr.CommitCase()
for i in range(spss.GetVariableCount()):
var_name = spss.GetVariableName(i)
if (re.match(charlson_icd10_prefix, var_name)):
spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
# Change if 10 or more years
weight = GetCharlsonWeight(var_name[len("{0}Xyr_".format(charlson_icd10_prefix)):])
spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
elif (re.match(charlson_prefix, var_name)):
spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
# Change if 10 or more years
weight = GetCharlsonWeight(var_name[len("{0}Xyr_".format(charlson_prefix)):])
spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
elif (re.match(rs_charlson_icd10_prefix, var_name)):
spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
weight = 1
spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
elif (re.match(rs_charlson_prefix, var_name)):
spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
weight = 1
spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
elif (re.match(elixhausers_icd10_prefix, var_name)):
spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
weight = GetElixhausersWeight(var_name[len("{0}Xyr_".format(elixhausers_icd10_prefix)):])
spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
elif (re.match(elixhausers_prefix, var_name)):
spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
weight = GetElixhausersWeight(var_name[len("{0}Xyr_".format(elixhausers_prefix)):])
spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
spss.Submit("EXECUTE.")
Now I have a few more helper functions for setting the ICD-scores to the given variables:
def GetDaysB4Surgery(surgery_date, admission_dates):
"""
returns days between admission and surgery date
The value is positive if admission occured b4
the surgery, surgery_date - admission date
"""
# Get the admission time
adm_times = admission_dates.split("<>")[0].split("-")
admission_date = datetime.datetime(int(adm_times[0]), int(adm_times[1]), int(adm_times[2]))
# check how admissino relates to the surgery date
time_delta = surgery_date - admission_date
return time_delta.days
def SetCharlsonScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years):
return _SetIcdScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years, "Charlsons")
def SetRCSharlsonScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years):
return _SetIcdScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years, "RS Charlsons")
def SetElixhausersScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years):
return _SetIcdScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years, "Elixhausers")
def _SetIcdScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years, score_type):
# Create a variable for storing all the
# scores, also add the different years
combined_parts = {}
for y in years:
combined_parts[y] = {}
# Combine the different scores into one
# All scores are now date valid, moved previous check to GetDataFromSet
for admission_dates, score_parts in subject_scores.iteritems():
delta_days = GetDaysB4Surgery(surgery_date, admission_dates)
for y in years:
if (delta_days >= 0 and delta_days <= y*365.25):
for key_disease, weight in score_parts.iteritems():
combined_parts[y][key_disease] = weight
for y in years:
#Clean score with hierarchy
if (score_type == "Charlsons"):
combined_parts[y] = Adjust4CharlsonHierarchy(combined_parts[y])
elif (score_type == "RS Charlsons"):
combined_parts[y] = Adjust4RCSharlsonHierarchy(combined_parts[y])
elif (score_type == "Elixhausers"):
combined_parts[y] = Adjust4ElixhausersHierarchy(combined_parts[y])
else:
raise Exception("Invalid score_type {0}".format(score_type))
# Calculate the sum for elixhausers by adding all the parts
combined_score = sum([i for i in combined_parts[y].values()])
cr.setvalue("{0}{1}yr".format(main_score_name, y), combined_score)
for key_disease, score in combined_parts[y].iteritems():
if (key_disease != main_score_name):
try:
vn = "{0}{1}yr_{2}".format(general_score_name, y, key_disease)
cr.setvalue(vn, score)
except Exception:
print "Error when seeting: ", vn
That's all folks! It's a lot of code but the functions are fairly well contained and their names should hopefully be self-explanatory. If you are still confused don't hesitate to post a question and I'll try to clarify.
The python SPSS plugin - a few words from experience
The SPSS python plugin is a brilliant addition to the software, and if used with the spss.Spssdata() function it is actually rather fast (stay away from the Datastep alternative). Although Jon Peck at IBM has been really helpful and quick at answering my questions, I get the feeling that IBM is not full-heartedly supporting his efforts. For instance the plugin isn't installed by default, it can be a little tricky to find on the web-page, and there is no syntax-highlighting/code folding support in SPSS (I use Eclipse for developing the code). My conclusion is therefore that for the future projects I won't use SPSS, just pure Python/R.