Extracting comorbidities from a database in SPSS

Using large databases for extracting data can be cumbersome, fortunately it's more reliable than sifting for gold. The image is CC by Won-Tolla.

Using large databases for extracting data can be cumbersome, fortunately it’s more reliable than sifting for gold. The image is CC by Won-Tolla.

I put a lot of effort in to my first article to calculate the comorbidities of a patient according to the Charlson & Elixhauser scores. The available scripts were in SAS and Stata, as I started out using SPSS I decided to implement the code in the neat Python plugin that SPSS provides. In this post I’ll provide you with a detailed walk through of my code, and hopefully it will save you some time.

Update: I have created a new, easy-to-use package in R that has been thoroughly tested. You can find it on my github-page here (currently called comorbidities.icd10)

The code

The final code is rather simple, it finds the base dataset with the surgery codes, then it looks for the dataset with the ICD-codes, loops through them and adds the comorbidities to the base dataset:

BEGIN PROGRAM PYTHON .
import re
import spss
import spssdata
import datetime

print "\nStart calculation of charlsons with merge to main database"

base_dataset = FetchBaseDataSet(key_name, ['OPPAR', 'OPPDAT', 'DODDAT', 'DIAGNOS', 'DIAGRP', 'ICD', 'SNITT'])
if (base_dataset == ""):
    raise Exception("You must activate the dataset containing the main data")
else:
    print "Chosen the '{0}' for base table to merge upon".format(base_dataset)

print "Select file"
file_path = SelectOneFile("*.sav", "Select files with diagnostic variables used for Charlsons")

print "Get all the surgery dates from the base dataset ({0})".format(base_dataset)
all_surgery_dates = GetSurgeryDates(base_dataset, key_name, "OPPDAT", "DATREOP", debug)

print "Import file"
charlson_dataset_name = "ch_dataset_name"
ImportFile(file_path, charlson_dataset_name)

print "Get Charlson end Elixhausers data from file"
all_scores = GetDataFromSet(charlson_dataset_name, key_name, 'INDATUMA', 'UTDATUMA', all_surgery_dates, debug)
spss.Submit("DATASET CLOSE {0} .".format(charlson_dataset_name))

print "Set the Charlson's and Elixhauser's values for {0}:{1} subjects".format(len(all_scores["charlsons"]), len(all_scores["elixhausers"]))
SetScoreValues(base_dataset, key_name, 'OPPDAT', all_scores, debug)

print "Calculation of Charlson's comorbidity index and adding variables finished!"
END PROGRAM .

The above is of course just the main script, the true magic is in the myriad of functions. I have some functions that help me selecting the opened dataset of interest and the files that I want to open and extract data from, note the wx library needs to be installed and loaded, se the collapsed code block below.

def FetchBaseDataSet(key_name, base_dataset_vars):
    """
    A function that looks for the dataset the contains the base data defined by the
    base variables
    """

    if ((key_name in base_dataset_vars) == False):
        base_dataset_vars.extend([key_name])
        
    # Get the dataset with the base data
    ds_name = ""
    tag,err = spssaux.createXmlOutput('Dataset Display',omsid='DatasetDisplay', subtype='Datasets')
    dslist = spssaux.getValuesFromXmlWorkspace(tag, 'Datasets')

    matching_datasets = []
    matching_datasets_key_indexes = []
    closest_match = []
    for ds in dslist:
        spss.Submit("DATASET ACTIVE {0} .".format(ds))
        key_index = -1
        variable_count = 0
        matches = []
        for i in range(spss.GetVariableCount()):
            var_name = spss.GetVariableName(i)
            if (re.match(key_name, var_name)):
                key_index = i

            if (var_name in base_dataset_vars):
                variable_count += 1
                matches.append(var_name)

        if (len(closest_match) < len(matches)):
            closest_match = matches

        if (variable_count >= len(base_dataset_vars)):
            matching_datasets.append(ds)
            matching_datasets_key_indexes.append(key_index)

    if (len(matching_datasets) > 1):
        app = wx.PySimpleApp()
        dlg = wx.SingleChoiceDialog(None, "Multiple possible datasets found", "Choose the one that you want to use as base dataset", matching_datasets)

        selection = None
        if (dlg.ShowModal() == wx.ID_OK):
            selection = dlg.GetSelection()

        dlg.Destroy()
        app.Destroy()

        if (selection == None):
            raise Exception("You have to select at least one dataset")

        ds_name = matching_datasets[selection]
        key_index = matching_datasets_key_indexes[selection]
    elif (len(matching_datasets) == 1):
        ds_name = matching_datasets[0]
        key_index = matching_datasets_key_indexes[0]
    else:
        missed_vars = []
        print len(closest_match), len(base_dataset_vars)
        for v in base_dataset_vars:
            if ((v in closest_match) == False):
                missed_vars.append(v)
        print missed_vars
        raise Exception("No dataset matching requested variables found in the {0} datasets, closest match didn't have variables: {1}".format(str(dslist), ", ".join(missed_vars)))

    if (key_index == -1):
        raise Exception("Key {0} not found although {0} seems to be the correct dataset".format(key_name, ds_name))
    elif(key_index > 0 and re.match('18.0.2.100', spss.version.version)):
        # Code to fix bug in 18.0.3 and prev TODO: Check for fix
        spss.Submit("DATASET ACTIVE {0} .".format(ds_name))
        spss.Submit("ADD FILES /FILE=* /KEEP {0} all .".format(key_name))

    return ds_name

def SelectOneFile(fileWildcard, dialogMessage):
    app = wx.PySimpleApp()
    dlg = wx.FileDialog(None,
       message=dialogMessage,
        defaultDir=os.getcwd(),
        defaultFile="",
        wildcard=fileWildcard,
        style=wx.FD_FILE_MUST_EXIST)
    if (dlg.ShowModal() == wx.ID_OK):
        file_path = dlg.GetPath()
    else:
        file_path = None
    dlg.Destroy()
    app.Destroy()

    if (file_path == None):
        raise ValueError("You have to choose a file")

    return file_path

def ImportFile(file_path, ds_name):
    import_cmd = u"GET FILE='{0}' .".format(file_path)
    spss.Submit(import_cmd)
    spss.Submit("CACHE.")
    spss.Submit("EXECUTE.")
    # Todo, add check for this part
    spss.Submit("DATASET NAME {0} .".format(ds_name))

Now here’s the heart of the action is of course the extracting data part, some of the core functions can be found on the comorbidity page.

def GetDataFromSet(ds_name, key_name, in_date_name, out_date_name, surgery_dates, print_debug = False):
    """
    Returns dictionary with ["charlsons"/"elixhausers"][key_name -> variable][admission_dates]
    where admission_dates is "{0}<>{1}" between admission and discharge date formatted yyyy-mm-dd
    Datetime: obj.strftime("%Y-%m-%d")

    Here also the furthest back date is set, see days_b4_surgery_2_include variable
    """
    # Set the time span x years b4 surgery
    days_b4_surgery_2_include = 365.25*5
    spss.Submit("DATASET ACTIVE {0} .".format(ds_name))
    diagnosis_vars = []
    for i in range(spss.GetVariableCount()):
        var_name = spss.GetVariableName(i)
        if (re.match('bdia', var_name, re.IGNORECASE)):
            diagnosis_vars.append(var_name)
        elif(var_name.strip().upper() == 'HDIA'):
            diagnosis_vars.append(var_name)

    if (len(diagnosis_vars) == 0):
        raise Exception('Could not find any bdia variables, check that the file is open and active with the bdia variables')
    else:
        print "Using the variables {0} for calculating charlsons".format(", ".join(diagnosis_vars))

    ind = [key_name, in_date_name, out_date_name]
    # Due to errors in PAR where some in_date_name are empty use also the strings
    # these have the same name, just add A to the end

    for vn in diagnosis_vars:
        ind.append(vn)

    # Can't use cvtDates=(in_date_name, out_date_name, ) due to corrupt values in
    # original dataset
    with spssdata.Spssdata(indexes=ind, accessType='r') as cr:
        all_scores = {"charlsons": {}, "charlsons_icd10": {}, 
                      "rs_charlsons": {}, "rs_charlsons_icd10": {}, 
                      "elixhausers": {}, "elixhausers_icd10": {}}
        for row in cr:
            key_var = row[0]
            admission_date = False
            date_string = row[1].strip()
            if (len(date_string) == 8):
                year = int(date_string[0:4])
                month = int(date_string[4:6])
                day = int(date_string[6:8])
                try:
                    admission_date = datetime.datetime(year, month, day)
                except Exception:
                    print "Warning: Faulty date: {0}-{1}-{2}".format(year, month, day)
                    admission_date = False

            discharge_date = False
            date_string = row[2].strip()
            if (len(date_string) == 8):
                year = int(date_string[0:4])
                month = int(date_string[4:6])
                day = int(date_string[6:8])
                try:
                    discharge_date = datetime.datetime(year, month, day)
                except Exception:
                    print "Warning: Faulty date: {0}-{1}-{2} for key {3}".format(year, month, day, key_var)
                    discharge_date = False
            
            # This should actually always be true, although it doesn't really
            # matter for the software,
            # turns out that admission_date and discharge date are sometimes
            # faulty in the original data, the string val is Ok but the datetime
            # is a little funky, se prev. conversion of datetime
            if (surgery_dates.has_key(key_var) and type(admission_date) == datetime.datetime and type(discharge_date) == datetime.datetime):
                admission_dates = "{0}<>{1}".format(admission_date.strftime("%Y-%m-%d"), discharge_date.strftime("%Y-%m-%d"))

                if (print_debug):
                    print "\nLooking for match with: {0} for the dates: {1}".format(key_var, admission_dates)

                # Loop through all the surgery dates to see if the match the time
                # period for the current studied admission
                for surgery_date, reop_date in surgery_dates[key_var].iteritems():

                    time_delta = surgery_date - admission_date
                    if (time_delta.days >= 0 and time_delta.days <= days_b4_surgery_2_include):

                        # Check if the current admission is the one where the surgery occurred
                        same_admission_as_surgery = False
                        time_delta = surgery_date - discharge_date
                        if (time_delta.days <= 0):
                            same_admission_as_surgery = True

                        # Diseases that can't occur the same admission as the hip surgery admission
                        # since they were probably not preexisting but occurred at the admission, after surgery
                        acute_charlson_diseases = ["MI", "CEVD"]
                        # Slightly different from the Charlsons
                        acute_elixhausers_diseases = ["FLUID_EL", "B_LOSS"]

                        charlsons_parts = {}
                        charlsons_parts_icd10_parts = {}
                        rs_charlsons_parts = {}
                        rs_charlsons_icd10_parts = {}
                        elixhausers_parts = {}
                        elixhausers_parts_icd10_parts = {}

                        #Walk through the bdia variables
                        for i in range(len(ind) - len(diagnosis_vars), len(ind)):
                            #Empty row is not interesting
                            icd = row[i].strip()
                            if (icd != "" and icd != "M161" and icd != "715B"):

                                # Crucial for swedish ICD 9
                                icd = TranslateFromSwedishIcd9(icd)

                                # Check for icd version both for reasons of 
                                # calculation speed and for subsets of only 
                                # icd 10 calculations 
                                # (RS Charlsons is only truly valid for icd 10)  
                                if (IsIcd10Code(icd, surgery_date.year)):
                                    icd_version = 10
                                else:
                                    icd_version = 9

                                # Check for matching Charlsons codes
                                charlsons_ret = FindCharlsonsMatch(icd, icd_version, 2, print_debug)
                                if (len(charlsons_ret) > 0):
                                    for ch_key, val_not_used in charlsons_ret.iteritems():
                                        if ((ch_key in acute_charlson_diseases and same_admission_as_surgery) == False):
                                            charlsons_parts[ch_key] = GetCharlsonWeight(ch_key)

                                            if (icd_version == 10):
                                                charlsons_parts_icd10_parts[ch_key] = 1

                                # Check for matching Royal Society Charlsons codes
                                rs_charlsons_ret = FindRoyalCharlsonsMatch(icd, icd_version, same_admission_as_surgery == False, False, print_debug)
                                

                                if (len(rs_charlsons_ret) > 0):
                                    for rs_ch_key, val_not_used in rs_charlsons_ret.iteritems():
                                        #rs_charlsons_parts[rs_ch_key] = GetCharlsonWeight(ch_key)
                                        # RCS discourages the weights from 1980:s 
                                        rs_charlsons_parts[rs_ch_key] = 1
                                        
                                        # RS is originally designed only for ICD 10 codes
                                        # and therefore this part detects wheather it's an
                                        # icd 10 code or not 
                                        if (icd_version == 10):
                                            rs_charlsons_icd10_parts[rs_ch_key] = 1


                                # Check for matching Elixhauser codes
                                elixhausers_ret = FindElixhausersMatch(icd, icd_version, print_debug)
                                if (len(elixhausers_ret) > 0):
                                    for elix_key, val_not_used in elixhausers_ret.iteritems():
                                        if ((elix_key in acute_elixhausers_diseases and same_admission_as_surgery) == False):
                                            elixhausers_parts[elix_key] = GetElixhausersWeight(elix_key)

                                            if (icd_version == 10):
                                                elixhausers_parts_icd10_parts[elix_key] = 1

                        if (print_debug):
                            print "{0} - {1} admission dates to compare with surgery date = {2}".format(admission_date.strftime("%Y-%m-%d"), discharge_date.strftime("%Y-%m-%d"), surgery_date.strftime("%Y-%m-%d"))
                            print "Charlson score: {0}".format(sum([i for i in charlsons_parts.values()]))
                            print "Charlson  ICD 10 score: {0}".format(sum([i for i in charlsons_parts_icd10_parts.values()]))
                            print "RS Charlson score: {0}".format(sum([i for i in rs_charlsons_parts.values()]))
                            print "RS Charlson ICD 10 score: {0}".format(sum([i for i in rs_charlsons_icd10_parts.values()]))
                            print "Elixhauser score: {0}".format(sum([i for i in elixhausers_parts.values()]))
                            print "Elixhauser ICD 10 score: {0}".format(sum([i for i in elixhausers_parts_icd10_parts.values()]))

                        # Save Charlsons values
                        if (len(charlsons_parts) > 0):
                            if (all_scores["charlsons"].has_key(key_var)):
                                all_scores["charlsons"][key_var][admission_dates] = charlsons_parts
                            else:
                                all_scores["charlsons"][key_var] = {admission_dates: charlsons_parts}

                        if (len(charlsons_parts_icd10_parts) > 0):
                            if (all_scores["charlsons_icd10"].has_key(key_var)):
                                all_scores["charlsons_icd10"][key_var][admission_dates] = charlsons_parts_icd10_parts
                            else:
                                all_scores["charlsons_icd10"][key_var] = {admission_dates: charlsons_parts_icd10_parts}

                        # Save Royal Society Charlsons values
                        if (len(rs_charlsons_parts) > 0):
                            if (all_scores["rs_charlsons"].has_key(key_var)):
                                all_scores["rs_charlsons"][key_var][admission_dates] = rs_charlsons_parts
                            else:
                                all_scores["rs_charlsons"][key_var] = {admission_dates: rs_charlsons_parts}

                        # Save Royal Society Charlsons ICD 10 values
                        if (len(rs_charlsons_icd10_parts) > 0):
                            if (all_scores["rs_charlsons_icd10"].has_key(key_var)):
                                all_scores["rs_charlsons_icd10"][key_var][admission_dates] = rs_charlsons_icd10_parts
                            else:
                                all_scores["rs_charlsons_icd10"][key_var] = {admission_dates: rs_charlsons_icd10_parts}

                        # Save Elixhausers values
                        if (len(elixhausers_parts) > 0):
                            if (all_scores["elixhausers"].has_key(key_var)):
                                all_scores["elixhausers"][key_var][admission_dates] = elixhausers_parts
                            else:
                                all_scores["elixhausers"][key_var] = {admission_dates: elixhausers_parts}

                        if (len(elixhausers_parts_icd10_parts) > 0):
                            if (all_scores["elixhausers_icd10"].has_key(key_var)):
                                all_scores["elixhausers_icd10"][key_var][admission_dates] = elixhausers_parts_icd10_parts
                            else:
                                all_scores["elixhausers_icd10"][key_var] = {admission_dates: elixhausers_parts_icd10_parts}

                        # If time delta was OK then don't rerun this for the other hip
                        break;
    return all_scores

Then for saving the data into the base dataset I have this function:

def SetScoreValues(ds_name, key_name, surgery_date_name, all_scores, print_debug = False):
    charlson_prefix = "Ch_"
    charlson_score_var_name = "CCI_"
    charlson_icd10_prefix = "Ch_ICD10_"
    charlson_icd10_score_var_name = "CCI_ICD10_"
    rs_charlson_prefix = "RS_Ch_"
    rs_charlson_score_var_name = "RS_CCI_"
    rs_charlson_icd10_prefix = "RS_Ch_ICD10_"
    rs_charlson_icd10_score_var_name = "RS_CCI_ICD10_"
    elixhausers_prefix = "Elx_"
    elixhausers_score_var_name = "ElxCI_"
    elixhausers_icd10_prefix = "Elx_ICD10_"
    elixhausers_icd10_score_var_name = "ElxCI_ICD10_"
    # The years is also dependant on GetDataFromSet
    years = [1, 2, 5]

    spss.Submit("DATASET ACTIVE {0} .".format(ds_name))
    
    # Remove the variable if it already exists from a previous run
    # For safety use only in debug mode
    if (print_debug or True):
        vars_to_delete = ""
        for i in range(spss.GetVariableCount()):
            var_name = spss.GetVariableName(i).strip()
            if (re.match(charlson_prefix, var_name)
                or re.match(charlson_icd10_prefix, var_name)
                or re.match(rs_charlson_prefix, var_name)
                or re.match(rs_charlson_icd10_prefix, var_name)
                or re.match(elixhausers_prefix, var_name)
                or re.match(elixhausers_icd10_prefix, var_name)
                or var_name[0:len(charlson_score_var_name)] == charlson_score_var_name
                or var_name[0:len(charlson_icd10_score_var_name)] == charlson_icd10_score_var_name
                or var_name[0:len(rs_charlson_score_var_name)] == rs_charlson_score_var_name
                or var_name[0:len(rs_charlson_icd10_score_var_name)] == rs_charlson_icd10_score_var_name
                or var_name[0:len(elixhausers_score_var_name)] == elixhausers_score_var_name
                or var_name[0:len(elixhausers_icd10_score_var_name)] == elixhausers_icd10_score_var_name):
                if (vars_to_delete == ""):
                    vars_to_delete = var_name
                else:
                    vars_to_delete += ", {0}".format(var_name)

        if (len(vars_to_delete) > 0):
            print "Warning deleting variables: {0}".format(vars_to_delete)
            spss.Submit("DELETE VARIABLES {0}.".format(vars_to_delete));

    with spssdata.Spssdata(indexes=[key_name, surgery_date_name], cvtDates=(surgery_date_name, ), accessType='w') as cr:
        # Prepare variables, old vars should have been deleted
        for year in years:
            # Prepare Charlsons variables
            general_prefix = "{0}{1}yr_".format(charlson_prefix, year)
            main_prefix = "{0}{1}yr".format(charlson_score_var_name, year)

            vdefs = GetCharlsonVdef(general_prefix, main_prefix)
            for vd in vdefs:
                cr.append(vd)

            general_prefix = "{0}{1}yr_".format(charlson_icd10_prefix, year)
            main_prefix = "{0}{1}yr".format(charlson_icd10_score_var_name, year)

            vdefs = GetCharlsonVdef(general_prefix, main_prefix)
            for vd in vdefs:
                cr.append(vd)

            # Prepare RS Charlsons variables
            general_prefix = "{0}{1}yr_".format(rs_charlson_prefix, year)
            main_prefix = "{0}{1}yr".format(rs_charlson_score_var_name, year)

            vdefs = GetRCSharlsonVdef(general_prefix, main_prefix)
            for vd in vdefs:
                cr.append(vd)

            # Prepare RS Charlsons Icd 10 variables
            general_prefix = "{0}{1}yr_".format(rs_charlson_icd10_prefix, year)
            main_prefix = "{0}{1}yr".format(rs_charlson_icd10_score_var_name, year)

            vdefs = GetRCSharlsonVdef(general_prefix, main_prefix)
            for vd in vdefs:
                cr.append(vd)

            # Prepare Elixhausers variables
            general_prefix = "{0}{1}yr_".format(elixhausers_prefix, year)
            main_prefix = "{0}{1}yr".format(elixhausers_score_var_name, year)

            vdefs = GetElixhausersVdef(general_prefix, main_prefix)
            for vd in vdefs:
                cr.append(vd)

            general_prefix = "{0}{1}yr_".format(elixhausers_icd10_prefix, year)
            main_prefix = "{0}{1}yr".format(elixhausers_icd10_score_var_name, year)

            vdefs = GetElixhausersVdef(general_prefix, main_prefix)
            for vd in vdefs:
                cr.append(vd)

        # Commit the prepared variables
        cr.commitdict()

        for row in cr:
            key_var = row[0]
            surgery_date = row[1]
            if (print_debug):
                print "\n-_ Check key: {0} _-".format(key_var)

            if (all_scores["charlsons"].has_key(key_var)):
                subject_scores = all_scores["charlsons"][key_var]
                if (print_debug):
                    print "* Charlsons *"
                    print "Scores: {0}".format(subject_scores)

                SetCharlsonScoreValues(cr, subject_scores, surgery_date, 
                                       charlson_score_var_name, charlson_prefix, years)
            else:
                for y in years:
                    cr.setvalue("{0}{1}yr".format(charlson_score_var_name, y), 0)

            if (all_scores["charlsons_icd10"].has_key(key_var)):
                subject_scores = all_scores["charlsons_icd10"][key_var]
                if (print_debug):
                    print "* Charlsons ICD 10 *"
                    print "Scores: {0}".format(subject_scores)

                SetCharlsonScoreValues(cr, subject_scores, surgery_date, 
                                       charlson_icd10_score_var_name, charlson_icd10_prefix, years)
            else:
                for y in years:
                    cr.setvalue("{0}{1}yr".format(charlson_icd10_score_var_name, y), 0)

            if (all_scores["rs_charlsons"].has_key(key_var)):
                subject_scores = all_scores["rs_charlsons"][key_var]
                if (print_debug):
                    print "* RS Charlsons *"
                    print "Scores: {0}".format(subject_scores)

                SetRCSharlsonScoreValues(cr, subject_scores, surgery_date, 
                                         rs_charlson_score_var_name, rs_charlson_prefix, years)
            else:
                for y in years:
                    cr.setvalue("{0}{1}yr".format(rs_charlson_score_var_name, y), 0)

            if (all_scores["rs_charlsons_icd10"].has_key(key_var)):
                subject_scores = all_scores["rs_charlsons_icd10"][key_var]
                if (print_debug):
                    print "* RS Charlsons ICD 10*"
                    print "Scores: {0}".format(subject_scores)

                SetRCSharlsonScoreValues(cr, subject_scores, surgery_date,
                                        rs_charlson_icd10_score_var_name, rs_charlson_icd10_prefix, years)
            else:
                for y in years:
                    cr.setvalue("{0}{1}yr".format(rs_charlson_icd10_score_var_name, y), 0)

            if (all_scores["elixhausers"].has_key(key_var)):
                subject_scores = all_scores["elixhausers"][key_var]
                if (print_debug):
                    print "* Elixhausers *"
                    print "Scores: {0}".format(subject_scores)

                SetElixhausersScoreValues(cr, subject_scores, surgery_date, 
                                          elixhausers_score_var_name, elixhausers_prefix, years)
            else:
                for y in years:
                    cr.setvalue("{0}{1}yr".format(elixhausers_score_var_name, y), 0)

            if (all_scores["elixhausers_icd10"].has_key(key_var)):
                subject_scores = all_scores["elixhausers_icd10"][key_var]
                if (print_debug):
                    print "* Elixhausers ICD 10 *"
                    print "Scores: {0}".format(subject_scores)

                SetElixhausersScoreValues(cr, subject_scores, surgery_date, 
                                          elixhausers_icd10_score_var_name, elixhausers_icd10_prefix, years)
            else:
                for y in years:
                    cr.setvalue("{0}{1}yr".format(elixhausers_icd10_score_var_name, y), 0)

            # Commit all scores
            cr.CommitCase()

    for i in range(spss.GetVariableCount()):
        var_name = spss.GetVariableName(i)
        if (re.match(charlson_icd10_prefix, var_name)):
            spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
            # Change if 10 or more years
            weight = GetCharlsonWeight(var_name[len("{0}Xyr_".format(charlson_icd10_prefix)):])
            spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
            spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
        elif (re.match(charlson_prefix, var_name)):
            spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
            # Change if 10 or more years
            weight = GetCharlsonWeight(var_name[len("{0}Xyr_".format(charlson_prefix)):])
            spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
            spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
        elif (re.match(rs_charlson_icd10_prefix, var_name)):
            spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
            weight = 1
            spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
            spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
        elif (re.match(rs_charlson_prefix, var_name)):
            spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
            weight = 1
            spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
            spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
        elif (re.match(elixhausers_icd10_prefix, var_name)):
            spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
            weight = GetElixhausersWeight(var_name[len("{0}Xyr_".format(elixhausers_icd10_prefix)):])
            spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
            spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))
        elif (re.match(elixhausers_prefix, var_name)):
            spss.Submit("IF MISSING({0}) {0} = 0.".format(var_name))
            weight = GetElixhausersWeight(var_name[len("{0}Xyr_".format(elixhausers_prefix)):])
            spss.Submit("VALUE LABELS {0} 0 'No' {1} 'Yes'.".format(var_name, weight))
            spss.Submit("VARIABLE LEVEL {0} (NOMINAL) .".format(var_name))

    spss.Submit("EXECUTE.")

Now I have a few more helper functions for setting the ICD-scores to the given variables:

def GetDaysB4Surgery(surgery_date, admission_dates):
    """
    returns days between admission and surgery date
    The value is positive if admission occured b4
    the surgery, surgery_date - admission date
    """
    # Get the admission time
    adm_times = admission_dates.split("<>")[0].split("-")
    admission_date = datetime.datetime(int(adm_times[0]), int(adm_times[1]), int(adm_times[2]))

    # check how admissino relates to the surgery date
    time_delta = surgery_date - admission_date

    return time_delta.days

def SetCharlsonScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years):
    return _SetIcdScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years, "Charlsons")

def SetRCSharlsonScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years):
    return _SetIcdScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years, "RS Charlsons")

def SetElixhausersScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years):
    return _SetIcdScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years, "Elixhausers")

def _SetIcdScoreValues(cr, subject_scores, surgery_date, main_score_name, general_score_name, years, score_type):
    # Create a variable for storing all the
    # scores, also add the different years
    combined_parts = {}
    for y in years:
        combined_parts[y] = {}

    # Combine the different scores into one
    # All scores are now date valid, moved previous check to GetDataFromSet
    for admission_dates, score_parts in subject_scores.iteritems():
        delta_days = GetDaysB4Surgery(surgery_date, admission_dates)

        for y in years:
            if (delta_days >= 0 and delta_days <= y*365.25):
                for key_disease, weight in score_parts.iteritems():
                    combined_parts[y][key_disease] = weight

    for y in years:
        #Clean score with hierarchy
        if (score_type == "Charlsons"):
            combined_parts[y] = Adjust4CharlsonHierarchy(combined_parts[y])
        elif (score_type == "RS Charlsons"):
            combined_parts[y] = Adjust4RCSharlsonHierarchy(combined_parts[y])
        elif (score_type == "Elixhausers"):
            combined_parts[y] = Adjust4ElixhausersHierarchy(combined_parts[y])
        else:
            raise Exception("Invalid score_type {0}".format(score_type))

        # Calculate the sum for elixhausers by adding all the parts
        combined_score = sum([i for i in combined_parts[y].values()])
        cr.setvalue("{0}{1}yr".format(main_score_name, y), combined_score)

        for key_disease, score in combined_parts[y].iteritems():
            if (key_disease != main_score_name):
                try:
                    vn = "{0}{1}yr_{2}".format(general_score_name, y, key_disease)
                    cr.setvalue(vn, score)
                except Exception:
                    print "Error when seeting: ", vn

That's all folks! It's a lot of code but the functions are fairly well contained and their names should hopefully be self-explanatory. If you are still confused don't hesitate to post a question and I'll try to clarify.

The python SPSS plugin - a few words from experience

The SPSS python plugin is a brilliant addition to the software, and if used with the spss.Spssdata() function it is actually rather fast (stay away from the Datastep alternative). Although Jon Peck at IBM has been really helpful and quick at answering my questions, I get the feeling that IBM is not full-heartedly supporting his efforts. For instance the plugin isn't installed by default, it can be a little tricky to find on the web-page, and there is no syntax-highlighting/code folding support in SPSS (I use Eclipse for developing the code). My conclusion is therefore that for the future projects I won't use SPSS, just pure Python/R.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.