Extracting comorbidities from a database in SPSS

Using large databases to extract data can be cumbersome; fortunately, it’s more reliable than sifting for gold. The image is CC by Won-Tolla.

For my first article I put a lot of effort into calculating each patient’s comorbidities according to the Charlson and Elixhauser scores. The available scripts were in SAS and Stata; since I had started out using SPSS, I decided to implement the code in the neat Python plugin that SPSS provides. In this post I’ll give you a detailed walkthrough of my code, and hopefully it will save you some time.

Update: I have created a new, easy-to-use package in R that has been thoroughly tested. You can find it on my GitHub page (it is currently called comorbidities.icd10).

The code

The final code is rather simple: it finds the base dataset with the surgery codes, then looks for the datasets with the ICD codes, loops through them, and adds the comorbidities to the base dataset.
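
The original SPSS script isn’t reproduced here, but the looping step can be sketched in plain Python. This is a minimal, hypothetical sketch, not the original code: it assumes each ICD dataset is an iterable of `(patient_id, icd_code)` pairs rather than an SPSS dataset, and `collect_codes` is an illustrative name. It gathers every code per patient across all ICD datasets, keeping only patients that appear in the base dataset:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def collect_codes(base_ids: Iterable[str],
                  icd_datasets: Iterable[Iterable[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Gather every ICD code per patient across all ICD datasets.

    Hypothetical helper: only patients present in the base (surgery)
    dataset are kept; rows for any other patient are skipped.
    """
    wanted = set(base_ids)
    codes: Dict[str, List[str]] = defaultdict(list)
    for dataset in icd_datasets:            # e.g. one dataset per ICD file
        for patient_id, icd_code in dataset:
            if patient_id in wanted:
                codes[patient_id].append(icd_code)
    # Patients without any ICD rows still get an (empty) entry
    return {pid: codes.get(pid, []) for pid in wanted}
```

Filtering on the base IDs early keeps memory use proportional to the surgery cohort rather than to the whole ICD register.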

The main script is of course just the skeleton; the true magic is in the myriad of functions. Some of them help me select the open dataset of interest and the files I want to open and extract data from. Note that the wx library needs to be installed and loaded for these.
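
A file dialog such as wx’s `FileDialog` needs a running GUI, so the sketch below only covers the non-interactive half of the idea: a hypothetical `find_sav_files` helper that lists the SPSS `.sav` files in a folder whose names contain a given pattern. The folder layout and the naming pattern are assumptions for illustration, not how the original code picks its files.

```python
import os
from typing import List


def find_sav_files(folder: str, name_contains: str = "") -> List[str]:
    """Return sorted full paths of .sav files in `folder` whose names
    contain `name_contains` (both checks are case-insensitive)."""
    pattern = name_contains.lower()
    matches = []
    for entry in os.listdir(folder):
        path = os.path.join(folder, entry)
        # Only plain files with the SPSS data-file extension
        if (os.path.isfile(path)
                and entry.lower().endswith(".sav")
                and pattern in entry.lower()):
            matches.append(path)
    return sorted(matches)
```

For example, `find_sav_files("data", "icd")` would pick up `icd_2005.sav` and `icd_2006.SAV` but skip `surgery.sav` and `notes.txt` in that folder.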

The heart of the action is, of course, the data extraction; some of the core functions can be found on the comorbidity page.
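
At its core, the extraction matches each ICD-10 code against the code prefixes that define a comorbidity. The sketch below assumes codes are stored as strings like `I21.4` (the dot is stripped before matching) and uses a deliberately abbreviated, illustrative subset of Charlson categories. The category names and prefix lists are not a complete or validated definition, so check them against the published coding algorithms before any real use.

```python
from typing import Dict, Iterable, Set, Tuple


def _range(letter: str, start: int, stop: int) -> Tuple[str, ...]:
    """Expand e.g. ('I', 60, 69) into ('I60', 'I61', ..., 'I69')."""
    return tuple(f"{letter}{n:02d}" for n in range(start, stop + 1))


# Abbreviated, illustrative ICD-10 prefixes for a few Charlson categories
# (NOT the full definition; names are hypothetical)
CHARLSON_ICD10: Dict[str, Tuple[str, ...]] = {
    "mi": ("I21", "I22", "I252"),
    "chf": ("I43", "I50"),
    "pvd": ("I70", "I71"),
    "cevd": ("G45", "G46") + _range("I", 60, 69),
    "dementia": _range("F", 0, 3) + ("G30",),
    "copd": _range("J", 40, 47) + _range("J", 60, 67),
    "metastatic": _range("C", 77, 80),
}


def extract_comorbidities(codes: Iterable[str]) -> Set[str]:
    """Return the set of categories whose prefixes match any of the codes."""
    found = set()
    for raw in codes:
        code = raw.replace(".", "").strip().upper()  # "i21.4" -> "I214"
        for name, prefixes in CHARLSON_ICD10.items():
            if code.startswith(prefixes):  # str.startswith accepts a tuple
                found.add(name)
    return found
```

Prefix matching is why `I25.2` counts as a previous myocardial infarction here while `I25.1` does not.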

Then, to save the data into the base dataset, I have a dedicated function.
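
Inside SPSS the write-back goes through the plugin’s data cursor; as a stand-in, the sketch below represents the base dataset as a list of dicts (one per surgery) and adds one 0/1 variable per comorbidity. `save_flags`, `patient_id` and the other field names are hypothetical. Because it loops over the base rows rather than over patients, a patient with several surgeries gets the same flags on every row.

```python
from typing import Dict, Iterable, List, Set


def save_flags(base_rows: List[dict], flags: Dict[str, Set[str]],
               categories: Iterable[str], id_field: str = "patient_id") -> List[dict]:
    """Add one 0/1 variable per comorbidity category to every base row.

    Rows whose patient has no entry in `flags` get 0 for every category.
    The rows are modified in place and also returned for convenience.
    """
    names = list(categories)
    for row in base_rows:
        present = flags.get(row[id_field], set())
        for name in names:
            row[name] = int(name in present)
    return base_rows
```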

I also have a few more helper functions for assigning the ICD scores to the given variables.
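
Once the comorbidity variables are set, the Charlson index is a weighted sum of them. The sketch uses the original Charlson weights (1, 2, 3 or 6 points per condition). The variable names are hypothetical, and the rule that a severe form (e.g. metastatic tumour) replaces its milder form (any malignancy) is a commonly used convention that I’m assuming here, not necessarily what the original SPSS code does.

```python
from typing import Dict, List

# Original Charlson weights; the category names are hypothetical
CHARLSON_WEIGHTS: Dict[str, int] = {
    "mi": 1, "chf": 1, "pvd": 1, "cevd": 1, "dementia": 1, "copd": 1,
    "rheumatic": 1, "pud": 1, "mild_liver": 1, "diabetes": 1,
    "diabetes_comp": 2, "hemiplegia": 2, "renal": 2, "cancer": 2,
    "severe_liver": 3, "metastatic": 6, "aids": 6,
}

# Assumed hierarchy: if the severe form is present, the milder one is not counted
HIERARCHY = {"severe_liver": "mild_liver", "diabetes_comp": "diabetes",
             "metastatic": "cancer"}


def charlson_score(flags: Dict[str, int]) -> int:
    """Weighted sum of truthy flags; keys without a weight count as 0."""
    present = {name for name, value in flags.items() if value}
    for severe, mild in HIERARCHY.items():
        if severe in present:
            present.discard(mild)
    return sum(CHARLSON_WEIGHTS.get(name, 0) for name in present)


def set_charlson(rows: List[dict], var_name: str = "charlson") -> List[dict]:
    """Store the score in a new variable on each row (in place)."""
    for row in rows:
        row[var_name] = charlson_score(row)
    return rows
```

For example, a patient flagged with myocardial infarction, complicated diabetes and metastatic cancer scores 1 + 2 + 6 = 9 under this convention.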

That’s all folks! It’s a lot of code, but the functions are fairly well encapsulated and their names should hopefully be self-explanatory. If you are still confused, don’t hesitate to post a question and I’ll try to clarify.

The Python SPSS plugin – a few words from experience

The SPSS Python plugin is a brilliant addition to the software, and if used with spss.Spssdata() it is actually rather fast (stay away from the Datastep alternative). Although Jon Peck at IBM has been really helpful and quick to answer my questions, I get the feeling that IBM is not wholeheartedly supporting his efforts. For instance, the plugin isn’t installed by default, it can be a little tricky to find on the web page, and SPSS offers no syntax highlighting or code folding for it (I use Eclipse to develop the code). My conclusion is therefore that for future projects I won’t use SPSS, just pure Python or R.


This entry was posted in Research.
