I am wondering if there are any packages for Python that are capable of performing survival analysis. I have been using the survival package in R but would like to port my work to Python.
AFAIK, there aren't any survival analysis packages in Python. As mbq comments above, the only route available would be to use RPy.
Even if there were a pure Python package available, I would be very careful about using it. In particular, I would look at:
- How often does it get updated?
- Does it have a large user base?
- Does it have advanced techniques?
One of the benefits of R is that these standard packages get a massive amount of testing and user feedback. When dealing with real data, unexpected edge cases can creep in.
Check out the Lifelines$^1$ project for a simple and clean implementation of survival models in Python, including
- Estimators of survival functions
- Estimators of cumulative hazard curves
- Cox's proportional hazard regression model
- Aalen's additive regression model
- multivariate testing
- built on top of Pandas
- built in plotting functions
- simple interface
Documentation and examples are available on the project's documentation site.
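    import numpy as np
    from lifelines import KaplanMeierFitter

    # Observed durations and whether the event was seen (False = censored)
    survival_times = np.array([0., 3., 4.5, 10., 1.])
    events = np.array([False, True, True, False, True])

    kmf = KaplanMeierFitter()
    kmf.fit(survival_times, event_observed=events)

    print(kmf.survival_function_)
    print(kmf.median_survival_time_)  # named `median_` in older releases
    kmf.plot()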
Example plots from the built-in plotting library:
- Disclaimer: I'm the main author. Ping me (email in profile) for questions or feedback about Lifelines.
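To make concrete what a Kaplan-Meier fit like the one in the example above computes, here is a minimal pure-Python sketch. It is not Lifelines' implementation; the function name `kaplan_meier` is made up for illustration, and it only returns the point estimates (no confidence intervals). It can also serve as a quick sanity check against whichever package you end up using.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time, in increasing order.

    durations: time until event or censoring for each subject
    observed:  True if the event was seen, False if the subject was censored
    """
    deaths = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # every subject leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with an event or censoring at t are removed after t
        at_risk -= exits[t]
    return curve


# Same toy data as the Lifelines example
survival_times = [0.0, 3.0, 4.5, 10.0, 1.0]
events = [False, True, True, False, True]
curve = kaplan_meier(survival_times, events)
for t, s in curve:
    print(t, round(s, 4))
```

On the toy data the estimate steps down to 0.75, 0.5 and 0.25 at times 1, 3 and 4.5. The subject censored at time 0 only shrinks the risk set, and the one censored at time 10 never changes the curve.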
python-asurv is an effort to port the asurv software for survival methods in astronomy. Might be worth keeping an eye on, but cgillespie is right about the things to watch out for: it has a long way to go and development doesn't seem active. (AFAICT only one method exists so far, and even once completed, the package may be lacking for, say, biostatisticians.)
You're probably better off using the survival package in R from Python through something like RPy or PypeR. I haven't had any problems doing this myself.
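A rough sketch of what that route can look like, under some assumptions of mine: it uses rpy2 (the maintained successor to RPy, not something named above), it needs a working R installation with the survival package, and the function name `km_via_r` is made up for illustration. `lung` is an example dataset that ships with the survival package.

```python
def km_via_r():
    """Fit a Kaplan-Meier curve with R's survival package, called from Python.

    Returns a list of (time, survival probability) pairs.
    """
    # Imported lazily so this file still loads where rpy2 / R are not installed
    import rpy2.robjects as ro

    ro.r("library(survival)")
    # Evaluate R code and get the fitted survfit object back as an R list
    fit = ro.r("survfit(Surv(time, status) ~ 1, data = lung)")
    # Pull the components of the R object into plain Python lists
    times = list(fit.rx2("time"))
    surv = list(fit.rx2("surv"))
    return list(zip(times, surv))


# Usage (requires R, the survival package, and rpy2):
#     for t, s in km_via_r()[:5]:
#         print(t, s)
```

The model fitting stays entirely in R, so you keep the well-tested code and only move the data handling to Python.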
PyIMSL contains a handful of routines for survival analyses. It is Free As In Beer for noncommercial use, fully supported otherwise. From the documentation in the Statistics User Guide...
- Computes Kaplan-Meier estimates of survival probabilities: kaplanMeierEstimates()
- Analyzes survival and reliability data using Cox’s proportional hazards model: propHazardsGenLin()
- Analyzes survival data using the generalized linear model: survivalGlm()
- Estimates using various parametric models: survivalEstimates()
- Estimates a reliability hazard function using a nonparametric approach: nonparamHazardRate()
- Produces population and cohort life tables: lifeTables()
You can now use R from within IPython, so you might want to look into using IPython with the R extension.
Apart from using RPy or equivalent, there are a number of survival analysis routines in the statsmodels (formerly scipy.stats.models) Python library. They are in the "sandbox" package though, meaning they aren't considered ready for production right now.
For example, the Cox proportional hazards model is implemented there.