km.support

Introduction

The km.support function is part of the conf package. It calculates the support values for Kaplan and Meier's product-limit estimator [1]. The Kaplan-Meier product-limit estimator (KMPLE) is used to estimate the survivor function for a data set of positive values in the presence of right censoring, and the support values are all possible values that the KMPLE can take for a specific sample size.

The km.support function finds the support values of the KMPLE for a particular sample size $$n$$ (the number of items on test) using an induction algorithm [2]. The support values are returned as a list with two components: numerators and denominators. Returning the two components separately allows the user to generate exact fractions.

Installation Instructions

The km.support function is accessible following installation of the conf package:

install.packages("conf")
library(conf)

Details

The KMPLE is a nonparametric estimate of the survival function from a data set of lifetimes that includes right-censored observations and is used in a variety of application areas. For simplicity, we will refer to the object of interest generically as the item and the event of interest as the failure.

Let $$n$$ denote the number of items on test. The KMPLE of the survival function $$S(t)$$ is given by $\hat{S}(t) = \prod\limits_{i:t_i \leq t}\left( 1 - \frac{d_i}{n_i}\right),$ for $$t \ge 0$$, where $$t_1, \, t_2, \, \ldots, \, t_k$$ are the times when at least one failure is observed ($$k$$, the number of distinct failure times in the data set, is an integer between 1 and $$n$$), $$d_1, \, d_2, \, \ldots, \, d_k$$ are the numbers of failures observed at times $$t_1, \, t_2, \, \ldots, \, t_k$$, and $$n_1, \, n_2, \, \ldots, \, n_k$$ are the numbers of items at risk just prior to times $$t_1, \, t_2, \, \ldots, \, t_k$$. It is common practice to have the KMPLE "cut off" after the largest time recorded if it corresponds to a right-censored observation [3]. The KMPLE drops to zero after the largest time recorded if it is a failure; the KMPLE is undefined, however, after the largest time recorded if it is a right-censored observation.

The support values in km.support are calculated from $$\hat{S}(t)$$ at any $$t \ge 0$$ for all possible outcomes of an experiment with $$n$$ items on test. The function takes only the sample size $$n$$ as its argument.

Example

To illustrate a simple case, consider the KMPLE for one particular experiment when there are $$n = 4$$ items on test, failures occur at times $$t = 1$$ and $$t = 3$$, and right censorings occur at times $$t = 2$$ and $$t = 4$$. In this setting, the KMPLE is

$\begin{equation*} \hat{S}(t) = \begin{cases} 1 & \qquad 0 \le t < 1 \\ \left(1 - \frac{1}{4}\right) = \frac{3}{4} & \qquad 1 \leq t < 3 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{2}\right) = \frac{3}{8} & \qquad 3 \leq t < 4 \\ \text{NA} & \qquad t \geq 4, \end{cases} \end{equation*}$

where NA indicates that the KMPLE is undefined.
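
The piecewise values above can be checked with a few lines of base R. The following is only a minimal sketch: km_at is a hypothetical helper written for this illustration (it is not part of the conf package), and it assumes that a failure and a right censoring never share the same time.

```r
# Hypothetical helper (not part of conf): evaluate the KMPLE at a single time t.
#   time:   observed times (failures and right censorings)
#   status: 1 if the observation is a failure, 0 if it is right censored
km_at <- function(t, time, status) {
  # undefined at and beyond the largest time if that time is a right censoring
  last <- max(time)
  if (t >= last && all(status[time == last] == 0)) return(NA_real_)
  # distinct failure times t_i that are <= t
  fail_times <- sort(unique(time[status == 1 & time <= t]))
  s <- 1
  for (ti in fail_times) {
    d_i <- sum(time == ti & status == 1)  # failures observed at t_i
    n_i <- sum(time >= ti)                # items at risk just prior to t_i
    s <- s * (1 - d_i / n_i)
  }
  s
}

# n = 4 items: failures at t = 1 and t = 3, right censorings at t = 2 and t = 4
time   <- c(1, 2, 3, 4)
status <- c(1, 0, 1, 0)
km_values <- sapply(c(0.5, 1, 3, 4), km_at, time = time, status = status)
km_values  # 1, 0.75, 0.375, and NA, matching the four cases above
```

The helper works in floating point, so it only confirms the decimal values; the exact fractions come from the numerators and denominators returned by km.support.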

The KMPLE in this experiment takes 3 of the 8 support values produced by km.support (1, 3/4, and 3/8), as can be seen in the output below, along with NA. NA values are not displayed in the output.

library(conf)
#  display unsorted numerators and denominators of support values for n = 4
n <- 4
s <- km.support(n)
s
#> $num
#> [1] 0 1 1 2 1 3 3 1
#>
#> $den
#> [1] 1 1 2 3 3 4 8 4
#  display sorted support values for n = 4 as decimals
sort(s$num / s$den)
#> [1] 0.0000000 0.2500000 0.3333333 0.3750000 0.5000000 0.6666667 0.7500000
#> [8] 1.0000000
#  display sorted support values for n = 4 as exact fractions
i <- order(s$num / s$den)
m <- length(s$num)
f <- ""
for (j in i[2:(m - 1)]) f <- paste(f, s$num[j], "/", s$den[j], ", ", sep = "")
cat(paste("The ", m, " support values for n = ", n, " are: 0, ", f, "1.\n", sep = ""))
#> The 8 support values for n = 4 are: 0, 1/4, 1/3, 3/8, 1/2, 2/3, 3/4, 1.

Consider another KMPLE for a different outcome of the same experiment with $$n = 4$$ items on test. This time we observe 4 failures at times $$t = 1, 2, 3, 4$$. In this setting, the KMPLE is

$\begin{equation*} \hat{S}(t) = \begin{cases} 1 & \qquad 0 \le t < 1 \\ \left(1 - \frac{1}{4}\right) = \frac{3}{4} & \qquad 1 \leq t < 2 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{3}\right) = \frac{1}{2} & \qquad 2 \leq t < 3 \\ \left(1 - \frac{1}{4}\right) \left(1 - \frac{1}{3}\right) \left(1 - \frac{1}{2}\right) = \frac{1}{4} & \qquad 3 \leq t < 4 \\ 0 & \qquad t \geq 4. \end{cases} \end{equation*}$

The KMPLE in this experiment takes 5 of the 8 support values produced by km.support (1, 3/4, 1/2, 1/4, and 0, of which 1/2, 1/4, and 0 are new), as can be seen in the output above. Examining all possible outcomes of the experiment yields the remaining support values, 1/3 and 2/3.
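
For a small sample size, looking at all possible outcomes can be done by brute force. The sketch below is an illustration only and is not the induction algorithm used by km.support: it assumes the $$n$$ observations occur at distinct times, enumerates every failure/right-censoring pattern, and collects each value the KMPLE takes as a fraction in lowest terms. The names gcd, patterns, and support are introduced here for the example.

```r
n <- 4

# greatest common divisor, used to keep each fraction in lowest terms
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# every pattern of failures (1) and right censorings (0) for the n
# time-ordered observations: 2^n rows, one column per observation
patterns <- as.matrix(expand.grid(rep(list(0:1), n)))

num <- 1  # the KMPLE equals 1 before the first failure
den <- 1
for (r in seq_len(nrow(patterns))) {
  a <- 1  # running numerator of the KMPLE
  b <- 1  # running denominator of the KMPLE
  for (i in seq_len(n)) {
    if (patterns[r, i] == 1) {
      at_risk <- n - i + 1        # items at risk just prior to the i-th time
      a <- a * (at_risk - 1)      # multiply by (1 - 1 / at_risk)
      b <- b * at_risk
      g <- gcd(a, b)
      a <- a / g
      b <- b / g
      num <- c(num, a)
      den <- c(den, b)
    }
  }
}

# keep the distinct fractions and sort them by value
support <- unique(data.frame(num, den))
support <- support[order(support$num / support$den), ]
paste(support$num, support$den, sep = "/")
#> [1] "0/1" "1/4" "1/3" "3/8" "1/2" "2/3" "3/4" "1/1"
```

For $$n = 4$$ this enumeration recovers the same 8 support values shown in the km.support output above. The number of patterns doubles with each additional item, so this approach is only practical for small $$n$$.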

Package Notes

The functions km.pmf and km.surv, which are also part of the conf package, call km.support.

1. Kaplan, E. L., and Meier, P. (1958), “Nonparametric Estimation from Incomplete Observations,” Journal of the American Statistical Association, 53, 457–481.

2. Qin, Y., Sasinowska, H. D., and Leemis, L. M. (2023), “The Probability Mass Function of the Kaplan–Meier Product–Limit Estimator,” The American Statistician, 77 (1), 102–110.

3. Kalbfleisch, J. D., and Prentice, R. L. (2002), The Statistical Analysis of Failure Time Data (2nd ed.), Hoboken, NJ: Wiley.