The whSample Package

whSample helps analysts quickly draw statistical samples from Excel or comma-separated value (CSV) files and write them to a new Excel workbook. Users can choose a simple random sample or a stratified random sample. A third option produces a stratified sample with each stratum in its own worksheet.

See the package vignettes for detailed documentation.


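The workhorse function is sampler. A helper function, ssize, estimates the minimum sample size that meets the given statistical requirements, using a normal approximation to the hypergeometric distribution. The hypergeometric distribution describes the number of yes/no-type outcomes ("successes") in a sample drawn without replacement from a finite population. The requirements are set by the confidence level (ci), the margin of error (me), and the expected success probability (p).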

ssize(N, ci=0.95, me=0.07, p=0.50) (defaults shown) requires only the population size, N. Called on its own, ssize lets you explore sample sizes under other conditions. For example, a probe sample may suggest that a 50-50 success probability is unrealistic; a revised sample size can then be estimated from the observed success probability (p=0.6, for example).
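To make the calculation concrete, here is a sketch of the standard textbook formula for this approximation. It is an assumption that ssize uses exactly this form; its choice of z value and its rounding may differ. Let z be the two-sided standard normal quantile for confidence level ci (about 1.96 for ci=0.95). Setting the margin of error of a sample proportion, computed with the without-replacement (hypergeometric) variance, equal to me and solving for the sample size n gives:

```latex
\[
\mathrm{me} = z\sqrt{\frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}}
\quad\Longrightarrow\quad
n = \frac{N\,n_0}{N - 1 + n_0},
\qquad
n_0 = \frac{z^2\,p(1-p)}{\mathrm{me}^2}
\]
```

With the defaults (z ≈ 1.96, me=0.07, p=0.50), n_0 ≈ 196.0, and N=5000 gives n ≈ 188.6, which rounds up to 189. With p=0.6, n_0 ≈ 188.2 and n ≈ 181.4, which rounds up to 182. These figures only illustrate the formula; ssize's own output may differ slightly.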


The sampler function calls ssize to get its sample-size estimate, so it also accepts the ci, me, and p arguments, which it passes through to ssize. N is taken from the source data.

sampler also takes four additional arguments: backups, irisData, seed, and keepOrg.

The defaults for these arguments are backups=5, irisData=F, seed=NULL, and keepOrg=T. With the default seed=NULL, sampler seeds the random number generator from the current system time in milliseconds, a common seeding approach. The keep-original option (keepOrg) defaults to TRUE, but set keepOrg=F for populations larger than about one million records: Excel's worksheet row limit is 1,048,576, and sampler adds header and blank rows to its output.

To override any of these defaults, pass it as a name=value argument.

sampler uses a series of menus to guide users through the sampling process.


sampler creates a new Excel workbook with three parts:


You can install whSample from CRAN with:


or get the latest development version with:


Other necessary packages

sampler depends on several external packages. If you're running a development version, make sure these packages are installed on your computer:


ssize(5000): N=5000; all other arguments use their defaults.

ssize(5000, p=0.60): N=5000, with a 60% expected rate of occurrence.

sampler(): uses all defaults; N is taken from the source data.

sampler(backups=2, seed=12345): overrides specific defaults.