- The R-squared performance metric for regression/continuous outcomes was previously calculated using the `defaultSummary()` function from `caret`, which uses the square of the Pearson correlation coefficient (r-squared) instead of the correct coefficient of determination, calculated as `1 - rss/tss`, where `rss` = residual sum of squares and `tss` = total sum of squares. The correct formula for R-squared is now applied.
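As a numeric illustration of the difference (a plain-Python sketch, not package code): predictions that are perfectly correlated with the outcome but systematically offset give a squared Pearson correlation of 1, while `1 - rss/tss` correctly penalises the bias.

```python
import math

# Observed values and systematically biased predictions
y    = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [2.0, 3.0, 4.0, 5.0, 6.0]  # perfectly correlated, but offset by +1

n = len(y)
my, mp = sum(y) / n, sum(pred) / n

# Squared Pearson correlation (the previous behaviour)
cov = sum((a - my) * (b - mp) for a, b in zip(y, pred))
sdy = math.sqrt(sum((a - my) ** 2 for a in y))
sdp = math.sqrt(sum((b - mp) ** 2 for b in pred))
r_squared = (cov / (sdy * sdp)) ** 2

# Coefficient of determination: 1 - rss/tss (the corrected behaviour)
rss = sum((a - b) ** 2 for a, b in zip(y, pred))  # residual sum of squares
tss = sum((a - my) ** 2 for a in y)               # total sum of squares
r2_determination = 1 - rss / tss

print(r_squared)         # 1.0 -- blind to the constant offset
print(r2_determination)  # 0.5 -- penalises the biased predictions
```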

- Prevent bug if `x` is a single predictor.

- Updated documentation for custom filter functions.

- Added `prc()` which enables easy building of precision-recall curves from `nestedcv` models and `repeatcv()` results.
- Added `predict()` method for `cva.glmnet`.
- Removed magrittr as an imported package. The standard R pipe `|>` can be used instead.
- Added `metrics()` which gives additional performance metrics for binary classification models, such as F1 score, Matthews correlation coefficient and precision-recall AUC.
- Added `pls_filter()` which uses partial least squares regression to filter features.
- Enabled parallelisation over repeats in `repeatcv()`, leading to a significant improvement in speed.

- Fixed issue with xgboost on Linux/Windows with parallel processing in `nestcv.train()`. If argument `cv.cores` > 1, OpenMP multithreading is now disabled, which prevents the caret models `xgbTree` and `xgbLinear` from crashing and allows them to be parallelised efficiently over the outer CV loops.
- Improvements to `var_stability()` and its plots.
- Fixed major bug in multivariate Gaussian and Cox models in `nestcv.glmnet()`.
- Added new feature `repeatcv()` to apply repeated nested CV to the main `nestedcv` model functions for robust measurement of model performance.

- Added new feature via the `modifyX` argument to all `nestedcv` models. This allows more powerful manipulation of the predictors, such as scaling, imputing missing values, and adding extra columns through variable manipulations. Importantly, these are applied to train and test input data separately.
- Added `predict()` function for `nestcv.SuperLearner()`.
- Added `pred_SuperLearner` wrapper for use with `fastshap::explain`.
- Fixed parallelisation of `nestcv.SuperLearner()` on Windows.

- Added support for multivariate Gaussian and Cox models in `nestcv.glmnet()`.

- Added argument `verbose` in `nestcv.train()`, `nestcv.glmnet()` and `outercv()` to show progress.
- Added argument `multicore_fork` in `nestcv.train()` and `outercv()` to allow a choice of parallelisation between forked multicore processing using `mclapply` or non-forked using `parLapply`. This can help prevent errors with certain multithreaded caret models, e.g. `model = "xgbTree"`.
- In `one_hot()` changed the `all_levels` argument default to `FALSE` to be compatible with regression models by default.
- Added coefficient column to the `lm_filter()` full results table.

- Fixed significant bug in `lm_filter()` where variables with zero variance were incorrectly reporting very low p-values in linear models instead of returning `NA`. This is due to how rank-deficient models are handled by `RcppEigen::fastLmPure`. The default method for `fastLmPure` has been changed to `0` to allow detection of rank-deficient models.
- Fixed bug in `weight()` caused by `NA`. `weight()` now tolerates character vectors.

- Better handling of dataframes in filters. A `keep_factors` option has been added to filters to control filtering of factors with 3 or more levels.
- Added `one_hot()` for fast one-hot encoding of factors and character columns by creating dummy variables.
- Added `stat_filter()` which applies univariate filtering to dataframes with mixed datatypes (continuous & categorical combined).
- Changed the one-way ANOVA test in `anova_filter()` from `Rfast::ftests()` to `matrixTests::col_oneway_welch()` for much better accuracy.
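As a generic illustration of what one-hot encoding does (a plain-Python sketch, not the package's implementation; the `all_levels` flag merely mirrors the idea of the package argument of the same name), each level of a categorical column becomes its own 0/1 dummy variable:

```python
def one_hot(values, all_levels=False):
    """One-hot encode a list of categorical values into 0/1 dummy columns.

    With all_levels=False, the first level is dropped as the reference
    category, which avoids collinearity in regression models.
    """
    levels = sorted(set(values))
    if not all_levels:
        levels = levels[1:]  # drop first level as the reference category
    return {lev: [1 if v == lev else 0 for v in values] for lev in levels}

cols = one_hot(["a", "b", "c", "b"])
print(cols)  # {'b': [0, 1, 0, 1], 'c': [0, 0, 1, 0]}
```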

- Fixed bug caused by use of weights with `nestcv.train()` (Matt Siggins suggestion).

- Added `n_inner_folds` argument to `nestcv.train()` to make it easier to set the number of inner CV folds, and `inner_folds` argument which enables setting the inner CV fold indices directly (suggestion Aline Wildberger).

- Fixed error in `plot_shap_beeswarm()` caused by change in fastshap 0.1.0 output from tibble to matrix.
- Fixed bug with categorical features and `nestcv.train()`.
- Added argument `pass_outer_folds` to both `nestcv.glmnet()` and `nestcv.train()`: this enables passing of outer CV fold indices stored in `outer_folds` to the final round of CV. Note this can only work if `n_outer_folds` = number of inner CV folds and balancing is not applied, so that `y` is a consistent length.

- Fix: ensure `nfolds` for final CV equals `n_inner_folds` in `nestcv.glmnet()`.
- Improve `plot_var_stability()` to be more user friendly.
- Add `top` argument to SHAP plots.

- Modified examples and vignette in anticipation of new version of fastshap 0.1.0

- Add vignette for variable stability and SHAP value analysis
- Refine variable stability and shap plots

- Switch some packages from Imports to Suggests to make basic installation simpler.
- Provide helper prediction wrapper functions to make it easier to use package `fastshap` for calculating SHAP values.
- Add `force_vars` argument to `glmnet_filter()`.
- Add `ranger_filter()`.
- Disable printing in `nestcv.train()` from models such as `gbm`. This fixes a multicore bug when using the standard R GUI on Mac/Linux.
- Bugfix if a `nestcv.glmnet()` model has 0 or 1 coefficients.
- Add multiclass AUC for multinomial classification.

- `nestedcv` models now return `xsub` containing a subset of the predictor matrix `x` with filtered variables across outer folds and the final fit.
- `boxplot_model()` no longer needs the predictor matrix to be specified as it is contained in `xsub` in `nestedcv` models.
- `boxplot_model()` now works for all `nestedcv` model types.
- Add function `var_stability()` to assess variance and stability of variable importance across outer folds, and directionality for binary outcome.
- Add function `plot_var_stability()` to plot variable stability across outer folds.
- Add `finalCV = NA` option which skips fitting the final model completely. This gives a useful speed boost if performance metrics are all that is needed.
- `model` argument in `outercv` now prefers a character value instead of a function for the model to be fitted.
- Bugfixes.

- Add check that model exists in `outercv`.
- Perform final model fit first in `nestcv.train`, which improves error detection in caret, so `nestcv.train` can be run in multicore mode straightaway.
- Remove predictors with variance = 0.
- Fix bug caused by filter p-values = `NA`.

- Add confusion matrix to results summaries for classification
- Fix bugs in extraction of inner CV predictions for `nestcv.glmnet`.
- Fix multinomial `nestcv.glmnet`.
- Add `outer_train_predict` argument to enable saving of predictions on outer training folds.
- Add function `train_preds` to obtain outer training fold predictions.
- Add function `train_summary` to show performance metrics on outer training folds.

- Add examples of imbalanced datasets.
- Fix rowname bug in `smote()`.
- Add support for nested CV on ensemble models from the `SuperLearner` package.
- Final CV on whole data is now the default in `nestcv.train` and `nestcv.glmnet`.
- Fix Windows parallelisation bugs.

- Fix bug in `nestcv.train` for caret models with tuning parameters which are factors.
- Fix bug in `nestcv.train` for caret models using regression.
- Add option in `nestcv.train` and `nestcv.glmnet` to tune final model parameters using a final round of CV on the whole dataset.
- Fix bugs in LOOCV.
- Add balancing to final model fitting.
- Add case weights to `nestcv.train` and `outercv`.
- Add `randomsample()` to handle class imbalance using random over/undersampling.
- Add `smote()` for SMOTE algorithm for increasing minority class data.
- Add bootstrap wrapper to filters, e.g. `boot_ttest()`.
- Final lambda in `nestcv.glmnet()` is the mean of best lambdas on the log scale.
- Added `plot_varImp` for plotting variable importance for `nestcv.glmnet` final models.
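The idea behind random oversampling for class imbalance can be sketched generically (plain Python, not the package's `randomsample()` implementation): the minority class is resampled with replacement until it matches the majority class size.

```python
import random

def random_oversample(x_rows, y, seed=0):
    """Balance classes by resampling minority classes with replacement."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(x_rows, y):
        by_class.setdefault(label, []).append(row)
    n_max = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for label, rows in by_class.items():
        # Top up each class to the majority class size by sampling with replacement
        resampled = rows + [rng.choice(rows) for _ in range(n_max - len(rows))]
        out_x.extend(resampled)
        out_y.extend([label] * n_max)
    return out_x, out_y

x = [[1], [2], [3], [4], [5]]
y = ["case", "control", "control", "control", "control"]
bx, by = random_oversample(x, y)
print(by.count("case"), by.count("control"))  # 4 4
```

Undersampling is the mirror image: majority-class rows are sampled down to the minority class size instead.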

- Corrected handling of multinomial models in `nestcv.glmnet()`.
- Align lambda in `cva.glmnet()`.
- Improve plotting of error bars in `plot.cva.glmnet`.
- Bugfix: plot of single `alphaSet` in `plot.cva.glmnet`.

- Updated documentation and vignette

- Parallelisation on Windows added.
- hsstan model has been added (Athina Spiliopoulou).
- `outer_folds` can be specified for consistent model comparisons.
- Checks on `x`, `y` added.
- `NA` handling.
- `summary` and `print` methods.
- Implemented LOOCV.
- Collinearity filter.
- Implement `lm` and `glm` as models in `outercv()`.
- Runnable examples have been added throughout.

- Major update to include `nestcv.train()` function, which adds nested CV to the `train` function of `caret`.
- Note passing of extra arguments to filter functions specified by `filterFUN` is no longer done through `...` but with a list of arguments passed through a new argument, `filter_options`.

- Initial build of nestedcv.
- Added `outercv.rf` function for measuring performance of rf.
- Added `cv.rf` for tuning `mtry` parameter.
- Added `plot_caret` for plotting caret objects with error bars on the tuning metric.