
Robust Outlier-Adjusted Mean-Shift Estimation of State Space Models
Source: R/roams_SSM.R
Fits a robust state space model to multivariate time series data using iterative parameter estimation and outlier detection. This procedure is inspired by the Iterative Procedure for Outlier Detection (IPOD) algorithm of She and Owen (2011) and is applied over a sequence of regularization parameters (\(\lambda\)'s), identifying outliers via Mahalanobis residuals and re-fitting the model iteratively.
Usage
roams_SSM(
y,
init_par,
build,
num_lambdas = 20,
custom_lambdas = NA,
cores = 1,
B = 50,
lower = NA,
upper = NA,
tol = 1e-04,
lambda_min = 2,
excessive_outliers_iter_limit = 1,
control = list(parscale = init_par)
)
Arguments
- y
A numeric matrix of observations, with each row corresponding to a time point.
- init_par
A numeric vector of initial parameter values for optimization.
- build
A function that accepts a parameter vector and returns a dlm model (as used in dlm::dlmMLE()). The specify_SSM function can be used to create this build function.
- num_lambdas
Integer. The number of \(\lambda\) values to evaluate. Ignored if custom_lambdas is specified. Default is 20.
- custom_lambdas
Optional numeric vector. If supplied, these are the exact \(\lambda\) values used for model fitting. If not provided or set to NA, then num_lambdas \(\lambda\)'s are automatically chosen.
- cores
Integer. Number of CPU cores to use for parallel processing. Default is 1 (sequential execution).
- B
Integer. Maximum number of iterations per \(\lambda\). Default is 50.
- lower
Optional numeric vector of lower bounds for optimization. If NA, defaults to -Inf for all parameters. Must be of the same length as init_par.
- upper
Optional numeric vector of upper bounds for optimization. If NA, defaults to Inf for all parameters. Must be of the same length as init_par.
- tol
Tolerance level for checking convergence of the ROAMS procedure. Default is 1e-4.
- lambda_min
Minimum \(\lambda\) value to consider when constructing the sequence of \(\lambda\)'s. Ignored if custom_lambdas is specified. Default is 2.
- excessive_outliers_iter_limit
Integer. Maximum number of iterations allowed in which \(\ge 50\%\) of time points are flagged as outliers. That many outliers suggests \(\lambda\) is too low, so capping these iterations lets ROAMS move past such \(\lambda\) values more quickly. Default is 1.
- control
A named list of control options to pass to optim via dlm::dlmMLE(). Default is list(parscale = init_par), which can help the optimizer if parameters are on vastly different scales.
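As an illustration of the build argument, here is a minimal sketch of a hand-written build function for a univariate local level model. The name build_local_level, the choice of model, and the starting values are assumptions made for this example; only dlm::dlmModPoly() and the roams_SSM() arguments shown in Usage are taken as given.

```r
# Hypothetical build function: maps a parameter vector to a dlm model.
# par[1] is the observation variance, par[2] the state (level) variance.
build_local_level <- function(par) {
  dlm::dlmModPoly(order = 1, dV = par[1], dW = par[2])
}

# Illustrative starting values; non-zero so the default
# control = list(parscale = init_par) stays well defined.
init_par <- c(1, 0.5)

# Variances must stay positive, so small positive lower bounds
# of the same length as init_par are a natural choice here.
lower <- c(1e-6, 1e-6)
```

Under these assumptions the function would be passed as roams_SSM(y, init_par = init_par, build = build_local_level, lower = lower), with y a one-column matrix of observations.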
Value
If more than one \(\lambda\) value is used, returns an object of class roams_SSM_list, a list containing a roams_SSM model for each \(\lambda\). If only one \(\lambda\) value is used (i.e. custom_lambdas is manually specified as a single value), returns a single roams_SSM object.
Each roams_SSM object includes:
- lambda
The \(\lambda\) value used.
- prop_outlying
Proportion of non-missing time points identified as outliers.
- BIC
Bayesian Information Criterion of the final model.
- loglik
Log-likelihood of the fitted model.
- RSS
Residual sum of squares.
- gamma
Matrix of estimated outlier adjustments.
- iterations
Number of iterations performed.
- Optimization output from dlm::dlmMLE() from the final iteration.
- y
The original data matrix.
- build
The original build function used to specify the model.
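The returned list can be inspected with ordinary list tools. The sketch below uses a mock list in place of a real roams_SSM_list, with made-up values for the lambda, BIC and prop_outlying fields, and picks the element with the lowest BIC. Choosing by minimum BIC is an assumption of this example, not a rule stated here.

```r
# Mock stand-in for a roams_SSM_list; all numbers are invented for illustration.
fits <- list(
  list(lambda = 2.0, BIC = 412.7, prop_outlying = 0.30),
  list(lambda = 3.5, BIC = 398.2, prop_outlying = 0.08),
  list(lambda = 5.0, BIC = 405.9, prop_outlying = 0.01)
)

# Collect the BIC of each fit and keep the fit with the smallest value.
bic <- vapply(fits, function(fit) fit$BIC, numeric(1))
best <- fits[[which.min(bic)]]

best$lambda
```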
Details
The ROAMS procedure alternates between estimating model parameters via maximum likelihood and identifying outlying observations based on Mahalanobis distance of residuals. For each iteration:
- A dlm model is fit using dlm::dlmMLE().
- Mahalanobis distances of the residuals (Mahalanobis residuals) are computed.
- Observations with Mahalanobis residual (plus a \(\log(|\mathbf{S}_{t|t-1}|)\) adjustment) above the current \(\lambda\) threshold are treated as missing in the next iteration.
The algorithm stops when the change in parameters and outlier estimates is sufficiently small or if too many outliers are detected (more than 50% of complete observations).
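The flagging step and the excessive-outlier check described above can be sketched in base R as follows. This is not the package's implementation: the function name flag_outliers, its inputs, and the use of the squared Mahalanobis distance compared directly against \(\lambda\) are assumptions made for illustration.

```r
# Hypothetical helper, for illustration only.
# resid:  T x p matrix of one-step-ahead prediction errors (NA rows = missing)
# S:      list of T prediction-error covariance matrices (each p x p)
# lambda: threshold for the adjusted Mahalanobis residual
flag_outliers <- function(resid, S, lambda) {
  score <- vapply(seq_len(nrow(resid)), function(i) {
    e <- resid[i, ]
    if (anyNA(e)) return(NA_real_)
    # Squared Mahalanobis distance e' S^{-1} e (the squared form is an assumption)
    maha <- drop(crossprod(e, solve(S[[i]], e)))
    # log|S| adjustment, computed stably via determinant()
    logdet <- as.numeric(determinant(S[[i]], logarithm = TRUE)$modulus)
    maha + logdet
  }, numeric(1))

  observed <- !is.na(score)
  outlier <- observed & score > lambda

  list(
    score = score,
    outlier = outlier,
    # TRUE when more than 50% of the complete observations are flagged
    too_many = mean(outlier[observed]) > 0.5
  )
}

# Toy univariate example with unit prediction variance at every time point.
resid <- matrix(c(0.1, -0.2, 6, 0.3), ncol = 1)
S <- replicate(4, matrix(1), simplify = FALSE)
res <- flag_outliers(resid, S, lambda = 4)
res$outlier
```

In a full iteration, the flagged rows would be set to missing before the model is re-fit, and the loop would end early once too_many is TRUE.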