Increasingly sophisticated MCMC and Monte Carlo algorithms raise the scope for errors, either in derivations of mathematical quantities needed for sampling, or implementation errors. Testing for such errors should be an *integral and routine* part of the workflow of any Bayesian analysis.

**mcunit** makes it easy to do this, by allowing for unit testing MCMC and other Monte Carlo implementations within the framework of the testthat package (Wickham 2011). The methods correspond to those proposed in Gandy and Scott (2020). They are based on statistical hypothesis testing, and are designed to achieve an arbitrarily low false rejection probability, so that the user can be confident that a correctly implemented algorithm will almost certainly not be rejected. By default, this false rejection rate is set to \(10^{-5}\).

MCMC samplers can be testes using `expect_mcmc`

or `expect_mcmc_reversible`

. These functions use different statistical hypothesis tests described in Section 2.1 and 2.2 of (Gandy and Scott 2020) respectively. `expect_mcmc_reversible`

requires that the sampler to be tested is (or at least, is expected to be) reversible.

This vignette demonstrates how to use these tests, using the following simple example. Consider the model \[y \sim \theta_1 + \theta_2 + \epsilon,\] where \(\theta := (\theta_1,\theta_2)\) is apriori independent, zero mean normal with standard deviation \(\sigma=10\). The white noise term \(\epsilon\) is independent from \(\theta\) and also zero mean normal but with variance \(\sigma_{\epsilon}^2\) = 0.1. While inference is easy here, we consider drawing samples from the posterior \(\pi(\theta \mid y)\) using a Gibbs sampler. The posterior conditional distributions for \(\theta_1\) and \(\theta_2\) are normal with expectations \[\mathbb{E}(\theta_i | y, \theta_j) = \frac{\sigma^2}{\sigma^2_{\epsilon} + \sigma^2}(y - \theta_j),\] and variances \[\mathbb{V}(\theta_i | y, \theta_j) = \frac{1}{\frac{1}{\sigma^2_{\epsilon}} + \frac{1}{\sigma^2}}.\]

Start with a correctly implemented random scan Gibbs sampler. First load the package and set the seed.

`require(mcunit)`

`## Loading required package: mcunit`

`set.seed(10)`

The following function updates one element of \(\theta\) given the other and given the observed data \(y\).

```
gibbsUpdate <- function(y, theta_j) {
# Samples theta_i given y and theta_j
mean <- 100 * (y - theta_j) / (100 + 0.1)
var <- 1. / (1. / 100 + 1. / 0.1)
rnorm(1, mean=mean, sd=sqrt(var))
}
```

The argument `y`

is the observed data, and `theta_j`

is the component to condition on. From this, we define a correctly implemented random scan sampler.

```
randomScan <- function(theta, y, thinning) {
# Random Scan Gibbs
for(i in 1:thinning) {
# select index to update
i <- sample.int(2,1)
theta[i] <- gibbsUpdate(y, theta[i %% 2 + 1])
}
theta
}
```

This function takes the current state of the chain `theta`

, the data `y`

and a thinning parameter which determines the number of individual updates to make before returning the new state.

Our task is to test the correctness of `randomScan`

. We will do this using both `expect_mcmc`

and `expect_mcmc_reversible`

. This latter method can be used because the sampler should, if correct, be reversible.

Both methods require a list `object`

which describes the MCMC sampler to be tested. The list must contain the following elements:

`object$genprior`

: A function with no arguments that generates a sample from the prior distribution. No default value.`object$gendata`

: A function that takes as input the parameter value (of the type generated by genprior) and returns the observed data as an arbitrary R object. No default value.`object$stepMCMC`

: A function that takes three arguments:`theta`

: the current position of the chain (of the same type as produced by the prior),`dat`

: the observed data (of the same type as produced by gendat)`thinning`

: the number of steps the chain should take. 1 corresponds to one step.`object$test`

: Function that takes either one or two arguments, and returns a vector with components which will be used for checking the MCMC sampler. The first argument is interpreted as a parameter value, and if a second argument exists, it is interpreted as a data value. An example is the identity function: \(f(\theta) = \theta\). Alternatively, if you have access to the model’s likelihood function, you could use \(p(y \mid \theta)\).

Please see the documentation for `expect_mcmc`

and `expect_mcmc_reversible`

for further details on this.

We begin constructing this list by defining the prior sampler and the data sampler.

```
obj <- list()
obj$genprior <- function() rnorm(n=2, mean=0, sd=10)
obj$gendata <- function(theta) sum(theta) + rnorm(n=1, mean=0, sd=sqrt(0.1))
```

As test functions, we use the components \(\theta_1\), \(theta_2\), the prior density \(\pi(\theta)\) and likelihood function \(p(y \mid \theta)\). The density and likelihood appear to be a good default choice because they are one-dimensional regardless of the dimension of the parameter vector and data.

```
priorDensity <- function(theta) prod(dnorm(theta, mean=0, sd=10))
likelihood <- function(theta, y) dnorm(y, mean=sum(theta), sd=sqrt(0.1))
testVec <- function(theta, y) c(theta[1], theta[2], priorDensity(theta), likelihood(theta, y))
obj$test <- testVec
```

Finally, we are left to define a single MCMC step.

```
obj$stepMCMC <- function(theta, dat, thinning) randomScan(theta, dat, thinning)
print(obj)
```

```
## $genprior
## function ()
## rnorm(n = 2, mean = 0, sd = 10)
##
## $gendata
## function (theta)
## sum(theta) + rnorm(n = 1, mean = 0, sd = sqrt(0.1))
##
## $test
## function (theta, y)
## c(theta[1], theta[2], priorDensity(theta), likelihood(theta,
## y))
##
## $stepMCMC
## function (theta, dat, thinning)
## randomScan(theta, dat, thinning)
```

First consider `expect_mcmc`

. We use \(100\) thinning steps between samples to give the test some power to detect errors.

`expect_mcmc(obj, thinning = 100)`

No errors were detected, because there are none. Recall that the false rejection rate is set to \(10^{-5}\) by default, and so repeated application of this test should very rarely flag `randomScan.`

This sampler is also reversible, and so we try `expect_mcmc_reversible`

.

`expect_mcmc_reversible(obj, thinning = 10, nsteps = 10)`

Again, no error is detected (as expected).

Consider systematic scan of the vector \(\theta\) rather than random scan.

```
systematicScan <- function(theta, y, thinning) {
# Systematic Scan Gibbs
for(i in 1:thinning) {
theta[1] <- gibbsUpdate(y, theta[2])
theta[2] <- gibbsUpdate(y, theta[1])
}
theta
}
```

This sampler is correct and `expect_mcmc`

is unlikely to falsely detect errors.

```
obj$stepMCMC <- function(theta, dat, thinning) systematicScan(theta, dat, thinning)
expect_mcmc(obj, thinning = 100)
```

Good. What happens using `expect_mcmc_reversible`

with thinning of \(1\)?

`expect_mcmc_reversible(obj, thinning = 1, nsteps = 10)`

`## Error: Test failed for components with p-values=NULL in iteration 2`

The test failed. This illustrates why `expect_mcmc_reversible`

should not be used for a non-reversible sampler - we have no gaurantee over the false rejection rate. Even though this sampler is correctly implemented, we are erroneously detecting an error.

Now we make a genuine mistake in the sampler, and investigate whether the methods detect it. We replace the variance terms \(\sigma^2\) and \(\sigma^2_{\epsilon}\) in \(\mathbb{V}(\theta_i | y, \theta_j)\) with their corresponding standard deviations (i.e. \(\sigma\) and \(\sigma_{\epsilon}\)).

```
gibbsUpdate <- function(y, theta_j) {
# Samples theta_i given y and theta_j
mean <- 100 * (y - theta_j) / (100 + 0.1)
var <- 1. / (1. / 10 + 1. / sqrt(0.1))
rnorm(1, mean=mean, sd=sqrt(var))
}
```

Also make sure to switch back to random scan…

`obj$stepMCMC <- function(theta, dat, thinning) randomScan(theta, dat, thinning)`

Rerunning the tests…

`expect_mcmc(obj, thinning = 100)`

`## Error: Test failed for components with p-values=NULL in iteration 1`

`expect_mcmc_reversible(obj, thinning = 10, nsteps = 10)`

`## Error: Test failed for components with p-values=NULL in iteration 1`

Great! Both tests detect a problem. Even better, you can be pretty much sure there is a problem because the chance of a false rejection is upper bounded by \(10^{-5}\)!

Gandy, A., and Scott, J. (2020), “Unit testing for MCMC and other Monte Carlo methods.”

Wickham, H. (2011), “Testthat: Get started with testing,” *The R Journal*, 3, 5–10.