The Azimuth Project
Method of common random numbers in Monte Carlo methods (changes)

Showing changes from revision #5 to #6: Added | Removed | Changed

Method of Common Random Numbers in Monte Carlo Methods



The Method of Common Random Numbers in Monte Carlo methods is really a set of reasoning about when using the same sequence of random numbers in Monte Carlo estimations of different systems – which includes the same “general system” but with different choices of system parameters – is beneficial. It is an one of various variance reduction techniques.


Suppose that we’re interested in estimating some systems ff and gg, which can be estimated by evaluating expressions which use random numbers, e.g., a simple example is function integration. One is often interested in fgf-g rather than the ff and gg themselves, where we write f({x})f({y})f(\{x\})-f(\{y\}) to denote the difference between ff and gg estimated using sets of random number “runs” {x}\{x\} and {y}\{y\}. Consider the simple case where a Monte Carlo “run” consists of just one random number (eg, 1D function integral estimation). Then using either differently generated or common random numbers result in unbiased estimators? of the expectation (i.e., mean) of fgf-g (by their separate unbiasedness and the linearity of expectation). Now we can look at the variance of the esimator. We have

var {x},{y}(fg)=var({f(x)})+var({g(y)})2cov({f(x)},{g(x)}) var_{\{x\},\{y\}}(f-g)=var(\{f(x)\})+var(\{g(y)\})-2 cov(\{f(x)\},\{g(x)\})

(where the clumsy, imprecise notation is attempting to point out what these expressions denote for finite sets of samples). Now if {x}\{x\} and {y}\{y\} are distinct, then {f(x)}\{f(x)\} and {g(x)}\{g(x)\} are effectively samples from independent variables and hence the covariance is zero. However, if we can arrange for the covariance to be positive then the overall variance is reduced, giving us a better estimate. In the 1-D case there are conditions on ff and ggwhich don’t depend on “knowing the answer we’re trying to use Monte Carlo methods to find out” – for this covariance to be positive when we deliberately set {x}={y}\{x\}=\{y\} (ie, when we use Common Random Numbers).

In the cases where a “run” uses a vector of random numbers it’s more unclear. It is definitely the case that

  1. You have to use the same common random numbers at the same point in the simulation run for each different system. This can sometimes be a tricky piece of software engineering (particularly if you are using PRNGs and regenerating the same sequence of random numbers as they’re used rather than storing the numbers for reuse in later runs).

  2. It’s not clear what the requirements are on ff and gg to guarantee positive covariance in the vector case, and if we end up with negative covariance we’ll end up with a potentially worse estimate than if we’d used different random numbers.

Any edits correcting errors in the above gratefully accepted

Intuitive justification

Consider ff and gg as “competitors” so that their difference measures “better performance”. If one does this by testing them independently on random conditions one may get better “coverage of underlying conditions” but it’s obscured what the breakdown between the “better” competitors performance is due to easier conditions and actual intrinsic ability. In contrast, if you test them both on the same conditions, you will see a lesser variety of underlying conditions and you might miss those where the “loser” is actually better than the “winner”, but you do know that in those cases you’ve tested the difference is purely due to skill.

Practical impact

In time constrained estimation conditions where one believes the positive covariance underlying Common Random Variables applies, there are several practical reasons for using it:

  1. Simply by using it one obtains a lower variance estimate for the same computational cost.

  2. On parallel/SIMD computers it may be cost less to simultaneously simulate NN evaluations than the cost of NN serial evaluations, providing one can use the same random variables. (This particularly applies to processes with stochastic branching/choices.) Thus even better estimates may be obtained within a fixed computational budget.


Further refs TODO.