Histogram of Monte Carlo Simulation

Montecristo · Jan 26, 2013

Great site! First post.

How do you plot a histogram with a large number of simulations, for example, 5000 results?

In some examples I reviewed, I noticed the results are compiled in buckets. Is this the proper way to do it, while keeping in mind the type of distribution?

Thanks in advance.

Hui · Jan 26, 2013

Montecristo

Firstly, welcome to the Chandoo.org Forums.

Buckets are exactly the way to go.

Typically 20 is a good number of buckets. The more buckets you use, the finer the detail, but the fewer samples will fall in each bucket.

So each bucket will be roughly (Max-Min)/20 wide.

Then use Sumproduct() or Countifs() to count the number of results that fall within the range of each bucket.
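For readers working outside Excel, the same bucketing idea can be sketched in Python. This is only an illustration, not part of the original advice: the function name `bucket_counts` and the randomly generated `results` are made up for the example, and the 5000 normal draws simply stand in for real simulation output.

```python
import random

def bucket_counts(results, n_buckets=20):
    """Count how many results fall in each of n_buckets equal-width buckets."""
    lo, hi = min(results), max(results)
    width = (hi - lo) / n_buckets  # roughly (Max - Min) / 20
    counts = [0] * n_buckets
    edges = [lo + i * width for i in range(n_buckets + 1)]
    if width == 0:
        # All results identical: put everything in the first bucket.
        counts[0] = len(results)
        return edges, counts
    for x in results:
        # The maximum value would index one past the end, so clamp it
        # into the last bucket.
        i = min(int((x - lo) / width), n_buckets - 1)
        counts[i] += 1
    return edges, counts

# Hypothetical stand-in for 5000 simulation results.
random.seed(1)
results = [random.gauss(100, 15) for _ in range(5000)]

edges, counts = bucket_counts(results)
for left, n in zip(edges, counts):
    # Crude text histogram: one '#' per 20 results in the bucket.
    print(f"{left:8.2f}  {n:5d}  {'#' * (n // 20)}")
```

Changing `n_buckets` shows the trade-off directly: more buckets give a finer shape, but each bucket holds fewer results.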

If you want to post a file, I can give more specific advice.

Montecristo · Jan 26, 2013

Awesome. Thanks a lot!

Is there any theoretical basis for the number of buckets? I'm not a statistics expert. Any potential skew by using the wrong number of buckets?

Hui · Jan 29, 2013

Sorry for the delay in getting back to you

The answer depends on what you want to get out of the analysis.

If you are only after the probability of getting below or above a certain value, you can set that value as the edge of a bucket. Summing the output data on either side of that edge then gives you the answer directly, in just 1 or 2 buckets.
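As a small sketch of that single-threshold case, assuming the results are held in a Python list: the helper name `prob_below`, the threshold of 85, and the generated `results` are all hypothetical and only illustrate the counting.

```python
import random

def prob_below(results, threshold):
    """Fraction of simulation results strictly below the threshold."""
    return sum(1 for x in results if x < threshold) / len(results)

# Hypothetical stand-in for 5000 simulation results.
random.seed(1)
results = [random.gauss(100, 15) for _ in range(5000)]

p = prob_below(results, 85.0)
print(f"Estimated P(result < 85): {p:.3f}")
print(f"Estimated P(result >= 85): {1 - p:.3f}")
```

This is the two-bucket histogram in disguise: one bucket below the threshold, one at or above it.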

If you are interested in the outlying results at +/- 3, 4 or 5 SD's, you will need to run many more iterations to achieve the same quality of results.

What is more important is that you have a sufficient number of points to allow an accurate curve of the distribution to be generated.

So to that end you need enough samples in each bucket to ensure that the bucket is being represented correctly.

As I said above, if you are interested in the data beyond +/- 3, 4 or 5 SD's or more, you need many more samples than if you're interested in the data between -3 and +3 SD's.

My rules of thumb:

If you are interested in data that lies outside

1 SD, you'll need to run 100 iterations

2 SD's, you'll need to run 600 iterations

3 SD's, you'll need to run 12,000 iterations

4 SD's, you'll need to run 500,000 iterations

5 SD's, you'll need to run 50,000,000 iterations

So I mostly run 10,000 iterations on models and then think about the implications of what is beyond that and how important it is to the system I'm modelling

The above is my rule of thumb based on needing 30 samples in the area beyond the SD boundary limits listed to adequately define a random variable.
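The 30-sample rule can be checked with a short calculation. The sketch below rests on an assumption the post does not state: that the output is normally distributed and that "outside k SD" means both tails together. The function name `iterations_needed` is made up for the example.

```python
import math

def iterations_needed(k_sd, samples_wanted=30):
    """Iterations needed to expect samples_wanted results beyond +/- k_sd.

    Assumes a normal distribution and counts both tails.
    """
    # Two-sided tail probability P(|Z| > k) for a standard normal.
    tail = math.erfc(k_sd / math.sqrt(2))
    return math.ceil(samples_wanted / tail)

for k in range(1, 6):
    print(f"Outside {k} SD: about {iterations_needed(k):,} iterations")
```

Under those assumptions the computed figures come out in the same range as the rounded rules of thumb listed above. A skewed or fat-tailed model would give different numbers.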

The other variable here is what your model and standard deviation data are based on.

Don't get overly hung up on outliers beyond +/- 3 SD's if your SD is based on an estimate or your model is based on data that is only accurate to +/-25% anyway.

Disclaimer: I'm an engineer, not a Mathematician or Statistician. If you're making business decisions based on this style of work / analysis, I'd suggest you seek the relevant professional advice.
