Histogram of Monte Carlo Simulation

Montecristo

New Member
Great site! First post.


How do you plot a histogram with a large number of simulations, for example, 5000 results?


In some examples I reviewed, I noticed the results are compiled in buckets. Is this the proper way to do it, while keeping in mind the type of distribution?


Thanks in advance.
 
Montecristo


Firstly, Welcome to the Chandoo.org Forums


Buckets is exactly the way to go


Typically 20 is a good number of buckets. The more buckets you use, the finer the detail, but the fewer samples there will be in each bucket.


So each bucket will be roughly (Max-Min)/20 wide


Then use SUMPRODUCT() or COUNTIFS() to count the number of results that fall within the range of each bucket
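
For anyone who wants to check the bucketing logic outside Excel, here is a minimal Python sketch of the same idea: equal-width buckets of (Max-Min)/20, then a count per bucket. The function name `bucket_counts` and the normally distributed sample data (mean 100, SD 15) are assumptions made purely for illustration; they are not from the original question.

```python
import random

def bucket_counts(results, n_buckets=20):
    """Count how many results fall into each of n_buckets equal-width buckets."""
    lo, hi = min(results), max(results)
    if hi == lo:
        raise ValueError("all results are identical; nothing to bucket")
    width = (hi - lo) / n_buckets  # each bucket is (Max - Min) / n_buckets wide
    counts = [0] * n_buckets
    for x in results:
        # Clamp the index so the maximum value lands in the last bucket.
        i = min(int((x - lo) / width), n_buckets - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_buckets + 1)]
    return edges, counts

# Hypothetical stand-in for 5000 Monte Carlo results.
random.seed(1)
results = [random.gauss(100, 15) for _ in range(5000)]

edges, counts = bucket_counts(results)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:8.2f} to {right:8.2f}: {n}")
```

The counts column is what you would chart as the histogram; in Excel the equivalent count for one bucket would come from COUNTIFS() with the bucket's lower and upper edges as criteria.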


If you want to post a file I can give more specific advice
 
Awesome. Thanks a lot!


Is there any theoretical basis for the number of buckets? I'm not a statistics expert. Any potential skew by using the wrong number of buckets?
 
Sorry for the delay in getting back to you


The answer depends on what you want to get out of the analysis.


If you are only after the probability of getting below or above a certain value, you can set that value as the edge of a bucket, and the summation of the output data on each side of that edge will give you the answer directly in 1 or 2 buckets
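
As a rough sketch of that threshold case in Python: with the value of interest used as the single bucket edge, the probability is just the share of results on one side of it. The name `prob_below`, the threshold of 80 and the sample data are all hypothetical, chosen only to make the example runnable.

```python
import random

def prob_below(results, threshold):
    """Fraction of simulated results that fall strictly below the threshold."""
    return sum(1 for x in results if x < threshold) / len(results)

# Hypothetical stand-in for 5000 Monte Carlo results.
random.seed(1)
results = [random.gauss(100, 15) for _ in range(5000)]

p = prob_below(results, 80)
print(f"P(result < 80) is about {p:.3f}; P(result >= 80) is about {1 - p:.3f}")
```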


If you are interested in the outlying results at +/- 3, 4 or 5 SD's, you will need to run many more iterations to achieve the same results.


What is more important is that you have a sufficient number of points to allow an accurate curve of the distribution to be generated.


So to that end you need enough samples in each bucket to ensure that that bucket is being represented correctly.


As I said above, if you are interested in the data beyond +/- 3, 4 or 5 SD's or more, you need many more samples than if you're interested in the data between -3 and +3 SD's


My rules of thumb:

If you are interested in data that is outside:

1 SD, you'll need to run 100 iterations

2 SD's, you'll need to run 600 iterations

3 SD's, you'll need to run 12,000 iterations

4 SD's, you'll need to run 500,000 iterations

5 SD's, you'll need to run 50,000,000 iterations


So I mostly run 10,000 iterations on models and then think about the implications of what is beyond that and how important it is to the system I'm modelling


The above is my rule of thumb based on needing 30 samples in the area beyond the SD boundary limits listed to adequately define a random variable.
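
The 30-sample basis can be sketched in Python as follows. This assumes the results are normally distributed and that "outside" means both tails combined; neither assumption is stated in the post, so treat it as one possible reading rather than the exact method used. The function name `iterations_needed` is hypothetical.

```python
import math

def iterations_needed(k_sd, samples_wanted=30):
    """Iterations needed to expect samples_wanted results beyond +/- k_sd.

    Assumes a normal distribution and counts both tails together.
    """
    # Two-sided tail probability P(|Z| > k) for a standard normal variable.
    tail = math.erfc(k_sd / math.sqrt(2))
    return math.ceil(samples_wanted / tail)

for k in range(1, 6):
    print(f"{k} SD: about {iterations_needed(k):,} iterations")
```

Under those assumptions the figures come out in the same ballpark as the rounded rule-of-thumb numbers above.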


The other variable here is what your model and Standard Deviation data are based on.

Don't get overly hung up on outliers beyond +/- 3SD's if your SD is based on an estimate or your model is based on data that is +/-25% anyway.


Disclaimer: I'm an engineer, not a Mathematician or Statistician. If you're making business decisions based on this style of work / analysis, I'd suggest you seek the relevant professional advice.
 