Last week in the Chandoo.org Forums a user asked a question
“How do I convert a Roman Numeral to a Number eg: MMMCCCLVII to 3357”
User Xlstime presented the solution of:
=MATCH(A2,INDEX(ROMAN(ROW(INDIRECT(“1:3999”))),0),0)
Today we are going to look at how and why that simple formula works
As always at Formula Forensics you can follow along using a sample file: Download Sample File
Solution
Excel has a Roman function wherein =Roman(3357, 0) will return MMMCCCLVII
I knew there was no such reverse Roman function prior to Excel 2013 and so my initial thought was to look at a VBA Solution.
See notes on the Arabic Excel function at the end of the post.
However Xlstime presented =MATCH(A2,INDEX(ROMAN(ROW(INDIRECT(“1:3999”))),0),0)
How does this work?
lets start by pulling it apart from the inside out
=MATCH(A2,INDEX(ROMAN(ROW(INDIRECT(“1:3999”))),0),0)
The Indirect Function simply takes its inputs and converts them to a Range, in this case 1:3999.
We will study why 3999 later
=MATCH(A2,INDEX(ROMAN(ROW(INDIRECT(“1:3999”))),0),0)
The next function working out is Row()
Excel will convert the function ROW(INDIRECT(“1:3999”)) to an array of Row Numbers
={1;2;3;4;5;6;7;8;9;10;11; …. 3995;3996;3997;3998;3999}
You can see this if you goto cell D7 in the Sample File, press F2 and then F9
I limited the numbers to 300 as Excel cannot display more than 8,192 digits
Stepping out one more function:
=MATCH(A2,INDEX(ROMAN(ROW(INDIRECT(“1:3999”))),0),0)
The Roman() function converts its inputs into Roman Numbers
eg: Roman(58) will return LVIII
But as we are feeding it an array of numbers from 1 to 3999 Excel handles all these and converts them to an Array of Roman Numbers
Goto D9 in the sample file =ROMAN(ROW(INDIRECT(“1:300”))) press F2 and then F9
Excel returns an array of roman numbers
={“I”;”II”;”III”;”IV”;”V”;”VI”;”VII”;”VIII”; … “CCXCVI”;”CCXCVII”;”CCXCVIII”;”CCXCIX”;”CCC”}
We have limited the example to 300 as Excel cannot display more than 8,192 characters when processing a Function using F9.
Stepping out one more function
=MATCH(A2,INDEX(ROMAN(ROW(INDIRECT(“1:3999”))),0),0)
The Excel Index() function is taking the Array of Roman Numerals and Converting it into a single Column array
This isn’t technically needed but it simplifies the solution
If you goto cell D11 in the sample file =INDEX(ROMAN(ROW(INDIRECT(“1:3999”))),0) press F2 and then F9
Excel returns an array of roman numbers
={“I”;”II”;”III”;”IV”;”V”;”VI”;”VII”;”VIII”; … “CCXCVI”;”CCXCVII”;”CCXCVIII”;”CCXCIX”;”CCC”}
This is exactly the same as the previous output from the Roman() function above, Except that it is now a Single Vertical Array. This is important for the next function.
Stepping out one more function
=MATCH(A2,INDEX(ROMAN(ROW(INDIRECT(“1:3999”))),0),0)
We can see here that the Array of Roman Numerals is now being fed into a Match() function.
Match uses the Syntax =Match(Lookup value, Lookup Array, Match Type)
Match returns the position of the Lookup value within the array
So in our example
The Lookup value is A2 or our Roman Numeral MMMCCCLVII
The Lookup Array is an array of Roman Numerals from 1 to 3999
={“I”;”II”;”III”;”IV”;”V”;”VI”;”VII”;”VIII”; … “MMMCMXCV”; “MMMCMXCVI”; “MMMCMXCVII”; “MMMCMXCVIII”; “MMMCMXCIX”}
and the Match Type is 0 or an exact Match
So the Match function will lookup the value MMMCCCLVII in the array and find it in position number 3357, which happens to correspond to the Number of the Roman Numeral and Return 3357 as the result.
Why are we limited to 3999 numbers.
The Excel Roman() function is limited to numbers up to 3999
Why 3999?
Because in Roman Numerals there is no Letter for 5,000 and 4,000 would be shown as M before the Letter for 5,000.
Why did we need the Index() function?
If you goto D17 in the sample file you will see the formula:
=MATCH(A2,ROMAN(ROW(INDIRECT(“1:3999”))),0)
It is the same formula as above but without the Index() function
You will see that it is returning a #VALUE! error
If you edit the formula with F2 and then press F9 to process the function you will see it now shows 3357
What s happening here?
Pressing F9 is the same as Array Entering the Function
so if you edit the function pressing F2 and now Array Enter the function by pressing Ctrl+Shift+Enter, excel now returns 3357
The Index() function puts a wrapper around the array for processing by the Match() function and so Array Entering is avoided.
Most array formulas that require Ctrl+Shift+Enter can be rewritten incorporating an INDEX wrapper and will not require the Ctrl+Shift+Enter confirmation.
The Excel Arabic Function
In 2013, Microsoft introduced the Arabic function to Excel
To use simply use =Arabic(A2) or =Arabic(“MMMCCCLVII”)
You can read about the Syntax of the function in the Excel Help.
Download
You can download a copy of the above file and follow along, Download Sample File.
A Challenge
Can you solve the problem another way ?
Post your solutions in the comments below.
Other Posts in this Series
The Formula Forensics Series contains a wealth of useful solutions and information specifically about how Normal Formula and specifically Array Formula work.
You can learn more about how to pull Excel Formulas apart in the following posts: http://chandoo.org/wp/formula-forensics-homepage/
If you have a formula and you want to understand how it works contact Hui and it may be featured in future posts.
6 Responses to “Formula Forensics No. 041 – Convert a Roman Numeral to a Number”
shortened it a bit but same principle
=SUMPRODUCT(MATCH(A2,ROMAN(ROW(1:3999),0),0))
minus the sumproduct
entered as an array
=MATCH(A2,ROMAN(ROW(1:3999),0),0)
Minus the optional ROMAN argument.
=MATCH(A2,ROMAN(ROW(1:3999)),0)
Thank you Chandoo for publishing this.
Hi Chandoo,
Great post.
Question: why would you include the INDIRECT function in the formula? I tried =MATCH(A2,INDEX(ROMAN(ROW(1:3999)),0),0) and this gives the same result.
Regards,
Herbert
The formula from the post is remarkably similar to one i posted back in 2007:
http://www.pcreview.co.uk/threads/convert-roman-to-arabic-numerals.3271425/
Interestingly, it actually seems to run slightly faster and use less memory with the additional INDEX() function inserted since the array formulas have a small overhead associated so i agree with Hui that it is probably a good idea imo remove CSE.
Another possibility which is longer but significantly more efficient:
=SUMPRODUCT(LEN(C1)-LEN(SUBSTITUTE(C1,{"M","D","C","L","X","V","I";"CM","CD","XC","XL","IX","IV",0},"")),
{1000,500,100,50,10,5,1;-100,-100,-10,-10,-1,-1,0})
Credit is due to Harlan Grove from which it was adapted.
http://www.pcreview.co.uk/threads/function-for-roman-numerals.1779148/
As chandoo says, the ARABIC function is definitely the way to go in 2013 version and above.