• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to find and convert Roman numerals?

Xiq

Active Member
Hello great people on the Chandoo-forums!

I'm trying to match different databases with conference titles (50k+ rows). The problem for me is that there is no consistency with the numerals. Also they are not on a fixed position within a text string. Any idea's on how to fix this?

An example of conference titles:
  • XXIVth International Congress of Chandoo and Alike
  • 9th Biennial Conference on Excel and Questions
  • Conference of the 20th Random-Name-and-Formulea
  • Workshop for Great People with XIVth Version of Excel
  • Joint Symposium for Me, You and Them/XXIInd Conference for the Rest
  • III Seminar of the Numerals and You
Also: the German word "ist" is of course not a numeral o_O


After some clean up (like TRIM and SUBSTITUTE), I tried a few big, slow and ugly formulas :D. See below:
Helper Formula A
Code:
{=MATCH(1,MMULT(--ISNUMBER(FIND({"st ","th ","rd ","nd "},MID(ConferenceT,ROW(OFFSET(A$1,,,LEN(ConferenceT))),{4}))),{1;1;1;1})*(MMULT(--(MID(ConferenceT,ROW(OFFSET(A$1,,,LEN(ConferenceT))),{1})={"x","v","i"}),{1;1;1})),)}
Result: location of the character that looks like a Roman numeral {"x","v","i"} AND has {"st ","th ","rd ","nd "} after the potential Roman numeral.

Helper Formula B
Code:
=FIND("^^",SUBSTITUTE(LEFT(ConferenceT, Formula A)," ","^^",LEN(LEFT(ConferenceT, Formula A))-LEN(SUBSTITUTE(LEFT(ConferenceT, Formula A)," ",""))))
Formula A = result of Formula A

Result: location of the last space, but before the potential Roman numeral.

Helper Formula C
Code:
=RIGHT(LEFT(ConferenceT, Formula A, Formula A - Formula B)
ConferenceT = the cleaned up title
Formula x = result of Formula x

Result: Isolating the potential text string e.g. “XVII”

Helper Formula D
Code:
{=SUM(MMULT(--(MID(Formula C,ROW(OFFSET(A$1,,,LEN(Formula C))),{1})={"x","v","i"}),{1;1;1}))=LEN(Formula C)}
Result: checking if the isolated text string is indeed a Roman numeral

Helper Formula E
Code:
{=IF(Formula D =TRUE,MATCH(Formula C,ROMAN(ROW($1:$39)),),NA())}
Result: If true, try to match the text string with every Roman numeral between 1 and 39 (at 40 you get the letter L in your Roman numeral).

Final Formula F
Code:
=IFERROR(LEFT(ConferenceT, Formula B)& Formula E &RIGHT(ConferenceT,LEN(ConferenceT)- Formula A,ConferenceT)
 

Attachments

Off topic: you can see I had some fun using the tricks I found at the Formula-Challenge section :p
 
At least I know it's a real challenge if even Excel Ninja's say so :D
The VBA functions look interesting, they should speedup the calculation time. Because the piece of formula I'm using (and also posted on the forum-link you gave me) is quite heavy.
 
Wow! That looks great Haseeb!
I wouldn't have thought to simply create an array of numerals to search for (and I was first a bit puzzled on how match could give me the actual numeral it found in the text-string :p )

Is there also a way to substitute the found numeral with the converted one?
I.g.: "XXth Conference" --> "20 Conference"; or even betther, "XXth Conference" --> "20th Conference"
I found a few problems with the results that need fixing. Basically every char+space that looks like a Roman numeral (but isnt) is giving me the wrong result. Also the years in the text-string can give a problem.
 
Here is another problem I've found:

"XXIVth International Congress of ..." results 14, but should be 24
"IVth Meeting of the ..." results in 5, but should be 4

You can't simply use small or large functions, but maybe the largest count of symbols?
 
Xiq,

For me it's very hard to separate roman characters from a string. I think VBA might be a good option, or a good brain guy come with crazy formula. I am zero in VBA. Hope some one will help you with VBA.

Anyway; try this Array Formula. It works in posted samples.

=IFERROR(LOOKUP(40,VLOOKUP(40,MATCH(RIGHT(TRIM(MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A2,"st"," "),"nd"," "),"rd"," "),"th"," ")," ",REPT(" ",200)),ROW(A$1:A$20)*200-199,200)),{1,2,3,4,5,6}),ROMAN(ROW(A$1:A$39)),0),{1,2,3,4,5,6})),MATCH(40,1/MMULT(ISNUMBER(FIND(ROW(A$1:A$39)&{"st ","nd ","rd ","th "},A2))+0,{1;1;1;1})))
 
You have me scratching my head here :confused:, really wonderful piece of magic!

Trying to understand what you have made, but I have no clue how you used "LOOKUP(40,{...}" to find the correct result...

In anycase, thanks a lot Haseeb! This indeed helps :D
 
Back
Top