Xiq
Active Member
Hello great people on the Chandoo-forums!
I'm trying to match different databases with conference titles (50k+ rows). The problem for me is that there is no consistency with the numerals. Also they are not on a fixed position within a text string. Any idea's on how to fix this?
An example of conference titles:

After some clean up (like TRIM and SUBSTITUTE), I tried a few big, slow and ugly formulas :D. See below:
Helper Formula A
Result: location of the character that looks like a Roman numeral {"x","v","i"} AND has {"st ","th ","rd ","nd "} after the potential Roman numeral.
Helper Formula B
Formula A = result of Formula A
Result: location of the last space, but before the potential Roman numeral.
Helper Formula C
ConferenceT = the cleaned up title
Formula x = result of Formula x
Result: Isolating the potential text string e.g. “XVII”
Helper Formula D
Result: checking if the isolated text string is indeed a Roman numeral
Helper Formula E
Result: If true, try to match the text string with every Roman numeral between 1 and 39 (at 40 you get the letter L in your Roman numeral).
Final Formula F
I'm trying to match different databases with conference titles (50k+ rows). The problem for me is that there is no consistency with the numerals. Also they are not on a fixed position within a text string. Any idea's on how to fix this?
An example of conference titles:
- XXIVth International Congress of Chandoo and Alike
- 9th Biennial Conference on Excel and Questions
- Conference of the 20th Random-Name-and-Formulea
- Workshop for Great People with XIVth Version of Excel
- Joint Symposium for Me, You and Them/XXIInd Conference for the Rest
- III Seminar of the Numerals and You
After some clean up (like TRIM and SUBSTITUTE), I tried a few big, slow and ugly formulas :D. See below:
Helper Formula A
Code:
{=MATCH(1,MMULT(--ISNUMBER(FIND({"st ","th ","rd ","nd "},MID(ConferenceT,ROW(OFFSET(A$1,,,LEN(ConferenceT))),{4}))),{1;1;1;1})*(MMULT(--(MID(ConferenceT,ROW(OFFSET(A$1,,,LEN(ConferenceT))),{1})={"x","v","i"}),{1;1;1})),)}
Helper Formula B
Code:
=FIND("^^",SUBSTITUTE(LEFT(ConferenceT, Formula A)," ","^^",LEN(LEFT(ConferenceT, Formula A))-LEN(SUBSTITUTE(LEFT(ConferenceT, Formula A)," ",""))))
Result: location of the last space, but before the potential Roman numeral.
Helper Formula C
Code:
=RIGHT(LEFT(ConferenceT, Formula A, Formula A - Formula B)
Formula x = result of Formula x
Result: Isolating the potential text string e.g. “XVII”
Helper Formula D
Code:
{=SUM(MMULT(--(MID(Formula C,ROW(OFFSET(A$1,,,LEN(Formula C))),{1})={"x","v","i"}),{1;1;1}))=LEN(Formula C)}
Helper Formula E
Code:
{=IF(Formula D =TRUE,MATCH(Formula C,ROMAN(ROW($1:$39)),),NA())}
Final Formula F
Code:
=IFERROR(LEFT(ConferenceT, Formula B)& Formula E &RIGHT(ConferenceT,LEN(ConferenceT)- Formula A,ConferenceT)