Handling spelling mistakes while searching your data using excel
Spelling mistakes are a thing of day to day carporate life. Most of the data in spreadsheets is entered by people and hence prone to having spelling mistakes or alternate spellings. For eg. a person named John could have been spelled as Jon. And when John calls you back to confirm his reservation and you use the search / vlookup to find his information the result would empty.
Here is one technique that I use often when the data has spelling mistakes or I need to do fuzzy search to fetch items that sound or spelled similar. Take the 2 texts you want to compare and,
- Remove all the vowels – AEIOU
- Replace PH with F, Z & J with G, CK with K, W with V, LL with L, SS with S
- Remove any Hs
- Finally compare both texts
To simplify the above 4 steps I have written a small VBA UDF (User Defined Function) that takes a text parameter and performs the above 4 steps.
Function SimpleText(thisTxt As String) As String
' this function generates a simple text from input text that
' can be used for fuzzy search
thisTxt = LCase(thisTxt)
thisTxt = Replace(thisTxt, "a", "")
thisTxt = Replace(thisTxt, "e", "")
thisTxt = Replace(thisTxt, "i", "")
thisTxt = Replace(thisTxt, "o", "")
thisTxt = Replace(thisTxt, "u", "")
thisTxt = Replace(thisTxt, "ph", "f")
thisTxt = Replace(thisTxt, "z", "g")
thisTxt = Replace(thisTxt, "ck", "k")
thisTxt = Replace(thisTxt, "w", "v")
thisTxt = Replace(thisTxt, "j", "g")
thisTxt = Replace(thisTxt, "ll", "l")
thisTxt = Replace(thisTxt, "ss", "s")
thisTxt = Replace(thisTxt, "h", "")
SimpleText = thisTxt
The above code can be used to perform fuzzy text searches or searches on unclean data. Of course, the above substitution rules are what I find good enough. Feel free to define additional rules as per your needs so that your fuzzy searches work even better.
If you are looking for generating SOUNDEX codes for excel strings you can use this excel soundex UDF. Soundex codes are phonetic codes generated for words based on how they sound, thus 2 words sounding similar (for eg. excess, access) would have same soundex code. You can use these codes to perform fuzzy searches.
More on text processing using excel:
- Split text using excel formulas
- Get initials from names
- Access & process RSS Feed data from your excel sheets
My name is Chandoo. Thanks for dropping by. My mission is to make you awesome in Excel & your work. I live in Wellington, New Zealand. When I am not F9ing my formulas, I cycle, cook or play lego with my kids. Know more about me.
Thank you and see you around.
Leave a Reply
|« Sorting a list of items in random order in excel – using formulas||Save a quarter buck everytime you buy coffee at starbucks »|