Handling spelling mistakes while searching your data using Excel

Spelling mistakes are a part of day-to-day carporate life. Most of the data in spreadsheets is entered by people and is hence prone to spelling mistakes or alternate spellings. For example, a person named John could have been entered as Jon. When John calls back to confirm his reservation and you use search or VLOOKUP to find his information, the result would be empty.

Here is one technique that I often use when the data has spelling mistakes or when I need a fuzzy search to fetch items that sound or are spelled alike. Take the 2 texts you want to compare and:

  • Remove all the vowels – AEIOU
  • Replace PH with F, Z & J with G, CK with K, W with V, LL with L, SS with S
  • Remove any Hs
  • Finally compare both texts

To simplify the above 4 steps I have written a small VBA UDF (User Defined Function) that takes a text parameter and applies the substitutions above. You then compare the simplified versions of the two texts.


Function SimpleText(ByVal thisTxt As String) As String
    ' This function generates a simple text from input text that
    ' can be used for fuzzy search.
    ' ByVal keeps the caller's variable unchanged when called from other VBA code.
    thisTxt = LCase(thisTxt)
    ' remove all the vowels
    thisTxt = Replace(thisTxt, "a", "")
    thisTxt = Replace(thisTxt, "e", "")
    thisTxt = Replace(thisTxt, "i", "")
    thisTxt = Replace(thisTxt, "o", "")
    thisTxt = Replace(thisTxt, "u", "")
    ' replace letters and letter pairs that sound alike
    thisTxt = Replace(thisTxt, "ph", "f")
    thisTxt = Replace(thisTxt, "z", "g")
    thisTxt = Replace(thisTxt, "ck", "k")
    thisTxt = Replace(thisTxt, "w", "v")
    thisTxt = Replace(thisTxt, "j", "g")
    thisTxt = Replace(thisTxt, "ll", "l")
    thisTxt = Replace(thisTxt, "ss", "s")
    ' remove any Hs
    thisTxt = Replace(thisTxt, "h", "")
    SimpleText = thisTxt
End Function

The above code can be used to perform fuzzy text searches or searches on unclean data. Of course, the above substitution rules are what I find good enough. Feel free to define additional rules as per your needs so that your fuzzy searches work even better.
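As a rough sketch of how such a search could look, here is a small helper that scans a range and returns the first value whose simplified text matches the simplified search text. The name FuzzyFind and its parameters are my own assumptions for illustration, and the helper expects the SimpleText function above to be in the same VBA project.

```vba
' Hypothetical helper (name and signature are assumptions for illustration).
' Requires the SimpleText UDF from this post in the same VBA project.
Function FuzzyFind(ByVal searchTxt As String, lookupRange As Range) As Variant
    Dim cell As Range
    Dim target As String

    target = SimpleText(searchTxt)

    ' Nothing left to compare (empty or vowel-only input): report #N/A
    If Len(target) = 0 Then
        FuzzyFind = CVErr(xlErrNA)
        Exit Function
    End If

    For Each cell In lookupRange.Cells
        ' Skip cells that contain errors such as #N/A or #DIV/0!
        If Not IsError(cell.Value) Then
            If SimpleText(CStr(cell.Value)) = target Then
                FuzzyFind = cell.Value   ' first fuzzy match wins
                Exit Function
            End If
        End If
    Next cell

    ' No cell matched: behave like a failed lookup
    FuzzyFind = CVErr(xlErrNA)
End Function
```

For example, both "John" and "Jon" simplify to "gn", so a formula like =FuzzyFind("Jon", A2:A100) would return "John" if that is the first matching name in the range. For large lists, a helper column with =SimpleText(A2) combined with MATCH is likely to be faster than looping in VBA.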

If you want to generate SOUNDEX codes for Excel strings, you can use this Excel soundex UDF. Soundex codes are phonetic codes generated for words based on how they sound, so 2 words that sound similar (for example Robert and Rupert, which both get R163) have the same Soundex code. Note that the code keeps the first letter of the word, so words that start with different letters never get the same code. You can use these codes to perform fuzzy searches.
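To give an idea of what such a function does, here is a minimal sketch of the standard American Soundex rules in VBA. This is not the UDF referred to above. The name SoundexSketch is an assumption for illustration, and the sketch ignores any character that is not a plain A to Z letter.

```vba
' Hypothetical sketch of American Soundex (not the UDF linked in the post).
Function SoundexSketch(ByVal txt As String) As String
    ' Soundex digit for each letter A..Z; "0" means the letter is not coded
    Const CODES As String = "01230120022455012623010202"
    Dim i As Long
    Dim ch As String, code As String, lastCode As String
    Dim result As String

    txt = UCase(txt)
    For i = 1 To Len(txt)
        ch = Mid(txt, i, 1)
        If Asc(ch) >= 65 And Asc(ch) <= 90 Then
            code = Mid(CODES, Asc(ch) - 64, 1)
            If result = "" Then
                ' Keep the first letter as-is, but remember its code
                result = ch
                lastCode = code
            ElseIf ch = "H" Or ch = "W" Then
                ' H and W do not separate letters with the same code
            Else
                ' Add the digit unless it repeats the previous code
                If code <> "0" And code <> lastCode Then result = result & code
                lastCode = code
            End If
        End If
        If Len(result) = 4 Then Exit For
    Next i

    ' Pad short codes with zeros to one letter plus three digits
    If result <> "" Then result = Left(result & "000", 4)
    SoundexSketch = result
End Function
```

With this sketch, =SoundexSketch("Robert") and =SoundexSketch("Rupert") both give R163, so comparing the codes treats the two names as a match.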

8 Responses to “Handling spelling mistakes while searching your data using Excel”

  1. James says:

    As ever, a great "practical" example that is easy to customise / add to :>) I think I will add "K -> C" so that Katherine is matched to Catherine.

    Is similar to the Metaphone function which is an improvement on SOUNDEX.

    I need to do this thing, though, with a few million records :>(

    I'd love to see more practical data cleaning "how to's" e.g. transforming and standardizing phone numbers

    Input: (301) 754-6350
    Transform: (999)999-9999
    Output: 301 | 754 | 6350

  2. Chandoo says:

    @James : thanks very much. Cleaning up phone numbers is a good idea. I will write about it sometime.

  3. Alex J says:

    Chandoo,
    Did you realize that your blog article spells "corporate life" as "carporate life"?

    Ironic, no? Shouldn't blogs have speel checkers (sic)? How about blog comment boxes?

  4. Chandoo says:

    @Alex... you noticed! That was intentional... believe it or not... it is meant to be situational humor. 🙂

  5. Alex J says:

    so was my comment about "speel checkers" 🙂

  6. [...] Handling spelling mistakes in your data Splitting text using excel formulas Generating initials from names using excel Adding a range of cells using Concat() [...]

  7. JP says:

    Ross over at Methods In Excel has a post about fuzzy matching. I posted some code and some links there, and Ross has a workbook with some of the more popular methods (Levenshtein, Soundex, etc).

    http://www.blog.methodsinexcel.co.uk/2008/09/17/fuzzy-logic/

    Enjoy,
    JP

  8. [...] I have used fuzzyText UDF (user defined formula) so that we can search against this list even when you have a spelling mistake in the fund name. For more information see fuzzy text search using excel. [...]
