• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Search for multiple texts in cell, return specific text

mariur89

New Member
Ok, this is a great challange! Honor to the one who solves this one :)

I'm making a budget.
I want to sort my spendings into categories. My spendings are downloaded as a spreadsheet from my bank. It looks like this:

Details Sum
Category
VISA Mc Donalds -130,00
766 Statoil -640,00
76234 TAXI 201 -357,00
BurgerKing -130,00

The categories in this example would be:
1: lunch
2: gas

Now, I want to search for multiple strings in the A-column and return the correct category for each spending. F.x: I will search for texts like "donalds","burger","oil". If no match is found, the return value should be an empty string. The result should look like:

Details Sum Category
VISA Mc Donalds -130,00 lunch
766 Statoil -640,00 gas
76234 TAXI 201 -357,00
BurgerKing -130,00 lunch

I have tried =IFERROR(IF(SEARCH("*donalds*";A1);"lunch";"");""), but I have only been successful with searching for one string. Using macro is an option, but I want to try solving this one without use of macro.

I'll be greatful for all the help you can provide!
 
Hi Mate - Here is one more way....try it in the same workbook as attached earlier...in post no3

=A2&" "&IFERROR(INDEX($F$1:$F$3,MATCH(1,IF(ISERROR(SEARCH($E$1:$E$3,$A2)),0,1),0)),"")
 
Bear in mind that my approach might become quite slow on very large datasets if you have many, many search terms and/or very long lists of transactions that you want to categorise, because my approach searches every row in the Details column for every single one of the items in the Lookup table.

Take the formula in cell D3. What it does is search the cell B3 for every one of the search terms in the Search Term category of the Lookup table. In my dataset, there's 60 rows in the lookup table, so that formula in D3 does the equivalent of 60 individual searches on B3.

Given I have 150 rows of transactional data, that means that 150 * 60 = 9000 searches are being performed. If your transactions list and your lookup list was to grow to be very large, then you might notice that Excel struggles to do all the calculation. In that case, you've got two options:
1. After you've used the formula to categorise data, then use copy/paste special to turn those formulas into values, as there's no point having them recalcuating again and again when nothing has changed; or
2. Use a purpose-built User Defined Function, as it will probably be faster. I'm working on a UDF, and will post a link to it here once I have it ready. I don't know when that will be.
 
Hi Jeff,

You could also use LOOKUP there without CSE and it will be faster.
=LOOKUP(99,SEARCH(Categories[Search Term],Table1[[#This Row],[Details]]),Categories[Category])
Note: I have written formula in Excel 2007 which probably differs with Excel 2010 version onwards for tables.

Or in normal range terms it would be
=LOOKUP(99;SEARCH($G$3:$G$62;B3);$H$3:$H$62)

There are couple of differences in the results which are due to the function returning different word found out first.
 
Hi Shrivallabha. I never thought of using LOOKUP. Great stuff.

Seems the difference in things returned is purely down to sort order in the lookup list. For instance, given the lookup terms Tax, Taxi, and Taxidermist, then my formula requires the lookup list to be sorted Z to A whereas yours requires it to be sorted A to Z.

I wonder if there's any 'gotchas' of either approach to be aware of.
 
@shrivallabha Might be safer to replace 99 with a larger number, or even 9^9 (which will be way larger than the largest list that this would ever work on, to be sure).

@mariur89 Shrivallabha's formula is better than mine, so use that one. (And I'll put it in the book).
 
@shrivallabha: Great solution.

As a Newbie, I'd like to understand and learn a bit more. Why does your formula does not need to enter wildcards and still is able to find the different categories.

For you it might be a simple thing, but as I said, just trying to learn something.
 
Hi Jeff,

In real world situations I have used it only couple of times and that too for someone else in the office. However, I have used it here on Chandoo's and MrExcel wherever I found this situation.

I think it was Aladin Akyurek who introduced this formula. I just merely stole it and now use it. Since LOOKUP is binary and doesn't require CSE it will be faster.

I have seen people use different values where I have used 99 e.g. some use 2^15(32768) which is just higher than maximum number of characters in one cell as per Excel specification i.e. 32767. My take is that people if they choose to understand the construct can use any suitable value.

----

Hi Mahir,

As I have mentioned above I will provide you a link where Aladin Akyurek has explained this.
http://www.mrexcel.com/forum/excel-questions/102091-9-9999999-a.html
http://www.mrexcel.com/forum/showthread.php?t=18954
http://www.mrexcel.com/forum/showthread.php?t=310278 (Post #7)

That should clarify the logic behind the basic idea of behind the construct. Here's brief explanation for correlation.

Our objective is to find to a word / phrase in the cell content and then show the value that correlates to it. So to do this we use LOOKUP as it can handle arrays without having to do CSE.

LOOKUP uses three arguments.
Arg 1: LOOKUP value: This should be selected as a value which is higher than the maximum value expected in Arg 2. If this condition is not fulfilled then the function may return error in some cases. We are using 99.

Arg 2: LOOKUP Vector: To build this vector up we use another function SEARCH in following manner. SEARCH is not case sensitive. If search was case sensitive we have to use FIND. SEARCH as it is commonly used returns the position of match found where.
=SEARCH("String to search","String being searched","Start position which is optional")
When we pass a multi cell range to first argument to SEARCH/FIND function it generates an array of result as if it was done on several cells individually. It returns an array like {#value;1;#value;#value;#value}. This is where LOOKUP's second virtue comes into picture. It simply ignores errors and doesn't need error handler. Now if we were searching using 99 it will return the second entry (i.e. 1) as it is smaller than Argument 1.

Arg 3: optional RESULT Vector: We use this to return the entry next to the MATCH found by the LOOKUP construct. Suppose this array was {"a";"b";"c";"d";"e"} then it would return "b" vis-a-vis Lookup vector.

Hope this helps you.
 
@shrivallabha: Thank you for the great explanation. Aladin Akyurek must be the Guru of Excel Formula's. He is mentioned so many times in Mike Girvin's Mastering Excel Array Formula's.

Thank you also for the links. I will look through them.

Have a good day!
 
@shrivallabha: in terms of Arg1, I see at http://excelxor.com/2014/10/20/extracting-numbers-from-a-string-1-consecutive-numbers-at-start/ that XORLX has perhaps a better approach:

=-LOOKUP(1,-LEFT(A1,ROW(INDIRECT("1:"&LEN(A1)))))


Multiplying this array by -1 (which is effectively what preceding the array by a single minus sign does) results in:

=-LOOKUP(1,{-1;-12;-123;#VALUE!;#VALUE!;#VALUE!})
For readers not sure how LOOKUP operates, it is sufficient here to point out that, if the lookup_value is not found within the lookup_vector, and providing that there are no values within the lookup_vector greater than the lookup_value, then the function returns the last (in this case numerical) entry from the lookup_vector. The function also has the useful property that it ignores any error values within the lookup_vector.

This then was the reason for first negating the values in the lookup_vector, which, by creating an array consisting of either negative numbers, zeroes (it’s possible our desired extraction could begin with 0, e.g. 0123ABC) or error values, ensures that a lookup_value of 1 will always be a sufficient and legitimate choice.

Here, for example, since 1 is not found within the lookup_vector (and cannot be, if you think about it, unless we had some initial string such as e.g. -123ABC, which I will not be considering in this post), then the formula returns the last numerical value in the array, i.e. -123.

Of course, this is by no means the only way to handle this construction; we could equally leave the lookup_vector un-negated. Here, though, we would have to spend a little thought to ensure that our choice of lookup_value is sufficiently large (1 will always be such in the construction I chose).

Not that this is difficult, and indeed you will more than likely see this un-negated construction in practice, using a lookup_value of either the so-called “Big Num” (i.e. 9.99999999999999E+307, the largest allowed positive number in Excel) or, as some prefer, a mere “very large” value such as 10^10 (which has the benefit of being not so unwieldy-looking as “Big Num”).

By first negating values in the lookup_vector, and so allowing a choice of unity for the lookup_value, I like to think that there is some efficiency, even simplicity, in the set-up I chose perhaps lacking in the alternative constructions given above. Of course, this is just a personal preference, and readers should obviously choose whatever values they wish, providing of course that those choices are rigorous.
 
Hey if I have Several categories,
Ex: Apple, Boy, Cat, Doll, Pineapple

I want to go through any given text and pull out these categories from it, if any.
Ex: the boy has the pineapple.

I want to get "boy,pineapple"
 

Attachments

  • Chandoo.several_text.xlsx
    8.6 KB · Views: 6
Back
Top