• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extract text string from in between two _ ?

BenefitRich

New Member
Greetings all!

I have a bunch of cells that are lists of filenames that all follow the same format:

[pre]
Code:
DFC_000094_AUST_bill.xls
DFC_000095_BRAZ_bill.xls
DFC_000096_CANA_bill.xls
DFC_000097_CHIL_bill.xls[/pre]
I need to extract the four letter code that is in between the 2nd and 3rd "_" for each. AUST, BRAZ, CANA, CHIL


Right now, the best way I have to do this is as follows, but I have to imagine there's a better way: [Assuming the filename is in cell A1]:

=MID(A1,(SEARCH("_",A1,(SEARCH("_",A1)+1))+1),((SEARCH("_",A1,(SEARCH("_",A1,(SEARCH("_",A1)+1))+1)))-(SEARCH("_",A1,(SEARCH("_",A1)+1))+1)))


Where I use the Mid function and find the first "_" with a Search, then add 1 to that position and embed that in another Search, but this time it will start right after the 1st "_" and so it will give me the 2nd "_". To get the # of spaces for the MID to go over, I then subtract the position of the 2nd "_" from the position of the 3rd "_" which I get using the same recursive search route.


Can you think of a better way?


Thanks in advance!
 
Hi BenefitRich,


Can u plz go through this link and see if you can do it by yourself (I am sure you can!)?


http://chandoo.org/wp/2012/06/19/extract-numbers-from-text-excel/


This article has a sample workbook (downloadable) as well.


Post back in case you face any challenge in adapting the technique.


Kaushik
 
Hi BenefitRich..


By three most important line from your code..

* all follow the same format

* the four letter code

* and "_" for each


Finally I end up with a not fully dependable formula.. :)


Code:
=MID(A1,SEARCH("_????_",A1)+1,4)


Regards,

Deb
 
oops.. Montrey.. I knew I have choosen a shortcut.. but you reached there before me.. :)


Regards,

Deb
 
No problem Deb.

My formula has assumptions like yours too! :)

It assumes only 4 letters will be extracted and the text string always ends with "_bill.xls"


If Rich informs us otherwise we can help him out. If the above assumptions I mentioned hold true. then my formula will work every time. Here it is again for easy viewing.

=LEFT(RIGHT(A1,13),4)
 
Hi, BenefitRich!

If Debraj Roy and Montrey assumptions regarding starting position and length are correct, you could try this too, it's shorter:

=MID(A1,12,4)

Regards!


@Debraj Roy, Montrey

Hi!

:p

Regards!
 
@ Dos6.22

I love the way you treat all long filename and change them into.. XXXXXX~1.3

Please re-introduce the same.. I dont want filename > 8.3.. :(

Regards!
 
Thank you all for the wonderful response.

Unfortunately, the text between the two "_" delimiters will NOT always be exactly four characters - though I do see that the few lines I provided did make it look that way :(

Some other Data examples:

[pre]
Code:
DFC_000140_OLD_bill.xls
DFC_000094_AUST_bill.xls
DFC_000095_BRAZ_bill.xls
DFC_000096_CANA_bill.xls
DFC_000097_CHIL_bill.xls
DFC_000149_US.BTQ_bill.xls[/pre]
Unfortuantely, this fact means that many of the solutions above won't work but I thank you! [koi,SirJB7,Montrey,Debraj Roy]


kaushik03 - I'll look at that. 


Now, I CAN guarantee the prefix and suffix will always be the same length! So I thank you all for reminding me of that - So the solution I've culled from the above would work like this:

Since I know I have 11 "garbage" characters to the left and 9 garbage characters to the right, the following will be a cleaner way to extract what I want: Use MID to start on the 12 Character and then subtract the combined # of garbage characters from LEN(A1)


[code]=MID(A1,12,LEN(A1)-11-9)

or rewritten: =MID(A1,12,LEN(A1)-20)[/code]


Thanks all!
 
Hi ,


I think I have come in late , and have not gone through all the posts ; however if you want the string of text between the second and third underscores , such as AUST , BRAZ , CANA , CHIL ,... then use the following formula :


=MID(SUBSTITUTE(SUBSTITUTE(A1,"_",CHAR(9),2),"_",CHAR(10),2),FIND(CHAR(9),SUBSTITUTE(SUBSTITUTE(A1,"_",CHAR(9),2),"_",CHAR(10),2))+1,FIND(CHAR(10),SUBSTITUTE(SUBSTITUTE(A1,"_",CHAR(9),2),"_",CHAR(10),2))-FIND(CHAR(9),SUBSTITUTE(SUBSTITUTE(A1,"_",CHAR(9),2),"_",CHAR(10),2))-1)


You can reduce the length of the above formula , if you use 2 helper columns.


There was a question asked a few days earlier , in which Luke had posted this technique , here :


http://chandoo.org/forums/topic/really-stuck-with-the-following-problem


Narayan
 
Hi,


Try this:


=MID(A1,LARGE(IF(MID(A1,ROW(A1:A30),1)="_",ROW(A1:A30)),2)+1,LARGE(IF(MID(A1,ROW(A1:A30),1)="_",ROW(A1:A30)),1)-1-LARGE(IF(MID(A1,ROW(A1:A30),1)="_",ROW(A1:A30)),2))


Press Ctrl+Shift+Enter and drag down. Ur data is in column A


Regards,
 
Hi


You can also try this:


=MID(A1,SEARCH("_",A1,SEARCH("_",A1)+1)+1,(SEARCH("_",A1,SEARCH("_",A1,SEARCH("_",A1)+1)+1))-((SEARCH("_",A1,SEARCH("_",A1)+1)+1)))


Amritansh
 
Hi, BenefitRich!

I'd stick to amrit1186's suggestion:

=EXTRAE(A1;HALLAR("_";A1;HALLAR("_";A1)+1)+1;HALLAR("_";A1;HALLAR("_";A1;HALLAR("_";A1)+1)+1)-HALLAR("_";A1;HALLAR("_";A1)+1)-1) -----> in english: =MID(A1,SEARCH("_",A1,SEARCH("_",A1)+1)+1,SEARCH("_",A1,SEARCH("_",A1,SEARCH("_",A1)+1)+1)-SEARCH("_",A1,SEARCH("_",A1)+1)-1)

Besides no matter intermediate chunks length it uses normal (not array) formulas.

Regards!
 
=MID(A1,FIND("*",SUBSTITUTE(A1,"_","*",2))+1,FIND("*",SUBSTITUTE(A1,"_","*",3))-FIND("*",SUBSTITUTE(A1,"_","*",2))-1)
 
@Montrey

Hi!

I was wondering what if last part changes too, but I'm tending to think that the "bill.xls" won't change, so my vote goes to your formula.

Regards!

PS: Which means nothing, I know. In fact I just forgot what I was writing about. :p
 
Back
Top