Formula Forensic No. 021 – Find the 4th Slash !

Posted on May 17th, 2012 in Formula Forensics , Huis , Posts by Hui - 38 comments

Last week at the Chandoo.org Forums, Senthilkumar_rm posed the question:

“I Have file name as

D:\Data\Personal\sramasam\desktop\Exceldata.xls

I want to find the position of the 4th slash “\”

What is a simple way ? ”

 

I proposed 3 answers being:

=FIND("\",A1,FIND("\",A1,FIND("\",A1,FIND("\",A1)+1)+1)+1)
=SEARCH("\",A1,FIND("\",A1,FIND("\",A1,FIND("\",A1)+1)+1)+1)
=FIND(CHAR(135),SUBSTITUTE(G5,"\",CHAR(135),4))

and Faseeh proposed a 4th answer:

=(LARGE(((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*ROW(INDIRECT("1:"&1024)),2))

So today at Formula Forensics we will examine all these and see what and why they all work.

As usual at Formula Forensics you can follow along by downloading a Sample File.

 

Using the Find() function

The Initial formula I proposed was:

=FIND("\",A1,FIND("\",A1,FIND("\",A1,FIND("\",A1)+1)+1)+1)

This uses the Excel Find() function repeatedly inside itself

The Excel Find() function uses the following syntax

The second solution I proposed was the same as the Find solution except that it used the Excel Search( ) function.

You can see above that Find and Search have exactly the same syntax.

Why have 2 functions that are effectively similar ?

Search is Case Insensitive.

eg: Chandoo = cHanDoo

Find is Case Sensitive.

eg: Chandoo <> cHanDoo

We will discuss the solution using:

=FIND("\",A1,FIND("\",A1,FIND("\",A1,FIND("\",A1)+1)+1)+1)

But the solution and description are equally applicable to the search based solution:

=SEARCH("\",A1,FIND("\",A1,FIND("\",A1,FIND("\",A1)+1)+1)+1)

So Find has the syntax: =Find( Text, Within Text, [Start No.])

But our formula has 4 Find() functions, where do we start?

Start in the Middle.

In our example: =FIND(“\”,A1,FIND(“\”,A1,FIND(“\”,A1,FIND(“\”,A1)+1)+1)+1)

FIND("\",A1)

Text:                      “\”

Within Text:      A1

Start No:              Optional = 0

This will find the first \ in out text string and return the value 3.

Checkout Cell D13.

 

Stepping out one find =FIND(“\”,A1,FIND(“\”,A1,FIND(“\”,A1,3+1)+1)+1)

FIND("\",A1,3+1)

Text:                      “\”

Within Text:      A1

Start No:              3 + 1

This will find the Second \ in out text string by starting at position 3 + 1 and hence return the value 8.

Checkout Cell D15.

 

Stepping out one more find =FIND(“\”,A1,FIND(“\”,A1,8+1)+1)

FIND("\",A1,8+1)

Text:                      “\”

Within Text:      A1

Start No:              8 + 1

This will find the Third \ in out text string by starting at position 8 + 1 and hence return the value 17.

Checkout Cell D17.

 

Finally we arrive at the outer Find, =FIND(“\”,A1,17+1)

=FIND("\",A1,17+1)

Text:                      “\”

Within Text:      A1

Start No:              17 + 1

This will find the fourth \ in out text string by starting at position 17 + 1 and hence return the answer of 26.

Checkout Cell D19.

Advantages:

Relatively simple formula

Disadvantage:

This formula must be manually re-made if you want to find either the 3rd, 5th or another occurrences.

 

Using the Find() & Substitute() functions

=FIND(CHAR(135),SUBSTITUTE(A1,"\",CHAR(135),4))

Using the Find() & Substitute() formula shown above take a different approach to solving the problem to the pure Find() based solution.

This solution works by using a feature of the substitute function that allows for the substitution of the Nth chosen character with another character.

The Excel Substitute() function has the following syntax :

In our example:

=FIND(CHAR(135),SUBSTITUTE(A1,"\",CHAR(135),4))

Text:             A1

Old_Text:    "\"

New_Text:   Char(135)

Instance_Num:     

So substitute will replace the 4th Slash with the character Ascii code 135. Char 135 or a character was chosen as it is unlikely to be used in normal text. If it is used another character code should be chosen.

The Find Function will then look for Char(135) in the Substituted text and return the position number of it.

Replacing the Char(135) with a Char(5) characters reduces this formula to 44 characters!

Advantages:

This is a very clear formula to understand

You can easily look for the 3rd or 5th character without changing the formula

Disadvantage:

If the Text String already contains the Char(135) character then another must be chosen or the formula will be wrong.

Using Faseeh’s Array Formula.

={(LARGE(((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*ROW(INDIRECT("1:"&1024)),2))}

Faseeh’s Formula is based around the Large() function

=(LARGE(((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*ROW(INDIRECT("1:"&1024)),2)) Ctrl Shift Enter

The Excel Large() function has the following syntax:

In Faseeh’s Formula we see

=LARGE(((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*ROW(INDIRECT("1:"&1024)),2)

Array: ((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*ROW(INDIRECT("1:"&1024))

K:     2

So we can see that the Large() function is looking for the Second Largest value (K=2) in the array.

Lets pull the Array apart

The Array consists of two parts separated by a Multiplication sign.

((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*ROW(INDIRECT("1:"&1024))

We can look at each part separately and then combine them at the end.

The first part MID($A1,ROW(INDIRECT("1:"&1024)),1)="\"

Consists of MID($A1,ROW(INDIRECT("1:"&1024)),1) and an = "\"

In a Blank cell Say E25 enter =MID($A1,ROW(INDIRECT("1:"&1024)),1) and press F9 Not Enter

Excel responds with an Array: = {"D";":";"\";"D";"a";"t";"a";"\";"P";"e";"r";"s";"o";"n";"a";"l";"\";"s";"r";"a";"m";"a";"s";"a";"m";"\";"d";"e";"s";"k";"t";"o";"p";"\";"E";"x";"c";"e";"l";"d";"a";"t";"a";".";"x";"l";"s";""; … ;"" ;""}

We can see here that Excel has taken the value of cell A1 and broken it up as text and put each character into an element in the array.

So the whole line: MID($A1,ROW(INDIRECT("1:"&1024)),1)="\"

Is evaluating this array against a "\" and should return an array of True/False values

In a Blank cell E27 put the following: =MID($A1,ROW(INDIRECT(“1:”&1024)),1)=”\” and press F9 Not Enter

Excel responds with a an array of 1024 Falses

={FALSE;FALSE;FALSE; …. ;FALSE;FALSE}

That’s not quite what we expected ?

The Function is =MID($A1,ROW(INDIRECT(“1:”&1024)),1)=”\”

Which is a Boolean way of saying if the Middle 1 Character of Cell A1 from position 1 to 1024  is = “\” return an Array of the values.

What is going on here?

If we modify the formula slightly to =(MID($A1,ROW(INDIRECT(“1:”&1024)),1)=”\”)*1

And evaluate (F9) that in cell E28

Excel returns an Array ={0;0;1;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;1;0; … ;0;0}

The array consists of 1024 0’s except for the positions where the Array = “\” where it has 1’s.

 

Lets go inside this part of the formula and see what is happening.

=(MID($A1,ROW(INDIRECT(“1:”&1024)),1)

INDIRECT(“1:”&1024) returns a Reference of 1:1024 as Text which Indirect converts to a Range from row 1 to row 1024

=(MID($A1,ROW(INDIRECT(“1:”&1024)),1)

Row() returns the row number of the array

=(MID($A1,ROW(INDIRECT(“1:”&1024)),1)

Mid takes the Character from A1 starting at Position Row(INDIRECT(“1:”&1024)) and returns 1 character.

This effectively allows the array to retrieve each character from the Text of a1.

The Second Part

((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*ROW(INDIRECT("1:"&1024))

We have seen that the first part of the equation

((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")

Returns an Array of 1’s and 0’s where the formula matched the "\" character.

The second part of the equation: ROW(INDIRECT("1:"&1024)) is used to return the position of the characters.

In a blank cell E30 type

=ROW(INDIRECT("1:"&1024)) and press F9 Not Enter

Excel will respond with ={1;2;3;4;5;6;7;8; … ;1023;1024}

This is a list of the Rows from the Range 1:1024

 

We can now return to the original function

((MID($A1,ROW(INDIRECT(“1:”&1024)),1))=”\”)*ROW(INDIRECT(“1:”&1024))

In a blank cell say E32 type: =((MID($A1,ROW(INDIRECT(“1:”&1024)),1))=”\”)*ROW(INDIRECT(“1:”&1024)) and press F9 Not Enter

Excel will respond with:

={0;0;3;0;0;0;0;8;0;0;0;0;0;0;0;0;17;0;0;0;0;0;0;0;0;26;0;0;0;0;0;0;0;34;0; … ;0;0;0;0}

This array is the combination of the two arrays discussed above.

That is it is the 1’s and 0’s where the \’s are multiplied by the Row Numbers.

We can see that the positions where the \’s are have the position numbers listed

={0;0;3;0;0;0;0;8;0;0;0;0;0;0;0;0;17;0;0;0;0;0;0;0;0;26;0;0;0;0;0;0;0;34;0; … ;0;0;0;0}

The Large Function in the original Formula:

=LARGE(((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*ROW(INDIRECT("1:"&1024)),2)

Then extracts the second largest number, which in this case is the value 26

Advantages:

To assign the position number you need to know how many \’s are in the formula to start with

Disadvantages:

This is an Array Formula which some people struggle with

How the formula works is difficult to workout or explain.

 

Modified Formula

With a small modification you can introduce a constant to allow a variable n’th character to be retrieved without knowing how many \’s there were originally.

=LARGE(((MID($A1,ROW(INDIRECT("1:"&LEN(A1))),1))="\")*ROW(INDIRECT("1:"&LEN(A1))),LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))-4+1) Ctrl Shift Enter

Or

=LARGE(((MID($A1,ROW(INDIRECT("1:"&LEN(A1))),1))="\")*ROW(INDIRECT("1:"&LEN(A1))),LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))-B1+1) Ctrl Shift Enter

Where cell B1 contains 4

Where 4 is the 4th character you want to find the location for.

You can examine how the replacement of 2 with LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))-4+1 works

 

How Else Can You Solve Senthilkumar_rm’s Problem?

Your task is to find the location of the 4th Slash "\" in the text string: D:\Data\Personal\sramasam\desktop\Exceldata.xls

Can you solve Senthilkumar_rm’s problem another way?

Can you beat my 44 characters ?

Let us know in the comments below:

 

Download

You can download a copy of the above file and follow along, Download Here.

 

Formula Forensics “The Series”

This is the 21st post in the Formula Forensics series.

You can learn more about how to pull Excel Formulas apart in the following posts

Formula Forensic Series

 

Formula Forensics Needs Your Help

I need more ideas for future Formula Forensics posts and so I need your help.

If you have a neat formula that you would like to share and explain, try putting pen to paper and draft up a Post like above or;

If you have a formula that you would like explained, but don’t want to write a post, send it to Hui or Chandoo.

 

 

Written by Hui...
Tags: , , , , , , , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

38 Responses to “Formula Forensic No. 021 – Find the 4th Slash !”

  1. Jamie Bull says:

    I really like your substitute() method. Bearing in mind in the specific case here we know we're looking in a file path, you could replace the char(135) part with something that's not allowed in filenames.

    Something like =FIND("?",SUBSTITUTE(A1,"\","?",4))

    Ok, it's nothing new and prone to breaking if used in a different situation but yay, shorter! 35 chars. 

  2. Sunil says:

    FIND("/",SUBSTITUTE(A1,"\","/",4))

  3. Myles says:

    I also really like the SUBSTITUTE() option. I've used the first option in the past, the SUBSTITUTE() option simplifies the formula nicely.

    I'm not a fan of arrays due to their complexity for other users to pick up. Certainly to be avoided when there is a much simpler solution.

    Think Sunil just won the prize for the shortest solution! 

  4. Faseeh says:

    I personally liked Hui's Substitute() Option, but i believe that we should learn and understand every possible solution, might the concept apparently difficult be helpful somewhere else.
    Regards,
    Faseeh
     

    • Myles says:

      Faseeh,

      I absolutely agree that it is always worth trying (and sharing) as many different options as possible as you never know when they might come in handy for something else. 

      My comment was not supposed to be a criticism (& apologies if it came across that way), I just wanted to raise the point that arrays should be used with caution.

      Myles 

  5. Austinma says:

    Hi

    Personally I really like the find/substitute method as it seems a logical and simple solution.

    Out of interest, I was wondering how this information/read out might be applied.

    Having got the answer 26 what can I do with it or to but it another why would I need to know what the character is at position 26?

    Apologies if Senthilkumar_rm expanded on this in his original post.

    Keep up the good work.

    Regards

    Mark

  6. Ian Hinckley says:

    haha, beaten to it

    =FIND("*",SUBSTITUTE(A1,"/","*",4))

    if you want the last "/" rather than the 4th then use

    =FIND("*",SUBSTITUTE(A1,"/","*",LEN(A1)-LEN(SUBSTITUTE(A1,"/",""))))

    Have fun

    Ian 

  7. Myles says:

    Hi Mark,

    The position of a specific character is normally used to extract a specific piece of text from a text string.

    For eg. with D:\Data\Personal\sramasam\desktop\Exceldata.xls in cell A1

    Use Hui's formula to pull out the position of the 4th "/" as 26

    Use the Len() function to find the full string length  =Len(A1) = 47

    Finally use the Right() function to extract the text after the 4th "/" =RIGHT(A1,47-26) =  desktop\Exceldata.xls 

    Myles
     

    • Luke M says:

      Another way to extract a portion of a string is to create a giant gap, and then trim the excess:
      =TRIM(MID(SUBSTITUTE(A1,"\",REPT(" ",999),4),999,999)) 

    • Austinma says:

      @ Myles,
       
      Thanks for your message.
       
      I'm actually pleased that I came up with a scenario similar to the one you mention using the =RIGHT function. I thought about perhaps using the =MID function to extract some or all of the text. For example:
       
      =MID(A1,26,9)
       
      =\desktop\
       
      Thanks again,
       
      Mark
       

  8. Mike86 says:

    You could use the Small() function instead of Large().  That way finding the 4th smallest number is pretty straight forward.

    • Kyle McGhee says:

      Small would pull in the 0s returned in the array before returning the matches

    • Joe says:

      A little late on this one, but I liked what you were going for.

      It bugged me that Faseed found the second to last \ not really the fourth. Since the zeroes interfere with your suggestion to use the SMALL function, I tried finding the largest reciprocal like so:

      =1/(LARGE(((MID($A1,ROW(INDIRECT("1:"&1024)),1))="\")*(1/(ROW(INDIRECT("1:"&1024)))),4))

      Only 88 characters, vs Chandoo's modification that used 121.

  9. Pedro Wave says:

    Why to use a normal character as "?" or "/" to substitute "\"?
    I prefer to use a special character, the newline or line break or end-of-line or EOL or LF or CHAR(10), inserted into a cell when you press Alt+Enter
    The formula with only 33 visible chars:
    <code>=FIND("
    ",SUBSTITUTE(A1,"\","
    ",4))</code>

    • Jamie Bull says:

      Excel still counts Alt+Enter as a character though. Try putting a ' before the formula and using LEN(). Yours, mine and Sunil's versions all come out as 35 chars.

      • Pedro Wave says:

        Jamie, I said that 33 chars are visible but, as you noted, really are 35 chars into my formula. The advantage is that the EOL character not often found among the chars to replace. Observe what happens if the "?" or any other substituting "\" is included in the string.

  10. Pedro Wave says:

    Write the last formula without <code> tags.

  11. Kyle McGhee says:

    You can wrap the array portion of the LARGE function in Faseeh's formula in an INDEX function to remove the CSE requirement
    so LARGE(INDEX(MID(.........),0,1),....)

  12. Ron Wallace says:

    I like the substitute method since it can be modified to:

    =IFERROR(FIND("/",SUBSTITUTE($G$5,$G$6,"/",$H$6)),"Not that many in the string!")
     
    and then you enter the desired character in cell G6 with the repetition in cell H6 which allow the search without changing the formula.

     

  13. Rahul says:

    If I want to find first "/" from the right wether there are 4 "/" or 3 "/" or 8 "/" char available in a string. What should we use.  I mean that is there any formula exist which start searching from the right.

    • Hui... says:

      @Rahul

      This is the same as finding the last \ from the left.

      The formula LEN(G5)-LEN(SUBSTITUTE(G5,"\",""))
      Calculates the number of occurrences of the Character \ in the string.

      =FIND("*",SUBSTITUTE(G5,"\","*",LEN(G5)-LEN(SUBSTITUTE(G5,"\",""))))
      Will do what you want.

       

  14. Kyle McGhee says:

    Here is what I came up with:
    "Character to Find" is in cell C1 - \
    "Occurrence" to Find is in cell C2
    "Max Occurrences" is in cell C3 - formula (below)
    "String to Search" is in cell C4
     
    =INDEX(SMALL(INDEX(LARGE(SplitChar*SplitArray,kArray),,1),kArray),C2)
    Names:
    SplitChar
    =MID(Sheet1!$C$4,SplitArray,1)=Sheet1!$C$1
    SplitArray
    =ROW(INDIRECT("1:"&LEN(Sheet1!$C$4)))
    kArray
    =ROW(INDIRECT("1:"&Sheet1!$C$3))
     
    and in cell C3 for max occurrences enter 
    =SUMPRODUCT(--SplitCHar)
     
    It is basically the same as Faseeh except 3 differences
    1/I use the INDEX function to return arrays for the LARGE and SMALL functions, so the formula does not need to be CSE entered.
    2/ I used Named Formulas to bypass CSE and for clarity
    3/I added a SMALL function to invert the results of the LARGE function, so when you want to find the 1,2,3,n occurrences you would search with 1,2,3,n, not n,3,2,1

  15. My question is the same as was asked earlier... why do you want to find the 4th slash. If it is so you can use it as a starting point for pulling out the 5th delimited field, the I think this mini-blog article of my would be of interest...

    Get Field from Delimited Text String

    • Hui... says:

      @Rick

      That's a neat formula

      I may well plagiarize it soon.

      Hui...

       

      • Hi Hui,

        I am glad you liked the formula... feel free to plagiarize it anytime

        If you haven't already done so, check out my other articles here....

        Rick Rothstein's Corner

        and perhaps you will find some other things to plariagize as well.

        Rick

      • Chandoo says:

        Lovely formula indeed. Bookmarked and will be using as needed 🙂

        • @Chandoo,

          And I am glad you liked the formula as well. In case you or Hui (or anyone else reading this thread) missed the link at the bottom of my article, I just wanted to draw your attention to my related article title "Get "Reversed" Field from Delimited Text String" which allows you to easily get, say, the next to last field in a delimited string without needing to know how many total items are in the string. NOTE... be sure to read the whole thread as one of the responders offered a far better (more concise) formula than the one I posted in my article.

          Rick

          • Jeff Weir says:

            Ohhh....these just make my head hurt, Rick. How the heck do they work? Got me completely baffled.
            I have a much longer formula that like yours also returns the nth delimiter. It's like a light-year long:
            =MID(input,FIND("|",SUBSTITUTE(Delimiter&input&Delimiter,Delimiter,"|",ROW(INDEX(A:A,Fieldnumber):INDEX(A:A,Fieldnumber+1)))),FIND("|",SUBSTITUTE(Delimiter&input&Delimiter,Delimiter,"|",ROW(INDEX(A:A,Fieldnumber):INDEX(A:A,Fieldnumber+1))+1))-1-FIND("|",SUBSTITUTE(Delimiter&input&Delimiter,Delimiter,"|",ROW(INDEX(A:A,Fieldnumber):INDEX(A:A,Fieldnumber+1)))))

        • manish says:

          I didn't understand the  reason of using char(135).
          Please ellaborate

          • Hui... says:

            @Manish
            It is just a character that is unlikely to appear in "normal" Text
            You could just as easily have used an "e" if your text isn't going to contain an "e"

  16. manish says:

    Got it .. thanks

  17. [...] Of course there are many methods find where the last is. You can find a very excellent summary of these techniques in our formula forensics #21 – finding the 4th slash. [...]

  18. Achaibou Karim says:

    I have made a small VBA function which finds the last "char" in a text string. It can be easily adapted to give the nth location within the text string.

    Function findLastChar(text As String, char As String)
    For i = 1 To Len(text)
    If Mid(text, i, 1) = char Then
    findLastChar = i
    End If
    Next i
    End Function

  19. JonB says:

    well I believe the goal is to find the name of the file ...

    =MID(CELL("filename"),SEARCH("[",CELL("filename"))+1, SEARCH("]",CELL("filename"))-SEARCH("[",CELL("filename"))-1)

  20. How high do you want to go? For the 2nd you can use

    =FIND("-",A1,FIND("-",A1)+1)

    but that would get longer for 3rd or 4th

    For 3rd you could use this

    =FIND("^^",SUBSTITUTE(A1,"-","^^",3))

    change 3 to whatever instance you need

    from http://www.mrexcel.com/forum/excel-questions/473844-find-second-dash-cell.html

Leave a Reply