Extract file name from full path using formulas

Posted on October 23rd, 2012 in Excel Howtos , Formula Forensics - 47 comments

Today lets tackle a very familiar problem. You have a bunch of very long, complicated file names & paths. Your boss wants a list of files extracted from these paths, like below:

Extracting file names from full path using Excel formulas - how to?

Of course nothing is impossible. You just need correct ingredients.

What we need to extract file names from full path text - Excel formulas

I cannot help you with a strong cup of coffee, so go and get it. I will wait…

Back already? well, lets start the formula magic then.

Extracting file name from a path

If you observe the file paths carefully, to extract the file name, we need to know,

  • Position of last \ in the full path text

Of course there are many methods find where the last \ is. You can find a very excellent summary of these techniques in our formula forensics #21 – finding the 4th slash.

Today, let us see a new technique (well, sort of).

Finding the position of last \ using formulas

Before writing any formula, first let me clarify the only assumption:

  • File path is in cell B4

Now, last \ is nothing but first \ when read from right.

Read that line again.

Got it? Good, lets move on.

How do we find the first \ from right?

If we can list down all individual characters from path right to left, then we just have to find the first \ in that.

Listing down individual characters from a given text

To get 5th character from text in B4, we can use MID formula like this:

=MID(B4,5,1)

Suppose you want both 5th and 6th characters from B4, you can use:

=MID(B4,{5,6},1)

This formula returns an array of 5th and 6th characters from the text in B4.

Cool, extending the logic, =MID(B4, {6,5},1) would give 6th & 5th characters in B4.

Idea!

If we can replace {6,5} with decreasing numbers starting from length of text B4 all the way to 1, then we can list all characters in B4, right to left.

But this leads us to next problem – listing numbers from a specific value (length of B4) to 1 in descending order.

Listing numbers from n to 1 in that order

We can use ROW() formula to generate sequence of numbers like this:

=ROW(1:10) will give {1,2,3…,10}

note: this returns an array, so you need to use it with Ctrl+Shift+Enter

So if we can use =ROW(1:LEN(B4)) we could get numbers from 1 to length of text in B4 {1,2….LEN(B4)}

Unfortunately this will not work as 1:LEN(B4) is not a valid reference.

But we can fix that with INDIRECT, like this:

=ROW(INDIRECT(“1:” & LEN(B4)))

Tip: INDIRECT formula lets you construct a reference by using values in other cells as shown above.

Alternative: You can also use OFFSET to get the same result like this: =ROW(OFFSET($A$1,,,LEN(B4))). More on OFFSET here.

But wait…

So far, we have only generated numbers from 1 to n. But we need numbers from n to 1.

No sweat, we just subtract the numbers {1,2…n} from n+1 to get the list {n,n-1,n-2….2,1}

Like this:

=LEN(B4)+1 – ROW(INDIRECT(“1:” & LEN(B4)))

Using these numbers to list characters in file path in reverse order

Take a sip of that coffee, its getting cold!

Now, lets integrate our numbers in to MID like this:

=MID(B4, LEN(B4)+1 – ROW(INDIRECT(“1:” & LEN(B4))), 1)

The blue portion gives you numbers {n…2,1}

The orange portion gives you letters from right to left.

But we wanted the last \

Oh right. We do not need these letters from right to left. We instead want to find the last \ in our file path. So now we just ask Excel where the first \ is in this reversed text.

=MATCH(“\”, MID(B4, LEN(B4)+1 – ROW(INDIRECT(“1:” & LEN(B4))), 1), 0)

Blue portion gives you letters in reverse order

Orange portion finds the first \ in that.

Tip: Learn more about MATCH formula.

Extract the file name

Once you know where the last \ is, finding the file name is easy.

use =MID(B4, position_of_last_slash + 1, LEN(B4))

We need to +1 because we do not want the slash in our file name.

Demo of the entire formula in action

Okay, lets see all these steps in action in one go.

Extract file name from full path using Excel formulas - Demo

How to find the extension?

Extension is few letters added at the end of file to indicate its type. For example, excel files usually have xls, xlsx, xlsm as extension.

So how to find this extension?

Extension & file name are separated by a dot .

But often file name itself can have a dot.

In other words, Extension is text in the file name followed by last dot.

Sounds like same problem as finding the last \ and extracting file name. So I will skip the details.

But assuming the file name is in D4, extension can be found with =RIGHT(D4,MATCH(“.”,MID(D4,LEN(D4)-ROW(INDIRECT(“1:”&LEN(D4))),1),0))

NOTE on both formulas

Both file name & extension formulas are array formulas. This means after typing them, you need to press Ctrl+Shift+Enter to see correct result.

Bonus tip: Getting the file names & path from a folder

If you ever want to list down all files in a folder use this.

  1. Open command prompt (Start > Run > Cmd or Start > Cmd)
  2. Go to the folder using CD
  3. Type DIR /s/b >files.csv
  4. Close command prompt

Now you can see all the files in that folder in files.csv. Double click on it to open in Excel and run your magic :)

Download Example workbook

Click here to download the example workbook. The file uses slightly different formulas. But works just the same. Examine it and learn more.

How do you extract file names & as such?

Do you use formulas or do you rely on some other technique to extract portions of text like file names, mail addresses etc. Please share your tips & ideas using comments.

Extract often? You will dig this.

Analysts life is filled with 3 Es – extraction, exploration & explanation. And like a good assistant, Excel helps you in all 3.

If you find yourself with a shovel, bucket and boat load of data often, you are going to enjoy these articles:

Your email address is safe with us. Our policies

Written by Chandoo
Tags: , , , , , , , , , , , ,
Home: Chandoo.org Main Page
? Doubt: Ask an Excel Question

47 Responses to “Extract file name from full path using formulas”

  1. Akash Khandelwal says:

    How about using Text to column function present in Data tab of ribbon.
    Its quick and simple. ;)

  2. Iain says:

    I suggest doing a find and replace. Enter *\ in the find box, and nothing in the replace box. Do replace all and you’ll be left with the filename and extension

    • Akash Khandelwal says:

      Dats a Smart trick.. Thanx.. :)

    • Chandoo says:

      Awesome trick Iain… Thanks for sharing it :)

      • Gaurang M says:

        Hi Chandoo,

        As Iain suggested in the above post, that use Find and replace option, which is obviously a smart trick, but if i have the below transaction data with me:
        ”RTGS/SCINH13045648230/ABC INDUSTRIES INDIA LIMITED”

        and i want to extract ”SCINH13045648230″ from the above transaction and i use the above trick (Find and replace), it gives me ”ABC INDUSTRIES INDIA LIMITED” instead of ”SCINH13045648230″. Thus, this time the trick does not work since there are 2 forward slash.

        I have checked the MID/FIND formulas in various post, but i am unable to understand it since its going over my head :d

        Can you give me a solution for the above, if i want to extract only ‘SCINH13045648230″ from the above transaction.???

        Wish you a happy new year, i know its too late to wish you a happy new year, but i logged in late on this site thus wishing you late.

        Thanks in advance!

        Regards,
        Gaurang.

        • Rudra Sharma says:

          Hi Gaurang,
          I entered RTGS/SCINH13045648230/ABC INDUSTRIES INDIA LIMITED in Cell A1.
          In B1 I entered this formula:
          =SUBSTITUTE(A1,”/”,”#”,2)
          which gave me
          RTGS/SCINH13045648230#ABC INDUSTRIES INDIA LIMITED
          in C1, I entere this formula:
          =MID(B1,FIND(“/”,B1)+1,FIND(“#”,B1)-6)
          which gave me final output.
          You can enter a comined formula to get the same result.
          Here is combined one:
          =MID(SUBSTITUTE(A1,”/”,”#”,2),FIND(“/”,SUBSTITUTE(A1,”/”,”#”,2))+1,FIND(“#”,SUBSTITUTE(A1,”/”,”#”,2))-6)

        • Rudra Sharma says:

          Hi Gaurang,
          I entered RTGS/SCINH13045648230/ABC INDUSTRIES INDIA LIMITED in Cell A1.
          In B1 I entered this formula:
          =SUBSTITUTE(A1,”/”,”#”,2)
          which gave me
          RTGS/SCINH13045648230#ABC INDUSTRIES INDIA LIMITED
          in C1, I entered this formula:
          =MID(B1,FIND(“/”,B1)+1,FIND(“#”,B1)-6)
          which gave me final output.
          You can enter a combined formula to get the same result.
          Here is combined one:
          =MID(SUBSTITUTE(A1,”/”,”#”,2),FIND(“/”,SUBSTITUTE(A1,”/”,”#”,2))+1,FIND(“#”,SUBSTITUTE(A1,”/”,”#”,2))-6)

          • Gaurang M says:

            Hi Rudra,

            Thanks alot for your help :)

            Regards,
            Gaurang

          • Gaurang M says:

            Hi Rudra,
            The substitute formula which you have typed above contains an error coz whenever i try to put that substitute formula in B1 cell, it reflects error thus i am getting stuck and not able to get solution
            after that.

            Also, can you explain me why you have put the below formula
            MID(B1,FIND(“/”,B1)+1,FIND(“#”,B1)-6)and why +1 and -6 is used.

            Thanks in advance :)
            Regards,
            Gaurang

  3. Awesome post Chandoo!

    I’ll need to get more coffee after that!

    @Iain:

    Really smart trick! 

  4. ahamed says:

    @ Iyan : Can you pls explain little, the logic behind your suggestion. 
     
    @ Chandoo Its nice and bit complex …. still i’m trying to figure it out

  5. Chandoo says:

    @All.. here is the solution described by Iain.

    Extract by find replace - Demo

  6. Luke M says:

    Here’s a non-array formula:
    =TRIM(RIGHT(SUBSTITUTE(B4,”\”,REPT(” “,999)),999))

    Puts giant spaces in where all the slashes were, then trims the output. 

    @Chandoo
    How did you post a picture in the comments? Is that a Keymaster-only feature, or is there someway we can do that? 

    • Rasheed says:

      Dear Luke, to add picture on commnet do follow..

      Insert comment

      right click on the edg of comment box

      Format comment

      Go to colors and lines in titles

      Click fill effects in below of color

      select picture in pictur header

      Press OK

  7. Darin says:

    I can’t beat the simplicity of Luke’s solution, but I do have a non-array solution to contribute. Building off of one of your solutions for finding the 4 slash, I added in a calculation to determine the number of slashes in the path, and substituted that for the 4. 

    =MID(B4,FIND(CHAR(135),SUBSTITUTE(B4,”\”,CHAR(135),LEN(B4)-LEN(SUBSTITUTE(B4,”\”,””))))+1,999)

    The section in bold is calculating the total length of the path and subtracting out the length of the path with no slashes.

    Now that we know the total number of slashes, we can substitute that instance of the slash with char(135) and find char(135).

    Adding one to the position and returning from that position forward gives the path.

    • Jamie Bull says:

      How about this. It counts the number of slashes, replaces the last occurrence with null “” (since it’s not allowed in file paths in either Windows or *nix) then takes the characters to the right.

      =RIGHT(B2,LEN(B2)-FIND("", SUBSTITUTE(B2,"\","",LEN(B2)- LEN(SUBSTITUTE(B2,"\","")))))

      • Jamie Bull says:

        Seems no matter what I post here, I run up against the formatting on this comment system! Is there an escape character I can use to escape before “backslash zero” representing null?

  8. Johnny says:

    Sorry if this is remedial, but when I try to perform the first part:
    Suppose you want both 5th and 6th characters from B4, you can use:=MID(B4,{5,6},1)
    This formula returns an array of 5th and 6th characters from the text in B4.
    Cool, extending the logic, =MID(B4, {6,5},1) would give 6th & 5th characters in B4. 

     It neither returns the 5th and 6th characters or the 6th and 5th…only the 5th or 6th character, respectively. What am I doing wrong?

    • Chandoo says:

      The formula =MID(B4, {6,5}, 1} is an array formula. So when you enter it in a cell, you will only see one of the 2 values. To see both, just select the cell (F2) and select the formula and press F9.

      Another way to test whether this works is to wrap it in INDEX like this:

      =INDEX(MID(b4, {6,5}, 1), 1) will give you 6th char

      =INDEX(MID(b4, {6,5}, 1), 2) will give you 5th char

  9. Wookiee says:

    I have been trying to follow the example technique described in your post (I’m using my own blank spreadsheet with nothing more than a file path and file name in Cell B4). But I’m having problems with the MID command and using an array inside of a formula. I’d download the example file, but for two factors. One: I’m at work and not allowed to do such things, and two: I want to key the formulas from scratch so that I know I can repeat the process later on.

    First off, I’m not sure how to even enter an array inside of the MID formula. I have used array formulas in the past, but not often. And the entire formula has always been entered as an array; never just a segment inside of the formula.

    Furthermore, I’m not sure how or why one would get an array of 2 characters when the last portion of the MID formula specifies to return only 1 character.

    Please, someone help explain this to me so that I may delve further into this powerful technique and add it to my arsenal of Excel tips and tricks!

    Thank you.

    • Chandoo says:

      Hi Wookie,

      Thanks for your comment and interest in learning more.

      The array in =MID(B4, {6,5}, 1) helps you get a list of 2 values 6th character, 5th character.

      This is useful when you want to split a text in to individual letters and do some processing on it.

      Please read my reply to Johnny above to understand how to use it.

      • Wookiee says:

        Thanks, Chandoo! In my defense, Johnny’s question didn’t appear here when I responded to this thread. But I’m glad you responded to his question and, in doing so, clarified the issue for me.

        My confusion lied in the nested array. I thought the entire formula would have to be entered using CTL+SHIFT+ENTER; I hadn’t realized you could actually type the curly brackets to use an array within in a “regular” formula.

        One step down! Now to get cracking on the rest of that formula.

        PS. On a semi-related note, the posts I’ve made on this thread don’t seem to count towards my user post count. I’m not one to try and rack up “post points”, but I do like having the ability to go into my post history sometimes to review questions I’ve had or advice I’ve been able to offer others. I’ve been logged in under my User ID each time I replied here, so I wasn’t sure if this forum is classified differently? Please advise.

  10. Daniel Ferry says:

    I do love formula challenges!

    But in this case I think a UDF is a wise choice, especially since the object is to parse both the file name and the extension.

    The UDF can return both in one shot as an array formula that can be array-entered over both cells, D4:E4 for example:

    =ParseFile(B4) f

    Here is the UDF:

    <code>
    Public Function ParseFile(sPath As String) As Variant
        ParseFile = Array(Mid$(sPath, 1 + InStrRev(sPath, “\”)), Mid$(sPath, 1 + InStrRev(sPath, “.”)))
    End Function
    </code> 

  11. Ninad says:

    Learning something every day. Thanks guys.

  12. PPH says:

    <code>
    Function ExtractFileName(filespath) As String

    Dim x As Variant
    x = Split(filepath, Application.PathSeparator)
    ExtractFileName = x(UBound(x))

    End Function
    </code> 

    • PPH says:

      This works like a charm, and to give an example of how I use this, the following code will use the function to extract a value from a closed workbook (you never need to open it).  What I do in my workbook is allow the user to select an excel file that has a predetermined named range in it, this sub pulls out the value assigned to that range and allows the user to compare it to a value in the open workbook:

      <code>
      Sub Compare_Reports()

      Dim FileSelect As String
      FileSelect = Application.GetOpenFilename(“Excel files (*.xlsb), *.xls”)
      ExtractFileName (FileSelect)
      File = ExtractFileName(FileSelect)
      Path = Left(FileSelect, Len(FileSelect) – Len(F))
      Sheet = “SheetName Here”
      Range = “Named Range Here”
      ActiveSheet.Range(“A1″).Value = “='” & P & “[” & F & “]” & S & “‘!” & r
      End Sub 

  13. Alan Harris says:

    I have enjoyed your Excel tips, but I cannot let this one go without a comment: it is completely unrealistic. Like a similar problem from a few weeks ago: “If I was going there, I wouldn’t start from here!”
    1 How does this file data get into Excel? It must have been typed there. If so, then get the person to type it correctly and put each sub-folder into a different column
    2 If it was imported into Excel, then why not use search and replace to replace the backslashes with semi-colons and then import it?
    3 But the biggest one is: why not get your data sorted into meaningful folders in the fiorst place. I cannot beliebve that (in 2012) anyone ion the world would have suchj a bizarere array of folders.
    4 And finally: how did anyone get hold of the list to start with? To get hold of the list of filenames, you had to know the filenames, so why not just write down the filenames?

    • Wookiee says:

      Alan, it may be 2012, but it’s not particularly unrealistic to assume that someone could receive data in such an unfriendly format. I think that any of us who make a living using Excel can cite several examples of files with data in exceptionally unfriendly formats. And no amount of whining, pleading, or cajoling will enable me to force others to send me files in (what I consider to be) a logical format. It’s the ability to break down the data and extract the portions which we specifically need that enables us to excel with Excel.

      For instance, I have no specific need to perform the task described in this tip, but I find the formulæ and methods used to do so enlightening, and I’ll be happy to incorporate them into my repertoire so that next time I need to extract data of a similar nature, I’ll be better prepared to do so.

      • Glenn says:

        I have to concur with Wookie and have a real world example to illustrate. We often receive music cue sheets in Excel format (or image PDF that has to be converted to Excel using OCR) in which rather than listing the song titles, the client will list the file (including its path) as it occurs on their music database. The method proposed by Chandoo and others posted here would definitely be applicable to extracting the song titles from the file path strings.
        Thanks everyone.
         

  14. zur says:

    COULD ANYBODY HELP ME TO PASTE THE DOMO IN WORD FILE?

  15. […] thing about Excel is that you can do the same thing in several ways. Our yesterdays problem – Extracting file name from full path is no different. There are many different ways to do it, apart from writing a formula. Learn these […]

  16. Robert Clark says:

    I got this from the Mrexcel website. A slight variation on Darin’s solution

    Assuming that the full filename is in A2, and that you want the filename in column b and extension in column c,

    in B2:  =RIGHT(A2,LEN(A2)-SEARCH(“@”,SUBSTITUTE(A2,”\”,”@”,LEN(A2)-LEN(SUBSTITUTE(A2,”\”,””)))))

    in C2:  =RIGHT(B2,LEN(B2)-SEARCH(“@”,SUBSTITUTE(B2,”.”,”@”,LEN(B2)-LEN(SUBSTITUTE(B2,”.”,””)))))

  17. Kiev says:

    Oh, i really love the bonus tips. it is amazing and let me recall my memory of good old days of DOS.

  18. Rudra says:

    Hi,
    What happens if you change the positon of Asterisk(*) in find what: slot? 

  19. Zaigham says:

    Comparatively a short formula to find the file name

    =TRIM(RIGHT(SUBSTITUTE(A1,”\”,REPT(” “,255)),255))

  20. whoah this blog is great i really like studying your articles.
    Stay up the great work! You already know, lots of
    people are looking round for this info, you can aid them greatly.

  21. Andy Forrester says:

    Here is a User Defined Function based solution

    ‘parses long filename held in Adjacent cell to the left & returns only the file name
    Function Get_FName()
    Application.Volatile

    Dim Temp, Temp2, Ans As String
    Dim pos As Long

    ‘Reverse string such that last “\” becomes the first!
    Temp = StrReverse(Application.ThisCell.Offset(, -1))

    ‘find the first position of the “\”
    pos = InStr(Temp, “\”)
    ‘extract the [reversed] filename
    Temp2 = Mid(Temp, 1, pos – 1)
    ‘put it back to correct order
    Get_FName = StrReverse(Temp2)
    End Function

  22. Gaurang says:

    Hi Chandoo,
    First of all i would like to thank you for the wonderful teachings you are offering on your website.i have learned many formulas through your site as well as dashboard and also blessed.

    Now coming to the point, In “Listing down individual charecters from the text” section. i have one query you have mentioned that if you want both 5th and 6th characters from B4, you can use:
    =MID(B4,{5,6},1)

    but, when i applied the above formula it showed only 1st charecter.
    For eg: when i typed RAMESH in Column A1, the formula which i wrote is =MID(A1,{5,6},1) So, after putting this formula i should get S (5th) and H (6th) charecter. But, it gave me only S which is 5th charecter.
    So just wanted to know whether i have entered the correct formula or not.

    Thanks in advance for your help. :)

    Regards,
    Gaurang.

    • Rudra says:

      Hi Gaurang
      before you enter your forumla, select two cells. Enter your formula as Array formula i.e hit Ctrl+Shift + Enter together. The Cells should show 5th and 6th characters.

      With Regards
      Rudra

  23. Gaurang says:

    Hi Chandoo,
    First of all i would like to thank you for the wonderful teachings you are offering on your website.i have learned many formulas through your site as well as dashboard and i am really blessed for having such a site, because i dint get any help from my friends for excel even after pleeding but your site is like a gift by god to me thus now i dont go towards my friends for help who think they are experts.

    Now coming to the point, In “Listing down individual charecters from the text” section. i have one query you have mentioned that if you want both 5th and 6th characters from B4, you can use:
    =MID(B4,{5,6},1)

    but, when i applied the above formula it showed only 1st charecter.
    For eg: when i typed RAMESH in Column A1, the formula which i wrote is =MID(A1,{5,6},1) So, after putting this formula i should get S (5th) and H (6th) charecter. But, it gave me only S which is 5th charecter.
    So just wanted to know whether i have entered the correct formula or not.

    Thanks in advance for your help. :)

    Regards,
    Gaurang.

  24. Mithun Pandey says:

    I love with array forumale hence pasting my solution :) almost very similar to what is suggested above

    MID(path,2+LEN(path)-MATCH(“\”,MID(path,LARGE(ROW(INDIRECT(“1:”&LEN(path))),ROW(INDIRECT(“1:”&LEN(path)))),1),0),MATCH(“\”,MID(path,LARGE(ROW(INDIRECT(“1:”&LEN(path))),ROW(INDIRECT(“1:”&LEN(path)))),1),0)+1)

    path is the cell containing the path

  25. derikogay says:

    Well, you can use Long Path Tool for such problems.

Leave a Reply