• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Standardizing filenames based on a list of names and accounts.

Countderp

New Member
I need to standardize filenames and also add account numbers to the end of each name.

Currently, there are 20 directories, with pdf files in each directory. The names are currently in the form of "Lastname, Firstname Date.pdf", such as "Smith, John 100112.pdf". However, these vary in several ways. The name portion can vary by "Smith, John", "Smith, J.", "Smith J" "Smith,J.". The Date portion is in MMDDYY format, varying by "010112", "01.01.12", and "1.01.2012". Some files have the word "sig" at the end, and they are not to be renamed (e.g. "Smith, J. 010112 sig.pdf". I'm figuring on standardizing these to "Smith, J. 101012", but if there's a better idea then I'm open to it.

Now, on top of standardizing these, I need to append the account number to the end of the filename, so it would show "Smith, J. 010112_1111.pdf". There is a table of names in a spreadsheet, with Account in the A column, Lastname in B, Firstname in C.

Any ideas on what I can do here?


I'm not very familiar with Excel, I was picked to do this because of my experience with Java and C back in college.
 
Hi, Countderp!


First of all welcome to Chandoo's website Excel forums. Thank you for your joining us and glad to have you here.


As a starting point I'd recommend you to read the three first green sticky topics at this forums main page. There you'll find general guidelines about how this site and community operates (introducing yourself, posting files, netiquette rules, and so on).


Among them you're prompted to perform searches within this site before posting, because maybe your question had been answered yet.


Feel free to play with different keywords so as to be led thru a wide variety of articles and posts, and if you don't find anything that solves your problem or guides you towards a solution, you'll always be welcome back here. Tell us what you've done, consider uploading a sample file as recommended, and somebody surely will read your post and help you.


And about your question...


If you haven't performed yet the search herein, try going to the topmost right zone of this page (Custom Search), type the keywords "data standardization issue" and press Search button. You'd retrieve many links from this website, like the following one(s), maybe you find useful information and even the solution. If not please advise so as people who read it could get back to you as soon as possible.


http://chandoo.org/forums/topic/how-do-i-detect-count-or-list-all-missing-records#post-16796

http://chandoo.org/forums/topic/how-do-i-detect-count-or-list-all-missing-records#post-16813


A couple of additional comments/questions:

a) how many files are we talking about?

b) there is sort of utility that might be helpful just for analyzing file names

http://chandoo.org/forums/topic/excel-files-documenting-inspector


Regards!
 
I ran out of time to add the account number (should be a simply lookup), but here's the code that I wrote so far.

[pre]
Code:
Sub RenameFiles()
Dim FileName As String
Dim LName As String
Dim FInitial As String
Dim NewName As String, OldName As String
Dim xDate As String
Dim MyPath As String, FilesInPath As String
Dim FNum As Long

'Directory looping code copied from:
'http://www.rondebruin.nl/copy3.htm

'Fill in the pathfolder where the files are
MyPath = "C:UsersRontest"

'Add a slash at the end if the user forget it
If Right(MyPath, 1) <> "" Then
MyPath = MyPath & ""
End If

'If there are no pdf files in the folder exit the sub
FilesInPath = Dir(MyPath & "*.pdf")
If FilesInPath = "" Then
MsgBox "No files found"
Exit Sub
End If

FNum = 0
Do While FilesInPath <> ""
FNum = FNum + 1
ReDim Preserve myfiles(1 To FNum)
myfiles(FNum) = FilesInPath
FilesInPath = Dir()
Loop

If FNum > 0 Then
For FNum = LBound(myfiles) To UBound(myfiles)

FileName = myfiles(FNum)

'This is the section of code that manipulates the string into
'the correct syntax for you to use
If UCase(Right(c, 7)) = "SIG.PDF" Then
NewName = c.Value
Else
n = Split(FileName, ",")
LName = n(0)
FInitial = Left(Trim(n(1)), 1) & ". "
m = Split(Trim(n(1)), " ")
xDate = Left(m(1), Len(m(1)) - 4)
If InStr(xDate, ".") = 0 Then
xDate = Left(xDate, 2) & "." & Mid(xDate, 3, 2) & "." & Mid(xDate, 5, 4)
End If
xDate = WorksheetFunction.Substitute(xDate, ".", "/")

xDate = Format(DateValue(xDate), "mmddyy")
NewName = LName & ", " & FInitial & xDate & ".pdf"

End If
OldName = MyPath & myfiles(FNum)
NewName = MyPath & NewName

'This is what will change the name
Name OldName As NewName

Next FNum
End If

End Sub
[/pre]
 
Thank you for your responses!


@Sir I'm dealing with 6000 files here. I have actually received a formula that involves fixing the dates, and your script for importing filenames is invaluable, thank you for that. However, I still need to find a way to compare the last name and first initial, in all of its variations, against the table of names with accounts.


@Luke: Thank you for the script!

I just ran it and got an error that the variable isn't defined, and it highlights "ReDim Preserve myfiles(1 To"
 
Hi, Countderp!


Glad it helped you. About the comparison, could you upload a sample file? Better indeed the entire file.


For fixing the error in Luke M's code, I'll try to make a blind shoot:

a) insert this lines at the end of Dim statements

-----

Dim myfiles() As String

Dim n As Variant, m As Variant

-----

b) replace "c" and "c.Value" for "FileName" as follows:

-----

' If UCase(Right(c, 7)) = "SIG.PDF" Then

If UCase(Right(FileName, 7)) = "SIG.PDF" Then

' NewName = c.Value

NewName = FileName

-----


With these changes the code compiles ok, but I neither tested nor run it, so... no warranties at all.


Regards!
 
Thank you! It does compile, but it gets a subscript out of range error at


FInitial = Left(Trim(n(1)), 1) & ". "


Unfortunately I have no idea how to debug this. And I can't upload the original since first and last names are private info. I'll see if I can make a sample with different names.
 
Hi, Countderp!


Please print, copy and paste the values of:

FNum, FileName, myFiles(FNum), UBound(n).


Regards!


PS: this looks like a remote batch debug... just missing postal mail.
 
Thanks SirJB7 for the cleanup. I was in a hurry, and did not have time to check everything. =/


When I run the code myself on some sample files, I do not get the subscript error. Two big assumptions:

1. There are no other pdf files in the folder that we don't want to look at

2. all the naming conventions were stated above. I have a feeling this is the one causing problem...do some of the files use underscores instead of spaces?
 
Hi, Luke M!


Nice idea too, I'll incorpore the IWIAH to the CASFFML... :)


And about the error, I thought that there might be filenames without extension, ergo without commas "," so the split would return only one element. Or maybe there are subfolders (without commas in their names too)... Now that I'm writing, I'd bet for this last option primarily.


Regards!
 
Back
Top