• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

VBA to convert .pdf to .txt

Status
Not open for further replies.
Hi All

I have a problem at work where a lot of files I require in an excel format are already saved as PDFs, and the original excel files have been deleted (and are not retrievable).

If this was just for a couple of files, I would manually convert these files into text (so that I can then manipulate the data in Excel), but there are nearly a thousand files that I would need to do this for, so obviously I am in need of a macro.

Here is what I have so far:

Code:
                Sub convertpdf2()

                Dim AcroXApp As Acrobat.AcroApp
                Dim AcroXAVDoc As Acrobat.AcroAVDoc
                Dim AcroXPDDoc As Acrobat.AcroPDDoc
                Dim Filename As String
                Dim jsObj As Object
                Dim NewFileName As String

                Filename = "C:\Documents and Settings\xxx\Desktop\file01.pdf"
                NewFileName = "C:\Documents and Settings\xxx\Desktop\file.txt"

                Set AcroXApp = CreateObject("AcroExch.App")
                'AcroXApp.Show

                Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
                AcroXAVDoc.Open Filename, "Acrobat"

                Set AcroXPDDoc = AcroXAVDoc.GetPDDoc


                Set jsObj = AcroXPDDoc.GetJSObject


                jsObj.SaveAs NewFileName, xlTextWindows



                AcroXAVDoc.Close False
                AcroXApp.Hide
                AcroXApp.Exit

                End Sub
Now the problem is, upon running the code, the last line:

Code:
 jsObj.SaveAs NewFileName, xlTextWindows

gives me an error: Run-Time error '1001' UnsupportedValueError: Value is unsupported. ===> Parameter cConvID.

Now if I change the "NewFileName" to end in .pdf (instead of .txt) and then remove the "xlTextWindows" in the last line of code (essentially saving the pdf as pdf and thus not converting anything) then the code works fine, and the pdf is now saved as pdf under the new name of "file" rather than "file01".

Due to this, I think the problem lies in the .txt part of the code, and I dont think I am missing any Excel VBA references as the code works if I dont try to save as text.

Does anybody have any clue how to fix this?

Once its fixed, I can just put in a loop to go through all the files in the folder and convert them all to text, putting them into a different folder. After that, I can make another macro that goes into the text files and copies over the data I need.

Thanks,

EJ
 
Solved :)

Sub convertpdf2()

Dim AcroXApp As Acrobat.AcroApp
Dim AcroXAVDoc As Acrobat.AcroAVDoc
Dim AcroXPDDoc As Acrobat.AcroPDDoc
Dim Filename As String
Dim jsObj As Object
Dim NewFileName As String

Filename = "C:\Documents and Settings\xxx\Desktop\file01.pdf"
NewFileName = "U:\file.txt"

Set AcroXApp = CreateObject("AcroExch.App")
'AcroXApp.Show

Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
AcroXAVDoc.Open Filename, "Acrobat"

Set AcroXPDDoc = AcroXAVDoc.GetPDDoc


Set jsObj = AcroXPDDoc.GetJSObject


jsObj.SaveAs NewFileName, "com.adobe.acrobat.plain-text"


AcroXAVDoc.Close False
AcroXApp.Hide
AcroXApp.Exit

End Sub
 
Hi, exceljockey!
Glad you solved it. Thanks for sharing your solution with the community, so as people who read this could have the issue or question and the solution or answer as well.
Regards!
 
awesome job, this question has been solved. and i am very happy to see these sample codes to convert pdf to text. and i just want to ask one question, can i directly convert pdf to excel, will this be much simpler?
 
I've tested the code and my PDF file opens but when the code gets to the line:
"jsObj.SaveAs NewFileName, "com.adobe.acrobat.plain-text" it freezes and the acrobat file is open in acrobat with a dialog box that says:
"File Open Failed"

My variables are as below

Filename = "C:\Users\Richard\Documents\Excel\PDFs_to_Convert\Universal Stmt Aug.pdf"
NewFileName = "C:\Users\Richard\Documents\Excel\PDFs_Converted_to_Xlsx\Universal Stmt Aug.txt"

In Tools References I do have the Acrobat 10.0 Library selected
Any suggestions as to why I'm getting this error


rreily
 
@rreily

Rather than replying to someone else's thread (especially an old one), you'll get more help if you start a new thread. You are welcome to link to other thread(s) for reference, but a new thread with 0 responses is more likely to catch the attention of other members, as they'll see that someone still needs help.
 
Status
Not open for further replies.
Back
Top