• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

VBA to convert .pdf to .txt

Hi All

I have a problem at work where a lot of files I require in an excel format are already saved as PDFs, and the original excel files have been deleted (and are not retrievable).

If this was just for a couple of files, I would manually convert these files into text (so that I can then manipulate the data in Excel), but there are nearly a thousand files that I would need to do this for, so obviously I am in need of a macro.

Here is what I have so far:

Code:
                Sub convertpdf2()

                Dim AcroXApp As Acrobat.AcroApp
                Dim AcroXAVDoc As Acrobat.AcroAVDoc
                Dim AcroXPDDoc As Acrobat.AcroPDDoc
                Dim Filename As String
                Dim jsObj As Object
                Dim NewFileName As String

                Filename = "C:\Documents and Settings\xxx\Desktop\file01.pdf"
                NewFileName = "C:\Documents and Settings\xxx\Desktop\file.txt"

                Set AcroXApp = CreateObject("AcroExch.App")
                'AcroXApp.Show

                Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
                AcroXAVDoc.Open Filename, "Acrobat"

                Set AcroXPDDoc = AcroXAVDoc.GetPDDoc


                Set jsObj = AcroXPDDoc.GetJSObject


                jsObj.SaveAs NewFileName, xlTextWindows



                AcroXAVDoc.Close False
                AcroXApp.Hide
                AcroXApp.Exit

                End Sub
Now the problem is, upon running the code, the last line:

Code:
 jsObj.SaveAs NewFileName, xlTextWindows
gives me an error: Run-Time error '1001' UnsupportedValueError: Value is unsupported. ===> Parameter cConvID.

Now if I change the "NewFileName" to end in .pdf (instead of .txt) and then remove the "xlTextWindows" in the last line of code (essentially saving the pdf as pdf and thus not converting anything) then the code works fine, and the pdf is now saved as pdf under the new name of "file" rather than "file01".

Due to this, I think the problem lies in the .txt part of the code, and I dont think I am missing any Excel VBA references as the code works if I dont try to save as text.

Does anybody have any clue how to fix this?

Once its fixed, I can just put in a loop to go through all the files in the folder and convert them all to text, putting them into a different folder. After that, I can make another macro that goes into the text files and copies over the data I need.

Thanks,

EJ
 
Solved :)

Sub convertpdf2()

Dim AcroXApp As Acrobat.AcroApp
Dim AcroXAVDoc As Acrobat.AcroAVDoc
Dim AcroXPDDoc As Acrobat.AcroPDDoc
Dim Filename As String
Dim jsObj As Object
Dim NewFileName As String

Filename = "C:\Documents and Settings\xxx\Desktop\file01.pdf"
NewFileName = "U:\file.txt"

Set AcroXApp = CreateObject("AcroExch.App")
'AcroXApp.Show

Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
AcroXAVDoc.Open Filename, "Acrobat"

Set AcroXPDDoc = AcroXAVDoc.GetPDDoc


Set jsObj = AcroXPDDoc.GetJSObject


jsObj.SaveAs NewFileName, "com.adobe.acrobat.plain-text"


AcroXAVDoc.Close False
AcroXApp.Hide
AcroXApp.Exit

End Sub
 

SirJB7

Excel Rōnin
Hi, exceljockey!
Glad you solved it. Thanks for sharing your solution with the community, so as people who read this could have the issue or question and the solution or answer as well.
Regards!
 

lucywill

New Member
awesome job, this question has been solved. and i am very happy to see these sample codes to convert pdf to text. and i just want to ask one question, can i directly convert pdf to excel, will this be much simpler?
 

rsalois

New Member
As mentioned multiple times ago above here: you do need to purchase the full Acrobat license instead of trying to handle that with Reader only - it will save you a lot of time. But if the price is the matter of concern, you may also to try so using some less expensive products like this one e.g. https://ds11.pdffiller.com It's not a software solution but online, so I can't claim it shall work in your very case for good, but it did so in the one of mine
 
Top