1. Welcome to Chandoo.org Forums. Short message for you

    Hi Guest,

    Thanks for joining Chandoo.org forums. We are here to make you awesome in Excel. Before you post your first question, please read this short introduction guide. When posting or responding to questions please remember our values at Chandoo.org are: Humility, Passion, Fun, Awesomeness, Simplicity, Sharing Remember that we have people here for whom English is not there first language and we need to allow for this in our dealings.

    Yours,
    Chandoo
  2. Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  3. When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

VBA to convert .pdf to .txt

Discussion in 'VBA Macros' started by exceljockey, Jan 8, 2014.

  1. exceljockey

    exceljockey Member

    Messages:
    33
    Hi All

    I have a problem at work where a lot of files I require in an excel format are already saved as PDFs, and the original excel files have been deleted (and are not retrievable).

    If this was just for a couple of files, I would manually convert these files into text (so that I can then manipulate the data in Excel), but there are nearly a thousand files that I would need to do this for, so obviously I am in need of a macro.

    Here is what I have so far:

    Code (vb):


                    Sub convertpdf2()

                    Dim AcroXApp As Acrobat.AcroApp
                    Dim AcroXAVDoc As Acrobat.AcroAVDoc
                    Dim AcroXPDDoc As Acrobat.AcroPDDoc
                    Dim Filename As String
                    Dim jsObj As Object
                    Dim NewFileName As String

                    Filename = "C:\Documents and Settings\xxx\Desktop\file01.pdf"
                    NewFileName = "C:\Documents and Settings\xxx\Desktop\file.txt"

                    Set AcroXApp = CreateObject("AcroExch.App")
                    'AcroXApp.Show

                    Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
                    AcroXAVDoc.Open Filename, "Acrobat"

                    Set AcroXPDDoc = AcroXAVDoc.GetPDDoc


                    Set jsObj = AcroXPDDoc.GetJSObject


                    jsObj.SaveAs NewFileName, xlTextWindows



                    AcroXAVDoc.Close False
                    AcroXApp.Hide
                    AcroXApp.Exit

                    End Sub

     
    Now the problem is, upon running the code, the last line:

    Code (vb):

     jsObj.SaveAs NewFileName, xlTextWindows
     
    gives me an error: Run-Time error '1001' UnsupportedValueError: Value is unsupported. ===> Parameter cConvID.

    Now if I change the "NewFileName" to end in .pdf (instead of .txt) and then remove the "xlTextWindows" in the last line of code (essentially saving the pdf as pdf and thus not converting anything) then the code works fine, and the pdf is now saved as pdf under the new name of "file" rather than "file01".

    Due to this, I think the problem lies in the .txt part of the code, and I dont think I am missing any Excel VBA references as the code works if I dont try to save as text.

    Does anybody have any clue how to fix this?

    Once its fixed, I can just put in a loop to go through all the files in the folder and convert them all to text, putting them into a different folder. After that, I can make another macro that goes into the text files and copies over the data I need.

    Thanks,

    EJ
  2. exceljockey

    exceljockey Member

    Messages:
    33
    Solved :)

    Sub convertpdf2()

    Dim AcroXApp As Acrobat.AcroApp
    Dim AcroXAVDoc As Acrobat.AcroAVDoc
    Dim AcroXPDDoc As Acrobat.AcroPDDoc
    Dim Filename As String
    Dim jsObj As Object
    Dim NewFileName As String

    Filename = "C:\Documents and Settings\xxx\Desktop\file01.pdf"
    NewFileName = "U:\file.txt"

    Set AcroXApp = CreateObject("AcroExch.App")
    'AcroXApp.Show

    Set AcroXAVDoc = CreateObject("AcroExch.AVDoc")
    AcroXAVDoc.Open Filename, "Acrobat"

    Set AcroXPDDoc = AcroXAVDoc.GetPDDoc


    Set jsObj = AcroXPDDoc.GetJSObject


    jsObj.SaveAs NewFileName, "com.adobe.acrobat.plain-text"


    AcroXAVDoc.Close False
    AcroXApp.Hide
    AcroXApp.Exit

    End Sub
    RAM72 likes this.
  3. SirJB7

    SirJB7 Excel R┼Źnin

    Messages:
    8,894
    Hi, exceljockey!
    Glad you solved it. Thanks for sharing your solution with the community, so as people who read this could have the issue or question and the solution or answer as well.
    Regards!
  4. lucywill

    lucywill New Member

    Messages:
    1
    awesome job, this question has been solved. and i am very happy to see these sample codes to convert pdf to text. and i just want to ask one question, can i directly convert pdf to excel, will this be much simpler?
  5. rumshar

    rumshar Member

    Messages:
    132
    What reference do you need to enable? I am getting error as attached.
    With Regards
    Rudra Error.png
  6. Nitesh Khot

    Nitesh Khot Member

    Messages:
    167
    Getting error for User defined type.

    What reference you have used for this...
  7. Chihiro

    Chihiro Well-Known Member

    Messages:
    3,675
    You'd need Adobe Acrobat (not just Reader) and reference it.
  8. Nitesh Khot

    Nitesh Khot Member

    Messages:
    167
    I have installed Adobe Acrobat Reader DC but not understand refrence...

    upload_2016-4-25_18-41-49.png
  9. Chihiro

    Chihiro Well-Known Member

    Messages:
    3,675
    As stated above. You need full Acrobat licence (paid). Not just DC Reader.

Share This Page