• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Macro to extract text from Word doc

JMayo

New Member
I have been given the task of taking almost 30 Word docs and extracting the information and compiling it into Excel. I began doing this manually by using Find/Replace which is how I was able to complete the first 3 sheets, however, this is cumbersome and time consuming and these will need updating on a regular basis. I have seen multiple threads on this topic but I am not proficient enough with VBA to modify the code to work for my needs. I have attached examples of both the Excel spreadsheet and samples of the Word docs I'll be using. The Excel workbook already has several sheets filled in as an example of what the completed project should look like. Basically, the macro needs to open up the document correlating to the name of the sheet (each one is unique), extract the UID, Abbreviation, Alias, Name, and Units and place them on the same row under their respective column starting on ROW 5 then move to the next row and do the same. Once all the data has been extracted (and each sheet has a unique number of rows) then the document is to be closed and the sheet formatted as in the example sheets to highlight the data. Thanks for any help or guidance you can provide.

I couldn't upload all of the files because it limited me to 5. Here is the link to all of them on OneDrive:
 

Attachments

  • For Forum.xlsx
    343.4 KB · Views: 25
  • Qwer.docx
    23.7 KB · Views: 21
  • Asdf.docx
    33.5 KB · Views: 15
  • Zxcv.docx
    33.6 KB · Views: 9
  • Poiu.docx
    50.8 KB · Views: 10
JMayo
... and as You are 'New Member' then of course,
You have read Forum Rules
before start posting
 
Question :​
as the workbook 'For Forum' does not reflect the content of the document 'Qwer'​
- so difficult to help without an explanation and an attachment both good enough -​
- for the document part it is obviously better to ask first on a Word forum -​
should the worksheet 'Qwer' be cleared before extracting data from its relative document ?​
 
Another point : rather than playing with Word under Excel the easy Excel VBA beginner way is to extract data from a text file​
so if you can save each Word document as a text file …​
Edit : as since I found a workaround with Word to be as fast like with a text file …​
 
Another point : to make the code easier and faster, the obvious way is to not format anything​
but just to start from a worksheet template, so if you can create and attach it here …​
Edit : you can forget this one if the workbook 'For Forum' already contains the sheets (rows 1 to 4 formatted) accordings to documents …​
 
Another point : as your workbook 'For forum.xlsx' can not contain any VBA procedure as it is,​
where should be located the code ?​
 
macropod
... again ...
... and as You are 'New Member' then of course,
You have read Forum Rules
before start posting
 
Which means what, exactly, in this context? Or perhaps you'd prefer people waste time here answering a question that's been answered elsewhere...
 
JMayo
As written in Forum Rules
a) You skipped blue part
b) Now, You try to skip as written in green part :
  • Cross-Posting. Generally, it is considered poor practice to cross post. That is to post the same question on several forums in the hope of getting a response quicker.
  • If you do cross-post, please put that in your post.
  • Also if you have cross-posted and get an Solution elsewhere, have the courtesy of posting the Solution here so other readers can learn from the answer also, as well as stopping people wasting their time on your answered question.
 
Last edited:
Without wanting to place too fine a point on it, I am not the one who cross-posted. Kindly get off your high horse. As a moderator of multiple forums, I am well aware of what the rules say and mean.
 
Sorry, sorry macropod
I thought few seconds after I post my previous post ... sorry that I wrote is as for You.
After few more second I notice that JMayo is correct person,
who should follow rules in this thread (as everyone else too),
which seems to be hard for many.
 
JMayo, if you are able to just read and well answer to questions you may see another way, should be faster …​
 
Marc L: I don't see you offering a solution; merely telling the OP he/she should do something different. The problem has already been solved...
 
As I have other ways but to 'offer' any I just need answers so it's now the OP choice,​
if he needs it if for example your solution does not well suit for what he expects for, whatever …​
 
this is a tough crowd on this forum. I do apologize for being a novice but would appreciate some grace. Yes, I did cross post but corrected my evil ways (I thought). And yes, I did get a response on Mr. Excel (thanks Macropod) and have responded there. Here is the solution I received and am trying to work out the details with Macropod:

>>> use code - tags <<<
Code:
Sub GetWordData()
'Note: this code requires a reference to the Word object model. See under the VBE's Tools|References.
Application.ScreenUpdating = False
Dim wdApp As New Word.Application, wdDoc As Word.Document, c As Long, r As Long
Dim strFolder As String, strFile As String, WkBk As Workbook, WkSht As Worksheet
strFolder = GetFolder: If strFolder = "" Then GoTo ErrExit
Set WkBk = ActiveWorkbook
strFile = Dir(strFolder & "\*.docx", vbNormal)
While strFile <> ""
  Set wdDoc = wdApp.Documents.Open(Filename:=strFolder & "\" & strFile, AddToRecentFiles:=False, Visible:=False)
  Set WkSht = WkBk.Sheets.Add: r = 4
  WkSht.Name = Split(strFile, ".doc")(0)
  WkBk.Sheets(1).Range.Copy
  WkSht.Paste
  WkSht.Range("A2").Value = WkSht.Name
  With wdDoc
    With .Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        'Find blocks of text of interest
        .Text = "Uid:*Units:*^13"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .MatchWildcards = True
        .Execute
      End With
      Do While .Find.Found
        r = r + 1
        'Parse & write the text to Excel
        For c = 1 To 4
          WkSht.Cells(r, c).Value = Trim(Split(Split(.Text, vbCr)(c - 1), ":")(1))
        Next
        .Collapse wdCollapseEnd
        .Find.Execute
      Loop
    End With
    .Close SaveChanges:=False
  End With
  strFile = Dir()
Wend
ErrExit:
wdApp.Quit
Set wdDoc = Nothing: Set wdApp = Nothing: Set WkSht = Nothing: Set WkBk = Nothing
Application.ScreenUpdating = True
End Sub

Function GetFolder() As String
    Dim oFolder As Object
    GetFolder = ""
    Set oFolder = CreateObject("Shell.Application").BrowseForFolder(0, "Choose a folder", 0)
    If (Not oFolder Is Nothing) Then GetFolder = oFolder.Items.Item.Path
    Set oFolder = Nothing
End Function
 
Last edited by a moderator:
  • Any missing data after running his procedure ?

  • According to your attachment : how many time is needed to fill your 8 sheets ?
 
See my reply @ MrExcel. It is best if you keep discussions about replies on another forum to that forum; otherwise you risk people on different forums clouding the issue.
 
I wanna know for the time as from the attachment as it is like stated in the initial post​
my second fastest way reading documents from Word but 'giving the priority to Excel'​
needs on an old laptop - the slowest I have - around 13 seconds to fill the 8 worksheets​
- without desactivating the screen updating and in late binding way so without activating any reference -​
as the fastest - but less safer - from Word needs more than 3 seconds less​
and directly from text files less than 2 seconds …​
 
It now turns out the OP has been trying to run the code on a Mac and the actual source files are .txt files, not .docx files...
 
Yes easier from text files and the fastest way directly under Excel without using Word …​
 
Back
Top