• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Extracting multiple sedctions of data from a batch of XML files

V13RNE

New Member
Good morning everyone,

I'm trying to extract certain elements of data from a batch of XML files into a single worksheet.

At present I'm extracting 10 elements from the XML data and I've 10 loops.

The problem I'm encountering is that in any XML file there could me several instances of the same tag e.g. <roof>some data</roof> .... later on in the file <roof>details about a roof extension</roof>. This second data set I don't want.

Below is the (crude) code, I've attached my spreadsheet but can't seem to upload XML files.

Many thanks in advance for your help.

Code:
Function ExtractBetween(BTag As String, ETag As String, Data As String, StartPos As Long) As String
    Dim intPosB As Integer
    Dim intPosE As Integer
     
    intPosB = InStr(StartPos, Data, BTag)
    If intPosB > 0 Then
        intPosE = InStr(intPosB + Len(BTag) + 1, Data, ETag)
        If intPosE > 0 Then
            StartPos = intPosE + Len(ETag)
            ExtractBetween = Mid(Data, intPosB + Len(BTag), intPosE - intPosB - Len(BTag))
        Else
            StartPos = 0
            ExtractBetween = ""
        End If
    Else
        StartPos = 0
        ExtractBetween = ""
    End If
End Function

-------------------------------------

Sub CombineXMLFiles()

Dim strBeginTag As String
Dim strEndTag As String
Dim intUnit As Integer
Dim strBuf As String
Dim lngPos As Long
Dim row As Integer

Dim FileArray As Variant

FileArray = Application.GetOpenFilename(MultiSelect:=True)

If IsArray(FileArray) Then

    For i = LBound(FileArray) To UBound(FileArray)
   
   
        'Load the XML file contents as a string
        strTargetFile = FileArray(i)
        intUnit = FreeFile
        Open strTargetFile For Input As intUnit
        strBuf = Input(LOF(intUnit), #intUnit)
        Close intUnit
       
        'Write the source file name for tracking
        'Sheet1.Cells(row + i, 1).Value = strTargetFile
       
        '==========================================================
        ' Search the file for the 1st XML block - "Completion Date"
        lngPos = 1
        counter1 = 0
        ' Sheet1.Cells(row + i + 1, 1).Value = "<SAP:Completion-Date>"
        Do
            counter1 = counter1 + 1
            strBeginTag = "<SAP:Completion-Date>"
            strEndTag = "</SAP:Completion-Date>"
            Sheet1.Cells(row + i + counter1, 1).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '==========================================================
        'Search the file for the 2nd XML block - "RRN"
        lngPos = 1
        counter2 = 0
        ' Sheet1.Cells(row + i + 1, 2).Value = "<SAP:Report-Header>"
        Do
            counter2 = counter2 + 1
            strBeginTag = "<HIP:RRN>"
            strEndTag = "</HIP:RRN>"
            Sheet1.Cells(row + i + counter2, 2).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '==========================================================
        'Search the file for the 3rd XML block - "UPRN"
        lngPos = 1
        counter3 = 0
        ' Sheet1.Cells(row + i + 1, 3).Value = "<HIP:UPRN>"
        Do
            counter3 = counter3 + 1
            strBeginTag = "<HIP:UPRN>"
            strEndTag = "</HIP:UPRN>"
            Sheet1.Cells(row + i + counter3, 3).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '==========================================================
        'Search the file for the 4th XML block - "<Address-Line-1>"
        lngPos = 1
        counter4 = 0
        ' Sheet1.Cells(row + i + 1, 4).Value = "<Address-Line-1>"
        Do
            counter4 = counter4 + 1
            strBeginTag = "<SAP:Property>"
            strEndTag = "</HIP:Address-Line-1>"
            Sheet1.Cells(row + i + counter4, 4).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '==========================================================
        'Search the file for the 5th XML block - "<HIP:Post-Town>"
        lngPos = 1
        counter5 = 0
        ' Sheet1.Cells(row + i + 1, 5).Value = "<HIP:Post-Town>"
        Do
            counter5 = counter5 + 1
            strBeginTag = "<SAP:Property>"
            strEndTag = "</HIP:Post-Town>"
            Sheet1.Cells(row + i + counter5, 5).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '===========================================================
        'Search the file for the 6th XML block - "<HIP:Postcode>"
        lngPos = 1
        counter6 = 0
        ' Sheet1.Cells(row + i + 1, 6).Value = "<HIP:Postcode>"
        Do
            counter6 = counter6 + 1
            strBeginTag = "<SAP:Property>"
            strEndTag = "</HIP:Postcode>"
            Sheet1.Cells(row + i + counter6, 6).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '===========================================================
        'Search the file for the 7th XML block - "<HIP:Roof>"
        lngPos = 1
        counter7 = 0
        ' Sheet1.Cells(row + i + 1, 8).Value = "<HIP:Roof>"
        Do
            counter7 = counter7 + 1
            strBeginTag = "<HIP:Roof>"
            strEndTag = "</HIP:Description>"
            Sheet1.Cells(row + i + counter7, 7).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '===========================================================
        'Search the file for the 8th XML block - "<HIP:Wall>"
        lngPos = 1
        counter8 = 0
        ' Sheet1.Cells(row + i + 1, 8).Value = "<HIP:Wall>"
        Do
            counter8 = counter8 + 1
            strBeginTag = "<HIP:Wall>"
            strEndTag = "</HIP:Description>"
            Sheet1.Cells(row + i + counter8, 8).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '===========================================================
        'Search the file for the 8th XML block - "<HIP:Wall>"
        lngPos = 1
        counter9 = 0
        ' Sheet1.Cells(row + i + 1, 9).Value = "<HIP:Wall>"
        Do
            counter9 = counter9 + 1
            strBeginTag = "<HIP:SAP-Building-Part>"
            strEndTag = "</HIP:Construction-Age-Band>"
            Sheet1.Cells(row + i + counter9, 9).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        '============================================================
        'Search the file for the 8th XML block - "<HIP:Wall>"
        lngPos = 1
        counter10 = 0
        ' Sheet1.Cells(row + i + 1, 10).Value = "<HIP:Wall>"
        Do
            counter10 = counter10 + 1
            strBeginTag = "<HIP:Total-Floor-Area>"
            strEndTag = "</HIP:Total-Floor-Area>"
            Sheet1.Cells(row + i + counter10, 10).Value = ExtractBetween(strBeginTag, strEndTag, strBuf, lngPos)
           
        Loop While lngPos > 0
       
        ' If counter1 > counter2 Then
        '   row = row + counter1
        '  Else
        '    row = row + counter2
           
        '  End If
       
       
    Next i
   
Else
   
    MsgBox "You clicked Cancel"
   
End If

End Sub

-----------------------------------
 

Attachments

  • XML_Reader_Version_1.8.xlsm
    434.2 KB · Views: 11
V13RNE

Do you think you could have attached some sample data inside one of the sheets in the file above. It makes it a lot easier for others to assist you if they can see a snapshot of your XML data. Copy it into the above and resave.

Take care

Smallman
 
Smallman

As requested - been populated with 23 XML files.

Many thanks
 

Attachments

  • XML_Reader_Version_1.8_populated.xlsm
    440 KB · Views: 6
Hello V13RNE,

As SmallMan said, it would be useful to see the actual xml file you are trying to parse. However, to demonstrate the concept, I have attached an xml file. Please rename it (from test.txt to test.xml).

Rather than using string function, Excel can leverage the power of an extensive XML library to do a lot of the work for you. You can repeat the SelectSingleNode function to return different Elements from within the XML file.

The code is:

Code:
Sub ExtractFromXML()

'Requires reference to Microsoft XML, v6.0
Dim DomDoc As MSXML2.DOMDocument60

Const sSourceFilePath As String = "TestXmlFullFilePath"

Set DomDoc = New MSXML2.DOMDocument60

With DomDoc
  .async = False 'Load entire Document before moving on
  .Load sSourceFilePath 'Loads the xml from a file

  'See www.w3schools.com/xpath/xpath_syntax.asp for details on Xpath references
  Debug.Print .SelectSingleNode("//Product/Description[1]").Text
End With

'Clean up
Set DomDoc = Nothing

End Sub
 

Attachments

  • Test.txt
    518 bytes · Views: 2
XML File - The data contained in between the tags I want are highlighted in blue. As you will note, the Address details I need are contained within the <SAP: Property> tag not the <SAP: Contact-Address> which is a few lines above.

Many thanks in advance

Cheers

V13RNE

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
- <ConditionReportRetrieveResponse_1 xmlns:HIP="DCLG-HIP" xmlns:ERR="DCLG-HIP/Exceptions" xmlns:CS="DCLG-HIP/CommonStructures" xmlns:pfdt="DCLG-HIP/BaseDataTypes" xmlns:bdt="DCLG-SAP09/BaseDataTypes" xmlns:SAP="DCLG-SAP" xmlns:SAP05="DCLG-SAP05" xmlns:SAP09="DCLG-SAP09" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.epcregister.com" xsi:schemaLocation="http://www.epcregister.com https://www.epcregister.com/RequestServices/xsd/Messages/ConditionReportRetrieveResponse_1.xsd">
- <Identification>
<CS:ServiceName>Retrieve</CS:ServiceName>
<CS:Operation>Retrieve</CS:Operation>
- <CS:TransactionDetails>
<CS:OriginatingID>Landmark</CS:OriginatingID>
<CS:RecipientID>Stroma Certification</CS:RecipientID>
<CS:Timestamp>2013-12-13T17:58:22.0Z</CS:Timestamp>
<CS:TransactionID>7846839</CS:TransactionID>
</CS:TransactionDetails>
- <Identifier>
<HIP:RRN>9278-5995-6282-6677-4910</HIP:RRN>
</Identifier>
</Identification>
- <Configuration>
<LatestRetrieved>N</LatestRetrieved>
</Configuration>
- <Content>
- <EPC-Data>
<SAP:SAP-Version>9.91</SAP:SAP-Version>
<SAP:BEDF-Revision-Number>351</SAP:BEDF-Revision-Number>
<SAP:Calculation-Software-Name>Stroma RdSAP Software</SAP:Calculation-Software-Name>
<SAP:Calculation-Software-Version>1.4.1.0</SAP:Calculation-Software-Version>
- <SAP:Report-Header>
<SAP:RRN>9278-5995-6282-6677-4910</SAP:RRN>
<SAP:Inspection-Date>2013-12-13</SAP:Inspection-Date>
<SAP:Report-Type>2</SAP:Report-Type>
<SAP:Completion-Date>2013-12-13</SAP:Completion-Date>
<SAP:Registration-Date>2013-12-13</SAP:Registration-Date>
<SAP:Status>entered</SAP:Status>
<SAP:Language-Code>1</SAP:Language-Code>
<SAP:Restricted-Access>0</SAP:Restricted-Access>
<SAP:Tenure>1</SAP:Tenure>
<SAP:Transaction-Type>5</SAP:Transaction-Type>
<SAP:Seller-Commission-Report>Y</SAP:Seller-Commission-Report>
<SAP:property-Type>0</SAP:property-Type>
- <SAP:Home-Inspector>
<SAP:Scheme-Name>Stroma Certification</SAP:Scheme-Name>
<SAP:Scheme-Web-Site>www.stroma.com</SAP:Scheme-Web-Site>
- <SAP:Identification-Number>
<SAP:Certificate-Number>STRO002072</SAP:Certificate-Number>
</SAP:Identification-Number>
<SAP:Name>Mr Richard Greenwood DEA</SAP:Name>
<SAP:Notify-Lodgement>Y</SAP:Notify-Lodgement>
- <SAP:Contact-Address>
<HIP:Address-Line-1>70 Warmwells Lane</HIP:Address-Line-1>
<HIP:Address-Line-2>Derbyshire</HIP:Address-Line-2>
<HIP:post-Town />
<HIP:postcode>DE5 8JB</HIP:postcode>
</SAP:Contact-Address>
<SAP:Web-Site>www.stroma.com</SAP:Web-Site>
<SAP:E-Mail>rgreenwood.epc@gmail.com</SAP:E-Mail>
<SAP:Fax>0000</SAP:Fax>
<SAP:Telephone>07814548491</SAP:Telephone>
<SAP:Company-Name>R Greenwood DEA</SAP:Company-Name>
</SAP:Home-Inspector>
- <SAP:property>
- <HIP:Address>
<HIP:Address-Line-1>1, Brickyard Lane</HIP:Address-Line-1>
<HIP:Address-Line-2>Kilburn</HIP:Address-Line-2>
<HIP:post-Town>BELPER</HIP:post-Town>
<HIP:postcode>DE56 0LL</HIP:postcode>

</HIP:Address>
<HIP:UPRN>5189547668</HIP:UPRN>
</SAP:property>
<SAP:Region-Code>6</SAP:Region-Code>
<SAP:Country-Code>EAW</SAP:Country-Code>
- <SAP:Related-Party-Disclosure>
<SAP:Related-Party-Disclosure-Number>1</SAP:Related-Party-Disclosure-Number>
</SAP:Related-Party-Disclosure>
</SAP:Report-Header>
- <SAP:Energy-Assessment>
- <HIP:property-Summary>
- <HIP:Wall>
<HIP:Description>Solid brick, as built, no insulation (assumed)</HIP:Description>
<HIP:Energy-Efficiency-Rating>1</HIP:Energy-Efficiency-Rating>
<HIP:Environmental-Efficiency-Rating>1</HIP:Environmental-Efficiency-Rating>
</HIP:Wall>
- <HIP:Roof>
<HIP:Description>Pitched, 100 mm loft insulation</HIP:Description>
<HIP:Energy-Efficiency-Rating>3</HIP:Energy-Efficiency-Rating>
<HIP:Environmental-Efficiency-Rating>3</HIP:Environmental-Efficiency-Rating>
</HIP:Roof>
- <HIP:Floor>

CUT...

ency-Rating>3</HIP:Environmental-Efficiency-Rating>
</HIP:Lighting>
- <HIP:Secondary-Heating>
<HIP:Description>Room heaters, mains gas</HIP:Description>
<HIP:Energy-Efficiency-Rating>0</HIP:Energy-Efficiency-Rating>
<HIP:Environmental-Efficiency-Rating>0</HIP:Environmental-Efficiency-Rating>
</HIP:Secondary-Heating>
<HIP:Has-Hot-Water-Cylinder>false</HIP:Has-Hot-Water-Cylinder>
<HIP:Has-Heated-Separate-Conservatory>false</HIP:Has-Heated-Separate-Conservatory>
<HIP:Dwelling-Type>Semi-detached house</HIP:Dwelling-Type>
<HIP:Total-Floor-Area>110</HIP:Total-Floor-Area>
<HIP:Has-Fixed-Air-Conditioning>false</HIP:Has-Fixed-Air-Conditioning>
</HIP:property-Summary>


HUGE CUT HERE...

- <HIP:SAP-Building-Parts>
- <HIP:SAP-Integral-Conservatory>
<HIP:Double-Glazed>Y</HIP:Double-Glazed>
<HIP:Floor-Area>13.217</HIP:Floor-Area>
<HIP:Glazed-Perimeter>9.403</HIP:Glazed-Perimeter>
<HIP:Room-Height>1</HIP:Room-Height>
</HIP:SAP-Integral-Conservatory>
- <HIP:SAP-Building-Part>
<HIP:Building-Part-Number>1</HIP:Building-Part-Number>
<HIP:Identifier>Main Dwelling</HIP:Identifier>
<HIP:Construction-Age-Band>A</HIP:Construction-Age-Band>
<HIP:Wall-Thickness-Measured>Y</HIP:Wall-Thickness-Measured>
<HIP:Wall-Thickness>230</HIP:Wall-Thickness>
<HIP:Wall-Dry-Lined>N</HIP:Wall-Dry-Lined>
<HIP:Wall-Construction>3</HIP:Wall-Construction>
<HIP:Wall-Insulation-Type>4</HIP:Wall-Insulation-Type>
<HIP:Roof-Construction>4</HIP:Roof-Construction>
<HIP:Roof-Insulation-Location>2</HIP:Roof-Insulation-Location>
<HIP:Roof-Insulation-Thickness>100mm</HIP:Roof-Insulation-Thickness>
- <HIP:SAP-Floor-Dimensions>
- <HIP:SAP-Floor-Dimension>
<HIP:Floor>0</HIP:Floor>
<HIP:Floor-Construction>2</HIP:Floor-Construction>

ANOTHER HUGE CUT...

</Content>
</ConditionReportRetrieveResponse_1>
 
Because you are using namespaces, these have to be declared before you can parse, using XPath queries.

Code:
Sub ExtractFromXML()

'Requires reference to Microsoft XML, v6.0
Dim DomDoc As MSXML2.DOMDocument60

Const sSourceFilePath As String = "C:\Data\1 Brickyard Lane DE56 0LL.xml"

Set DomDoc = New MSXML2.DOMDocument60

With DomDoc
  .async = False 'Load entire Document before moving on
  .Load sSourceFilePath 'Loads the xml from a file
  .setProperty "SelectionNamespaces", "xmlns:HIP='DCLG-HIP' xmlns:ERR='DCLG-HIP/Exceptions' " & _
    "xmlns:CS='DCLG-HIP/CommonStructures' xmlns:pfdt='DCLG-HIP/BaseDataTypes' " & _
    "xmlns:bdt='DCLG-SAP09/BaseDataTypes' xmlns:SAP='DCLG-SAP' xmlns:SAP05='DCLG-SAP05' " & _
    "xmlns:SAP09='DCLG-SAP09' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
  'See www.w3schools.com/xpath/xpath_syntax.asp for details on Xpath references
  Debug.Print .SelectSingleNode("//HIP:RRN").Text
  Debug.Print .SelectSingleNode("//SAP:Completion-Date").Text
  Debug.Print .SelectSingleNode("//SAP:Property/HIP:Address/HIP:Address-Line-1").Text
  Debug.Print .SelectSingleNode("//SAP:Property/HIP:UPRN").Text
  Debug.Print .SelectSingleNode("//HIP:Wall/HIP:Description").Text
End With

'Clean up
Set DomDoc = Nothing

End Sub
 
Thanks to all for your replies - apologies for not acknowledging sooner - no internet during evenings (at the moment).

I'll work through all your suggestions and (hopefully) come up with a solution. Being able to process all this information contained within these XML files helps are surveyors enormously.

Thanks again - most appreciated :)

V13RNE
 
Back
Top