• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Using post data and xlmlhttp

YasserKhalil

Well-Known Member
Hello everyone
I am still trying to learn more about scraping and I could devise a code that enables me to get the desired results
Here's the code
Code:
Sub Test()
    Dim e          As Variant
    Dim ie          As Object
    Dim ulElem      As Object
    Dim liElem      As Object
    Dim anchElem    As Object
    Dim dt          As Date
    Dim lDay        As Integer
    Dim lMnth      As Integer
    Dim lYear      As Integer
    Dim r          As Long

    Set ie = CreateObject("InternetExplorer.Application")
    dt = Date - 2
    lDay = Day(dt)
    lMnth = Month(dt)
    lYear = Year(dt)

    With ie
        .Visible = False
        .navigate ("http://www.handelsregisterbekanntmachungen.de/?aktion=suche#Ergebnis")

        Do: DoEvents: Loop Until .readyState = 4

        For Each e In ie.document.getElementsByTagName("select")
            If Len(e.innerText) = 56 Then
                e.selectedIndex = lDay
            ElseIf Len(e.innerText) = 18 Then
                e.selectedIndex = lMnth
            ElseIf Left(e.innerText, 8) = "----2000" Then
                e.selectedIndex = lYear - 1999
            ElseIf InStr(e.innerText, "Alle Bekanntmachungen") > 0 Then
                e.selectedIndex = 1
            End If
        Next e

        For Each e In ie.document.getElementsByTagName("input")
            If e.Value = "Suche starten" Then e.Click: Exit For
        Next e
        Do: DoEvents: Loop Until .readyState = 4
        Application.Wait Now() + TimeValue("00:00:05")

        If InStr(ie.document.body.innerHTML, "Es wurden 0 Treffer gefunden.") > 0 Then
            MsgBox "No Results Found", vbExclamation: Exit Sub
        Else
            For Each ulElem In ie.document.getElementsByTagName("b")
                For Each liElem In ulElem.getElementsByTagName("li")
                    Set anchElem = liElem.getElementsByTagName("a")
                    If anchElem.Length > 0 Then
                        r = r + 1
                        Cells(r, 1) = Mid(anchElem.Item(0).innerText, 11)
                    End If
                Next liElem
            Next ulElem
        End If
    End With
End Sub

But as a matter of trying to learn more about xmlhttp I am seeking for a way to get the same results but without using ie .. so I think using xmlhttp will be more efficient specially I could see post data after setting up the desired choices for the search process ..

Thanks advanced for help
 
Like in previous links …

F12, Network tab (clear rows if needed) and operate manually
within webpage : you will see request used.

If you don't understand the read of previous links,
at least crystal clear explain what you are trying to do,
details all manual steps within webpage …
 
Once you've filled the form, before to click on button,
activate inner inspector tool (F12), activate Network tab
(clear rows if needed) then click button :
once webpage completed with data, you will see within this Network tab
the request used by the webbrowser (as explained in my links),
just double click on it to see its details.

This is the very basics of any request scrapping,
as it's just about reading and observing …

A good training, for shahin too ! …
 
Thank you very much
I already have done those steps
001.png


002.png

and this is the form data content
Code:
suchart=uneingeschr&button=Suche+starten&land=&gericht=&gericht_name=&seite=&l=&r=&all=false&vt=11&vm=2&vj=2018&bt=11&bm=2&bj=2018&fname=&fsitz=&rubrik=&az=&gegenstand=1&anzv=10&order=1

What's the next step?
 

Next step is to reproduce this request under VBA
using some details from the inspector tool …​
 
I tried that code
Code:
Sub Test_XMLHTTP()
    Dim XMLHTTP    As New MSXML2.XMLHTTP60
    Dim myUrl      As String
    Dim postData    As String
   
    myUrl = "http://www.handelsregisterbekanntmachungen.de/?aktion=suche#Ergebnis"
    postData = "suchart=uneingeschr&button=Suche+starten&land=&gericht=&gericht_name=&seite=&l=&r=&all=false&vt=11&vm=2&vj=2018&bt=11&bm=2&bj=2018&fname=&fsitz=&rubrik=&az=&gegenstand=1&anzv=10&order=1"
   
    XMLHTTP.Open "GET", myUrl, False
    XMLHTTP.send postData
   
    Debug.Print XMLHTTP.responseText
End Sub

but I expect the responseText to have the string: "Niedersachsen"
but I didn't find that string within the responseText ...
 
Run the code and get the data you are trying to. String manipulation is necessary to split different portions. I'm having lunch. Catch you later.
Code:
Sub Fetch_Data()
    Dim HTTP As New XMLHTTP60, HTML As New HTMLDocument
    Dim post As Object
   
    postdata = "suchart=uneingeschr&button=Suche+starten&land=&gericht=&gericht_name=&seite=&l=&r=&all=false&vt=11&vm=2&vj=2018&bt=11&bm=2&bj=2018&fname=&fsitz=&rubrik=&az=&gegenstand=1&anzv=10&order=1"
   
    With HTTP
        .Open "POST", "http://www.handelsregisterbekanntmachungen.de/?aktion=suche", False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send postdata
        HTML.body.innerHTML = .responseText
    End With
   
    For Each post In HTML.getElementsByTagName("ul")
        Row = Row + 1: Cells(Row, 1) = post.innerText
    Next post
End Sub
 
Thanks a lot my friend
The code works but this is not the desired HTML page ..the results are different from manual steps ..
The first result should start with the string "Ginsterbusch" and I didn't find it ..
 

Post a screenshot of webpage display after button hit (expecting)
and you actual code, I'll try to take a glance after lunch break …​
 
Thanks alot Mr. MarcL
I have this code now after Mr. Shain's great solution
Code:
Sub Test_XMLHTTP()
    Dim http        As New XMLHTTP60
    Dim html        As New HTMLDocument
    Dim post        As Object
    Dim myUrl      As String
    Dim postData    As String
    Dim r          As Long

    myUrl = "http://www.handelsregisterbekanntmachungen.de/?aktion=suche#Ergebnis"
    postData = "suchart=uneingeschr&button=Suche+starten&land=&gericht=&gericht_name=&seite=&l=&r=&all=false&vt=11&vm=2&vj=2018&bt=11&bm=2&bj=2018&fname=&fsitz=&rubrik=&az=&gegenstand=1&anzv=10&order=1"

    With http
        .Open "POST", myUrl, False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send postData
        html.body.innerHTML = .responseText
    End With

    Debug.Print html.body.innerHTML
End Sub

I manually copied the contents of the immediate window and paste into a text file and examine the string manually so as to make sure this is the correct html page
 

Bad idea as sometimes content window can't keep all text (rows # limit) ! Directly check responseText with InStr …​
 
To get only the targeted portion, you can try like below:
Code:
Sub Fetch_Data()
    Dim HTTP As New XMLHTTP60, HTML As New HTMLDocument
    Dim post As Object, postdta As String
   
    postdata = "suchart=uneingeschr&button=Suche+starten&land=&gericht=&gericht_name=&seite=&l=&r=&all=false&vt=11&vm=2&vj=2018&bt=11&bm=2&bj=2018&fname=&fsitz=&rubrik=&az=&gegenstand=1&anzv=10&order=1"
   
    With HTTP
        .Open "POST", "http://www.handelsregisterbekanntmachungen.de/?aktion=suche", False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send postdata
        HTML.body.innerHTML = .responseText
    End With

    For Each post In HTML.getElementsByTagName("li")
        With post.getElementsByTagName("ul")
            If .Length Then Row = Row + 1: Cells(Row, 1) = .Item(0).innerText
        End With
    Next post
End Sub

The result it populates (see the attached image):
 

Attachments

  • Untitled.jpg
    Untitled.jpg
    32.2 KB · Views: 7
Thanks again to Marc L to teach me how to avoid "On Error Resume Next" by using the pattern of code I've pasted above.
 
I found my fault .. after many tries
It was in the url string .. there is extra "#Ergebnis" so I have not got the desired resulls
Now I started to get the process
 
@shahin
Is that line necessary and why
Code:
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
What make us decide those 'setRequestHeader' lines ...?

I tried once with that line and once without that line and I got different results .. so it is very important ..
Can you explain that point please?
 
When you send a post request, you send it like filling a form because there are more or less few options you need to choose from the main page to populate results.
 
Thank you very much for both of you Mr. MarcL and Mr. Sahhin for great support in this issue
Now I got most of the process .. but training is better than thousands of words

Here's the last edition of the code (for the followers of the thread)
Code:
Sub Post_Search_Form_Data_Using_XMLHTTP_Loop_Through_Li_Tag_Name()
    Dim http        As New XMLHTTP60
    Dim html        As New HTMLDocument
    Dim post        As Object
    Dim myUrl      As String
    Dim postData    As String
    Dim dt          As Date
    Dim lDay        As Integer
    Dim lMnth      As Integer
    Dim lYear      As Integer
    Dim r          As Long

    dt = Date - 2
    lDay = Day(dt): lMnth = Month(dt): lYear = Year(dt)

    myUrl = "http://www.handelsregisterbekanntmachungen.de/?aktion=suche"
    postData = "suchart=uneingeschr&button=Suche+starten&land=&gericht=&gericht_name=&seite=&l=&r=&all=false&vt=" & lDay & "&vm=" & lMnth & "&vj=" & lYear & "&bt=" & lDay & "&bm=" & lMnth & "&bj=" & lYear & "&fname=&fsitz=&rubrik=&az=&gegenstand=1&anzv=10&order=1"

    With http
        .Open "POST", myUrl, False
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send postData
        html.body.innerHTML = .responseText
    End With
   
    For Each post In html.getElementsByTagName("li")
        With post.getElementsByTagName("ul")
            If .Length Then r = r + 1: Cells(r, 1) = .Item(0).innerText
        End With
    Next post
End Sub
 
Back
Top