Web Scraping

On this site you can click on a state, then on a province, and then select the category of interest.

These are the contact data and the activity description (I attach photos of both for one example).

Is it possible to select the state, province and category individually, or to make multiple selections, and then extract the contact and description data?


There is a limit of 800 extractions per IP per day (with a system for changing the IP it would be possible to exceed the daily total, or the search could be saved and continued the next day).
 

Attachments

  • 7D4AA304-B1F3-4E75-B8EE-1FB8B1A82FB5.jpeg (82.2 KB)
  • 880A63CC-6385-4E18-85C7-060B7CAE90B2.jpeg (45.7 KB)
I have looked at this site, and it does not appear to be formatted in a way that would allow the data to be extracted to Excel in a meaningful form. There are no tables for analysis. It is a lot of data, but not in an analytical format. I am not sure what you would like to do with this information.
 
I'd like to mark them on a map and do data analysis.

It looks like a very well-organized site.

You're good at creating macros, and I trust what you wrote.

Thank you.
 
Maybe do some research on Power Maps. I have never done anything like that nor worked with Power Maps.
 
Could you be more specific about the fields you want to grab from that site? Please describe them in enough detail that a script can follow the steps to reach the data.
 
One more thing: you once showed interest in parsing data from Google using VBA, right? From what I could figure out, you can achieve the same with the www.bing.com search engine. The results will vary, but not much.
 
I don't have a macro that extracts data from Bing.

As for the macro:


I have a macro that works fine, but it keeps getting blocked and nobody understands why. It extracts everything starting from the geographical coordinates, and the block is the only thing keeping me from continuing.


I'm thinking of creating the search keys automatically through an online server.
 
Okay, I've already created the script. It should cover the whole site except for the next-page content available under some links, such as https://vymaps.com/AF/Badakhshan/establishment/. Since the script is already a long one, I didn't handle pagination. However, it will fetch the `name`, `address` and `coordinates` from that site.

Code:
'Requires references (Tools > References): Microsoft XML, v6.0 and Microsoft HTML Object Library
Sub GetContent()
    Const Url$ = "https://vymaps.com/"
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim HtmlDoc As New HTMLDocument, HtmlNewDoc As New HTMLDocument
    Dim HtmlLastDoc As New HTMLDocument, HtmlFinalDoc As New HTMLDocument
    Dim I&, N&, F&, L&, R&, secondPageLink$, thirdPageLink$, fourthPageLink$
    Dim finalPageLink$, oName$, oAddress$, oCoordiname$
    

    With Http
        .Open "GET", Url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        Html.body.innerHTML = .responseText
    End With

    With Html.querySelectorAll(".four > a[href*='//vymaps.com/']")
        For I = 1 To .Length - 1
            secondPageLink = "https:" & Replace(.item(I).getAttribute("href"), "about:", "")
            With Http
                .Open "GET", secondPageLink, False
                .setRequestHeader "User-Agent", "Mozilla/5.0"
                .send
                HtmlDoc.body.innerHTML = .responseText
            End With
            
            With HtmlDoc.querySelectorAll(".four > a[href*='//vymaps.com/']")
                For N = 0 To .Length - 1
                    thirdPageLink = "https:" & Replace(.item(N).getAttribute("href"), "about:", "")
                    With Http
                        .Open "GET", thirdPageLink, False
                        .setRequestHeader "User-Agent", "Mozilla/5.0"
                        .send
                        HtmlNewDoc.body.innerHTML = .responseText
                    End With
                    With HtmlNewDoc.querySelectorAll(".four > a[href*='//vymaps.com/']")
                        For F = 0 To .Length - 1
                            fourthPageLink = "https:" & Replace(.item(F).getAttribute("href"), "about:", "")
                            With Http
                                .Open "GET", fourthPageLink, False
                                .setRequestHeader "User-Agent", "Mozilla/5.0"
                                .send
                                HtmlLastDoc.body.innerHTML = .responseText
                            End With
                            With HtmlLastDoc.querySelectorAll(".six > p > b > a[href*='//vymaps.com/']")
                                For L = 0 To .Length - 1
                                    finalPageLink = "https:" & Replace(.item(L).getAttribute("href"), "about:", "")
                                    With Http
                                        .Open "GET", finalPageLink, False
                                        .setRequestHeader "User-Agent", "Mozilla/5.0"
                                        .send
                                        HtmlFinalDoc.body.innerHTML = .responseText
                                    End With
                                    oName = HtmlFinalDoc.querySelector("h1[itemprop='name'] > a").innerText
                                    oAddress = HtmlFinalDoc.querySelector("td[itemprop='address']").innerText
                                    oCoordiname = HtmlFinalDoc.querySelector("td[itemprop='geo'] > a[href]").innerText
                                    R = R + 1: ActiveSheet.Cells(R, 1) = oName
                                    ActiveSheet.Cells(R, 2) = oAddress
                                    ActiveSheet.Cells(R, 3) = oCoordiname
                                Next L
                            End With
                        Next F
                        Stop 'pauses after the first pass of this loop; remove this line when you want the script to run to the end
                    End With
                Next N
            End With
        Next I
    End With
End Sub
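
A note on the `Replace(..., "about:", "")` calls in the script above, with a small sketch. When a page is loaded by assigning to `innerHTML`, the document has no real base URL, so relative and protocol-relative links tend to come back from `getAttribute("href")` with an `about:` prefix. That is an interpretation of the code, not something stated in the post. `NormalizeHref` is a hypothetical helper name, and the assumption that every protocol-relative link should use `https:` is taken from how the script builds its links.

```vba
Option Explicit

' Hypothetical helper: turns an href read from an innerHTML-loaded document
' into an absolute URL. siteRoot is scheme plus host, without a trailing slash.
Function NormalizeHref(ByVal rawHref As String, ByVal siteRoot As String) As String
    Dim cleaned As String
    cleaned = rawHref

    ' Drop the "about:" prefix if it is present
    If Left$(cleaned, 6) = "about:" Then cleaned = Mid$(cleaned, 7)

    If Left$(cleaned, 2) = "//" Then
        NormalizeHref = "https:" & cleaned      ' protocol-relative link
    ElseIf Left$(cleaned, 1) = "/" Then
        NormalizeHref = siteRoot & cleaned      ' root-relative link
    Else
        NormalizeHref = cleaned                 ' anything else is returned unchanged
    End If
End Function
```

With such a helper, each level of the nested loops could build its link with one call, for example `NormalizeHref(.item(N).getAttribute("href"), "https://vymaps.com")`.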
 
You were lightning fast.

Unfortunately, I can't try the code right now because I'm without a computer and can't leave home because of coronavirus.

I don't understand. Is it possible to select the state and the category, or to extract everything automatically?

Thanks
 
The attached image is what I meant. Moreover, you need proxies to get the whole thing done, as the site will ban your IP address after a certain number of requests. Thanks.
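
To illustrate the proxy point, here is a minimal sketch, assuming a list of proxy servers is available and assuming the limit of 800 requests per IP per day mentioned earlier in the thread. The proxy addresses in the usage note are placeholders, and `ProxyForRequest` and `GetViaProxy` are hypothetical names. The request goes through the `WinHttp.WinHttpRequest.5.1` object, whose `SetProxy` method takes a proxy setting (2 means "use the named proxy") and a `host:port` string.

```vba
Option Explicit

' Hypothetical helper: picks the proxy for the Nth request (1-based) so that
' no proxy serves more than perProxyLimit requests.
' Returns "" once every proxy in the list has been used up.
Function ProxyForRequest(ByVal requestNumber As Long, proxies As Variant, _
                         ByVal perProxyLimit As Long) As String
    Dim idx As Long
    idx = LBound(proxies) + (requestNumber - 1) \ perProxyLimit
    If idx <= UBound(proxies) Then ProxyForRequest = proxies(idx)
End Function

' Sends a GET request through the given proxy ("host:port") and returns the body.
Function GetViaProxy(ByVal url As String, ByVal proxyServer As String) As String
    Const HTTPREQUEST_PROXYSETTING_PROXY As Long = 2
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .SetProxy HTTPREQUEST_PROXYSETTING_PROXY, proxyServer
        .Open "GET", url, False
        .SetRequestHeader "User-Agent", "Mozilla/5.0"
        .Send
        GetViaProxy = .ResponseText
    End With
End Function
```

A caller would keep a running request counter, for example `proxy = ProxyForRequest(counter, Array("proxy1.example:8080", "proxy2.example:8080"), 800)`, and stop for the day when the result is an empty string.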
 

Attachments

  • Untitled.jpg (3.4 KB)
Hi, Shahin

Yesterday you told me it's best to do the Bing Maps scraping.

Have you tried that before?
 
I find it very efficient. Just give it a shot. Bing doesn't block you, so you are safe to keep going.
Code:
'Requires references (Tools > References): Microsoft XML, v6.0 and Microsoft HTML Object Library
Sub GetLinksFromBingSearch()
    Const URL$ = "https://www.bing.com/search?q="
    Const base$ = "https://www.bing.com"
    Dim Http As New XMLHTTP60, HTML As New HTMLDocument
    Dim searchStr$, I&, R&, Link$, nextPage As Object
    Dim itemcheck As Object
    
    'put the search term in the following variable; up to 200 result titles and links are written to columns A and B
    searchStr = "mr excel"
    searchStr = Replace(searchStr, " ", "+")
    Link = URL & searchStr
    
    While Link <> ""
        With Http
            .Open "GET", Link, False
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
            .send
            HTML.body.innerHTML = .responseText
        End With
        
        Set itemcheck = HTML.querySelector("h2 > a[href]")
        If itemcheck Is Nothing Then Exit Sub
        
        With HTML.querySelectorAll("h2 > a[href]")
            For I = 0 To .Length - 1
                R = R + 1: Cells(R, 1) = .Item(I).innerText
                Cells(R, 2) = .Item(I).getAttribute("href")
                If R = 200 Then Exit Sub
            Next I
        End With
        
        Set nextPage = HTML.querySelector("a[title='Next page'][href]")
        If Not nextPage Is Nothing Then
            Link = base & Replace(nextPage.getAttribute("href"), "about:", "")
        Else
            Link = "" 'no next page: end the loop instead of requesting the same page again
        End If
    Wend
End Sub
 
I'm curious to try this macro as soon as possible.

Can you extract all the contact data (name, full address, telephone, type of activity and geographical coordinates)? How can you do that with Google Maps?

You've become a phenomenon of demolition.
 
Hi.
I put the code in a form but it doesn't work. Do I need to activate specific references?
Also, is it possible to put the search term (or several search terms) in a column of an Excel sheet instead of having to type it into the code every time?
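
On the references question: the posted macros declare `XMLHTTP60` and `HTMLDocument`, which need the references Microsoft XML, v6.0 and Microsoft HTML Object Library (Tools > References in the VBA editor). On the second question, one possible approach is to loop over a column and build the search URL for each term. The sketch below is only an illustration under assumptions: `BuildBingUrl` and `ListSearchUrls` are hypothetical names, and the layout (terms in column A of the active sheet, starting at row 1) is assumed, not taken from the thread.

```vba
Option Explicit

' Hypothetical helper: builds the Bing search URL for one search term,
' replacing spaces with "+" as the posted macro does.
Function BuildBingUrl(ByVal searchTerm As String) As String
    Const URL As String = "https://www.bing.com/search?q="
    BuildBingUrl = URL & Replace(Trim$(searchTerm), " ", "+")
End Function

' Reads search terms from column A (assumed layout) and writes the
' search URL for each non-empty term next to it in column B.
Sub ListSearchUrls()
    Dim lastRow As Long, r As Long
    With ActiveSheet
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        For r = 1 To lastRow
            If Len(Trim$(.Cells(r, 1).Value)) > 0 Then
                .Cells(r, 2).Value = BuildBingUrl(.Cells(r, 1).Value)
            End If
        Next r
    End With
End Sub
```

To feed the terms into the scraper itself, `searchStr` in `GetLinksFromBingSearch` could be turned into a parameter of the sub and the sub called once per row.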
 