
Scraping data from a specific site

claudia80

Member
After 2 hours this message appeared:

run-time error '-2146697205 (800c000b)'

The operation timed out


The macro, however, had already extracted many results, so I would say it's working.
I'll wait for shahin's changes to finish the macro.
Thank you
 

Marc L

Excel Ninja
2 hours for how many results?!
(I estimate it would need more than 10 days to grab everything ‼)

So you already know this isn't viable under VBA; it's not the appropriate tool.
Or maybe the website locked you out: try running it again for one or
two hundred pages to see whether your IP is now blocked
(at least for a day) …
 

claudia80

Member
The time needed for the extraction doesn't matter to me. I have extracted 2400 results.
In the search I entered pages 1 to 3000, not 1000 as before.
 

Marc L

Excel Ninja
So if your 2400 results were grabbed in 2 hours,
my first VBA estimate was off by a factor of 3:
it would need a month :eek: to scrape all the data,
and only if nothing goes wrong ‼ I also hope you're not running this on a laptop,
and that you have backups of all your important data …
 

claudia80

Member
I used a laptop. I didn't understand the remark about backing up my PC's data. Do you think my PC would break down if I ran it for a month? I'll run the macro on my work PC, so running time isn't a problem; besides, I can change my IP if there are blocks, and I can also use friends' PCs and connections. That way the total time would be reduced.


This data has no commercial purpose and I'm not in a hurry. I would need the macro to save the extracted data and, when restarted, not to delete the data already extracted (new data must be pasted into blank rows).

If the macro also lets me choose the country whose schools are listed, I'll accept that some data may be lost. You are kind to give me so many tips and I thank you for that, but if the macro had been fixed yesterday I would already have done two days of extraction.
 

Marc L

Excel Ninja

The codes posted were starters, based on your incomplete original post …
Maybe the helpers have no more time to spare;
if only the original post had been complete and crystal clear,
with an expected-result workbook attached …

As a starter, it's up to you to adapt it to your needs …
This is a VBA forum, not do-my-job.com!

Otherwise you must wait until someone has time …
 

Marc L

Excel Ninja
Do you think my PC would go haywire if I use it for a month?
I have already seen laptops burn out just from running all night long …
One literally burned: a guy had left it in his garage,
his brand-new car was a little damaged too,
its shiny red paint turned brown-to-black on one side, and
his insurance company refused to cover the incident!
Sometimes it's just the hard disk that fails, or Windows:
no boot anymore, and the automatic repair erases all the data …
 

claudia80

Member
Darn! I too lost a laptop 2 years ago because I gave it a month of scraping. I kept it on all day :(
As for your post #31, you are right that everyone contributes voluntarily according to the time they have. It's also true that it's a pity to leave a code incomplete. Even those who helped write the code will be curious to see how it should best be finished.
 

claudia80

Member
No, what's incomplete is the original post … As a reminder:
The better the original explanation & attachment, the better & quicker the solution!
As I don't think any helper has more time to spend on this subject,
you'd better first try to load the first 1000 pages
and report back how long it takes on your end …

If you look carefully at the first post, I indicated which data to extract (I only omitted the geographical coordinates, which I noticed only later in the GPS field).

YasserKhalil understood it well and also added extra columns. In his macro it would have been enough to add a column for the extraction status and the geographical coordinates, plus handling of the subsequent pages. I also repeat that shahin's macro is good and extracts the geographical coordinates too.

I could very well use shahin's macro if the missing part is added. I could proceed country by country, inserting the link directly in the macro (as for Afghanistan):

https://maps.me/catalog/education/amenity-school/country-fgnstn/

Ideally, given the site's blocking problem, I would like to be able to specify the country, the number of pages to extract, and how long the macro should run (so I can run it at night), as in the attached file. For the rest, I repeat that the complete shahin macro would be enough for me.
I could then split the data with formulas ..
 

Attachments

claudia80

Member
It would also be ideal for the macro to write the extraction page number in a column, and for the file to be saved automatically at the end of the process.
What other improvements could there be?
 

Marc L

Excel Ninja
As I have a doubt about your website, try a new test:
compare the time to grab pages #1 to #10
versus pages #51001 to #51010 …

As this website crashes even when browsing a country manually,
you will have to create a parameters worksheet
where you manually enter the country's web address
and the page # to start from, even if I believe re-starting from
a page # above the limit won't work …

This particular slow website is a mess; you may hope to scrape
complete data for smaller countries, but not for the bigger ones …
(Maybe you already found the page limit near 2400!)
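The parameters worksheet Marc L describes could be sketched like this in VBA. The sheet name "Params", the cell layout, and the `"page/" & pagenum` URL pattern are all assumptions for illustration, not details confirmed in the thread:

```vba
' Minimal sketch of the parameters worksheet idea: the country URL and
' the starting page # are read from a sheet instead of being hard-coded.
' "Params", B1/B2, and the page-URL pattern are hypothetical names.
Sub ScrapeFromParams()
    Dim baseUrl As String, startPage As Long, pagenum As Long
    With ThisWorkbook.Worksheets("Params")
        baseUrl = .Range("B1").Value        ' e.g. the country catalogue address
        startPage = CLng(.Range("B2").Value)
    End With
    For pagenum = startPage To startPage + 99   ' grab 100 pages per run
        ' ... request baseUrl & "page/" & pagenum and parse it as before ...
    Next pagenum
End Sub
```

Running only a bounded page range per run also makes it easier to restart after the site blocks the IP.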
 

Chihiro

Excel Ninja
Maps.Me is a bit too messy. There's quite a bit of junk in their data/edits, since its data comes from "open source" contributions made by app users...
 

Marc L

Excel Ninja

Yes, with this poor, slow website I would never try to spend time parsing
every part of each element; I'd put all the text in a single cell per element …
 

claudia80

Member
Hello guys.
The site may be slow and have all its faults, but I could still extract the results over time. I'm not in an extraction race ...
If the macro also extracts the page number, I could manually enter the link for each country as I proceed with the extraction. That way the number of results to extract would be reduced, and for countries with many results I could split the pages across several runs by indicating which pages to extract.
Since there are no other solutions, I only need the macro to:
1) extract the website field;
2) indicate the page number on each row;
3) save the file when the extraction is finished;
4) fill in the blank rows each time the macro is restarted.
The important thing is to get a result, even if everything is convoluted ...
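Points 2–4 above could be sketched as follows in VBA. The column positions and the page range are illustrative assumptions; the scraping itself is left as a placeholder for shahin's existing code:

```vba
' Sketch of requirements 2-4: resume on the first blank row,
' record the page number for each row, and save when the run ends.
Sub AppendAndSave()
    Dim R As Long, pagenum As Long
    ' Start at the first blank row in column A, so a restart
    ' never overwrites data that was already extracted (req. 4).
    R = Cells(Rows.Count, 1).End(xlUp).Row + 1
    For pagenum = 1 To 10                 ' page range is illustrative
        ' ... scrape the page and fill Cells(R, 1..4) as in shahin's code ...
        Cells(R, 5) = pagenum             ' req. 2: page number in its own column
        R = R + 1
    Next pagenum
    ThisWorkbook.Save                     ' req. 3: save at the end of the run
End Sub
```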
 

Chihiro

Excel Ninja
FYI - My comment was aimed more at the "quality" of the content on Maps.Me, since there's little to no oversight of what content ends up on it. So there will be a lot of junk along with the valid data.
 

claudia80

Member
Hello Mr. Chihiro. I don't think there are any better solutions to follow.
For now I've found this website and nothing else, and no contacts for this site either, if nobody writes the missing part :(
If you want, you could also take part in writing the missing code. It would be a pleasure.
Meanwhile, thank you for what you have written so far.
 

claudia80

Member


And which code did you use when you got 2400 results in 2 hours ?
I used shahin's code. I could use it for the whole site, but I wouldn't get the website field, because shahin's code doesn't extract it.
I hope I've made myself understood :(
 

Marc L

Excel Ninja
I used the shahin code.
So I was wrong about the duration: in a utopian world
it should need less than 48h …
Optimizing it to run in parallel could make it faster.
As the website crashes even when operated manually,
it's better to start from a country, but I think you won't grab
much more than where the first run stopped …
So if you need much more data, you'd better move on to
another website, as Chihiro already stated.
 

claudia80

Member
I can tell you that there are no lists available for all countries; that's why I insist on this site. I found contacts for only 50 countries.
 

claudia80

Member
Hello to all.
I tried to analyze the page's code and the macro's code, and I tried to add the part of the code that extracts the website.
I added this part to shahin's code, but it doesn't work :(



Code:
            If Not Html.querySelector("p.item__desc-phone span") Is Nothing Then
                Cells(R, 3) = Split(Html.querySelector("p.item__desc-phone span").innerText, "Phone: ")(1)
            End If
            If Not Html.querySelector("p.item__desc-url span") Is Nothing Then
                Cells(R, 4) = Split(Html.querySelector("p.item__desc-url span").innerText, "url: ")(2)
            End If
        Next I
    End With
Next pagenum
End Sub
 

Marc L

Excel Ninja

First error in red : "url: "

Do you see this red text in the webpage display of any school ?!
Compare with the phone part …
 

claudia80

Member
First error in red : "url: "

Do you see this red text in the webpage display of any school ?!
Compare with the phone part …
In the page's code it is written in two versions: url and website.
Before posting the message I changed it, and it still didn't work ..

I think I understood that we must use this text in the code: href=
but I don't know where or how though ..
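For what it's worth, reading the link's `href` attribute directly (instead of splitting the `innerText`) could be sketched like this. The `p.item__desc-url a` selector is a guess at the page markup, not something confirmed in the thread:

```vba
' Sketch: read the link's href attribute instead of splitting innerText.
' The "p.item__desc-url a" selector is a guess at the page markup.
Dim link As Object
Set link = Html.querySelector("p.item__desc-url a")
If Not link Is Nothing Then
    Cells(R, 4) = link.getAttribute("href")
End If
```

This avoids depending on a visible "url: " label that may not exist in the page text, which was Marc L's point about the phone part.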
 