
Scraping data from a specific site

I never wrote that: I just wrote I N I T I A L I Z E, so
at the beginning of the procedure, before the first loop.
Here it happens not to be a concern, even though the last used row
is recalculated each time from the last row of the worksheet (# 1 048 576)
instead of just adding 1 to the previous row …
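
A minimal sketch of that idea, written in Python rather than VBA since the thread shares no code at this point. A plain list stands in for one worksheet column, and the names `find_last_used_row` and `append_rows` are hypothetical. The last used row is located once, before the loop, and the loop then just adds 1:

```python
def find_last_used_row(sheet):
    """Scan upward from the bottom; return the last non-empty row (1-based), 0 if none."""
    for row in range(len(sheet), 0, -1):
        if sheet[row - 1] is not None:
            return row
    return 0


def append_rows(sheet, new_values):
    """Write new_values below the existing data, locating the last row only once."""
    next_row = find_last_used_row(sheet) + 1  # initialized before the loop
    for value in new_values:
        if next_row > len(sheet):
            sheet.append(None)                # grow the toy sheet when needed
        sheet[next_row - 1] = value
        next_row += 1                         # just add 1, no rescan of the sheet
    return next_row - 1                       # row number of the last value written
```

Rescanning from the bottom on every iteration gives the same result, as noted above; it only repeats work that a simple counter avoids.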
You have written a series of things that I do not understand, because I do not know terms such as "loops".
What do you mean by initialize? Did I not enter the code in the right place? I have tested it and it works :)
 
Perhaps more important than the time would be a column indicating the page each result came from, so that if the macro freezes I know which page to resume from.
But this extraction seems difficult because I saw the code on the page and it's completely different :(
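
A minimal sketch of the page column idea, in Python rather than VBA (an assumption, as the macro itself is not shown in this part of the thread). `fetch_page` is a hypothetical callable that returns the records of one page and may raise when the site stops responding; `scrape_pages` and `resume_page` are hypothetical names too. Each stored row carries its page number, so the restart page can be read back from the data:

```python
def scrape_pages(fetch_page, first_page, last_page, rows):
    """Append (page, record) tuples to rows; stop at the first page that fails."""
    for page in range(first_page, last_page + 1):
        try:
            records = fetch_page(page)
        except Exception:
            break                         # keep what was collected so far
        for record in records:
            rows.append((page, record))   # the page number travels with each record
    return rows


def resume_page(rows, default=1):
    """Return the page to restart from: one past the last page stored in rows."""
    if not rows:
        return default
    return max(page for page, _ in rows) + 1
```

A page is stored only after it was fetched completely, which is why the restart point is the last stored page plus one.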
 
Good morning.
I ran the macro for a couple of hours and extracted about 10,000 results without being blocked. The block was probably not due to the site but to the macro, which takes a long time to process many pages (I entered 1000 pages).
To extract the data for each state, I entered the state's link and its total number of pages in the macro. Splitting the extraction by state, plus the pause that comes from entering the link and the number of pages by hand, perhaps helps to avoid blocks.
I believe that if you can automate the selection of the state in the search and have it extract all pages (splitting the total pages in case of any problems), you should not have problems.
 
The block was probably not due to the site but to the macro, which takes a long time to process many pages
Browsing this website manually is already a mess: when you created this thread, I first browsed the website manually and it crashed after
a few minutes! That is the reason why I didn't start to code anything …

But if you think it's the code: VBA is not the right tool anyway,
so just move to another approach …

I believe that if you can automate the selection of the state in the search and have it extract all pages (splitting the total pages in case of any problems), you should not have problems.
As I have already written several times, it just needs a parameter worksheet
where you manually copy the webpage URL for a particular country
and the page # to start from …
If you want to "automate" it, just create in a parameter worksheet
a table with the URLs for the countries whose names do not exactly match
their own webpage URL, as you can see for Afghanistan …
 
I did not say that the macro is the problem.

Maybe when you enter so many pages, the macro takes a long time to process them and crashes. I tried it with the generic link:

https://maps.me/catalog/education/amenity-school/

and I set the page range from 1 to 2000.

If you search by state, it works.
Run the macro with this link and the total number of pages for this state (179):

https://maps.me/catalog/education/amenity-school/country-costa-rica/

You'll see that it works, even though it will not be immediate.
If each state could be selected automatically, it would be possible to extract everything.
 
As I have already written several times, it just needs a parameter worksheet
where you manually copy the webpage URL for a particular country
and the page # to start from …
If you want to "automate" it, just create in a parameter worksheet
a table with the URLs for the countries whose names do not exactly match
their own webpage URL, as you can see for Afghanistan …
I thought it was possible to copy and paste the name of the state (as you can do manually by double-clicking inside the state field).

I will try to copy all the links into an Excel sheet.

Thank you
 
Oh, you loaded the complete URL instead of the variable part,
but OK (easier for beginners), why not, that's not a big deal, whatever …
So you can update your post #35 attachment so that
a VLOOKUP formula in a cell reads the URL for the country chosen
in the data validation cell, and its result will be used in the procedure.

A reminder: as a country has already disappeared and may reappear in the future,
when you launch the procedure any newly added country will not be
in the result and any deleted country will crash the procedure.
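
A minimal sketch of such a lookup table, in Python rather than a VLOOKUP formula (an assumption made only for illustration). `URL_TABLE` and `plan_run` are hypothetical names, and the only entry is the Costa Rica URL quoted earlier in the thread. A country missing from the table is reported instead of crashing the run:

```python
# Parameter table: country name -> catalog URL (the only URL given in the thread).
URL_TABLE = {
    "Costa Rica": "https://maps.me/catalog/education/amenity-school/country-costa-rica/",
}


def plan_run(countries, url_table):
    """Split the requested countries into (country, url) pairs and a list of unknown names."""
    found, missing = [], []
    for country in countries:
        url = url_table.get(country)      # like VLOOKUP, but returns None when not found
        if url is None:
            missing.append(country)       # report the country, do not crash
        else:
            found.append((country, url))
    return found, missing
```

The table still has to be maintained by hand, so a country newly added to the site stays out of the result until its row is added.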

As the countries list is not in the original webpage code, it can't
be loaded by the request method shared by shahin & Yasser;
if you want to automate it under VBA you must drive
the Internet Explorer web browser (2 to 10+ times slower than a request,
may work on one computer but not on another),
which can be a mess sometimes with this kind of web combobox.
That is the reason why I have written since the beginning that VBA is not the appropriate tool …
 
I followed your advice and extracted the links, and I removed the table showing the time and entering the state (because I did not want to take your time away from your own things and from solving the other posts). I could keep the table of state / time options for the macro, entering the link extracted today instead of the state (just changing the column of the list). I will follow your advice ...
If it is as difficult as you wrote, I could proceed by hand. I could try to understand how to change a macro with some tips, but creating one from scratch would be difficult for me. If you have patience I could also set about learning, but it will take a long time.
 
If I manually enter the link you wrote to me, it does not work. In case you want to know, the macro also freezes if I enter a number of pages higher than the number indicated (maybe that's normal).
I have now started the macro with the link for the United States (11:15 am). Let's see how it goes and at what time it ends.
 
Higher is really not a good idea …

My 2 cents :
how can a procedure work when the website already crashes manually ?‼
So you can grab all the data from smaller countries but only part of the bigger ones,
as I had the same issue with countries smaller than the USA;
this website is crappy, the worst I have ever seen …
 
Marc, you are completely right. There are problems when you indicate the number of pages within the link, because it is not possible to extract data from links that have many pages.
I will take what is possible. For the rest I will proceed in a different way.
 

Yes, I know I'm right, as the first time I browsed this website
- the day you created this thread -
it crashed straight away when I clicked any of the round page buttons at the bottom!
Forget this website …
 
I could still extrapolate most of the data.
Don't you think?
If there were similar sites it would be better; however, I do not think there are.
 
Extrapolate ?‼ :confused: Are you a medium ?! :rolleyes:

The issue seems to affect the bigger countries with more than 1 800 pages,
maybe even less …

I checked my town: nothing! Not even the well-known university.
I checked other famous ones: missing or erroneous data.
So the confidence index of this website is close to zero …
 
Okay, so I'm abandoning the idea of using this site.
If this is an open source site, how does it get the geographical coordinates of each institution?
 
As Chihiro already stated:
Since it's data is "Open Source" contribution,
which are made by App users...
So it seems it's people creating entries
without any check for duplicates, errors, etc …

As my country has more than 2 000 pages,
check yours with the places you know well …
 
I understood that there is no control. But I ask you, since you know so many things: if people are free to enter the name of an institution, the street and the link, how can a person know and enter the geographical coordinates of a place? It seems strange that all the institutions have geographical coordinates.
If the site is badly built, I do not think they have a system that automatically calculates the geographical coordinates from the street entered by a user.
What do you think?
 
Maybe, as it seems the entries are created via smartphones,
the coordinates are added automatically …
What a mess that these coordinates are not directly on the main page!

Check your area, where you know some schools,
as your country has fewer than 800 pages (mine has more than 2 300)
 
The macro also freezes with 700 pages :(
The site is really bad ..
For geographic coordinates, do you think that Google is the only site from which it is possible to get them?
I found this site (https://www.latlong.net/convert-address-to-lat-long.html) and it seems that it uses Google's API. What do you think about the reliability of this site and the possibility of using it with a macro?
 
The macro also freezes with 700 pages
Try to restart from page #701 …

For geographic coordinates, do you think that Google is the only site from which it is possible to get them?
I found this site (https://www.latlong.net/convert-address-to-lat-long.html) and it seems that it uses Google's API. What do you think about the reliability of this site and the possibility of using it with a macro?
Any geographic website can display the coordinates …
Compared with the coordinates from G.Maps, it's pretty close.
For reliability, compare for example with the GPS of your car
or with coordinates you already know …
 