Sunday, 28 June 2015

Python, Scrapy: unable to paginate dynamically generated links on a website using the link taken from Firebug

I am a complete newbie at Python, Scrapy and web scraping, and this is my first learning project. I want to scrape multiple pages from this website using Scrapy: http://ift.tt/1C0cyxT

The links seem to be generated using AJAX. At the bottom of the page there is a link to the next page. Clicking on <2> and checking the request in Firebug shows the following request being generated:

GET directory?p=2&category=1&map[disable]=0&map[height]=500&map[list_height]=500&map[span]=5&map[style]=&map[list_show]=0&map[listing_default_zoom]=15&map[options][scrollwheel]=0&map[options][marker_clusters]=1&map[options][force_fit_bounds]=0&distance=0&is_mile=0&zoom=15&perpage=16&scroll_list=0&feature=1&featured_only=0&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&template=&grid_columns=4&sort=title

So I thought, in my limited understanding, that if I replace the value of p with any page number (p={pagenum}), the request should return the corresponding page. I tried requesting the page directly with the following URL:

http://ift.tt/1CDpnZt

However, this link generates an error page saying "page not found".
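A minimal sketch of that substitution, to rule out a hand-editing mistake in the long query string. It is only an illustration under stated assumptions: `BASE_URL` is a hypothetical placeholder (the real site address is hidden behind the shortened link), `page_url` is a name made up for this example, and `CAPTURED_QUERY` is an abbreviated copy of the query string shown above.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical placeholder: the real site address is hidden behind the short link.
BASE_URL = "http://example.com/directory"

# Abbreviated copy of the query string captured in Firebug.
CAPTURED_QUERY = "p=2&category=1&map[disable]=0&map[style]=&perpage=16&sort=title"


def page_url(page, base_url=BASE_URL, query=CAPTURED_QUERY):
    """Return the captured request URL with only the `p` parameter replaced."""
    # keep_blank_values=True preserves empty parameters such as `map[style]=`.
    params = parse_qsl(query, keep_blank_values=True)
    params = [(key, str(page) if key == "p" else value) for key, value in params]
    # urlencode percent-encodes the square brackets in the `map[...]` keys.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    for page in range(1, 4):
        print(page_url(page))
```

Note that the captured request path (`directory?p=2...`) is relative, so the full URL depends on the address of the page that issued it. It may also be worth comparing the request headers Firebug shows with those sent by the direct request, since some servers answer such URLs only for AJAX calls; whether that applies to this site is an assumption to verify.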

Can anyone help me understand what I am doing wrong here?

Thanks for your guidance.
