Python web crawler
A traditional crawler starts from the URLs of one or more seed pages and collects the URLs found on them. As it crawls, it keeps extracting new URLs from the current page and adding them to a queue until the system's stop condition is met.
The workflow of a focused crawler is more complex. It uses a web page analysis algorithm to filter out links that are irrelevant to the topic, keeps the useful ones, and puts them in the queue of URLs waiting to be crawled. It then selects the URL of the next page to crawl from the queue according to some search strategy, and repeats the process until a stop condition of the system is reached.
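The queue-driven loop described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `crawl`, `extract_links`, `is_relevant`, and `max_pages` are made up for this example, the "search strategy" is plain first-in, first-out (breadth-first), and the page download is passed in as a `fetch` function so the loop can be tried without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(base_url, html):
    """Return absolute URLs for all links in html, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(seeds, fetch, is_relevant=lambda url: True, max_pages=100):
    """Breadth-first crawl starting from the seed URLs.

    fetch(url) must return the page HTML as a string (assumed helper).
    is_relevant(url) is the focused-crawler filter: only links that pass
    it are queued. The stop condition is an empty queue or max_pages.
    """
    queue = deque(seeds)   # URLs waiting to be crawled
    seen = set(seeds)      # URLs already queued, to avoid repeats
    visited = []           # URLs fetched so far, in crawl order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen and is_relevant(link):
                seen.add(link)
                queue.append(link)
    return visited
```

A different search strategy, for example best-first by a relevance score, would replace the `deque` with a priority queue; the rest of the loop stays the same.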
2. Basic design concept
As you said: first simulate a login on the Weibo login page and fetch the resulting page. Extract all URLs from that page, pick the ones whose link text matches what you are looking for, simulate clicking them by requesting those URLs, and repeat the same crawling steps until your requirements are met, then exit.
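The "select URLs by their text description" step could look something like the sketch below. It assumes the page has already been fetched as an HTML string; `AnchorTextParser`, `select_links`, the keyword list, and the `weibo.example` address in the usage note are all hypothetical names chosen for this example, not part of any existing project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorTextParser(HTMLParser):
    """Collect (href, link text) pairs for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def select_links(base_url, html, keywords):
    """Return absolute URLs of links whose text contains any keyword."""
    parser = AnchorTextParser()
    parser.feed(html)
    return [
        urljoin(base_url, href)
        for href, text in parser.anchors
        if any(keyword in text for keyword in keywords)
    ]
```

For example, `select_links("http://weibo.example/home", page_html, ["Next"])` would return only the links whose visible text contains "Next"; requesting those URLs is the "simulated click".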
3. Existing projects
There is a project on Google's project hosting site called sinawler, a dedicated Sina Weibo crawler for grabbing Weibo content. As you know, that site may not be reachable for you. However, you can search Baidu for "Sina Weibo Crawler Written in python (see New Weibo for the current login method)" to find reference source code, which is written in Python 2. If you write in Python 3, you can use urllib.request to build a browser-like client that handles cookies automatically, which saves you from processing cookies by hand and makes the code shorter.
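A minimal sketch of the Python 3 approach, using only the standard library: `urllib.request.build_opener` combined with `HTTPCookieProcessor` and an `http.cookiejar.CookieJar` stores cookies from responses and sends them back on later requests. The helper names `build_session` and `login`, the form field names, and the User-Agent string are assumptions for illustration. The real Weibo login flow is not described here and most likely needs more than a plain form POST.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_session(user_agent="Mozilla/5.0"):
    """Return (opener, jar): an opener that keeps cookies between requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Send a browser-like User-Agent header with every request
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def login(opener, login_url, username, password):
    """POST a login form and return the response body as text.

    The field names "username" and "password" are placeholders; check the
    actual login form of the target site before using this.
    """
    form = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    with opener.open(login_url, data=form, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

After a successful login, every later `opener.open(url)` call reuses the session cookies held in the jar, so no manual cookie handling is needed.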
4. In addition
Have a look at the Baidu Encyclopedia entry on web crawlers. It covers more in-depth material, such as algorithm analysis and crawling strategies, which will be a great help and will improve the technical level of your code from the theoretical side.