How Excel Crawls Web Data: JSON Data Crawling
When crawling the web page, you need to add request headers to the request in order to get the required data.
The first page of search results returns JSON that contains the total number of positions. Dividing that total by 15 positions per page and rounding up gives the number of pages to crawl. The program then crawls the pages one by one in a loop, collects the position information, and writes it out in CSV format.
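The crawl loop described above can be sketched with the Python standard library alone. This is a minimal sketch, not the author's program: the endpoint URL, the header values, the `pn` page parameter, and the JSON field names (`totalCount`, `result`) are hypothetical placeholders, because the article does not give them, and a real job site's API may also require POST requests or cookies. The fetch function is passed in as a parameter so the loop can be exercised without network access.

```python
import csv
import io
import json
import math
import urllib.request

# Hypothetical endpoint and headers: the article does not name them.
SEARCH_URL = "https://example.com/jobs/search.json"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Referer": "https://example.com/jobs/",
}
PAGE_SIZE = 15  # positions per page, as stated in the article


def fetch_page(page_no):
    """Request one page of results with the headers attached; return parsed JSON."""
    req = urllib.request.Request(f"{SEARCH_URL}?pn={page_no}", headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def page_count(total, page_size=PAGE_SIZE):
    """Pages needed to cover `total` positions (round up the last partial page)."""
    return math.ceil(total / page_size)


def crawl(fetch=fetch_page):
    """Read the total from page 1, then loop over the remaining pages."""
    # Assumed JSON shape: {"totalCount": int, "result": [ {...}, ... ]}
    first = fetch(1)
    rows = list(first["result"])
    for page_no in range(2, page_count(first["totalCount"]) + 1):
        rows.extend(fetch(page_no)["result"])
    return rows


def to_csv(rows, fields):
    """Serialize the collected positions as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With 369 positions, `page_count(369)` gives 25 pages. The CSV text can be written to disk with `open("positions.csv", "w", newline="", encoding="utf-8").write(to_csv(rows, fields))`.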
The program output is shown in the following figure:
The crawled results are shown in the following figure:
Data cleaning accounts for most of the workload in data analysis. A search for "data analysis" positions in Shenzhen on Lagou returned 369 postings. Looking through the job titles, I found four internship positions; because we are studying full-time positions, we remove the internships first. Work experience and salary are both given as ranges in string form, so we first extract the numbers with regular expressions and output them as lists. We then take the average of the work-experience range and the quartile point of the salary range, which is closer to reality.
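The cleaning step can be sketched as follows. The field names (`positionName`, `workYear`, `salary`), the string formats (`10k-15k`, `3-5年`), and the rule that an internship title contains 实习 or "intern" are assumptions for illustration. "Quartile salary" is read here as the point one quarter of the way up the salary range; open-ended values such as 经验不限 ("no experience requirement") yield `None` in this sketch.

```python
import re

# Matches a numeric range such as "10k-15k", "10K-20K" or "3-5年".
NUM_RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*[kK]?\s*-\s*(\d+(?:\.\d+)?)")


def is_internship(title):
    # Assumption: internship postings say 实习 or "intern" in the title.
    return "实习" in title or "intern" in title.lower()


def parse_range(text):
    """Return [low, high] extracted from a range string, or None if there is no range."""
    m = NUM_RANGE.search(text)
    if m is None:
        return None
    return [float(m.group(1)), float(m.group(2))]


def exp_midpoint(text):
    """Average of the work-experience range, in years."""
    r = parse_range(text)
    return None if r is None else (r[0] + r[1]) / 2


def salary_quartile_point(text):
    """Point one quarter of the way up the salary range, in K."""
    r = parse_range(text)
    return None if r is None else r[0] + (r[1] - r[0]) / 4


def clean(rows):
    """Drop internships and add numeric experience and salary columns."""
    cleaned = []
    for row in rows:
        if is_internship(row["positionName"]):
            continue
        cleaned.append({**row,
                        "exp_years": exp_midpoint(row["workYear"]),
                        "salary_k": salary_quartile_point(row["salary"])})
    return cleaned
```

For example, `salary_quartile_point("10k-20k")` gives 12.5 and `exp_midpoint("3-5年")` gives 4.0.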
4. Word cloud
We concatenate the job-benefits column into a single string and generate a word cloud from word frequencies as a Python visualization. Below is a comparison of the original image and the word cloud. "Five insurances and one fund" (the standard package of five social insurances plus the housing provident fund) appears most often among the benefits, followed by platform, benefits, room for development, and flexible working.
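The frequency-counting step behind the word cloud can be sketched with `collections.Counter`. This sketch swaps in simple splitting on delimiters (commas, Chinese commas, enumeration commas, spaces, slashes) instead of a Chinese word segmenter, on the assumption that each benefit cell is a delimited list of phrases; the sample cells in the test are made up.

```python
import re
from collections import Counter


def benefit_frequencies(benefit_cells):
    """Count benefit phrases across all postings."""
    # Summarize the whole column into one string, as the article describes.
    text = ",".join(benefit_cells)
    # Assumption: phrases are separated by , ， 、 whitespace or /.
    phrases = [p.strip() for p in re.split(r"[,，、\s/]+", text) if p.strip()]
    return Counter(phrases)
```

The resulting mapping can be passed to `WordCloud(font_path=...).generate_from_frequencies(freqs)` from the `wordcloud` package; a font that covers Chinese characters is needed for the labels to render.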
5. Descriptive statistics
The mean salary for data analysts is 14.6K and the median is 12.5K, so it is a promising career. Data analysis jobs are spread across many industries, but at the advanced level the work involves data mining and machine learning, and it has developed furthest in the IT industry.
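The two summary figures can be computed with the standard `statistics` module; the salary values in the test are made up. A mean above the median, as with 14.6K against 12.5K here, typically indicates a right-skewed distribution in which a minority of well-paid posts pulls the mean up.

```python
import statistics


def salary_summary(salaries):
    """Mean and median of cleaned salary values (in K), plus a skew hint."""
    mean = statistics.mean(salaries)
    median = statistics.median(salaries)
    return {"mean": mean, "median": median, "mean_above_median": mean > median}
```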
Let's look at the distribution of wages, which is an important reference for job hunting:
The 10-15K band has the most postings, followed by 15-20K. In my view, 10-15K posts mainly involve modeling, while posts above 20K are mainly in data mining and big-data architecture.
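Counting postings per salary band can be sketched with `bisect`. The bin edges are an assumption chosen to match the bands named above, and each band is left-closed and right-open, so a salary of exactly 15K falls in "15-20K".

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical band edges in K; band i covers [EDGES[i-1], EDGES[i]).
EDGES = [5, 10, 15, 20, 25]
LABELS = ["<5K", "5-10K", "10-15K", "15-20K", "20-25K", ">=25K"]


def salary_band(k):
    """Map one salary value (in K) to its band label."""
    return LABELS[bisect_right(EDGES, k)]


def band_counts(salaries):
    """Number of postings in each salary band."""
    return Counter(salary_band(k) for k in salaries)
```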
Let's take a look at the distribution of jobs in each district:
Nanshan District holds 62.9% of the data analysis positions and Futian District 25.8%; the rest are spread across Longgang, Luohu, Bao'an, and Longhua New District. This indicates that Nanshan and Futian are the centers of Shenzhen's technology industry.
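The district percentages are a simple share-of-total calculation. A minimal sketch, assuming each cleaned posting carries one district name; the sample data in the test is made up.

```python
from collections import Counter


def district_shares(districts):
    """Percentage of postings per district, largest first, rounded to 0.1%."""
    counts = Counter(districts)
    total = sum(counts.values())
    return {d: round(100 * n / total, 1) for d, n in counts.most_common()}
```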
We want to model the relationship between salary, work experience, and education. Because education is categorical, we encode it with three dummy variables: junior college, undergraduate, and master's. The multiple regression results are as follows:
At the 0.05 significance level, the F statistic is 82.53, so the regression as a whole is significant. The p-values of the t-tests for work experience and the three education dummies are all below 0.05, so each of these variables is statistically significant. However, R-squared is only 0.41, meaning that work experience and education explain just 41% of the variation in salary. This is not hard to understand: even among positions titled data analyst, the actual work differs greatly. Some only use Excel for basic analysis, while others use Python and R for data mining. Companies also differ in size and in the salaries they are willing to offer. Since differences in job content and company generosity are hard to capture from recruitment-site postings alone, the model's goodness of fit is limited.
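A dependency-free sketch of the regression: ordinary least squares solved through the normal equations, with R-squared and the overall F statistic F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), where k is the number of predictors and n the number of postings. The education level names are hypothetical. With an intercept in the model, a categorical variable with m levels needs only m - 1 dummies; otherwise the dummy columns add up to the intercept column and X^T X is singular. This sketch therefore assumes a further baseline level (for example postings with no stated requirement) and encodes any unlisted value as all zeros. It does not compute t-tests or p-values; a statistics package such as statsmodels (`sm.OLS(y, sm.add_constant(X)).fit().summary()`) reports those.

```python
def dummies(level, levels):
    """One 0/1 column per listed level; any other value is the baseline (all zeros)."""
    return [1.0 if level == lv else 0.0 for lv in levels]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    A singular matrix (e.g. the dummy-variable trap) makes a pivot zero
    and raises ZeroDivisionError.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Fit y = b0 + X b with an intercept; return (coefficients, R^2, F)."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    n, k = len(y), p - 1
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return beta, r2, f_stat
```

In use, each row of `X` would combine experience with the education dummies, e.g. `X = [[exp] + dummies(edu, ["college", "bachelor", "master"]) for exp, edu in records]`, followed by `beta, r2, f = ols(X, salaries)`.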