Job Recruitment Website - Zhaopincom - Which is better for writing web crawlers, Java or Python?

Which is better for writing web crawlers, Java or Python?

Python, of course. Generally speaking, people talk about "Python crawlers", and most crawler engineers work in Python.

Python's particular strengths are what make it well suited to writing crawlers: 1) it is cross-platform, with good support for Linux and Windows; 2) scientific computing and numerical fitting: NumPy and SciPy; 3) visualization: Matplotlib for 2D, Mayavi2 for 3D; 4) complex networks: NetworkX, and crawling: Scrapy; 5) interactive terminals and rapid website development.

There are three common ways to extract information from web pages using Python:

1. Regular expressions. The implementation breaks down into five steps: 1) deploy an HTML page on a Tomcat server; 2) use the URL to open a connection to the page; 3) obtain an input stream for reading the page content; 4) define the matching rules (the regular expression); 5) put the extracted data into a collection.
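
Steps 2 to 5 above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the sample HTML, the `LINK_RULE` pattern and the `extract_links` helper are hypothetical names, and the page text is hard-coded so the block runs without a server. In practice the text would be read from the URL, for example with `urllib.request.urlopen`.

```python
import re

# Hypothetical page content. With a real server, steps 2 and 3 would read it
# from the URL instead, e.g. urllib.request.urlopen(url).read().decode("utf-8").
REGEX_SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/jobs/1">Python engineer</a>
  <a href="https://example.com/jobs/2">Java engineer</a>
</body></html>
"""

# Step 4: the matching rule. Captures the href value and the link text.
LINK_RULE = re.compile(
    r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_links(html):
    """Step 5: collect (url, text) pairs for every link the rule matches."""
    return [(url, text.strip()) for url, text in LINK_RULE.findall(html)]


for url, text in extract_links(REGEX_SAMPLE_HTML):
    print(url, text)
```

A rule like this only works while the markup keeps exactly this shape; an extra attribute before `href` or single-quoted values would already require a different pattern.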

2. BeautifulSoup.

BeautifulSoup supports several HTML parsers, including the one in Python's standard library and a number of third-party ones, one of which is the lxml parser. By relying on the structure and attributes of the web page, we can extract an element in a few short statements, without writing complicated regular expressions.
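
As a rough sketch of what "a few short statements" can look like, the example below pulls job titles out of a list by tag, `id` and `class`. The sample HTML and the `extract_titles` helper are hypothetical; the block assumes the third-party `bs4` package is installed and uses the standard library's `html.parser` backend, which can be swapped for `"lxml"` if that package is available.

```python
from bs4 import BeautifulSoup

# Hypothetical page content used only for illustration.
BS_SAMPLE_HTML = """
<html><body>
  <ul id="jobs">
    <li class="title">Python engineer</li>
    <li class="title">Java engineer</li>
    <li class="note">Updated daily</li>
  </ul>
</body></html>
"""


def extract_titles(html, parser="html.parser"):
    """Return the text of every <li class="title"> inside <ul id="jobs">."""
    soup = BeautifulSoup(html, parser)
    job_list = soup.find("ul", id="jobs")
    if job_list is None:
        return []  # the expected list is not on this page
    return [li.get_text(strip=True)
            for li in job_list.find_all("li", class_="title")]


print(extract_titles(BS_SAMPLE_HTML))
```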

3. lxml. lxml is a Python parsing library that supports parsing HTML and XML as well as XPath queries, and it parses efficiently. lxml mainly addresses three questions: 1) given an XML file, how to parse it; 2) how to find and locate a tag after parsing; 3) how to operate on the tag once it is located, such as accessing its attributes and text content.
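
The three questions map onto three calls in a minimal sketch like the one below: parse, locate with XPath, then read attributes and text. The XML snippet and the `describe_jobs` helper are hypothetical, and the block assumes the third-party `lxml` package is installed.

```python
from lxml import etree

# Hypothetical XML document used only for illustration.
JOBS_XML = """
<jobs>
  <job id="1" city="Beijing"><title>Python engineer</title></job>
  <job id="2" city="Shanghai"><title>Java engineer</title></job>
</jobs>
"""


def describe_jobs(xml_text):
    """Return (id, city, title) for every <job> element in the document."""
    # 1) Parse the XML text into an element tree.
    root = etree.fromstring(xml_text.strip())
    # 2) Locate the tags of interest with an XPath expression.
    jobs = root.xpath("//job")
    # 3) Operate on each located tag: read its attributes and child text.
    return [(job.get("id"), job.get("city"), job.findtext("title"))
            for job in jobs]


for job_id, city, title in describe_jobs(JOBS_XML):
    print(job_id, city, title)
```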

Regular expressions are the better fit when the page structure is simple and you want to avoid extra dependencies, since no library needs to be installed. When there is only a small amount of data to scrape, the slower BeautifulSoup is also a reasonable option. lxml is the best choice when the data volume is large and efficiency matters.

Crawling is a simple, approachable technique. You can probably scrape the data from a single web page just by reading the documentation. A large-scale crawler, however, is not simply 1*n of that effort, which is why many companies recruit strong Python engineers at high salaries.