Python Machine Learning Libraries
Python is one of the best programming languages for scientific computing, and it is widely used in computer vision, artificial intelligence, mathematics, astronomy and more. Not surprisingly, this holds for machine learning as well. Of course, it has some shortcomings too; one of them is that its tools and libraries are scattered. If you are a Unix-minded person, this works conveniently, since each tool does one thing and does it well. However, you also need to know the strengths and weaknesses of the different libraries and tools so that you can make sound decisions when building a system. Tools by themselves do not make systems or products better, but with the right tools we can work more effectively and be more productive. Knowing the right tools for your field is therefore very important.
The purpose of this article is to list and describe the most useful machine learning tools and libraries in Python. We do not require the libraries on this list to be written in Python; it is enough that they have a Python interface. We also include a section on deep learning, because it has attracted a lot of attention recently.
Our goal is not to list every machine learning library in Python (the Python Package Index (PyPI) returns 139 results when searching for "machine learning"), but to list the ones that we know are useful and well maintained. In addition, although some modules can be used for various machine learning tasks, we only list libraries whose main focus is machine learning. For example, although SciPy contains some clustering algorithms, its main focus is not machine learning but a comprehensive set of scientific computing tools. So we excluded SciPy (although we use it too!).
Another thing worth mentioning is that we also evaluate these libraries on how well they integrate with other scientific computing libraries, because machine learning (supervised or unsupervised) is one part of a data processing system. If the library you use does not fit with the other libraries in that system, you will spend a lot of time building intermediate layers between them. It is important to have a great library in your toolset, but it is just as important that the library integrates well with the others.
If you are proficient in another language but want to use Python packages, we also briefly describe how to integrate with Python in order to use the libraries listed in this article.
Scikit-Learn
Scikit-learn is our machine learning tool of choice at CB Insights. We use it for classification, feature selection, feature extraction and clustering. What we like most about it is its consistent, easy-to-use API and the many evaluation, diagnostic and cross-validation methods it provides out of the box (sound familiar? Python takes a "batteries included" approach as well). The icing on the cake is that it uses SciPy data structures under the hood, so it fits well with the rest of the scientific Python stack: SciPy, NumPy, Pandas and Matplotlib. Therefore, if you want to visualize the performance of a classifier (for example, with a precision-recall plot or a receiver operating characteristic (ROC) curve), Matplotlib lets you do so quickly. Considering how much time goes into cleaning and structuring data, this tight integration with other scientific computing packages makes the library very convenient to use.
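To illustrate the consistent estimator API and the built-in cross-validation and ROC tooling mentioned above, here is a minimal sketch. The synthetic data from `make_classification` and the choice of `LogisticRegression` are assumptions made for the example, not something taken from our own pipeline; any other Scikit-learn classifier could be swapped in.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary classification data (a stand-in for real features).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Every estimator exposes the same fit / predict interface.
clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training split (accuracy by default).
cv_scores = cross_val_score(clf, X_train, y_train, cv=5)

# Fit once on the full training split and score the held-out data.
clf.fit(X_train, y_train)
test_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print("mean CV accuracy:", cv_scores.mean())
print("held-out ROC AUC:", test_auc)
```

The arrays going in and out are plain NumPy arrays, so the scores can be passed straight to Matplotlib for plotting.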
In addition, it includes limited feature extraction for natural language processing, such as bag of words, tf-idf (term frequency-inverse document frequency) and preprocessing (stop words, custom preprocessing, analyzers). If you want to run quick benchmarks on small toy datasets, its datasets module provides the common and useful ones. You can also build your own toy datasets from these, so that you can check whether a model behaves as you expect before applying it to real-world data. It also provides grid search and random search for parameter optimization and tuning. None of these features could be achieved without strong community support and good maintenance. We look forward to its first stable release.
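The tf-idf features, stop-word handling and grid search described above can be combined in a single pipeline. The sketch below is only an illustration: the eight-sentence corpus, its labels and the parameter grid are made up for the example, and a real project would use far more data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny hypothetical corpus: 1 = about machine learning, 0 = about cooking.
docs = [
    "the classifier was trained on labeled data",
    "cross validation estimates model accuracy",
    "neural networks learn features from data",
    "grid search tunes model parameters",
    "simmer the sauce and add fresh basil",
    "bake the bread until the crust is golden",
    "chop the onions and fry them in butter",
    "season the soup with salt and pepper",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),  # bag of words + tf-idf
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step names are prefixed to parameter names with a double underscore.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy corpus is so small.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(docs, labels)

print(search.best_params_)
print(search.predict(["tune the model with cross validation"]))
```

`RandomizedSearchCV` from the same module follows the same pattern when the grid is too large to search exhaustively.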
Statsmodels
Statsmodels is another powerful library. It focuses on statistical models and is mainly used for predictive and exploratory analysis. If you want to fit linear models, do statistical analysis, or do some predictive modeling, Statsmodels is a good fit. The statistical tests it provides are quite comprehensive and cover validation tasks for most cases. If you are an R or S user, it also accepts R-style formula syntax for some of its statistical models. Its models accept NumPy arrays and Pandas data frames as well, making intermediate data structures a thing of the past!
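As a small illustration of the R-style formula syntax and the Pandas integration, the sketch below fits an ordinary least squares model through `statsmodels.formula.api`. The data is synthetic, and the column names `x` and `y` as well as the true slope and intercept are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": np.linspace(0, 10, 50)})
df["y"] = 2.0 * df["x"] + 1.0 + rng.normal(scale=0.5, size=len(df))

# The formula refers to DataFrame columns by name, as in R's lm(y ~ x).
model = smf.ols("y ~ x", data=df).fit()

# Estimated intercept and slope, indexed by term name.
print(model.params)
```

Calling `model.summary()` on the fitted result prints the usual regression table, including standard errors and test statistics.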
PyMC
PyMC is a tool for Bayesian modeling. It includes Bayesian models, statistical distributions, diagnostic tools for model convergence, and some hierarchical models. If you want to do Bayesian analysis, you should take a look.
Shogun
Shogun is a machine learning toolbox written in C++ with a focus on support vector machines (SVM). It is actively developed and maintained, and it provides a Python interface, which is also its best documented interface. However, we find its API harder to use than Scikit-learn's. It also offers few diagnostic and evaluation algorithms out of the box. Its speed, though, is a big advantage.
Gensim
Gensim describes itself as "topic modeling for humans". As stated on its homepage, its main focus is Latent Dirichlet Allocation (LDA) and its variants. Unlike the other packages, it has support for natural language processing, which makes it easier to combine an NLP pipeline with other machine learning algorithms. If your domain is NLP and you want to do clustering and basic classification, you may want to have a look. Recently it added word2vec, Google's neural-network-based text representation, to its API. The library is written purely in Python.
Orange
Of all the libraries listed in this article, Orange is the only one with a graphical user interface (GUI). Its classification, clustering and feature selection methods are fairly comprehensive, and it has some cross-validation methods. It is better than Scikit-learn in some respects (classification methods, some preprocessing capabilities), but it does not fit in with the rest of the scientific computing ecosystem (NumPy, SciPy, Matplotlib, Pandas) as well as Scikit-learn does. Having a GUI is an important advantage, however. You can visualize cross-validation results, models and feature selection methods (some features require Graphviz to be installed). For most algorithms, Orange has its own data structures, so you need to wrap your data in Orange-compatible structures, which makes the learning curve steeper.
PyMVPA
PyMVPA is another statistical learning library, with an API similar to Scikit-learn's. It includes cross-validation and diagnostic tools, but it is not as comprehensive as Scikit-learn.
Deep Learning
Although deep learning is a subfield of machine learning, we give it a separate section here because it has recently attracted a lot of attention from the recruitment departments of Google and Facebook.
Theano
Theano is the most mature deep learning library. It provides a good data structure (the tensor) for representing the layers of a neural network, and tensors are efficient for linear algebra, much like NumPy arrays. Note that its API may not be very intuitive, so the learning curve for users is steep. Many libraries are built on top of Theano and use its data structures. It also supports GPU programming out of the box.
PyLearn2
There is another library built on top of Theano, called PyLearn2, which brings modularity and configurability to Theano. You can create a neural network through different configuration files, which makes it easier to try out different parameters. Arguably, separating the parameters and properties of the neural network into configuration files is what makes its modularity more powerful.
Decaf
Decaf is a deep learning library recently released by the University of California, Berkeley. Its neural network implementation achieved state-of-the-art results in the ImageNet classification challenge.
Nolearn
If you want to use the excellent Scikit-learn API for deep learning as well, Nolearn wraps Decaf to make that easier. It is a wrapper around Decaf and is (mostly) compatible with Scikit-learn, which makes Decaf even more impressive.
OverFeat
OverFeat is the winner of the recent Dogs vs. Cats challenge. It is written in C++ and comes with a Python wrapper (as well as Matlab and Lua wrappers). It uses the GPU through the Torch library, so it is very fast. It also won the detection and localization challenges of the ImageNet classification competition. If your field is computer vision, you may want to take a look.
Hebel
Hebel is another neural network library with GPU support that works out of the box. You can define the properties of a neural network in a YAML file (similar to PyLearn2), which gives you a friendly way to separate the neural network from the code and lets you run models quickly. Because it is a young project, its documentation lacks depth and breadth. It is also limited in terms of neural network models, since it supports only one type (feed-forward). However, it is written in pure Python and should be a friendly library, because it contains many utility functions, such as schedulers and monitors, that we have not found in other libraries.
Neurolab
NeuroLab is another neural network library with a friendly API (similar to Matlab's). Unlike the other libraries, it contains several variants of recurrent neural network (RNN) implementations. If you want to use RNNs, this library is one of the better choices among libraries with a similar API.
Integration with other languages
You don't know Python but are good at another language? Do not despair! One of the great things about Python (among many others) is that it is a perfect glue language, so you can reach these libraries from your usual programming language through Python. The following packages can be used to combine Python with other languages:
R -> RPython
Matlab -> matpython
Java -> Jython
Lua -> Lunatic Python
Julia -> PyCall.jl
Inactive Libraries
These libraries have not released any updates for more than a year. We list them because you may find them useful, but they are unlikely to be maintained for bug fixes, let alone enhancements, in the future.
MDP
MlPy
FFnet
PyBrain
If we missed your favorite Python machine learning package, please let us know in the comments. We are happy to add it to the article.