Who is better in the battle between SAS and R?
Translation | JosephYX
Source: SAS resources
Abstract
Although SAS still dominates in industry, R is widely used in academia because it is free and open source, which lets users write and share their own applications. As a result, many students graduating with degrees in data analysis struggle to find jobs because they lack SAS experience. They also face the pain of switching from R, which they learned at school, to SAS. Ideally, you would know every relevant programming language and use whichever one best suits your working environment. Of course, this is basically a daydream. Our aim is to show the respective strengths of these two very different languages and how to make full use of both together. We also want to correct some misconceptions and prejudices held by people who stopped using SAS years ago and now use R, because they have paid little attention to how SAS has developed since.
Introduction
We chose SAS and R because they are the two most widely used programming languages in statistics. We have noticed an unfortunate trend: heavy R users in academia believe that R holds considerable advantages even in industries where SAS dominates, yet mastering both is very important for young people who want to build a career in data analysis. A professor's misconceptions about, or preference for, one package often works against students. So it needs to be said: professors, don't be lazy. A subjective preference for one language can hurt students' career prospects.
SAS is updated regularly (if somewhat slowly; note by sxlion), but non-SAS programmers often miss these updates because they do not follow SAS developments. SAS graphics are one example of rapid development: many people have not noticed these upgrades and still insist on using R for graphics. Another little-known fact is that SAS makes it easy to write custom functions, something usually considered one of R's strengths. The relevant SAS procedure (PROC FCMP) comes with comprehensive syntax checking, detailed documentation and technical support; however, a new user may not know that these tools are available, or even that they exist. SAS also offers excellent training courses, online resources shared by user groups, and a large number of books on related topics. Knowing about these resources and making sensible use of them helps reduce the fear of using SAS.
Discussion of key issues
This article compares the strengths and weaknesses of the two languages in light of common misconceptions we have encountered at our institution. Many other debates are ongoing, but here we focus on the most common ones. We hope to clear up these misconceptions and give up-to-date information to analysts who have not kept up with developments in R or SAS.
Adopting new statistical methods
SAS
Advantages: SAS software and algorithms are thoroughly tested, and SAS technical support can address users' needs quickly. Where possible, SAS embeds new methods in existing procedures, for example by adding an option or a statement, so users do not need to learn a new procedure. SAS also publishes newsletters that explain software updates in detail.
Disadvantages: updates and upgrades are slow.
R
Advantages: users can quickly implement new methods or find existing packages. New methods are easy to study and understand, because students can read the source code of the functions.
Disadvantages: R packages and their documentation are written by users, so new methods may not have been thoroughly debugged and tested. Developers are scattered around the world rather than working together as one team.
On this issue, the strengths and weaknesses of SAS and R are complementary. Some argue that because R's code is open, you can see how R works, which makes it easy for people with the relevant background to understand. SAS procedures, by contrast, are precompiled, but the documentation for each procedure contains the mathematical formulas behind its statements and options, so a user who really wants to see what happens underneath can easily do so. For most users, whether students or not, there is no real difference: both languages just run code. When you run SAS, you do not need to know exactly what it is doing internally; likewise, when you run R, you do not need to know which functions it calls in the background. All you have to do is follow the rules.
Graphics
SAS
Advantages: SAS graphics have become more flexible, more sophisticated and easier to use. In many analysis procedures (PROCs), ODS Graphics generates relevant graphs automatically with no additional code. This gives users a choice: they can use the default graphs or create their own customized graphs.
Disadvantages: the Graph Template Language (GTL) behind these graphics is large and difficult to use, especially for beginners. Newer advanced features, such as interactive graphics, are also hard for beginners to master.
R
Advantages: you can easily produce attractive charts, and you can even use loops to generate animations.
Disadvantages: in R, plotting functions are separate from statistical analysis, so drawing and analysis are independent of each other. Users must decide for themselves which graphics are appropriate, and the result depends on their own statistical background and preferences. Adjusting a graph to achieve a specific size or angle is also not always simple.
One of the main reasons R became attractive was that SAS graphics before SAS 9.2 were inadequate, while one of R's best features is the high quality and ease of use of its graphics. Today, however, SAS/GRAPH combines ODS Graphics with the SG procedures, which greatly increases SAS's graphing capabilities. Combining ODS Graphics with analysis PROCs lets users easily produce graphs tied to their analyses. There are also more and more dedicated graphing procedures, such as PROC SGPLOT, SGPANEL and SGSCATTER, although these do require some code. In addition, SAS offers other good graphing tools, such as SGDESIGNER and SAS Enterprise Guide.
Functions and reusable code
SAS
Advantages: SAS provides a large number of built-in functions and supports user-defined functions, both of which can be used in DATA and PROC steps. The powerful, general-purpose macro language can also be used in DATA and PROC steps, and macro variables can be defined as local or global.
Disadvantages: writing custom functions and elaborate macro code requires deep programming knowledge to ensure correctness.
R
Advantages: writing functions in R is very simple, and users can also publish their functions as packages on CRAN to share them with other users.
Disadvantages: writing custom functions requires deep programming knowledge to ensure correctness. Variables assigned inside a function are local to it.
On this point, the two packages have similar strengths and weaknesses. Early SAS users relied mainly on macro programming to implement custom functions, which is why R users considered SAS inefficient and cumbersome. However, PROC FCMP in SAS 9 lets users write their own functions, and SAS 9.2 lets users call these functions from DATA and PROC steps. This is very useful for simple statistical functions; more complex statistical functions can be implemented in the IML language.
Both SAS and R face the problem of using functions effectively and correctly, which requires users to have a strong programming background when writing them. On the positive side, a programmer needs to know what he or she is writing; the danger is that others can download a SAS macro or R package and use it without knowing how it works, or even whether it is correct. With a correct understanding of macros and functions, however, you can easily share them and adapt them to specific needs.
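To make the local/global distinction concrete, here is a minimal Python sketch of a simplified model of SAS macro-variable scoping. It is not SAS code: the class `MacroScopes` and its methods are hypothetical names, and the model covers only the basic rule that `%LET` inside a macro updates the nearest existing variable (searching outward to the global table) and otherwise creates a new local one, while `%LOCAL` forces a local variable. By contrast, an ordinary `<-` assignment inside an R function creates a binding local to that call.

```python
# Hypothetical, simplified model of SAS macro-variable scoping (not SAS code).
# Only %LET, %LOCAL, %GLOBAL and resolution of &name are imitated.

class MacroScopes:
    def __init__(self):
        self.tables = [{}]          # tables[0] is the global symbol table

    def enter_macro(self):
        self.tables.append({})      # each macro call gets its own local table

    def exit_macro(self):
        self.tables.pop()           # local variables vanish when the macro ends

    def local(self, name):
        # %LOCAL: create the variable (null value) in the innermost table.
        self.tables[-1].setdefault(name.upper(), "")

    def global_(self, name):
        # %GLOBAL: create the variable (null value) in the global table if new.
        self.tables[0].setdefault(name.upper(), "")

    def let(self, name, value):
        # %LET: update the nearest existing variable, searching outward;
        # if none exists, create it in the innermost table.
        key = name.upper()
        for table in reversed(self.tables):
            if key in table:
                table[key] = value
                return
        self.tables[-1][key] = value

    def resolve(self, name):
        # &name: the innermost definition wins.
        key = name.upper()
        for table in reversed(self.tables):
            if key in table:
                return table[key]
        raise KeyError(f"macro variable {key} is not defined")


scopes = MacroScopes()
scopes.let("dsn", "sales")      # open code: goes to the global table
scopes.enter_macro()
scopes.let("dsn", "returns")    # no %LOCAL, so the global DSN is overwritten
scopes.local("tmp")             # explicitly local...
scopes.let("tmp", "1")          # ...so this assignment stays inside the macro
scopes.exit_macro()
print(scopes.resolve("dsn"))    # returns
```

The outward search is why `%LOCAL` matters in practice: without it, a helper macro can silently overwrite a global variable that happens to share a name with one of its working variables.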
Free software
SAS
Advantages: SAS offers an OnDemand version of its software free of charge to degree-granting institutions.
Disadvantages: full SAS and JMP are not free. The OnDemand version restricts which operating systems can be used, and it is reported to be very slow.
R
Advantages: R is completely free.
Disadvantages: large companies may see open-source software as a security risk.
The free alternatives SAS provides for teaching ensure that instructors can use SAS in class, though users should be aware of OnDemand's installation process and speed. In short, SAS and JMP are not free, and companies need licenses to use them. R can be installed for free, but many bloggers in this debate argue that if a company already using SAS switched to R, the resources and money it would spend (rewriting code, forming new teams, recruiting new specialists and so on) would far exceed the cost of SAS licenses. SAS may also be better suited to companies whose analysis results must withstand strict scrutiny. Small companies without an existing analytics framework can weigh paid software with a long history and rich resources (SAS) against software that is free but requires other upfront investment, such as staff expertise, coding and debugging (R). In the end, once both time and money are considered, the costs of SAS and R may be roughly the same.
User support
SAS
Advantages: SAS has extensive online reference material, professional technical support, professional training courses, many excellent published books, and close-knit user groups and online communities. Problems can be reported directly to SAS Technical Support, which works with users to solve them.
Disadvantages: we honestly could not think of any.
R
Advantages: R has good example manuals, online reference materials, the R mailing lists and R user meetups.
Disadvantages: users depend on other users' opinions and advice about the software. Because R's developers are scattered around the world, there is little coordination among them. Packages are not written by the R core development team, so some are incomplete, and their results are sometimes even of questionable correctness. In addition, it is hard to find a specific person or team to contact about a particular problem.
The outstanding support SAS provides reflects its customer-centered product design. SAS support is very well suited to novices, and its rich detail also benefits experienced users. R's scattered reference materials and lack of formal technical support make help hard to find, which runs counter to the original intent of R's developers and designers.
Data processing
SAS
Advantages: SAS can handle data of any type and format. The DATA step is designed specifically for data management, so SAS excels at data processing. With its many options, SAS handles large data well, and table joins with PROC SQL can also reduce run time.
Disadvantages: the DATA step has an implicit loop, which requires users to adjust their programming mindset to fit SAS's execution logic.
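The implicit loop is easier to picture with a model. The Python sketch below is a hypothetical, simplified imitation of a DATA step that reads observations with a SET statement: the function name `data_step`, the dict-based "program data vector", and the use of `None` for SAS missing values are all assumptions for illustration, and BY-group processing, arrays and explicit OUTPUT statements are not modeled. It shows the two points that usually surprise new users: the user's statements run once per observation without any written loop, and computed variables are reset to missing on each pass unless they are retained.

```python
# Hypothetical model of the DATA step's implicit loop (not SAS code).
# Assumptions: each observation is a dict, None stands in for a SAS
# missing value, and only SET-style reading plus RETAIN is modeled.

def data_step(rows, body, retain=None):
    """Run `body` once per input row, the way a DATA step loops implicitly."""
    pdv = dict(retain or {})      # "program data vector", seeded with RETAIN values
    retained = set(pdv)
    output = []
    for row in rows:              # the implicit loop: one pass per observation
        # Variables not named in RETAIN are reset to missing on every pass.
        for name in pdv:
            if name not in retained:
                pdv[name] = None
        pdv.update(row)           # like SET: read the next observation
        body(pdv)                 # the user's DATA step statements
        output.append(dict(pdv))  # implicit OUTPUT at the end of each pass
    return output


sales = [{"amount": 10}, {"amount": 5}, {"amount": 7}]

def add_running_total(pdv):
    # Roughly: retain total 0; total = total + amount; double = amount * 2;
    pdv["total"] = pdv["total"] + pdv["amount"]
    pdv["double"] = pdv["amount"] * 2

result = data_step(sales, add_running_total, retain={"total": 0})
print([r["total"] for r in result])  # [10, 15, 22]
```

Without `retain={"total": 0}`, `total` would start each pass as `None`; this reset-per-observation behavior is exactly the shift in thinking the DATA step asks of programmers used to writing explicit loops.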
R
Advantages: R was originally considered well suited to big data. It is very efficient for matrix operations and sorting, and R can also simulate many kinds of data for analysis.
Disadvantages: R's design focuses on statistical computation and graphics, so data management is more time-consuming and less clear than in SAS. A major reason is that handling different types of data well in R is difficult to master.
The importance of data processing is often overlooked in statistical programming, yet it is critical, because real-world data are usually too messy to analyze directly. Students who have only used R often have unrealistic expectations about the data they will receive, and learning SAS is an effective way to learn how to clean raw data. SAS can manage and analyze large, complex data sets, whereas R focuses more on analysis.
When dealing with complex data, R's object-oriented data structures can cause many problems, and R still lacks a built-in implicit loop like the DATA step. In SAS, standard tools are typically used to merge complex data sets with many missing values and then to create and modify variables. In R, there is no standard approach to complex data manipulation, which often leads to more convoluted code.
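As an illustration of the kind of standardized merge described above, the following Python sketch imitates a one-to-one SAS `MERGE` with a `BY` statement. It is a hypothetical helper (`match_merge` is not a real library function) and assumes that both inputs are already sorted by the key and that each key occurs at most once per input; SAS's many-to-one behavior and its sortedness checks are not modeled. Rows without a partner get `None`, standing in for SAS missing values, and when both inputs share a non-BY variable the right-hand value wins, as the later data set in a SAS MERGE does.

```python
# Hypothetical sketch of a one-to-one SAS-style "MERGE a b; BY key;" (not SAS code).
# Assumptions: both inputs are lists of dicts already sorted by `by`,
# each key appears at most once per input, and None stands in for missing.

def match_merge(left, right, by):
    columns = sorted({c for row in left + right for c in row})
    merged_rows = []
    i = j = 0
    while i < len(left) or j < len(right):
        row = dict.fromkeys(columns)         # every variable starts missing
        if j >= len(right) or (i < len(left) and left[i][by] < right[j][by]):
            row.update(left[i])              # key only in the left input
            i += 1
        elif i >= len(left) or right[j][by] < left[i][by]:
            row.update(right[j])             # key only in the right input
            j += 1
        else:
            row.update(left[i])              # keys match: combine the rows;
            row.update(right[j])             # right-hand values win, as in SAS
            i += 1
            j += 1
        merged_rows.append(row)
    return merged_rows


demo = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 4, "age": 29}]
labs = [{"id": 1, "chol": 190}, {"id": 3, "chol": 240}]

for r in match_merge(demo, labs, by="id"):
    print(r)
```

Walking both sorted inputs in a single pass, rather than searching one input for each row of the other, is what lets this style of merge scale to large files.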
Whether SAS or R runs faster depends on the task. For example, by setting MEMLIB, SAS can use memory instead of disk, as R does, to speed up processing. R has no such option: it can only work in memory.
Installation
SAS
Advantages: all SAS analysis capabilities and licenses are packaged and installed as a whole. Renewing the license is very easy.
Disadvantages: installing SAS for the first time, or upgrading to a new version, is time-consuming and troublesome, though still easier than explaining to students in class a thousand times how to use the software. More and more students use Mac laptops in class, but there is no Mac version of SAS, which means these students cannot use SAS.
R
Advantages: R and its most popular user interface, RStudio, are quick and easy to install and run on Windows, Mac and Unix.
Disadvantages: to run an analysis, you must identify the packages that meet your needs, then find, install and learn their specific functions. When the original article was published, 4,379 packages were available, and the number grows every day. This offers more choices but also makes searching more time-consuming and difficult.
Obtaining SAS is difficult for users, and the initial installation is troublesome. Once it is installed, however, the software itself causes few problems, and no extra packages or steps are needed for specialized analyses. In R, by contrast, installation is very simple, but additional analyses require installing additional packages, which uses up the time saved during installation.
Reporting
SAS
Advantages: SAS produces detailed, attractive reports through many useful procedures.
Disadvantages: the more detailed reporting procedures, such as PROC TABULATE and PROC REPORT, have a steep learning curve before you can use them correctly and effectively.
R
Advantages: R has many powerful reporting tools. Sweave can create PDF documents containing text, tables and graphics, mixing LaTeX markup with R code. The newer knitr package can quickly generate web content with fewer formatting restrictions.
Disadvantages: R has no standard template for generating reports, so some programming effort is needed. Report generation is a relatively new direction for R, so it is not as simple and fast as in SAS. Sweave and knitr lead in this area, but they are also difficult to learn.
Users who produce many reports should understand these differences. Although SAS's reporting features take time to learn, they are very valuable and flexible once mastered. Learning R's reporting tools from scratch may not take as long as learning SAS's.
Conclusion
As we have seen, resolving the R vs. SAS debate has three parts. First, as in any statistical programming community, there is no final winner in this contest. Both packages have their own strengths and weaknesses, and both need to coexist, including in academic teaching. Students who clearly understand their own needs and use each tool appropriately will get better results. Teaching students only one package limits them and makes it harder for them to reach their potential when they later learn the other. Second, users need to keep their toolboxes up to date. Both SAS and R have excellent websites that cover the latest developments. The SAS technical support website offers many updates, such as focus areas, e-newsletters, RSS feeds and blogs, and R blog sites contain many news items and exercises contributed by users. Third, ideally you should learn both packages and integrate them in your analyses. There are many ways to try this, such as using SAS/IML and SAS/IML Studio (SAS/IML is an add-on product to SAS), or using the SAS X statement to execute external commands, so that R code can be run from within SAS. R users, in turn, can use both packages at the same time by moving between the R and SAS interfaces. Using both packages, data processing and analysis get twice the result with half the effort, and users are satisfied.