Awareness of business and content security

Let me share my understanding of enterprise security. Enterprise security is a very broad concept. Its ultimate goal is to ensure the normal development of the enterprise, and the overall security system is composed of different modules. If any link is handled badly, it will affect the enterprise: its income and profit, its reputation, and even its survival.

I am in frequent contact with several departments on the customer side (Party A): the security department, the operations department, the audit department, the development department, and so on. Each department has different concerns. The security department is basically responsible for network security, the operations department for ensuring the effectiveness of marketing strategies, the audit department for content quality and content violations, and the development department for the unified development and construction of security platforms. The importance of each department's work is directly related to the company's business, and no matter which department has problems, the enterprise will be affected.

To give an intuitive example, a game company may be hit by a DDoS attack, which affects the stable operation of the business; its reputation may be damaged by a data leak; and its content may violate regulations, causing the whole game to be taken off the shelves for rectification. The most common problem is cheating tools (plug-ins), whose direct consequence is the loss of users and income.

Content violations include, for example, all kinds of pornographic information. In June 2019, the Internet Information Office conducted a thorough investigation of audio apps and removed a large number of applications. The main solution in the industry is to connect business-related text, pictures, video and audio to a machine audit platform. At present this is mainly either the SaaS detection platform of a third-party service provider or a detection platform built by the enterprise itself, and it is mainly used to improve efficiency and reduce audit time. Combined with manual audit, it ensures the detection effect and reduces the rates of missed and wrong judgments.

Cracking is especially serious for game apps. If you are interested, search Taobao for keywords such as "cracked games"; many shops and games will come up. Besides removing the normal in-game charges, cracked versions also add abnormal features, such as double attack, to attract players. Some shops charge a membership fee of 150 yuan per month, which already exceeds the per-user income of many original games. This is fatal for the original game. As for solutions, take mobile games as an example. Against cracking, app hardening (reinforcement) can be used to prevent reverse engineering. Against plug-ins, the game's anti-cheat technology can detect emulators, multi-instance launchers, cloud-hosted real devices and simulated clicks, and can be combined with operational measures to strengthen the deterrent effect on plug-in users.

At the end of 2018, Starbucks launched a campaign that gave coffee coupons to newly registered users. At that time, user authentication was relatively simple, and a coupon could be obtained by filling in very little information. Within a day and a half of the launch, the "wool party" (organized coupon abusers) had grabbed almost 4 million coupons, worth about 10 million yuan at the price of a medium cup. In wool party circles, making hundreds of thousands in minutes is still possible. To protect against the wool party, a threat intelligence database can be used, such as blacklists of mobile phone numbers, IPs and email addresses, together with analysis of data and behavior based on the user information collected during the campaign. In this black and gray industry, the profit motive is very strong and the confrontation is very fierce.
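The blacklist lookup described above can be sketched as follows. This is only a minimal illustration, assuming the threat intelligence database can be represented as in-memory sets; the class name `ThreatIntel`, the method `hits` and all sample entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatIntel:
    """Hypothetical in-memory stand-in for a threat intelligence database."""
    phones: set = field(default_factory=set)
    ips: set = field(default_factory=set)
    emails: set = field(default_factory=set)

    def hits(self, phone: str, ip: str, email: str) -> list:
        """Return the names of the blacklists that a registration matches."""
        matched = []
        if phone in self.phones:
            matched.append("phone")
        if ip in self.ips:
            matched.append("ip")
        if email.lower() in self.emails:
            matched.append("email")
        return matched


# Sample entries are made up for illustration only.
intel = ThreatIntel(
    phones={"17000000000"},
    ips={"203.0.113.7"},
    emails={"bulk@example.com"},
)
```

A registration that matches any list could then be rejected or sent through stricter verification, while the behavioral analysis mentioned above covers accounts that are not yet on any list.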

The interesting thing about data leakage is that more than 60% of leaks are caused by insiders. Recently, a recruitment website leaked 160,000 resumes, a typical case of collusion between insiders and outsiders. Resumes originally worth 50 yuan each were illegally sold to vendors and then resold on Taobao for 1-2 yuan each. Therefore, data leakage cannot be prevented by data leakage prevention products alone; it also requires improving internal policies, paying attention to the division of permissions, strengthening auditing, and training internal staff in security and legal awareness.

DDoS is the oldest but still one of the most effective network attacks. With the development of network communication and Internet technology, DDoS attacks are becoming more and more serious; for example, many IoT devices can be used to launch them. It is difficult for victims to eliminate the attack source, so they can only defend passively. In China, attacks of tens of Gbps are now very common. They usually mix volumetric traffic with CC attacks, which locally deployed protection equipment struggles to handle, so most are mitigated through cloud-based traffic cleaning. Many domestic security vendors are shifting from hardware to cloud services, which reflects the trend toward cloud security services.

In this sharing, I will focus on how to solve the content security problems faced by enterprises in the context of the explosive growth of UGC content and increasing national supervision.

First, the current state of content governance, viewed from three angles. The first is the characteristics of supervision: there are many regulatory departments, many regulations and requirements, and many special rectification campaigns.

The regulatory authorities include the Internet Information Office; the former State Administration of Radio, Film and Television, which has now been split into the State Administration of Radio and Television, the State Administration of Press and Publication, and the State Film Bureau; the Ministry of Culture; the Ministry of Public Security; and the Ministry of Industry and Information Technology.

The regulatory scope of each department has its own emphasis, but there is also overlap. For example, the General Administration of Press and Publication mainly supervises news content, and the State Administration of Radio, Film and Television censors radio and television content, such as online dramas and TV dramas.

An enterprise, as the object of supervision, is supervised by the public security department and the Internet Information Office at the same time. Supervision is generally carried out through user reports and special inspection campaigns. User reporting is a particularly important channel. For example, the Internet Information Office runs a reporting center for illegal and harmful information; in June this year alone, it accepted 11.7 million reports. Regulators not only build their own reporting platforms but also require major content platforms to build reporting channels, which is why major video websites, for example, have reporting and feedback portals.

In our future work and life, anyone can report the bad websites or content they encounter to the Internet Information Office.

The second feature of supervision is that there are many regulatory requirements. Anyone interested can look up these requirements, which are already very detailed, on the official websites of the regulatory authorities.

Here I want to emphasize who bears responsibility: one party is the user and the other is the platform.

Take one scenario as an example: a user posts pornographic advertising on a content platform. The user's behavior is illegal, and it is also illegal for the platform to publish such content. Objectively, both should be punished, but in practice the cost of holding individual users accountable is very high, so in most content violation cases what we see is the platform being punished.

Since June 1, 2017, the Network Security Law has been formally in force, giving the regulatory authorities another legal basis. Take another scenario as an example:

A malicious user tampers with a website through a cyber attack and publishes content containing pornographic information. The operating platform has then not only violated the content publishing requirements but also failed to protect its information systems as required by the Network Security Law, and will be punished under that law.

The third feature of supervision is that there are many governance activities.

According to the Internet Information Office, four content governance campaigns were carried out from February 2018 to June 2019.

In February 2018, a special inspection of apps was conducted, mainly targeting pornography, drugs, illegal games, harmful learning apps and similar applications, and 330,000 apps were removed from the shelves.

From September to October 2018, a special rectification of educational apps was carried out. More than 20 apps, such as "working dogs" and "pocket teachers", were found to have illegally spread obscene and pornographic content and were removed from the shelves.

From January to June 2019, a half-year "whole network rectification action" was carried out.

In June 2019, a special rectification campaign on audio content was launched.

This shows the country's determination and strength in building a clean cyberspace environment.

Even under such strong supervision, illegal content still emerges one after another.

Illegal content has three characteristics: it covers many scenarios, it has many data variants, and it is strongly adversarial.

(1) It covers many scenarios and is pervasive. News content, user comments, user avatars, nicknames, bullet comments (barrage) on online dramas: no scenario that carries content can escape illegal content.

(2) Across these scenarios, there are many types and variants of illegal data: from the original sensitive words, to today's character variants, obfuscation with special symbols, and illegal content embedded in pictures. In the last year or two, ASMR content has appeared in audio, often mixed with a lot of pornographic content.

(3) The strong adversarial nature is reflected in the fact that the distribution of illegal content is organized and adversarial: detection and operation strategies are countered by changing the form of the content and switching accounts. The need to build a defense-in-depth system is explained in detail later.
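To illustrate why symbol obfuscation defeats naive keyword matching, here is a minimal sketch that normalizes text before looking up sensitive words. The word list and function names are hypothetical, and a production system would rely on trained models rather than this simple rule.

```python
import re
import unicodedata

# Hypothetical sensitive-word list; a real list would be maintained by operations staff.
SENSITIVE_WORDS = {"freecoupon", "addme"}


def normalize(text: str) -> str:
    """Fold full-width characters to ASCII, lowercase, and strip symbols and spaces."""
    folded = unicodedata.normalize("NFKC", text).lower()
    # Remove every run of non-word characters and underscores used as separators.
    return re.sub(r"[\W_]+", "", folded)


def find_sensitive(text: str) -> list:
    """Return the sensitive words found in the normalized text, sorted."""
    cleaned = normalize(text)
    return sorted(word for word in SENSITIVE_WORDS if word in cleaned)
```

With this normalization, `find_sensitive("F*r*e*e C-o-u-p-o-n!!")` still matches `freecoupon`, whereas a plain substring search on the raw text would miss it. Each new variant (split characters, homophones, text inside images) needs its own countermeasure, which is why the variants keep multiplying.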

Under such strong national supervision, doing content security well is actually a difficult problem.

For managers, what they ultimately want to see generally comes down to two indicators: the detection effect and the impact on the business. The detection effect is generally measured by the accuracy rate and the recall rate. The impact on the business mainly depends on detection latency, which should affect the user experience as little as possible. For example, in IM chat, if detecting one text message takes more than 1 s, it will seriously affect the user experience.
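As a small worked example of the two effect indicators, the sketch below computes precision and recall for the "violating" class from machine verdicts and ground-truth labels. It assumes the accuracy indicator above is read as precision; the function name and sample data are hypothetical.

```python
def precision_recall(predicted: list, actual: list) -> tuple:
    """Compute (precision, recall) for the positive, i.e. violating, class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # misjudged (false alarm)
    fn = sum(1 for p, a in pairs if not p and a)    # missed violation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up sample: 1 means "violating", 0 means "normal".
machine = [1, 1, 1, 1, 0, 0]
truth = [1, 1, 1, 0, 1, 0]
precision, recall = precision_recall(machine, truth)
```

Low precision corresponds to misjudgment (normal content blocked), and low recall corresponds to missed judgment (violations let through), which is why both are tracked together.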

To achieve these goals, building a detection system in-house from 0 to 1 involves many difficulties.

The first is cost, which has two main parts: labor and equipment. On the labor side, hiring is still very expensive: a single experienced algorithm expert generally commands an annual salary of around 500,000 yuan. Moreover, the system needs not only algorithm engineers but also operations and audit staff, so personnel alone requires an investment in the millions. On the equipment side, the GPU nodes needed for image processing are relatively expensive. For example, an NVIDIA P40 card, which came to market in 2016, now costs about 50,000 yuan, and one P40 can process pictures at about 30 QPS. Model training also requires GPU nodes, which is another relatively high expense.
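The equipment figures above allow a rough sizing calculation. The sketch below uses the numbers from the text (about 30 QPS and about 50,000 yuan per P40); the peak load of 1,000 QPS is a hypothetical input, and the estimate ignores servers, redundancy and training nodes.

```python
import math


def gpu_cards_needed(peak_qps: float, qps_per_card: float = 30.0) -> int:
    """Number of cards needed to serve the peak image-detection load."""
    return math.ceil(peak_qps / qps_per_card)


def hardware_cost_yuan(peak_qps: float, qps_per_card: float = 30.0,
                       price_per_card: int = 50_000) -> int:
    """Card cost only, using the per-card price quoted in the text."""
    return gpu_cards_needed(peak_qps, qps_per_card) * price_per_card


# Hypothetical peak load of 1,000 pictures per second.
cards = gpu_cards_needed(1000)
cost = hardware_cost_yuan(1000)
```

Under these assumptions, 1,000 QPS needs 34 cards, or about 1.7 million yuan for inference cards alone.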

Beyond cost, data accumulation and audit experience are also barriers. Taking image training as an example, a detection model needs tens of thousands or even hundreds of thousands of samples. It is impossible to accumulate that much sample data without sufficient time and channels.

In addition, the auditors' experience and the audit process and policies are important guarantees of the effect. The auditors' experience determines the subjective audit quality and efficiency, while sound processes and policies are the objective guarantee. Experience depends on continuous learning and training, and processes and policies need time to be formulated and improved.

Next, I will introduce the construction of the detection team and the technical system.

The first is team building; here I take my company's team as an example.

The whole big team is subdivided into several small teams, including algorithm team, system development team, operation team and manual audit team.

The core technology is implemented by the algorithm team, which is subdivided into different groups, such as one working on machine learning for text and one working on machine learning for pictures;

The system development team is responsible for building the business platform;

The operation team is responsible for interfacing directly with the business departments, defining the requirements for detection standards, adjusting detection strategies in real time, and optimizing the effect;

The audit team has the largest number of people and currently works in rotating shifts to provide round-the-clock auditing.

There are two principles to consider when formulating detection standards: comprehensiveness and practicality (whether the standard can actually be implemented).

For comprehensiveness, two main sources of requirements need to be considered: the state and the operating platform. For the state, pornography, terrorism and contraband are all prohibited content, and relevant laws and regulations prohibit them explicitly. These standards basically have to be checked on every content platform.

For the operating platform, content such as abuse, spam flooding, and advertising for competing products is also undesirable.

Timeliness also needs emphasis: the path from raising a requirement to implementing the standard should be completed as quickly as possible to shorten the period in which detection is missing.

For practicality, data must be collected and models trained. Standards are written for people and can be descriptive, but data collection and labeling must be detailed. For example, under the pornography category, the detection requirement for "sexual behavior" can describe the categories and concepts of sexual behavior in words, but labeling the data needs far more detail. Pictures showing exposed buttocks, for instance, need explicit rules and are classified according to factors such as shooting angle, whether private parts are exposed, and whether they are photos of children. Each picture is eventually labeled as pornographic, vulgar, sexy or normal.

After the standards are formulated, different standards are applied according to the needs of each detection scenario. There is nothing wrong with sexy pictures in news content, but they should not appear in a children's education IM.
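Applying different standards per scenario can be expressed as a small policy table. The sketch below reuses the four labels named earlier (pornographic, vulgar, sexy, normal); the scene names and the choice of which labels each scene blocks are assumptions made for illustration.

```python
# Hypothetical policy table: which labels each scene blocks.
SCENE_POLICIES = {
    "news": {"pornographic", "vulgar"},
    "children_education_im": {"pornographic", "vulgar", "sexy"},
}


def decide(scene: str, label: str) -> str:
    """Return 'block' or 'pass' for a detected label in the given scene."""
    blocked_labels = SCENE_POLICIES[scene]
    return "block" if label in blocked_labels else "pass"
```

The same model output ("sexy") therefore passes in the news scene but is blocked in the children's education scene, without retraining anything; only the policy table differs.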

The three most important platforms:

The detection platform (the core of the service) is preset with various trained models.

Manual audit platform (effect and ability supplement, improve efficiency), with functions including data sampling and quick operation.

The model training platform (effect guarantee) is mainly composed of GPU clusters.

The business system is connected to the detection system, which can return detection results for text and pictures in real time. Data that needs manual audit is handed over from the detection platform to the audit platform, and the audit platform finally returns the results to the business system.

The model training platform mainly uses the bad cases from each channel to train and optimize models, and finally delivers the trained models to the detection platform.

In this way, these platforms form a closed loop, achieving the purpose of fast access to services and continuous optimization of effects.
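The closed loop between the three platforms can be sketched as a simple routing rule: confident machine verdicts are returned immediately, uncertain ones go to manual audit, and disagreements between machine and human are kept as bad cases for retraining. The thresholds, class name and method names below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    block_at: float = 0.9    # hypothetical: auto-block at or above this score
    pass_below: float = 0.3  # hypothetical: auto-pass below this score
    review_queue: list = field(default_factory=list)
    bad_cases: list = field(default_factory=list)

    def detect(self, item_id: str, score: float) -> str:
        """Route an item by its model score (assumed to be a violation probability)."""
        if score >= self.block_at:
            return "block"
        if score < self.pass_below:
            return "pass"
        # Uncertain items are handed to the manual audit platform.
        self.review_queue.append((item_id, score))
        return "review"

    def record_review(self, item_id: str, machine: str, human: str) -> None:
        """Keep machine/human disagreements as bad cases for the training platform."""
        if machine != human:
            self.bad_cases.append((item_id, machine, human))
```

Sampling already-decided items into manual audit and calling `record_review` on them is one way to surface missed and wrong judgments, which then feed the next round of model training.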

The three parts above (team, standards and platforms) form a relatively complete detection system that can meet the needs of conventional content detection.

But the reality is that content governance is not only about handling content; it also needs a defense-in-depth detection system.

In practice, most illegal content is published by abnormal users, and content governance is a direct contest between enterprises and the black and gray industry. Relying on content detection alone is too simplistic and leaves the defender struggling to keep up.

Why is content governance a direct contest between enterprises and the black and gray industry? Let's first look at how the black and gray industry does business:

In terms of roles, there are publishers, business subcontractors and content platforms. There are several kinds of publishers, such as pornographic websites that need to spread information about their sites to attract traffic, and people who post illegal content on a competitor's platform for the purpose of malicious competition. Publishers look for business subcontractors to get the illegal content published. Subcontracting involves many roles, including people who specialize in writing automation tools, people who resell accounts, and platforms used for publishing, such as group control platforms. Finally, posters flood each platform with the content.

The black and gray industry is now very mature, and each link has its own division of labor. As the slide shows, there are specialized SIM card vendors, account vendors, CAPTCHA-solving platforms, cloud control platforms and so on.

As we all know, SIM cards are now subject to real-name registration. SIM card vendors therefore have a way to obtain cards in bulk: by registering a company, they can apply for a large number of IoT cards in the company's name. These IoT cards have no voice function, but they can send and receive short messages, so they can be used to register and log in to accounts. So when you call back the mobile number of a registered account and hear the prompt "the number you dialed has not enabled the voice function", it is probably an IoT card.

The profit motive here is very strong. For example, a new account is worth a few yuan, but after being "raised" by publishing normal content from time to time, it can be worth tens or even hundreds of yuan.

On the major content platforms, the confrontation is now particularly fierce. Take Weibo as an example. In the past, pornographic accounts would post pornographic remarks directly under hot topics, for example naming pornographic websites or adding contact information. That kind of content is easy to detect and block, so now the accounts use a merely sexy picture as the avatar and mostly publish normal comments, while the pornographic information sits on the account's personal page. This makes them harder to counter.

In such a strongly adversarial context, relying on content detection alone is too simplistic; defense in depth is the key.

Content governance is not only the detection of published content but also rectification at the source. It is necessary to establish an all-round defense system, covering account registration, account login, user behavior and finally published content, in order to achieve better results. In other words, moving from content detection to user behavior detection, plus user profiling, makes it possible to better resist attacks from the black and gray industry.

In the registration stage, there are problems of batch registration and fake registration, which can be addressed with verification codes, phone number authentication and real-person authentication. In the login stage, there are problems of batch login and brute-force cracking, which can be addressed with verification codes and anti-cheating technology. After that, publishing behavior and content are detected; for example, an account that publishes a large amount of similar content in a short time is handled.
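The last check, one account publishing a lot of similar content in a short time, can be sketched with a per-account sliding window. This is a minimal illustration using the standard library's `difflib.SequenceMatcher` as the similarity measure; the class name, window size and thresholds are hypothetical.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher


class BurstDetector:
    """Flag an account that posts many near-duplicate texts within a time window."""

    def __init__(self, window_seconds: float = 60.0, max_similar: int = 3,
                 similarity: float = 0.8):
        self.window_seconds = window_seconds
        self.max_similar = max_similar  # earlier similar posts that trigger a flag
        self.similarity = similarity
        self._recent = defaultdict(deque)  # account -> deque of (timestamp, text)

    def is_abusive(self, account: str, text: str, now: float) -> bool:
        posts = self._recent[account]
        # Drop posts that have fallen out of the time window.
        while posts and now - posts[0][0] > self.window_seconds:
            posts.popleft()
        similar = sum(
            1 for _, old in posts
            if SequenceMatcher(None, old, text).ratio() >= self.similarity
        )
        posts.append((now, text))
        return similar >= self.max_similar
```

Comparing each new post against every recent post is quadratic in the window size, which is acceptable for a sketch; a production system would more likely use hashing techniques for near-duplicate detection.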

Two of the technical means mentioned here, verification codes and anti-cheating, are briefly explained below.

First, the verification code. It is mainly used to tell humans from machines and aims to raise the attacker's cost. Early verification codes, such as character verification codes, are very easy to crack: OCR is the main cracking technique, and it easily recognizes the characters in a picture. At present, smart verification codes are mostly used, which make a judgment by analyzing the user's behavior and device information. The more mainstream forms today, such as puzzle-slider and click-the-text verification codes, strengthen the ability to resist attacks.

Anti-cheating uses techniques such as IP profiling, which checks the geographical location of the user's IP and whether it is a proxy IP. Device environment detection checks whether the device is an emulator and whether it has been rooted or jailbroken. User behavior is also analyzed, and a baseline of normal behavior is set through rules that combine information across these dimensions. These checks are generally applied at the entry points of events such as registration, login and key business operations, such as posting.
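A rule-based combination of these signals can be sketched as a simple risk score. All field names, weights and thresholds below are hypothetical, and how each signal is collected (IP profile lookup, device fingerprinting) is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Signals named in the text, gathered at registration, login or posting."""
    is_proxy_ip: bool = False
    ip_location_unusual: bool = False
    is_emulator: bool = False
    is_rooted_or_jailbroken: bool = False
    actions_last_minute: int = 0


def risk_score(s: Signals, baseline_per_minute: int = 10) -> int:
    """Sum hypothetical rule weights; a higher score means a riskier event."""
    score = 0
    score += 30 if s.is_proxy_ip else 0
    score += 20 if s.ip_location_unusual else 0
    score += 30 if s.is_emulator else 0
    score += 10 if s.is_rooted_or_jailbroken else 0
    # Activity above the normal behavior baseline counts as a risk signal.
    score += 30 if s.actions_last_minute > baseline_per_minute else 0
    return score


def decide_event(score: int, challenge_at: int = 30, reject_at: int = 60) -> str:
    """Map a score to an action; 'challenge' could mean showing a verification code."""
    if score >= reject_at:
        return "reject"
    if score >= challenge_at:
        return "challenge"
    return "allow"
```

A single weak signal leads only to a challenge, while several signals together lead to rejection, so ordinary users are rarely interrupted while the attacker's cost still rises.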

The above covers typical security issues and some aspects of building content security. - Kaka orange juice, content and business security practitioner.