An Introduction to Big Data Training Courses: What Does a Big Data Course Teach?

The following outline introduces, in a simple and easy-to-understand way, each stage of a zero-based big data engineer course, so that everyone can better understand what a big data curriculum covers. The curriculum framework is that of a complete zero-based big data engineer course.

Stage 1: Static Web Page Foundation (HTML + CSS)

1. Difficulty: one star

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: common HTML tags, common CSS layout, styling and positioning, and static page design and production methods.

4. The description is as follows:

Technically, the code used at this stage is easy to learn and understand. Looking ahead to the later courses: because we focus on big data, we need to exercise programming skills and thinking early on. According to our project managers, who have many years of development and teaching experience, J2EE is currently the easiest technology on the market to understand and master, and J2EE cannot be separated from page technology. So in the first stage our focus is on page technology, adopting the market mainstream: HTML + CSS.

Stage 2: JavaSE + JavaWeb

1. Difficulty: two stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: Java basic syntax, Java object orientation (classes, objects, encapsulation, inheritance, polymorphism, abstract classes, interfaces, public classes, inner classes, the public modifier, etc.), exceptions, collections, files, IO, MySQL (basic SQL statement operations, multi-table queries, subqueries, stored procedures, transactions, distributed transactions), JDBC, and so on.

4. The description is as follows:

This is the Java foundation stage, with technical points going from simple to deep, module analysis of real business projects, and the design and implementation of various storage methods. This stage is the most important of the first four, because all the later stages are built on it, and it is also the stage most closely tied to learning big data. Here students join a team for the first time to develop a real project with both a front office and a back office (a comprehensive application of stage 1 and stage 2 technologies).
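
Since this stage combines Java, MySQL, and JDBC, here is a minimal sketch of the kind of code it leads up to: querying a MySQL table through JDBC. It assumes the MySQL driver is on the classpath; the database, table, and credentials are invented for the example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; adjust host, database, user, password.
        String url = "jdbc:mysql://localhost:3306/school";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement stmt = conn.prepareStatement(
                     "SELECT name, score FROM students WHERE score > ?")) {
            stmt.setInt(1, 60);                 // bind the query parameter safely
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {             // iterate over the matching rows
                    System.out.println(rs.getString("name") + ": " + rs.getInt("score"));
                }
            }
        }
    }
}
```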

Stage 3: Front-End Framework

1. Difficulty: two stars.

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability): 64 class hours.

3. The main technologies include: Java and jQuery, combined use of annotations and reflection, XML and XML parsing (dom4j, JAXB), new features of JDK 8.0, SVN, Maven, and EasyUI.

4. The description is as follows:

Building on the first two stages, turning static pages into dynamic ones enriches the content of our web pages. Of course, from a market point of view there are dedicated front-end designers for this; our design goal at present is that front-end technology can exercise people's thinking and design ability in a more intuitive way. At the same time, we fold the advanced features of the second stage into this one, letting learners climb one more flight of stairs.
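
To make the "annotations and reflection used together" item concrete, here is a minimal sketch in plain Java; the @Route annotation and the handler method are invented for the example, loosely imitating how frameworks map requests to methods.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// A made-up marker annotation, retained at runtime so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@interface Route {
    String value();
}

public class ReflectionDemo {
    @Route("/hello")
    public void hello() {
        System.out.println("handling /hello");
    }

    public static void main(String[] args) throws Exception {
        ReflectionDemo demo = new ReflectionDemo();
        // Scan the class for annotated methods and dispatch by path.
        for (Method m : ReflectionDemo.class.getDeclaredMethods()) {
            Route route = m.getAnnotation(Route.class);
            if (route != null && route.value().equals("/hello")) {
                m.invoke(demo);     // call the matching handler reflectively
            }
        }
    }
}
```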

Stage 4: Enterprise-Level Development Frameworks

1. Difficulty: three stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies are: Hibernate, Spring, Spring MVC, log4j and slf4j integration, MyBatis, Struts2, Shiro, Redis, the Activiti process engine, the Nutch crawler, Lucene, web services with CXF, Tomcat clustering and hot standby, and MySQL read-write separation.

4. The description is as follows:

If you compare the whole Java course to a pastry shop, then after the first three stages you can make a Wu Dalang flatbread (purely handmade, and too troublesome), while learning the frameworks lets you open a Starbucks (high-tech equipment that saves time and effort). As far as the requirements of a J2EE development engineer position are concerned, the technologies used at this stage must be mastered, and the courses we teach go beyond the market (the market mainstream is three frameworks; we teach seven framework technologies), all driven by real commercial projects: requirements documents, overall design, detailed design, source code, testing, deployment, installation manuals, and so on will all be explained.
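
As a small taste of the framework material, here is a minimal Spring dependency-injection sketch, assuming Spring's core jars are on the classpath; the configuration and service classes are invented for the example.

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class AppConfig {
    // A made-up service bean registered with the Spring container.
    @Bean
    public GreetingService greetingService() {
        return new GreetingService();
    }
}

class GreetingService {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

public class SpringDemo {
    public static void main(String[] args) {
        try (AnnotationConfigApplicationContext ctx =
                 new AnnotationConfigApplicationContext(AppConfig.class)) {
            // Ask the container for the bean instead of constructing it by hand.
            GreetingService service = ctx.getBean(GreetingService.class);
            System.out.println(service.greet("big data"));
        }
    }
}
```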

Stage 5: Understanding Big Data

1. Difficulty: three stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: big data fundamentals (what big data is, application scenarios, how to learn big data, the virtual machine concept and installation, etc.), common Linux commands (file management, system management, disk management), Linux shell programming (shell variables, loop control, applications), Hadoop introduction (Hadoop composition, stand-alone environment, directory structure, the HDFS interface, the MR interface, simple shell usage, accessing Hadoop from Java), HDFS (introduction, shell, using the IDEA development tool, building a fully distributed cluster), MapReduce applications (the intermediate computation process, driving MapReduce from Java, program running, log monitoring), advanced Hadoop applications (introduction to the YARN framework, configuration items and optimization, introduction to CDH, environment construction), and extensions (map-side optimization, etc.).

4. The description is as follows:

This stage is meant to give newcomers an overall concept of big data. How does it relate to what came before? After the preparatory Java courses, you can understand how a program runs on a single machine. So what about big data? Big data means running programs on large-scale machine clusters for processing. Of course, big data is about processing data, so in the same way data storage changes from single-machine storage to large-scale multi-machine cluster storage.

(You ask me what a cluster is? OK, say I have a big pot of rice. I could finish it by myself, but it would take a long time, so I invite everyone to eat together. One person is just a person; many people make a crowd. In the same way, many machines working together make a cluster.)

Big data, then, can be roughly divided into big data storage and big data processing. So at this stage, our course adopts the standard of big data: HADOOP. And big data does not run on the WINDOWS 7 or WINDOWS 10 we use every day, but on the most widely used system in this field: LINUX.
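
The list above mentions "accessing Hadoop from Java"; here is a minimal sketch of reading a file from HDFS through the Java API. It assumes the Hadoop client jars and a reachable cluster; the NameNode address and file path are invented.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(fs.open(new Path("/data/words.txt"))))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);   // print each line of the HDFS file
            }
        }
    }
}
```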

Stage 6: Big Data Database

1. Difficulty: four stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: Hive introduction (what Hive is, Hive usage scenarios, environment construction, architecture description, and working mechanism), Hive shell programming (table creation, query statements, partitions and buckets, index management, and views), advanced Hive applications (implementation of DISTINCT, GROUP BY, JOIN, the principles of SQL conversion, Java programming, configuration, and optimization), HBase introduction, HBase shell programming (DDL, DML, and table creation, queries, compression, and filtering from Java), a detailed look at the HBase modules (introduction to regions, configuration of HRegionServer, HMaster, and ZooKeeper, integration of HBase with ZooKeeper), and advanced HBase features (read and write flows, the data model, schema design for read and write hotspots, optimization, and configuration).

4. The description is as follows:

This stage aims to show everyone how big data handles large-scale data, shortening programming time and improving read speed.

How is it simplified? In the previous stage, if you needed complex business joins and data mining, writing MR programs yourself was very complicated. So at this stage we introduce HIVE, the data warehouse of big data. The key phrase here is data warehouse. I know you are going to ask, so let me say up front: a data warehouse is usually a huge data center used for data mining and analysis. What stores this data is usually a large database such as ORACLE or DB2; those databases, however, are usually used for real-time online business.

In short, data analysis based on a data warehouse is relatively slow. The convenient part is that, as long as you are familiar with SQL, it is relatively simple to learn, and HIVE is exactly such a tool: a SQL query tool built on big data. This stage also covers HBASE, the database of big data. Puzzled? Didn't we just learn a data "warehouse" called HIVE? HIVE is based on MR, so its queries are quite slow; HBASE, built on big data, can query data in real time. One is mainly for analysis, the other mainly for queries.
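
To make the HIVE versus HBASE contrast concrete, here is a minimal sketch of a real-time point lookup through the HBase Java client; the ZooKeeper host, table, row key, and column names are invented, and the HBase client jars are assumed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseGetDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk-host");  // hypothetical ZooKeeper host
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Fetch a single row by key: this is the "real-time query" path.
            Result result = table.get(new Get(Bytes.toBytes("user-001")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(name == null ? "not found" : Bytes.toString(name));
        }
    }
}
```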

Stage 7: Real-Time Data Acquisition

1. Difficulty: four stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: Flume log collection, KAFKA introduction (message queues, application scenarios, cluster construction), KAFKA in detail (partitions, topics, consumers, producers, integration with ZOOKEEPER, shell development, shell debugging), advanced use of KAFKA (Java development, main configuration items, optimization projects), data visualization (introduction to graphics and charts, classification of charting tools), STORM introduction (design ideas, application scenarios, processing flow, cluster installation), STORM development (Maven-based development, writing local STORM programs), advanced STORM (Java development, main configuration items, optimization projects), the timeliness of KAFKA asynchronous and batch sending, KAFKA global message ordering, and STORM multi-concurrency optimization.

4. The description is as follows:

The data sources of the previous stages were existing large-scale data sets, and the results of data processing and analysis arrived with a certain delay; usually, what was processed was the previous day's data.

Example scenarios: website hotlink protection, abnormal customer accounts, real-time credit checks. What if these scenarios were analyzed on the previous day's data? Would that be too late? So at this stage we introduce real-time data collection and analysis, mainly including FLUME real-time data collection, KAFKA data sending and receiving, and STORM real-time data processing, achieving second-level processing of data.
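
As a taste of the KAFKA "data sending" piece, here is a minimal sketch of a Java producer; the broker address and topic name are invented, and the kafka-clients jar is assumed.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaSendDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");  // hypothetical broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Asynchronously send one log line to a hypothetical "access-logs" topic;
            // close() flushes any buffered records on exit.
            producer.send(new ProducerRecord<>("access-logs", "user-001", "GET /index"));
        }
    }
}
```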

Stage 8: Spark Data Analysis

1. Difficulty: five stars

2. Class hours (technical knowledge points+stage project tasks+comprehensive ability)

3. The main technologies include: SCALA introduction (data types, operators, control statements, basic functions), intermediate SCALA (data structures, classes, objects, traits, pattern matching, regular expressions), advanced SCALA (higher-order functions, curried functions, partial functions, tail recursion, built-in higher-order functions, etc.), SPARK introduction (environment construction, infrastructure, operation modes, etc.), SPARK SQL, intermediate SPARK (DataFrames, Datasets, SPARK Streaming principles, sources supported by SPARK Streaming, integration with KAFKA and sockets, the programming model), advanced SPARK programming (SPARK GraphX, SPARK MLlib machine learning), advanced SPARK applications (system architecture, main configuration items and performance optimization, fault and stage recovery), the SPARK ML KMEANS algorithm, and so on.

4. The description is as follows:

Let's recall the earlier stages, mainly the HADOOP stage. HADOOP analyzes large-scale data sets based on MR, which is relatively slow for workloads such as machine learning and artificial intelligence, and it is not suitable for iterative computation. SPARK is the replacement for MR in analysis. How does it replace it? Let's first talk about their operating mechanisms: HADOOP analyzes based on disk storage, while SPARK analyzes based on memory. If that still sounds abstract, here is a more vivid picture: it is like going from Beijing to Shanghai by train, where MR is the old green-skinned slow train and SPARK is the high-speed rail or maglev. SPARK is developed in the SCALA language, so of course it supports SCALA best, which is why the course teaches the SCALA development language first.
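
As an illustration of SPARK's in-memory style of computation, here is a minimal word-count sketch using SPARK's Java API in local mode (keeping this document's examples in one language); the input data is invented, and the spark-core jar is assumed.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkWordCount {
    public static void main(String[] args) {
        // Local mode for illustration; a real job would run on a cluster.
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.parallelize(
                    Arrays.asList("big data", "big spark"));
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);     // in-memory aggregation
            counts.collect().forEach(t ->
                    System.out.println(t._1() + ": " + t._2()));
        }
    }
}
```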

In the design of the HKUST big data course, the technologies required by positions on the market are basically covered in full. Moreover, it is not simply a matter of covering job requirements: the course itself is a complete big data project process from front end to back end.

For example, from the storage and analysis of historical data (HADOOP, HIVE, HBASE) to the collection and transport of real-time data (FLUME, KAFKA) and its analysis (STORM, SPARK), all of these are interdependent in real projects.