Is it illegal to translate Zhaozhuyin?

It's not illegal. Speech translation is ordinary translation work: words, sentences, and paragraphs are translated from one language into another, and the translated content is delivered to the user as audio.

The process has three steps: first, transcribe the original speech; then translate the transcribed text; finally, synthesize the translated speech and subtitles. You can try the Translai translation platform: register an account and upload the audio or video you want translated. Its AI automatically transcribes the speech and times the subtitles, which is much more convenient.
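The final subtitle step comes down to placing the translated text on a timeline. Below is a minimal sketch that assumes the transcription and translation steps have already produced timed segments. The `Segment` structure is hypothetical and is not Translai's internal format. The sketch renders those segments in the widely used SRT subtitle format:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of translated speech (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # translated text shown during this interval


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT cues: number, time range, text, blank line."""
    blocks = []
    for i, seg in enumerate(segments, start=1):  # SRT cue numbers start at 1
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text}\n"
        )
    return "\n".join(blocks)
```

For example, `to_srt([Segment(0.0, 2.5, "Hello")])` yields one cue, `1`, followed by the line `00:00:00,000 --> 00:00:02,500` and the line `Hello`.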

We know that sound is a wave. Common formats such as MP3 are compressed, so the audio must first be converted into an uncompressed waveform file, such as a Windows PCM file, commonly called a WAV file. Apart from the file header, a WAV file simply stores the sample points of the sound waveform.
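The following sketch shows what an uncompressed WAV file contains. It uses Python's standard `wave` module and assumes 16-bit mono PCM. Instead of decoding a real MP3, it synthesizes a test tone in memory. The 440 Hz tone, 0.5 s duration, and 16 kHz sample rate are illustrative choices:

```python
import array
import io
import math
import sys
import wave


def write_sine_wav(freq_hz=440.0, seconds=0.5, rate=16000) -> bytes:
    """Synthesize a sine tone and return it as 16-bit mono PCM WAV bytes."""
    n = int(seconds * rate)
    samples = array.array(
        "h",  # signed 16-bit integers, one per sample point
        (int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * t / rate)) for t in range(n)),
    )
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)     # mono
        w.setsampwidth(2)     # 2 bytes per sample = 16-bit
        w.setframerate(rate)  # samples per second, stored in the header
        w.writeframes(samples.tobytes())
    return buf.getvalue()


def read_wav_samples(data: bytes):
    """Parse the WAV header, then return (sample_rate, list of sample values)."""
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()
    return rate, samples.tolist()
```

After the header is parsed, what remains is a plain list of integers, one per sample point of the waveform.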

Before speech recognition, the silence at the beginning and end of the recording is sometimes trimmed to reduce interference with later steps. This trimming relies on voice activity detection (VAD), which requires some signal-processing techniques.
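Real VAD systems are often statistical or model-based. The simplest version is a plain energy threshold, and the sketch below uses that substitute. It assumes NumPy, 20 ms analysis blocks, and a threshold of 40 dB below the loudest block. All three values are illustrative, and a trained VAD would replace the threshold test:

```python
import numpy as np


def trim_silence(x: np.ndarray, rate: int, frame_ms: float = 20.0,
                 threshold_db: float = -40.0) -> np.ndarray:
    """Cut leading/trailing silence with a simple per-block energy threshold.

    Blocks whose RMS level is more than |threshold_db| dB below the loudest
    block count as silence. Only the two ends are trimmed; pauses inside the
    speech are kept. A partial block at the very end is dropped.
    """
    frame_len = max(1, int(rate * frame_ms / 1000))
    n_frames = len(x) // frame_len
    if n_frames == 0:
        return x
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    level_db = 20 * np.log10(rms + 1e-12)  # small offset avoids log(0)
    # The loudest block always passes, so `voiced` is never empty.
    voiced = np.flatnonzero(level_db > level_db.max() + threshold_db)
    start = voiced[0] * frame_len
    end = (voiced[-1] + 1) * frame_len
    return x[start:end]
```

With 0.5 s of silence on each side of a 1 s tone at 16 kHz, the function returns just the 16000 tone samples.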

To analyze the sound, it must be divided into frames, that is, cut into short pieces, each of which is called a frame. Framing is usually not a simple cut. It is done by sliding a window function along the signal, which I won't detail here. Adjacent frames usually overlap.
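A minimal NumPy sketch of windowed, overlapping framing follows. The 25 ms frame length, 10 ms hop, and Hamming window are common conventions assumed here; the text does not specify them:

```python
import numpy as np


def frame_signal(x: np.ndarray, rate: int, frame_ms: float = 25.0,
                 hop_ms: float = 10.0) -> np.ndarray:
    """Split a 1-D signal into overlapping, Hamming-windowed frames.

    Returns an array of shape (n_frames, frame_len). With a 25 ms frame and
    a 10 ms hop, consecutive frames overlap by 15 ms.
    """
    frame_len = int(round(rate * frame_ms / 1000))
    hop = int(round(rate * hop_ms / 1000))
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i selects samples [i*hop, i*hop + frame_len) from x.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    window = np.hamming(frame_len)  # tapers frame edges to reduce spectral leakage
    return x[idx] * window
```

At 16 kHz, each frame is 400 samples and the hop is 160 samples, so one second of audio yields 98 frames.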

After framing, the speech becomes many short segments. However, the raw time-domain waveform describes speech content poorly, so it must be transformed. A common choice is to extract MFCC (Mel-frequency cepstral coefficient) features: based on the physiological characteristics of human hearing, each frame's waveform is turned into a multi-dimensional vector, which can be loosely understood as capturing the content of that frame of speech. This process is called acoustic feature extraction. In practice this step involves many details, and acoustic features are not limited to MFCC, so I won't go further here.
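Here is a compact NumPy sketch of the usual MFCC recipe. It takes the power spectrum of each windowed frame and applies triangular filters spaced on the mel scale, which mimics the ear's finer frequency resolution at low frequencies. It then takes the logarithm and applies a DCT. Several choices are common conventions assumed here, not details from the text: 26 filters, a 512-point FFT, the mel formula 2595·log10(1 + f/700), and keeping coefficients 1 to 12 while dropping coefficient 0. Production toolkits add refinements such as pre-emphasis and liftering:

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, rate: int) -> np.ndarray:
    """Triangular filters evenly spaced in mel, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):    # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(n_in: int, n_out: int) -> np.ndarray:
    """First n_out rows of the orthonormal DCT-II matrix, shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat


def mfcc(frames: np.ndarray, rate: int, n_fft: int = 512,
         n_filters: int = 26, n_ceps: int = 12) -> np.ndarray:
    """Map windowed frames (n_frames, frame_len) to MFCCs (n_frames, n_ceps).

    Frames shorter than n_fft are zero-padded by rfft; longer ones are truncated.
    """
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))  # floor avoids log(0)
    ceps = log_energies @ dct_ii(n_filters, n_ceps + 1).T
    return ceps[:, 1:]  # drop c0 (overall level), keep n_ceps coefficients
```

Each row of the result is the multi-dimensional vector for one frame. In this sketch each vector has 12 dimensions.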

At this point, the sound has become a matrix with 12 rows (assuming 12-dimensional acoustic features) and n columns, where n is the total number of frames. This matrix is called the observation sequence. The observation sequence is shown in the figure below: each frame is represented by a 12-dimensional vector, and the shade of each color block indicates the magnitude of the corresponding value.
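As a worked example of the matrix size, assume one second of 16 kHz audio, 25 ms frames (400 samples), and a 10 ms hop (160 samples); these values are illustrative, not from the text. The frame count is n = 1 + ⌊(16000 − 400) / 160⌋ = 1 + 97 = 98, so the observation sequence is a 12 × 98 matrix. The sketch below, assuming NumPy, computes n and stacks per-frame vectors as columns:

```python
import numpy as np


def num_frames(n_samples: int, frame_len: int, hop: int) -> int:
    """Number of full frames when a frame_len window advances by hop samples."""
    return 0 if n_samples < frame_len else 1 + (n_samples - frame_len) // hop


def observation_sequence(frame_vectors) -> np.ndarray:
    """Stack per-frame feature vectors as columns: shape (feature_dim, n_frames)."""
    return np.column_stack(frame_vectors)
```

Column j of the result is the feature vector of frame j, matching the layout described above. If features are computed row-per-frame, as many libraries do, transposing gives the same matrix.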