portal news

Jo Nov 10, 2025

Unlike optical character recognition (OCR) of traditional document images, video subtitle text recognition remains a challenging task in computer vision and artificial intelligence because of complex backgrounds, fonts of varying size and color, low contrast between text and background regions, and the varying position of subtitle text within video frames. Owing to these factors, traditional OCR developed for document images has proved inappropriate for video subtitle text.

To address this problem, unified models that both detect and recognize text have been proposed. However, these methods suffer from low accuracy when recognizing unknown words or arbitrary character sequences.

Yun Chol Song, a researcher at the Faculty of Information Science and Technology, has designed and implemented two models: a video subtitle text detection model that combines a convolutional neural network with an adaptive scale fusion model, and a subtitle text recognition model that combines a convolutional neural network with a bidirectional LSTM (BLSTM).
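The article does not describe how the recognition model turns BLSTM outputs into a subtitle string. In the common CNN+BLSTM pipeline, the BLSTM emits a best-scoring character index per time frame, and a CTC-style greedy decoder collapses repeated indices and removes blank symbols. The sketch below illustrates only that decoding step; the blank index and toy character table are assumptions, not details from the paper.

```python
# Illustrative sketch only: assumes a CTC-style output layer, which the
# article does not confirm. BLANK and CHARSET are hypothetical.

BLANK = 0  # index reserved for the CTC blank symbol (assumption)
CHARSET = {1: "h", 2: "e", 3: "l", 4: "o"}  # toy character table (assumption)

def ctc_greedy_decode(frame_indices):
    """Collapse runs of repeated indices, then drop blanks (greedy CTC decoding)."""
    out = []
    prev = None
    for idx in frame_indices:
        # A character is emitted only when the index changes and is not blank.
        if idx != prev and idx != BLANK:
            out.append(CHARSET[idx])
        prev = idx
    return "".join(out)

# Per-frame argmax indices from the BLSTM for a short clip:
# frames "h h e _ l l _ l o o" decode to "hello"
print(ctc_greedy_decode([1, 1, 2, 0, 3, 3, 0, 3, 4, 4]))  # → hello
```

Note how the blank symbol between the two runs of index 3 is what allows the doubled "l" in "hello" to survive the collapse of repeated frames.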

Experiments demonstrated a subtitle text detection accuracy of 93.4% and a subtitle text recognition accuracy of 94.6%, outperforming previous methods.