Jo Sep 8, 2023

A research team led by Pak Kwang Hyok, a researcher at the Faculty of Distance Education, has studied the use of GloVe (Global Vectors) for extracting features of the Korean language.

Whereas CBOW and skip-gram are trained on a prediction task involving a word and its context, GloVe is a representation method based on how often words co-occur.
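
The co-occurrence counting that GloVe starts from can be sketched as below. This is a minimal illustration, not the research team's code: the function name `cooccurrence_counts`, the window size, the 1/distance weighting and the toy corpus are all assumptions made for the example.

```python
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count weighted co-occurrences of token pairs within a symmetric window.

    `sentences` is a list of token lists. The 1/distance weighting is an
    assumption for this sketch; plain counts would use a weight of 1.0.
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            # Scan the right-hand context only, then record both directions,
            # which is equivalent to a symmetric window.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                weight = 1.0 / (j - i)  # closer pairs contribute more
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight
    return dict(counts)


# Toy corpus (hypothetical tokens, not real data).
corpus = [["a", "b", "c"], ["a", "b"]]
counts = cooccurrence_counts(corpus, window=2)
print(counts[("a", "b")])  # adjacent in both sentences
print(counts[("a", "c")])  # two positions apart in one sentence
```

GloVe then fits word vectors to a table of this kind, whereas CBOW and skip-gram learn directly from individual prediction examples.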

Skip-gram can capture longer-distance information by skipping over some intervening words, but this has a drawback: it makes contextual information hard to reflect.

Therefore, extracting the features of Korean with GloVe requires Korean sentence corpora segmented by Byte Pair Encoding (BPE).
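
For readers unfamiliar with BPE, the sketch below shows the basic merge-learning loop: the most frequent adjacent symbol pair is merged repeatedly. It is a toy version under stated assumptions. The function name `learn_bpe` and the Latin-letter sample words are hypothetical, and the article does not say which symbol unit the team used for Korean.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn BPE merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of single-character symbols.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the best pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab


# Hypothetical sample words, chosen so that no two pairs tie.
merges, vocab = learn_bpe(["lo", "low", "low", "lower"], num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

The learned merges define the subword units into which a corpus is segmented before co-occurrence statistics are collected.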

The research team used an analysis engine based on Long Short-Term Memory (LSTM) for BPE.

The results showed that Korean feature extraction with GloVe achieved a higher F-score than extraction with CBOW or skip-gram.
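
As a reminder of what an F-score measures, the sketch below computes it from counts of true positives, false positives and false negatives. The function name `f_score` and the example counts are hypothetical. The article does not state which F-score variant or evaluation task was used, so the balanced F1 (beta = 1) shown here is only an assumption.

```python
def f_score(tp, fp, fn, beta=1.0):
    """Return the F-beta score from true/false positive and false negative counts."""
    if tp == 0:
        return 0.0  # no correct predictions, so precision and recall are zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not results from the study).
score = f_score(tp=8, fp=2, fn=2)
print(round(score, 3))  # precision and recall are both 0.8 here
```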

This method can be applied to Korean sentence similarity evaluation for online exams and bibliographic search systems.
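
One common way to turn word vectors into a sentence similarity score is to average the token vectors of each sentence and compare the averages by cosine similarity. The sketch below illustrates that idea only. The helper names, the two-dimensional toy vectors and the averaging approach itself are assumptions, since the article does not describe how the team computes similarity.

```python
import math


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known tokens in this sentence
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy embeddings; real GloVe vectors have far more dimensions.
embeddings = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
similar = cosine(sentence_vector(["x", "y"], embeddings),
                 sentence_vector(["z"], embeddings))
different = cosine(sentence_vector(["x"], embeddings),
                   sentence_vector(["y"], embeddings))
print(similar > different)  # True
```

A score of this kind could be thresholded to decide whether an exam answer or a search query is close enough to a reference sentence.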