Vietnamese dataset for the finding similar question problem

Authors

  • Hà Thị Thanh
  • Nguyễn Thị Oanh

Keywords:

dataset, Elasticsearch, search engine.

Abstract

Finding similar questions is a common problem in natural language processing. However, little research has addressed question retrieval for Vietnamese, because no standard Vietnamese dataset exists for the task. In this paper, we propose a method for building a Vietnamese dataset for the problem of finding similar questions. Using this method, we built a dataset of 7,911 labeled question pairs. We evaluated the dataset with several basic machine learning models.

Author Biographies

  • Hà Thị Thanh

    University of Information and Communication Technology, Thai Nguyen University (Trường Đại học Công nghệ Thông tin và Truyền thông, Đại học Thái Nguyên)

  • Nguyễn Thị Oanh

    University of Information and Communication Technology, Thai Nguyen University (Trường Đại học Công nghệ Thông tin và Truyền thông, Đại học Thái Nguyên)

Published

2022-11-14