A Vietnamese dataset for the problem of finding similar questions
Keywords:
dataset, elastic search, search engine.
Abstract
Finding similar questions is a common problem in natural language processing. However, little research has been conducted on question retrieval for Vietnamese, mainly because there is no standard Vietnamese dataset for the problem of finding similar questions. In this paper, we propose a method for building such a dataset. Using this method, we built a dataset of 7911 labeled question pairs. The dataset was evaluated with several basic machine learning models.