TOWARDS THE DEVELOPMENT OF A SUBTITLE DATASET FOR EDUCATIONAL VIDEOS IN INFORMATION TECHNOLOGY

Authors

  • Trần Thị Thu Phương
  • Nguyễn Quốc Tuấn
  • Lê Thị Hằng

Abstract

This study aims to construct a domain-specific dataset of video subtitles in the field of Information Technology (IT), in order to improve access to educational resources and support the development of natural language processing (NLP) applications in education. A systematic methodology is proposed for data collection and processing, covering source selection, subtitle extraction, data cleaning, normalization, and quality assurance. The resulting dataset has academic value and is intended to serve as a foundational resource for further research and practical applications in IT education.
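
The processing stages named in the abstract (subtitle extraction, data cleaning, normalization, and quality assurance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the SRT input format, the function names, the choice of NFC normalization, and the `min_chars` threshold are all assumptions made here for illustration only.

```python
import re
import unicodedata

# All names below are hypothetical; the abstract does not publish pipeline code.
# Assumption: subtitles arrive as SRT-style text (index, timestamp line, text lines).
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}")
TAG = re.compile(r"<[^>]+>|\[[^\]]*\]")  # markup tags and bracketed cues like [Music]


def extract_cues(srt_text):
    """Subtitle extraction: return the joined text of each timed cue."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        for i, ln in enumerate(lines):
            if TIMESTAMP.match(ln):
                # Everything after the timestamp line is the cue text.
                text = " ".join(part for part in lines[i + 1:] if part)
                if text:
                    cues.append(text)
                break
    return cues


def clean(text):
    """Data cleaning: drop markup tags and bracketed non-speech cues."""
    return TAG.sub(" ", text)


def normalize(text):
    """Normalization: Unicode NFC form and collapsed whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def build_records(srt_text, min_chars=3):
    """Quality assurance: skip very short lines and exact duplicates."""
    records, seen = [], set()
    for cue in extract_cues(srt_text):
        text = normalize(clean(cue))
        if len(text) < min_chars or text in seen:
            continue
        seen.add(text)
        records.append(text)
    return records
```

Calling `build_records` on the contents of one subtitle file returns a list of cleaned, de-duplicated subtitle lines. Source selection is left out of the sketch because it depends on criteria that the abstract does not specify.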

Published

2025-05-27