TOWARDS THE DEVELOPMENT OF A SUBTITLE DATASET FOR EDUCATIONAL VIDEOS IN INFORMATION TECHNOLOGY
Abstract
This study aims to construct a domain-specific dataset of video subtitles in the field of Information Technology (IT), both to improve access to educational resources and to support the development of natural language processing (NLP) applications in education. A systematic methodology is proposed for data collection and processing, covering source selection, subtitle extraction, data cleaning, normalization, and quality assurance. The resulting dataset is intended to support academic research and to serve as a foundational resource for further research and practical applications in IT education.
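To make the processing stages named above more concrete, the following is a minimal Python sketch of subtitle extraction, cleaning, normalization, and a simple quality filter. It is an illustration under stated assumptions, not the study's actual pipeline: it assumes subtitles arrive as SRT-formatted text, and the function names (`extract_cues`, `clean_cue`, `build_dataset`) and the `min_words` threshold are hypothetical choices made for this example.

```python
import re
import unicodedata

# Assumption: input is SRT text (index line, timestamp line, then cue text).
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
MARKUP_TAG = re.compile(r"<[^>]+>")        # e.g. <i>...</i>
BRACKET_NOISE = re.compile(r"\[[^\]]*\]")  # e.g. [Music], [Applause]


def extract_cues(srt_text):
    """Subtitle extraction: return the raw text of each cue, timing dropped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, ln in enumerate(lines):
            if TIMESTAMP.match(ln):
                # Everything after the timestamp line is cue text.
                text = " ".join(lines[i + 1:])
                if text:
                    cues.append(text)
                break
    return cues


def clean_cue(text):
    """Cleaning and normalization: strip markup and noise, normalize Unicode."""
    text = MARKUP_TAG.sub("", text)
    text = BRACKET_NOISE.sub("", text)
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def build_dataset(srt_text, min_words=2):
    """Quality filter (hypothetical rule): drop short cues and consecutive repeats."""
    records = []
    for cue in extract_cues(srt_text):
        cleaned = clean_cue(cue)
        if len(cleaned.split()) < min_words:
            continue
        if records and records[-1] == cleaned:
            continue
        records.append(cleaned)
    return records
```

Calling `build_dataset` on the contents of a single SRT file yields a list of cleaned subtitle lines. A real pipeline would likely also keep timing and source metadata and apply stronger quality checks, which this sketch omits.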
