Multi-modal video retrieval using Dilated Pyramidal Residual Network

Authors

  • La Ngọc Thùy An
  • Nguyễn Phước Đạt
  • Phạm Minh Nhựt
  • Vũ Hải Quân

Abstract

Pyramidal Residual Networks have achieved high accuracy in image classification tasks, but no previous work has applied this model to sequence recognition, a long-standing research topic. We present an extension of the architecture, the Dilated Pyramidal Residual Network (DPRN), and evaluate it on two sequence recognition problems: automatic speech recognition and optical character recognition. Together, the two recognizers form a multi-modal video retrieval framework for Vietnamese broadcast news. Experiments were conducted on caption images and speech frames extracted from VTV broadcast videos. The results show that DPRN is end-to-end trainable and performs well on sequence recognition tasks.
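
The two ideas combined in the model name can be illustrated with a short sketch. It is not the authors' implementation. It assumes the additive widening rule commonly used in pyramidal residual networks (each block adds a fixed number of channels) and stride-1 dilated convolutions. The function names, the base width, the widening factor, and the dilation schedule are hypothetical values chosen only for illustration.

```python
def pyramidal_widths(base_width, alpha, num_blocks):
    """Channel count after each block under additive pyramidal widening.

    Each block adds alpha / num_blocks channels; the result is floored.
    """
    step = alpha / num_blocks
    return [int(base_width + step * (k + 1)) for k in range(num_blocks)]


def receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings, not taken from the paper.
widths = pyramidal_widths(base_width=16, alpha=48, num_blocks=6)
dilations = [1, 2, 4, 1, 2, 4]  # assumed schedule, for illustration only
rf = receptive_field(kernel_size=3, dilations=dilations)

print("channels per block:", widths)
print("receptive field:", rf, "time steps")
```

Under these assumed settings the channel count grows gradually from block to block, while dilation lets the same number of layers cover a longer span of the input sequence than undilated convolutions would.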

Published

2020-09-24

Section

ORIGINAL RESEARCH