Khmer printed character recognition using attention-based Seq2Seq network

Authors

  • Rina Buoy
  • Nguonly Taing
  • Sovisal Chenda
  • Sokchea Kor

Abstract

This paper presents an end-to-end deep convolutional recurrent neural network for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image through a stack of convolutional blocks followed by a layer of gated recurrent units (GRU). The features are encoded as a single context vector and a sequence of hidden states, which are fed to the decoder. The decoder emits one character at a time until it produces a special end-of-sentence (EOS) token. The attention mechanism lets the decoder adaptively select the relevant parts of the input image while predicting each target character. The Seq2Seq Khmer OCR network is trained on a large collection of computer-generated text-line images rendered in multiple common Khmer fonts. Complex data augmentation is applied to both the training and validation datasets. On a validation set of 6400 augmented images, the proposed model outperforms the state-of-the-art Tesseract OCR engine for the Khmer language, achieving a character error rate (CER) of 0.7% versus 35.9%.
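The CER figures quoted above can be made concrete with a short sketch. The abstract does not state how CER is computed, so the following assumes the common definition: total Levenshtein edit distance between predicted and reference text, divided by the total number of reference characters. The function names `edit_distance` and `character_error_rate` are illustrative and not taken from the paper. The sketch also assumes that a "character" is a single Unicode code point, which may differ from the character unit the authors use for Khmer.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per code point."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from ref
                curr[j - 1] + 1,     # insertion into ref
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed CER: total edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

Under this definition, a CER of 0.7% corresponds to roughly 7 character edits per 1000 reference characters, and 35.9% to roughly 359.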

Author Biographies

  • Rina Buoy

    Techo Startup Center, Phnom Penh, Cambodia

  • Nguonly Taing

    Techo Startup Center, Phnom Penh, Cambodia

  • Sovisal Chenda

    Techo Startup Center, Phnom Penh, Cambodia

  • Sokchea Kor

    Royal University of Phnom Penh, Phnom Penh, Cambodia

Published

2022-11-20

Section

Articles