Automatic Information Extraction From Student ID Card Images Using DB and VietOCR: A Case Study at a Vietnamese University
Online First: 29/04/2026
Corresponding author email: nguyendung@hueuni.edu.vn
DOI: https://doi.org/10.54644/jte.2026.2101
Keywords: MobileNetV3, Differentiable Binarization, Text Detection, Vietnamese Text Recognition, VietOCR
Abstract
The development of an information extraction system from student ID card images plays an important role in the digitalization of student management. This study proposes a two-stage processing framework that integrates computer vision and deep learning techniques, in which MobileNetV3-Small is employed for student identification card image classification, while the Differentiable Binarization (DB) model and VietOCR are responsible for Vietnamese text detection and recognition, respectively. Experimental results on a student ID card image dataset show that the classification model achieves an accuracy of 99.40% with an AUC of 0.9996, while the DB-based text detection model attains an Hmean of 89.81% after data augmentation. For text recognition, the proposed system achieves over 99% character-level accuracy and up to 98.90% full-sequence accuracy. These results demonstrate the effectiveness and practical feasibility of the proposed system, which is further validated through a proof-of-concept offline attendance application. In addition, the system is designed with computational efficiency in mind, enabling deployment on resource-constrained devices without requiring continuous internet connectivity. The proposed framework can be readily adapted to other types of identification documents, providing a scalable and cost-effective solution for automated data acquisition in educational institutions.
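The three kinds of metric reported above (Hmean for detection, character-level accuracy and full-sequence accuracy for recognition) can be illustrated with a minimal Python sketch. The abstract does not specify the exact evaluation protocol, so everything here is an assumption: the function names are hypothetical, Hmean is taken as the usual harmonic mean of precision and recall, and character-level accuracy is computed by position-wise comparison averaged over samples.

```python
from typing import Sequence


def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of detection precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def full_sequence_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of samples whose predicted string matches the target exactly."""
    if not targets:
        return 0.0
    exact = sum(p == t for p, t in zip(predictions, targets))
    return exact / len(targets)


def char_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Mean per-sample share of positions where predicted and target characters agree.

    Assumption: characters are compared position by position, and the count of
    matches is divided by the longer of the two strings, so missing or extra
    characters lower the score.
    """
    scores = []
    for p, t in zip(predictions, targets):
        longest = max(len(p), len(t))
        if longest == 0:
            scores.append(1.0)  # both strings empty: treated as a perfect match
            continue
        matches = sum(pc == tc for pc, tc in zip(p, t))
        scores.append(matches / longest)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Illustrative strings only, not data from the study.
    preds = ["Nguyễn Văn A", "12345"]
    truth = ["Nguyễn Văn A", "12346"]
    print(hmean(0.9, 0.9))
    print(full_sequence_accuracy(preds, truth))
    print(char_accuracy(preds, truth))
```

Under these assumptions a single wrong character makes the whole sequence count as an error while only slightly lowering the character-level score, which is consistent with the full-sequence figure in the abstract being lower than the character-level one.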
License
Copyright (c) 2026 Journal of Technical Education Science (Tạp chí Khoa học Giáo dục Kỹ Thuật)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Copyright belongs to JTE.


