
Automatic information extraction from identity cards with OCR


musimab/Tc_ID_Card_OCR


Usage

Arguments

  • --folder_name: path to the folder that contains the ID card images
  • --neighbor_box_distance: nearest box distance
  • --face_recognition: face detection method (dlib, ssd, haar)
  • --rotation_interval: ID card rotation interval in degrees
  • --ocr_method: OCR method (EasyOcr or TesseractOcr)
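The arguments above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the repository's actual `main.py`: the function name `build_parser`, the help texts, the value types, and the defaults (copied from the example command in this README) are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the arguments listed in this README."""
    parser = argparse.ArgumentParser(
        description="Extract fields from ID card images with OCR"
    )
    # Defaults below are assumptions taken from the README's example command.
    parser.add_argument("--folder_name", default="images",
                        help="folder that contains the ID card images")
    parser.add_argument("--neighbor_box_distance", type=float, default=60,
                        help="nearest box distance")
    parser.add_argument("--face_recognition", choices=["dlib", "ssd", "haar"],
                        default="ssd", help="face detection method")
    parser.add_argument("--rotation_interval", type=int, default=60,
                        help="ID card rotation interval in degrees")
    parser.add_argument("--ocr_method", choices=["EasyOcr", "TesseractOcr"],
                        default="EasyOcr", help="OCR method")
    return parser
```

Calling `build_parser().parse_args()` in a script would then expose the values as `args.folder_name`, `args.rotation_interval`, and so on. Restricting `--face_recognition` and `--ocr_method` with `choices` makes a misspelled option fail early instead of deep inside the pipeline.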

With the Dlib and Haar face detection models, choose a rotation interval of less than 30 degrees; otherwise a face may go undetected because the card is still rotated or inverted.

Create a folder and put the ID card images in it:

git clone git@github.com:musimab/Tc_ID_Card_OCR.git
cd Tc_ID_Card_OCR
mkdir images
python3 main.py --folder_name "images" --neighbor_box_distance 60 --face_recognition ssd --ocr_method EasyOcr --rotation_interval 60
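The note about the rotation interval can be illustrated with a small sketch. It assumes the pipeline tries the card at multiples of `--rotation_interval` until a face is found; the repository may do this differently. `detect_face` is a hypothetical callback that rotates the image by the given angle and reports whether the face detector fired.

```python
from typing import Callable, List, Optional


def candidate_angles(rotation_interval: int) -> List[int]:
    """Angles (in degrees) to try, stepping by the rotation interval."""
    if rotation_interval <= 0:
        raise ValueError("rotation_interval must be positive")
    return list(range(0, 360, rotation_interval))


def find_upright_angle(detect_face: Callable[[int], bool],
                       rotation_interval: int) -> Optional[int]:
    """Return the first angle at which the (hypothetical) detector finds a face."""
    for angle in candidate_angles(rotation_interval):
        if detect_face(angle):
            return angle
    return None  # no candidate angle brought the card close enough to upright
```

With a detector that only tolerates a small tilt, a coarse step can skip over every angle at which the face would be found, which is why a smaller interval is suggested for Dlib and Haar.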

Create a Python 3 virtual environment and install the dependencies (do this before running main.py):

python3 -m venv card_id_ocr_venv
source card_id_ocr_venv/bin/activate
pip3 install -r requirements.txt

The result image and cropped regions are saved to ./outputs by default. The JSON data is saved to ./test by default.
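As an illustration of the JSON output step, the sketch below writes one JSON file per image. The function name `save_result`, the one-file-per-image layout, and the file naming are assumptions; only the `./test` default comes from this README. `ensure_ascii=False` keeps Turkish characters such as `İ` readable in the file.

```python
import json
from pathlib import Path


def save_result(fields: dict, image_name: str, out_dir: str = "test") -> Path:
    """Write the extracted fields for one image to <out_dir>/<image stem>.json."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / (Path(image_name).stem + ".json")
    with target.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False stores "İ" as-is instead of a \u escape
        json.dump(fields, fh, ensure_ascii=False, indent=2)
    return target
```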

To find all virtual environments on Linux: `locate -b '\activate' | grep "/home"`

TODOs

  1. A deep-learning-based (YOLO, SSD, Faster R-CNN) identity card recognition model will be developed.

Algorithm Pipeline

[Image: ocr_pip_update1]

Input image

[Image: ori14_m2rot]

Warped image

[Image: warped_img]

CRAFT Character Density Map

[Image: txt_heat_map]

U-Net Output for the Character Density Map

[Image: maskem]

CRAFT Output (red boxes) and Matched Boxes (blue boxes)

[Image: final_imgp]
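One plausible way to match boxes is to pick, for a reference box, the nearest detected text box whose center lies within `--neighbor_box_distance` pixels. The sketch below shows that idea only; the box format `(x_min, y_min, x_max, y_max)`, the Euclidean center distance, and the function names are assumptions rather than the repository's actual matching code.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def center(box: Box) -> Tuple[float, float]:
    """Center point of an axis-aligned box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def nearest_box(reference: Box, candidates: List[Box],
                max_distance: float) -> Optional[Box]:
    """Return the candidate closest to the reference, or None if all are too far."""
    ref_x, ref_y = center(reference)
    best, best_dist = None, max_distance
    for box in candidates:
        cx, cy = center(box)
        dist = math.hypot(cx - ref_x, cy - ref_y)
        if dist <= best_dist:  # keep the closest box within the threshold
            best, best_dist = box, dist
    return best
```

A larger threshold tolerates bigger gaps between a label and its value but raises the risk of picking up a neighboring field.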

OCR Output

Tc : 12345678909
Surname : MUSTAFA ALİ
Name : YILMAZ
DateofBirth : 07071999

OCR Evaluation

The accuracy of the OCR system was evaluated against two criteria: accuracy at the word level and accuracy at the character level.

The evaluate.py script reads the predicted and actual values in JSON format.

Character Level Comparison

  1. tc: 1303 / 1327 => 98.19 %
  2. surname: 805 / 816 => 98.65 %
  3. name: 742 / 746 => 99.46 %
  4. dateofbirth: 976 / 976 => 100.0 %
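A character-level score of the form "matched / total" can be computed as sketched below. This assumes a position-by-position comparison against the ground-truth string, summed over all cards; how `evaluate.py` actually aligns characters is not stated here, so treat the function `char_accuracy` as illustrative.

```python
from typing import Iterable, Tuple


def char_accuracy(pairs: Iterable[Tuple[str, str]]) -> Tuple[int, int, float]:
    """Return (matched, total, percent) over (predicted, actual) string pairs."""
    matched = 0
    total = 0
    for predicted, actual in pairs:
        # Count characters that agree at the same position; extra or missing
        # characters in the prediction simply do not match.
        matched += sum(p == a for p, a in zip(predicted, actual))
        total += len(actual)
    percent = round(100.0 * matched / total, 2) if total else 0.0
    return matched, total, percent
```

For example, `char_accuracy([("12345678900", "12345678909")])` counts 10 matching characters out of 11.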

Word Level Comparison

  1. tc: 96 %
  2. surname: 91 %
  3. name: 95 %
  4. date: 100 %
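Word-level accuracy can be read as the fraction of cards for which a field was recognized exactly. The sketch below assumes a strict string equality test after trimming surrounding whitespace; the function name `word_accuracy` and that normalization are assumptions, not taken from `evaluate.py`.

```python
from typing import Iterable, Tuple


def word_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (predicted, actual) pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    exact = sum(predicted.strip() == actual.strip() for predicted, actual in pairs)
    return exact / len(pairs)
```

Because a single wrong character makes the whole field count as a miss, word-level scores are lower than character-level scores for the same predictions.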

Easy Ocr

https://github.com/sarra831/EasyOCR

For Support

If you want to support my work, you can buy me a coffee at https://www.buymeacoffee.com/mustafaboyuk
