In today’s fast-paced digital landscape, efficiently extracting information from official documents is essential. The AI-Powered Vehicle CR Book Information Extractor is a tool designed to do this for the vehicle CR book (the vehicle’s certificate of registration) with high precision. By combining Optical Character Recognition (OCR), Natural Language Processing (NLP), and Computer Vision (CV), the extractor processes vehicle CR books automatically. This streamlines the extraction process, improves data accuracy, and reduces the time and effort required for manual data entry. Join us as we explore the techniques behind this tool and how it changes the handling of essential vehicle documents.
Extracting information from vehicle CR books involves identifying specific fields and extracting the corresponding data. This is challenging because the documents have a complex patterned background, multi-language text, and a structured layout that appears at varying sizes and resolutions. Handwritten elements, stamps, and text embedded in graphics add to the complexity. Accurately recognizing language-specific characters while preserving the document’s layout requires a capable OCR tool and a mechanism that can handle these diverse elements. This article outlines an approach that identifies the contour of the document and then performs OCR on it, adapting to different image resolutions while maintaining high accuracy.
The process of extracting information from a vehicle CR book involves two main steps:
- Contour Detection: This step involves identifying the boundaries of the document using image processing techniques, including thresholding, morphological operations, and edge detection, to isolate and crop the document’s area.
- Performing OCR on Selected Fields: On the cropped image from step 1, the desired areas are annotated on a set of sample images. The annotated coordinates are converted into percentage values relative to the image’s width and height, ensuring adaptability to different resolutions. Thresholds are determined for each field, and OCR is applied to extract text from the specific regions. After OCR, text-processing techniques such as RegEx, fuzzy matching, and text normalization are used to ensure accurate recognition and extraction of relevant information such as registration numbers and other vehicle details.
Let’s explore each main step in detail!
1. Contour Detection
This process involves the following steps:
(I) Filtering Vertical Lines From The Image
(II) Detecting Vertical Lines of Contour
(III) Filtering Horizontal Lines From The Image
(IV) Detecting Horizontal Lines of Contour
(V) Cropping the Image through Contour
Below is an illustration of the contour detection pipeline, detailing the process from the initial input image to cropping the image based on detected contours.
(I) Filtering Vertical Lines From The Image
Identifying the document’s vertical lines is the first step in our approach and is crucial for determining the boundaries of the document. We employ image processing techniques to filter these lines from the image. The steps include:
- Image Thresholding: Convert the image to a binary format where the pixels are either black or white. This helps in distinguishing the document’s lines from the background.
- Morphological Operations: Use morphological operations to enhance the vertical lines in the image. Erosion and dilation are the key operations; they highlight these lines by removing noise and connecting disjointed parts of the lines.
- Vertical Line Detection: Apply filters to isolate vertical lines. This involves using kernels specifically designed to detect vertical structures within the image.
(II) Detecting Vertical Lines of Contour
Once vertical lines are filtered, we determine their exact positions. This involves:
- Line Grouping: Group detected lines that are close to each other into single lines. This reduces the number of lines to a manageable few that represent the actual document boundaries.
- Line Selection: Select the most relevant lines based on their length and position, ensuring they form the left and right boundaries of the document.
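The thresholding, morphological filtering, and vertical line detection steps described above can be sketched without any image library. This is only an illustration under stated assumptions: the image is a plain list of rows of grayscale values, and the function names (`binarize`, `vertical_opening`, `vertical_line_columns`) and parameter values are hypothetical, not taken from the actual project. A production pipeline would more likely rely on an image library such as OpenCV for these operations.

```python
def binarize(gray, threshold=128):
    """Image thresholding: 1 for dark (ink) pixels, 0 for light background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def vertical_opening(binary, min_run):
    """Keep only pixels that belong to a vertical run of at least `min_run` ones.

    This matches a morphological opening (erosion followed by dilation)
    with a structuring element 1 pixel wide and `min_run` pixels tall.
    """
    height, width = len(binary), len(binary[0])
    out = [[0] * width for _ in range(height)]
    for x in range(width):
        y = 0
        while y < height:
            if binary[y][x] == 0:
                y += 1
                continue
            start = y
            while y < height and binary[y][x] == 1:
                y += 1
            if y - start >= min_run:  # run is tall enough to survive erosion
                for yy in range(start, y):
                    out[yy][x] = 1
    return out


def vertical_line_columns(filtered, min_pixels):
    """Return the x positions whose column holds at least `min_pixels` line pixels."""
    width = len(filtered[0])
    return [x for x in range(width)
            if sum(row[x] for row in filtered) >= min_pixels]


# Tiny synthetic "scan": 6 rows x 8 columns on a white background.
gray = [[255] * 8 for _ in range(6)]
for y in range(6):
    gray[y][2] = 0      # full-height vertical line at x = 2
gray[3][5] = 0          # isolated speck of noise
for x in range(4, 8):
    gray[0][x] = 0      # horizontal stroke, only one pixel tall

filtered = vertical_opening(binarize(gray), min_run=4)
print(vertical_line_columns(filtered, min_pixels=4))  # [2]
```

The speck and the horizontal stroke are discarded because neither forms a vertical run of four or more pixels, so only the column of the real line survives.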
(III) Filtering Horizontal Lines From The Image
Similar to vertical lines, horizontal lines are crucial for determining the top and bottom boundaries of the document. The process includes:
- Image Thresholding: As with vertical lines, convert the image to a binary format.
- Morphological Operations: Apply morphological operations tailored to enhance horizontal lines.
- Horizontal Line Detection: Use appropriate filters to detect horizontal structures in the image.
(IV) Detecting Horizontal Lines of Contour
Once horizontal lines are filtered, we determine their exact positions. This involves:
- Edge Detection: Detect edges corresponding to horizontal lines.
- Line Grouping: Group lines that are close together.
- Line Selection: Choose the most relevant horizontal lines based on their length and position to form the top and bottom boundaries of the document.
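Line grouping and line selection work the same way for horizontal and vertical lines, so one sketch can cover both. The `Line` type, the `max_gap` and `min_length` parameters, and the "outermost long lines" selection rule below are assumptions made for illustration; the article does not specify the exact heuristics used.

```python
from typing import NamedTuple


class Line(NamedTuple):
    position: int  # y for a horizontal line, x for a vertical line
    length: int    # extent of the line in pixels


def group_lines(lines, max_gap=5):
    """Merge lines whose positions lie within `max_gap` pixels of each other.

    Each group is represented by its longest member.
    """
    groups = []
    for line in sorted(lines):
        if groups and line.position - groups[-1][-1].position <= max_gap:
            groups[-1].append(line)
        else:
            groups.append([line])
    return [max(group, key=lambda l: l.length) for group in groups]


def select_boundaries(lines, min_length):
    """Pick the two outermost lines that are at least `min_length` long."""
    candidates = sorted(
        (l for l in lines if l.length >= min_length),
        key=lambda l: l.position,
    )
    if len(candidates) < 2:
        raise ValueError("fewer than two boundary candidates found")
    return candidates[0].position, candidates[-1].position


# Hypothetical detections: two near-duplicate pairs plus a short inner line.
detected = [
    Line(12, 900), Line(14, 880),    # top border, detected twice
    Line(300, 200),                  # short rule inside the document
    Line(640, 910), Line(643, 905),  # bottom border, detected twice
]
grouped = group_lines(detected, max_gap=5)
print(select_boundaries(grouped, min_length=500))  # (12, 640)
```

Filtering by length before taking the outermost positions keeps short table rules inside the document from being mistaken for its borders.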
(V) Cropping the Image through Contour
With the detected vertical and horizontal lines, we can now crop the image to isolate the document area. This involves:
- Sorting Lines: Ensure the lines are sorted correctly to determine the cropping boundaries.
- Cropping: Use the sorted lines to crop the image, extracting the rectangular region that contains the document.
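As a rough sketch of the sorting and cropping step, the snippet below turns the selected line positions into a crop rectangle and applies it to an image stored as a list of rows. The helper names `crop_bounds` and `crop` are hypothetical, and clamping the rectangle to the image size is an added safety assumption rather than something the article describes.

```python
def crop_bounds(vertical_xs, horizontal_ys, image_width, image_height):
    """Return (left, top, right, bottom) from boundary line positions.

    Positions are sorted first, so the caller may pass them in any order.
    """
    if len(vertical_xs) < 2 or len(horizontal_ys) < 2:
        raise ValueError("need two vertical and two horizontal boundary lines")
    xs, ys = sorted(vertical_xs), sorted(horizontal_ys)
    left, right = max(0, xs[0]), min(image_width, xs[-1])
    top, bottom = max(0, ys[0]), min(image_height, ys[-1])
    return left, top, right, bottom


def crop(image, bounds):
    """Crop a row-major image (a list of rows) to the given bounds."""
    left, top, right, bottom = bounds
    return [row[left:right] for row in image[top:bottom]]


# 4 x 6 test image whose pixel value encodes its (row, column) position.
image = [[r * 10 + c for c in range(6)] for r in range(4)]
bounds = crop_bounds([4, 1], [3, 0], image_width=6, image_height=4)
print(bounds)               # (1, 0, 4, 3)
print(crop(image, bounds))  # [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
```

If the image is a NumPy array instead of nested lists, the same crop is the slice `image[top:bottom, left:right]`.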
2. Performing OCR on Selected Fields
Building on the foundation of contour detection and image preprocessing, the next crucial step in extracting information from motor vehicle registration certificates involves defining the OCR (Optical Character Recognition) parameters for various fields. This part of the process ensures that we accurately extract text data from specified regions within the cropped document image. Here’s a detailed explanation of how to approach this step effectively. The process involves the following steps:
(I) Manual Annotation
(II) Normalize the Annotated Coordinates
(III) Determine Thresholds
(IV) Perform OCR
(V) Post-OCR Processing
(I) Manual Annotation
Begin by annotating the regions of interest on several sample images. Use a rectangle to mark the area where each field’s text is located (e.g., Registration Number, Chassis Number, Engine Number).
(II) Normalize the Annotated Coordinates
Convert the annotated coordinates into percentage values relative to the image’s width and height. Because the coordinates are then independent of resolution, the same field definitions work across documents of different sizes and qualities. For instance, if a rectangle marking the Registration Number spans from (x1, y1) to (x2, y2) in an image of width W and height H, the percentage values, expressed here as fractions between 0 and 1, would be:
- x1_percentage = x1 / W
- y1_percentage = y1 / H
- x2_percentage = x2 / W
- y2_percentage = y2 / H
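The formulas above translate directly into a pair of helper functions. This is a minimal sketch: the names `to_relative` and `to_pixels` and the example box are hypothetical, and rounding to the nearest pixel when mapping back is an assumption.

```python
def to_relative(box, width, height):
    """Convert a pixel box (x1, y1, x2, y2) into fractions of the image size."""
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)


def to_pixels(relative_box, width, height):
    """Map a relative box back onto an image of any resolution."""
    rx1, ry1, rx2, ry2 = relative_box
    return (
        round(rx1 * width),
        round(ry1 * height),
        round(rx2 * width),
        round(ry2 * height),
    )


# A field annotated on a 1000 x 750 sample image ...
relative = to_relative((200, 150, 600, 210), width=1000, height=750)

# ... located again on a scan with twice the resolution.
print(to_pixels(relative, width=2000, height=1500))  # (400, 300, 1200, 420)
```

Storing only the relative box means the annotation is done once per field and reused for every incoming scan, whatever its resolution.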
(III) Determine Thresholds
Establish threshold values for each field to filter out irrelevant text and improve OCR accuracy. These thresholds help in distinguishing the target text from other textual content in the document.
(IV) Perform OCR
For extracting text from vehicle CR books, we use PaddleOCR, an open-source OCR tool developed by Baidu. Known for its high performance and accuracy, PaddleOCR supports multiple languages and integrates seamlessly with Python.
Why PaddleOCR?
- High Accuracy: After reviewing different open-source OCR frameworks (including MMOCR, EasyOCR, PaddleOCR and HiveOCR) and different combinations of proposed models on an internal benchmark and on edge cases, the Adevinta team found PaddleOCR to be an indisputable winner, with an average accuracy of 0.8 and acceptable performance on their edge cases. This result competes with the paid Google Cloud Vision OCR API on the best accuracy they measured. (Adevinta Tech Blog, “Text-in-Image 2.0: Improving OCR Service with PaddleOCR,” Medium, June 2023)
- Advanced Framework: Built on PaddlePaddle, PaddleOCR includes a range of pre-trained models for text recognition and document analysis.
- Versatility: It supports various text extraction tasks, including layout analysis and table recognition, making it highly adaptable to different document structures.
By leveraging PaddleOCR, we ensure reliable and precise text extraction from complex vehicle CR books, facilitating accurate and efficient data processing.
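To make the per-field OCR step concrete, here is a hedged sketch of how a cropped field region could be read and filtered. Several things are assumptions: the article does not say what kind of threshold is used per field, so the sketch treats it as a minimum OCR confidence score; `make_paddle_reader` assumes the PaddleOCR 2.x Python API, whose result layout may differ in other versions; and the helper names and sample values are hypothetical. The OCR engine is passed in as a callable so the filtering logic can be tried without installing PaddleOCR.

```python
def make_paddle_reader(lang="en"):
    """Build an OCR callable backed by PaddleOCR (imported lazily).

    Assumes the PaddleOCR 2.x API, where `engine.ocr(img)` returns one list
    per image whose entries look like [box, (text, score)].
    """
    from paddleocr import PaddleOCR  # requires: pip install paddleocr

    engine = PaddleOCR(use_angle_cls=True, lang=lang)

    def read(region_image):
        result = engine.ocr(region_image, cls=True)
        lines = result[0] if result and result[0] else []
        return [(text, score) for _box, (text, score) in lines]

    return read


def read_field(region_image, ocr_fn, min_confidence=0.6):
    """Run OCR on one cropped field and keep only the confident lines.

    `ocr_fn` is any callable returning a list of (text, score) pairs.
    """
    kept = [text for text, score in ocr_fn(region_image)
            if score >= min_confidence]
    return " ".join(kept)


# Stand-in OCR engine with made-up output, used here instead of PaddleOCR.
def fake_ocr(_region_image):
    return [("WP CAB-1234", 0.97), ("smudge", 0.31)]


print(read_field(None, fake_ocr, min_confidence=0.6))  # WP CAB-1234
```

With PaddleOCR installed, `read_field(region, make_paddle_reader())` would run the same filtering on real output, where `region` is the image array cropped to a field’s box.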
(V) Post-OCR Processing
After extracting text using OCR, the next step involves processing the OCR results to ensure accurate and relevant information is obtained from the document. This involves several key techniques:
- Regular Expressions (RegEx): RegEx patterns are used to identify and validate specific data formats within the extracted text. This helps in isolating relevant information such as registration numbers, chassis numbers, and engine numbers.
- Fuzzy Matching: Fuzzy matching techniques help in identifying text that is similar to predefined target strings, even if there are minor OCR errors. This is particularly useful for fields like class of vehicle and fuel type, where slight variations in the text may occur. The process involves comparing the extracted text to a list of expected values and selecting the closest match based on a similarity threshold.
- Text Normalization: Text normalization involves converting text to a consistent format, such as uppercase, and removing unnecessary characters or spaces. This ensures uniformity and improves the reliability of subsequent processing steps.
- Conditional Text Modifications: Based on specific conditions, certain text elements are modified to match expected formats. This includes adding spaces after certain prefixes, correcting common OCR misinterpretations, and reformatting numbers. For example, cylinder capacities might be normalized to include a consistent “CC” suffix.
- Error Handling: Robust error handling is implemented to manage cases where no matching text is found. Default messages or fallback values are provided to ensure the system can handle unexpected or missing data gracefully.
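The techniques in this list can be combined into a few small helpers built on Python’s standard `re` and `difflib` modules. Everything specific in this sketch is an assumption for illustration: the list of fuel types, the 17-character chassis pattern, the character substitutions, the similarity cutoff, and the "NOT FOUND" fallback are hypothetical and would need tuning against real CR books.

```python
import re
from difflib import get_close_matches

FUEL_TYPES = ["PETROL", "DIESEL", "ELECTRIC", "HYBRID"]  # hypothetical values
DIGIT_FIXES = str.maketrans({"O": "0", "I": "1", "S": "5"})  # common OCR slips


def normalize(text):
    """Text normalization: collapse whitespace, trim, and uppercase."""
    return re.sub(r"\s+", " ", text).strip().upper()


def extract_pattern(text, pattern, default="NOT FOUND"):
    """RegEx extraction with a fallback value when nothing matches."""
    match = re.search(pattern, normalize(text))
    return match.group(0) if match else default


def fuzzy_pick(text, choices, cutoff=0.7, default="NOT FOUND"):
    """Fuzzy matching: return the closest expected value above `cutoff`."""
    matches = get_close_matches(normalize(text), choices, n=1, cutoff=cutoff)
    return matches[0] if matches else default


def format_cylinder_capacity(text, default="NOT FOUND"):
    """Conditional modification: fix digit look-alikes, enforce a 'CC' suffix.

    Assumes `text` holds only the capacity value from its cropped field.
    """
    cleaned = normalize(text).translate(DIGIT_FIXES)
    match = re.search(r"\d+", cleaned)
    return f"{match.group(0)} CC" if match else default


chassis_pattern = r"\b[A-Z0-9]{17}\b"  # hypothetical chassis number format
print(extract_pattern("Chassis No: jtdkb20u793512345 ", chassis_pattern))
# JTDKB20U793512345
print(fuzzy_pick(" diese1 ", FUEL_TYPES))  # DIESEL
print(format_cylinder_capacity("I498cc"))  # 1498 CC
print(fuzzy_pick("xyz", FUEL_TYPES))       # NOT FOUND
```

Returning a fixed fallback string instead of raising an exception keeps one unreadable field from aborting the extraction of the remaining fields.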
By applying these post-OCR processing techniques, the extracted data is refined to ensure accuracy and relevance, making it suitable for further use in applications that require precise and reliable information from vehicle CR books.
The process of defining and refining OCR parameters for motor vehicle registration certificates requires meticulous attention to detail to achieve accurate text extraction. By carefully annotating sample images and converting coordinates to percentage values, we lay the groundwork for establishing effective thresholds and creating robust functions to process OCR results. Utilizing regex patterns and text manipulation techniques, these functions are designed to precisely identify and extract the necessary information from the documents, ensuring a reliable and efficient OCR process.
Moving forward, the next steps involve iteratively refining the coordinates and thresholds based on real-world samples to enhance accuracy further. Additionally, integrating the OCR process with an existing system will ensure seamless and automated data extraction, improving overall efficiency and reliability.
We are Kainovation Technologies, leading the way in AI, ML, and data analytics. Our innovative solutions transform industries and enhance business operations. Contact us for all your AI needs.