Electronic identities (eIDs) have emerged as a novel way of proving identity amid the digital transformation that society has undergone in recent decades. In these solutions, access is granted by means of either a password or a set of biometric features. However, these access methods are not sufficient for the onboarding process in which the digital identity is created, which still requires physical presence. An alternative is the automatic verification of physical ID documents. Within the European project IMPULSE, “Identity Management in PUbLic Services” (Grant Agreement no. 101004459), on the generation and use of digital identities in public services, we have designed an automatic document verification system that combines several cutting-edge technologies, is robust against attempts at deception and forgery, and is transparent to the end user. It will allow easy onboarding from a smartphone.
Our pipeline receives both an image of the ID document used to prove identity (e.g., a passport or national ID document) and a set of personal data that the user provides through a form. We preprocess the received image with a variety of digital image processing methods. Specifically, we implement algorithms that detect whether the image is excessively blurry or dark, and that crop the image by matching its Scale-Invariant Feature Transform (SIFT) features to those of a document model. On the processed image we apply state-of-the-art Optical Character Recognition (OCR) methods based on Long Short-Term Memory (LSTM) neural networks, with a twofold objective: to recognise the text in the document fields and to obtain the bounding boxes of its characters.
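As an illustration of the image quality check, the sketch below flags an image as too dark or too blurry. It is a minimal example under stated assumptions, not the project's implementation: the abstract does not name the blur measure, so the variance of the Laplacian is assumed here as a common choice, and the function names and both thresholds are hypothetical. The image is taken to be a greyscale grid of values from 0 to 255.

```python
from statistics import mean, pvariance

# Hypothetical thresholds; real values would need tuning on sample images.
MIN_BRIGHTNESS = 60.0   # mean grey level on a 0-255 scale
MIN_SHARPNESS = 100.0   # variance of the Laplacian response


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    A sharp image has strong edges, so the Laplacian response varies a lot;
    a blurry image gives a response that stays close to zero everywhere.
    """
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def quality_issues(gray):
    """Return the reasons to reject the image (empty list if acceptable)."""
    issues = []
    if mean(value for row in gray for value in row) < MIN_BRIGHTNESS:
        issues.append("too dark")
    if laplacian_variance(gray) < MIN_SHARPNESS:
        issues.append("too blurry")
    return issues
```

In a real pipeline the same measures would typically be computed with an image processing library on the full-resolution photograph, and the user would be asked to retake the picture when the list is not empty.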
The document validator must assess two aspects. First, the user sending the information must be the same person whose information appears in the photographed ID document. Second, the image must not correspond to a forged document. The first assessment is performed by calculating a dissimilarity measure, the Levenshtein distance, between the information fields entered in the form and the OCR-recognised text. If this distance remains below a certain threshold, the ID document is taken to belong to the user. ID document forgery, on the other hand, is hard to detect because privacy concerns limit the available training data, and examples of tampered documents are particularly difficult to obtain. We therefore rule out supervised binary classification algorithms. Instead, our detector uses the SIFT features of the image to detect portions that have been copied and moved to other locations. Moreover, a set of character features is built from the OCR-recognised bounding boxes and fed into a one-class Support Vector Machine (SVM) classifier trained only on genuine documents.
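The identity check based on the Levenshtein distance could be sketched as follows. This is a minimal example under assumptions: the field names, the normalisation step and the threshold value are hypothetical, and the threshold is applied here as "at most", since the abstract does not specify these details.

```python
MAX_DISTANCE = 2  # hypothetical threshold, applied per field


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (row-by-row dynamic programme)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def normalise(text: str) -> str:
    """Upper-case the text and collapse runs of whitespace (an assumed step)."""
    return " ".join(text.upper().split())


def fields_match(form: dict, ocr: dict, max_distance: int = MAX_DISTANCE) -> bool:
    """True if every form field is within `max_distance` edits of the
    corresponding OCR-recognised field; a missing OCR field counts as empty."""
    return all(
        levenshtein(normalise(value), normalise(ocr.get(name, ""))) <= max_distance
        for name, value in form.items()
    )
```

Tolerating a small distance rather than demanding equality absorbs typical OCR confusions, such as reading "I" as "1", without accepting a document that belongs to someone else.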
This development is part of a novel eID concept based on blockchain and artificial intelligence, meant to be useful to a wide range of public service areas and aiming to solve the inefficiencies of the current eID data management model implemented by governments. To this end, six pilot case studies will be conducted in five European countries (Spain, Italy, Bulgaria, Iceland and Denmark) to support processes such as issuing complaints, e-governance and legal identities for business persons.
In summary, our detection system for fake or forged documents will lead to an easy onboarding solution that removes the need to travel in person. At the same time, it is robust behind the scenes and highly resilient against attempts by citizens with suspicious intentions to sign up using fake or forged IDs. In addition, it will find practical use in different public service areas, being user-friendly, simple and transparent.