Extract text from Objects
Many organizations have schematic or industrial diagrams that contain standard objects representing a component or industrial part. These components often have text inside them that provides information end users need, without requiring them to go back to the person who drafted the diagram. By using AI Builder in the Power Platform and Form Recognizer in Azure Cognitive Services, a Power Automate workflow can be configured to automatically use a trained model to extract text from an image so it can be indexed and retrieved in SharePoint.
Standard symbols
Within any industry, much like a spoken language, standard objects or symbols are adopted to convey meaning, giving practitioners a common set of images to build solutions from. Typical examples are schematic diagrams or industrial components.
These objects can contain text that further defines the specifications for that object. When a user wants to search for a specific object and retrieve the text contained within it, that's where AI technologies come in.
Schematic diagrams are a great example of the varying types of objects that contain useful information.
Industrial diagrams also contain standard objects that, when indexed, can be easily searched.
Power Automate
The core of the solution lies in Power Automate. Activities can be configured to process the image and extract the text from the object.
It all starts in a SharePoint document library that has been configured with the metadata you want to extract.
To accomplish this, there are two main steps that need to be configured in Power Automate:
Object Detection
The first step uses AI Builder in Power Automate to train a model that will recognize the specific object we want.
The AI Builder steps to deploy a model are:
To begin building your model:
Tagging
AI Builder allows you to create tags that you then map to an object during the model training phase.
Create as many tags as necessary that can be assigned to objects in the image.
AI Builder requires a minimum of 15 images to train a model. The more images you provide, the better the model's accuracy and the higher its confidence scores.
For each image that is uploaded, you now need to identify the area that contains the object you want to train on. Once this area is selected, choose the correct tag in the context menu.
Continue doing this for all objects in the image that you want to identify. The selection doesn't have to be pixel-perfect; however, accurately capturing the boundaries of the object will provide better text extraction later.
Once you have tagged all 15 images, you can publish the model, which makes it available in Power Automate through the "Detect and count objects in images" activity.
The result of this activity is a .json file that will be used later for further processing.
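Before passing the detection results along, it can help to drop low-confidence hits. The field names below (tagName, confidence, boundingBox) are an assumption about the detection payload's shape, shown here for illustration; adapt them to the actual .json your flow receives.

```python
# Hypothetical shape of the object-detection payload: each detection has
# a tag name, a confidence score, and a bounding box.

def confident_detections(detections, threshold=0.7):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d["confidence"] >= threshold]

detections = [
    {"tagName": "valve", "confidence": 0.92, "boundingBox": [10, 10, 40, 40]},
    {"tagName": "pump", "confidence": 0.31, "boundingBox": [80, 80, 30, 30]},
]
# With the default threshold, only the "valve" detection survives.
```

In Power Automate itself, the equivalent is a "Filter array" data operation on the detection results.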
Form Recognizer in Cognitive Services
Now that we have a model that will recognize and tag objects in our image, we need another process that will OCR the entire image and return a .json file with the recognized text and its pixel locations. For this we turn to Azure.
Now that we have things configured in Azure, we can refocus on Power Automate. Using the Form Recognizer service in Azure is a two-step process.
The result of this processing will be a .json file that we will use for further processing.
Add a new HTTP activity to Power Automate and configure it as follows.
Store the Request ID that is returned from this call in a Data Operation.
I put in a 10-second delay to give the Form Recognizer service time to produce my results.
Configure another HTTP activity to get the results.
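The two HTTP activities follow Form Recognizer's asynchronous pattern: submit the image, capture the operation URL from the response headers, wait, then poll for the result. A minimal sketch of that pattern, assuming the v2.1 Layout API and placeholder endpoint/key values:

```python
import json
import time
import urllib.request

# Placeholders -- substitute your own Form Recognizer resource and key.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-key>"

def analyze_url(endpoint):
    """Build the Layout analyze URL (API version is an assumption)."""
    return f"{endpoint}/formrecognizer/v2.1/layout/analyze"

def start_analysis(image_bytes):
    """First HTTP call: submit the image. The service accepts the job
    and returns an Operation-Location header to poll for results."""
    req = urllib.request.Request(
        analyze_url(ENDPOINT),
        data=image_bytes,
        headers={"Ocp-Apim-Subscription-Key": KEY,
                 "Content-Type": "image/png"},
        method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Operation-Location"]

def get_result(operation_url, delay=10, retries=6):
    """Second HTTP call: poll until the asynchronous OCR has finished.
    The delay mirrors the 10-second wait in the flow."""
    for _ in range(retries):
        time.sleep(delay)
        req = urllib.request.Request(
            operation_url,
            headers={"Ocp-Apim-Subscription-Key": KEY})
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        if body["status"] == "succeeded":
            return body
    raise TimeoutError("Form Recognizer did not finish in time")
```

The fixed delay works for small images; for larger diagrams, polling in a loop (as above) is more robust than a single wait.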
Putting it all together
Now that we have the results from AI Builder and the pixel coordinates of the objects we want to target, we can take the pixel coordinates from the Form Recognizer service and check whether they intersect with an object we tagged. If they do, we have a match and can extract the text from the object. If they don't, we know there is no text in the object.
To analyze the two .json files that were produced, an Azure Function is called, passing in both files. The Azure Function then determines whether the pixel coordinates overlap and returns .json results containing the text it found.
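The heart of that function is a rectangle-intersection test. A minimal sketch of the matching logic, assuming AI Builder boxes arrive as [left, top, width, height] and Form Recognizer word boxes as an eight-number polygon [x1, y1, ..., x4, y4] (both are assumptions about the payloads; adapt to your actual files):

```python
def to_rect(bb8):
    """Convert an 8-number polygon into (left, top, right, bottom)."""
    xs, ys = bb8[0::2], bb8[1::2]
    return (min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True when two (left, top, right, bottom) rectangles overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def extract_text(objects, words):
    """For each tagged object, gather the OCRed words whose bounding
    boxes intersect the object's box."""
    results = []
    for obj in objects:
        left, top, width, height = obj["boundingBox"]
        rect = (left, top, left + width, top + height)
        text = " ".join(
            word["text"] for word in words
            if intersects(rect, to_rect(word["boundingBox"])))
        results.append({"tag": obj["tagName"], "text": text})
    return results
```

This is why tight tagging boundaries matter earlier in the process: a loose box can intersect neighboring labels and pull in unrelated text.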
The result from the Azure Function is a JSON response that can be used to update the SharePoint document library.
Store the string output in a Data Operation.
Update the SharePoint list with the text from the JSON response.
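In the flow this is a standard SharePoint connector action, but for context, an equivalent call against the SharePoint REST API looks roughly like the sketch below. The list title, field name, and token handling are placeholders, and depending on your tenant the request body may also need an OData "__metadata" type entry:

```python
import json
import urllib.request

def merge_headers(token):
    """Headers for a SharePoint REST MERGE (partial update) request."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json;odata=verbose",
        "Content-Type": "application/json;odata=verbose",
        "X-HTTP-Method": "MERGE",  # update the item instead of creating one
        "IF-MATCH": "*",           # overwrite regardless of item version
    }

def update_item(site_url, list_title, item_id, fields, token):
    """MERGE the extracted-text fields onto an existing list item."""
    url = (f"{site_url}/_api/web/lists/getbytitle('{list_title}')"
           f"/items({item_id})")
    req = urllib.request.Request(
        url,
        data=json.dumps(fields).encode(),
        headers=merge_headers(token),
        method="POST")
    urllib.request.urlopen(req)
```

In practice the built-in "Update file properties" action handles all of this for you; the sketch just shows what the connector is doing under the hood.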
Search Experience
Once the extracted text has been added to the image's metadata in SharePoint, users will be able to search for the text and view the image in the results.
Resources
Download the solution artifacts from GitHub.