Hi all!
I am trying out AI Builder. I want to automate as many of my employees' tasks as possible, in order to justify the $500/month license fee.
One of these tasks is identifying tax-deductible medical expenses from pharmacy receipts. The issue is that these receipts vary considerably from one another, meaning that the form processing model cannot be used.
Given that what I need is essentially three pieces of information, namely:
- the tax identification number of the customer, which normally follows the words "Codice fiscale" or the shortened version "CF"
- the amount of the deductible expenses, which normally follows the words "Totale detraibile"
- the year
I was wondering whether I could get this information through the entity extraction model.
The only issue is that the input is not plain text from an email, but is contained in a PDF.
Do you think it is possible to first extract the text and then "pipe" the output to the entity extraction model?
Thank you in advance!
Vittorio
Hi Nightfall,
Have you tried the new Receipts processing prebuilt capabilities?
https://flow.microsoft.com/en-us/blog/process-receipts-and-translate-text-with-ai-builder/
Please test with this capability and let us know.
An alternative option is to use the Text Recognition (OCR) model to extract all the text, then use a text search function (like IndexOf) to find the keyword you are looking for (e.g. "Totale detraibile") and extract the text located just after it (with substring).
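To illustrate the keyword search idea outside of a flow, here is a minimal Python sketch. It assumes the OCR output has already been joined into one string with line breaks preserved. The function name `text_after_keyword`, the `max_len` cutoff, and the sample receipt text (including the sample tax code) are made up for illustration and are not part of AI Builder.

```python
import re
from typing import Optional


def text_after_keyword(ocr_text: str, keyword: str, max_len: int = 40) -> Optional[str]:
    """Return the text that follows `keyword` on the same line, or None."""
    # Case-insensitive search for the keyword (the IndexOf step).
    match = re.search(re.escape(keyword), ocr_text, flags=re.IGNORECASE)
    if match is None:
        return None
    # Take a bounded slice starting right after the keyword (the substring step).
    rest = ocr_text[match.end():match.end() + max_len]
    # Keep only the remainder of the current line.
    first_line = rest.splitlines()[0] if rest.strip() else ""
    # Drop separators such as ":" or "=" that sit between label and value.
    return first_line.strip(" :=\t") or None


# Hypothetical OCR output, for illustration only.
sample = (
    "FARMACIA ROSSI\n"
    "Codice fiscale: RSSMRA80A01H501U\n"
    "Totale detraibile 30,50\n"
)

amount_text = text_after_keyword(sample, "Totale detraibile")
tax_code = text_after_keyword(sample, "Codice fiscale")
```

In a flow, the same two steps would be done with the expression functions mentioned above. This sketch only looks at the rest of the line, so a value printed on the line below its label would not be found.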
Thanks
Thank you CedrickB!
I will give it a try. I don't know how I could overlook text recognition, which was actually the most logical solution here 🙂
Hi,
I am coming back to this thread. Now that collections of documents can be added to the form processing model, I see a much better (albeit not yet super-reliable) success rate.
One issue I find is that the receipt often has the target amount I want to extract written without any space (e.g. "Totale detraibile=30,50€").
In these cases the field that the builder recognizes is the whole sentence, and even if I try to manually draw the field so that it is limited to the number, it automatically snaps back to the whole sentence.
Is there a way to draw the field more precisely, so that the issue can be solved within this model, without having to invoke the OCR capability?
Thank you again in advance!
Vittorio
On second thought, a workaround could be to take the field as identified by the model, and then check whether it includes the character "=".
But it would be interesting to know if a more direct solution exists.
Thanks again,
Vittorio
Hi,
This model is pretrained and there is currently no way to retrain it with custom logic.
The workarounds are:
1. Train a custom Forms Processing model
2. Post-process the model output, splitting off or escaping the unexpected text with string functions. See https://docs.microsoft.com/en-us/azure/logic-apps/workflow-definition-language-functions-reference
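
As a rough illustration of the second workaround, the post-processing could look like the Python sketch below. It assumes the field value comes back as a string such as "Totale detraibile=30,50€" and that amounts use the Italian format (decimal comma, optional dots for thousands). The function name `parse_amount` and the regular expression are hypothetical and not part of AI Builder.

```python
import re
from decimal import Decimal
from typing import Optional

# Italian-style amount: optional thousands dots, optional decimal comma.
# Matches e.g. "1.234,50", "1.234", "30,50", "30".
_AMOUNT = re.compile(r"\d{1,3}(?:\.\d{3})+(?:,\d{1,2})?|\d+(?:,\d{1,2})?")


def parse_amount(field_value: str) -> Optional[Decimal]:
    """Extract the amount from a field that may still contain its label."""
    # If label and value are glued together with "=", keep the part after it.
    candidate = field_value.split("=")[-1]
    match = _AMOUNT.search(candidate)
    if match is None:
        return None
    # Convert "1.234,50" to "1234.50" before building the Decimal.
    normalized = match.group(0).replace(".", "").replace(",", ".")
    return Decimal(normalized)


glued = parse_amount("Totale detraibile=30,50€")
spaced = parse_amount("Totale detraibile 1.234,50 €")
```

Using `Decimal` rather than `float` avoids rounding surprises with currency values. If the OCR output ever uses a dot as the decimal separator, the pattern above would need to be adjusted.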