Jay-Encodian

Guide: Auto tagging documents in SharePoint using Microsoft Cognitive Services/Text Analytics

Hey Flow Fans,

As a long-time SharePoint and information management architect (20 years!)… I've worked on many projects in many different verticals and held many debates with records managers, information managers, end users, legal, etc. about metadata and how much metadata should be associated with a document within SharePoint.

Metadata is a fundamental building block for structured document management, enabling management reporting, document classification, retention schedules, disposition execution, and so forth… so let's create lots of metadata! However... there aren't too many users who are happy or can see the value in completing lots of metadata fields and worse still there are lots of users who will click any old options just to get a document 'correctly' loaded. This typically has a huge detrimental impact on the management of an information repository, typically introducing significant regulatory risk.

Considering the desire to have both rich metadata and a usable solution, I have been asked many, many, many times... can we automate metadata selection based on the content of the file?

Yes! Better still this can be achieved quickly and simply with Microsoft Flow and a few extra components... in this post I'll outline how you can utilise Microsoft Cognitive Services and the Text Analytics API to perform key phrase extraction which we can use later to tag a document within SharePoint.

Flow Creation - Video Guide

 

Flow Creation - Text Guide

Please note: You will require an Azure subscription and a Cognitive Services account to utilise the Flow Text Analytics connector. You can create a free account here.

1. Create a new Flow from a blank template

2.png

2. Add the ‘When a file is created or modified (Properties Only)’ SharePoint trigger and configure it to point to the library / folder from which the Flow should be triggered.

2 (1).png

3. Add an 'Initialise variable' action

3a. Name: Set to 'KeyPhrases'

3b. Type: Select 'String'

33.png

NOTE: This Flow is triggered when a new document is added or an existing document is updated, and the Flow then updates that same document. Left unchecked, this causes an infinite loop (a recursive event). To protect against this, we recommend using a service account identity for the SharePoint connection, which ensures that any updates the Flow makes to the document are executed under that identity. We will then add a condition to the Flow to check for, and ignore, any Flow runs triggered by an update made by the service account identity.

4. Add a 'Condition' action

4a. Click 'Choose a value', insert the 'Modified By Email' parameter from the 'When a file is created or modified (properties only)' trigger

mceclip3.png

4b. Set the operator to "Is not equal to"

mceclip1.png

4c. Set the value to the email address of the SharePoint connection's identity

mceclip4.png

4d. If you are unsure of the identity or wish to create a new connection, go to 'Settings > Connections'

mceclip2.png
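The check configured in step 4 can be sketched in Python as follows. This is only an illustration of the logic, not how Flow evaluates the condition internally; the function name and the email addresses are hypothetical, and the case/whitespace normalisation is an assumption added for the sketch.

```python
def should_process(modified_by_email: str, service_account_email: str) -> bool:
    """Return True when the change was made by someone other than the Flow's own identity."""
    # Normalising case and surrounding whitespace is an assumption of this sketch;
    # the Flow condition in step 4 simply uses 'Is not equal to'.
    modified_by = modified_by_email.strip().lower()
    service_account = service_account_email.strip().lower()
    return modified_by != service_account


# Hypothetical addresses, for illustration only.
SERVICE_ACCOUNT = "svc-flow@contoso.com"

# A user edit should be processed; an edit made by the Flow itself should be ignored.
print(should_process("jane.doe@contoso.com", SERVICE_ACCOUNT))
print(should_process("svc-flow@contoso.com", SERVICE_ACCOUNT))
```

Everything from step 5 onwards sits behind this check, so an update written by the service account ends the run immediately instead of re-tagging the document.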

5. Add a 'Get File Content' SharePoint action inside the 'Yes' branch of the condition

5a. Site Address: Set as per the 'Site Address' value of step #2.

5b. File Identifier: Insert the 'Identifier' parameter from the 'When a file is created or modified (properties only)' action result

3.png

6. Add an Encodian 'Convert to PDF' action

6a. File Content: Insert the 'File Content' parameter from the 'Get file content' action result

8.png

6b. PDF Filename: Insert the 'File name with extension' parameter from the 'When a file is created or modified (properties only)' action result

9.png

Note: The Encodian 'Convert to PDF' action will automatically check the 'PDF Filename' value and change the file extension provided to '.pdf' if required.

6c. Filename: Insert the 'File name with extension' parameter from the 'When a file is created or modified (properties only)' action result

10.png

7. Add an Encodian 'Get PDF Text Layer' action

7a. Filename: Insert the 'Filename' parameter from the 'Convert to PDF' action result

28.png

7b. File Content: Insert the 'File Content' parameter from the 'Convert to PDF' action result

29.png

8. Checkpoint: Your new Flow should look similar to the following:

mceclip5.png

9. Add a Text Analytics 'Key Phrases' action

NOTE: If you have not already created a connection, you will be prompted to create a new Text Analytics connection utilising a Cognitive Services account hosted within an Azure subscription. You can create a free account here.

If you need to create a new connection please follow these additional steps:

9a. Connection Name: Enter a name for your connection

9b. Account Key: Enter the key obtained from your Cognitive Services account

9c. Site Url: Enter the endpoint obtained from your Cognitive Services account

9d. Click 'Create'

25.png

Once your connection is created, or if your connection was previously created, follow these steps:

9e. Text: Insert the 'Text Layer' parameter from the 'Get PDF Text Layer' action result

30.png
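For readers curious about what the 'Key Phrases' action sends and receives, here is a rough Python sketch of a key phrase request and response. The guide does not state which API version the connector uses, so the `/text/analytics/v3.0/keyPhrases` path is an assumption, as are the helper names, the example endpoint and the sample response. No network call is made; the sketch only builds the request and parses a response body.

```python
import json
from typing import Dict, List, Tuple


def build_key_phrases_request(
    endpoint: str, account_key: str, text: str, language: str = "en"
) -> Tuple[str, Dict[str, str], bytes]:
    """Build the URL, headers and JSON body for a key phrase extraction request."""
    # Assumed API path; check the version offered by your own Cognitive Services resource.
    url = endpoint.rstrip("/") + "/text/analytics/v3.0/keyPhrases"
    headers = {
        "Ocp-Apim-Subscription-Key": account_key,  # the 'Account Key' from step 9b
        "Content-Type": "application/json",
    }
    payload = {"documents": [{"id": "1", "language": language, "text": text}]}
    return url, headers, json.dumps(payload).encode("utf-8")


def parse_key_phrases(response_body: str) -> List[str]:
    """Collect the key phrases from every document in a response body."""
    data = json.loads(response_body)
    phrases: List[str] = []
    for document in data.get("documents", []):
        phrases.extend(document.get("keyPhrases", []))
    return phrases


# Hypothetical endpoint, key and response, for illustration only.
url, headers, body = build_key_phrases_request(
    "https://example.cognitiveservices.azure.com/", "<account-key>", "Text layer of the PDF"
)
sample_response = (
    '{"documents": [{"id": "1", "keyPhrases": ["retention schedule", "records management"]}],'
    ' "errors": []}'
)
print(url)
print(parse_key_phrases(sample_response))
```

The list returned by `parse_key_phrases` corresponds to the 'keyPhrases' array that step 10 loops over.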

10. Add an 'Append to string variable' action

10a. Name: Set to 'KeyPhrases'

10b. Value: Insert the 'keyPhrases - Item' parameter from the 'Key Phrases' action result

34.png

10c. This will dynamically insert an 'Apply to each' loop action

35.png

10d. To correctly format the results, remove the default value and add the following expression to the 'Value' parameter.

concat(items('Apply_to_each'), ', ')

mceclip3 (1).png

10e. Click 'OK'

mceclip4 (1).png
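To make the effect of step 10 concrete, the following Python sketch mimics what the 'Apply to each' loop and the `concat` expression do to the 'KeyPhrases' variable. The function names and sample phrases are hypothetical. Note that appending `', '` after every item leaves a trailing separator; the second function shows a join-based alternative, which is an optional variation and not part of the original steps.

```python
from typing import List


def append_each(phrases: List[str]) -> str:
    """Mimic 'Apply to each' + 'Append to string variable' with concat(item, ', ')."""
    key_phrases = ""  # the 'KeyPhrases' string variable initialised in step 3
    for phrase in phrases:
        key_phrases += phrase + ", "
    return key_phrases


def join_phrases(phrases: List[str]) -> str:
    """Alternative that avoids the trailing separator."""
    return ", ".join(phrases)


# Hypothetical key phrases, for illustration only.
sample = ["records management", "retention schedule", "metadata"]
print(repr(append_each(sample)))
print(repr(join_phrases(sample)))
```

If the trailing comma matters in your 'Key Phrases' column, it is worth trimming it before the update in step 11.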

11. Add an 'Update File Properties' SharePoint action

11a. Site Address: Set as per the 'Site Address' value of step #2.

11b. Library Name: Set as per the 'Library Name' value of step #2.

11c. Id: Insert the 'ID' parameter from the 'When a file is created or modified (properties only)' action result

31.png

The next step is to utilise the data returned from the 'Text Analytics' action and write it to a metadata field associated with the source item. We have added a 'Key Phrases' column to the library to store the data.

11d. Key Phrases: Insert the 'KeyPhrases' variable

mceclip7.png

11e. Check and update the SharePoint connection and ensure the service account identity is used, see step 4.

mceclip7 (1).png

12. Completed: Your flow should appear as follows

49.png

13. Test the flow

mceclip8.png

14. Validate the results

41.png

Please note: The 'Text Analytics' action is limited to processing 5120 characters per request. Sending an entire document will likely exceed this limit; however, the Encodian 'Get PDF Text Layer' action allows you to target specific pages, which can help you stay within it.

mceclip9.png
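If targeting specific pages is not enough, another way to think about the 5120-character limit is to split the extracted text into pieces that each fit in one request. The Python sketch below shows the idea only; it is not part of the Flow described above, the function name is hypothetical, and it normalises runs of whitespace to single spaces as a simplification.

```python
from typing import List


def split_for_text_analytics(text: str, limit: int = 5120) -> List[str]:
    """Split text on whitespace into chunks of at most `limit` characters."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The word does not fit, so close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Hypothetical long text layer, for illustration only.
long_text = "metadata " * 2000
pieces = split_for_text_analytics(long_text)
print(len(pieces), max(len(piece) for piece in pieces))
```

Each piece could then be sent as its own key phrase request, with the resulting phrases merged before the document is tagged.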

About the Author
  • Encodian Owner / Founder - Ex Microsoft Consulting Services - Architect / Developer - 20 years in SharePoint - PowerPlatform Fan