Guide: Auto tagging documents in SharePoint using Microsoft Cognitive Services/Text Analytics

Hey Flow Fans,

As a long-time SharePoint and information management architect (20 years!)… I've worked on many projects in many different verticals and held many debates with records managers, information managers, end users, legal, etc. about metadata and how much metadata should be associated with a document within SharePoint.

Metadata is a fundamental building block for structured document management, enabling management reporting, document classification, retention schedules, disposition execution, and so forth… so let's create lots of metadata! However... few users are happy to complete lots of metadata fields or can see the value in doing so, and worse still, many users will click any old option just to get a document 'correctly' loaded. This has a hugely detrimental impact on the management of an information repository and typically introduces significant regulatory risk.

Considering the desire to have both rich metadata and a usable solution, I have been asked many, many, many times... can we automate metadata selection based on the content of the file?

Yes! Better still this can be achieved quickly and simply with Microsoft Flow and a few extra components... in this post I'll outline how you can utilise Microsoft Cognitive Services and the Text Analytics API to perform key phrase extraction which we can use later to tag a document within SharePoint.

Flow Creation - Text Guide

Please note: You will require an Azure subscription and a Cognitive Services account to utilise the Flow Text Analytics connector. You can create a free account here.

1. Create a new Flow from a blank template

2.png

2. Add the ‘When a file is created or modified (Properties Only)’ SharePoint trigger and configure to point to the library / folder where the Flow should be triggered from.

2 (1).png

3. Add an 'Initialise variable' action

3a. Name: Set to 'KeyPhrases'

3b. Type: Select 'String'

33.png

NOTE: This Flow is triggered when a new document is added or an existing document is updated, and the Flow then updates that same document. Left unchecked, this would cause an infinite loop (a recursive event). To protect against this, we recommend using a service account identity for the SharePoint connection, which ensures that any updates the Flow makes to the document are executed by that identity. We will then add a condition to the Flow to check for and ignore any Flow runs triggered by an update made by the service account identity.

4. Add a 'Condition' action

4a. Click 'Choose a value', insert the 'Modified By Email' parameter from the 'When a file is created or modified (properties only)' trigger

mceclip3.png

4b. Set the operator to "Is not equal to"

mceclip1.png

4c. Set the value to the email address of the SharePoint connection's identity

mceclip4.png

4d. If you are unsure of the identity or wish to create a new connection, go to 'Settings > Connections'

mceclip2.png
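The loop guard configured in step 4 boils down to a single comparison. The following is a minimal Python sketch of that logic, not part of the Flow itself; the function name and the email addresses are hypothetical, and the case-insensitive comparison is an assumption added for illustration (the Flow condition uses a plain 'is not equal to').

```python
def should_process(modified_by_email: str, service_account_email: str) -> bool:
    """Return True when the change was NOT made by the Flow's own service account."""
    # Assumption: normalise case and whitespace before comparing, because
    # email casing can differ between systems. The Flow condition itself
    # performs a plain 'is not equal to' comparison.
    return modified_by_email.strip().lower() != service_account_email.strip().lower()


# Hypothetical addresses, for illustration only.
SERVICE_ACCOUNT = "svc-flow@contoso.com"

print(should_process("jane.doe@contoso.com", SERVICE_ACCOUNT))  # user edit -> True
print(should_process("svc-flow@contoso.com", SERVICE_ACCOUNT))  # Flow's own update -> False
```

Only when the check returns True does the rest of the Flow (the 'Yes' side of the condition) run, which is what breaks the recursion.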

5. Add a 'Get File Content' SharePoint action inside the 'Yes' branch of the condition

5a. Site Address: Set as per the 'Site Address' value of step #2.

5b. File Identifier: Insert the 'Identifier' parameter from the 'When a file is created or modified (properties only)' action result

3.png

6. Add an Encodian 'Convert to PDF' action

6a. File Content: Insert the 'File Content' parameter from the 'Get file content' action result

8.png

6b. PDF Filename: Insert the 'File name with extension' parameter from the 'When a file is created or modified (properties only)' action result

9.png

Note: The Encodian 'Convert to PDF' action will automatically check the 'PDF Filename' value and change the file extension provided to '.pdf' if required.

6c. Filename: Insert the 'File name with extension' parameter from the 'When a file is created or modified (properties only)' action result

10.png

7. Add an Encodian 'Get PDF Text Layer' action

7a. Filename: Insert the 'Filename' parameter from the 'Convert to PDF' action result

28.png

7b. File Content: Insert the 'File Content' parameter from the 'Convert to PDF' action result

29.png

8. Checkpoint: Your new Flow should look similar to the following:

mceclip5.png

9. Add a Text Analytics 'Key Phrases' action

NOTE: If you have not already created a connection, you will be prompted to create a new Text Analytics connection utilising a Cognitive Services account hosted within an Azure subscription. You can create a free account here.

If you need to create a new connection please follow these additional steps:

9a. Connection Name: Enter a name for your connection

9b. Account Key: Enter the key obtained from your Cognitive Services account

9c. Site Url: Enter the endpoint obtained from your Cognitive Services account

9d. Click 'Create'

25.png

Once your connection is created, or if your connection was previously created, follow these steps:

9e. Text: Insert the 'Text Layer' parameter from the 'Get PDF Text Layer' action result

30.png
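For readers curious about what the 'Key Phrases' action does with the account key, endpoint, and text, the sketch below shows roughly what an equivalent REST call could look like. It assumes the v3.0 Text Analytics `keyPhrases` endpoint; the connector may well use a different API version. The function names, endpoint, key, and sample response are all hypothetical, and no network request is made.

```python
import json
from typing import Dict, List, Tuple


def build_key_phrases_request(
    endpoint: str, account_key: str, text: str, language: str = "en"
) -> Tuple[str, Dict[str, str], str]:
    """Build the URL, headers, and JSON body for a key phrase extraction call."""
    # Assumption: the v3.0 REST path; check your resource's documentation.
    url = endpoint.rstrip("/") + "/text/analytics/v3.0/keyPhrases"
    headers = {
        "Ocp-Apim-Subscription-Key": account_key,  # the 'Account Key' from step 9b
        "Content-Type": "application/json",
    }
    body = json.dumps({"documents": [{"id": "1", "language": language, "text": text}]})
    return url, headers, body


def extract_key_phrases(response: dict) -> List[str]:
    """Collect the key phrases from every document in a parsed response."""
    phrases: List[str] = []
    for doc in response.get("documents", []):
        phrases.extend(doc.get("keyPhrases", []))
    return phrases


# Hypothetical endpoint and key, for illustration only.
url, headers, body = build_key_phrases_request(
    "https://example.cognitiveservices.azure.com/", "<your-key>", "Retention schedules matter."
)
print(url)

# Hand-written sample in the shape of a response; not real service output.
sample_response = {"documents": [{"id": "1", "keyPhrases": ["Retention schedules"]}], "errors": []}
print(extract_key_phrases(sample_response))
```

In the Flow, the list returned by the action is what surfaces as the 'keyPhrases' output used in the next step.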

10. Add an 'Append to string variable' action

10a. Name: Set to 'KeyPhrases'

10b. Value: Insert the 'keyPhrases - Item' parameter from the 'Key Phrases' action result

34.png

10c. This will dynamically insert an 'Apply to each' loop action

35.png

10d. To correctly format the results, remove the default value and add the following expression to the 'Value' parameter.

concat(items('Apply_to_each'), ', ')

mceclip3 (1).png

10e. Click 'OK'

mceclip4 (1).png
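To see what the 'Apply to each' loop and the concat expression produce, here is a minimal Python sketch of the same logic. The function name and sample phrases are made up for illustration; only the append-with-separator behaviour mirrors the Flow.

```python
from typing import List


def append_key_phrases(key_phrases: List[str]) -> str:
    """Mimic steps 3 and 10: append each key phrase plus ', ' to a string variable."""
    result = ""  # step 3: the initialised 'KeyPhrases' string variable
    for phrase in key_phrases:  # the 'Apply to each' loop added in step 10c
        result += phrase + ", "  # concat(items('Apply_to_each'), ', ')
    return result


# Illustrative key phrases, not real service output.
sample = ["records management", "retention schedule", "SharePoint"]
print(repr(append_key_phrases(sample)))
# 'records management, retention schedule, SharePoint, '
```

Note that, because the separator is appended after every item, the resulting string ends with a trailing ', '. If that matters for your column, it could be trimmed before the update step.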

11. Add an 'Update File Properties' SharePoint action

11a. Site Address: Set as per the 'Site Address' value of step #2.

11b. Library Name: Set as per the 'Library Name' value of step #2.

11c. Id: Insert the 'ID' parameter from the 'When a file is created or modified (properties only)' action result

31.png

The next step is to utilise the data returned from the 'Text Analytics' action and write to a metadata field associated with the source item. We have added a 'Key Phrases' column to the library to store the data.

11d. Key Phrases: Insert the 'KeyPhrases' variable

mceclip7.png

11e. Check and update the SharePoint connection and ensure the service account identity is used, see step 4.

mceclip7 (1).png

12. Completed: Your flow should appear as follows

49.png

13. Test the flow

mceclip8.png

14. Validate the results

41.png

Please note: The 'Text Analytics' action is limited to processing 5,120 characters per request. Sending an entire document will likely exceed this limit; however, the Encodian 'Get PDF Text Layer' action allows you to target specific pages, which can help you stay within it.

mceclip9.png
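Targeting specific pages is the approach used in this guide. As an alternative illustration of working within the 5,120-character limit, the Python sketch below splits extracted text into chunks on word boundaries. It is a sketch under stated assumptions, not something the Flow does: the function name is hypothetical, and it collapses runs of whitespace into single spaces.

```python
from typing import List

MAX_CHARS = 5120  # per-request character limit quoted above


def split_for_text_analytics(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking between words."""
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit has to be cut mid-word.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would overflow, so close the current chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


# Illustrative input: 3,000 short words, well over the limit in total.
chunks = split_for_text_analytics("word " * 3000)
print(len(chunks), max(len(c) for c in chunks) <= MAX_CHARS)
```

Each chunk could then be sent as a separate request and the returned key phrases merged, at the cost of extra calls against your Cognitive Services account.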
