
How to convert Audio from Microphone Control of Power Apps to Text using a Power Automate Solution

Recently I wrote a blog here on how to convert audio recorded with the Microphone control of Power Apps to text, by configuring a Power Automate solution that consumes Azure Cognitive Services.

 

Since the solution involves a number of components (Power Apps, Power Automate, an Azure Function, FFmpeg and Azure Cognitive Services), I divided my blog into 3 parts.

  1. Design a Canvas App with The Microphone Control to capture Audio.
  2. Create an Azure Function to convert audio captured in Power Apps from WEBM to WAV format using FFmpe...
  3. Create a Power Automate (Flow) to create an HTML file, using the text obtained from the output of th...

The main focus of this blog, however, is how to design and configure a Power Automate solution that converts speech (the audio captured by the Microphone control) to text, with additional capabilities such as creating an HTML file from the text obtained after conversion.


In addition to this, we are going to add more power to the Power Automate solution by also -

  1. Creating a speech file in SharePoint that holds the audio recorded with the Microphone control.
  2. Creating an HTML file with the text obtained from the output of Cognitive Services after conversion.
  3. Converting the HTML file to PDF (the most widely used document format for business processes).

 

Issue-

  • Whenever we record audio with the Microphone control of Power Apps, it is always recorded in the WEBM format.
  • When we pass this WEBM audio to Azure Cognitive Services to convert it from Speech to Text, we get an "Unsupported File Format" error.
  • This is because the Azure Cognitive Service only recognizes audio in the WAV or OGG formats. WEBM is not a supported format for Azure Cognitive Services.

Solution-

  • We’re going to use FFmpeg to convert the Microphone audio from WEBM format to an audio file in WAV format, so we can pass that file to The Azure Speech to Text Cognitive Services.
  • Simply put, we’re going to use an Azure function to build a simple API that converts a WEBM file to a WAV file for us. This API uses FFmpeg to do the actual conversion.
  • FFmpeg is basically an audio and video format converter.
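To make the conversion step concrete, here is a minimal Python sketch of the kind of FFmpeg call such an API could wrap. It is not the Azure function from the earlier part of this series; the function names, the mono channel setting and the 16 kHz sample rate are assumptions chosen for illustration, and it presumes the ffmpeg executable is available on the PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(webm_path: str, wav_path: str, sample_rate: int = 16000) -> list[str]:
    """Build the FFmpeg argument list for a WEBM -> WAV conversion."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", webm_path,          # input file (the container is auto-detected)
        "-ac", "1",               # assumption: downmix to a single (mono) channel
        "-ar", str(sample_rate),  # assumption: resample to 16 kHz by default
        wav_path,                 # the .wav extension selects the WAV output format
    ]


def convert_webm_to_wav(webm_path: str, wav_path: str) -> Path:
    """Run FFmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(webm_path, wav_path), check=True)
    return Path(wav_path)
```

Calling convert_webm_to_wav("recording.webm", "recording.wav") would run the conversion on a local file; inside an Azure function the same command would be applied to the request body after writing it to a temporary file.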

Prerequisites-

Before you begin, please make sure the following prerequisites are in place:

 

Now that we have all the prerequisites ready, let's start designing our Power Automate (Flow).

 

Step 1 - Trigger

  • Create a new flow and select Power Apps as the trigger.

PowerAppsTrigger.PNG

 

Step 2 - Add a compose action

  • Add a compose action and in the "Inputs" field select from Dynamic content "Ask in PowerApps".

Compose.PNG

 

Step 3 - Add a Parse JSON action

  • Add a Parse JSON action and in the "Content" field add from Dynamic content "Outputs" of the Compose action created above.
  • Click on "Generate from Sample" and add the following piece of code to generate the schema.

 

 

 

{
    "type": "object",
    "properties": {
        "Url": {
            "type": "string"
        }
    }
}

 

 

 

 

ParseJSON.PNG

 

Step 4 - Add Compose action

  • Add another compose action
  • From the Expression tab, select the function "dataUriToBinary".
  • Keeping this value intact, switch to the Dynamic content tab and select the "Url" property.
  • The expression box will now show "dataUriToBinarybody('Parse_JSON')?['properties']?['Url']", which is not yet valid because the argument is not wrapped in round brackets.
  • Wrap the whole body(...) reference in round brackets so that it becomes the argument of dataUriToBinary and the expression is syntactically correct, as follows-

dataUriToBinary(body('Parse_JSON')?['properties']?['Url'])

 

 

 

 

dataURIToBinary.PNG

 

datauritobinary2.PNG

 

datauritobinary3.PNG

 

  • Click OK and you will see the expression configured in the Compose action as below.

Compose2.PNG
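For readers who want to see what this step does outside of Power Automate, the following Python sketch mimics the idea behind dataUriToBinary: it splits a data URI into its header and payload and decodes the payload to raw bytes. The function name and the sample data URI are hypothetical, and the sketch assumes the Microphone audio reaches the flow as a data URI, which is what the use of dataUriToBinary in this step implies.

```python
import base64
from urllib.parse import unquote_to_bytes


def data_uri_to_binary(data_uri: str) -> bytes:
    """Decode a data URI such as 'data:audio/webm;base64,....' into raw bytes."""
    if not data_uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header (media type and encoding).
    header, separator, payload = data_uri[len("data:"):].partition(",")
    if not separator:
        raise ValueError("data URI has no comma separating header and payload")
    if header.endswith(";base64"):
        return base64.b64decode(payload)
    # Without the base64 marker the payload is percent-encoded text.
    return unquote_to_bytes(payload)
```

The returned bytes are what the later steps pass on as file content and as the HTTP request body.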

 

Step 5 - Create file in SharePoint action

  • Add a "Create file in SharePoint" action.
  • Select the "Site Address" and "Folder Path" where you intend to save the audio file.
  • Give the file a meaningful name and do not forget to end the name with the ".wav" extension.
  • In the "File content" field, pass the "Outputs" of the "Compose2" action configured above.

Create file in SP.PNG

Important note-

  • Your speech file, which holds the audio recorded in Power Apps, is now successfully created in SharePoint.
  • The next steps call the API that we configured using the Azure function, which converts the WEBM audio format to WAV format.
  • To get details on how to configure the Azure function, check out the blog here.

Step 6 - HTTP Post action

  • Add an HTTP request action with the method set to "POST".
  • The URI is the function URL that you get in the Azure portal where you configured your Azure function.
  • In the "Body" field, add the same "dataUriToBinary" expression that we entered in the "Compose2" action.

AzureFunc.PNG
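The HTTP action in this step is equivalent to a plain POST of the binary audio to the function URL. The Python sketch below shows that shape using only the standard library. The function names and the "application/octet-stream" content type are assumptions for illustration; the real function URL comes from your own Azure portal.

```python
import urllib.request


def build_conversion_request(function_url: str, webm_bytes: bytes) -> urllib.request.Request:
    """Prepare a POST that carries the raw WEBM audio as the request body."""
    return urllib.request.Request(
        function_url,
        data=webm_bytes,
        method="POST",
        # Assumption: the function accepts a generic binary body.
        headers={"Content-Type": "application/octet-stream"},
    )


def convert_via_function(function_url: str, webm_bytes: bytes, timeout: float = 60.0) -> bytes:
    """Send the request and return the response body (expected to be WAV audio)."""
    request = build_conversion_request(function_url, webm_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```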

 

Step 7 - HTTP Post action

  • Add an HTTP request action with the method set to "POST".
  • Use the URI shown in the screenshot below, changing only the region - https://<region in which cognitive service is hosted>.stt.speech.microsoft.com/sp.....
  • In my case the region was "WestEurope".
  • Pass two headers: "Ocp-Apim-Subscription-Key", which you get when you create a Speech service in the Azure portal, and "Content-type" set to "audio/wav".
  • In the "Body" field, pass the response "Body" from the Dynamic content of the HTTP request configured in Step 6.

COgService.PNG
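As a rough illustration of what this action sends, the sketch below prepares the same request in Python with the two headers named in this step. The function name is hypothetical, and the endpoint URL is deliberately left as a parameter: use the full URI from your own flow rather than anything shown here.

```python
import urllib.request


def build_speech_request(endpoint_url: str, subscription_key: str, wav_bytes: bytes) -> urllib.request.Request:
    """Prepare the Speech to Text POST with the headers described in Step 7."""
    return urllib.request.Request(
        endpoint_url,
        data=wav_bytes,  # the WAV audio returned by the Azure function
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "audio/wav",
        },
    )
```

Passing the prepared request to urllib.request.urlopen would send it; the response body is the JSON that the next step parses.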

 

Step 8 - Add a Parse JSON action

  • Next we need to parse the response obtained from The Cognitive Services API in order to extract the text.
  • Select "Body" from the dynamic content and place it in the "Content" field.
  • For generating a schema, please use the payload as shown below:

 

 

 

{
    "type": "object",
    "properties": {
        "NBest": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "Confidence": {
                        "type": "number"
                    },
                    "Lexical": {
                        "type": "string"
                    },
                    "ITN": {
                        "type": "string"
                    },
                    "MaskedITN": {
                        "type": "string"
                    },
                    "Display": {
                        "type": "string"
                    }
                },
                "required": [
                    "Confidence",
                    "Lexical",
                    "ITN",
                    "MaskedITN",
                    "Display"
                ]
            }
        }
    }
}

 

 

 

 

Parse3.png
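To show how the parsed response is used, here is a small Python sketch that reads a response shaped like the schema above and returns the Lexical text of the entry with the highest Confidence. The function name and the sample response are made up for illustration; only the field names NBest, Confidence and Lexical come from the schema.

```python
import json


def best_lexical(response_body: str) -> str:
    """Return the Lexical text of the most confident NBest entry, or '' if none."""
    payload = json.loads(response_body)
    candidates = payload.get("NBest", [])
    if not candidates:
        return ""
    best = max(candidates, key=lambda item: item["Confidence"])
    return best["Lexical"]


# Hypothetical response, shaped like the schema used in the Parse JSON action.
sample_response = json.dumps({
    "NBest": [
        {"Confidence": 0.62, "Lexical": "hello word", "ITN": "hello word",
         "MaskedITN": "hello word", "Display": "Hello word."},
        {"Confidence": 0.91, "Lexical": "hello world", "ITN": "hello world",
         "MaskedITN": "hello world", "Display": "Hello world."},
    ]
})
print(best_lexical(sample_response))
```

Picking the highest Confidence is a design choice of this sketch; the flow itself loops over every NBest item.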

 

Step 9 - Create PDF using the text

  • Before we add a ‘Convert HTML to PDF’ action to turn the extracted text into a PDF file using The Muhimbi Converter, I want to draw your attention to the output obtained from The Cognitive Services API action.
  • As you can see, the Lexical parameter holds our Speech to Text output, so we need to pass the Lexical parameter as the source for generating the PDF file.
  • Go ahead and add the ‘Convert HTML to PDF’ action to the Flow. There is no need to add an ‘Apply to each’ action yourself, as it will be added automatically.
  • The ‘Apply to each’ action gets added because Lexical sits inside the NBest array of the response (see the schema in Step 8), where each item holds the text in several formats such as Lexical, ITN, MaskedITN and Display.
  • We need the Lexical parameter, so select Lexical from the dynamic content as shown below.
  • If you have doubts over how The Muhimbi Converter’s ‘Convert HTML to PDF’ action works, please check here.
 

Output.png

 

Finale.png
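If you prefer to wrap the recognized text in a complete HTML document before converting it, the idea can be sketched in Python as follows. The function name, the default title and the page layout are assumptions for illustration; the conversion to PDF itself is still done by the ‘Convert HTML to PDF’ action.

```python
from html import escape


def build_html_document(lexical_text: str, title: str = "Speech to text output") -> str:
    """Wrap the recognized text in a minimal HTML page, escaping special characters."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><meta charset=\"utf-8\"><title>{escape(title)}</title></head>\n"
        f"<body><p>{escape(lexical_text)}</p></body>\n"
        "</html>\n"
    )
```

Escaping matters because recognized speech can contain characters such as "&" or "<" that would otherwise be read as markup.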

 

That's it, you are now ready to test your Power Automate (Flow) solution.

 

General Structure of the Power Automate solution -

Outer.PNG

 

Final outputs-

  1. Speech file created in SharePoint
  2. PDF file with Text output converted from Speech

out.PNG

 

11.png

 

 

 

 

 

Comments

Nice article @yashkamdar. It will be really helpful for power users and easy to understand with the detailed steps given. 

Hey, very useful blog @yashkamdar 

  • Web site – https://kamdaryash.wordpress.com Youtube channel - https://www.youtube.com/channel/UCM149rFkLNgerSvgDVeYTZQ/