Custom Connectors - My experience creating a FLOW ...

cindyc · ‎05-03-2021

Hey Everyone,

I just wanted to share what my experience was when I created a custom connector for a FLOW I worked on.

FLOW Description

The FLOW we needed was an electronic document creation system. It takes a document from a scanner, which is emailed via Outlook to a particular mailbox. Once the email arrives, we take the document and send it to a Custom Connector which is connected to a REST API on a remote machine. Once the document is uploaded to the REST API, we parse out some data from the document, rename the document, and upload it into SharePoint (TEAMS), saving the document in a folder and entering a row into a list with a link to the document.

Custom Connector (assuming your account has the access/authority to create a CC)

I won't get into the details of the FLOW so much, but I'll focus on the Custom Connector (CC). Creating the CC was very straightforward. Within FLOW editor, on the left side of the screen, click "Data", "Custom Connectors".

On the next screen, you click "+ New Custom Connector", "Create from blank".

You enter the name for the CC and then proceed through the next 4 screens of CC creation (1. General, 2. Security, 3. Definition, 4. Test). Entering the host, protocol, action (for uploading a file, I'm using POST), request definition (multipart/form-data), endpoint and security key was as easy as filling in the data on those screens.

Testing on the 4th screen, however, is a topic for another day. I was NEVER able to test successfully from the 4th CC screen; it's useless.

* Lesson Learned - Create a new CC by importing an OpenAPI file

One thing you can do when you want to create a copy or clone: on the CC screen, which lists all of your CCs, each CC has a download button.

When you click the download button, it creates a .swagger.json file, which is also known as an OpenAPI file. You can download the .swagger.json file from an existing CC, open it in Notepad or some other text editor, change the name, the endpoint, etc., and then create a new CC by importing the edited .swagger.json (OpenAPI file).
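If you would rather script the edit than do it by hand, here is a minimal sketch of the idea in Python. It assumes the downloaded file is a Swagger/OpenAPI 2.0 document with the usual "info", "host" and "basePath" keys; the function name, titles and hosts below are made up for illustration, so check them against your own file.

```python
import copy
import json


def clone_connector_definition(definition, new_title, new_host, new_base_path=None):
    """Return a modified copy of a Swagger/OpenAPI 2.0 definition (hypothetical helper)."""
    cloned = copy.deepcopy(definition)  # leave the original definition untouched
    cloned.setdefault("info", {})["title"] = new_title
    cloned["host"] = new_host
    if new_base_path is not None:
        cloned["basePath"] = new_base_path
    return cloned


# Stand-in for the contents of a downloaded .swagger.json file (values are assumptions).
production = {
    "swagger": "2.0",
    "info": {"title": "PDF Upload (prod)", "version": "1.0"},
    "host": "api.example.com",
    "basePath": "/",
    "schemes": ["https"],
    "paths": {},
}

dev = clone_connector_definition(production, "PDF Upload (dev)", "dev.example.com")
print(json.dumps(dev, indent=2))  # save this text as a new .swagger.json and import it
```

In practice you would json.load() the real downloaded file instead of the inline dictionary and write the result back out with json.dump().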

Connections

So what is a connection? In order to get FLOW to connect to your REST API endpoint over a CC, you need to create a "Connection".

A connection is simply a place to store the credentials for security (in my case the API key, because that is what I use). I normally create 2, one for test/development and another for production, so I can switch when I need to. Switching from production to test/dev is not actually that easy, though: most likely your production REST API endpoint will be different from your test/development one, so you would need to create a clone or copy of your CC (see above for how to create a new CC by importing an OpenAPI file), then create a copy/clone of your FLOW and put in the test/dev CC step.

You can create a connection in one of 3 places.

1. On the 4th screen of the CC Creation process

2. On the "Data", "Connections" screen ("+ New Connection")

3. Inside the FLOW Editor ("My Connections", "+Add new connection")

Incorporating the CC into your flow

So now you have your FLOW, at least started anyways, and now you have your custom connector, with a connection, that will hit your REST API. Now add the CC to your FLOW by clicking the + button to "Insert a new action", and choose "Custom" to select your custom CC (the pic below actually has 4 CCs, you would choose whichever one you need);

* Lesson Learned - OFF TOPIC - How to structure the REQUEST data to upload a doc to a REST API from a FLOW/CC

I'm including this just to show you guys how to structure the REQUEST within the CC inside your flow for uploading a document.

There are other places online you can look it up, but there are really 2 parts to this;

1. BASE64 ENCODE THE DOCUMENT DATA - in my case, the document type we are dealing with is pdf.

To get the documents, or really the attachments, we first need to set up a loop so we can loop thru all of the attachments in the email (see "Attachments" below in the pic inside "select an output from the previous steps" - this just means get the list of attachments from the email and loop over it).

What I need to do in my flow is take each pdf attachment from the incoming email trigger (the trigger is the first action of the FLOW) and upload it via the CC to my REST API. So I create a "Compose" step before the CC step, and load it with the "AttachmentsContent";

But the above pic just shows the partial text I need for the pdf attachment contents. If you hover over it, the "Compose" variable value looks like;

items('Loop_thru_attachments')?['contentBytes']

But this isn't all we need; like I said, it's just the partial text we need for this "Compose" element. What we really need to do is take that partial text, wrap it in base64 encoding, and enter it into the "Expression" editor, which pops up every time you click inside a text field like below. We do this because we need to;

"...encode binary data into an ASCII character set known to pretty much every computer system, in order to transmit the data without loss or modification of the contents itself." What is the real purpose of Base64 encoding? - Stack Overflow -giorgio, Wolverine

Now hold the blue "Ok" button under "Expression" for 2 seconds, and it will update the text field with the new base64 value, which essentially looks like;

base64(items('Loop_thru_attachments')?['contentBytes'])

This is a super crucial pre-step for the actual CC step. From my experience, a CC that uploads a file WILL NOT work unless you base64 encode the file contents.

*Note: base64 decoding

Just as a reminder, because you are base64 encoding the binary file data in your FLOW prior to making the CC call, you'll need to base64 decode the file somewhere in your REST API's endpoint function before you can do anything with it. I do this early on in my endpoint function.

Flask/python makes this easy, and I'm certain it will be just as easy in other languages/frameworks.
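The original snippet is not reproduced here, so the following is only a rough sketch of what the decode step could look like, using nothing but the Python standard library. The function name and the "%PDF-" sanity check are my own assumptions; in a Flask endpoint the encoded string would typically come from the incoming request's form data.

```python
import base64
import binascii


def decode_uploaded_pdf(encoded):
    """Decode a base64 string back into raw bytes and sanity-check that it is a PDF."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid base64") from exc
    if not raw.startswith(b"%PDF-"):  # PDF files begin with this marker
        raise ValueError("decoded payload does not look like a PDF")
    return raw


# Simulate what the FLOW sends: base64 text of the attachment bytes.
sample = base64.b64encode(b"%PDF-1.4 fake body").decode("ascii")
pdf_bytes = decode_uploaded_pdf(sample)
print(len(pdf_bytes), "bytes decoded")
```

Failing early with a clear error makes it much easier to tell a badly encoded upload apart from a real parsing problem later in the endpoint.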

2. STRUCTURE THE REQUEST WITH multipart/form-data - now inside the CC step in our FLOW, we need to do pretty much the same thing we did when we created the CC on screen # 3. Definition, although here we are using variables for the values, whereas when we created the CC, we just typed in text as placeholders.

The first variable inside "$multipart-item 1" is just for the AttachmentName. The second is the output from the "Compose" step where we base64 encode the contentBytes of the pdf attachment.

Test it Out

Now we have our FLOW created along with our CC, and inside its REQUEST we are sending the base64 encoded stream of our pdf file contentBytes. The first few times I tested this out, it worked great. Each attachment that was sent to the REST API via my CC contained only 1 page, hence 1 pdf file. But then I realized that part of the requirement was that users can scan a stack of paper containing multiple docs at a time, and those pages all get combined into one pdf file attachment. I would need to split that pdf into individual docs and then upload each one to SharePoint. Testing one-page pdfs was fine, but these multi-page pdfs started taking a long time and started to fail, and I had no idea why.

Why was my CC (Custom Connector) failing?

One of the reasons was that the REQUESTS spawned from my CC could take varying amounts of time depending on how many pages there were. Inside my REST API I split the pdf, and for each page I convert it to an image, convert the image to text, parse the text and look for specific fields, etc. Some took 10 seconds, while others took several minutes. So I looked in the "Settings" of the CC inside my FLOW, and saw a timeout setting. I thought great, I'll just set this timeout to like 10 or 15 minutes or some amount that I know will never happen, and it will be fine! Ummmmm..... no.

CC Settings Inside Flow

Let's take a look at the "Settings" for our CC in the flow;

Asynchronous Pattern

Setting this to "On" means that the CC will handle specific return values. If 202 "Accepted" is returned, then the CC will keep calling the REST API until it gets a success 200 or anything else (failures of 4xx, 5xx, etc).
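To make the 202-then-poll idea concrete, here is a small Python simulation of the general pattern. It is only an illustration of the behavior described above, not how the connector is implemented internally; the function name, the fake status list and the "/status/abc" location are all made up.

```python
import time


def poll_until_done(first_status, location, fetch_status, max_polls=20, delay_seconds=0.0):
    """Keep polling while the server answers 202; return the final HTTP status code."""
    status = first_status
    polls = 0
    while status == 202 and polls < max_polls:
        time.sleep(delay_seconds)        # wait before asking again
        status = fetch_status(location)  # hypothetical callable doing the status request
        polls += 1
    return status


# Fake server: answers "still working" twice, then success.
responses = iter([202, 202, 200])
final = poll_until_done(202, "/status/abc", lambda url: next(responses))
print(final)
```

The max_polls cap matters: without it, a task that never leaves the 202 state would keep the caller polling forever.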

Timeout

This one can be confusing - here is the description;

"Limit the maximum duration an asynchronous pattern may take. Note: this does not alter the request timeout of a single request."

A single request timeout is 120 seconds, or 2 minutes. What the above text is saying is that this Timeout value only limits the overall asynchronous pattern; no matter what you set it to, and no matter how many async calls are made, it does not extend the limit of a single request, which is again 2 minutes. And this is exactly what I saw when testing. I would set it to 15 minutes, and if a REST API call took 3 minutes, it would always fail with a timeout failure after 2 minutes. Even worse, if a RESPONSE came back with a status code of 4xx, 5xx, etc, the whole FLOW would go into a fail state, and you wouldn't have the opportunity to read the message or anything else inside the response and handle it based on what's in there.

How do we handle long running CC REQUESTS?

You can try to handle this by using this built-in Async handling in your CC settings, if you'd like. I ended up not doing that.

For my process, I needed a little more control over what comes back inside the RESPONSE: I wanted to be able to read the error message or success message from the RESPONSE so I could put it in an email and send it to support or management. The built-in Async handling did not seem as though it would let me do this. If a failure came back in the RESPONSE, the FLOW would just fail and not give me a chance to look inside the RESPONSE. Keep in mind, you can keep the Asynchronous Pattern setting turned on for this CC as well as the Timeout setting, but in effect they won't matter, because the 202 "Accepted" status comes back right away when we put the task on the queue inside flask.

To handle this the way I wanted, I had to do a few things. If you don't care about my REST API specs and what I had to change there, you don't need to read this section.

Change REST API endpoint to be asynchronous

Within my python flask REST API, I had to change how the endpoint function was structured;

1. Set up REDIS Queue - I had to install REDIS queue software on my dev and production machines.

2. Install and Configure Celery task queue manager - python has a task queue manager called Celery, which is a library that connects to and manages the REDIS (and many other) task queue(s).

3. Add a Status Check GET Request - I had to add another REST API endpoint to act as a task status check: it is passed the task id and asks the queue what the status of that task is. The Response json for both the CC that uploads the document and the CC that checks the status is built inside flask. When we put the task on the queue and it's still pending and being worked on, it looks like;

{
  200,
  {
    "status": 202,
    "status_desc": "Accepted",
    "msg": "",
    "Location": "73982hr-w9eu02ieu-39u32-93u3h"
  }
}

And for our success...

{
  200,
  {
    "status": 200,
    "status_desc": "success",
    "msg": "2 Files were uploaded to sharepoint",
    "Location": "73982hr-w9eu02ieu-39u32-93u3h"
  }
}

And for our failures...

{
  200,
  {
    "status": 5xx,
    "status_desc": "error",
    "msg": "An exception occurred trying to parse document text.",
    "Location": "73982hr-w9eu02ieu-39u32-93u3h"
  }
}

Notice the HTTP Response status is always 200. We set it to 200 for ALL Responses we send back from the REST API so that, again, we can control the FLOW: we read the inner status and then respond in the flow accordingly. We want every response to come back so we can read its data. FLOW will not let you control this very well if you send back something like;

{
  5xx,
  {
    "status": 5xx,
    "status_desc": "error",
    "msg": "An exception occurred trying to parse document text.",
    "Location": "73982hr-w9eu02ieu-39u32-93u3h"
  }
}

And sometimes it will just stop on the previous step or several steps above it and not allow you to see what is going on inside each step of the FLOW to debug it.

Also, the "Location" key in the response is really just the task_id from the task queue - we pull this value out of the RESPONSE to call the Status Check CC (see below). In the built-in Async process it's expected that the response for PENDING tasks will contain 202 Accepted and a "Location" param - they suggest that the "Location" param should contain the full endpoint you need to call to get the status. However, even though I kept the name of the param as "Location", I just ended up putting in the task_id ONLY, not the full endpoint.
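Here is a minimal Python sketch of how such an "always 200 on the outside" response could be built. The state names are Celery's standard task states, but the mapping to inner codes, the use of 500 for failures, and the function name are my assumptions for illustration, not the exact code from my API.

```python
import json

# Celery's built-in task state names; the inner codes they map to are an assumption.
STATE_TO_INNER = {
    "PENDING": (202, "Accepted"),
    "STARTED": (202, "Accepted"),
    "RETRY": (202, "Accepted"),
    "SUCCESS": (200, "success"),
    "FAILURE": (500, "error"),
}


def build_envelope(task_id, state, msg=""):
    """Return (http_status, json_body); the outer HTTP status is always 200."""
    inner_status, desc = STATE_TO_INNER.get(state, (500, "error"))
    body = {
        "status": inner_status,   # the FLOW branches on this inner value
        "status_desc": desc,
        "msg": msg,
        "Location": task_id,      # task id only, not a full URL
    }
    return 200, json.dumps(body)


http_status, body = build_envelope("73982hr-w9eu02ieu-39u32-93u3h", "PENDING")
print(http_status, body)
```

In a Flask endpoint the two returned values would become the response body and the response status; the FLOW then parses the body and decides what to do based on the inner "status".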

Add a New CC (Custom Connector) for the Task Status Check

I also had to add a new CC;

1. Create a new CC - I created a new CC to call the REST API endpoint that will return the status of the task in the queue.

Creating the new CC was easier; it's just setting it to GET on the # 3. Definition screen, and setting the endpoint with a parameter (localhost:5000 is just an example dev host);

http://localhost:5000/st/api/files/pdf/convert_to_text/{task_id}

And when we call it from the FLOW and enter the task_id in there, the URI endpoint would look like (for example);

http://localhost:5000/st/api/files/pdf/convert_to_text/e78b2751-57ef-471c-a8b2-95dfb41f8d40

CC Status Check Settings inside the FLOW

Within the Status Check CC, we had to change the settings a little bit;

1. Turn Asynchronous Pattern off - for this CC inside the "Do Until Loop", we don't want to be affected by the timeout or the built-in behavior of Async Pattern here. We want to put this CC inside a "Do Until Loop", and then stop the loop when the status changes from 202 "Accepted" to 200 "success" (or failure);

As you can see as well, I set the Limit count to 20 and the timeout to 20 minutes; we should never exceed either of those. We will break out of the loop on either success or failure from the RESPONSE. We set the FLOW variable called variables('CONTINUE STATUS CHECK') to either True to keep checking, or False to break out of the loop, based on the inner status code in the RESPONSE.

2. Do not set a Timeout value on CC - for this CC we will not set a Timeout value; we will rely on the "Do Until Loop" timeout limit we set.

3. Do not set a Retry Policy - for this CC we will not set a Retry Policy value; we will set it to None. A retry policy can just get confusing and mess up the FLOW and the timing of the CC REQUESTs. If something fails inside the critical functions inside flask, we already baked in retry logic in python, so there is no real need for it here.
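The logic of that "Do Until Loop" can be sketched in Python as below. This is only a simulation of the control flow under my own naming: check_status stands in for the Status Check CC call, and a real flow would also put a Delay step between checks.

```python
def run_status_loop(check_status, task_id, limit_count=20):
    """Poll check_status(task_id) until the inner status is no longer 202."""
    continue_status_check = True  # mirrors variables('CONTINUE STATUS CHECK')
    last_response = None
    attempts = 0
    while continue_status_check and attempts < limit_count:
        last_response = check_status(task_id)  # stand-in for the Status Check CC
        attempts += 1
        # Keep looping only while the inner status says the task is still pending.
        continue_status_check = last_response["status"] == 202
    return last_response, attempts


# Fake Status Check CC: pending twice, then success.
fake_responses = iter([
    {"status": 202, "msg": ""},
    {"status": 202, "msg": ""},
    {"status": 200, "msg": "2 Files were uploaded to sharepoint"},
])
result, attempts = run_status_loop(lambda task_id: next(fake_responses), "e78b2751")
print(attempts, result["status"], result["msg"])
```

Because the loop stops on any inner status other than 202, both success and failure responses fall out of the loop with the full message still available for the notification email.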

Parallel Processing tip - Previously in flask I would run a cleanup function to delete the temp files I had created in a folder. But I had to set the initial Incoming Email Trigger to only allow 1 FLOW to run at a time. This was because if I allowed multiple FLOWS to run and the cleanup function ran, it might delete files that another run still needed. So now, during a FLOW call to the REST API from the CC, I name any files created in flask with an additional hex key, and at the end of the REST API call I delete only the files with that key, so it doesn't affect other FLOW runs. By doing that, I can now set the Concurrency Control value to 2, which is the number of CPUs I currently have on the machine. I set the uWSGI number of processes to 2 as well, and now have 2 Celery workers, so running 2 FLOWS concurrently works fine. I may ramp it up to 3 if this continues to perform well.
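Here is a small standard-library sketch of that per-run key idea. The helper names and the "key_filename" naming scheme are assumptions for illustration; my real code differs in the details.

```python
import secrets
import tempfile
from pathlib import Path


def new_run_key():
    """Random hex key that tags every temp file created during one API call."""
    return secrets.token_hex(8)


def temp_path(work_dir, run_key, name):
    """Build a temp file path prefixed with the run key."""
    return Path(work_dir) / f"{run_key}_{name}"


def cleanup_run(work_dir, run_key):
    """Delete only the files that belong to this run; return how many were removed."""
    removed = 0
    for path in Path(work_dir).glob(f"{run_key}_*"):
        path.unlink()
        removed += 1
    return removed


with tempfile.TemporaryDirectory() as work_dir:
    run_a, run_b = new_run_key(), new_run_key()   # two concurrent FLOW runs
    temp_path(work_dir, run_a, "page1.png").write_bytes(b"a")
    temp_path(work_dir, run_b, "page1.png").write_bytes(b"b")
    removed = cleanup_run(work_dir, run_a)        # run A finishes first
    survivors = sorted(p.name for p in Path(work_dir).iterdir())
    print(removed, survivors)
```

Run A's cleanup leaves run B's file alone, which is what makes it safe to raise the Concurrency Control value above 1.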

Conclusion

Creating my own version of an Async CC was definitely a learning curve for me. I had never worked with REDIS or Celery before, and quite frankly, after more research, there is a redis python lib out there you can use instead of Celery, which I think is a little more straightforward. Once I put this into production, there was a configuration element in Celery that broke an API call I was making later in the process. Once I figured out which config value it should be (--pool=solo), it was fine. Again, I was not able to get the built-in Async feature of the CC in the FLOW to do what I wanted, but at least with the above (convoluted) process, I was able to control the FLOW a bit more, set the timeout to whatever I want now (within the limitations of the Do Until Loop) for the Check Status REST API call, and send notifications for failure, etc. I hope providing insight into the things that tripped me up can help someone.

SameerCh · ‎07-02-2021

Excellent post! Thanks for sharing your experience and feedback. I am sure there are many things we can do to improve the overall experience - perhaps more of a guided topic with explanations rather than a series of do this, do that.

murshed · ‎07-02-2021

Thanks for sharing this outstanding article. I would encourage you to try out the import from github option and the paconn cli, which is designed to accelerate the custom connector development cycle. https://docs.microsoft.com/en-us/connectors/custom-connectors/paconn-cli
