Update & Create Excel Records 50-100x Faster
I was able to develop an Office Script to update rows and an Office Script to create rows from Power Automate array data. So instead of a flow making a separate action API call for each individual row update or creation, this flow can send one array of new data and the Office Scripts will match up primary key values, update each row they find, then create the rows they don't find.
And these Scripts do not require manually entering or changing any column names in the Script code.
• In testing with batches of 1000 updates or creates, it handles ~2500 row updates or creates per minute, 50x faster than the standard Excel create row or update row actions at the maximum concurrency of 50. And it accomplished all the creates or updates with fewer than 25 actions, only 2.5% of the 1000 action API calls the standard approach would need.
• The Run script code for processing data has 2 modes: the Mode 2 batch method, which saves & updates an in-memory copy of the table before posting batches of table ranges back to Excel, and the Mode 1 method, which updates row by row directly on the Excel table.
The Mode 2 batch processing method activates for creates & updates on tables with fewer than 1 million cells. It encounters more errors on larger tables because it loads & works with the entire table in memory.
Shoutout to Sudhi Ramamurthy for this great batch processing addition to the template!
Code Write-Up: https://docs.microsoft.com/en-us/office/dev/scripts/resources/samples/write-large-dataset
Video: https://youtu.be/BP9Kp0Ltj7U
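For anyone curious what that batch write-back pattern looks like inside an Office Script, here is a rough sketch of the Mode 2 idea (simplified placeholder code, not the actual template script or the linked sample; the table name "targetTable" and the chunk size are just example values):

```typescript
function main(workbook: ExcelScript.Workbook) {
  // Mode 2 sketch: work on an in-memory copy of the table body,
  // then paste it back to the sheet in chunks of rows.
  const table = workbook.getTable("targetTable"); // placeholder table name
  const bodyRange = table.getRangeBetweenHeaderAndTotal();
  const data = bodyRange.getValues(); // the entire table body, in memory

  // ...match incoming records & update rows in the in-memory copy here...

  // Paste the updated copy back in batches instead of one huge setValues call.
  const sheet = table.getWorksheet();
  const startRow = bodyRange.getRowIndex();
  const startCol = bodyRange.getColumnIndex();
  const chunkSize = 5000; // rows per write; tune to stay under payload limits
  for (let r = 0; r < data.length; r += chunkSize) {
    const chunk = data.slice(r, r + chunkSize);
    sheet
      .getRangeByIndexes(startRow + r, startCol, chunk.length, chunk[0].length)
      .setValues(chunk);
  }
}
```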
The Mode 1 row by row method activates for Excel tables with more than 1 million cells. But it is still limited by batch file size, so updates & creates on larger tables will need to run with smaller cloud flow batch sizes of less than 1000 in a Do until loop.
The Mode 1 row by row method is also used when the ForceMode1Processing field is set to Yes.
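And here is a minimal sketch of the Mode 1 idea for comparison: match each incoming record to the table by primary key, write directly to the matched row, and collect anything unmatched so it can be created instead (again simplified placeholder code, not the template script; the input shape, the "targetTable" name, & the "Name" key column are assumptions):

```typescript
interface UpdateRecord {
  // Hypothetical input shape: one object per incoming row, keyed by column name.
  [columnName: string]: string | number | boolean;
}

function main(workbook: ExcelScript.Workbook, updateJson: string): string {
  const updateData: UpdateRecord[] = JSON.parse(updateJson);
  const table = workbook.getTable("targetTable"); // placeholder table name
  const keyColumn = "Name";                       // placeholder primary key column
  const headers = table.getHeaderRowRange().getValues()[0] as string[];
  const body = table.getRangeBetweenHeaderAndTotal();

  // Read only the key column to find where each incoming row lives in the table.
  const keyValues = table.getColumnByName(keyColumn)
    .getRangeBetweenHeaderAndTotal().getValues()
    .map(row => String(row[0]));

  const notFound: UpdateRecord[] = [];
  for (const record of updateData) {
    const rowIndex = keyValues.indexOf(String(record[keyColumn]));
    if (rowIndex === -1) {
      notFound.push(record); // no match: hand back so the create script can add it
      continue;
    }
    // Write the matched row directly on the table, cell by cell.
    headers.forEach((header, c) => {
      if (record[header] !== undefined) {
        body.getCell(rowIndex, c).setValue(record[header]);
      }
    });
  }
  return JSON.stringify(notFound); // unmatched rows, available back in the flow
}
```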
Be aware that some characters in column names, like \ / - _ . : ; ( ) & $, may cause errors when processing the data. Backslashes \ in the data, which are usually used to escape characters in strings, may also cause errors when processing the JSON.
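As a quick illustration of the backslash problem (plain TypeScript, nothing specific to the template scripts; the path value is just an example):

```typescript
// A Windows path like C:\Temp\file.txt inside a JSON string breaks parsing:
// \T is not a legal JSON escape, and \f would be read as a form-feed character.
const broken = '{"Path": "C:\\Temp\\file.txt"}'; // holds single backslashes before Temp and file
// JSON.parse(broken) throws an invalid escape / unexpected token error here.

// Doubling the backslashes before parsing produces valid JSON:
const fixed = broken.replace(/\\/g, "\\\\");
const parsed = JSON.parse(fixed);
console.log(parsed.Path); // C:\Temp\file.txt
```

The same kind of escaping may need to happen in the flow before the data reaches the json() expression or the script.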
Office Script Code
(Also included in a Compose action at the top of the template flow)
Batch Update Script Code: https://drive.google.com/file/d/1kfzd2NX9nr9K8hBcxy60ipryAN4koStw/view?usp=sharing
Batch Create Script Code: https://drive.google.com/file/d/13OeFdl7em8IkXsti45ZK9hqDGE420wE9/view?usp=sharing
You can download Version 5 of this template attached to this post, copy the Office Script code into an online Excel instance, & try it out for yourself.
-Open an online Excel workbook, go to the Automate tab, select New Script, then copy & paste the Office Script code into the code editor. Do this for both the Batch Update and the Batch Create script code. You may want to name them BatchUpdateV6 & BatchCreateV5 accordingly.
-Once you get the template flow into your environment, follow the notes in the flow to change the settings to your data sources, data, & Office Scripts.
If you need just a batch update, then you can remove the batch create scope.
If you need just a batch create, then you can replace the Run script Batch update rows action with the Run script Batch create rows action, delete the update script action, and remove the remaining batch create scope below it. Then any update data sent to the 1st Select GenerateUpdateData action will just be created; it won't check for rows to update.
(ExcelBatchUpsertV5 is the core piece, ExcelBatchUpsertV5b includes a Do until loop set-up if you plan on updating and/or creating more than 1000 rows on large tables.)
Anyone facing issues with the standard zip file import package method can check this post for an alternative method of importing the flow: Re: Excel Batch Create, Update, and Upsert - Page 33 - Power Platform Community (microsoft.com)
Or this one: https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Excel-Batch-Create-Update-and-Upsert/m-p...
Thanks for any feedback, & please subscribe to my YouTube channel (https://youtube.com/@tylerkolota?si=uEGKko1U8D29CJ86)
Video: https://www.youtube.com/watch?v=HiEU34Ix5gA
Hello,
I'm running into an issue using the update script: if I update a table with data that isn't fully inclusive (i.e. rows that don't need to be updated are filtered out), the script changes the data not being updated to blank values (although other columns associated with the same primary key are updated). Is this because that column is being updated for other values and not the specific rows filtered out?
For example, say my dataset looks like this:
Name | Date1 | Date2 | Date3 |
John | 4/1/2023 | 4/8/2023 | 4/12/2023 |
Bill | 4/1/2023 | 4/3/2023 | 4/8/2023 |
Robert | 4/1/2023 | 4/12/2023 | |
Stan | 4/1/2023 | 4/12/2023 | 4/8/2023 |
If my Update File looks like this:
Name | Date1 | Date2 | Date3 |
Robert | 4/1/2023 | 4/13/2023 | 4/12/2023 |
Stan | 4/1/2023 | 4/12/2023 | 4/8/2023 |
My results end up looking like this:
Name | Date1 | Date2 | Date3 |
John | | | |
Bill | | | |
Robert | 4/1/2023 | 4/13/2023 | 4/12/2023 |
Stan | 4/1/2023 | 4/12/2023 | 4/8/2023 |
Thanks in advance.
@seanmccormac
I'm not replicating that issue...
Before update:
After update:
Do these columns have formulas?
Are the primary key values for these rows definitely not in the update data? I see the example is using names, are those names repeated anywhere in the data?
And did you try changing the ForceMode1Processing from "No" to "Yes" to see if that resolved the issue?
Hello,
When I set it to "yes", I get this error 100% of the time (I think because the file I'm using is not as large as the script is meant for):
@seanmccormac
The Mode 2 processing works by pulling the entire table into memory & updating each row in that in-memory copy before pasting it back to the sheet in batches of rows. I'm not sure if the issue is something that happens during that pasting of the rows back to the sheet. It involves more code, but Mode 2 can handle much larger batches until it hits a table size limit.
The mode 1 processing is meant to work on a destination table of any size. And it just finds each row in the table & updates it. No additional in-memory copy or batch re-pasting of the table.
The downside of Mode 1 is that it can only handle around 500 to 1500 updates in each run of the script, depending on how much data you are trying to update.
What is the size of your set of update data? If you take a smaller number of rows, like the initial 500, does the script succeed with mode 1 processing?
Hi @takolota
Awesome solution as always!
I only have one question about changing the source.
I'm using 5.3b because of the size of the data set, and in my case I want to try to get the data from a SharePoint list instead of an Excel file.
The issue comes with the "Get Items" step for SharePoint, since it doesn't have a "Skip Count".
I saw the code you placed in the comments of the "List rows" step, but unfortunately I didn't quite understand where it should go.
Would you mind giving me some clarification?
Thanks in advance.
@Reinand
Thanks for bringing this up.
So if you have 100,000 rows or fewer to update / create from SharePoint, then I suggest putting the Get items (w/ pagination set to 100,000) right before the Do until loop. Then you can set up a Compose with the expression below & pass it to SelectGenerateUpdateData, or directly input the expression in the GenerateUpdateData action...
take(skip(outputs('Get_items')?['body/value'], mul(iterationIndexes('Do_until_Update'), variables('BatchSize'))), variables('BatchSize'))
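If it helps to see what that expression is doing, here is a rough TypeScript equivalent of the skip/take logic (getBatch, items, iterationIndex & batchSize are just illustrative names standing in for the Get items output, iterationIndexes('Do_until_Update') & the BatchSize variable):

```typescript
// Rough equivalent of take(skip(items, iterationIndex * batchSize), batchSize)
function getBatch<T>(items: T[], iterationIndex: number, batchSize: number): T[] {
  // skip(...) drops the rows already handled by earlier loop runs,
  // take(...) keeps just the next batch for this run.
  return items.slice(iterationIndex * batchSize, (iterationIndex + 1) * batchSize);
}

// Example: with batchSize = 1000, loop iteration 0 gets items 0-999,
// iteration 1 gets items 1000-1999, and so on until the array runs out.
```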
There is also a way to go beyond 100,000 SharePoint items by setting up another Do until loop that runs until Get items returns nothing. Within that loop you would sort the data by ID, capture the last ID returned from SharePoint, then use a Filter Query on the next loop's Get items so it only returns the next 100,000 items with an ID greater than the last ID of the previous loop.
But hopefully you don't have to add those extra complications.
No need for the extra steps, this is exactly what I've been looking for!
Appreciate you taking the time to clarify, and you're doing an awesome job with these Flows.
Cheers
Version 5.4
-Fix functionality lost in a previous version so null values can overwrite cells again
-More people should be able to access the flow through the regular import method now.
(Some users encountered errors preventing them from importing the flow. All initial data source & table references in the template flow have been replaced with a blank placeholder value so the flow isn't trying to read references that do not exist.)
-New BatchUpdateV6 script with more efficient Mode 1 processing allowing for 1.2x to 2x larger batch sizes on the larger tables & jobs, especially on updates with a larger number of columns
Batch Update Script Code: https://drive.google.com/file/d/1kfzd2NX9nr9K8hBcxy60ipryAN4koStw/view?usp=sharing
(This updated script helps reduce the number of times it has to read from the Excel table during each row update. The cell-by-cell updating within the row was moved to an in-memory row copy that is then inserted back into the table. Instead of 1 read per column, the script now reads each whole row 2 times, regardless of the number of columns being updated in each row.)
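For anyone reading through the script, the gist of that change looks roughly like this (a simplified sketch, not the actual BatchUpdateV6 source; the function & parameter names are just for illustration):

```typescript
// Instead of touching the table once per column, read the matched row once,
// update the in-memory copy, and write the whole row back in a single call.
function updateRowInPlace(
  body: ExcelScript.Range,   // the table's data body range
  rowIndex: number,          // index of the matched row within that range
  headers: string[],         // table column names, in order
  record: { [column: string]: string | number | boolean }
) {
  const rowRange = body.getRow(rowIndex);
  const rowValues = rowRange.getValues()[0];   // one read of the whole row
  headers.forEach((header, c) => {
    if (record[header] !== undefined) {
      rowValues[c] = record[header];
    }
  });
  rowRange.setValues([rowValues]);             // one write of the whole row
}
```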
Good Afternoon,
I hope this is an easy fix and I appreciate anyone's time. I have loaded V5.4b and I believe that everything is set up correctly. I am using it with an Excel spreadsheet in OneDrive with two tables: the target table is Table1 and the source table is Table2. There are approximately 5000 rows and 20 columns in Table2. The structure of Table1 is identical and the columns are identical to Table2. When I run the flow I receive an error "Unable to process template language expressions in action 'ComposeDestinationJSON' inputs at line '0' and column '0': 'The template language function 'json' parameter is not valid." The first 1500 rows are created correctly and they are also updated correctly, but nothing happens after row 1500 in the target table.
I am unfortunately not fluent in Power Automate, but I have tried changing the batch size down to 100 and up to 1000, and setting ForceMode1 to both Yes and No, with the same results. If I delete rows 1500 to the end on the source sheet, the flow runs as expected with no errors.
The error shown below is what I am getting with a long list that includes all the data from row 1501 to the end.
@m00ch
It looks like it is not recognizing the input as properly stringified JSON. Could you share a sample of your data & maybe a larger part of the error message? If needed, you can private message me with a file.