wvp
Regular Visitor

Fast compare of large JSON array

I am trying to build logic in a flow to compare two JSON arrays. The arrays are created within the flow based on source data and use the same schema, so the objects within them are comparable.

 

One of the arrays (the source) is based on data in an Excel sheet. The other array is based on data from a SharePoint list. I would like to extract all objects from the source array that are not present in the array based on the SharePoint list. The result should be an array containing only the objects that need updating in the SharePoint list.

 

I built the logic for this using an Apply to each loop, but it takes too long: 45 minutes to complete with 1,000 items and maximum concurrency. My arrays will contain around 40,000 items, so this takes forever to complete.
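For intuition on why the loop is slow: a pairwise Apply to each does roughly n × m comparisons, while a hash-based lookup does the same job in n + m steps. A minimal Python sketch, with made-up sample objects standing in for the Excel and SharePoint arrays:

```python
import json

# Hypothetical sample objects mirroring the two arrays
source = [
    {"DebiteurNr": "1001", "Name": "Acme", "City": "Utrecht"},
    {"DebiteurNr": "1002", "Name": "Bravo", "City": "Delft"},
]
sharepoint = [
    {"DebiteurNr": "1001", "Name": "Acme", "City": "Utrecht"},
]

def canon(obj):
    # Canonical string form so whole objects compare equal regardless of key order
    return json.dumps(obj, sort_keys=True)

sp_set = {canon(o) for o in sharepoint}                   # O(m) to build
changed = [o for o in source if canon(o) not in sp_set]   # O(n) to scan
print(changed)  # only the objects missing from the SharePoint array
```

With a set lookup this is O(n + m) instead of the O(n × m) work a nested per-item comparison performs, which is why 40,000 items against 40,000 items takes so long in a loop.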

 

Is there a way to speed things up?

 

The process I have built so far:

[Screenshot: DataProcess.png]

 

JSON array built based on data from Excel

[Screenshot: ExcelData.png]

 

JSON array built based on data from SharePoint

[Screenshot: SharePointdata.png]

5 REPLIES
manuelstgomes
Super User

Hi @wvp 

 

There's a way that you could do this quicker, but it involves SharePoint.

 

Add a constraint to the column you're using for the comparison so that it doesn't allow duplicate values. Then dump the second array there and use the error paths to "ignore" the duplicates. After that, fetch all values from SharePoint, build the final array, and delete the values in SharePoint ready for future runs.

 

[Screenshot: Screenshot 2020-01-22 at 09.36.57.png]

This will "catch" the invalid insert and continue.

 

I think this would be faster since SharePoint takes care of the comparison, and inserting stuff is quite fast.

 

If I have answered your question, please mark your post as Solved.
If you like my response, please give it a Thumbs Up.

Cheers
Manuel

wvp
Regular Visitor

Thanks for your reply @manuelstgomes,

 

The thing is that the only value that remains constant is the DebiteurNr. The other values can change.

 

Most of the time an item already exists with the DebiteurNr, but it needs to be updated because one of the other values in the object has changed. If I understand your solution correctly, it only works for items that are added to the source and are not yet available in the SharePoint list.

 

The bigger problem I am trying to fix is:

  • I noticed that the performance of adding items to a SharePoint list with Flow is acceptable for us: it took 5 hours to add 40,000 items to a SharePoint list.
  • Looping through all 40,000 items and updating existing items in the SharePoint list is painfully slow. After 4 days it is still not finished (which seems odd, because adding items is so much faster).
  • I tried removing the items first and then adding them again, but deleting items from a SharePoint list seems just as slow as editing them; it also takes days to complete.

So my idea was to minimise the number of items to be updated, so I can minimise communication with SharePoint. This will need to be done once a week, and there will probably be a maximum of 1,000 changed items instead of 40,000.
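One way to separate "new" items from "changed" items is to index the SharePoint array by DebiteurNr and compare the full objects only when the key already exists. A sketch under the assumption that DebiteurNr is the stable key (field names are illustrative):

```python
import json

def canon(obj):
    # Canonical string form so whole objects compare equal regardless of key order
    return json.dumps(obj, sort_keys=True)

def diff_by_key(source, sharepoint, key="DebiteurNr"):
    """Return (to_add, to_update): source items whose key is new,
    and source items whose key exists but whose other values changed."""
    sp_by_key = {o[key]: o for o in sharepoint}
    to_add, to_update = [], []
    for item in source:
        existing = sp_by_key.get(item[key])
        if existing is None:
            to_add.append(item)          # key not in SharePoint yet
        elif canon(existing) != canon(item):
            to_update.append(item)       # key exists, other values changed
    return to_add, to_update
```

Items that are already identical fall into neither list and are skipped entirely, which is exactly the "minimise communication with SharePoint" goal.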

 

v-bacao-msft
Community Support

 

Hi @wvp ,

 

If you have already applied the following settings to Apply to each and the effect is still not satisfactory, I think this has indeed exceeded the limit that Flow can handle, because there is a lot of data to be compared.

[Screenshot: 54.PNG]

You could try using multiple Flows to achieve this: the same configuration, but with all items grouped by ID.

Each Flow processes an equal number of items. For example, if you have 1,000 items, divide them into five groups and process them in five identical Flows.

 

You can filter each group of items by ID in each Flow.

Like:

[Screenshot: 53.PNG]
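The grouping step itself is just a chunking problem. A hypothetical sketch of splitting the items into roughly equal contiguous groups (e.g. to feed five identical Flows):

```python
def partition(items, groups):
    """Split items into `groups` roughly equal, contiguous chunks."""
    size = -(-len(items) // groups)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

chunks = partition(list(range(1000)), 5)
# five chunks of 200 items each; each chunk would go to its own Flow
```

In practice each Flow would filter its own ID range instead of receiving a chunk directly, but the arithmetic for choosing the range boundaries is the same.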

Hope this helps.

 

Best Regards,

Community Support Team _ Barry
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi @v-bacao-msft ,

 

I indeed seem to have hit the limits of Flow. I was hoping that looping through a simple JSON array would be more efficient in Flow. Concurrency control does not speed things up; it seems to slow them down even more.

 

As a comparison, I tried doing the same in an Azure Logic App, which completed the whole transaction in 1.5 hours. Unfortunately, at this moment that is not an option.

 

I will investigate multiple Flows!

 

 

jmathews607
Frequent Visitor

I ran into the same issue and the following solution was very efficient: 

 

  1. Add a new "Filter array" action.
  2. In the "From" field, enter the outputs of your first "Select" action -- in your case, body('SelectSourceItemValues_2').
  3. As the first value to compare, use the "contains" function to check whether the current item in the filter action exists in your second array -- in your case, contains(body('SelectSPItemValues_2'), item()) in the expression editor.
  4. Keep "is equal to" selected and enter false in the expression editor as the second value.

This should output a list of items from your first array that do not exist in the second. 
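For intuition, the "Filter array" step above behaves like this list comprehension (array names are assumptions matching the Select action names):

```python
# Stand-ins for the outputs of the two Select actions
select_source = [
    {"DebiteurNr": "1001", "Name": "Acme"},
    {"DebiteurNr": "1002", "Name": "Bravo"},
]
select_sp = [
    {"DebiteurNr": "1001", "Name": "Acme"},
]

# contains(body('SelectSPItemValues_2'), item()) "is equal to" false
not_in_sp = [item for item in select_source if item not in select_sp]
print(not_in_sp)  # items from the source that are absent from the SharePoint array
```

Because Filter array is a single bulk action rather than a per-item loop, it avoids the Apply to each overhead entirely, which is why this approach is so much faster.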
