wvp
Regular Visitor

Fast compare of large JSON array

I am trying to build logic in a flow to compare two JSON arrays. The arrays are created within the flow based on source data and use the same schema, so the objects within them are comparable.

 

One of the arrays (the source) is based on data in an Excel sheet. The other array is based on data from a SharePoint list. I would like to extract all objects from the source array that are not present in the array based on the SharePoint list. The result should be an array containing only the objects that need updating in the SharePoint list.

 

I built the logic for this using an Apply to each loop, but it takes too long: 45 minutes to complete with 1,000 items and maximum concurrency. My arrays will contain around 40,000 items, so this takes forever to complete.
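For intuition on why the loop is slow: a pairwise Apply to each does roughly n × m comparisons, while a hash-based lookup does the same job in n + m steps. A minimal Python sketch, with made-up sample objects standing in for the Excel and SharePoint arrays:

```python
import json

# Hypothetical sample objects mirroring the two arrays
source = [
    {"DebiteurNr": "1001", "Name": "Acme", "City": "Utrecht"},
    {"DebiteurNr": "1002", "Name": "Bravo", "City": "Delft"},
]
sharepoint = [
    {"DebiteurNr": "1001", "Name": "Acme", "City": "Utrecht"},
]

def canon(obj):
    # Canonical string form so whole objects compare equal regardless of key order
    return json.dumps(obj, sort_keys=True)

sp_set = {canon(o) for o in sharepoint}                   # O(m) to build
changed = [o for o in source if canon(o) not in sp_set]   # O(n) to scan
print(changed)  # only the objects missing from the SharePoint array
```

With a set lookup this is O(n + m) instead of the O(n × m) work a nested per-item comparison performs, which is why 40,000 items against 40,000 items takes so long in a loop.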

 

Is there a way to speed things up?

 

The process I have built so far:

[Screenshot: DataProcess.png]

 

JSON array built based on data from Excel

[Screenshot: ExcelData.png]

 

JSON array built based on data from SharePoint

[Screenshot: SharePointdata.png]

5 REPLIES
manuelstgomes
Super User

Hi @wvp 

 

There's a way that you could do this quicker, but it involves SharePoint.

 

Add a constraint to the column you're using for the comparison so that it doesn't allow duplicate values. Then dump the second array there and use the error paths to "ignore" the duplicates. After that, fetch all values from SharePoint, build the final array, and delete the values in SharePoint ready for future runs.

 

[Screenshot: Screenshot 2020-01-22 at 09.36.57.png]

This will "catch" the invalid insert and continue.

 

I think this would be faster since SharePoint takes care of the comparison, and inserting stuff is quite fast.

 

If I have answered your question, please mark your post as Solved.
If you like my response, please give it a Thumbs Up.

Cheers
Manuel

wvp
Regular Visitor

Thanks for your reply @manuelstgomes,

 

The thing is that the only value that remains constant is the DebiteurNr. The other values can change.

 

Most of the time an item already exists with the DebiteurNr, but it needs to be updated because one of the other values in the object has changed. If I understand your solution correctly, it only works for items that are added to the source and are not yet available in the SharePoint list.

 

The bigger problem I am trying to fix is:

  • I noticed that the performance of adding items to a SharePoint list with Flow is acceptable for us: it took 5 hours to add 40,000 items to a SharePoint list.
  • Looping through all 40,000 items and updating existing items in the SharePoint list is painfully slow. After 4 days it is still not finished (which seems odd, because adding items is so much faster).
  • I tried removing the items first and then adding them again, but deleting items from a SharePoint list seems just as slow as editing them; it also takes days to complete.

So my idea was to minimise the number of items to be updated, so I can minimise communication with SharePoint. This will need to be done once a week, and there will probably be a maximum of 1,000 changed items instead of 40,000.
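One way to separate "new" items from "changed" items is to index the SharePoint array by DebiteurNr and compare the full objects only when the key already exists. A sketch under the assumption that DebiteurNr is the stable key (field names are illustrative):

```python
import json

def canon(obj):
    # Canonical string form so whole objects compare equal regardless of key order
    return json.dumps(obj, sort_keys=True)

def diff_by_key(source, sharepoint, key="DebiteurNr"):
    """Return (to_add, to_update): source items whose key is new,
    and source items whose key exists but whose other values changed."""
    sp_by_key = {o[key]: o for o in sharepoint}
    to_add, to_update = [], []
    for item in source:
        existing = sp_by_key.get(item[key])
        if existing is None:
            to_add.append(item)          # key not in SharePoint yet
        elif canon(existing) != canon(item):
            to_update.append(item)       # key exists, other values changed
    return to_add, to_update
```

Items that are already identical fall into neither list and are skipped entirely, which is exactly the "minimise communication with SharePoint" goal.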

 

v-bacao-msft
Community Support

 

Hi @wvp ,

 

If you have already applied the following settings to Apply to each and the effect is still not satisfactory, I think this has indeed exceeded the limit that Flow can handle, because there is a lot of data to be compared.

[Screenshot: 54.PNG]

You could try using multiple Flows to achieve this: the same configuration, but with all items grouped by ID.

Each Flow processes an equal number of items. For example, if you have 1,000 items, divide them into five groups and process them in five identical Flows.

 

You can filter each group of items by ID in each Flow.

Like:

[Screenshot: 53.PNG]
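The grouping step itself is just a chunking problem. A hypothetical sketch of splitting the items into roughly equal contiguous groups (e.g. to feed five identical Flows):

```python
def partition(items, groups):
    """Split items into `groups` roughly equal, contiguous chunks."""
    size = -(-len(items) // groups)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

chunks = partition(list(range(1000)), 5)
# five chunks of 200 items each; each chunk would go to its own Flow
```

In practice each Flow would filter its own ID range instead of receiving a chunk directly, but the arithmetic for choosing the range boundaries is the same.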

Hope this helps.

 

Best Regards,

Community Support Team _ Barry
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi @v-bacao-msft ,

 

I indeed seem to have hit the limits of Flow. I was hoping that looping through a simple JSON array would be more efficient in Flow. Concurrency control does not speed things up; it seems to slow them down even more.

 

As a comparison, I tried doing the same in an Azure Logic App, which completed the whole transaction in 1.5 hours. Unfortunately, at this moment that is not an option.

 

I will investigate multiple Flows!

 

 

jmathews607
Frequent Visitor

I ran into the same issue and the following solution was very efficient: 

 

  1. Add a new "Filter array" action.
  2. In the "From" field, enter the outputs of your first "Select" action -- in your case, body('SelectSourceItemValues_2').
  3. As the first value to compare, use the "contains" function to check whether the current item in the filter action exists in your second array -- in your case, contains(body('SelectSPItemValues_2'), item()) in the expression editor.
  4. Keep "is equal to" selected and enter false in the expression editor as the second value.

This should output a list of items from your first array that do not exist in the second. 
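For intuition, the "Filter array" step above behaves like this list comprehension (array names are assumptions matching the Select action names):

```python
# Stand-ins for the outputs of the two Select actions
select_source = [
    {"DebiteurNr": "1001", "Name": "Acme"},
    {"DebiteurNr": "1002", "Name": "Bravo"},
]
select_sp = [
    {"DebiteurNr": "1001", "Name": "Acme"},
]

# contains(body('SelectSPItemValues_2'), item()) "is equal to" false
not_in_sp = [item for item in select_source if item not in select_sp]
print(not_in_sp)  # items from the source that are absent from the SharePoint array
```

Because Filter array is a single bulk action rather than a per-item loop, it avoids the Apply to each overhead entirely, which is why this approach is so much faster.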
