cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Varasset
Regular Visitor

Bulk Data Synchronization with CSV Upsert

Hi Everyone,

 

I'm looking to build a product that will require a weekly data synchronization of records. I'm looking for the easiest and cost effective solution to synchronizing this data. Here are the constraints/requirements:

 

  • The customer will have produced a .csv file containing all the records for the weekly synchronization.
  • The customer would either load the .csv file directly into a Model Driven App to start the process, or deposit it to SharePoint, Azure Blob, or on-site file storage (with on-prem data gateway enabled). Open to whatever is easiest.
  • The data will include columns that need to be upserted into Dataverse with their alternate key rather than guid.
  • Record count could be up to roughly~250,000.
  • Easy to deploy with our solution.
  • Able to productize and license it independently of the core solution, including invalidating functionality for non-payment.

We've explored using Azure Data Factory for similar style work, but find that it doesn't really satisfy the last two bullet points since as far as we understand, data factory templates once deployed are pretty much open season to anyone with administrator rights in that tenant.

 

What's the best way to perform this type of data integration while maintaining IP protection when deployed at scale?

1 ACCEPTED SOLUTION

Accepted Solutions
cchannon
Continued Contributor
Continued Contributor

I feel your pain; I've had those customers too.

 

I have never seen that Dataflows don't support state/status, but TBH I don't think I've ever had the need to try, so you've got me there. This seems pretty sus to me, though. Is the issue just that you're not getting the right datatype since technically state/status are not Picklists, but a "virtual" data type that structures its meta as Microsoft.Dynamics.CRM.StatusAttributeMetadata?

View solution in original post

6 REPLIES 6
cchannon
Continued Contributor
Continued Contributor

Dataflows is the new MSFT tooling designed specifically for this kind of massive, unidirectional data synchronization. all your answers are here:

Introduction to dataflows and self-service data prep - Power BI | Microsoft Docs

Thanks for the reply. My initial testing in a sandbox for Dataflows yielded some underwhelming results.

 

While it's more capable than when I first looked at it over a year ago, I was getting on average about 10,000 records inserted every 15-20 minutes, which would put a sync of potentially hundreds of thousands of records into problematic territory. This would be fine for an initial load if the subsequent ones could be done incrementally, but incremental refresh according to the documentation doesn't work with Dataverse as the destination of the data (https://docs.microsoft.com/en-us/power-query/dataflows/incremental-refresh#considerations-and-limita...) which is another problem.

 

In addition to this, I didn't see how Dataflows exposes the ability to sync the status and reason of a record in any way. It doesn't appear to be a mappable data point, which seems kind of lame.

 

Feels like Data Factory might be my only option despite the issues around deployment and productizing it.

cchannon
Continued Contributor
Continued Contributor

At the risk of saying what I'm sure is already quite obvious, if you're moving a quarter of a million records per week, you shouldn't be doing it via csv...

 

If you were making api calls per transaction: 

  1. the systems would be in near-real-time sync instead of weekly
  2. you would have pretty much no performance concerns because your 250K would be spread out over the working hours of an entire week.

 

Yes, I have seen cases where a CSV transport was necessary, but only where the customer had a system so unbelievably ancient that it had no capacity to make a REST call and at the same time they had no access to the server on which it ran to build and install a listener app of some kind. These days, unless you're working for the National Luddite Association (apologies if I offend any members here 😁) that is pretty much never the case. I mean, is your customer running a 65-year-old IBM Mainframe? How is it possibly the case that they have this kind of weekly volume of records, and yet no better way to move them than Intern Bob running a CSV export?

Yeah it definitely isn't our preferred option, but we tend to do data integrations with internal systems at companies where their IT shop doesn't want to play ball by putting in the effort to do a more real-time API integration, so often times flat file extracts from systems is the path of least resistance.

 

It appears that our best option is to install something on their servers to perform a CSV differential ourselves (from previous and current extracts) and then send that resulting file through Dataflows perhaps. The question however still remains regarding Status, since it doesn't appear that State/Reason are mappable fields in Dataflows. Would the workaround there have to be an alternate status column that can fire a workflow or plugin to change the State based on it being set?

cchannon
Continued Contributor
Continued Contributor

I feel your pain; I've had those customers too.

 

I have never seen that Dataflows don't support state/status, but TBH I don't think I've ever had the need to try, so you've got me there. This seems pretty sus to me, though. Is the issue just that you're not getting the right datatype since technically state/status are not Picklists, but a "virtual" data type that structures its meta as Microsoft.Dynamics.CRM.StatusAttributeMetadata?

View solution in original post

That's a possibility. Our developer sped through the process of looking into performance so it's possible he didn't map a data type properly and it just didn't give him the status/reason option as a result.

 

Thanks again for the help, I think we'll call this solved.

Helpful resources

Announcements
PA User Group

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

secondImage

Are Your Ready?

Test your skills now with the Cloud Skill Challenge.

secondImage

Demo Extravaganza is Back!

We are excited to announce that Demo Extravaganza for 2021 has started!

MBAS on Demand

Microsoft Business Applications Summit sessions

On-demand access to all the great content presented by the product teams and community members! #MSBizAppsSummit #CommunityRocks

Users online (59,782)