Query based vs. Query Publisher based integration
What is the current best practice when doing integrations into Dynamics 365 (on prem)?
Source data resides in SQL Server, but should I design a Query-based DTS using Scribe's bulk features?
Or should I use a Query Publisher pushing messages onto MSMQ?
Data typically arrives once a day and holds 2,000-50,000 account updates. Peak load is 500,000 accounts...
All the articles and blog posts on this subject are 5+ years old, and I suspect Scribe has been enhanced since then.
I'd love to hear your thoughts on this.
With respect to using multiple copies of the same DTS to process into CRM: I've not used this from an integration perspective before, but to make it work you'd have to subset the source data somehow. If, for example, you were going to run 4 copies of the same DTS, each would need a slightly different source query. With a SQL source, you could use partitioning functions and row numbers to populate a "Subset" field with a value from 1-4, and filter on that field in each copy's source query.
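As a minimal sketch of that subsetting idea, assuming a staging table called `dbo.SourceAccounts` (the table and column names are illustrative, not from any actual Scribe setup), `ROW_NUMBER` plus modulo arithmetic splits the rows evenly across the 4 copies:

```sql
-- Hypothetical source query for DTS copy 1 of 4; copies 2-4 change only
-- the Subset literal in the final WHERE clause.
WITH Numbered AS (
    SELECT
        a.AccountId,
        a.AccountName,
        -- Assign each row a Subset value of 1, 2, 3, or 4 in round-robin order
        ((ROW_NUMBER() OVER (ORDER BY a.AccountId) - 1) % 4) + 1 AS Subset
    FROM dbo.SourceAccounts AS a   -- assumed staging table name
)
SELECT AccountId, AccountName
FROM Numbered
WHERE Subset = 1;   -- DTS copy 2 filters on Subset = 2, and so on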
I personally would probably go down the MSMQ-based route, although the peak load would concern me if the Scribe server was having to process many other operations, as they could potentially block it. Do you have any other data integrations that any of this data might depend on, that would have to process first, for example? Our largest integration processes about 2 million records a day at peak, from about 15 different integrations, some of which have dependencies on others. We had to set message priorities, but also build reprocessing functionality: if an account is processed before one of the master data lookups it needs, it gets reprocessed after the master data comes in.
Also remember that if you have CRM on-prem, you have access to the SQL DB, and the indexes CRM ships with are just that - standard. They are designed with the standard, unconfigured core product in mind. After a week or so of pushing data into CRM through Scribe, the SQL query engine will have gathered statistics about which indexes would make your system perform quicker - for example, if you are doing lookups on non-standard fields, or retrieving particular fields from CRM records, it can suggest indexes that would improve performance. There's a good article here - https://basitaalishan.com/2013/03/13/find-missing-indexes-using-sql-serv... . Remember to rebuild these indexes regularly too as part of a maintenance plan - potentially even daily.
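For reference, a sketch of the kind of query that article describes, built on SQL Server's missing-index DMVs (the DMV and column names are real; the weighting formula is a common convention, not an official metric - sanity-check the suggestions against your own workload before creating anything):

```sql
-- Surface the index suggestions SQL Server has accumulated since the
-- last restart, roughly ordered by estimated benefit.
SELECT TOP 20
    mid.statement AS TableName,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact,
    migs.user_seeks * migs.avg_total_user_cost * (migs.avg_user_impact / 100.0)
        AS ImprovementMeasure
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY ImprovementMeasure DESC;
```

Note the DMVs reset on instance restart, so run this after the system has been under a representative load for a while.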
Finally, if you're pushing large amounts of data into CRM, and for each record perhaps doing multiple lookups against other CRM records - for example if you were looking up a sales rep, or some other related data that's relatively static - consider pulling a copy of that data out into a SQL table to look up against, rather than issuing lookups against the CRM SQL DB or through the CRM adapter. If you went with the MSMQ route and had 12 message processors running in parallel, each doing say 5 lookups against CRM data per account/contact update, you'd have potentially 60 additional database lookups running at the same time, causing major concurrency and locking issues.
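A hedged sketch of that local-lookup idea, assuming a sales-rep lookup keyed on domain login (the target table name is made up; `FilteredSystemUser` is a standard CRM on-prem filtered view, but your CRM database name will differ):

```sql
-- Nightly refresh of a local lookup table so the integration doesn't
-- hit the CRM database for every record it processes.
IF OBJECT_ID('dbo.SalesRepLookup') IS NOT NULL
    DROP TABLE dbo.SalesRepLookup;

SELECT
    su.systemuserid,   -- CRM GUID the integration resolves to
    su.domainname      -- key arriving on the source data
INTO dbo.SalesRepLookup
FROM CRM_MSCRM.dbo.FilteredSystemUser AS su;  -- assumed CRM DB name

-- Index the lookup key so the per-record resolution stays cheap.
CREATE UNIQUE INDEX IX_SalesRepLookup_domainname
    ON dbo.SalesRepLookup (domainname);
```

Refresh it on whatever cadence the master data actually changes; for genuinely static data, once a day is usually plenty.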
Hope that helps!
I would prefer smaller messages in this case. It depends a little on the size of the records and messages, and on whether there are any other Queue IPs running in parallel (they could be blocked).
In the end, if you want the best performance, all available processors must be working on messages. Having more messages than processors is fine (less impact), but not the other way around.
Thanks for your input. As I read it, you'd use MSMQ to handle splitting the workload into multiple parallel sequences working on separate parts of the full payload.
From @Nate Keefe's post above, it seems that Scribe recommends using SQL as the source and multiple threads (and multiple DTSs...) to enhance performance. Is it your experience that the best performance is obtained with MSMQ rather than a SQL-based load using multiple Scribe threads?
Following your setup, would you create a number of messages equal to the number of processors in the associated processor group, or would you split it into smaller messages and just queue more messages?
I suggest a mixture of query and queue IPs:
- Query or Time IP: Source = SQL | Target = MSMQ
  - Processor Group: SERIAL or Custom (with a low number of processors)
  - Configure it to put multiple records into one MSMQ message XML (repeating nodes)
- Queue IP: Source = MSMQ | Target = CRM
  - Processor Group: DEFAULT or Custom (with a higher number of processors - scalable!)
  - Simple queue integration
  - Configure the source XML to handle repeating nodes
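As a sketch, one MSMQ message carrying several account updates as repeating nodes might look like this (the element names are illustrative, not a Scribe-mandated schema):

```xml
<!-- One MSMQ message body batching multiple account updates -->
<AccountBatch>
  <Account>
    <AccountNumber>10001</AccountNumber>
    <Name>Contoso Ltd</Name>
  </Account>
  <Account>
    <AccountNumber>10002</AccountNumber>
    <Name>Fabrikam Inc</Name>
  </Account>
</AccountBatch>
```

The publishing IP writes N `Account` nodes per message, and the Queue IP's source definition marks `Account` as the repeating node so each one is processed as its own row.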