This is quite common in integration projects to have to extract large volumes of data from a database or to have to insert large volumes of data in a database.
A challenge with such use cases is that the number of rows to manage can be very high (tens of thousands and even hundreds of thousands) and in such case it is not possible to handle the whole data set in memory and it is necessary to process data by blocks of a limited size.
This is something that BusinessWorks can do while doing SQL select and insert queries and this new blog article explains how to do it.
It is recommended to use the BusinessWorks Memory Saving Mode with this pattern.
Inserting data by blocks in a database
To insert data by blocks you have to do the following:
. Configure the ‘JDBC Update’ activity to do batch updates by just checking the ‘Batch Update’ option in the Advanced tab:
. After doing this the ‘Record’ element in the Input tab is changed to be repeating:
. Then you just have to map the input list in the ‘Record’ list using a for-each, like for example:
Using this approach the records in the Input will be inserted in a single SQL statement and it is much faster than doing it record by record.
Extracting data by blocks from a database
To extract data by blocks you have to do the following:
. Configure the ‘JDBC Query’ activity to do batch query by just checking the ‘Process in Subsets’ option in the Advanced tab:
. After doing this the ‘Record’ element in the Output tab is changed to be repeating and a Boolean flag ‘lastSubset’ is added in the Output:
. Also an integer field ‘subsetSize’ is added in the Input, this field has to be mapped to define the number of rows to be processed at each iteration (common values are between 100 and 1000 depending on the size of the rows):
. It is important also to make sure the ‘Maximum rows’ property from the ‘General’ tab is set to 0 (zero) to read all the available rows:
. Then include the ‘JDBC Query’ activity in a Repeat group and set the exit condition with the ‘lastSubset’ flag:
Using this approach a batch of records are extracted at each iteration using a single SQL statement and it is much faster than doing it record by record.
Additional elements
It is possible to implement some kind of ETL (‘Extract Transform and Load’) scenarios using the technics described in this article combined with the mapping technics described in this previous article:
It is also possible to combine the technics described in this article with the large files processing technics described in this previous article:
Using those elements the following use case can be implemented:
. Query a source database to extract data
. Put the extracted data in a file
. Transfer the file to a target location
. Insert data from the file into a target database
Example of a process reading data from a file by blocks and inserting the data blocks in the target database, each block is inserted using a single JDBC Insert activity and a single SQL statement:
Example of a process extracting data from a database by blocks and writing the data blocks to a target file, each block is extracted using a single JDBC Query activity and a single SQL statement:
Reference elements
It is recommended to use the BusinessWorks Memory Saving Mode with this pattern.
Details are available in the following article:
Additional elements
You can refer to the attached demo project.
Recommended Comments
There are no comments to display.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now