Hi sparkuser, thanks for the question. You have two options to accomplish this, and I'll describe both below.
You can work with database and Hadoop data in the same workflow, but due to differences in the underlying code bases, our operators usually can't operate on both at the same time. The first option is to transfer the data from Greenplum to your Hadoop cluster using the Copy to Hadoop operator. Depending on the size of the data this may be a slow process, but it only has to be done once.
The second option is to save the trained model in the Alpine Model format using the Export operator, then use the Import Model operator to load that trained model into a Greenplum workflow. Please see our documentation for the list of models this is supported for.