ODI in the hybrid database world – Snowflake JDBC

This second post will talk about Snowflake integration with ODI. Let’s picture a scenario like the last post: your company has a large on-premises ETL/database footprint, but it is starting to move slowly to the cloud, in this case, Snowflake. You want to use the existing ODI architecture for this task, but Snowflake is not a technology that comes out of the box with ODI, so how could you do that? Let’s figure it out in this post.

Luckily a good friend of mine, Michael Rainey, wrote about it in his post here. I won’t go over the details because I don’t want to copy and paste what is already written there, but in a very resumed way you need to download the Snowflake JDBC driver, add it to the ODI agent, create/copy a new technology for Snowflake usage and that’s it. It will work just fine.

However, after working with it for some time, I found some details that I think its worth sharing with you. First, differently from the first post where Oracle is already a technology that ODI knows, Snowflake is not and for that reason you may start to face some small issues here and there regarding SQL statements for example. If the KMs or procedures that you are using are standard/universal SQL that both Oracle and Snowflake understands, it will work just fine. If the SQL is kind of different in Snowflake (meaning a different syntax) or it is Oracle exclusive, than you will need to start doing some customizations.  Luckily, most of these customizations should be very simple to adapt to Snowflake.

Another thing is data volume. For small workloads, it works well. For larger ones you will need to do some tweaks. One way to decrease the time loads is by playing with Array Fetch/Batch Update Sizes and Degree of Parallelism for Target, as you can see below.

Array Fetch/Batch Update Sizes are very hard to fine tune to an optimal value, since it depends on a lot of factors like length and size of the table, network, and so on. Sometimes you may fine tune for smaller tables, but larger tables suffer and vice versa. You will need to run some tests and see what the best value for your case is. However, Degree of Parallelism for Target is one that you may increase up to 20 without too much worry and you will see a huge gain. You cannot increase further because Snowflake (at least in my account) has a limit of 20 parallel threads working on the same object at a time.

If you want to see what is happening on the push of data to Snowflake, you may check the Load task of it in Operator and click on Details. It will show you the details and times that each thread took to execute it:

However, even with those tweaks in the Topology, I found myself into situations where the data load was just not fast enough. Upon doing some research and some testing, I figure out that there is a way to push data to Snowflake way faster than JDBC, and it is by using SnowSQL client. This one I’ll cover on the next post.

See you soon!

One Response to “ODI in the hybrid database world – Snowflake JDBC”

  1. […] For a better EPM world « ODI in the hybrid database world – Snowflake JDBC […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: