Archive for the Snowflake Category

Snowflake loader using ODI

Posted in ODI, Snowflake on April 13, 2023 by Rodrigo Radtke de Souza

Hi all. Today’s post will show how to create a simple but powerful Snowflake loader using ODI. Actually, you may use it with any technology you want, but I ended up creating this one for Snowflake because I needed to load several Oracle tables into Snowflake in a fast and dynamic manner.

This post combines the techniques described here and here. In the first post we saw how to load text files into Snowflake. In the second post we saw how to create dynamic text dumps from Oracle. What if we combine both posts? We could create a dynamic load that receives a table name as a parameter, creates a text file and pushes that file to Snowflake. Isn’t that cool?

Our dynamic loader will be an ODI procedure with the following steps:

- Get File Metadata
- Create target table
- Create File
- Truncate File
- Create Header
- Insert Rows
- gzip the file
- PUT the file to the Snowflake stage
- COPY the file into the target table

You can get the code for “Get File Metadata”, “Create File”, “Truncate File”, “Create Header” and “Insert Rows” from my past post. There is a new step here called “Create target table” that tries to create a table in Snowflake if it does not exist yet. I added this step as a convenient way to speed up my development process, since this procedure is nothing more than a job that gets a table from Oracle and copies it to Snowflake. The code is below:

CREATE TABLE <%=odiRef.getSchemaName( "D" )%>.<@=table_name@>
(
<@=table_column_type@>
);

Just don’t forget to set the target schema to the Snowflake logical schema where you want the table to be created, and also set “Ignore Errors” on this step, so the process does not stop when the table already exists.
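As a reference, here is a minimal sketch of how the “Get File Metadata” step could populate <@=table_name@> and <@=table_column_type@>, assuming a command on the Oracle source that reads ALL_TAB_COLUMNS. The #P_TABLE_NAME variable and the type mappings below are my assumptions, not the exact code from the past post:

-- Builds a Snowflake-friendly column list for one Oracle table
SELECT '#P_TABLE_NAME' AS table_name,
       LISTAGG(column_name || ' ' ||
         CASE
           WHEN data_type IN ('VARCHAR2', 'CHAR') THEN 'VARCHAR(' || data_length || ')'
           WHEN data_type = 'NUMBER' THEN 'NUMBER'
           WHEN data_type = 'DATE' THEN 'TIMESTAMP'
           ELSE 'VARCHAR(4000)' -- fallback for unmapped types
         END, ',' || CHR(10)) WITHIN GROUP (ORDER BY column_id) AS table_column_type
FROM all_tab_columns
WHERE table_name = '#P_TABLE_NAME'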

The last three steps are almost the same as in the first post, with one change. Instead of <%=odiRef.getTargetTable("TABLE_NAME")%>, I now get the information from <@=folder_path@> and <@=table_name@>. For example:

OdiOSCommand "-OUT_FILE=<@=folder_path@>_copy.log" "-ERR_FILE=<@=folder_path@>_copy.err"
snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q "copy into #P_SNOW_DATABASE.#P_SNOW_SCHEMA.<@=table_name@> from @#P_STAGE_NAME/<@=table_name@>.gz"
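The PUT step that uploads the gzipped file to the stage follows the same pattern. A minimal sketch, assuming the dump is written to <@=folder_path@><@=table_name@>.gz (the local file path and the PUT options are assumptions, not a verbatim copy of the first post):

OdiOSCommand "-OUT_FILE=<@=folder_path@>_put.log" "-ERR_FILE=<@=folder_path@>_put.err"
snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q "put file://<@=folder_path@><@=table_name@>.gz @#P_STAGE_NAME auto_compress=false overwrite=true"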

When we execute the package, we can see the whole load running end to end.
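By the way, if you generate a scenario from the package, you can also trigger the load from the command line and pass the table name as a startup parameter. A minimal sketch, assuming a scenario named LOAD_TABLE_TO_SNOWFLAKE (version 001) and a global variable P_TABLE_NAME (both names are hypothetical):

startscen.sh LOAD_TABLE_TO_SNOWFLAKE 001 GLOBAL "-GLOBAL.P_TABLE_NAME=EMPLOYEES"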

The job receives a table name as a parameter. It gets the metadata for that table in Oracle, creates a file with its data, gzips it, sends it to Snowflake using PUT and loads it into the final table with COPY (the table was created for you if it didn’t exist before). Awesome, right? If you notice, this process will duplicate the data if you run it twice, since it does not truncate the target table before loading the data into it. I did that on purpose because there were scenarios where I wanted to load several different datasets into a single table, in append mode. But you may enhance this procedure and add a step to truncate the target table if you wish.
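For full reloads, that truncate step could be added right before the COPY, following the same snowsql pattern used above (a sketch under the same variable assumptions):

OdiOSCommand "-OUT_FILE=<@=folder_path@>_truncate.log" "-ERR_FILE=<@=folder_path@>_truncate.err"
snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q "truncate table if exists #P_SNOW_DATABASE.#P_SNOW_SCHEMA.<@=table_name@>"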

Hope you enjoyed! See you next time!
