Building dynamic ODI code using Java variables – Table to File example

Posted in ODI on March 23, 2023 by Rodrigo Radtke de Souza

Hi all. Today’s post is an “enhancement” of a post that I did a long time ago about using Oracle’s metadata to build dynamic code. I say it is an enhancement because that technique uses the source/target command tabs, which have a 4,000-character limitation. So, if the metadata you are trying to build crosses this limit, it will fail.

You should be fine using the source/target command technique in 90% of your data loads, but there are a few cases where we need to go beyond 4,000 characters. One of these is when you are trying to build dynamic code to generate text file loads in ODI. The example that I’ll show here dumps the data from any Oracle table to a text file dynamically, using only one ODI procedure.

This is what we will build. I’ll go over each step in detail:

Basically, it is an ODI procedure that receives a table name as a parameter and writes that table’s data into a text file using pipe “|” as the delimiter. This is a very small but useful procedure that you may use when you need to create quick text files from tables/views without going through all the work of creating ODI mappings for it.

You will need two Logical Schemas, one for the source table and another for the target file. In this example I used LOCAL_SOURCE, which points to an Oracle database, and OUTPUT_FILE, which points to the folder where the text file will be created.

The first step is to get the file metadata. As I said, this will be done in a different way now. Instead of using source/target tab commands, we will use Java BeanShell to retrieve this information. This is because ODI creates very lengthy metadata for each column, and the total amount of metadata often crosses the 4,000-character limit. Just as an example, each column in a text file load looks similar to this:

"CRDWG_COLSNP$CRCOL_NAME="+metaDataSrc.getColumnName(i) + "SNP$CRTYPE_NAME=STRINGSNP$CRORDER="+i+"SNP$CRLENGTH=4000SNP$CRPRECISION=4000SNP$CRACTION_ON_ERROR=NULLSNP$"

Here is the full code for the “Get File Metadata” step:

<@
import java.sql.*;

// Table/view to be dumped, received from an ODI variable (set later in the package)
String table_name = "#DWCLOUD.L_TABLE_NAME";
// Target folder (from the OUTPUT_FILE logical schema) and full file path
String folder_path = odiRef.getSchemaName("OUTPUT_FILE","D")+"/"+table_name;
String folder_path_txt = folder_path+".txt";
// Source schema (from the LOCAL_SOURCE logical schema)
String table_schema = odiRef.getSchemaName("LOCAL_SOURCE","D");
// File datastore header: pipe-delimited, CR/LF line separator, header on the first row
String table_metadata = "/*$$SNPS_START_KEYSNP$CRDWG_TABLESNP$CRTABLE_NAME="+table_name+"SNP$CRLOAD_FILE="+folder_path_txt+"SNP$CRFILE_FORMAT=DSNP$CRFILE_SEP_FIELD=0x007cSNP$CRFILE_SEP_LINE=0x000D0x000ASNP$CRFILE_FIRST_ROW=1SNP$CRFILE_ENC_FIELD=SNP$CRFILE_DEC_SEP=SNP$CRSNP$";
String table_header = "";
String table_header_parameter = "";

// Connect using the procedure's Command on Source connection and run a "where 1=2"
// query just to get the table/view metadata, without fetching any rows
Connection ConSrc = DriverManager.getConnection("<%=odiRef.getInfo("SRC_JAVA_URL")%>","<%=odiRef.getInfo("SRC_USER_NAME")%>",odiRef.getInfo("SRC_PASS"));
Statement stmtSrc = ConSrc.createStatement();
ResultSet querySrc = stmtSrc.executeQuery("select /*+ parallel(64) */ * from "+ table_schema +"." + table_name + " where 1=2");
ResultSetMetaData metaDataSrc = querySrc.getMetaData();

// Loop through the columns, building the file metadata, the header and the bind parameters
int count = metaDataSrc.getColumnCount();
for (int i=1; i<=count; i++)
{
	table_metadata = table_metadata + "CRDWG_COLSNP$CRCOL_NAME="+metaDataSrc.getColumnName(i) + "SNP$CRTYPE_NAME=STRINGSNP$CRORDER="+i+"SNP$CRLENGTH=4000SNP$CRPRECISION=4000SNP$CRACTION_ON_ERROR=NULLSNP$";
	table_header = table_header+metaDataSrc.getColumnName(i);
	table_header_parameter = table_header_parameter+":"+metaDataSrc.getColumnName(i);

	if (i==count)
	{
		// Last column: close the metadata key
		table_metadata=table_metadata+"CR$$SNPS_END_KEY*/";
	} else
	{
		// Not the last column: add the column separator and the list delimiters
		table_metadata=table_metadata+"CRSNP$";
		table_header = table_header + ",";
		table_header_parameter = table_header_parameter + ",";
	}
}

ConSrc.close();
@>

How does it work? It uses a variable (which you will set in an ODI package later on) to figure out which table to dump into a file. It can actually be any other queryable object, like a view. Views are great to use here, since you may build any kind of business logic and filters into the view and dump its data to a file when needed. The procedure also uses both logical schemas (source and target) to figure out where to read the source data and where to place the target file.

It then creates a connection object using the source connection that you set in the ODI procedure (please pay close attention to the source/target logical schemas in the ODI procedure) and runs a query with WHERE 1=2 just to retrieve the table/view metadata.

Having this metadata in the result set, the code loops through it and “builds” the dynamic code for the header and the columns. It applies some special handling when it reaches the last column and then closes the connection.
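Just to make it concrete, for a hypothetical two-column table called CLIENTS (columns ID and NAME), the assembled table_metadata string would look roughly like this (line breaks added here for readability; at runtime it is a single line, and <output_folder> stands for the OUTPUT_FILE schema directory):

/*$$SNPS_START_KEYSNP$CRDWG_TABLESNP$CRTABLE_NAME=CLIENTSSNP$CRLOAD_FILE=<output_folder>/CLIENTS.txtSNP$CRFILE_FORMAT=DSNP$CRFILE_SEP_FIELD=0x007cSNP$CRFILE_SEP_LINE=0x000D0x000ASNP$CRFILE_FIRST_ROW=1SNP$CRFILE_ENC_FIELD=SNP$CRFILE_DEC_SEP=SNP$CRSNP$
CRDWG_COLSNP$CRCOL_NAME=IDSNP$CRTYPE_NAME=STRINGSNP$CRORDER=1SNP$CRLENGTH=4000SNP$CRPRECISION=4000SNP$CRACTION_ON_ERROR=NULLSNP$CRSNP$
CRDWG_COLSNP$CRCOL_NAME=NAMESNP$CRTYPE_NAME=STRINGSNP$CRORDER=2SNP$CRLENGTH=4000SNP$CRPRECISION=4000SNP$CRACTION_ON_ERROR=NULLSNP$CR$$SNPS_END_KEY*/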

It may sound complicated, but it’s fairly simple. In less than 40 lines of code we are able to retrieve all the metadata needed to dump any table’s data to a flat file in a dynamic way. The other steps are now very straightforward.

“Create File” step will be as simple as this:

create table "<@=folder_path@>"
<@=table_metadata@>

Same for “Truncate File”:

truncate table "<@=folder_path@>"
<@=table_metadata@>

“Create Header”:

create header (<@=table_header@>)
<@=table_metadata@>

“Insert Rows” is the only one that needs both a Source and a Target command. In the Source Command we have:

select     
    /*+ parallel(64) */ <@=table_header@>
from    <@=table_schema@>.<@=table_name@>
where    
    (1=1)    

This one is used to read the source table. In Target Command we have the insert statement to the file object:

insert into "<@=folder_path@>"
(
	<@=table_header@>
)
values 
(
	<@=table_header_parameter@>
)
<@=table_metadata@>
  

Add this procedure to an ODI package, alongside the ODI variable for the table name:

Set the variable to the table/view object whose records you want to dump. In this example, I used a table named SAMPLE_DATA. When I executed the package, I got the following:

If I go to the folder path defined in the OUTPUT_FILE logical schema, I can see the data file created there (these are all fake/random rows, not real people’s information):

I hope you liked this post. See you soon!

ODI Hidden Gems – SNP tables: Query to get load plan execution steps

Posted in ODI, Query on December 6, 2022 by Rodrigo Radtke de Souza

Hi all,

Today is a short post, but a very useful one. I often get asked how to write SNP table queries to get information from ODI repositories. I posted one about “executed code” here. Today, here is another example: how to get the steps that were executed in a load plan execution. I run this one at the end of every load plan execution, so it creates a mini report of how each step performed in that load plan. This is useful for ODI operators, so they can do some analysis of how well their jobs are doing.

This query was created using ODI 12.2.1.4.

WITH 
LP_STEPS AS (
SELECT SLS.I_LOAD_PLAN
     , LP.LOAD_PLAN_NAME
     , I.I_LP_INST
     , SLS.LP_STEP_NAME
     , SLS.I_LP_STEP
     , SLS.PAR_I_LP_STEP
     , SLS.STEP_ORDER
     , LEVEL AS LEVEL_
     , SYS_CONNECT_BY_PATH(SLS.LP_STEP_NAME, ' -> ') AS STEP_NAME_PATH
     , SYS_CONNECT_BY_PATH(SLS.STEP_ORDER, ' ') AS STEP_ORDER_PATH
     , SYS_CONNECT_BY_PATH(SLS.IND_ENABLED, ' ') AS IND_ENABLED_PATH
     , SYS_CONNECT_BY_PATH(DECODE(SLS.LP_STEP_TYPE, 'EX', 1, 0), ' ') AS EXCEPTION_STEP_PATH
     , CONNECT_BY_ISLEAF IS_LEAF
     , SLS.IND_ENABLED
  FROM SNP_LOAD_PLAN LP
    INNER JOIN SNP_LP_STEP SLS
        ON SLS.I_LOAD_PLAN = LP.I_LOAD_PLAN
    INNER JOIN SNP_LP_INST I
        ON SLS.I_LOAD_PLAN = I.I_LOAD_PLAN
        AND I.GLOBAL_ID = '<LOAD_PLAN_INSTANCE_GLOBAL_ID_HERE>'
 CONNECT BY PRIOR SLS.I_LP_STEP = SLS.PAR_I_LP_STEP 
  START WITH SLS.PAR_I_LP_STEP IS NULL
)  
, ALL_ AS (
SELECT L.I_LOAD_PLAN
     , L.LOAD_PLAN_NAME
     , L.I_LP_INST
     , L.LP_STEP_NAME
     , L.I_LP_STEP
     , L.STEP_NAME_PATH
     , L.STEP_ORDER_PATH
     , CASE WHEN L.IND_ENABLED_PATH LIKE '%0%' THEN 'N' ELSE 'Y' END ENABLED
     , MAX(IL.SESS_NO) AS SESS_NO
  FROM LP_STEPS L
    LEFT JOIN SNP_LPI_STEP_LOG IL
        ON L.I_LP_INST = IL.I_LP_INST
        AND L.I_LP_STEP = IL.I_LP_STEP
 WHERE 1 = 1
   AND L.IS_LEAF  = 1
   AND EXCEPTION_STEP_PATH NOT LIKE '%1%'
 GROUP BY L.I_LOAD_PLAN
        , L.LOAD_PLAN_NAME
        , L.I_LP_INST
        , L.LP_STEP_NAME
        , L.I_LP_STEP
        , L.STEP_NAME_PATH
        , L.STEP_ORDER_PATH 
        , L.IND_ENABLED_PATH
) 
SELECT A.I_LOAD_PLAN
     , A.I_LP_STEP
     , A.LOAD_PLAN_NAME
     , A.LP_STEP_NAME
     , A.STEP_NAME_PATH
     , A.SESS_NO
     , CASE WHEN A.ENABLED = 'N' THEN 'Step has been disabled'
            WHEN A.SESS_NO IS NULL THEN 'Step did not run'
            WHEN S.SESS_STATUS = 'D' THEN 'Step finished successfully'
            WHEN S.SESS_STATUS = 'E' THEN 'Step failed'
            WHEN S.SESS_STATUS = 'W' THEN 'Step finished with warnings'
            ELSE 'Other status not relevant'
        END STATUS
     , TO_CHAR(S.SESS_BEG, 'DD-MON-YYYY HH24:MI:SS') AS SESS_BEG
     , TO_CHAR(S.SESS_END, 'DD-MON-YYYY HH24:MI:SS') AS SESS_END
     , S.SESS_DUR
     , REPLACE(REPLACE(REPLACE(SUBSTR(S.ERROR_MESSAGE, 1, INSTR(S.ERROR_MESSAGE, CHR(9) || 'at ')), CHR(10), ' '), CHR(13), ' '), CHR(9), ' ') AS ERROR_MESSAGE
  FROM ALL_ A
    LEFT JOIN SNP_SESSION S
        ON A.SESS_NO = S.SESS_NO
 ORDER BY A.STEP_ORDER_PATH;

I hope you like it! See you soon!

ODI in the hybrid database world – Google BigQuery – gsutil + bq

Posted in BigQuery, Cloud, Google, ODI on December 5, 2022 by Rodrigo Radtke de Souza

As we saw in our last post, ODI + BigQuery + Simba JDBC didn’t give us the performance that we were expecting. But we may apply the same concept that we did for Snowflake: we may create a file, compress it (using GZIP), send it to a storage location (using gsutil) and load it into a table (using bq). Luckily, this is easy to do with ODI and BigQuery. First, we will need to install the Google Cloud CLI, which will enable you to use both gsutil and bq. You may read about it here.

Next, we will follow the same steps that we did for Snowflake: first we will create a copy of an existing IKM. In this case, I duplicated IKM SQL to File Append and gave it a new name:

On the Tasks tab, I kept some of the original steps (which will create a text file for us) and created three new steps: GZIP, Copy and Load.

The first new step is GZIP, because we want to compress our file before moving it to the cloud. Google BigQuery can load compressed files directly from Cloud Storage, but they need to be in GZIP format. So, on “Target Command” I added this:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.err"

"C:\Program Files\7-Zip\7z.exe" a "<%=odiRef.getTable("TARG_NAME")%>.GZ" "<%=odiRef.getTable("TARG_NAME")%>.txt"

For the sake of simplicity in this example, the path is hardcoded, but in a real-world project, we would make this path dynamic.
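One hedged way to do that (a sketch, not the post’s actual implementation) is to expose the 7-Zip location as a KM option and resolve it with odiRef.getOption; ZIP_UTILITY_PATH here is a hypothetical option name you would add to the IKM:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.err"
"<%=odiRef.getOption("ZIP_UTILITY_PATH")%>" a "<%=odiRef.getTable("TARG_NAME")%>.GZ" "<%=odiRef.getTable("TARG_NAME")%>.txt"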

The second one is the Copy step. Here we call the “gsutil cp” command, which takes a file (in this case, the GZIP file that we created in the previous step) and sends it to a bucket in Google Cloud Storage:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.err"
gsutil cp "<%=odiRef.getTable("TARG_NAME")%>.GZ" gs://<YOUR_GCS_BUCKET_NAME_HERE>

The last step uses the “bq load” command, which takes the file stored in your bucket and loads it into a table in BigQuery:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_bq.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_bq.err"
echo bq load --source_format=CSV --skip_leading_rows=1 --field_delimiter="|" GOOGLE_TARGET.SAMPLE_DATA gs://<YOUR_GCS_BUCKET_NAME_HERE>/SAMPLE_DATA.GZ| C:\WINDOWS\system32\cmd.exe /k ""C:\Users\<YOUR_USER>\AppData\Local\Google\Cloud SDK\cloud_env.bat""

This command is a little messy, but it works. In your project you may do something more elaborate, but for the sake of this example, it will do. Basically, I’m using the echo trick to call two command lines at once. The first one runs “cloud_env.bat”, which sets my cloud environment variables (this comes with the Google Cloud CLI installation). Then I call “bq load”, passing as parameters the source format, the number of leading rows to skip and the field delimiter. In this case, we are using pipe “|” as our column delimiter. After that we pass which dataset/table we want to load and from which file (in this case, our GZ file).

Seems complicated at first, but the data load speed is worth it. And since all this is in an IKM, you will do it once and reuse it repeatedly.

Before we create a mapping, we need to remember that now we won’t be loading BigQuery directly; instead, we will be loading a text file first. So, we need to create a new model pointing to a File logical schema that will be used to hold our temporary files. For this example, I’ve created a pipe-delimited file with a header. For the columns, I created all of them as strings.

Now we may create our mapping. In this case, it’s pretty simple. We just add our Oracle source table and our File target table.

When we execute it, we can see that it loaded 1,077,500 rows in 76 seconds. Not bad at all:

Once again, we saw that working with files and command line tools is way faster than JDBC connections. Also, if you think a little bit about it, it has other cool benefits. If you treat all your cloud integrations as file transfers, you may create an architecture that can be used across multiple cloud vendors. Your on-premises output will always be compressed files that will be loaded to cloud systems using different ODI Knowledge Modules, one for each vendor. Since these are all text files, ODI mappings may be used to create all sorts of complex logic using your on-premises database as staging and then only pushing the result to the cloud.

I hope you have enjoyed. See you soon!

ODI in the hybrid database world – Google BigQuery – Simba JDBC

Posted in BigQuery, Cloud, ODI on November 23, 2022 by Rodrigo Radtke de Souza

Hi all! Continuing our series, today we will talk about how to load data into BigQuery using ODI. There will be two posts, one talking about Simba JDBC and another one talking about the Google Cloud SDK Shell. You will notice that the first approach (using Simba JDBC) has a lot of challenges, but I think it’s worth looking at, so you have options (or at least know why you should avoid it).

First, you need to download the Simba JDBC driver and add it to your ODI client/agent (like what was done with the Snowflake JDBC driver here). Also, you need to “duplicate” one ODI technology (I used Oracle in this example) and name it BigQuery:

In JDBC Driver, you will add the Simba driver and in URL you will add your connection to Google BigQuery. I’ll not go over the details here, since there are a lot of ways for you to create this URL, but you may check all the ways to create it in this link.

Press “Test Connection” to see if it is all working as expected:

Create a new Physical schema and give it the name of your BigQuery Dataset:

Create a new Logical Schema and a Model. If all is correct, you will be able to Reverse Engineer as you normally do with other technologies:

Let’s go ahead and just create a new mapping. This is a simple one: getting data from an Oracle database and loading it to BigQuery:

As LKM, I’m using LKM SQL to SQL:

As IKM, I just used a Global one:

When I ran it, it failed. If we check Operator, we will see that it couldn’t insert new rows because it was not able to create the C$ temporary table. If we look further, we see that it considers “$” an illegal character in BigQuery table names:

If we get the create table statement from ODI Operator and try to execute it in BigQuery, we will see a lot of issues with it:

First, as I said before, we cannot have $ signs, so let’s try to remove them:

It also does not accept the NULL keyword. Removing it gave another error:

VARCHAR2 is not a thing in BigQuery, so if we change it to STRING, it will work fine:
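To make the fixes concrete, here is an illustrative sketch (not the exact DDL ODI generates; the C$ work table name and the columns are made up) of the kind of statement BigQuery ends up accepting once the $ sign, the NULL keywords and VARCHAR2 are gone:

-- Oracle-style statement that BigQuery rejects (hypothetical columns):
-- create table GOOGLE_TARGET.C$_0SAMPLE_DATA (FIRST_NAME VARCHAR2(100) NULL, LAST_NAME VARCHAR2(100) NULL)
-- Adjusted version that BigQuery accepts: no "$" in the name, no NULL keyword, STRING instead of VARCHAR2
create table GOOGLE_TARGET.C_0SAMPLE_DATA (FIRST_NAME STRING, LAST_NAME STRING);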

So, just to create a temporary C$ table, we had to make several changes. It means that the ODI/BigQuery integration is not as straightforward as it was for Oracle Cloud or Snowflake. We will have to make some changes and additions to the BigQuery technology in ODI. Let’s start with the $ signs. Double-click the BigQuery technology and go to the Advanced tab. There, let’s remove the $ sign from the temporary table prefixes (I removed it only from C$ for this example):

Also, you need to remove the DDL Null Keyword:

You may need to make many more changes depending on what you need to load, but for this example, we will try to keep it as simple as possible.

The next step is to create a String datatype. You may go to the BigQuery technology and duplicate an existing datatype. For this example, I used Varchar2. Then I changed the Name/Code/Reverse Code/Create table syntax/Writable Datatype syntax.

Another step is to change our source technology, in this case Oracle. We need to do that because ODI needs to know what to do when converting one datatype from one technology to another. We do this by going to the Oracle technology, double-clicking each source datatype that we use and mapping it to the corresponding datatype in the new target technology (BigQuery). For this example, we will only be converting Varchar2 columns to String columns, so we will just make one change. In a real-world scenario, you would need to change all the ones that you use. Double-click Oracle’s VARCHAR2 datatype, click on the “Converted To” tab and select String on the BigQuery technology.

One last change is to go to our BigQuery datastore and change the column types to String:

This should be enough for us to test the interface again. When we run it, no error happens this time:

However, the data load takes forever to complete (I cancelled it after some hours trying to run a 1M-row table). The ODI job was able to create the work table and I could see data flowing into BigQuery, but the speed was just too slow:

I tried to change the Array Fetch/Batch Update size/Parallelism settings in the Google BigQuery Data Server ODI object, but none of the values that I tried worked as fast as I wished:

Not sure if the slowness was due to something in my architecture, but at this point I just thought that, even if it worked reasonably fast, it wouldn’t be as fast as creating a file and sending it directly to BigQuery (similar to what we did for Snowflake). Also, the amount of customization needed to load a simple table is so high that I really don’t want to think about how bad it would get for complex mappings.

Not surprisingly, my tests with text files were extremely fast and the customizations needed are minimal compared to the JDBC method, so that’s what we will be covering in the next post.

Thanks! See you soon!

ODI in the hybrid database world – Files/Stages and SnowSQL

Posted in Uncategorized on April 14, 2022 by Rodrigo Radtke de Souza

Hi all, as mentioned in the second post of this series, there is a faster way to load data into Snowflake using ODI. It requires the creation of a new KM and it has some peculiarities that may look like limitations at first, but the data load speed is totally worth the work and preparation needed to make it happen. First, let’s picture how the current Snowflake JDBC process looks.

It’s simple and very straightforward. You create an ODI mapping that reads your on-premises DB and sends data over to Snowflake using JDBC. If you have small data loads, or if you are happy with the time that the jobs are taking using this method, I recommend you stay with it, because it’s pretty simple and it works.

Now, if you need speed due to the volume of data that you need to transfer, you may create the following architecture:

Let’s describe the steps. First, ODI is used (via a mapping) to generate a text file. The format may be anything you like (it needs to match the Snowflake stage definition, as you will see below); I’m using a pipe-delimited file for this example. Then ODI calls the SnowSQL client (more on that later), which compresses and pushes the file to a Snowflake STAGE area, from where it is finally copied over to the final table.

If you stop and think about it for a second, it seems very stupid. You have the data in a database, you extract it to a file, then you call a process to compress it and push it over the internet, stage it, and finally copy it. It’s way more work than the first method, right? It is way more work, and it also requires space to store the text file; however, it’s way faster than JDBC.

You see, the main bottleneck when working with the cloud is exactly the transfer over the internet. With the second technique, what we end up doing is zipping a large file and sending it across the network all at once, instead of relying on a Java JDBC connector that buffers some X number of rows and sends them across repeatedly. The amount of work that the JDBC driver does internally is way more and way slower than just creating a file, compressing it and sending it.

Also, cloud platforms are awesome at working with files. Every cloud provider out there makes it very easy and fast to manipulate “raw” data. Snowflake is no different. It will stage and copy the compressed text file at extreme speed, way faster than batches of rows over JDBC.

If you are still not sure whether you should follow this route, my answer is: wait until you really need to create a fast process and give it a go. You may do a test by simply extracting the data to a text file and running SnowSQL commands manually to push the data. You will see it will be super fast.

Let’s see how we should implement this. The first thing is that you will need to install the SnowSQL client on your architecture (the ODI agent server). This client is the one that will be called to execute things in Snowflake, including pushing the file and copying it. I won’t go over the details of SnowSQL, but you may read it all in this documentation from Snowflake.

Another thing that I’ll just assume you know is how stages and the PUT and COPY commands work in Snowflake. You may read their documentation here, here and here.

The second step is to create a copy of the current “IKM SQL to File Append” and give it a new name, in my case “IKM SQL to File Append – Snowflake PUT”. Delete some of its steps and leave only the ones below:

These steps basically just create a file on a server. This file needs to have the same name as the table that you want to load in Snowflake, plus the “.txt” extension (e.g., if you are loading the CLIENTS table, you need to create a CLIENTS.txt file on the server). The target datastore definition may be anything you want, but I’m following this pattern:

Now you need to add only two more steps in the KM, as below:

Snowflake PUT

The target command is the following:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_put.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_put.err"

snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q 'PUT file://<%=odiRef.getTable("TARG_NAME")%>.txt @#P_STAGE_NAME auto_compress=#P_AUTO_COMPRESS parallel=#P_PUT_PARALLEL'

We can see that it is basically one OS command that calls the snowsql client. It passes all the connection information in order to log in and then issues a PUT command to Snowflake. This PUT command sends the text file to a stage area, with auto-compress and a defined number of parallel workers. If you are familiar with ODI, you know that all of those # variables need to come from somewhere. You may implement it any way you want; in my case, I put a SQL statement in the Command on Source tab that returns all this information from a parameter table located in the on-premises database, like below:
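Just as a hedged sketch of what that Command on Source query could look like (SNOW_PARAMETER and its columns are hypothetical names, since the post does not show the actual table), it only needs to return one row whose column aliases match the # variables used in the target command:

select CONNECTION_NAME P_CONNECTION_NAME,
       SNOW_WAREHOUSE  P_SNOW_WAREHOUSE,
       SNOW_ROLE       P_SNOW_ROLE,
       SNOW_DATABASE   P_SNOW_DATABASE,
       SNOW_SCHEMA     P_SNOW_SCHEMA,
       STAGE_NAME      P_STAGE_NAME,
       AUTO_COMPRESS   P_AUTO_COMPRESS,
       PUT_PARALLEL    P_PUT_PARALLEL
  from SNOW_PARAMETER
 where CONFIG_CODE = '<%=odiRef.getOption("CONFIG_CODE")%>'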

I even added a CONFIG_CODE filter (which is a KM option that I added to this new KM), in case you have multiple Snowflake configurations (which is very common). So, if you have multiple configs, you may add this option to your new KM and set it when you are creating a new mapping.

Snowflake Copy

This step is very similar to the one before. In the target tab we will have the following:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_copy.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_copy.err"

snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q 'copy into #P_SNOW_DATABASE.#P_SNOW_SCHEMA.<%=odiRef.getTargetTable("TABLE_NAME")%> from  @#P_STAGE_NAME/<%=odiRef.getTargetTable("TABLE_NAME")%>.txt.gz'

This one issues a COPY from the Snowflake stage into the final table. On the source tab, we have the same SQL shown in the prior step:

And that’s it. You are ready to push data from on-premises to Snowflake in a very fast way. It takes some work upfront, but I can guarantee that it’s worth it.

Thanks, see you soon!

ODI in the hybrid database world – Snowflake JDBC

Posted in Uncategorized on April 13, 2022 by Rodrigo Radtke de Souza

This second post will talk about Snowflake integration with ODI. Let’s picture a scenario like the last post: your company has a large on-premises ETL/database footprint, but it is starting to move slowly to the cloud, in this case, Snowflake. You want to use the existing ODI architecture for this task, but Snowflake is not a technology that comes out of the box with ODI, so how could you do that? Let’s figure it out in this post.

Luckily a good friend of mine, Michael Rainey, wrote about it in his post here. I won’t go over the details because I don’t want to copy and paste what is already written there, but in a nutshell: you need to download the Snowflake JDBC driver, add it to the ODI agent, create/copy a new technology for Snowflake usage, and that’s it. It will work just fine.

However, after working with it for some time, I found some details that I think are worth sharing with you. First, unlike the first post, where Oracle is a technology that ODI already knows, Snowflake is not, and for that reason you may start to face some small issues here and there regarding SQL statements, for example. If the KMs or procedures that you are using contain standard SQL that both Oracle and Snowflake understand, they will work just fine. If the SQL syntax is different in Snowflake or is Oracle-exclusive, then you will need to start doing some customizations. Luckily, most of these customizations should be very simple to adapt to Snowflake.

Another thing is data volume. For small workloads, it works well. For larger ones you will need to do some tweaks. One way to decrease the load times is by playing with the Array Fetch/Batch Update sizes and the Degree of Parallelism for Target, as you can see below.

Array Fetch/Batch Update sizes are very hard to fine-tune to an optimal value, since the best setting depends on a lot of factors, like the width and size of the table, the network, and so on. Sometimes you fine-tune for smaller tables and larger tables suffer, and vice versa. You will need to run some tests and see what the best value for your case is. However, Degree of Parallelism for Target is one that you may increase up to 20 without too much worry, and you will see a huge gain. You cannot increase it further because Snowflake (at least in my account) has a limit of 20 parallel threads working on the same object at a time.

If you want to see what is happening during the push of data to Snowflake, you may check its Load task in Operator and click on Details. It will show you the details and the time that each thread took to execute:

However, even with those tweaks in Topology, I found myself in situations where the data load was just not fast enough. After doing some research and some testing, I figured out that there is a way to push data to Snowflake way faster than JDBC, and it is by using the SnowSQL client. I’ll cover that one in the next post.

See you soon!

ODI in the hybrid database world – Oracle Autonomous Database

Posted in Uncategorized on April 13, 2022 by Rodrigo Radtke de Souza

Hi all, today I’ll start a series of four posts related to ODI’s position in a hybrid database world. Everybody has known for quite some time that the cloud is the future. Some companies may delay its adoption, but it will eventually happen in one way or another. However, this adoption will probably not happen all at once. Companies, especially the ones that have a large investment on-premises, will need to live in a hybrid mode until things get migrated, built and adapted. And this takes time, a lot of time.

Also, people often start thinking about migrating to the cloud by either:

  • Massively migrating the existing database/data to the cloud, which may sound very promising on paper, but generally fails miserably when you try to implement it, simply because cloud and on-premises are not the same thing (even if the marketing guys tell your boss that it’s all the same and the migration is a piece of cake).
  • Starting from scratch, which is great for new projects, but most people have already invested in their on-premises architecture, still need it, and don’t want to redo all the existing work again.

The truth is that companies will end up building something hybrid: whatever is new will be developed with the cloud in mind, but whatever already exists will be integrated (not migrated) into the cloud in stages, up to the point where the old process either gets converted completely or gets replaced by something new in the cloud.

Those who have ODI as their on-premises ETL tool will find it easy to integrate things with the cloud using whatever they have today. This is because ODI is great at incorporating technologies that do not come out of the box in an easy manner. For this series of four posts, I’ll be talking about the following:

  • Integrating with Oracle Autonomous Database
  • Integrating with Snowflake JDBC
  • Integrating with Snowflake – Files/Stages and SnowSQL
  • Integrating with Google Big Query

For this first post, let’s start with “Integrating with Oracle Autonomous Database”, just because it’s extremely easy to do. Let’s imagine the scenario: you already have a large ETL architecture on-premises and your company started to use Oracle Autonomous Database as its cloud solution. Instead of migrating all at once, they will do it in stages, leveraging everything that they have already built and pushing only essential data to the cloud. Since it’s a hybrid approach, maybe they even want to get data from the cloud into the on-premises database, to support some existing application.

First thing to do in ODI is to create a new Data Server in the Oracle Technology:

Add the user and password that will be used to connect. Now, instead of adding the JDBC details, as we usually would do, click on “Use Credential File”:

You will need to point to the file that has the connection to your cloud DB. To get this, go to your Oracle DB instance in the cloud, click on DB Connection and download the wallet file.

Add a password to it:

Save the Zip file and go to ODI. On Credential File select the zip file that you just downloaded. If the file is correct, you will be able to select the Connection Details below:

And it’s done. If you go to JDBC URL, you will see that ODI automatically populates all the info for you:
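Just as an illustration (the exact string ODI generates from the wallet may differ, so treat this as a hedged example with placeholder names), a wallet-based thin JDBC URL generally follows a pattern like the one below, where mydb_high is a TNS alias from the wallet’s tnsnames.ora:

jdbc:oracle:thin:@mydb_high?TNS_ADMIN=/path/to/wallet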

Click Test Connection to make sure all is correct, and you are good to go:

From this point on, since it’s Oracle, it’s all the same. You may do whatever you want with this database, because it’s Oracle. The only difference is that it is located somewhere in the cloud and not on-premises. One thing to notice, though, is that, since it’s in the cloud, it will have network constraints. Large data volumes will take time depending on several factors that are beyond this post and depend on each company’s architecture. But the main thing is that you may create ODI mappings and procedures, and push/get data to/from the cloud as needed, in a very simple way.

See you soon!

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 6: Database Design and EPM Automate)

Posted in PBCS on March 29, 2021 by Rodrigo Radtke de Souza

Hi all! Before jumping straight to ODI, let us take a moment to talk about how our database design will look as the “middle tier” between our sources and PBCS. We will split our design into 4 parts:

  • Sources: Can be any kind of source (Oracle, SQL Server, Teradata, File system, FTP…)
  • Stage Area: Where we’ll work on the data
  • DW: Will have all the information in the right format that PBCS expects
  • File System: The data/metadata from the DW will be exported in the right format/layout and then zipped

While source and stage areas are common across several architectures out there, we need to take a moment to talk about the DW layer, where we have the DW table, which will contain the data that we want to load to PBCS, and a metadata validation table. Before we can load data into PBCS, we need to make sure that there are no invalid members in the file, and therefore we create a job to export metadata from PBCS in the first place. All existing metadata will be stored in a metadata table with the following format:

Generally, we build this table to be partitioned by App_Name or Plan_Type, so we can retrieve its information in a faster manner, but it all depends on the size and the number of your applications. With this approach, we may validate all data in the DW tables against the Metadata table and remove/fix it before sending it to PBCS.
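As a hedged illustration of that validation (the post does not show the actual table layouts, so DW_PBCS_DATA, PBCS_METADATA and their columns are hypothetical), a query like the one below could flag DW rows whose Account member does not exist in the exported metadata:

select d.*
  from DW_PBCS_DATA d
 where not exists (select 1
                     from PBCS_METADATA m
                    where m.APP_NAME    = d.APP_NAME
                      and m.DIM_NAME    = 'Account'
                      and m.MEMBER_NAME = d.ACCOUNT)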

The “auxiliary” tool that we will use to send/extract data between our DW and PBCS is EPM Automate. The EPM Automate utility will act as a bridge between ODI and PBCS as it enables Service Administrators to automate many repeatable tasks including (but not limited to) the following:

  • Import and export metadata and data
  • Refresh the application
  • Run business rules
  • Upload files, list files and delete files from PBCS
  • Copy data from one database to another
  • Run a Data Management batch rule
  • Export and import application and artifact snapshots
  • Import pre-mapped balance data into Oracle Account Reconciliation Cloud
  • Import currency rates, pre-mapped transactions, and profiles into Oracle Account Reconciliation Cloud
  • Copy profiles to a period to initiate the reconciliation process
  • Deploy the calculation cube of an Oracle Hyperion Profitability and Cost Management Cloud application
  • Clear, copy, and delete Point of Views in Profitability and Cost Management applications
  • Export and import template in Profitability and Cost Management applications
  • Replay Oracle Smart View for Office load on a service instance to enable performance testing under heavy load
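The next post walks through a full data load, but just to give an idea of how ODI typically drives EPM Automate (for example, from an OdiOSCommand step), a minimal hedged command sequence could look like the one below; the job name DW_DATA_LOAD, the file names, the password file and the URL are hypothetical placeholders:

epmautomate login serviceadmin C:\secure\password.epw https://planning-mydomain.pbcs.us2.oraclecloud.com
epmautomate uploadfile DW_EXPORT.zip
epmautomate importdata DW_DATA_LOAD DW_EXPORT.zip
epmautomate logout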

In the next post we will show an example of how to load data using EPM Automate with ODI. Stay tuned!

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 5: Import Metadata Jobs)

Posted in Uncategorized on March 26, 2021 by RZGiampaoli

Hey guys, how are you? Continuing the “how to integrate PBCS seamlessly using your existing ODI environment” series (Part 4 here), today we’ll talk about the Import Metadata jobs.

As you can imagine, importing metadata is also simple in PBCS; we just need to pay attention to the file format and that’s it.

To import Metadata, we need:

As usual, first we need to select our inbox, then the name of the file for each dimension and the delimiter for each dimension; if you want to clear the members before loading the new ones, you just need to check Clear Members.

That’s it; all dimensions will be loaded at once, but we need separate files, one for each dimension. These files can also be in zip format, which PBCS will automatically unzip for us.

Now the important part: the file format. These are all the properties that PBCS expects when loading metadata.
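The full property list shows up in the PBCS import screen, but just to illustrate the general shape of such a file, a simplified Account dimension file (showing only a few of the properties; the members and aliases are made up) could look something like this:

Account,Parent,Alias: Default,Data Storage
Income_Statement,Account,Income Statement,Dynamic Calc
Net_Income,Income_Statement,Net Income,Store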

That’s it. With that, we have finally finished all the setup we need in PBCS for our ODI jobs to work. One important thing that I need to point out is that Oracle will update PBCS with new versions and the file format can change over time. If that happens, you’ll also need to update your ODI jobs.

It happened to me on more than one project and it is not a big deal, but you need to be aware that if your job starts to fail, this could be what is happening.

I hope you guys enjoy it, stay safe and see you soon.

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 4: Outbound Jobs)

Posted in Uncategorized on March 22, 2021 by RZGiampaoli

Hey guys, how are you? Continuing the “how to integrate PBCS seamlessly using your existing ODI environment” series (Part 3 here), today we’ll talk about the Outbound jobs.

In the same way as the inbound jobs, we need to create jobs to extract both data and metadata for ODI to consume, so it can populate our DW and use the metadata for validation.

So, extracting data from PBCS is also easy. First, we need to choose the Outbox location to enable saving it as a job (Local works the same as for the inbound job and allows a one-time execution only).

For the outbound job we need to choose the plan type, which means we’ll need at least one job for each plan type. We also need to choose the delimiter (I always like pipe because it is easy to see and is not used in any command) and, if you use Smart Lists, you can choose whether you want to export the labels or the names.

And finally, you set the POV to be exported. You can use Essbase substitution variables here if you want to, and as you can see, the export will have the same format as the Planning import: accounts on the rows, periods on the columns and the POV (including the plan type).

You can change the format if you wish, but I advise maintaining consistency between the jobs for the sake of the dynamic components.

After running this job, PBCS will generate a zip file in its Outbox; we just need to go there and download it.

For the metadata export, the idea is the same but a little bit simpler than the others: we just need to select our Outbox, the dimensions we want to export and the delimiter, and that’s it. PBCS will create one zip file per dimension in our Outbox.

That’s it for today. I hope you guys enjoy it, stay safe and see you soon.