Archive for the ODI Category

ODI in the hybrid database world – Amazon Redshift – AWS CLI

Posted in AWS, Cloud, ODI on June 5, 2023 by Rodrigo Radtke de Souza

Written on June 5, 2023 by Rodrigo Radtke de Souza

Hi all, this is probably the last post of this series on how to load data from on-premises databases to the major public cloud providers. Today’s post is about Amazon Redshift. By now, we all know the pattern that we are following: we will duplicate “IKM SQL to File Append” and add some extra steps at the end of it to push the file to the cloud and load the target table. It will look like below:

The first extra command is GZIP, which is the same as in the other posts. After we GZIP the file, we push it to an S3 bucket using the CP command. To do that, you will need to create an S3 bucket in AWS and install the AWS CLI (which you can read about here) on your ODI server. The AWS CP command is very simple and looks like this:
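
As a reference, a sketch of what that step could look like (the bucket name is a placeholder, and the OdiOSCommand wrapping follows the same pattern used in the other posts of this series):

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.err"
aws s3 cp "<%=odiRef.getTable("TARG_NAME")%>.gz" s3://<YOUR_S3_BUCKET_NAME_HERE>/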

The next command (aws redshift-data) is a little more complex. redshift-data is an API (read about it here) used to manipulate data in Redshift. One of its commands is execute-statement (read about it here), which can be used to issue a copy command (read about it here) that copies a file from an S3 bucket to a Redshift table.

The command may vary depending on your setup, but in summary it will look like this:

aws redshift-data execute-statement --database <database_name> --workgroup-name <workgroup_name> --region <region> --secret-arn <secret_arn> --sql "copy <target_table> from 's3://<bucket>/<file_name>.gz' credentials 'aws_access_key_id=<access_key_id>;aws_secret_access_key=<access_key_secret>' delimiter '|' IGNOREHEADER 1 ACCEPTINVCHARS GZIP region '<region>'"

Notice that there are a lot of variables to fill in. secret_arn, access_key_id and access_key_secret are all related to security and, depending on your setup, you may even use different ways to authenticate and run the command. But in short, you ask the redshift-data API to run an execute-statement, and that execute-statement triggers a copy command that loads the target table from a GZIP file stored in an S3 bucket. When you run the mapping, it looks like this:

That’s it folks. I hope you have enjoyed this series. See you next time!

ODI in the hybrid database world – Azure SQL Database – BCP Utility

Posted in Cloud, ODI on May 8, 2023 by Rodrigo Radtke de Souza

Written on May 8, 2023 by Rodrigo Radtke de Souza

Hi all! Continuing the ODI in the hybrid database world series, let’s talk today about Azure SQL Database. This one is the simplest so far in our series, but it has some limitations.

The process is again like what we have been doing for some time now. We will duplicate “IKM SQL to File Append” and add some extra steps at the end of it to push the file to the cloud and load the target table. In the Azure SQL Database case, we will only need the BCP Utility installed in the ODI agent server. You can read about BCP Utility here.

The limitation of using this utility (at least I did not find a way around it) is that it does not work with compressed files, so it ends up sending the entire file over the internet in batches (the batch size is configurable). I found a workaround where you first load a compressed file to Azure Blob Storage and then use Azure Data Factory to load that compressed file into Azure SQL Database, but that kind of breaks the simplicity of the examples that we are presenting in this series, so I won’t go over it. This is what the IKM will look like:

The first five steps are from the original SQL to File Append IKM. The only new one is the step that calls BCP. This step is very straightforward:

You need to pass the name of the Azure SQL table that you want to load, the path to the text file that you created in ODI, the SQL Server name (which will look like this: <MY_SQL_SERVER>.database.windows.net), the database name and, finally, the Azure SQL user and password that you want to use to connect to the cloud database. There are some additional parameters that indicate information about the file, like -t "|", which indicates that this is a pipe-delimited file, and -F 2, which indicates that the data rows begin on the second line of the file, since the first one is the header.
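
For reference, a sketch of what the call could look like when wrapped in OdiOSCommand (server, database, table and credentials are placeholders, and -c, which tells BCP to treat the file as character data, is my assumption since we are loading a plain text file):

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_bcp.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_bcp.err"
bcp <TARGET_SCHEMA>.<TARGET_TABLE> in "<%=odiRef.getTable("TARG_NAME")%>.txt" -S <MY_SQL_SERVER>.database.windows.net -d <DATABASE_NAME> -U <AZURE_SQL_USER> -P <AZURE_SQL_PASSWORD> -c -t "|" -F 2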

When you run the mapping, you see the following:

If you open the logs, you are going to see that it sent the data in batches of 1000 rows:

The batch size and some other interesting parameters can be found in the BCP documentation. You may play around with them and see which settings work best for your specific data load.

See you next time!

ODI in the hybrid database world – Oracle ADW – OCI/CLI + DBMS_CLOUD

Posted in Autonomous Database, Cloud, ODI on April 14, 2023 by Rodrigo Radtke de Souza

Written on April 14, 2023 by Rodrigo Radtke de Souza

Hi! In all the previous posts about ODI in the hybrid database world, I always showed two versions of data loads for each technology, one being JDBC and another with text files. In the case of Oracle ADW, I posted just the JDBC example. So, let me correct that today. Let’s see how you can also use text files to load data into Oracle ADW tables.

The process is very similar to what we have been doing for some time now. We will duplicate “IKM SQL to File Append” and add some extra steps at the end of it to push the file to the cloud and load the target table. In the ADW case, we will need a couple of things to make it work:

  • An Oracle Bucket, which is a container for storing objects in a compartment within an Object Storage namespace in the Oracle cloud. More information about it here.
  • Install OCI CLI, which ODI will call to push the file to the Oracle Bucket. The steps on how to install it can be found here.

You will need to create and install those two before proceeding. There is a third component needed, but this one already exists in Oracle ADW: the DBMS_CLOUD package. You may read about it here.

This package will be used to push the data from Oracle Bucket to Oracle ADW table. There is a configuration that you may need to do regarding DBMS_CLOUD credentials, but I’ll talk about it later.

This is what the final IKM will look like:

The first five steps are from the original KM. The last three are brand new. Let’s take a look first at GZIP:

Here we are just zipping the text file before sending it to the cloud. Just notice that Oracle accepts other kinds of compression, so you may change it if you want to. I used GZIP because it seems to be a standard that works with all the public cloud vendors out there.
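
A sketch of what this step could look like, borrowing the 7-Zip call used in the BigQuery post of this series (the file names come from odiRef, so they resolve to the text file created by the previous steps):

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.err"
"C:\Program Files\7-Zip\7z.exe" a "<%=odiRef.getTable("TARG_NAME")%>.GZ" "<%=odiRef.getTable("TARG_NAME")%>.txt"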

Next one is OCI PUT:

Very straightforward step that takes the GZIP file and uses OCI CLI to PUT it into an Oracle Bucket that belongs to an Oracle Namespace. You may refer to the OCI PUT documentation here.
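
A sketch of the call (namespace and bucket name are placeholders; --force simply overwrites the object if it already exists):

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_put.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_put.err"
oci os object put --namespace <YOUR_NAMESPACE> --bucket-name <YOUR_BUCKET_NAME> --file "<%=odiRef.getTable("TARG_NAME")%>.GZ" --force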

Last one is dbms_cloud copy_data step:

COPY_DATA may have other options. A full list can be found here.

All options are self-explanatory. In this case we are loading a text file with columns separated by pipes (“|”). One option that you need to notice is file_uri_list. It indicates the URL of the text file that resides in your Oracle bucket. It may look like this:

https://swiftobjectstorage.region.oraclecloud.com/v1/namespace-string/bucket/filename

or like this:

https://objectstorage.region.oraclecloud.com/n/namespace-string/b/bucket/o/filename
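
Putting it together, a minimal sketch of what the COPY_DATA call could look like (table name, credential and URI are placeholders; credential_name can be omitted if the bucket is public, as in this example):

begin
  DBMS_CLOUD.COPY_DATA (
    table_name      => 'TARGET_TABLE',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage.<region>.oraclecloud.com/n/<namespace-string>/b/<bucket>/o/<file_name>.GZ',
    format          => json_object('delimiter'   value '|',
                                   'skipheaders' value '1',
                                   'compression' value 'gzip')
  );
end;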

To make things simpler in this example, I set the bucket as public, so I could access it without credentials. However, in a production environment, that won’t be the case and you will need to create a credential using DBMS_CLOUD.create_credential. You will run this command only once in your Oracle ADW instance:

begin
  DBMS_CLOUD.create_credential (
	credential_name => 'OBJ_STORE_CRED',
	username => 'XXXXX',
	password => 'YYYYY'
  ) ;
end;

You can read more about how to create the credential here. After you create the credential, you will use it in the COPY_DATA command as a parameter (credential_name).

When we run the mapping, we get something like this:

Very similar to the other posts: we generate a text file, GZIP it, call OCI CLI to send the file to the cloud and then call the DBMS_CLOUD.COPY_DATA procedure to load the data from the Oracle bucket into our target table.

I hope you liked it! See you next time!

Snowflake loader using ODI

Posted in ODI, Snowflake on April 13, 2023 by Rodrigo Radtke de Souza

Written on April 13, 2023 by Rodrigo Radtke de Souza

Hi all. Today’s post will talk about how to create a simple but powerful Snowflake loader using ODI. Actually, you may use it with any technology that you want, but I ended up creating this one for Snowflake because I needed to load several Oracle tables into Snowflake in a fast and dynamic manner.

This post combines the techniques described here and here. In the first post we saw how to load text files to Snowflake. In the second post we saw how to create dynamic text dumps from Oracle. What if we combine both posts? We could create a dynamic load that will receive a table name as a parameter, create a text file and push this text file to Snowflake. Isn’t that cool?

Our dynamic loader will be an ODI Procedure and it will have the following steps:

You can get the code for “Get File Metadata”, “Create File”, “Truncate File”, “Create Header” and “Insert Rows” from my past post. There is a new step here called “Create target table” that will try to create the table in Snowflake if it does not exist yet. I added this step just as a convenient way to speed up my development process, since this procedure is nothing more than a job that gets some table from Oracle and copies it to Snowflake. The code is below:

CREATE TABLE <%=odiRef.getSchemaName( "D" )%>.<@=table_name@>
(
<@=table_column_type@>
);

Just don’t forget to set the target schema to the logical schema in Snowflake where you want the table to be created, and also set “Ignore Errors” on the step, so the process does not stop if the table already exists.

The last three steps are almost the same as in the first post, with one change. Instead of <%=odiRef.getTargetTable("TABLE_NAME")%>, now I’m getting the information from <@=folder_path@> and <@=table_name@>. E.g.:

OdiOSCommand "-OUT_FILE=<@=folder_path@>_copy.log" "-ERR_FILE=<@=folder_path@>_copy.err"
snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q "copy into #P_SNOW_DATABASE.#P_SNOW_SCHEMA.<@=table_name@> from  @#P_STAGE_NAME/<@=table_name@>.gz"
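
For completeness, the PUT step that precedes the COPY follows the same pattern (a sketch, assuming the same #P_ variables and the gzipped file created in the previous step; auto_compress=false because the file is already a GZIP):

OdiOSCommand "-OUT_FILE=<@=folder_path@>_put.log" "-ERR_FILE=<@=folder_path@>_put.err"
snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q "put file://<@=folder_path@>.gz @#P_STAGE_NAME auto_compress=false"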

When we execute the package, we have the following:

The job received a table name as a parameter. It gets the metadata for that table in Oracle, creates a file with its data, gzips it, sends it to Snowflake using PUT and COPYs it into the final table (which was created for you if it didn’t exist before). Awesome, right? If you noticed, this process will duplicate the data if you run it twice, since it does not truncate the target table before loading the data into it. I did that on purpose because there were scenarios where I wanted to load several different datasets into a single table, in append mode. But you may enhance this procedure and add a step to truncate the target table if you wish.

Hope you enjoyed! See you next time!

Building dynamic ODI code using Java variables – Table to File example

Posted in ODI on March 23, 2023 by Rodrigo Radtke de Souza

Written on March 23, 2023 by Rodrigo Radtke de Souza

Hi all. Today’s post is an “enhancement” of a post that I did a long time ago about using Oracle’s metadata to build dynamic code. I say it is an enhancement because that technique uses the source/target command tabs, which have a 4000-character limitation. So, if you are trying to build some metadata information that crosses this limit, it will fail.

You should be good using the source/target command technique in 90% of your data loads, but there are a few exceptions where we need to go beyond 4000 characters. One of these is when you are trying to build dynamic code to generate text file loads in ODI. The example that I’ll show here is about dumping the data from any Oracle table to a text file dynamically, using only one ODI procedure.

This is what we will build. I’ll go over each step in detail:

Basically, it is an ODI procedure that will receive a table name as a parameter and write that table’s data into a text file using pipe “|” as the delimiter. This is a very small but useful procedure that you may use when you need to create quick text files from tables/views without going through all the work of creating ODI mappings for it.

You will need two Logical Schemas, one for the source table and another one for the target file. In this example I used LOCAL_SOURCE, which points to an Oracle database, and OUTPUT_FILE, which points to a folder where the text file will be created.

The first step is to get the file metadata. As I said, this will be done in a different way now. Instead of using source/target tab commands, we will use Java BeanShell to retrieve this information. This is because ODI creates very lengthy metadata for each column and the total amount of metadata often crosses the 4000-character limit. Just as an example, each column in a text file load looks similar to this:

"CRDWG_COLSNP$CRCOL_NAME="+metaDataSrc.getColumnName(i) + "SNP$CRTYPE_NAME=STRINGSNP$CRORDER="+i+"SNP$CRLENGTH=4000SNP$CRPRECISION=4000SNP$CRACTION_ON_ERROR=NULLSNP$"

Here is the full code for the “Get File Metadata” step:

<@
import java.sql.*;

String table_name = "#DWCLOUD.L_TABLE_NAME";
String folder_path = odiRef.getSchemaName("OUTPUT_FILE","D")+"/"+table_name;
String folder_path_txt = folder_path+".txt";
String table_schema = odiRef.getSchemaName("LOCAL_SOURCE","D");
String table_metadata = "/*$$SNPS_START_KEYSNP$CRDWG_TABLESNP$CRTABLE_NAME="+table_name+"SNP$CRLOAD_FILE="+folder_path_txt+"SNP$CRFILE_FORMAT=DSNP$CRFILE_SEP_FIELD=0x007cSNP$CRFILE_SEP_LINE=0x000D0x000ASNP$CRFILE_FIRST_ROW=1SNP$CRFILE_ENC_FIELD=SNP$CRFILE_DEC_SEP=SNP$CRSNP$";
String table_header = "";
String table_header_parameter = "";

Connection ConSrc = DriverManager.getConnection("<%=odiRef.getInfo("SRC_JAVA_URL")%>","<%=odiRef.getInfo("SRC_USER_NAME")%>",odiRef.getInfo("SRC_PASS"));
Statement stmtSrc = ConSrc.createStatement();
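// The "where 1=2" filter returns no rows: we only need the ResultSet metadata (column names and count)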
ResultSet querySrc =  stmtSrc.executeQuery("select /*+ parallel(64) */ * from "+ table_schema +"." + table_name + " where 1=2");
ResultSetMetaData metaDataSrc = querySrc.getMetaData();

int count = metaDataSrc.getColumnCount();
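// Build the file datastore metadata, the header line and the bind-parameter list, column by column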
for (int i=1; i<=count; i++)
  {
	table_metadata = table_metadata + "CRDWG_COLSNP$CRCOL_NAME="+metaDataSrc.getColumnName(i) + "SNP$CRTYPE_NAME=STRINGSNP$CRORDER="+i+"SNP$CRLENGTH=4000SNP$CRPRECISION=4000SNP$CRACTION_ON_ERROR=NULLSNP$";
	table_header = table_header+metaDataSrc.getColumnName(i);
	table_header_parameter = table_header_parameter+":"+metaDataSrc.getColumnName(i);  
    	 
	if (i==count)
	{
  	table_metadata=table_metadata+"CR$$SNPS_END_KEY*/";
	} else
	{   
  	table_metadata=table_metadata+"CRSNP$";
  	table_header = table_header + ",";
  	table_header_parameter = table_header_parameter + ",";    
	}
  }

ConSrc.close();
@>

How does it work? It uses a variable (which you will set in an ODI package later on) to figure out which table it will dump into a file. It can actually be any other object that you can query, like a view. Views are great to use here, since you may create any kind of business logic and filters on the view and dump the data to a file when needed. It also uses both logical schemas (source and target) to figure out where to read the source data from and where to place the target file.

It then creates a connection object using the source connection that you set in the ODI procedure (please pay close attention to each source/target logical schema in the ODI procedure) and it runs a query where 1=2 just to retrieve the table/view metadata.

Having this metadata in the result set, it loops through it and starts to “build” the dynamic code for the header and the columns. It does some special conditions when it reaches the last column and then it closes the connection.

It may sound complicated, but it’s fairly simple. In less than 40 lines of code we are able to retrieve all the metadata needed to dump any table’s data to a flat file in a dynamic way. The other steps are now very straightforward.

“Create File” step will be as simple as this:

create table "<@=folder_path@>"
<@=table_metadata@>

Same for “Truncate File”:

truncate table "<@=folder_path@>"
<@=table_metadata@>

“Create Header”:

create header (<@=table_header@>)
<@=table_metadata@>

“Insert Rows” is the only one that needs two steps. In Source Command we have:

select     
    /*+ parallel(64) */ <@=table_header@>
from    <@=table_schema@>.<@=table_name@>
where    
    (1=1)    

This one is used to read the source table. In Target Command we have the insert statement to the file object:

insert into "<@=folder_path@>"
(
	<@=table_header@>
)
values 
(
	<@=table_header_parameter@>
)
<@=table_metadata@>
  

Add this procedure to an ODI package, alongside the ODI variable for the table name:

Add the table/view object whose records you want to dump. In this example, I used a table named SAMPLE_DATA. When I executed the package, I got the following:

If I go to the folder path defined in OUTPUT_FILE logical schema, I can see the data file created there (they are all fake/random rows, not real people information):

I hope you liked this post. See you soon!

ODI Hidden Gems – SNP tables: Query to get load plan execution steps

Posted in ODI, Query on December 6, 2022 by Rodrigo Radtke de Souza

Hi all,

Today is a short post, but a very useful one. I often get asked how to write SNP table queries to get information from ODI repositories. I posted one about “executed code” here. Today, here is another example: how to get the load plan execution steps that were executed. I use this one at the end of every load plan execution, so it creates a mini report of how each step performed in that load plan. This is useful for ODI Operators, so they can do some analysis of how well their jobs are doing.

This query was created using ODI 12.2.1.4.

WITH 
LP_STEPS AS (
SELECT SLS.I_LOAD_PLAN
     , LP.LOAD_PLAN_NAME
     , I.I_LP_INST
     , SLS.LP_STEP_NAME
     , SLS.I_LP_STEP
     , SLS.PAR_I_LP_STEP
     , SLS.STEP_ORDER
     , LEVEL AS LEVEL_
     , SYS_CONNECT_BY_PATH(SLS.LP_STEP_NAME, ' -> ') AS STEP_NAME_PATH
     , SYS_CONNECT_BY_PATH(SLS.STEP_ORDER, ' ') AS STEP_ORDER_PATH
     , SYS_CONNECT_BY_PATH(SLS.IND_ENABLED, ' ') AS IND_ENABLED_PATH
     , SYS_CONNECT_BY_PATH(DECODE(SLS.LP_STEP_TYPE, 'EX', 1, 0), ' ') AS EXCEPTION_STEP_PATH
     , CONNECT_BY_ISLEAF IS_LEAF
     , SLS.IND_ENABLED
  FROM SNP_LOAD_PLAN LP
    INNER JOIN SNP_LP_STEP SLS
        ON SLS.I_LOAD_PLAN = LP.I_LOAD_PLAN
    INNER JOIN SNP_LP_INST I
        ON SLS.I_LOAD_PLAN = I.I_LOAD_PLAN
        AND I.GLOBAL_ID = '<LOAD_PLAN_INSTANCE_GLOBAL_ID_HERE>'
 CONNECT BY PRIOR SLS.I_LP_STEP = SLS.PAR_I_LP_STEP 
  START WITH SLS.PAR_I_LP_STEP IS NULL
)  
, ALL_ AS (
SELECT L.I_LOAD_PLAN
     , L.LOAD_PLAN_NAME
     , L.I_LP_INST
     , L.LP_STEP_NAME
     , L.I_LP_STEP
     , L.STEP_NAME_PATH
     , L.STEP_ORDER_PATH
     , CASE WHEN L.IND_ENABLED_PATH LIKE '%0%' THEN 'N' ELSE 'Y' END ENABLED
     , MAX(IL.SESS_NO) AS SESS_NO
  FROM LP_STEPS L
    LEFT JOIN SNP_LPI_STEP_LOG IL
        ON L.I_LP_INST = IL.I_LP_INST
        AND L.I_LP_STEP = IL.I_LP_STEP
 WHERE 1 = 1
   AND L.IS_LEAF  = 1
   AND EXCEPTION_STEP_PATH NOT LIKE '%1%'
 GROUP BY L.I_LOAD_PLAN
        , L.LOAD_PLAN_NAME
        , L.I_LP_INST
        , L.LP_STEP_NAME
        , L.I_LP_STEP
        , L.STEP_NAME_PATH
        , L.STEP_ORDER_PATH 
        , L.IND_ENABLED_PATH
) 
SELECT A.I_LOAD_PLAN
     , A.I_LP_STEP
     , A.LOAD_PLAN_NAME
     , A.LP_STEP_NAME
     , A.STEP_NAME_PATH
     , A.SESS_NO
     , CASE WHEN A.ENABLED = 'N' THEN 'Step has been disabled'
            WHEN A.SESS_NO IS NULL THEN 'Step did not run'
            WHEN S.SESS_STATUS = 'D' THEN 'Step finished successfully'
            WHEN S.SESS_STATUS = 'E' THEN 'Step failed'
            WHEN S.SESS_STATUS = 'W' THEN 'Step finished with warnings'
            ELSE 'Other status not relevant'
        END STATUS
     , TO_CHAR(S.SESS_BEG, 'DD-MON-YYYY HH24:MI:SS') AS SESS_BEG
     , TO_CHAR(S.SESS_END, 'DD-MON-YYYY HH24:MI:SS') AS SESS_END
     , S.SESS_DUR
     , REPLACE(REPLACE(REPLACE(SUBSTR(S.ERROR_MESSAGE, 1, INSTR(S.ERROR_MESSAGE, CHR(9) || 'at ')), CHR(10), ' '), CHR(13), ' '), CHR(9), ' ') AS ERROR_MESSAGE
  FROM ALL_ A
    LEFT JOIN SNP_SESSION S
        ON A.SESS_NO = S.SESS_NO
 ORDER BY A.STEP_ORDER_PATH;

I hope you like it! See you soon!

ODI in the hybrid database world – Google BigQuery – gsutil + bq

Posted in BigQuery, Cloud, Google, ODI on December 5, 2022 by Rodrigo Radtke de Souza

As we saw in our last post, ODI + BigQuery + Simba JDBC didn’t give us the performance that we were expecting. But we may apply the same concept that we used for Snowflake: we may create a file, zip it (using GZIP), send it to a storage location (using gsutil) and load it into a table (using bq). Luckily, this is easy to do with ODI and BigQuery. First, we will need to install the Google Cloud CLI, which will enable you to use both gsutil and bq. You may read about it here.

Next, we will follow the same steps that we did for Snowflake, first we will create a copy of an existing IKM. In this case, I duplicated IKM SQL to File Append and gave it a new name:

On Tasks, I left some original ones (which will create a text file for us) and I created three new steps: GZIP, Copy and Load.

The first new step is GZIP, because we want to compress our file before moving it to the cloud. Google BigQuery may load zipped files directly from Cloud Storage, but they need to be in a GZIP format. So, on “Target Command” I added this:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.err"

"C:\Program Files\7-Zip\7z.exe" a "<%=odiRef.getTable("TARG_NAME")%>.GZ" "<%=odiRef.getTable("TARG_NAME")%>.txt"

For the sake of simplicity in this example, the path is hardcoded, but in a real-world project, we would make this path dynamic.

Second one is GCS Copy. In this step, we will call “gsutil cp” command that will get a file (in this case GZIP file that we created in the previous step) and send it to a bucket in Google Cloud Storage:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.err"
gsutil cp "<%=odiRef.getTable("TARG_NAME")%>.GZ" gs://<YOUR_GCS_BUCKET_NAME_HERE>

Last step is using “bq load” command that will get the file that is stored in your bucket and load it to a table in BigQuery:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_bq.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_bq.err"
echo bq load --source_format=CSV --skip_leading_rows=1 --field_delimiter="|" GOOGLE_TARGET.SAMPLE_DATA gs://<YOUR_GCS_BUCKET_NAME_HERE>/SAMPLE_DATA.GZ| C:\WINDOWS\system32\cmd.exe /k ""C:\Users\<YOUR_USER>\AppData\Local\Google\Cloud SDK\cloud_env.bat""

This command is a little messy, but it works. In your project you may do something more elaborate, but for the sake of this example it will do. Basically, I’m using the echo trick to call two command lines at once. First it runs “cloud_env.bat”, which sets my cloud environment variables (this comes with the Google Cloud CLI installation). Then it calls “bq load”, passing as parameters the source format, the number of header lines to skip and the field delimiter. In this case, we are using pipe “|” as our column delimiter. After that, we pass which dataset/table we want to load and from which file (in this case our GZ file).

Seems complicated at first, but the data load speed is worth it. And since all this is in an IKM, you will do it once and reuse it repeatedly.

Before we create a mapping, we need to remember that now we won’t be loading BigQuery directly, but instead we will be loading a text file first. So, we need to create a new model pointing to a File logical schema that will be used to hold our temporary files. For this example, I’ve created a delimited file using pipe and with header. For the columns, I created all of them as Strings.

Now we may create our mapping. In this case, its pretty simple. We just add our Oracle source table and our File target table.

When we execute it, we can see that I was able to load 1,077,500 rows in 76 seconds. Not bad at all:

Once again, we saw that working with files and command-line tools was way faster than JDBC connections. Also, if you think about it a little, it has other cool benefits. If you treat all your cloud integrations as file transfers, you may create an architecture that can be used across multiple cloud vendors. Your on-premises output will always be compressed files that get loaded to cloud systems using different ODI Knowledge Modules, one for each vendor. Since they are all text files, ODI mappings may be used to create all sorts of complex logic using your on-premises database as staging and then pushing only the result to the cloud.

I hope you have enjoyed. See you soon!

ODI in the hybrid database world – Google BigQuery – Simba JDBC

Posted in BigQuery, Cloud, ODI on November 23, 2022 by Rodrigo Radtke de Souza

Hi all! Continuing our series, today we will talk about how to load data into BigQuery using ODI. There will be two posts, one talking about Simba JDBC and another one talking about the Google Cloud SDK Shell. You will notice that the first approach (using Simba JDBC) has a lot of challenges, but I think it’s worth looking at, so you have options (or at least know why you should avoid it).

First, you need to download the Simba JDBC driver and add it to your ODI client/agent (like what was done with the Snowflake JDBC driver here). Also, you need to “duplicate” one ODI technology (I used Oracle in this example) and name it BigQuery:

In JDBC Driver, you will add the Simba driver and in URL you will add your connection to Google BigQuery. I’ll not go over the details here, since there are a lot of ways for you to create this URL, but you may check all the ways to create it in this link.
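
Just as a reference, one common form of the URL uses service-account authentication (project ID, service account e-mail and key path below are placeholders; check the Simba documentation linked above for the other authentication options):

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<YOUR_PROJECT_ID>;OAuthType=0;OAuthServiceAcctEmail=<SERVICE_ACCOUNT_EMAIL>;OAuthPvtKeyPath=<PATH_TO_KEY_FILE>;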

Press “Test Connection” to see if it is all working as expected:

Create a new Physical schema and give it the name of your BigQuery Dataset:

Create a new Logical Schema and a Model. If all is correct, you will be able to Reverse Engineer as you normally do with other technologies:

Let’s go ahead and just create a new mapping. This is a simple one: getting data from an Oracle database and loading it to BigQuery:

As LKM, I’m using LKM SQL to SQL:

As IKM, I just used a Global one:

When I ran it, it failed. If we check Operator, we will see that it couldn’t insert new rows because it was not able to create the C$ temporary table. If we look further, we see that it considers “$” an illegal character in BigQuery table names:

If we get the create table statement from ODI Operator and try to execute in BigQuery, we will see a lot of issues with that:

First, as I said before, we cannot have $ signs, so let’s try to remove it:

It also does not accept NULL. Removing them gave another error:

Varchar2 is not a thing in BigQuery, so if we change it to String, it will work fine:

So, just to create a temporary C$ table, we had to make several changes. It means that the ODI/BigQuery integration is not as straightforward as it was for Oracle Cloud or Snowflake. We will have to make some changes and additions to the BigQuery technology in ODI. Let’s start with the $ signs. Double-click BigQuery and go to the Advanced tab. There, let’s remove the $ sign from the temporary tables (I removed it only from C$ for this example):

Also, you need to remove the DDL Null Keyword:

You may need to do way more changes depending on what you need to load, but for this example, we will try to keep as simple as possible.

The next step is to create a String Datatype. You may go to the BigQuery technology and duplicate an existing Datatype. For this example, I used Varchar2. Then I changed the Name/Code/Reverse Code/Create table syntax/Writable Datatype syntax.

Another step is to change our source technology, in this case Oracle. We need to do that because ODI needs to know what to do when converting one datatype from one technology to another. We do this by going to the Oracle technology, double-clicking each source datatype that we use and defining what it converts to in the new target technology (BigQuery). For this example, we will only be converting Varchar2 columns to String columns, so we will just make one change. In a real-world scenario, you would need to change all the ones that you use. Double-click Oracle’s VARCHAR2 Datatype, click on the “Converted To” tab and select String on the BigQuery technology.

One last change is to go to our BigQuery datastore and change the column types to String:

This should be enough for us to test the interface again. When we run it, no error happens this time:

However, the data load took forever to complete (I canceled it after some hours trying to load a 1M-row table). The ODI job was able to create the work table and I could see data flowing into BigQuery, but the speed was just too slow:

I tried to change the Array Fetch/Batch Update sizes and the Degree of Parallelism in the Google BigQuery Data Server ODI object, but none of the values that I tried worked as fast as I wished:

I’m not sure if the slowness was due to something in my architecture, but at this point I just thought that, even if it worked reasonably fast, it wouldn’t be as fast as creating a file and sending it directly to BigQuery (similar to what we did for Snowflake). Also, the amount of customization is so high just to load a simple table that I really don’t want to think about how bad it would get with some complex mappings.

Not surprisingly, my tests with text files were extremely fast and the customizations needed are minimal compared to the JDBC method, so that’s what we will be covering in the next post.

Thanks! See you soon!

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 3: Inbound Jobs)

Posted in Cloud, EPM Automate, ODI, PBCS, Uncategorized on March 18, 2021 by RZGiampaoli

Hey guys, how are you? Continuing the series on how to integrate PBCS seamlessly using your existing ODI environment (Part 2 here), today we’ll talk about Inbound Jobs.

We already know that we’ll need 4 types of jobs for our integration, so let’s see what we need to do to create an Inbound Job:

The first thing we need to do is to set the location as Inbox (or Outbox in the case of an extract). This is what enables the “Save as a Job” button. If you choose Local, you can only run it one time.

Second, the source type. We have 2 choices there: Essbase and Planning. Essbase has the same format that we are used to from Essbase exports; the problem is that if we select Essbase, we’ll need one job per plan type.

Planning, on the other hand, has a special format that is very handy for creating dynamic objects in ODI, as we can see in the table below:

As we can see, the planning format has a column for the account, then the periods, a POV column and the data load cube name (plan type).

Having the periods as columns is already a nice thing, since the load will be way faster and way more optimized, as we’ll have one row for the entire year.

Of course, this will only be true when we are talking about forecast, because for actuals we normally load just one month at a time… even so, I prefer this format over having one column for the period and another for the data.

The POV is also nice because, no matter how many dimensions we have in one plan type, everything will be in the same column. That makes the integration simple, since we just need to concatenate all columns but accounts into one column. I recommend using regular expressions and LISTAGG to do so (best function ever for dynamic components); see the sketch below.
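
As an illustration of that idea, here is a minimal sketch (table and column names are made up) of using LISTAGG over the Oracle dictionary to build the POV concatenation expression dynamically, excluding the account, period and plan type columns:

-- Sketch only: returns an expression like ENTITY||','||SCENARIO||','||VERSION
-- that can be embedded in generated SQL to build the POV column.
SELECT LISTAGG(COLUMN_NAME, '||'',''||') WITHIN GROUP (ORDER BY COLUMN_ID) AS POV_EXPR
  FROM ALL_TAB_COLUMNS
 WHERE TABLE_NAME = 'MY_PLANNING_STAGE_TABLE'  -- hypothetical source table
   AND COLUMN_NAME NOT IN ('ACCOUNT','PLAN_TYPE',
                           'JAN','FEB','MAR','APR','MAY','JUN',
                           'JUL','AUG','SEP','OCT','NOV','DEC');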

And to wrap up, we need to inform which plan type the data will be loaded into, which is very nice: you can load all your plan types at once. One job, one file and everything can be done by one generic component in ODI.

After that, we need to choose the delimiter used in the file. In this case, I chose pipe because it’s very hard to find this character in the middle of any metadata.

Finally, we just need to inform the name of the file. That’s all we need to create a job that we can call using EPM Automate anytime we need it.

ODI Hidden Gems – Unique temporary object names

Posted in Gems, ODI, Tips and Tricks on March 11, 2021 by Rodrigo Radtke de Souza

Hi all, I was not going to write about this one because I thought that this “hidden” gem was already known to every single ODI 12 developer out there, but I still get questions about why some specific data loads fail when they run in parallel and work fine when they run in serial. Most of the time, those failures are related to how ODI handles the temporary objects that it creates to do ETL (like C$, I$ and E$ tables).

Let us see one example, which is the default in ODI. I have created one very simple mapping that has one source and one target table.

This mapping is loading data between two databases that reside in different data servers, so it will need to create a C$ table to be able to transfer the data. If we look at ODI Operator, we will notice the following:

The C$ table that it created is named as C$_0SECTIONTYPE. By default, ODI will create this name based on the source component that the data was generated from, so in this case it was a table called SECTIONTYPE. The “0” in front of it is an incremental number that would increase if you had another source with the same name in the mapping. For example, if you had SECTIONTYPE mapped twice as source tables, one would be loaded as C$_0SECTIONTYPE and the other C$_1SECTIONTYPE. ODI does that so we do not have a clash between names within the same ODI mapping.

However, what would happen if you tried to run the same mapping or another mapping that also contains SECTIONTYPE as source at the same time? As you may imagine, one mapping would interfere in the other, since both C$ tables would be called C$_0SECTIONTYPE and both mappings would be trying to load/read/drop it at the same time, which would cause a failure (in a good scenario) or wrong data (in a bad scenario).

To avoid this kind of issue, ODI 10/11 developers got very creative in the past and would create some Java variables and tweaks in some ODI KMs to make the temp table names dynamic. ODI 12, however, introduces something way simpler to handle this situation, but it is not the default and it is kind of “hidden”. If you go back to the mapping, click on the Physical tab and scroll all the way down, you will notice an unchecked check box that says “Use Unique Temporary Object Names”. If you check it and run the mapping again, you will see the difference.

ODI now created a table named C$_0SECTIONTYPEAIHLNQMAVPK7Q1FR66UM225DF3, which is totally unique and will never clash with another mapping running in parallel. But then another question arises: do I need to go to each mapping and check this option if I want all of them to be unique? Well, the answer is no. There is another “hidden” gem that you can use.

Go to ODI Topology and double click your ODI agent, then go to Properties.

Luckily, you may enable the use of unique temporary objects at the ODI agent level, so you do not need to go back to each mapping and change them.

That is it for today. See ya!