ODI in the hybrid database world – Files/Stages and SnowSQL

Posted in Uncategorized on April 14, 2022 by radk00

Hi all, as mentioned in the second post of this series, there is a faster way to load data into Snowflake using ODI. It will require the creation of new KM, it has some peculiarities that some may think as limitations at first but the speed of the data loading is totally worth the work and the preparations needed for making it work. First, lets just picture how is the current Snowflake JDBC process.

It’s simple and very straight forward. You create an ODI mapping that will read your on-premises DB and send data over to Snowflake using JDBC. If you have small data loads or if you are happy with the time that the jobs are taking using this method, I recommend you stay with it because its pretty simple and works.

Now, if you need speed due to the volume of data that you need to transfer, you may create the following architecture:

Let’s describe the steps. First ODI will be used (using a Mapping) to generate a text file. The format may be anything you like (it needs to match the Snowflake stage definition, as you will see below). I’m using a pipe | delimited file for this example. Then ODI calls SnowSQL client (more on that later) that will compress and push the file to a Snowflake STAGE area, which will then be finally copied over to the final table.

If you stop and think a little bit about it for a second now, it seems very stupid. You have the data in a database, you extract it to a file, then you call a process to compress/push over the internet, stage it then finally copies it. Its way more work than the first method, right? It is way more work, it also requires space to store the text file, however, its way faster then JDBC.

You see, the main bottleneck when working with the cloud is exactly the transfer over the internet. With the second technique, what we end up doing is to zip a large file and send it across the network all at once, instead of relaying in a Java JDBC connector that is buffering some X number of rows and sending it across repeatedly. The amount of work that the JDBC driver does internally is way more and way slower than just creating a file, compress it, send it.

Also, cloud structures are awesome on working with files. Every cloud provider out there makes it very easy and fast to manipulate “raw” data. Snowflake is no different. It will stage and copy the compressed text file in an extreme speed, way faster then batch of rows using JDBC.

If you are still not sure if you should follow this route, my answer is wait until you really need to create a fast process and give it a go. You may do a test by simply extracting the data to a text file and run SnowSQL commands to push the data. You will see it will be super-fast.

Let’s see how we should implement this. First thing is that you will need to install SnowSQL client on your architecture (ODI agent server). This client will be the one called to execute things in Snowflake, including pushing the file and copying it. I won’t go over the details on SnowSQL, but you may read it all in this documentation from Snowflake.

Another thing that I’ll just assume that you know is how STAGES, PUT and COPY commands work in Snowflake. You may read about their documentation here, here and here.

Second step is to create a copy of the current “IKM SQL to File Append” and give it a new name, in my case its “IKM SQL to File Append – Snowflake PUT”. Delete some steps of it and leave only the ones below:

These ones will basically just create a file in a server. This file needs to have the same name as the table that you will want to load in Snowflake, plus the “.txt” extension (e.g., if you are loading CLIENTS table, you need to create CLIENTS.txt file in the server). The target Datastore definition may be anything you want, but I’m following this pattern:

Now you need to add only two more steps in the KM, as below:

Snowflake PUT

The target command is the following:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_put.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_put.err"

snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q 'PUT file://<%=odiRef.getTable("TARG_NAME")%>.txt @#P_STAGE_NAME auto_compress=#P_AUTO_COMPRESS parallel=#P_PUT_PARALLEL'

We can see that it is basically one OS command that is calling snowsql client. It is passing all the connection information in order to login and then it is issuing a PUT command to Snowflake. This PUT command is sending a text file to a stage area, with auto compress and a defined number of parallel workers. If you are familiar with ODI, you know that all of those # variables need to come from some place. You may implement in any way you want, but in my case, I did put a SQL statement in the command on source tab that returns all this information from a parameter table that is located in the on-premises database, like below:

I even added a CONFIG_CODE filter (which is a KM option that I added to this new KM), in the case of having multiple Snowflake configurations (which is very common to have). So, if you have multiple configs, you may add this option on your new KM and use it when you are creating a new mapping.

Snowflake Copy

This step is very similar to the one before. In the target tab we will have the following:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_copy.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_copy.err"

snowsql -c #P_CONNECTION_NAME -w #P_SNOW_WAREHOUSE -r #P_SNOW_ROLE -d #P_SNOW_DATABASE -s #P_SNOW_SCHEMA -q 'copy into #P_SNOW_DATABASE.#P_SNOW_SCHEMA.<%=odiRef.getTargetTable("TABLE_NAME")%> from  @#P_STAGE_NAME/<%=odiRef.getTargetTable("TABLE_NAME")%>.txt.gz'

This one is issuing a copy from Snowflake stage to the final table. On the source tab, we have the same SQL shown on the prior step:

And that’s it. You are ready to push data from on-premises to Snowflake in a very fast way. It gives you some work upfront, but I may guarantee to you that it’s worth it.

Thanks, see you soon!

ODI in the hybrid database world – Snowflake JDBC

Posted in Uncategorized on April 13, 2022 by radk00

This second post will talk about Snowflake integration with ODI. Let’s picture a scenario like the last post: your company has a large on-premises ETL/database footprint, but it is starting to move slowly to the cloud, in this case, Snowflake. You want to use the existing ODI architecture for this task, but Snowflake is not a technology that comes out of the box with ODI, so how could you do that? Let’s figure it out in this post.

Luckily a good friend of mine, Michael Rainey, wrote about it in his post here. I won’t go over the details because I don’t want to copy and paste what is already written there, but in a very resumed way you need to download the Snowflake JDBC driver, add it to the ODI agent, create/copy a new technology for Snowflake usage and that’s it. It will work just fine.

However, after working with it for some time, I found some details that I think its worth sharing with you. First, differently from the first post where Oracle is already a technology that ODI knows, Snowflake is not and for that reason you may start to face some small issues here and there regarding SQL statements for example. If the KMs or procedures that you are using are standard/universal SQL that both Oracle and Snowflake understands, it will work just fine. If the SQL is kind of different in Snowflake (meaning a different syntax) or it is Oracle exclusive, than you will need to start doing some customizations.  Luckily, most of these customizations should be very simple to adapt to Snowflake.

Another thing is data volume. For small workloads, it works well. For larger ones you will need to do some tweaks. One way to decrease the time loads is by playing with Array Fetch/Batch Update Sizes and Degree of Parallelism for Target, as you can see below.

Array Fetch/Batch Update Sizes are very hard to fine tune to an optimal value, since it depends on a lot of factors like length and size of the table, network, and so on. Sometimes you may fine tune for smaller tables, but larger tables suffer and vice versa. You will need to run some tests and see what the best value for your case is. However, Degree of Parallelism for Target is one that you may increase up to 20 without too much worry and you will see a huge gain. You cannot increase further because Snowflake (at least in my account) has a limit of 20 parallel threads working on the same object at a time.

If you want to see what is happening on the push of data to Snowflake, you may check the Load task of it in Operator and click on Details. It will show you the details and times that each thread took to execute it:

However, even with those tweaks in the Topology, I found myself into situations where the data load was just not fast enough. Upon doing some research and some testing, I figure out that there is a way to push data to Snowflake way faster than JDBC, and it is by using SnowSQL client. This one I’ll cover on the next post.

See you soon!

ODI in the hybrid database world – Oracle Autonomous Database

Posted in Uncategorized on April 13, 2022 by radk00

Hi all, today I’ll start a series of four post related to the ODI position in a hybrid database world. Everybody knows for quite some time that the cloud is the future. Some companies may delay its adoption, but it will eventually happen in a way or another. However, this adoption will probably not be all at once. Companies, especially the ones that have a large investment on-premises, will need to live in a hybrid mode until things get migrated, built, adapted. And this takes time, a lot of time.

Also, people often start thinking about migrating to the cloud by either:

  • Massively migrating the existing database/data to the cloud, which may sound very promising in the paper, but generally fails miserably when trying to implement, simply because cloud and on-premises are not the same thing (even if the marketing guys tells your boss that its all the same and the migration is a piece of cake).
  • Starting from scratch, which is great for new projects, but most people already have invested and need their on-premises architecture and don’t want to redo all the existing stuff again.

The truth is that companies will end up building something hybrid: whatever is new will be developed thinking on the cloud already but whatever already exists will be integrated (not migrated) into the cloud by stages, until up to some point that the old process either gets converted completely or gets replaced by something else new on the cloud.

For those that had ODI as their ETL tool on-premises, they will find it easy to integrate things to the cloud using whatever they have today. This is because ODI is great to incorporate technologies that does not come out of the box in an easily matter. For this series of four posts, I’ll be talking about the following:

  • Integrating with Oracle Autonomous Database
  • Integrating with Snowflake JDBC
  • Integrating with Snowflake – Files/Stages and SnowSQL
  • Integrating with Google Big Query

For this first post, let’s start with “Integrating with Oracle Autonomous Database” just because its extremally easy to do. Let’s imagine the scenario. You already have a large ETL architecture on-premises and your company started to use Oracle Autonomous Database as their cloud solution. Instead of migrating all at once, they will do it by stages, leveraging everything that they already have build and pushing only essential data to the cloud. Since its a hybrid approach, maybe they even want to get data from the cloud to the on-premises database, to support some existing application.

First thing to do in ODI is to create a new Data Server in the Oracle Technology:

Add the user and password that will be used to connect. Now, instead of adding the JDBC details, as we usually would do, click on “Use Credential File”:

You will need to point to the file that has the connection to your cloud DB. To get this, go to your Oracle DB instance in the cloud, click on DB Connection and download the wallet file.

Add a password to it:

Save the Zip file and go to ODI. On Credential File select the zip file that you just downloaded. If the file is correct, you will be able to select the Connection Details below:

And its done. If you go to JDBC URL, you will see that ODI automatically populate all the info for you:

Click Test Connection to make sure all is correct, and you are good to go:

From this point on, since its Oracle, its all the same. You may do whatever you want with this database because its Oracle. The only difference is that is located somewhere in the cloud and not on-premises. One thing to notice though, is that, since its on the cloud, it will have network constraints. Data volumes will take time depending on several factors that are beyond this post and depends on each companies’ architecture. But the main thing is that you may create ODI mappings and procedures, and push/get data to/from the cloud as needed and in a very simple way.

See you son!

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 6: Database Design and EPM Automate)

Posted in PBCS with tags on March 29, 2021 by radk00

Hi all! Before jumping straight to ODI, let us take a moment to talk about how our database design will look like to be the “middle tier” between our sources and PBCS. We will split our design in 4 parts:

  • Sources: Can be any kind of source (Oracle, SQL Server, Teradata, File system, FTP…)
  • Stage Area: Where we’ll work the data
  • DW: Will have all the information in the right format that PBCS expects
  • File System: The data/metadata from the DW will be exported in the right format/layout and then zipped

While source and stage areas are common across several architectures out there, we need to take a moment to talk about the DW layer, where we have the DW table which will contain the data that we want to load to PBCS and a Metadata validation table. Before we can load data into PBCS we need to make sure that there is no invalid member on the file, and therefore we create a job to export metadata from PBCS in the first place. All existing metadata will be stored in a metadata table with the follow format:

Generally, we build this table to be partitioned by App_Name or Plan_Type, so we can retrieve its information in a faster manner, but it all depends on the size and the number of your applications. With this approach, we may validate all data in the DW tables against the Metadata table and remove/fix it before sending it to PBCS.

The “auxiliary” tool that we will use to send/extract data between our DW and PBCS is EPM Automate. The EPM Automate utility will act as a bridge between ODI and PBCS as it enables Service Administrators to automate many repeatable tasks including (but not limited to) the following:

  • Import and export metadata and data
  • Refresh the application
  • Run business rules
  • Upload files, list files and delete files from PBCS
  • Copy data from one database to another
  • Run a Data Management batch rule
  • Export and import application and artifact snapshots
  • Import pre-mapped balance data into Oracle Account Reconciliation Cloud
  • Import currency rates, pre-mapped transactions, and profiles into Oracle Account Reconciliation Cloud
  • Copy profiles to a period to initiate the reconciliation process
  • Deploy the calculation cube of an Oracle Hyperion Profitability and Cost Management Cloud application
  • Clear, copy, and delete Point of Views in Profitability and Cost Management applications
  • Export and import template in Profitability and Cost Management applications
  • Replay Oracle Smart View for Office load on a service instance to enable performance testing under heavy load

Next post we will show one example of how to Load Data using EPM Automate with ODI. Stay tunned!

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 5: Import Metadata Jobs)

Posted in Uncategorized on March 26, 2021 by RZGiampaoli

Hey guys, how are you? Continuing the how to integrate PBCS seamlessly using you existing ODI environment series (Part 4 Here), today we’ll talk about Importing Metadata Jobs.

As you can imagine, import Metadata will also be simple in PBCS, we just need to pay attention in the file format and that’s it.

To import Metadata, we need:

As usual, first we need to select our inbox, then the name of the files for each dimension, the delimiter for each dimension and if you want to clear the members before load the new members you just need to check Clear Members.

That’s it, all dimensions will be load at once, but we need separated files, one for each dimension. These files can also be in zip format that PBCS will automatically unzip for us.

Now the important part, the file format. This are all properties that PBCS expect when loading metadata.

That’s it. With that we finally finished all the setups we need in PBCS for our ODI jobs to work. One import thing that I need to point out is that Oracle will update PBCS with new versions and the file format can change over time. If that happens, you’ll need also to update your ODI Jobs.

It happened to me during more than one project and is not a big deal, but you need to be aware that if your job starts to fail, this is what can be happening.

I hope you guys enjoy it, stay safe and see you soon.

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 4: Outbound Jobs)

Posted in Uncategorized on March 22, 2021 by RZGiampaoli

Hey guys, how are you? Continuing the how to integrate PBCS seamlessly using you existing ODI environment series (Part 3 Here), today we’ll talk about Outbound Jobs.

In the same ways as the Inbound Jobs, the we need to create a job to extract both data and metadata for ODI to consume and populate our DW as well as use the Metadata for validation.

So, to extract data from PBCS is also easy. First, we need to choose the outbox location to enable save as job (Local does the same as for the Inbound job and enable you an once execution only).

For the outbound we need to choose the plan type, that means, we’ll need at least one job for each plan type. We also need to choose the delimiter (I always like pipeline because is easy to see and is not used in any command) and if you use smart list you can choose if you want to export the labels or the names.

And finally, you set the POV to be exported. You can use essbase substitution variables here if you want to and as you can see the export will also have the same format as planning import, accounts on the rows, periods on the columns and the POV (the plan type as well).

You can change the format if you wish, but I advise to maintain the consistency between the jobs for the sake of dynamic components.

After running this job, PBCS will generate a zip file in his outbox, we just need to go there and download it.

For the Metadata export the idea is the same but a little bit simpler than the others, we just need to select our outbox, the dimensions you want to export and the delimiter and that’s it. PBCS will create one zip file per dimension in our outbox.

That’s it for today. I hope you guys enjoy it, stay safe and see you soon.

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 3: Inbound Jobs)

Posted in Cloud, EPM Automate, ODI, PBCS, Uncategorized with tags , , on March 18, 2021 by RZGiampaoli

Hey guys, how are you? Continuing the how to integrate PBCS seamlessly using you existing ODI environment series (Part 2 Here), today we’ll talk about Inbound Jobs.

We already know that we’ll need 4 type of jobs for our integration then let’s see what we need to do to create an Inbound Job:

The first thing we need to do is to set the location as inbox (or outbox in case of an extract). This is what enables the button save as a job. If you choose Local, it enables you to run just for one time.

Second, source type. We have 2 choices there, Essbase and Planning. Essbase has the same format that we are used with Essbase exports, the problem is that if we select Essbase we’ll need to have one job per Plan type.

Planning in the other hand has a special format that is very handy for create dynamic objects in ODI as we can see in the table below:

As we can see, the planning format has a column for the account, then the periods, a POV column and the data load cube name (plan type).

The periods on the column is already a nice thing to have since will be way fester to load and way more optimized since we’ll have one how for the entire year.

Of course, this will only be true when we are talking about forecast because actuals normally, we load just one month per time… even so I prefer this format then have one column for the period and another for data.

The POV is also nice because doesn’t matter how many dimensions we have in one plan type, everything will be in the same column, then for integrate is simple since we just need to concatenate all columns but accounts in one column. I recommend using Regular expression and LISTAGG to do so (best function ever for dynamic components).

And to end we need to inform to each plan type that data will be loaded, very nice, you can load all your plan types at once. One job, one file and everything can be done by one generic component in ODI.

After that we need to choose the delimiter used in the file, in this case, I choose pipeline because it’s very hard to have this character in the middle of any metadata.

Finally, we just need to inform the name of the file. That’s all we need to create a job that we can call using EPMAutomate anytime we need it.

ODI Hidden Gems – Unique temporary object names

Posted in Gems, ODI, Tips and Tricks with tags , , on March 11, 2021 by radk00

Hi all, I was not going to write about this one because I thought that this “hidden” gem was already known to every single ODI 12 developer out there, but I still get questions on why sometimes some specific data loads fails when they run in parallel, and they work fine when they run in serial. Most of the times, those are related to how ODI handles the temporary objects that it creates to do ETL (like C$, I$, E$ tables).

Let us see one example, which is the default in ODI. I have created one very simple mapping that has one source and one target table.

This mapping is loading data from two databases that resides in different data servers, so it will need to create a C$ table to be able to transfer the data. If we look at ODI Operator, we will notice the following:

The C$ table that it created is named as C$_0SECTIONTYPE. By default, ODI will create this name based on the source component that the data was generated from, so in this case it was a table called SECTIONTYPE. The “0” in front of it is an incremental number that would increase if you had another source with the same name in the mapping. For example, if you had SECTIONTYPE mapped twice as source tables, one would be loaded as C$_0SECTIONTYPE and the other C$_1SECTIONTYPE. ODI does that so we do not have a clash between names within the same ODI mapping.

However, what would happen if you tried to run the same mapping or another mapping that also contains SECTIONTYPE as source at the same time? As you may imagine, one mapping would interfere in the other, since both C$ tables would be called C$_0SECTIONTYPE and both mappings would be trying to load/read/drop it at the same time, which would cause a failure (in a good scenario) or wrong data (in a bad scenario).

To avoid this kind of issue to happen, ODI 10/11 developers were very creative in the past and would create some Java variables and some tweaks in some ODI KMs to make the temp table names dynamic. However, ODI 12 introduces something way simpler to handle this kind of situation, however, its not default and it is kind of “hidden”. If you go back to the mapping, click on the Physical tab, and scroll all the way down. You will notice a check box that says “Use Unique Temporary Object Names” that is unchecked. If you check this one and run it again, you will see the difference.

ODI now created a table named C$_0SECTIONTYPEAIHLNQMAVPK7Q1FR66UM225DF3, which is totally unique, and it will never clash with another mapping running in parallel. But then, another question arises: should I go to each mapping to check this option, if I want all to be unique? Well, the answer is no. There is another “hidden” gem that you can use.

Go to ODI Topology and double click your ODI agent, then go to Properties.

Luckily, you may enable the use of unique temporary objects at an ODI agent level, so you do not need to go back to each mapping and changing them.

That is it for today. See ya!

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 2: PBCS)

Posted in Uncategorized on March 10, 2021 by RZGiampaoli

Hey guys, how are you? Continuing the how to integrate PBCS seamlessly using you existing ODI environment series (Part 1 Here), today we’ll talk about PBCS itself.

For us to export or import data from PBCS, we need to prepare our PBCS environment to receive and send all the data we need. Let’s take a look on this.

PBCS has 2 very different interface:

  • The regular interface, same as planning
  • And the Simplified interface (That planning also have) but in PBCS it’s way more useful

The regular interface is almost exactly like Planning one, but the important thing is that we can do anything we need to configure the environment on it. We need to use the simplified interface.

The simplified interface has some new features like be optimized to a tablet or a smartphone but the main feature we are looking for is the inbox/outbox. The inbox/outbox is the only point of contact with the external word and everything that need to be loaded needs to be uploaded to the inbox first and everything that is extracted from PBCS will be available in the outbox to be downloaded.

Knowing this we’ll need to setup for our design:

  • Data and Metadata load
    • Upload the file with the data to the inbox.
    • Create a job to import the data.
    • Execute the job.
  • Data and Metadata Export:
    • Create an export job (For data export is possible to create a Business Rules to export the data directly to the inbox folder: /u03/lcm/)
    • Download the files
    • Unzip and import the file with ODI

For Data/metadata load we need first to upload a file to the inbox, then we need to have an import data/metadata job created and finally we need to run the job to load.

The file can be in .zip format that PBCS automatically unzip the file. This is very good since we may move some big files around

For Data/Metadata extract, we need first to create an export job and run it (you can do data export using the business rules, same old dataexport from essbase, and the path to export would be /u02/lcm/), download the file and finally use ODI to unzip it and load it to our tables.

But what are jobs? Jobs are basically templates with information about what you want to do. For example, a data export Job will contain the POV of the data you want to export, and every time you ran the job that POV will be exported.

In the console menu we have everything we need to create and monitoring our jobs. All data Jobs are created clicking in the Actions menu, Export and Import options and to check the job results you can go to inbox/outbox explorer.

And that’s it for today. I hope you guys enjoy and next post we’ll create all the jobs we need. Stay safe and see you soon.

How to use your existing ODI on premise to seamlessly integrate PBCS (Part 1: Solution)

Posted in Uncategorized on March 9, 2021 by RZGiampaoli

Hey guys, how are you? Today we’ll going to talk about how to integrate PBCS seamlessly using you existing ODI environment.

I said PBCS but this approach can be user for any cloud application available out there (with just changing the way of connect/upload data/API that needs to be used to integrate). Other than that, everything else is valid.

Let’s think a little bit about PBCS. PBCS is a close box that can only be accessed by HTML, SFTP, REST API or EPM Automate. Then for integration we basically have 2 options, REST API and EPM Automate since everything else is way too manual to be called integration.

Both will work in almost the same way, then to keep it simple we’ll be using EPM Automate here. In our case we have a ODI on premise designed like this:

  • We have a lot of different sources that are loaded in our stage area.
  • In the stage area we do whatever transformation we need in the data before we load it to the DW schema
  • During the load into our DW we validate all the data POV against the metadata in Planning or Essbase.

From everything described here the only two changes we need to do is the source of the metadata for validation and the target of the data load.

For the metadata, instead of reading the Planning repository or extract the metadata from Essbase, we’ll export from PBCS the metadata and load it to a table, and this is what we’ll use for validating the data before load.

This step is very important because we don’t want to get errors during data load and slow down the process we are creating. Also, it gives the business a fallout report that they can work to fix the invalid members in PBCS.

For the data load, instead of loading this data directly to Planning we’ll be exporting the data in the right format to a txt file, zip them, and use EPMAutomate to load it to PBCS.

Remember, you chose the REST API to do so, the only thing you need to do differently is a little bit of programing, but the logic will be the same.

Then in the end we’ll have a design like this:

As I said, we can see by this schema that any cloud app can be integrated in kind of the same way. We can have everything we have until today and we just need to figure out the bridge we’ll have between our on-premises DB and our target app.

If the target app is VPN enable, everything get’s way easier since for ODI, this will be just another source, in fact, we could even have DB Links between our environments.

Then keep your mind open for the approach here more than the content we are talking about (PBCS).

I hope your guys enjoy this quick start of our series and see you soon.