Archive for ODI

Fragmented and Aggregated tables in OBIEE using ODI Part 3/5: Populating the Fragmented tables

Posted in ACE, Data Warehouse, Hacking, OBIEE, ODI, ODI Architecture, Oracle, Oracle Database, Query, SQL on February 11, 2020 by RZGiampaoli

Hey guys, how are you?

Continuing the series Fragmented and Aggregated tables in OBIEE using ODI, today we are talking about how to populate the Fragmented tables using ODI.

Just to make it easier for you to navigate this series, here are its parts:

Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and also how to further improve performance using partitioning and sub partitioning.

Managing the partitions of the Fragmented/Aggregated tables: Here you’ll see how to manage all partitions using ODI.

Populating the Fragmented tables using ODI: Here you’ll find how to create generic components to load all fragmented tables we need to.

Populating the Aggregated tables using ODI: Here you’ll find how to create generic components to load all Aggregated tables we need to.

Setting the OBIEE Repository: Finally, we'll set up the OBIEE repository to make use of all these tables.

In my previous posts we designed our 18 tables and the partition management process. Now we'll talk about how we are going to load data into our fragmented tables. As you can imagine, the code for the 6 fragmented tables will be the same; the only difference is what goes to each table. So what we need ODI to do for us is to dynamically filter the data coming from the same query and load it into different tables based on a rule.

If you are following this series from the beginning, you probably already guessed what we'll need to do here: the same thing we did for the partition management process. Well, almost the same thing, because we'll do something a little more optimized for the data load, since this is the most time-consuming step.

The main difference from the approach I explained in the previous post is that that one loops one statement for each row returned by the Command on Source in SERIAL, meaning we need to wait for one loop to finish before the next one starts. For the partition management that's OK because everything there is very fast, but for this load we can do something better.

We don’t need and we don’t want to wait for a serial execution because we have one table for each Loop we’ll perform (This is not the case but even if we have only one table, if we have partitions and/or Sub-Partitions, we can Loop the code by the Partition/Sub partition level and load everything in Parallel. The only case we can’t load in parallel is if we have a single table without partition).

In our case we’ll going to loop all the 6 tables load in parallel. To do that, we are going to use the same approach as before but instead of having the ALTER TABLE in the Command on Target, we’ll going to have an OdiStartScen, to call a Scenario multiple times in PARALLEL. Let’s take a look in the Command on Target:

As we can see, the Command on Target contains an ODI tool called OdiStartScen that is used to call a scenario. We have a few parameters there to set how we'll call the scenario and, in this case, the important one is SYNC_MODE=2, which means everything will execute in parallel (1 is serial). Everything else is variables that we are using to pass information from the Command on Source to the called scenario.

A quick tip before we continue: if you are wondering how in heavens I keep all these settings in mind, for your information I don't, but there's a very easy way to find them out. Just drop an OdiStartScen from the ODI Toolbox into any part of your scenario and fill it with the information you need:

In the General tab we set the scenario we want to call, the version of the scenario (-1 always picks the latest version you have), the Synchronous/Asynchronous mode (serial or parallel) and the Session name, in case you want a different name showing up in your executions (always a good idea since you are looping the same code with different parameters; in my case I pass as variables the name of the table, the period and the hierarchy that scenario will be loading).

In the Additional Variables tab you can set all the variables you need to use inside the scenario (as you can see, in my case I have a lot of variables because of the complexity of my sources; your case will be different).

And after you set everything, you just need to click on the Command tab to get the code that ODI generates based on the settings you chose:
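Just to give an idea of the shape of that generated code (the scenario, project and variable names below are placeholders, not the real ones from this project), it looks something like this, with SYNC_MODE=2 for the parallel call:

OdiStartScen "-SCEN_NAME=LOAD_FRAGMENTED_TABLE" "-SCEN_VERSION=-1" "-SYNC_MODE=2" "-SESSION_NAME=LOAD_#TABLE_NM" "-MY_PROJECT.TABLE_NM=#TABLE_NM" "-MY_PROJECT.PERIOD_ID=#PERIOD_ID" "-MY_PROJECT.HIER_NM=#HIER_NM"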

This generated code is what you need to paste into the Command on Target to call the scenario. After that you can delete the OdiStartScen object from your scenario and continue the development. Another advantage is that the reverse also works: if you have the code and want to create an OdiStartScen step in a scenario, you can paste it into the Command tab and the General/Additional Variables tabs will be filled in based on the code.

With the Target code ready we just need a query that will pass all the information needed to call the scenario. In my case, I pass a lot of information to the target, as you can see by the number of variables I have. Also, the variable values don't need to come only from the Command on Source. If you have variables in your scenario that are the same for all the loops, you can use those variables directly to pass their values to the scenario the Command on Target is calling. The Command on Source only needs the information that will dynamically change in each loop.

In my case I’m passing all kind of information like the SESSION_NAME (the name of the session that we use to create parameter specific for each session we have and use the same variables for all, more information HERE) but the important one for us is the TABLE_NM.

In my case, the other variables change values depending on the table we need to load in each loop. To do that, I insert all the information I need to pass to the scenario I'm calling into a parameter table (which I like to call ODI_PARAMETERS). This way I can easily query this table to get the specific information I need for each loop I'm going to perform.
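Just as a sketch of that Command on Source (ODI_PARAMETERS and its columns here are illustrative, since your parameter table will have its own layout), it could be a query along these lines, returning one row per table to be loaded:

SELECT PRM.TABLE_NM,
       PRM.SESSION_NM,
       PRM.HIER_NM,
       PRM.PERIOD_ID
  FROM ODI_PARAMETERS PRM                     -- hypothetical parameter table
 WHERE PRM.PROCESS_NM = 'FRAGMENTED_LOAD'     -- hypothetical key for this process
 ORDER BY PRM.TABLE_NM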

With all this set, we just need to create a child scenario and add a "Declare Variable" step for each variable we want to pass to the internal scenario, like this:

As you can see, this is a very simple scenario that contains just one procedure, which will be used to load the data into our tables.

This approach works with interfaces as well, in fact with anything at all. We can have a main scenario that loops a very complex scenario, which loops another scenario, and so on. This is a very nice way to loop something because we have a lot of control over this kind of loop; we can change the behavior of each iteration based on the info we get from the Command on Source.

Also, a very good feature is that if the source query doesn't return any rows, the Command on Target is not executed at all, so you'll always have a clean execution. For example, if you have a folder that can hold 0 to X files and you want to create a scenario to load them, the best approach would be to read the folder's file list (an OS command writing to a file and then loading that file, or Java, or Python, or…) and store the info in our ODI parameter table. Then we create a procedure to call a scenario containing the interface that loads each file, and finally we just do a SELECT in the Command on Source to get the names of the files in the folder. With that, if we have 0 files, nothing is executed, but if we have 1000 files, we'll loop the scenario 1000 times, each time passing the name of a different file to the internal scenario. A very neat way to do it.

Inside the procedure we just need to create the query that loads the data, using the variables to filter the right data and load the right table. This is not my case, but if you have a different source for each table you want to load, just insert this information into the parameter table and pass it to the internal scenario. Use these variables to replace the table names in the SQL and you are done.
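As a minimal sketch of that statement, assuming a generic staging object as the source (the object and column names are illustrative; remember that string variables need quotes while numeric ones don't):

INSERT INTO #TABLE_NM
SELECT SRC.*
  FROM JAT_JE_STAGE SRC                     -- hypothetical staging table/view holding all sources and hierarchies
 WHERE SRC.ACCOUNT_HIER_NM = '#HIER_NM'     -- string variable, so it is quoted
   AND SRC.PERIOD_ID       = #PERIOD_ID     -- numeric variable, no quotes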

And that’s it. This is all we need to do to load all our 6 fragmented tables. In my next post I’ll be talking about how we can aggregate our data and load our Aggregated tables.

I hope you guys enjoy and see you soon.


Fragmented and Aggregated tables in OBIEE using ODI Part 2/5: Managing the partitions of Fragmented/Aggregated tables

Posted in OBIEE, ODI, ODI Architecture, Oracle, Oracle Database, Performance, SQL, Tips and Tricks on February 10, 2020 by RZGiampaoli

Hey guys, how are you?

Continuing the series Fragmented and Aggregated tables in OBIEE using ODI, today we are talking about how to manage the partitions of the Fragmented/Aggregated tables using ODI.

Just to make it easier for you to navigate this series, here are its parts:

Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and also how to further improve performance using partitioning and sub partitioning.

Managing the partitions of the Fragmented/Aggregated tables: Here you’ll see how to manage all partitions using ODI.

Populating the Fragmented tables using ODI: Here you’ll find how to create generic components to load all fragmented tables we need to.

Populating the Aggregated tables using ODI: Here you’ll find how to create generic components to load all Aggregated tables we need to.

Setting the OBIEE Repository: Finally, we'll set up the OBIEE repository to make use of all these tables.

In my previous post we designed 18 tables and their partitions and sub-partitions. Now what we need to do is make ODI manage these partitions for us. As I said before, the sub-partitions will be static, and if we need to create one more sub-partition, we'll do that manually. That's always the best choice because managing sub-partitions adds a layer of complexity that is not needed.

With that said, if you do need to manage the sub-partitions, you can use exactly the same approach I'll show here, just adapted to the needs of sub-partition creation and sub-partition template updates.

Let’s start analyzing what needs to be done. For each table we need to issue this command to the database:

For List partitions:

               ALTER TABLE table_name ADD PARTITION partition_name VALUES (value[, value]…);

For Range Partitions:

               ALTER TABLE table_name ADD PARTITION partition_name VALUES LESS THAN (value[, value]…);

Sounds hard, but in fact it's very simple and it will work with any SQL database you have. To do that we will use a procedure with our ALTER TABLE statement in the Command on Target tab and a SELECT in the Command on Source tab to pass the information we need to create the partitions, like the table names, the partition names and the partition values.

For each row that the SELECT on source returns, the statement in the Command on target will be executed, and we can pass the values from the SELECT in the Command on Source to change the behavior of the statement in the Command on target.

This is something very important that we need to understand about ODI: ODI is not an ETL tool but a code generator that orchestrates databases and a lot of other tools, if you will.

What I mean by that is, until ODI sends the code to the database to execute, everything ODI does is generate code. If we use the ODI substitution API it resolves it before sending the code to the database, and it also replaces the variables with their values, meaning a variable like #SQL in the Command on Target tab could contain an entire dynamically generated INSERT that ODI would substitute before sending it to the database.

And this is why ODI is so powerful: we can manipulate strings and make everything dynamic.

Then our query in the source should return something like this:

The query will change depending on the design of your application, so I don't think sharing my exact query would help. One tip I can give you is to always compare the name of the partition with the content of the PARTITION_NAME column of the ALL_TAB_PARTITIONS view, to see if the partition you want to create already exists, and so prevent an error in ODI. In fact you have two choices: either do this or flag the step with Ignore Errors so it ends with a warning. I prefer to compare and end the step without warnings.
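Just to give an idea of the shape (only ALL_TAB_PARTITIONS is a real Oracle view here; the table list and calendar tables are illustrative), the Command on Source could be something along these lines:

SELECT TBL.TABLE_NM,
       CAL.PARTITION_NM,
       CAL.MAX_PERIOD
  FROM MY_TABLE_LIST TBL                -- hypothetical list of the fragmented tables
 CROSS JOIN MY_PERIOD_CALENDAR CAL      -- hypothetical calendar: one row per month with its partition name and boundary
 WHERE NOT EXISTS (SELECT 1
                     FROM ALL_TAB_PARTITIONS ATP
                    WHERE ATP.TABLE_NAME     = TBL.TABLE_NM
                      AND ATP.PARTITION_NAME = CAL.PARTITION_NM)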

Another thing to notice is that for FY20 Feb, my PARTITION_VALUE (MAX_PERIOD) is one day after the last day of the month. It's hard to see here because the calendar used by the client is all different, but if you have a normal calendar and the mask for your PERIOD_ID is YYYYMMDD, for 2020/Feb your partition value needs to be 20200230 (Feb ends 02/29/2020).

That’s because this is a RANGE partition, and for RANGE partition we define the partition value as VALUES LESS THAN, what means that everything under 20200230 will be FEB unless we have a smaller partition (JAN) below it. But for example, if you smallest partition is 2018/Jan, everything below that period will be inserted in the 2018/Jan partition, then you need be careful with when you’ll start to create partitions.

Another important thing to mention is that you cannot add a partition smaller than an existing partition, only bigger ones. That means if you start with 2018/Jan, it's impossible to create a 2017 partition; the only way is to drop the table and recreate it starting at 2017. RANGE partitions need to be sequential and always growing.

Having these values returned from the SELECT in the Command on Source tab, the only thing left is to use them in the statement in the Command on Target tab, like this:

ALTER TABLE #TABLE_NM ADD PARTITION #PARTITION_NM VALUES LESS THAN (#MAX_PERIOD);

Pay attention to the values of the partitions. In this case the values are numeric, which is why I'm not using quotes. If the values were strings, we would need to enclose the variable in quotes.
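For example, after ODI substitutes the values, the statement sent to the database would look something like this (the partition name is just an illustration; the 20200230 boundary follows the rule described above):

ALTER TABLE JAT_JE_DETAIL_D_BS_FACT ADD PARTITION FY20_FEB VALUES LESS THAN (20200230);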

And that’s it, this is all we need to do to manage partitions using ODI. Because the SELECT on the source returns All tables with their partitions name and values and we pass to the TARGET statement this information, for each row in the source the target will be looped, creating the partitions you need for each table.

We are done with the fragmented tables. Now we need to manage the partitions of the aggregated tables. The approach is exactly the same; the only things that change are the content of the query in the Command on Source tab and the fact that the partition now will be a LIST partition instead of a RANGE partition.

We could have written just one query returning all values, including whether each one is a RANGE or a LIST partition, but for the sake of simplicity it's always better to split the scenarios and not overcomplicate something simple.

For the month level we need the query in the Source to return:

And the statement in the Command on Target should be like this:

ALTER TABLE #TABLE_NM ADD PARTITION #PARTITION_NM VALUES (#MAX_PERIOD);

That’s all that needs to be done. For the quarter level the results are similar:

And the statement in the Command on Target should be like this:

ALTER TABLE #TABLE_NM ADD PARTITION #PARTITION_NM VALUES (#MAX_PERIOD);

As you can see, it’s very simple to manipulate and manage the database with ODI, we just need to create a query with the information we want in the source, and replace the target statement with that information, that’s all we need to do.

Another important thing we need in our procedure, in case you want to go with the truncate/insert approach, is to truncate the partitions before the load. This is also managed by ODI and works in exactly the same way as the partition creation, but a little simpler since we don't need the partition values:

And the statement in the Command on Target should be like this:

ALTER TABLE #TABLE_NM TRUNCATE PARTITION #PARTITION_NM;

For month level:

And the statement in the Command on Target should be like this:

ALTER TABLE #TABLE_NM TRUNCATE PARTITION #PARTITION_NM;

And for the Quarter Level:

And the statement in the Command on Target should be like this:

ALTER TABLE #TABLE_NM TRUNCATE PARTITION #PARTITION_NM;

Again, this could be done with just one query, but simplicity always works better than complexity.

This is all we need to start loading data. One last thing we could do, if it applies to your case, is to create a purge process to drop old partitions. If your business requires only 5 years of data, it's a good idea to drop the older partitions. The approach is exactly the same.

In fact, the results of the Command on Source are exactly the same as the ones we need for the TRUNCATE PARTITION steps; the only difference is that you need to have in place some logic to find the old partitions. Other than that, the only thing that changes is the statement on the Target, which should be:

ALTER TABLE #TABLE_NM DROP PARTITION #PARTITION_NM;

And that’s it, we just create a procedure to manage the partitions for all our 18 tables including create, truncate and drop old partitions.

In the next part of this series we'll talk about populating the fragmented tables.

I hope you find this helpful and see you soon.

Fragmented and Aggregated tables in OBIEE using ODI Part 1/5: Creating the tables

Posted in OBIEE, ODI, ODI 12c, Oracle, Oracle 11.2.0, Oracle Database, Performance, SQL, Tips and Tricks on February 7, 2020 by RZGiampaoli

Hey guys, how are you?

I normally don’t talk about OBIEE, despite the fact I work a lot with it, because there’s a lot of people talking about it. But recently I worked in a project and I decide to use fragmented tables and aggregated tables at same time and I found that we don’t have much specific information regarding this matter, and of course, zero information about using ODI to enable this project.

Since this will be kind of big, I decided to split it into 5 parts:

Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and also how to further improve performance using partitioning and sub partitioning.

Managing the partitions of the Fragmented/Aggregated tables: Here you’ll see how to manage all partitions using ODI.

Populating the Fragmented tables using ODI: Here you’ll find how to create generic components to load all fragmented tables we need to.

Populating the Aggregated tables using ODI: Here you’ll find how to create generic components to load all Aggregated tables we need to.

Setting the OBIEE Repository: Finally, we'll set up the OBIEE repository to make use of all these tables.

To start the series Fragmented and Aggregated tables in OBIEE, today we are talking about how to create the tables for our project.

In this project I had data from 4 different source systems, and each source system had a different amount of data: Dell data had 20 million rows per month, EMC data 30 million, DTC data 5 million and STAT data another 5 million.

The business wanted 15 quarters of data stored, so if we do a simple calculation we'll have: (20,000,000 + 30,000,000 + 5,000,000 + 5,000,000) rows per month * 45 months = 2,700,000,000 rows.

That’s a lot of data even for a partitioned table. Then I decide to split this tables in some way.

Analyzing the data, I saw that these tables contain 3 different account hierarchies, and the users only analyze one at a time: BS, Income and PL2. I decided to split the tables by hierarchy. Just with that I now had 3 fragmented tables: JAT_JE_DETAIL_BS, JAT_JE_DETAIL_INCOME and JAT_JE_DETAIL_PL2.

That’s nice, just with that I split that 2.7 trillion rows in 3. The problem now is that I have 4 different source system right, and even if I partition ate the table by source system, EMC is as big as the other 3 sources put together, and that would make that source slower than the other to retrieve.

With that in mind I decided to split the tables in 2 again, one for EMC and another for the other sources. That makes the data distribute evenly: 20,000,000 + 5,000,000 + 5,000,000 in the D tables and 30,000,000 in the E tables (D and E is what I called them).

So now we have 6 fragmented tables, right? 3 for EMC data (one for each account hierarchy) and another 3 for Dell, DTC and STAT data.

Now that we have a nice and even data distribution, we need to decide how we are going to create the partitions and sub-partitions for these tables.

For the partitions I always like to use the PERIOD_ID, because it makes sense in most cases (unless we are creating a Forecast app, in which case the users will want to see the data by scenario and not by a single period).

So for the tables JAT_JE_DETAIL_D_BS_FACT, JAT_JE_DETAIL_D_INCOME_FACT and JAT_JE_DETAIL_D_PL2_FACT we are going to partition by period. But what kind of partition are we going to use? Well, in this case these tables contain Journal information at day level and the users want to drill from the month to the Journal rows of that month. That means we'll always work inside a month, so the best way to go is a RANGE partition.

Range partitions, as the name already says, use a range of values to create each partition. In our case it'll be one month. The range can be built on a DATE column or a NUMBER column, and the second one is our case, since it's easier to manipulate the periods using a number than a date.

(The period for this project is not a regular month; for example, the month of February FY20 starts on Jan/30 and goes until Feb/2.)

For the sub-partitions it's an easy call: we'll use the Source System, since we have 3 different ones in these tables (Dell, DTC and STAT), and the sub-partition type will be LIST since we have a simple list of values. Also, for sub-partitions it's always easier to have values that don't keep growing, because of the complexity of adding new sub-partition values to a table.

What I mean is that to add a new partition you just need one ALTER TABLE ADD PARTITION statement, and you can add any number of partitions you need. But for sub-partitions it's different. When you define your sub-partitions Oracle keeps a sub-partition template, and that template is used every time you create a new partition.

In our case, for example, our sub-partition template would be DELL, DTC and STAT. That means that when you create another partition, it will be created with these 3 sub-partitions automatically. So if you want to add another sub-partition value you need to do 2 things instead of just one (as in the partition case).
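To make the idea concrete, here is a trimmed-down sketch of what one of these D tables could look like (columns, partition names and sub-partition values are illustrative, not the project's actual DDL):

CREATE TABLE JAT_JE_DETAIL_D_BS_FACT (
  PERIOD_ID         NUMBER(8)     NOT NULL,   -- numeric YYYYMMDD-style period
  SOURCE_SYSTEM_ID  VARCHAR2(10)  NOT NULL,   -- DELL, DTC or STAT
  ACCOUNT_ID        NUMBER,
  JOURNAL_AMOUNT    NUMBER
)
PARTITION BY RANGE (PERIOD_ID)
SUBPARTITION BY LIST (SOURCE_SYSTEM_ID)
SUBPARTITION TEMPLATE (
  SUBPARTITION DELL VALUES ('DELL'),
  SUBPARTITION DTC  VALUES ('DTC'),
  SUBPARTITION STAT VALUES ('STAT')
)
(
  -- starter partition only; the monthly partitions are added later by the ODI process
  PARTITION P_INITIAL VALUES LESS THAN (20180101)
);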

You need to change the sub-partition template definition to add the new sub-partition, and you need to add the new sub-partition to every single existing partition (they are not updated automatically). After that, all new partitions will contain all your sub-partitions. The two steps look like the sketch below.
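As a sketch of those two steps, assuming a hypothetical new source called NEWSRC (names are illustrative):

-- 1) Update the sub-partition template so future partitions get the new value
ALTER TABLE JAT_JE_DETAIL_D_BS_FACT
  SET SUBPARTITION TEMPLATE (
    SUBPARTITION DELL   VALUES ('DELL'),
    SUBPARTITION DTC    VALUES ('DTC'),
    SUBPARTITION STAT   VALUES ('STAT'),
    SUBPARTITION NEWSRC VALUES ('NEWSRC')
  );

-- 2) Add the new sub-partition to each partition that already exists (repeat per partition)
ALTER TABLE JAT_JE_DETAIL_D_BS_FACT
  MODIFY PARTITION P_INITIAL
  ADD SUBPARTITION P_INITIAL_NEWSRC VALUES ('NEWSRC');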

If you don’t update the template, every time you create a partition, you’ll not have the new value you want.

Enough theory, let's proceed with the other set of tables: JAT_JE_DETAIL_E_BS_FACT, JAT_JE_DETAIL_E_INCOME_FACT and JAT_JE_DETAIL_E_PL2_FACT. For these tables we'll continue to have the partitions as a RANGE on PERIOD_ID, but for the sub-partition it doesn't make sense to use the source system, because we have only one source system.

And now is the time to explain another thing about partitions. For Oracle to use a partition, you need to filter on the column that the partition is built on. For example, if we are partitioning by the column PERIOD_ID, we must filter the data by PERIOD_ID, and ideally we'd always have filters on both the partition and sub-partition columns, because if you don't, Oracle will do a full table scan anyway. So from a data retrieval point of view, the choice of partition columns is always made based on the filter requirements, meaning we need to choose columns that we can force the user to filter on in OBIEE. From an ETL point of view, the best would be to have partitions and sub-partitions on columns that we can use to truncate before the load.

For example, in our first set of tables I can easily and quickly truncate a period and a source system before I load the data back into the tables. That's the beauty of partitions: I can drop, truncate and create them on demand.
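For instance (the names below are just an illustration, assuming a monthly partition called FY20_FEB exists; Oracle names template-generated sub-partitions as <partition>_<template name>):

ALTER TABLE JAT_JE_DETAIL_D_BS_FACT TRUNCATE SUBPARTITION FY20_FEB_DELL;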

That said, you can imagine that sometimes what is good for the reports is not good for the DW, but that is not our case today.

With this new piece of information in mind, since the users are forced in the OBIEE prompts to choose a Gen 2 Account, this will be the sub-partition column for our second set of tables, also with the LIST type.

Before we proceed, let’s just see how everything is set then:

Table Name                       Partition           Sub Partition
JAT_JE_DETAIL_D_BS_FACT          PERIOD_ID (Range)   SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_INCOME_FACT      PERIOD_ID (Range)   SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_PL2_FACT         PERIOD_ID (Range)   SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_E_BS_FACT          PERIOD_ID (Range)   ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_INCOME_FACT      PERIOD_ID (Range)   ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_PL2_FACT         PERIOD_ID (Range)   ACCOUNT_GEN2_ID (List)

Perfect. Now the only thing missing here is the aggregated tables.

Aggregated tables are tables that use a dimension drill to pre-aggregate the values they contain. They are an easy way to improve performance during a drill operation.

The most common aggregated table is for the Period dimension, and this is the one we'll do here as well. In our case, as mentioned before, the 6 tables we have now contain day-level data. I want to have the data also aggregated by month and by quarter.

What that means is that we’ll need to have one more table for each one of these 6 tables for Monthly level data and one more table for each one of these 6 tables for Quarterly level data. These tables will going to have the same Partitions and Sub Partitions definitions with only one difference, now instead of using a Range Partition, we’ll going to use a List Partition because we don’t have a range anymore, we’ll going to aggregate that range of days in one month.

Then let’s see how the tables needs to be set:

Table Name                       Partition           Sub Partition
JAT_JE_DETAIL_D_BS_FACT          PERIOD_ID (Range)   SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_INCOME_FACT      PERIOD_ID (Range)   SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_PL2_FACT         PERIOD_ID (Range)   SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_BS_FACT        PERIOD_ID (List)    SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_INCOME_FACT    PERIOD_ID (List)    SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_PL2_FACT       PERIOD_ID (List)    SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_BS_FACT        PERIOD_ID (List)    SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_INCOME_FACT    PERIOD_ID (List)    SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_PL2_FACT       PERIOD_ID (List)    SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_E_BS_FACT          PERIOD_ID (Range)   ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_INCOME_FACT      PERIOD_ID (Range)   ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_PL2_FACT         PERIOD_ID (Range)   ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_BS_FACT        PERIOD_ID (List)    ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_INCOME_FACT    PERIOD_ID (List)    ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_PL2_FACT       PERIOD_ID (List)    ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_BS_FACT        PERIOD_ID (List)    ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_INCOME_FACT    PERIOD_ID (List)    ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_PL2_FACT       PERIOD_ID (List)    ACCOUNT_GEN2_ID (List)

Great, now we have 18 tables: 6 fragmented tables for daily-level data, 6 for monthly aggregated data (M) and 6 for quarterly aggregated data (Q).

The last thing about the tables is that we need a matching Period dimension (in this case) for each aggregation level. What that means is that, since we have 3 different levels of aggregation, we need 3 Period dimensions.

OBIEE don’t allow us to join the same table using columns that are not an ID column. Then what we need is one PERIOD dimension (DIM_PERIOD) that contains the drill from Year to day (Period ID will be YYYYQMMDD number), another PERIOD Month Dimension (DIM_PERIOD_MONTH) that contains Drill from Year to Month (Period ID will be YYYYQMM number) and one last PERIOD Quarter table (DIM_PEIOD_QUARTER) that contains Drill from Year to Quarter (Period ID will be YYYYQ number).

Of course, the aggregated FACT tables will also have a matching Period ID (and we need that to aggregate the data correctly).

The last thing to mention is that the detail fact tables (the ones at day level) contain a lot of Journal details like Journal number, Description, Header ID and others. For the detail level this is OK, but when we design the aggregated tables, only IDs are allowed there for maximum aggregation. If the table contained descriptions, nothing would get aggregated, right?

Don’t worry that OBIEE can handle this kind of situation very well.

That’s for now. I hope you guys enjoy it and found helpful.

See you soon.

Pushing files to GCS (Google Cloud Storage) using ODI and GSUTIL

Posted in Cloud, GCS, Google, ODI on January 6, 2020 by Rodrigo Radtke de Souza

Hi all, today’s post is a short one, but since I didn’t find any other post related to that in the Internet, I found it could be useful for someone else. Recently I came across a requirement to push CSV files to GCS using ODI. After some research, I saw that Google has a utility called “gsutil”, that is a tool that enables you to access Cloud Storage from the command-line. So, if something is accessible from a command-line, ODI can also do that for sure.

First we need to have gsutil installed on the server that ODI will use to run the OS command (in my case it was Windows). After that, we need to configure how gsutil will connect to our GCS instance. You will probably create a service account in GCS to be used by ODI that contains the correct read/write permissions, and from it you may download its private key file. This is a JSON file that contains a private key and all the login information that will be used to configure gsutil so it can authenticate with GCS. Save the JSON file on the ODI agent server and run the following command there:

gcloud auth activate-service-account <ACCOUNT_NAME> --key-file="<JSON_FILE_LOCATION>" --project=<GCS_PROJECT_NAME>

Now we are all set to start using ODI to issue commands to GCS. To push a CSV file to the cloud, we only need to create an ODI procedure, select "Operating System" as the technology and write a simple cp command in it, like below:

gsutil cp “<CSV_FILE_TO_BE_UPLOADED>” gs://<GCS_BUCKET_NAME>


When you run the procedure, the files will be pushed to the cloud. A very simple but powerful example of what we may accomplish with ODI + gsutil. For a complete list of gsutil commands, please see its documentation page.


Hope you liked it. Thanks!

ODI Hidden Gems – Static validations

Posted in Gems, ODI, Tips and Tricks on December 20, 2019 by Rodrigo Radtke de Souza

Hi all!

This post is based on a question that I answered in the Data Integration community forum some time back. The feature involved is powerful, but it is also somewhat "hidden". The question was like this (I have edited it for the purpose of this post):

I have one mapping that will load one target table (TARGET) from two different sources (SOURCE_A and SOURCE_B). Target table TARGET contains ITEM_KEY, ATT1, ATT2, ATT3 and so on. One Source (SOURCE_A) contains ITEM_ID, ATT1, ATT2, ATT3 values and we are using a look up to a dimension table (T_ITEM) to get ITEM_KEY for the ITEM_ID. Second source (SOURCE_B) contains CC_ID, ATT1, ATT2, ATT3 and there is a reference table (T_CC_ITEM_REF) which contains the mapping between CC_ID and ITEM_ID that we use to look up to the dimension table (T_ITEM) to get ITEM_KEY for the ITEM_ID.

Validating the first source is straight forward as we have declared a constraint in CKM to log foreign key error records which got logged as ‘Join Error between TARGET and T_ITEM’ in E$. For the second source, there is a “middle” table in the mapping, so we must join SOURCE_B to T_CC_ITEM_REF (reference table) and to T_ITEM (dimension table). I can still create a constraint and log foreign key errors between TARGET and T_ITEM, but is there a way that I can also be more specific and capture any join error between T_CC_ITEM_REF (reference table) & T_ITEM (dimension table). E.g. if T_CC_ITEM_REF is missing any mapping record or if any mapping is mapped to an inexistent T_ITEM row?

This is a very good example of "indirect join" validation, where you want to validate some source data that is used in your mappings, but those tables are not directly associated with your target table. In these cases you cannot directly validate them in a regular mapping, but ODI has a very nice feature called Static validation, which allows you to run any validation on any data model at any time, so you can catch all those errors before trying to load your target table.

Generally, we add constraints to the target table, so data gets validated during the load data flow, before hitting the actual target table. I said "generally" because there are situations (like the one we are talking about now) where we want to validate the source/mapping tables even before we try to load the targets.

Before getting directly into how to solve this issue, let's step back and briefly see the options that ODI has to offer regarding data validation. If you go to ODI models and expand any of them, you will see that you may add "Constraints" to them.


The first one is a "Key", which resembles a "Primary Key". You will add all the columns that belong to your table's primary key, and ODI will log all records that are not unique based on this key.

The second one is a "Reference", which resembles a "Foreign Key". You will add the column and table names that belong to a foreign key relationship, and ODI will automatically log all records that violate that relationship. If we go back to the question above, this works fine to validate our TARGET table against T_ITEM (the dimension) because there is a direct link between them. ODI will log all source rows that do not have a valid "join" to the T_ITEM table before loading them to the target (so the target has only "valid" records regarding that FK).

The third one is called "Condition" and it is the most flexible of all ODI constraints. Basically, it is free-form text where you may add any kind of SQL expression that you want to check. You just need to remember that you always want to write a statement that checks for TRUE values; for example, you test whether a column value exists, "is in" another table, or whether the values are greater or less than a specific value. When you execute your mapping, you will see that ODI "negates" your condition by adding a NOT before it, so it checks for all the records that are not TRUE (in other words, FALSE) and logs them in the E$ table. Pretty neat stuff.

Going back to our example, let's divide our problem into two pieces. First let's validate whether T_CC_ITEM_REF contains any kind of bad data, meaning we will check if it contains any mapping row that references a non-existing T_ITEM row. This can be achieved by going to T_CC_ITEM_REF and creating a New Reference.


There you point the “Parent Table” (in this case, T_ITEM):


In attributes, add the two columns that are used in the join condition:


Save it. Go to the parent Model of this datastore and check its Control tab. You will need to select which CKM will be used to run the Static validation (you cannot leave it blank).


Create a new package and “drag and drop” the model to the package. It will look like this:


This icon indicates that, once you run the package, ODI will run a Static validation on this datastore using the CKM that we set in its parent Model. Once we run it, this is what we get in Operator:


The warning sign on the scenario execution tells us that some error was logged in the E$ table. When we go check it, we see the following:


It means that ITEM_ID 50 does not exist in T_ITEM, so any source record with CC_ID equal to 5 would be dropped due to an invalid mapping row.

Now, what if we want to check whether all CC_IDs from the source have a valid mapping record? This is where the "Condition" constraint comes in handy, since it is very flexible and allows us to write virtually any kind of SQL logic. Let's go to SOURCE_B and create a Condition on it:


Add a name to it and select a type. An Oracle Data Integrator Condition is a condition that exists only in the model and does not exist in the database. A Database Condition is a condition that is defined in the database and has been reverse-engineered. In our case, let's pick ODI Condition. Then write the SQL statement that you want to be true. In this case, I want every source CC_ID to be not null and to also exist in the join between my mapping/dimension tables. Another cool thing about conditions is that you may write custom messages for them, so it gets clearer to the users what that validation means.
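Based on the tables in the question, the condition expression could look something like this (a sketch; adjust the join columns to your own model):

CC_ID IS NOT NULL
AND CC_ID IN (SELECT REF.CC_ID
                FROM T_CC_ITEM_REF REF
                JOIN T_ITEM ITM
                  ON ITM.ITEM_ID = REF.ITEM_ID)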


Similarly to the previous example, drag and drop the model to the package and run it. This is what you will get:



The error is telling us that there is a source record (CC_ID 3) that does not have a mapping record in T_CC_ITEM_REF. You may add both Static validations to your package before you load your target and have a complete picture of what data is missing in your data flow.

Hope you have liked the post. See ya!

ODI Hidden Gems – Temporary Indexes

Posted in ODI, ODI 12c, Tips and Tricks on September 3, 2019 by Rodrigo Radtke de Souza

Hi all!

Today’s gem is indeed a very hidden one. ODI is known (unfortunately) to be “not intuitive” most of the times and I think that’s because we have many options that are scared across too many objects in the UI, which leads you to keep clicking on objects until you find what you need. To make things a little bit harder, you have the logical and physical tabs and each one of them has its own representation of the objects, so some of the options will be in the logical tab, some others in the physical tab. Lets talk today about the “Temporary Indexes” option that exists in some objects in the Physical tab of a mapping.

Very often you will load data from places that may not have an “index” concept, like files for example. You may create a mapping that will load any number of big files, put them in a staging area and filter/join them together to do some ETL. You may reach situations where those joins are not performing well since you don’t have an “index” on the files. Here is where ODI may help you with some Temporary Indexes creation. Let’s see this example (it’s very basic, but you will get the idea):


You join both files and load to a target table. When we execute this mapping, this is what we get:


Basically, you are loading both files to C$ tables and then you are joining those C$ tables when loading to the target.


If you think that this join would benefit from an index, you may configure ODI to create temporary indexes for it. To do that, you will need to go to the physical tab, click the "join" object and go to "Properties", as below. Once you know where the option is, it seems pretty easy and obvious; however, if you are not familiar with ODI, you will need some practice to figure out that you need to go to the physical tab and then click on the specific object to get some unique properties of that object:


Change Index Type to one of the possible values (Bitmap, Non-Unique, Unique) and run the mapping again. You will see the index creation after it loads the data to C$:


This technique also works for filter objects. If we add a filter on any of the file columns and think that it would be beneficial to have an index on it, the steps to create it would be the same as before:

When executing, we can see all the index creation steps there:


If you go to the Oracle documentation, there is a note regarding the use of temporary index creation:

  • The creation of temporary indexes may be a time-consuming operation in the overall flow. Oracle recommends reviewing execution statistics and comparing the execution time saved by the indexes to the time spent creating them.

It’s a very valid point. Most of the time we don’t need to create any temporary index, but if you end up in one of those cases that you need to create them, now you know where to find this option.

Thanks!

 

ODI Hidden Gems – Setting custom Warnings and Errors

Posted in ODI, Tips and Tricks on August 7, 2019 by Rodrigo Radtke de Souza

Hi all! ODI developers often create their own ODI procedures that contain specific logic/technologies in them. Since it's custom code, we want to be sure that it is efficient, but also easy to read, and that it shows errors/warnings when something wrong happens. Let's see an example of what can be done in ODI regarding this topic.

I’ve created a simple procedure and put it in a package. If I execute it and something goes wrong, the entire procedure gives an error and it is propagated to the main scenario:


Now imagine that this procedure is not that critical and that we could ignore its errors when they happen. We just need to open the ODI procedure and check the "Ignore Errors" option.


When we execute again, this is what happens:


My procedure step finished with a warning, which is good, but the main package finished with success, which may be bad. If someone looks at Operator only at the package level, they may never check why that internal step has a warning, since the parent is "green". If the warning is somehow important and we want to propagate it to the parent step, we first need to understand what ODI checks to set a parent step to a warning status.

For ODI to cascade the warning to its parent, one of its child steps needs to have the "Nº of Errors" greater than zero. In our case, when the error was triggered and set to ignore, the "Nº of Errors" in the step was never changed (it remained 0), as you can see below:


Fortunately, it’s very easy for us to manipulate all those record statistics numbers in ODI. You can use one of five methods below (one for each statistic):

  • setNbInsert()
  • setNbUpdate()
  • setNbDelete()
  • setNbErrors()
  • setNbRows()

You may create a task in a procedure with the Jython technology and just add odiRef.setNbErrors(1) to set the number of errors of that step to 1. To make it easier to understand, let's remove the Ignore Errors option from our example and create a new procedure just to set the error number, as below:



When we run our package now, we see the following:


Our procedure threw an error, we caught it with a red arrow in the flow and then set the number of errors to 1. ODI understands that, when the error number is greater than 0, it must set the parent icon to a warning. Depending on the code/technology in your procedure, you may even include the setNbErrors call inside your own code, so you don't need a separate procedure for that.


Thanks all! See you soon!

 

 

 

Oracle Data Integrator Requirements

Posted in ODI, ODI Architecture on August 5, 2019 by Rodrigo Radtke de Souza

Hi all,

Today is a very quick post, but since I hear this question often, I decided to post it:

What is the minimum machine requirement to run ODI?

Oracle gives all the details here (version 12.2.1.3), not only for ODI but for all Oracle Fusion Middleware products.

See you soon!

ODI Hidden Gems – Exception Handling – Timeout(s)

Posted in ODI, Tips and Tricks on July 26, 2019 by Rodrigo Radtke de Souza

Hi all,

Today’s hidden gem is the “Exception Handling – Timeout(s)” option which is located at the Load Plan steps:


There are certain situations where we have a very strict load window and cannot go over a certain time limit. If that limit is reached, the data load should abort before something bad happens. Luckily, ODI Load Plans have a very easy mechanism to handle timeout situations, and most people are not aware of it. In every load plan step, we may add a timeout value, which is the maximum time (in seconds) that this step may take before it is aborted by the Load Plan. When a timeout is reached, the step is marked in error and the Exception step (if defined) is executed.

It seems simple, but it can be very powerful, since this setting may be applied to any parent step (even the root step). That way we have a safeguard against potential long-running/overlapping situations for the entire load plan. For example, if you want to be sure that the entire load finishes within 8 hours, just add a timeout value to the root step (28800 seconds) and it will stop if it reaches this value. In daily execution load plans, you may set it to 86400 seconds (24 hours), so it does not overlap with the next daily execution. In the following screen, I set a 10-second timeout, and this is the error that is triggered when it reaches the timeout value.


That’s if folks, see ya!

ODI Hidden Gems – Log Steps in the Journal

Posted in ODI, Tips and Tricks on July 11, 2019 by Rodrigo Radtke de Souza

Hi all,

Today we will talk about the "Log Steps in the Journal" option. This one resides in the "Advanced" tab of every step inside an ODI package, as you can see below. Every step in a package appears in the execution log while being executed, but we may define whether the step should be kept in the journal after its execution is finished or not. The available options are:

  • Never: the step is deleted from the journal.
  • Always: the step is always kept in the journal.
  • Errors: the step is kept in the journal only if it failed. Otherwise, it is deleted.


Although it seems a very simple option, there are some PROS and CONS to using it. Let's talk about the PROS first. Imagine that the above scenario was going to loop the same procedures 10 times. You would end up with a log like this:


You may want to keep the log of all those procedure executions, so you know what they did (like how many rows they inserted/deleted/updated). However, all the steps related to the loop variable are kind of useless, since they are only used to control the "loop" over the steps. In this case, if you wish to keep a cleaner log, you may set both variable steps to "Never", like below:


Now, when you execute this package, you will have a much cleaner log:


However, this option comes with a bad CON: I have seen several situations where people were trying to debug an execution in Operator and could not understand how the values were being assigned, or were not sure how some things were being populated, because they didn't see any step in Operator related to that. After some time lost wondering about it and then double-checking the same package in development, they would realize that someone had set that step to never log (sometimes even by accident). So, anytime you are trying to debug something in Operator and it seems weird or there are missing pieces, please make sure to look at the development package as well, since some steps may be set to never log.

That’s it folks. A quick post today. See ya!