Hey guys, how are you?
Continuing the series Fragmented and Aggregated tables in OBIEE using ODI and today we are talking about how to Populating the Fragmented tables using ODI.
Just to make easier for you to navigate in this series, here’s the parts of it:
Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and also how to further improve performance using partitioning and sub partitioning.
Managing the partitions of the Fragmented/Aggregated tables: Here you’ll see how to manage all partitions using ODI.
Populating the Fragmented tables using ODI: Here you’ll find how to create generic components to load all fragmented tables we need to.
Populating the Aggregated tables using ODI: Here you’ll find how to create generic components to load all Aggregated tables we need to.
Setting the OBIEE Repository: Finally, we’ll going to setting up the OBIEE repository to make use of all tables.
In my previous posts we had design our 18 tables and the partition management process. Now we’ll talk about how we’ll going to load data to our fragmented tables. As you can imagine, for the 6 fragmented tables, the code will be the same, the only difference is what goes to each table. Then what we need to make ODI to do for us is to dynamically filter and load the data that comes from the same query to different tables based in a rule.
If you are following this series from the beginning, you probably already guessed what we’ll need to do here. The same thing we did for the Partition management process, well, almost the same thing because we’ll going to do something a little bit more optimize for the data load since this is the most time-consuming step.
The main difference between the approach I explained in the previous post is that it loops one Statement by each row that returns from the Source command in SERIAL, that means, we need to wait it to finish for the second loop to start. For the Partition Management that’s ok because everything there is very fast, but for that load we can have something better.
We don’t need and we don’t want to wait for a serial execution because we have one table for each Loop we’ll perform (This is not the case but even if we have only one table, if we have partitions and/or Sub-Partitions, we can Loop the code by the Partition/Sub partition level and load everything in Parallel. The only case we can’t load in parallel is if we have a single table without partition).
In our case we’ll going to loop all the 6 tables load in parallel. To do that, we are going to use the same approach as before but instead of having the ALTER TABLE in the Command on Target, we’ll going to have an OdiStartScen, to call a Scenario multiple times in PARALLEL. Let’s take a look in the Command on Target:
As we can see, the Command on Target contains an ODI tool called OdiStartScen that is used to call a scenario. We have a few parameters there to set how we’ll going to call the scenario and, in this case, the important one is the SYNC_MODE=2, that means it’ll execute everything in parallel (1 is serial). Everything else are variables that we are using to pass information from the Command on Source to the Called scenario.
A quick tip here before we continue, if you want to know how in havens, I have all this setting in mind, for your information I don’t, but there’s a very easy way to find out. Just drop an ODI Start Scen from the ODI Toolbox in any part of your scenario and fill with the information you need:
In the General we set the scenario we want to load, version of the scenario (-1 is always the latest scenario you have), the Synchronous and Asynchronous mode (serial and parallel), Session name, in case you want to have an different name showing in your execution (this is always a good Idea to have since you are looping the same code with different parameters, then in my case I pass as a variable the name of the table, the period and the hierarchy that scenario will be loading).
In the Additional Variables tab you can set all variables you need to use inside the scenario (as you can see in my case, I have a lot of variables because the complexity of my sources, you case will be different).
And after you set everything, you just need to click in the Command tab to get the code that ODI generates based in the setting you did:
This is the code that you need to paste in the Command on target to call a scenario. After that you can delete the OdiStartScen object from your scenario and continue the development. Another advantage of this is that if you have the code and want to create an OdiStartScen step in a scenario, you can paste in the Command tab and the General/Additional Variable will be filled based on the code.
With the Target code ready we just need a query that will going to pass all the information needed to call the scenario. In my case, I pass a lot of information to the target, as you can see by the amount of variable I have. Also, the variable information doesn’t need to come only from the command on source. If you have variable in your scenario and they are the same for all the loops, you can just use these variables to pass their values to the Scenario the command on target is calling. The command on source needs only to have the information that will dynamically change in each loop.
In my case I’m passing all kind of information like the SESSION_NAME (the name of the session that we use to create parameter specific for each session we have and use the same variables for all, more information HERE) but the important one for us is the TABLE_NM.
In my case, the other variables will change the values depending of the table we need to load for each time we loop. To do that I insert all information I need to be pass to the scenario I’m calling in a Parameter table (that I like to call ODI_PARAMETERS). This way I can easily query this table to get specific information I need for each Loop I’m going to perform.
With all this set, we just need to create a child scenario and add “Declare” variables for each Variable we want to pass to the internal scenario like this:
As you can see, this is a very simple scenario that contains just one procedure that’ll be used to load the data to our tables.
This approach can be done with Interfaces as well, in fact with anything at all. We can have a mains scenario that loops a very complex scenario, that loops another scenario and so on. This is a very nice way to loop something because we have a lot of control over this kind of loop as well, we can change the behavior of each loop because the info we get from the Command on Source.
Also, a very good feature is that, if the source query doesn’t return any rows, the Command on Target is not executed at all. Then you’ll always have a clean execution. For example, if you have a folder that can have 0 to X number of files and you want to create an scenario to load it, the best approach would be to read this folder files (OS command to a file and loaf file or Java or Phyton or…) and store the info in our ODI parameter. Then we create a proc to call a scenario that will contain the interface that will load each file, and finally we just do a Select in the Command on Source to get the name of the files in the folder. With that, if we have 0 files, nothing will be executed, but if we have 1000 files, we’ll loop the scenario 1000 times and each time we’ll pass the name of a different file to the internal scenario. Very neat way to do it.
Inside the procedure we just need to create the query need to load the data and use the variables to filter the right data and load the right table. This is not my case but if you have different source for each table you want to load, just insert this information in the parameter table and pass it to the internal scenario. Use these variables to replace the Table Names in the SQL and you are done.
And that’s it. This is all we need to do to load all our 6 fragmented tables. In my next post I’ll be talking about how we can aggregate our data and load our Aggregated tables.
I hope you guys enjoy and see you soon.