Hey guys, how are you keeping? I hope everybody is healthy and stays that way in these difficult times.
And to make our lives less complicated, here's another tip. Let's talk about how to concatenate stuff in Oracle.
Imagine a simple case: we want to query the Planning repository to get the list of UDAs a member has. We can easily do that by querying the HSP_OBJECT, HSP_MEMBER_TO_UDA and HSP_UDA tables.
I'm filtering just 3 products to make it easier for us to see. The results show that each product has a different number of UDAs, and we never know how many there will be, so the easiest way to concatenate them is to use the LISTAGG command (or WM_CONCAT if you are on a DB version prior to 11.2).
The command is very simple: LISTAGG(Column, Separator) WITHIN GROUP (ORDER BY column). As we can see, the command allows us to select the separator we want (it can be a comma or any string really) as well as to order the results by another column. Let's take a look at an example.
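As a rough sketch (the HSP column names below are the usual ones in a Planning repository but may vary by version, and the product names are hypothetical), the query could look like this:

SELECT O.OBJECT_NAME AS MEMBER_NAME,
       LISTAGG(U.UDA_VALUE, ',') WITHIN GROUP (ORDER BY U.UDA_VALUE) AS UDA_LIST
FROM HSP_OBJECT O,
     HSP_MEMBER_TO_UDA MU,
     HSP_UDA U
WHERE O.OBJECT_ID = MU.MEMBER_ID
  AND MU.UDA_ID = U.UDA_ID
  AND O.OBJECT_NAME IN ('PRODUCT_A', 'PRODUCT_B', 'PRODUCT_C')  -- filtering just 3 products
GROUP BY O.OBJECT_NAME;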
As you can see, it easily creates a comma-separated list (as specified) for me, and the nice thing about it is that I don't need to do any string treatment if it returns null, or if I have just one string in it, and things like that.
This is an extremely good function and we use it heavily in ODI to generate dynamic code because of its simplicity. For example, we can generate a SQL statement on the fly using the command on source and command on target:
With these results we can easily pass the info to the command on target to generate a dynamic query, where ODI will replace the columns we got from the source as well as the table name, and will also loop for each row we have in the source. This is very handy.
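Here is a hedged sketch of that idea (the schema name and the STG_ target pattern are hypothetical). The command on source returns one row per table with its concatenated column list, and the command on target is executed once per returned row, with ODI substituting the #TABLE_NAME and #COLUMN_LIST values:

-- Command on source (runs once, returns one row per table)
SELECT TABLE_NAME,
       LISTAGG(COLUMN_NAME, ',') WITHIN GROUP (ORDER BY COLUMN_ID) AS COLUMN_LIST
FROM ALL_TAB_COLS
WHERE OWNER = 'SCHEMA_A'          -- hypothetical schema
GROUP BY TABLE_NAME

-- Command on target (runs once per row returned by the source)
INSERT INTO STG_#TABLE_NAME (#COLUMN_LIST)   -- hypothetical STG_ copy of each table
SELECT #COLUMN_LIST
FROM #TABLE_NAME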
And for the ones that are not on Oracle 11.2 or later, we can still do that using WM_CONCAT. It's not as powerful as LISTAGG, but it works pretty well. Let's try the first example again:
I cannot show you the results since WM_CONCAT was decommissioned in 12c (my version), but it works like this: we don't have the option to choose the separator, and to make the string unique and ordered we need to add DISTINCT inside the command, WM_CONCAT(DISTINCT column).
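A hedged sketch of the same query using WM_CONCAT (keep in mind it is an undocumented function, removed in 12c, and the HSP column names are the same assumptions as before):

SELECT O.OBJECT_NAME AS MEMBER_NAME,
       WM_CONCAT(DISTINCT U.UDA_VALUE) AS UDA_LIST
FROM HSP_OBJECT O,
     HSP_MEMBER_TO_UDA MU,
     HSP_UDA U
WHERE O.OBJECT_ID = MU.MEMBER_ID
  AND MU.UDA_ID = U.UDA_ID
GROUP BY O.OBJECT_NAME;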
Today I'll post something that is very simple but very useful, especially when working with ODI.
When we work with a partitioned table, we know that if we filter that table by the partitioned column, Oracle will use that partition as the source of data. But what if we are doing an Insert, Update or Merge?
There's another way to explicitly refer to a partition and make sure Oracle will be working inside that one, which is by defining it in the FROM clause.
For example, if I want to query the partition "DELL_BALANCES_FY20_FEB", I can query:
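A minimal sketch of that query (only the partition name comes from the example; the table name and column are hypothetical):

SELECT DISTINCT T.PARTITION_KEY
FROM DELL_BALANCES PARTITION (DELL_BALANCES_FY20_FEB) T;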
As we can see, after the table name I specified PARTITION (DELL_BALANCES_FY20_FEB), putting the partition name inside the parentheses (don't specify it as a string). That makes Oracle work only with the rows in that partition, and my DISTINCT of the PARTITION_KEY shows only one result, as expected. (This clause needs to come before the table alias.)
If we are doing an Insert, Update or Merge, the idea is the same:
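For instance, a hedged sketch of an INSERT and a MERGE bound to that same partition (all table and column names besides the partition are hypothetical):

INSERT INTO DELL_BALANCES PARTITION (DELL_BALANCES_FY20_FEB)
SELECT *
FROM STG_DELL_BALANCES;

MERGE INTO DELL_BALANCES PARTITION (DELL_BALANCES_FY20_FEB) T
USING STG_DELL_BALANCES S
ON (T.PARTITION_KEY = S.PARTITION_KEY AND T.ACCOUNT_ID = S.ACCOUNT_ID)
WHEN MATCHED THEN
  UPDATE SET T.AMOUNT = S.AMOUNT
WHEN NOT MATCHED THEN
  INSERT (T.PARTITION_KEY, T.ACCOUNT_ID, T.AMOUNT)
  VALUES (S.PARTITION_KEY, S.ACCOUNT_ID, S.AMOUNT);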
This way we can, especially in the MERGE, make sure Oracle will be working on the right partition of the target table.
And it's especially useful with ODI, because we always know the partition we want to query or insert data into when we use ODI, so we can always bind Oracle to a specific partition and make sure it'll stay there.
We are delighted to tell everybody that we’ll be at KScope 20 in Boston.
We'll be presenting the session Essbase Statistics DW: How automatically administrate Essbase using ODI in room 304, Level 3, on Wed, Jul 01, 2020 (02:15 PM – 03:15 PM). In this session we'll talk about how to keep the Essbase configuration up to date using ODI. Here's the session abstract:
To have a fast Essbase cube, we must keep vigilant and follow its growth and its data movements, so we can distribute caches and adjust the database parameters accordingly. But this is a very difficult task to achieve, since Essbase statistics are not temporal and only tell you what the cube statistics are at that specific point in time. This session will present how ODI can be used to create a historical DW containing the Essbase cube's information and how to identify trends and patterns, giving us the ability to programmatically tune our Essbase databases automatically.
Hi all, today's gem is something extremely useful that I've been using in every single project that I work on. Today's gem is about the SNP tables, which are the database tables used by ODI to store its metadata. When we install ODI, the installer asks us where we want to create our Master and Work repositories. Each repository contains a set of tables with different kinds of information. From the Oracle documentation:
Master Repository: This is a data structure containing information on the topology of the company’s IT resources, on security and on version management of projects and data models. This repository is stored on a relational database accessible in client/server mode from the different Oracle Data Integrator modules.
Work Repository: This is a data structure containing information about data models, projects, and their use. This repository is stored on a relational database accessible in client/server mode from the different Oracle Data Integrator modules.
You may think of them as being the place that contains all the information about the code that was developed in ODI, all the jobs that were executed, all the source and target tables and so on. These tables may give us answers to questions like: how many mapping objects does project X have? Which are the target mappings for a specific job? How many jobs are executed on a daily basis, how long does each of them take and how much data do they manipulate (insert/delete/update)? All those questions will eventually come to you after some time, and querying the SNP tables will give you all the answers about what is going on in your ODI projects.
Below is one example of a query that returns a lot of information regarding all the ODI executions that happened in an ODI repository in a given time frame. It gives you the name of the scenarios, their versions, when each began and ended, the session status, the order in which they happened and (maybe most important) which code was executed. That last piece of info, together with how much time it took to execute, may be very useful to analyze which steps are taking longer in your environment and then do something about them.
I won't go over each table and what it means, but you may take a look at "Doc ID 1903225.1: Oracle Data Integrator 11g and 12c Repository Description" in Oracle Support for a full list of tables and their descriptions. In the beginning, the number of tables and attributes may look intimidating, but once you start to use them you will see that the data architecture is fairly simple and you may retrieve a lot of good information out of them.
Without further ado, here is the SQL. This one was created over ODI 12.2.1. Please notice that each ODI version may have changes in the repository tables, which may lead you to modify these queries accordingly.
SELECT SS.SESS_NO,
       SS.SCEN_NAME,
       SS.SCEN_VERSION,
       SS.SESS_NAME,
       SS.PARENT_SESS_NO,
       SS.SESS_BEG,
       SS.SESS_END,
       SS.SESS_STATUS,
       DECODE(SS.SESS_STATUS,'D','Done','E','Error','M','Warning','Q','Queued','R','Running','W','Waiting',SS.SESS_STATUS) AS SESS_STATUS_DESC,
       SSL.NNO,
       SSTL.NB_RUN,
       SST.TASK_TYPE,
       DECODE(SST.TASK_TYPE,'C','Loading','J','Mapping','S','Procedure','V','Variable',SST.TASK_TYPE) AS TASK_TYPE_DESC,
       SST.EXE_CHANNEL,
       DECODE(SST.EXE_CHANNEL,'B','Oracle Data Integrator Scripting','C','Oracle Data Integrator Connector','J','JDBC','O','Operating System','Q','Queue','S','Oracle Data Integrator Command','T','Topic','U','XML Topic',SST.EXE_CHANNEL) AS EXE_CHANNEL_DESC,
       SSTL.SCEN_TASK_NO,
       SST.PAR_SCEN_TASK_NO,
       SST.TASK_NAME1,
       SST.TASK_NAME2,
       SST.TASK_NAME3,
       SSTL.TASK_DUR,
       SSTL.NB_ROW,
       SSTL.NB_INS,
       SSTL.NB_UPD,
       SSTL.NB_DEL,
       SSTL.NB_ERR,
       SSS.LSCHEMA_NAME || '.' || SSS.RES_NAME AS TARGET_TABLE,
       CASE WHEN SST.COL_TECH_INT_NAME IS NOT NULL AND SST.COL_LSCHEMA_NAME IS NOT NULL THEN SST.COL_TECH_INT_NAME || '.' || SST.COL_LSCHEMA_NAME ELSE NULL END AS TARGET_SCHEMA,
       SSTL.DEF_TXT AS TARGET_COMMAND,
       CASE WHEN SST.DEF_TECH_INT_NAME IS NOT NULL AND SST.DEF_LSCHEMA_NAME IS NOT NULL THEN SST.DEF_TECH_INT_NAME || '.' || SST.DEF_LSCHEMA_NAME ELSE NULL END AS SOURCE_SCHEMA,
       SSTL.COL_TXT AS SOURCE_COMMAND
FROM SNP_SESSION SS
INNER JOIN SNP_STEP_LOG SSL
        ON SS.SESS_NO = SSL.SESS_NO
INNER JOIN SNP_SESS_TASK_LOG SSTL
        ON SS.SESS_NO = SSTL.SESS_NO
INNER JOIN SNP_SB_TASK SST
        ON SSTL.SB_NO = SST.SB_NO
       AND SSTL.SCEN_TASK_NO = SST.SCEN_TASK_NO
       AND SSL.NNO = SSTL.NNO
       AND SSTL.NNO = SST.NNO
       AND SSL.NB_RUN = SSTL.NB_RUN
LEFT JOIN SNP_SB_STEP SSS
       ON SST.SB_NO = SSS.SB_NO
      AND SST.NNO = SSS.NNO
WHERE SS.SESS_BEG >= TRUNC(SYSDATE) - 1
ORDER BY SESS_NO, NNO, SCEN_TASK_NO
I normally don't talk about OBIEE, despite the fact that I work a lot with it, because there are a lot of people talking about it. But recently I worked on a project where I decided to use fragmented tables and aggregated tables at the same time, and I found that we don't have much specific information regarding this matter and, of course, zero information about using ODI to enable this design.
Then, since this will be kind of big, I decided to split it into 5 parts:
Creating the tables: I'll explain the advantages and how you can create the fragmented tables, and also how to further improve performance using partitioning and sub-partitioning.
To start the series Fragmented and Aggregated tables in OBIEE, today we are talking about how to create the tables for our project.
In this project I had data from 4 different source systems, and each source system had a different amount of data. Dell data had 20 million rows per month, EMC data 30 million, DTC data 5 million and STAT data another 5 million.
The business wanted to have 15 quarters of data stored (45 months), so if we do a simple calculation we'll have: (20,000,000 + 30,000,000 + 5,000,000 + 5,000,000) * 45 = 2,700,000,000 rows.
That's a lot of data even for a partitioned table, so I decided to split these tables in some way.
Analyzing the data, I saw that these tables contain 3 different account hierarchies and that the users will only analyze one at a time: BS, Income and PL2. I decided to split the tables by that, and just with that I now had 3 fragmented tables: JAT_JE_DETAIL_BS, JAT_JE_DETAIL_INCOME and JAT_JE_DETAIL_PL2.
That's nice; just with that I split those 2.7 billion rows in 3. The problem now is that I have 4 different source systems and, even if I partitioned the tables by source system, EMC is as big as the other 3 sources put together, and that would make that source slower than the others to retrieve.
With that in mind, I decided to split the tables by 2 again, one for EMC and another for the other sources. That makes the data distribute evenly: 20,000,000 + 5,000,000 + 5,000,000 in the D tables and 30,000,000 in the E tables (D and E is how I called them).
With that we now have 6 fragmented tables, right? 3 for EMC data (one for each account hierarchy) and another 3 tables for Dell, DTC and STAT data.
Now that we have a nice and even data distribution, we need to decide how we are going to create the partitions and sub-partitions for these tables.
For the partitions I always like to use PERIOD_ID, because it makes sense in most cases (unless we are creating a Forecast app; in that case the users will want to see the data by scenario and not by one period).
Then for the tables JAT_JE_DETAIL_D_BS_FACT, JAT_JE_DETAIL_D_INCOME_FACT and JAT_JE_DETAIL_D_PL2_FACT we are going to partition by period. But what kind of partition are we going to use? Well, in this case, these tables contain Journal information at day level and the users want to drill from the month to the Journal rows of that month. That means we'll always be working inside a month, so the best way to go is a RANGE partition.
Range partitions, as the name already says, use a range of values to create the partition area. In our case it'll be one month. The range can be built on a DATE column or a NUMBER column, and the second one is our case, since it's easier to manipulate the periods using a number than a date.
(The period for this project is not a regular month; for example, the month of February FY20 starts on Jan/30 and goes until Feb/2.)
For the sub-partitions it's an easy call: we're going to use the Source System, since we have 3 different ones in these tables (Dell, DTC and STAT), and the partition type will be LIST since we have a simple list of values. Also, for sub-partitions it is always easier to have values that don't increase, because of the complexity of adding new sub-partition values to a table.
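Before getting into why, here is a minimal DDL sketch of one of these tables with a RANGE partition on PERIOD_ID and LIST sub-partitions on the source system (the column list, the partition boundaries and the sub-partition values are all hypothetical):

CREATE TABLE JAT_JE_DETAIL_D_BS_FACT (
  PERIOD_ID        NUMBER,
  SOURCE_SYSTEM_ID VARCHAR2(10),
  ACCOUNT_GEN2_ID  NUMBER,
  AMOUNT           NUMBER
)
PARTITION BY RANGE (PERIOD_ID)
SUBPARTITION BY LIST (SOURCE_SYSTEM_ID)
SUBPARTITION TEMPLATE (
  SUBPARTITION SP_DELL VALUES ('DELL'),
  SUBPARTITION SP_DTC  VALUES ('DTC'),
  SUBPARTITION SP_STAT VALUES ('STAT')
)
(
  PARTITION FY20_JAN VALUES LESS THAN (202010200),  -- hypothetical YYYYQMMDD boundaries
  PARTITION FY20_FEB VALUES LESS THAN (202010300)
);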
What I mean is that to add a new partition you just need to use one ALTER TABLE ADD PARTITION statement, and you can add any number of partitions you need that way. But for a sub-partition it's different. When you create the sub-partitions, Oracle creates a sub-partition template, and that template is used every time you create a new partition.
In our case, for example, our sub-partition template would be DELL, DTC and STAT. That means that when you create another partition, that partition will be created with these 3 sub-partitions automatically. So if you want to add another value to the sub-partitions, you need to do 2 things instead of just one (as in the partition case).
You need to change the sub-partition template definition to add the new sub-partition, and you need to add the new sub-partition to every single already existing partition (it is not updated automatically). After that, all new partitions will contain all your sub-partitions.
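In other words, something along these lines (the names are the same hypothetical ones from the sketch above):

-- adding a new period partition is a single statement
-- (the template sub-partitions are created automatically)
ALTER TABLE JAT_JE_DETAIL_D_BS_FACT
  ADD PARTITION FY20_MAR VALUES LESS THAN (202010400);

-- adding a new source system means updating the template...
ALTER TABLE JAT_JE_DETAIL_D_BS_FACT
  SET SUBPARTITION TEMPLATE (
    SUBPARTITION SP_DELL VALUES ('DELL'),
    SUBPARTITION SP_DTC  VALUES ('DTC'),
    SUBPARTITION SP_STAT VALUES ('STAT'),
    SUBPARTITION SP_NEW  VALUES ('NEW_SOURCE')
  );

-- ...and adding the new sub-partition to every existing partition, one by one
ALTER TABLE JAT_JE_DETAIL_D_BS_FACT
  MODIFY PARTITION FY20_FEB
  ADD SUBPARTITION FY20_FEB_SP_NEW VALUES ('NEW_SOURCE');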
If you don’t update the template, every time you create a partition, you’ll not have the new value you want.
Enough theory, let's proceed with the other set of tables: JAT_JE_DETAIL_E_BS_FACT, JAT_JE_DETAIL_E_INCOME_FACT and JAT_JE_DETAIL_E_PL2_FACT. For these tables we'll continue to have the partitions as a RANGE on PERIOD_ID, but for the sub-partition it doesn't make sense to use the source system, because we have only one source system.
And now it's time to explain another thing about partitions. For Oracle to use a partition, you need to filter on the column that was used to create it. For example, if we are creating a partition using the column PERIOD_ID, we must filter the data by PERIOD_ID, and the ideal is to always have a filter on both the partition and sub-partition columns, because if you don't, Oracle will do a full table scan anyway. So, from a data retrieval point of view, the choice of partitions is always made based on the filter requirements, meaning we need to choose columns that we can force the user to filter on in OBIEE. From an ETL point of view, the best is to have partitions and sub-partitions on columns that we can use to truncate before load.
For example, in our first set of tables I can easily and quickly truncate a period and a source system before I load it back into the tables. That's the beauty of partitions: I can drop, truncate and create them on demand.
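For example, something like this (the sub-partition name follows Oracle's partition_template naming convention and is hypothetical here):

-- clears February data for the DELL source only, without touching the rest of the table
ALTER TABLE JAT_JE_DETAIL_D_BS_FACT
  TRUNCATE SUBPARTITION FY20_FEB_SP_DELL;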
That said, you can imagine that sometimes what is good for the report is not good for the DW, but this is not our case today.
With this new piece of information in mind, the users are forced in the OBIEE prompts to choose a Gen 2 Account, so this will be the sub-partition for our second set of tables, also with the partition type as LIST.
Before we proceed, let’s just see how everything is set then:
Table Name                  | Partition         | Sub Partition
JAT_JE_DETAIL_D_BS_FACT     | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_INCOME_FACT | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_PL2_FACT    | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_E_BS_FACT     | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_INCOME_FACT | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_PL2_FACT    | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List)
Perfect, now the only things missing here are the aggregated tables.
Aggregated tables are tables that use a dimension drill to pre-aggregate the values they contain. It's an easy way to improve performance during a drill operation.
The most common aggregated table is for the Period dimension, and this is the one we're going to do here as well. In our case, as mentioned before, the 6 tables we have now contain day-level data. I want to have the data aggregated by Month and by Quarter.
That means we'll need one more table for each of these 6 tables for monthly-level data and one more table for each of these 6 tables for quarterly-level data. These tables will have the same partition and sub-partition definitions with only one difference: instead of using a RANGE partition, we'll use a LIST partition, because we don't have a range anymore; we'll aggregate that range of days into one month.
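A hedged DDL sketch of one monthly aggregated table, now LIST-partitioned by PERIOD_ID (the columns and partition values are hypothetical, as before):

CREATE TABLE JAT_JE_DETAIL_D_M_BS_FACT (
  PERIOD_ID        NUMBER,
  SOURCE_SYSTEM_ID VARCHAR2(10),
  ACCOUNT_GEN2_ID  NUMBER,
  AMOUNT           NUMBER
)
PARTITION BY LIST (PERIOD_ID)
SUBPARTITION BY LIST (SOURCE_SYSTEM_ID)
SUBPARTITION TEMPLATE (
  SUBPARTITION SP_DELL VALUES ('DELL'),
  SUBPARTITION SP_DTC  VALUES ('DTC'),
  SUBPARTITION SP_STAT VALUES ('STAT')
)
(
  PARTITION FY20_JAN VALUES (2020101),  -- hypothetical YYYYQMM period IDs
  PARTITION FY20_FEB VALUES (2020102)
);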
Then let's see how the tables need to be set:
Table Name                    | Partition         | Sub Partition
JAT_JE_DETAIL_D_BS_FACT       | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_INCOME_FACT   | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_PL2_FACT      | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_BS_FACT     | PERIOD_ID (List)  | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_INCOME_FACT | PERIOD_ID (List)  | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_PL2_FACT    | PERIOD_ID (List)  | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_BS_FACT     | PERIOD_ID (List)  | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_INCOME_FACT | PERIOD_ID (List)  | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_PL2_FACT    | PERIOD_ID (List)  | SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_E_BS_FACT       | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_INCOME_FACT   | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_PL2_FACT      | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_BS_FACT     | PERIOD_ID (List)  | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_INCOME_FACT | PERIOD_ID (List)  | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_PL2_FACT    | PERIOD_ID (List)  | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_BS_FACT     | PERIOD_ID (List)  | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_INCOME_FACT | PERIOD_ID (List)  | ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_PL2_FACT    | PERIOD_ID (List)  | ACCOUNT_GEN2_ID (List)
Great, now we have 18 tables: 6 fragmented tables for daily-level data, 6 fragmented tables for monthly aggregated data (M) and 6 fragmented tables for quarterly aggregated data (Q).
The last thing about the tables is that we need a matching Period dimension (in this case) for each aggregation level. That means that, since we have 3 different levels of aggregation, we need to have 3 Period dimensions.
OBIEE doesn't allow us to join the same table using columns that are not an ID column. So what we need is one PERIOD dimension (DIM_PERIOD) that contains the drill from Year to Day (the Period ID will be a YYYYQMMDD number), another PERIOD Month dimension (DIM_PERIOD_MONTH) that contains the drill from Year to Month (the Period ID will be a YYYYQMM number) and one last PERIOD Quarter dimension (DIM_PERIOD_QUARTER) that contains the drill from Year to Quarter (the Period ID will be a YYYYQ number).
Of course, the aggregated FACT tables will also have a matching ID (and we need that to aggregate the data correctly).
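Just to make the idea concrete, here is a hedged sketch of how a monthly aggregate could be loaded from the daily table; the measure column and the exact PERIOD_ID arithmetic (dropping the day portion of the YYYYQMMDD number) are assumptions:

INSERT INTO JAT_JE_DETAIL_D_M_BS_FACT (PERIOD_ID, SOURCE_SYSTEM_ID, ACCOUNT_GEN2_ID, AMOUNT)
SELECT FLOOR(PERIOD_ID / 100),          -- YYYYQMMDD -> YYYYQMM
       SOURCE_SYSTEM_ID,
       ACCOUNT_GEN2_ID,
       SUM(AMOUNT)
FROM JAT_JE_DETAIL_D_BS_FACT
GROUP BY FLOOR(PERIOD_ID / 100), SOURCE_SYSTEM_ID, ACCOUNT_GEN2_ID;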
The last thing to mention is that the detail fact table (the one at day level) contains a lot of Journal details like Journal number, Description, Header ID and others. For the detail this is ok, but when we design the aggregated tables, only IDs are allowed there for maximum aggregation. If the table contains descriptions, nothing will be aggregated, right?
Don't worry, OBIEE can handle this kind of situation very well.
That's it for now. I hope you guys enjoy it and find it helpful.
Today's gem is related to a very nice surprise that all ODI developers had when they were migrating from ODI 11 to ODI 12: the ability to load multiple target tables with the same ODI mapping object. You may have a very large mapping and "split" (with the Split object) the result set into multiple targets, or you may just copy the same result to different target tables, mapping different fields in each of the targets. This post is not about what you can do when loading multiple targets (which is quite a lot thanks to this feature in ODI 12c), but about how to control the order in which those target tables are loaded.
Let’s start with the following example:
If you execute this mapping, this is what it will look like in Operator:
You can see that it loaded the tables in an order that was decided by ODI, probably in the same order that you dragged and dropped the models into the mapping. Now imagine that you want to have control over the order and need to load Table B before table A. The classical example where you would use that is when you retrieve a source dataset and you want to derive both the dimension and fact information out of it. In this case, you would want to load the dimension table first and the fact table second.
Luckily, ODI allows us to change the order, but it's not too intuitive. In the Logical tab, you need to click on any area that does not contain an object (any white area will do). This will display the "Target Load Order" option:
Click on the “gear” icon in the far right and a menu will popup:
Now you may configure it to have any order that you want, let’s say table B, C and then A. If you save and run the mapping now, this is what you get:
Today's short post is about a simple but very powerful feature that is often overlooked: the Begin/End Mapping Commands. These options are in the Physical tab and, as their name suggests, they may issue any kind of command before a mapping begins and/or after it finishes.
Pay close attention to the detail that they may execute ANY command from ANY technology that ODI may handle, and that's why it is so powerful. You may run anything from Oracle DML statements to a piece of Java code, trigger OS commands and so on. This gives you a lot of flexibility.
A very common example of where we may use them is to "track" some mapping in a separate log table. Although you have the ODI Operator that contains all the log information, sometimes we may get a requirement to track all the executions of a particular mapping, so people know for sure when it ran and that the logs will not be purged by accident from the Operator by someone. Let's see how we may accomplish logging the start and end times of an execution.
Let's start with "Begin". First you select which technology and logical schema that command refers to. In this case, we will insert the name of the mapping, the time that it started and the session number that it was assigned to in ODI.
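A hedged sketch of what the Begin Mapping Command could look like on an Oracle technology (the LOG_INFO table and its columns are illustrative; the odiRef calls are standard ODI substitution API methods):

INSERT INTO LOG_INFO (MAPPING_NAME, SESS_NO, EVENT, EVENT_DTTM)
VALUES ('<%=odiRef.getSession("SESS_NAME")%>',
        '<%=odiRef.getSession("SESS_NO")%>',
        'BEGIN',
        SYSDATE)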
Let’s do the same with “End”:
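A matching sketch for the End Mapping Command, inserting a second row when the mapping finishes (same illustrative table and columns):

INSERT INTO LOG_INFO (MAPPING_NAME, SESS_NO, EVENT, EVENT_DTTM)
VALUES ('<%=odiRef.getSession("SESS_NAME")%>',
        '<%=odiRef.getSession("SESS_NO")%>',
        'END',
        SYSDATE)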
Let’s run the mapping. When we go to Operator, we may see that two new tasks were created, one before and another one after the main mapping:
We may double click it to see the code that was executed:
If we query the LOG_INFO table, we will see two entries, one for begin and another one for end:
This was a very short example, as you may do way more than that. You may send emails to alert that a critical mapping has completed, you may zip and move a file right after it was loaded by the mapping, you may run an OS bat file that will prepare your environment before a data load and so on. These two options are a great alternative for us to keep all these "small" pieces of code inside the ODI mapping object itself and rely less on small ODI procedures.
Today's gem is indeed a very hidden one. ODI is known (unfortunately) for being "not intuitive" most of the time, and I think that's because we have many options that are scattered across too many objects in the UI, which leads you to keep clicking on objects until you find what you need. To make things a little bit harder, you have the logical and physical tabs, and each one of them has its own representation of the objects, so some of the options will be in the logical tab and some others in the physical tab. Let's talk today about the "Temporary Indexes" option that exists on some objects in the Physical tab of a mapping.
Very often you will load data from places that may not have an "index" concept, like files, for example. You may create a mapping that will load any number of big files, put them in a staging area and filter/join them together to do some ETL. You may reach situations where those joins are not performing well since you don't have an "index" on the files. Here is where ODI may help you with some temporary index creation. Let's see this example (it's very basic, but you will get the idea):
You join both files and load to a target table. When we execute this mapping, this is what we get:
Basically, you are loading both files to C$ tables and then you are joining those C$ tables when loading to the target.
If you think that this join would benefit from an index, you may configure ODI to create temporary indexes on that join. To do that, you will need to go to the physical tab, click the "join" object and go to "Properties" as below. Once you know where the option is, it seems pretty easy and obvious; however, if you are not familiar with ODI, you will need some practice to figure out that you need to go to the physical tab and then click on the specific object to get some unique properties of that object:
Change Index Type to one of the possible values (Bitmap, Non-Unique, Unique) and run the mapping again. You will see the index creation after it loads the data to C$:
This technique also works for filter objects. If we add a filter on any of the file columns and we think that it would be beneficial to have an index on it, the steps to create it are the same as before:
When executing, we can see all the indexes creation there:
If you go to the Oracle documentation, there is a note regarding the temporary index creation:
The creation of temporary indexes may be a time-consuming operation in the overall flow. Oracle recommends reviewing execution statistics and comparing the execution time saved by the indexes to the time spent creating them.
It's a very valid point. Most of the time we don't need to create any temporary indexes, but if you end up in one of those cases where you do need them, now you know where to find this option.
Hi all, today's post is about how ODI can be used to generate any kind of SQL statement using the Oracle metadata tables. We always like to say that ODI is way more than just an ETL tool and that people need to start thinking about ODI as a full development platform, where you may create any kind of code that you wish. Today I'll describe how we may create a simple (but dynamic) merge statement between two similar tables using an ODI procedure that reads from the ALL_CONSTRAINTS, ALL_CONS_COLUMNS and ALL_TAB_COLS Oracle tables to figure out what to do.
This is the scenario that we will be working on: we have several stage tables that are truncated and loaded every day with daily records from a source system. We have another set of tables that are used to store all the historical information, and the process uses the stage tables as sources, merging their data against the historical tables using their primary key. This is very common in a lot of places where we have a stage data layer that stores daily data pulls and then a "base" data layer that stores the historical data. In the scenario that we will describe here, both the source and target sets of tables have very similar structures, with the same column names, data types and so on.
Using the conventional ODI development process, we would need to create one mapping object for each set of source->target tables, so if we have 50 sources that need to be merged against 50 targets, we would need to create 50 ODI mappings. Since the sets of source->target tables are similar in this case, we may be smarter and create an ODI process that will receive a table name as an input parameter (in this case the target table name) and will create a merge statement against those two tables in a dynamic way using the Oracle metadata dictionary.
For those that are not familiar with the Oracle metadata dictionary, it's nothing more than a set of tables that exists in the Oracle database and contains information about its existing components, like information about its tables, which columns they have, what their data types are and so on. This is a great resource that ODI may read from and build generic code using its results. Let's see how it looks with a real example.
Imagine that you have two tables with the following structure:
As you can see, our base table is almost the same as our stage table; the only difference is that it contains 2 additional columns named INSERT_DTTM and UPDATE_DTTM that will be used as "control columns" to identify when that data was inserted/updated in our base table. For ODI to figure out which columns are present in which table, we may query ALL_TAB_COLS in Oracle, filtering by the table name, as below:
This shows us all the columns that this table contains. Similarly, if we query ALL_CONSTRAINTS and ALL_CONS_COLUMNS, we may get all the table constraints (like the primary key) with all their associated columns:
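For reference, these two metadata queries could look like the sketch below (SCHEMA_A and BASE_TABLE_A are the same example names used in the final SQL further down):

-- all columns of the base table
SELECT COLUMN_NAME, DATA_TYPE
FROM ALL_TAB_COLS
WHERE OWNER = 'SCHEMA_A'
  AND TABLE_NAME = 'BASE_TABLE_A'
ORDER BY COLUMN_ID;

-- primary key columns of the base table
SELECT ACC.COLUMN_NAME
FROM ALL_CONSTRAINTS AC
     INNER JOIN ALL_CONS_COLUMNS ACC
        ON AC.OWNER = ACC.OWNER
       AND AC.CONSTRAINT_NAME = ACC.CONSTRAINT_NAME
WHERE AC.OWNER = 'SCHEMA_A'
  AND AC.TABLE_NAME = 'BASE_TABLE_A'
  AND AC.CONSTRAINT_TYPE = 'P';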
With those two sets of data, we may create a SQL statement that will build our dynamic merge statement. To make it easier, I'll show you the final SQL statement now, divided in two pieces, and then I'll explain each of them:
WITH TABLE_PARAMS AS
(
SELECT
'BASE_TABLE_A' AS TABLE_NAME,
'SCHEMA_A' AS TABLE_OWNER
FROM DUAL
),
TABLE_PK AS
(
SELECT
ACC.OWNER,
ACC.TABLE_NAME,
ACC.COLUMN_NAME
FROM ALL_CONSTRAINTS AC,
ALL_CONS_COLUMNS ACC,
TABLE_PARAMS
WHERE 1=1
AND AC.OWNER = ACC.OWNER
AND AC.TABLE_NAME = ACC.TABLE_NAME
AND AC.OWNER = TABLE_PARAMS.TABLE_OWNER
AND AC.TABLE_NAME = TABLE_PARAMS.TABLE_NAME
AND AC.CONSTRAINT_NAME = ACC.CONSTRAINT_NAME
AND AC.CONSTRAINT_TYPE = 'P'
)
,
MAIN_TAB_COLS AS
(
SELECT
ATC.OWNER,
ATC.TABLE_NAME,
ATC.COLUMN_NAME
FROM ALL_TAB_COLS ATC,
TABLE_PARAMS
WHERE 1=1
AND ATC.TABLE_NAME = TABLE_PARAMS.TABLE_NAME
AND ATC.OWNER = TABLE_PARAMS.TABLE_OWNER
AND ATC.COLUMN_NAME NOT IN ('INSERT_DTTM','UPDATE_DTTM')
AND ATC.COLUMN_NAME NOT IN (SELECT COLUMN_NAME FROM TABLE_PK)
)
SELECT
MTC.TABLE_NAME AS TARGET_TABLE,
REPLACE(MTC.TABLE_NAME,'BASE_','STG_') AS SOURCE_TABLE,
PK_ST_LIST,
PK_S_LIST||','||(LISTAGG('S.'||MTC.COLUMN_NAME ,',') WITHIN GROUP (ORDER BY MTC.COLUMN_NAME)) || ',SYSDATE,SYSDATE' AS TABLE_S,
PK_T_LIST||','||(LISTAGG('T.'||MTC.COLUMN_NAME ,',') WITHIN GROUP (ORDER BY MTC.COLUMN_NAME)) || ',T.INSERT_DTTM,T.UPDATE_DTTM' AS TABLE_T,
LISTAGG ('T.'||MTC.COLUMN_NAME||'=S.'||MTC.COLUMN_NAME , ',') WITHIN GROUP (ORDER BY MTC.COLUMN_NAME ) AS ST_COLS
FROM MAIN_TAB_COLS MTC,
(SELECT
TP.OWNER,
TP.TABLE_NAME,
LISTAGG ('T.'||TP.COLUMN_NAME||'=S.'||TP.COLUMN_NAME , ' AND ') WITHIN GROUP (ORDER BY TP.COLUMN_NAME ) PK_ST_LIST,
LISTAGG ('S.'||TP.COLUMN_NAME, ',') WITHIN GROUP (ORDER BY TP.COLUMN_NAME ) PK_S_LIST,
LISTAGG ('T.'||TP.COLUMN_NAME, ',') WITHIN GROUP (ORDER BY TP.COLUMN_NAME ) PK_T_LIST
FROM TABLE_PK TP
GROUP BY
TP.OWNER,
TP.TABLE_NAME
) TP
WHERE 1=1
AND MTC.OWNER = TP.OWNER
AND MTC.TABLE_NAME = TP.TABLE_NAME
GROUP BY
MTC.OWNER,
MTC.TABLE_NAME,
PK_ST_LIST,
PK_S_LIST,
PK_T_LIST;
The first piece of the SQL contains a WITH clause with three sections:
TABLE_PARAMS: used to “receive” the table name parameter that will be sent by ODI;
TABLE_PK: used to figure out which is the table PK that will be used to do a “merge” in our target table;
MAIN_TAB_COLS: used to retrieve all columns that exist in the table, removing any kind of control columns (in this case INSERT_DTTM and UPDATE_DTTM) and any column that is already present in the PK columns;
The second piece is the main one where we will use the three sub selects from the WITH section to build what we need. In this case, it will return the following columns:
TARGET_TABLE: name of the target table that will be merged;
SOURCE_TABLE: name of the source table that will be used as a source of the merge process;
PK_ST_LIST: PK columns list that will be used on merge process;
TABLE_S: column names from the source table;
TABLE_T: column names from target table;
ST_COLS: combination of source and target columns for update process inside the merge;
When we run the SQL for our tables in this example, this is the result:
Now we have all the information that we need to create a dynamic merge statement for any set of similar tables, but how do we use it in ODI? This is very simple with one of the best features that ODI has (if you read our blog, you know that we just love it): command on source/target. Let's create a procedure and add our SQL statement in the command on source tab:
In our command on target tab, we will add the following code there:
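Since the original code is shown as an image, here is a hedged reconstruction of what the command on target could look like, using the #variables returned by the command on source (this is a sketch of the idea, not necessarily the exact original statement):

MERGE INTO #TARGET_TABLE T
USING #SOURCE_TABLE S
ON (#PK_ST_LIST)
WHEN MATCHED THEN
  UPDATE SET #ST_COLS, T.UPDATE_DTTM = SYSDATE
WHEN NOT MATCHED THEN
  INSERT (#TABLE_T)
  VALUES (#TABLE_S)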
As you can see, this SQL contains a lot of variables. These variables will be used at runtime to receive the return values from the SQL in the command on source. This way, we don't need to worry about creating 50 mappings to do 50 merge processes. Instead, we have one procedure that will receive a table name as a parameter and will build the necessary SQL accordingly. Let's see how it looks in an ODI package:
As you can see, it's a very simple package that receives a table name as a parameter and then builds/runs a dynamic merge SQL. This package can be called by an external package that may run it N times with different table names (like doing 50 table merges with one single procedure). Of course, this was just one example of a simple merge task, but it shows you the main idea of having ODI build the code for you. You may add more tasks to your procedure to create temp tables, run gather statistics and so on. There are almost no limits on what you may do using this kind of technique.
Hi everybody! Today's post is about installing an ODI 12c standalone agent. This is not a "new" topic and the steps to perform it can also be found at the Oracle site; however, it caught me a little bit "off guard" when I was requested to install one, and the reason is that it changed considerably compared to ODI 11g (and yeah, we still work A LOT with ODI 11g, so installing the ODI 12c agent was "new" for us).
Prior to ODI 12, the ODI agent was configured by simply editing a file called odiparams.bat (odiparams.sh in Linux), which would contain all the necessary agent configuration parameters. It was a simple step, where you would enter the ODI master/work configuration, DB/ODI connection users and so on. After that, you would simply run the agent program and that was it, very short and easy to do. However, in ODI 12 it changed considerably and now we need to go through two wizard setups, one to create the necessary prerequisite DB schema for "Common Infrastructure Services" and the other one to configure the ODI Standalone agent for us.
This change added some extra complexity to an architecture that was (talking exclusively about the ODI Standalone agent here) very simple to set up in the old days. Although Oracle provides wizards to minimize this effort, nothing was easier than simply configuring a parameter file and running a Java program. But enough grumbling, let's see how we may accomplish this task in ODI 12.
The first wizard that we need to run is the Repository Creation Utility (RCU), which is located at ORACLE_HOME/oracle_common/bin/rcu.bat. Before we run it, we must understand what RCU is and what it can do for us. As its name suggests, it is a utility that may be used to create any repository component required for Oracle Fusion Middleware products, including the ODI Master/Work repository.
In our project, we did not create the ODI Master/Work repository with RCU; instead, we got two empty Oracle DB schemas and installed ODI directly there. The reason why we did not use RCU in this situation is that RCU forces you to create one single Oracle DB schema that will store both the ODI Master and Work repositories, and this is not a good approach when dealing with large environments. We think that Oracle's rationale on this subject was to simplify certain ODI installs by unifying everything in a single place, but again, this removes some of ODI's architectural flexibility and complicates the use of complex architectures in the future, like using multiple Work repositories attached to one Master.
So, if we already have the ODI Master/Work repositories created, why do we still need RCU? This is because, from ODI 12 on, we need a third Oracle DB schema that will be used to store the "Common Infrastructure Services" tables that are required for the ODI Standalone agent, and the only way to create these tables is by using the RCU utility.
Now that we have set our expectations around RCU, let’s run it. The first screen is just a welcome screen explaining what RCU is about, so just click Next.
Now let’s select “Create Repository” and “System Load and Product Load”. Just notice that you will be asked for a DBA user in the next steps, since this DBA user will be used to create the necessary database objects (including the DB schema itself) in the new “Common Infrastructure Services” schema. Click Next.
Add the database and DBA information and click next.
ODI installer will check your information and if everything is ok, all tasks will be green. Select Ok to proceed.
The next screen is where we may select which components we want RCU to install. We may notice that RCU is able to create several schemas for different components, from ODI to WebLogic. Since we already have our Master and Work repositories created, we just need to select "AS Common Schemas"/"Common Infrastructure Services". Note here that, for this schema, RCU will create it using what is added in the "Create new prefix" option plus a "_STB" postfix. Click Next.
The installer will check the pre-requisites to install and if it is ok, a green check will appear. Click OK.
In the next screen you will define which schema password will be used on the newly created DB schema. Add a password and click Next.
Define the Default and Temp table spaces that will be used by the new schema and click Next.
If the table spaces do not exist, they will be created for you. Click OK.
The installer will check once more if everything is okay and also create the necessary table spaces. Click Ok.
On the next page, we get a summary of what the installer will do. If everything looks correct, click Create to create the necessary DB objects.
Check the Completion Summary, click close and that’s it! You have successfully created the “Common Infrastructure Services” schema, which is a pre-requisite for the ODI Agent install.
The next step is to run the wizard setup that will configure the ODI Standalone agent for us. Run the Config program at ORACLE_HOME/oracle_common/common/bin/config.cmd. In the first screen let's create a new domain. This domain folder is where the ODI Agent batch programs, such as Start/Stop agent, will reside. Select a meaningful folder and click Next.
In the next screen you will select “Oracle Data Integrator – Standalone Agent – 12.2.1.2.6 [odi]” and click next. This step will also install some basic Standalone components required for the ODI Agent.
Select a valid JDK location and click next.
Since we did not create our Master and Work repositories using RCU, we won’t be able to use the “RCU Data” option for Auto Configuration here. It is not a big deal, since we may select “Manual Configuration” and click next.
Here we will need to input all the information related to two schemas: The ODI Master and the “Common Infrastructure Services“. The way that this screen works is tricky and confusing, since there are options that may be typed for all schemas at once. The best way to do it without any mistake is by selecting one of them, add all information, then uncheck and check the other one and add all the information again. Click next.
The installer will check the information that was added here and, if it is okay, two green marks will be shown in the Status column. Click Next.
The next screen will be used to define our ODI Agent name. Create a meaningful name here, since this will be used by the ODI users to select on which ODI agent they will run their ETL processes. Click next.
Add the server address, the port and an ODI user/password that has “Supervisor” access. On preferred Data source option, leave it as odiMasterRepository and click next.
Although we are not going to use our ODI Standalone Agent in a Node Manager object, which would be controlled by WebLogic, we still need to select a type for it and create a new credential. Add any name and a password for it (don’t worry, you will not use it for the ODI Standalone Agent) and click next.
Review the install summary and if everything is ok, just click Create.
Check all the steps as they turn into green checks and once completed, click next.
That’s the end of the configurations! You have successfully completed the ODI Standalone agent configuration and it is ready to run.
In order to run the ODI agent, open a CMD window, navigate to your base domain folder and run the ODI Agent start program with its name as an input argument: agent.cmd -NAME=DEV_AGENT. Wait a little bit for it to load and when its status gets to "started" it is good to go.
Now that the ODI agent is up and running, we may go to ODI Topology/Agent and double click the ODI agent that you have created. Now we may click on the Test button and see what happens. If everything is correct, you will see an information window saying that the ODI agent test was successful!
Congratulations, now you have an ODI 12c Standalone Agent configured. As you can see, we now have some more extra steps to do compared to ODI 11g. I hope this post helps you get prepared for this new kind of install.