Archive for Oracle

Fragmented and Aggregated tables in OBIEE using ODI Part 4/5: Populating the Aggregated tables

Posted in Hacking, OBIEE, ODI, ODI Architecture, Oracle, Oracle Database, Tips and Tricks, Uncategorized with tags , , , , on February 12, 2020 by RZGiampaoli

Hey guys, how are you?

Continuing the series Fragmented and Aggregated tables in OBIEE using ODI and today we are talking about how to Populating the Aggregated tables using ODI.

Just to make easier for you to navigate in this series, here’s the parts of it:

Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and also how to further improve performance using partitioning and sub partitioning.

Managing the partitions of the Fragmented/Aggregated tables: Here you’ll see how to manage all partitions using ODI.

Populating the Fragmented tables using ODI: Here you’ll find how to create generic components to load all fragmented tables we need to.

Populating the Aggregated tables using ODI: Here you’ll find how to create generic components to load all Aggregated tables we need to.

Setting the OBIEE Repository: Finally, we’ll going to setting up the OBIEE repository to make use of all tables.

Today we are in the final step before we can work in our OBIEE repository to put all these 18 fact tables together. The data load for our Aggregated tables.

The aggregation in fact is a very simple process, we just need to remove all detailed columns we have in the detail fact table and leave just the ID’s columns. After that we just need to reduce the level of the ID of the column we want to aggregate and sum all the data at that new level.

In our case we going to use the PERIOD_ID to do that, because period is the most common choice when we talk about aggregated table and serve well in most of the cases.

And that’s why I design the PERIOD_ID as YYYYQMMDD, because is very simple and easy to manipulate this number to go up or down a period, as well to do range or even transform it back to date. It’s way easier than create a surrogate key or whatever, even if you want to work with SCD.

As you probably already guest by now, we’ll use the command on source and command on target again to do the aggregations, this way we can have only one code to spread the data through out aggregate fact tables.

In the command on source for the monthly level table, we just need a query in the source that return the name of the detailed table plus the name of the monthly table. Since I designed all tables with a specific name pattern, we can easily manipulate the table name to get the month table from the detail table like this:

We don’t need anything fancy here because in the last post we create a proc to call six time the same scenario passing different parameter to it.

One of these parameter is the name of the fact table it needs to be loaded and this information is store in the variable #JAT_TABLE_NAME (already replaced by the value in the picture) and what I have done there is just split the table name using REGEXP to get the forth and the fifth occurrences of ‘_’ and concatenate everything back adding a _M_ in the middle of it, creating my Monthly level fact table name.

With data we have the detail table name that the scenario needs to load and also the monthly level fact table name that we need to use for that loop. We just need to create a query to aggregate the data, what’s very straight forward.

As the query will change depending of the design of the table, this one is just intended for explain what needs to be done since the idea is very simple. First we replace the name of the table that  we wish to insert data and the table that we wish to get the data from for our to variables: #JAT_TABLE_NAME that we are sending when we call this scenario and #JAT_M_TABLE_NAME that we just create in the Command on Source tab.

With this, for each loop, the scenario will get the data from the right source table and will insert in the right aggregated table. Now we need to aggregate the data, also a very simple matter. All we need to do is to join the detailed fact table with the period dimension and, because this is a range partition, we need to get the first date of that month and the last date of that month, that’s why we have that MIN and MAX filtered by the Year and Month.

With the right data filtered, we just need to aggregate the data and use the FISCAL_MONTH_ID instead of the PERIOD_ID, this way the data will be aggregated by month. And we are done here.

By the way, we could instead of using the between to get the right range of the partition filtered the partition itself using explicit partition filtering:

This will make oracle to go straight to that partition. The only thing is that you need to pass the partition name, that in our case is very straight forward, specially because we are creating and managing the partitions ourselves. We could have oracle create the partitions automatically using INTERVAL (for another post) but if we do that oracle will also create the names of partitions like SYS###### and that will make everything harder to filter by partitions. Anyway, this is just another option we can have.

For the quarter level it’s even easier because we don’t need to worry about range partitions anymore. We just need to have our Command on Source return the Monthly level table name and our Quarterly level table name:

As you can see, the query is the same, the only difference is that we insert a ‘_Q_’ in the middle of our string. And for the insert in the Target Command tab we just need, as before, replace the tables using the right variable in each case, join with the DIM_PERIOD_MONTH to have the right level of data, filter the Monthly level table using any method and then use the QUARTER_ID to sum the Monthly level data to the Quarterly level data and that’s it.

That’s all we need to do to populate all aggregated table. And we finally have all data populated in our tables and now we can start to create the OBIEE repository. It’s wort to mentioning that if you resume everything that I said until this point, we basically had to:

  • Create our tables;
  • Create one procedure to:
    • Create the partitions;
    • Truncate the partitions before we load data;
    • Call and loop the scenario that will load data;
    • Drop the old partitions
  • Create another procedure to:
    • Load detail data;
    • Load Monthly level data;
    • Load Quarterly level data;

What this means is that with just two procedure we can maintain and populate a very complex environment in a very clean way with very little points of failures.

This is a very elegant approach and as I always said, if is too complex and/or difficult, you were doing something wrong. It is just a meter of think about the design first, all that needs to be done, and split it in a logical and reusable way. If you do that, everything will be simple to be archived using ODI.

I hope this series has been useful so far and I see you in the last chapter.

Advertisement

Fragmented and Aggregated tables in OBIEE using ODI Part 3/5: Populating the Fragmented tables

Posted in ACE, Data Warehouse, Hacking, OBIEE, ODI, ODI Architecture, Oracle, Oracle Database, Query, SQL with tags , , , , on February 11, 2020 by RZGiampaoli

Hey guys, how are you?

Continuing the series Fragmented and Aggregated tables in OBIEE using ODI and today we are talking about how to Populating the Fragmented tables using ODI.

Just to make easier for you to navigate in this series, here’s the parts of it:

Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and also how to further improve performance using partitioning and sub partitioning.

Managing the partitions of the Fragmented/Aggregated tables: Here you’ll see how to manage all partitions using ODI.

Populating the Fragmented tables using ODI: Here you’ll find how to create generic components to load all fragmented tables we need to.

Populating the Aggregated tables using ODI: Here you’ll find how to create generic components to load all Aggregated tables we need to.

Setting the OBIEE Repository: Finally, we’ll going to setting up the OBIEE repository to make use of all tables.

In my previous posts we had design our 18 tables and the partition management process. Now we’ll talk about how we’ll going to load data to our fragmented tables. As you can imagine, for the 6 fragmented tables, the code will be the same, the only difference is what goes to each table. Then what we need to make ODI to do for us is to dynamically filter and load the data that comes from the same query to different tables based in a rule.

If you are following this series from the beginning, you probably already guessed what we’ll need to do here. The same thing we did for the Partition management process, well, almost the same thing because we’ll going to do something a little bit more optimize for the data load since this is the most time-consuming step.

The main difference between the approach I explained in the previous post is that it loops one Statement by each row that returns from the Source command in SERIAL, that means, we need to wait it to finish for the second loop to start. For the Partition Management that’s ok because everything there is very fast, but for that load we can have something better.

We don’t need and we don’t want to wait for a serial execution because we have one table for each Loop we’ll perform (This is not the case but even if we have only one table, if we have partitions and/or Sub-Partitions, we can Loop the code by the Partition/Sub partition level and load everything in Parallel. The only case we can’t load in parallel is if we have a single table without partition).

In our case we’ll going to loop all the 6 tables load in parallel. To do that, we are going to use the same approach as before but instead of having the ALTER TABLE in the Command on Target, we’ll going to have an OdiStartScen, to call a Scenario multiple times in PARALLEL. Let’s take a look in the Command on Target:

As we can see, the Command on Target contains an ODI tool called OdiStartScen that is used to call a scenario. We have a few parameters there to set how we’ll going to call the scenario and, in this case, the important one is the SYNC_MODE=2, that means it’ll execute everything in parallel (1 is serial). Everything else are variables that we are using to pass information from the Command on Source to the Called scenario.

A quick tip here before we continue, if you want to know how in havens, I have all this setting in mind, for your information I don’t, but there’s a very easy way to find out. Just drop an ODI Start Scen from the ODI Toolbox in any part of your scenario and fill with the information you need:

In the General we set the scenario we want to load, version of the scenario (-1 is always the latest scenario you have), the Synchronous and Asynchronous mode (serial and parallel), Session name, in case you want to have an different name showing in your execution (this is always a good Idea to have since you are looping the same code with different parameters, then in my case I pass as a variable the name of the table, the period and the hierarchy that scenario will be loading).

In the Additional Variables tab you can set all variables you need to use inside the scenario (as you can see in my case, I have a lot of variables because the complexity of my sources, you case will be different).

And after you set everything, you just need to click in the Command tab to get the code that ODI generates based in the setting you did:

This is the code that you need to paste in the Command on target to call a scenario. After that you can delete the OdiStartScen object from your scenario and continue the development. Another advantage of this is that if you have the code and want to create an OdiStartScen step in a scenario, you can paste in the Command tab and the General/Additional Variable will be filled based on the code.

With the Target code ready we just need a query that will going to pass all the information needed to call the scenario. In my case, I pass a lot of information to the target, as you can see by the amount of variable I have. Also, the variable information doesn’t need to come only from the command on source. If you have variable in your scenario and they are the same for all the loops, you can just use these variables to pass their values to the Scenario the command on target is calling. The command on source needs only to have the information that will dynamically change in each loop.

In my case I’m passing all kind of information like the SESSION_NAME (the name of the session that we use to create parameter specific for each session we have and use the same variables for all, more information HERE) but the important one for us is the TABLE_NM.

In my case, the other variables will change the values depending of the table we need to load for each time we loop. To do that I insert all information I need to be pass to the scenario I’m calling in a Parameter table (that I like to call ODI_PARAMETERS). This way I can easily query this table to get specific information I need for each Loop I’m going to perform.

With all this set, we just need to create a child scenario and add “Declare” variables for each Variable we want to pass to the internal scenario like this:

As you can see, this is a very simple scenario that contains just one procedure that’ll be used to load the data to our tables.

This approach can be done with Interfaces as well, in fact with anything at all. We can have a mains scenario that loops a very complex scenario, that loops another scenario and so on. This is a very nice way to loop something because we have a lot of control over this kind of loop as well, we can change the behavior of each loop because the info we get from the Command on Source.

Also, a very good feature is that, if the source query doesn’t return any rows, the Command on Target is not executed at all. Then you’ll always have a clean execution. For example, if you have a folder that can have 0 to X number of files and you want to create an scenario to load it, the best approach would be to read this folder files (OS command to a file and loaf file or Java or Phyton or…) and store the info in our ODI parameter. Then we create a proc to call a scenario that will contain the interface that will load each file, and finally we just do a Select in the Command on Source to get the name of the files in the folder. With that, if we have 0 files, nothing will be executed, but if we have 1000 files, we’ll loop the scenario 1000 times and each time we’ll pass the name of a different file to the internal scenario. Very neat way to do it.

Inside the procedure we just need to create the query need to load the data and use the variables to filter the right data and load the right table. This is not my case but if you have different source for each table you want to load, just insert this information in the parameter table and pass it to the internal scenario. Use these variables to replace the Table Names in the SQL and you are done.

And that’s it. This is all we need to do to load all our 6 fragmented tables. In my next post I’ll be talking about how we can aggregate our data and load our Aggregated tables.

I hope you guys enjoy and see you soon.

Fragmented and Aggregated tables in OBIEE using ODI Part 1/5: Creating the tables

Posted in OBIEE, ODI, ODI 12c, Oracle, Oracle 11.2.0, Oracle Database, Performance, SQL, Tips and Tricks with tags , , , , on February 7, 2020 by RZGiampaoli

Hey guys, how are you?

I normally don’t talk about OBIEE, despite the fact I work a lot with it, because there’s a lot of people talking about it. But recently I worked in a project and I decide to use fragmented tables and aggregated tables at same time and I found that we don’t have much specific information regarding this matter, and of course, zero information about using ODI to enable this project.

Then since this will be kind of big, I decide to split this in 5 parts:

Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and also how to further improve performance using partitioning and sub partitioning.

Managing the partitions of the Fragmented/Aggregated tables: Here you’ll see how to manage all partitions using ODI.

Populating the Fragmented tables using ODI: Here you’ll find how to create generic components to load all fragmented tables we need to.

Populating the Aggregated tables using ODI: Here you’ll find how to create generic components to load all Aggregated tables we need to.

Setting the OBIEE Repository: Finally, we’ll going to setting up the OBIEE repository to make use of all tables.

To start the series Fragmented and Aggregated tables in OBIEE, today we are talking about how to Creating the tables for our project.

In this project I had data from 4 different source system, and each source system had different amount of data. Dell data had 20 million rows per month, EMC data 30 million, DTC data 5 million and STAT data another 5 million.

The business wanted to have 15 quarters of data stored, then if we do a simple calculation we’ll going to have: 20.000.000 + 30.000.000 + 5.000.000 + 5.000.000 * 45 = 2,700,000,000 rows.

That’s a lot of data even for a partitioned table. Then I decide to split this tables in some way.

Analyzing the data, I saw that this tables contain 3 different account hierarchies and the users will only analyze one per time: BS, Income and PL2. I decide to split the tables by it. Then just with that now I had 3 fragmented tables. JAT_JE_DETAIL_BS, JAT_JE_DETAIL_INCOME and JAT_JE_DETAIL_PL2.

That’s nice, just with that I split that 2.7 trillion rows in 3. The problem now is that I have 4 different source system right, and even if I partition ate the table by source system, EMC is as big as the other 3 sources put together, and that would make that source slower than the other to retrieve.

With that I decided to split the tables by 2 again, one for EMC and another for the other sources. That makes the data to be distribute evenly. 20.000.000 + 5.000.000 + 5.000.000 in the D table and 30.000.000 in the E table (D and E is how I called them).

With that now we have 6 partitioned tables, right? 3 for EMC data (one for each Account Hierarchy) and another 3 tables for Dell, DTC and STAT data.

Now that we have a nice and even data distribution, we need to decide how we’ll going to create the partition and sub partition for these tables.

The partitions I always like to use the PERIOD_ID because it makes sense in most of cases (unless we are creating a Forecast app, in this case the users will want to see by scenario and not by one period).

Then for these tables JAT_JE_DETAIL_D_BS_FACT, JAT_JE_DETAIL_D_INCOME_FACT, JAT_JE_DETAIL_D_PL2_FACT we are going to partition by period. But what kind of partition we will going to use? Well, for this case, this tables contains Journal information at day level and the users want to drill from the month to the Journal rows of that mouth. What that means is that we’ll always going to work inside a month, then the best way to go is using a RANGE partition.

Range partitions, how the name already says, uses a Range of values to create the partition area. In our case it’ll be one month. The range can be built by a DATE column or a Number column, and the second one is our case since is easier to manipulate the periods using a number than a data.

(The period for this project is not a regular month, for example, the month of February FY20 start in Jan/30 and goes until Feb/2).

For the sub partitions is an easy call, we’ll going to use the Source System since we have 3 different one in this tables right (Dell, DTC and STAT), and the partition type will be LIST since we have a simple list of values. Also, for sub partitions is always easier to have values that don’t increase because the complexity of adding new sub partition values to a table.

What I mean is that to add a new partition you just need to use one ALTER TABLE ADD PARTITION statement and add any number of partitions you need. But for a sub partition is different. Every time you create a sub partition Oracle create a Sub Partition template. That template is used every time you create a Partition.

In our case for example, our Sub Partition Template would be DELL, DTC and STAT. What that means is that when you create another partition, this partition will be created with these 3 sub partitions automatically. Then if you want to add another value to a sub partition you need to do 2 things instead of just one (in partition case).

You need to change the sub partition template definition to add the new sub partition and you need add your new Sub partition in every single already existing Partitions (is not update automatically). After that all new partitions will contains all your sub partitions.

If you don’t update the template, every time you create a partition, you’ll not have the new value you want.

Enough of theory, let’s proceed with the other set of tables, JAT_JE_DETAIL_E_BS_FACT, JAT_JE_DETAIL_E_INCOME_FACT, JAT_JE_DETAIL_E_PL2_FACT. For these tables we’ll continue to have the partitions as a RANGE of the PERIOD_ID but the sub partition doesn’t make sense for us to use source system because we have only one source system.

And now is the time to explain another thing about partitions. For Oracle to use the partition you need to filter the column that is used to create the partition. For example, if we are creating a partition using the column PERIOD_ID, we must filter the data by PERIOD_ID and the ideal would be always have a filter in the Partition and Sub Partition column, because if you don’t, it’ll do a full table scan anyway. Then for a data retrieval point of view, the choice for the partitions are always made based in the Filter requirement, that means, we need to choose columns that we can force the user to choose in OBIEE. For an ETL point of view, the best would be to have partitions and sub partitions in columns that we can use to truncate before load.

For example, in our first set of tables I can easily and fast truncate a period and a source system before I load it back to the tables. That’s the beauty of partitions, I can drop, truncate and create on demand.

That said, you can imagine that some times what is good for the report is not good for the DW, but this is not our case today.

With this new piece of information in mind, the users are force in the OBIEE prompts to choose a Gen 2 Account, then this will be our sub partition for our second set of tables also with the partition type as LIST.

Before we proceed, let’s just see how everything is set then:

Table NamePartitionSub Partition
JAT_JE_DETAIL_D_BS_FACTPERIOD_ID (Range)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_INCOME_FACTPERIOD_ID (Range)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_PL2_FACTPERIOD_ID (Range)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_E_BS_FACTPERIOD_ID (Range)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_INCOME_FACTPERIOD_ID (Range)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_PL2_FACTPERIOD_ID (Range)ACCOUNT_GEN2_ID (List)

Perfect, now the only thing that is missing here is the aggregate tables.

Aggregated table are tables that uses a dimension Drill to pre aggregate the vales that it contains. It’s an easy way to improve performance during a drill operation.

The most common aggregated table is for the Period dimension, and this is the one we’ll going to do here as well. Then in our case, as mentioned before, the 6 tables we have now contains day level data right. I want to have the data aggregated by Month and by Quarter.

What that means is that we’ll need to have one more table for each one of these 6 tables for Monthly level data and one more table for each one of these 6 tables for Quarterly level data. These tables will going to have the same Partitions and Sub Partitions definitions with only one difference, now instead of using a Range Partition, we’ll going to use a List Partition because we don’t have a range anymore, we’ll going to aggregate that range of days in one month.

Then let’s see how the tables needs to be set:

Table NamePartitionSub Partition
JAT_JE_DETAIL_D_BS_FACTPERIOD_ID (Range)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_INCOME_FACTPERIOD_ID (Range)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_PL2_FACTPERIOD_ID (Range)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_BS_FACTPERIOD_ID (List)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_INCOME_FACTPERIOD_ID (List)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_M_PL2_FACTPERIOD_ID (List)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_BS_FACTPERIOD_ID (List)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_INCOME_FACTPERIOD_ID (List)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_D_Q_PL2_FACTPERIOD_ID (List)SOURCE_SYSTEM_ID (List)
JAT_JE_DETAIL_E_BS_FACTPERIOD_ID (Range)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_INCOME_FACTPERIOD_ID (Range)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_PL2_FACTPERIOD_ID (Range)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_BS_FACTPERIOD_ID (List)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_INCOME_FACTPERIOD_ID (List)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_M_PL2_FACTPERIOD_ID (List)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_BS_FACTPERIOD_ID (List)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_INCOME_FACTPERIOD_ID (List)ACCOUNT_GEN2_ID (List)
JAT_JE_DETAIL_E_Q_PL2_FACTPERIOD_ID (List)ACCOUNT_GEN2_ID (List)

Great, now we have 18 tables, 6 fragmented tables for daily level data, 6 fragmented tables for monthly aggregated level data (M) and 6 fragmented tables for quarterly aggregated level data (Q).

The last thing about tables is that we need a matching Period dimension (for this case) to match the data in each aggregated level. What that means is that, in this case, since we have 3 different level of aggregations, we need to have 3 Period dimensions.

OBIEE don’t allow us to join the same table using columns that are not an ID column. Then what we need is one PERIOD dimension (DIM_PERIOD) that contains the drill from Year to day (Period ID will be YYYYQMMDD number), another PERIOD Month Dimension (DIM_PERIOD_MONTH) that contains Drill from Year to Month (Period ID will be YYYYQMM number) and one last PERIOD Quarter table (DIM_PEIOD_QUARTER) that contains Drill from Year to Quarter (Period ID will be YYYYQ number).

Of course, the Aggregated FACT tables also will going to have a matching ID (and we need that to aggregate the data right).

Last thing to mentioning is that the Detail Fact table (the one in the Day level) it contains a lot of Journal details like Journal number, Description, Header ID and others. For the detail this is ok, but when we design the aggregated tables, only ID are allowed there for maximum aggregation. If the table contains Descriptions, nothing will be going to be aggregated right?

Don’t worry that OBIEE can handle this kind of situation very well.

That’s for now. I hope you guys enjoy it and found helpful.

See you soon.

Oracle SQL for EPM Tips and Tricks S01EP08

Posted in Connect By, DEVEPM, Dimensions, EPM, Oracle, Oracle 11.2.0, Oracle 11.2.0.4, Oracle Database, Performance, REGEXP, Tips and Tricks with tags , , , , , on November 26, 2019 by RZGiampaoli

Hey guys how are you? Today I’ll expand a little bit more the REGEXP team just to show how easier and powerful these functions are, not to mentioning how dynamic.

In the last post S01EP04 I explained how to split a string in different columns. The thing is, what if you don’t want to split in different columns? what if you want just to have the string as a list in the same column?

We can also do that with a very nice trick using REGEXP_COUNT. Let’s take a look on this.

In the previous post we had a query like this:

As I said there, using REGEXP is very simple to “walk”trough the string and get any part of it you want. And is also simple to transform a string in a list of values. To do that we just need to add one extra connect by and the REGEXP_COUNT function like this:

What that CONNECT BY LEVEL <= REGEXP_COUNT(PATH, ‘[^|]+’) is doing is basically increasing by 1 for each “|” he finds in the string we have, in other words, he is dynamically “walking” through the string looking for “|” and passing the Level he find a “|” to the REGEXP_SUBSTR above. The result is:

As you can see, the REGEXP_COUNT will return the number of “|” he finds in the string, the CONNECT BY LEVEL will multiply that string by that number and the REGEXP_SUBSTR will “walk” the string using the LEVEL of each row to extract the list of members from that string.

This is very useful to manipulate strings that you don’t know the amount of members you want to extract from that, like UDA members for example, that when you extract from ASO cubes, comes in a string all together.

I hope this is a useful trick for you guys and see you soon.

Oracle SQL for epm tips and tricks S01EP07

Posted in ACE, DEVEPM, EPM, Oracle, RANK, SQL with tags , , , , , on August 15, 2019 by RZGiampaoli

Continuing our Oracle SQL for EPM series, today we’ll start to talk about analytic functions and how can we use them for more than “just” analytics.

To start with, let’s talk about RANK(). As the name suggest, RANK() is used to rank our data based in something. It’s very useful to find out each data is more relevant than others. Let’s see a example:

Here we have a small table with 2 currencies and a few products. Let’s first start with the basic function of RANK() and see each product generated more income:

The basic syntax is RANK() OVER (ORDER BY COLUMN). Basically what you are saying to oracle is, rank my data based by a column (or multiple columns). Since I just ordered by data, the values of the RANK() got duplicated everything oracle finds the same value. This is because we have 2 currencies and they are both USD.

To fix data we can do 2 things: Or we can include currency in side the order by or we can use another more advanced use of RANK() that is OVER PARTITION.

Let’s see how it works:

If I just add another column in the ORDER BY, it’ll basically create the Rank based in the order of these 2 columns. It’s the same as do a ORDER BY and then follow the order of the data that returns. Then in this case, you can see that the products PR235 for Functional Data got Rank 1 and for USD rank 11, even both having the same value. By the way, you also can see that the Ranks is ordering in the opposite order that we would like to have. This was intentional to show you how the Rank is produce. To fix that we just need to put a DESC in the ORDER BY clause, like we would do in a normal ORDER BY.

Ok then let’s see the more advanced way to write this query:

Instead of inserting new columns in the ORDER BY we can use PARTITION BY instead. The results here is the same, but this can be used in other ways as well and I would say that this would be the best way to used it since is more clear what you want to do.

The PARTITION BY does exactly what the name says, it partition the data by the content of one or more columns. In fact, the PARTITION BY clause can be used in most off the analytics functions like MAX, SUM, MIN, AVG…. then it’s very powerful and the best thing is that, if you use it, you don’t need to use a group by (we’ll see that in the future).

Now, as I said before, we can have other uses for RANK than just ranking data. Let’s say that you have this table without the CURRENCIES column:

Without the CURRENCIES column we end up with duplicate data in the table right? In this case we could do just a distinct and use the data as is, but let’s say you want to create the CURRENCIES column based in the data that we have, and the rule would be, the first data you find is USD and the second (if exists) would be Function. We can use Rank for that too:

Since here the data is the same for the same product, the only thing that could differentiate them was the ROWNUM (or ROWID, that would be better to make sure each one was the first one, but harder to see the example) I used it to create a Rank that shows each row has the lowest ROWNUM and that would have the Rank 1, the second one will be 2 and with this information, I just did a decode to make the 1 USD and the 2 Functional (Also a NA in case we have more than 2 duplicated rows).

This can be used in exactly the same way if you have a metadata table without the datastorage information and you want to create it. Then the first member you find (Trough our friend CONNECT BY PRIOR) will be the Prototype (Store or never Share or Dynamic Calc and Store) and the other would be Shared members.

Of coarse there’s way more ways to use this function, and we’ll see more of them with the other analytics functions that we’ll going to see here.

See you soon guys.

ORACLE SQL for EPM tips and tricks S01EP06!

Posted in Oracle, Tips and Tricks with tags , on May 6, 2019 by Rodrigo Radtke de Souza

Hi all,

Today’s post is about two cool Oracle analytics functions that are powerful and awesome, but few people use them, which is LEAD and LAG. LEAD function lets you query more than one row in a table at a time without having to join the table to itself. It returns values from the next row in the table. LAG does the same thing but returning the previous row. It may sound weird when you just read its descriptions, so let’s get some real examples.

Imagine that we have the following data:

1

I need to do a query that shows the percentage that DATA column increased over the periods in a single row. For example, in PERIOD 2 I need to show one row with the previous and current period values and how much it increased over the period. I see in a lot of places people just querying the same table twice, joining by its key columns (in this case ACCOUNT and PERIOD) and then doing the Percentage calculation. However, we don’t need to go over all this trouble, since it is very easy to accomplish the same result using LAG function as showed below:

SELECT

ACCOUNT,


PERIOD,


LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) as PREVIOUS_DATA,


DATA as PERIOD_DATA,


ROUND(DATA/LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) *100,2) PERCENTAGE


FROM T$_LEAD_LAG

2

LAG and LEAD syntax are basically the same:

LAG/LEAD ( expression [, offset [, default] ] )

OVER ( [ query_partition_clause ] order_by_clause )

In our example, I’m querying the table only once and I’m “LAGing” for 1 previous row, partitioned by ACCOUNT and ordering by PERIOD. So, for each distinct account value, Oracle will order the rows by period and we will access its values as a normal column. We may do this as many times as we want, for example if we want a two-month comparison:

SELECT

ACCOUNT,


PERIOD,


DATA as PERIOD_DATA,


LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) as PREVIOUS_PERIOD_DATA,


LAG (DATA,2) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) as PREVIOUS_TWO_PERIODS_DATA,


ROUND((DATA - LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) )/ LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) *100,2) PERCENTAGE_PREVIOUS_PERIOD,


ROUND((DATA - LAG (DATA,2) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) )/ LAG (DATA,2) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) *100,2) PERCENTAGE_PREVIOUS_TWO_PERIODS


FROM T$_LEAD_LAG

3

Another example using LEAD can be used to check data accuracy between “linked” rows, often seen in tables that contains SCD (Slowly Changing Dimension) behavior. Let’s get the below example:

4

In a SCD table, we have the effective start and end date for each one of the records that belongs to the same key. These dates creates a “link” between the records, where one effective date starts where another effective date ends. The above picture is an example where all records looks good, having each effective date ending and starting correctly. Now see example below:

5

We can see that there is a one-day gap between 10/08 and 11/08, which may cause a lot of trouble if the application tries to see which record was effective right between those two days (it would return null).

In order to search for those kinds of gaps between the records, we may write a simple and elegant LEAD statement that will search for all records that has a “gap” between them. The statement would look like this:

WITH ALL_ AS (

SELECT RECORD_KEY


, TO_DATE(EFF_START_DATE,'mm/dd/yyyy hh24:mi:ss') EFF_START_DATE


, TO_DATE(EFF_END_DATE,'mm/dd/yyyy hh24:mi:ss') EFF_END_DATE


, CURRENT_FLAG


, LEAD (TO_DATE(EFF_START_DATE,'mm/dd/yyyy hh24:mi:ss'),1) OVER (PARTITION BY RECORD_KEY ORDER BY TO_DATE(EFF_START_DATE,'mm/dd/yyyy hh24:mi:ss')) AS NEXT_START


FROM T$_POST
)

SELECT RECORD_KEY


, TO_CHAR(EFF_START_DATE,'mm/dd/yyyy hh24:mi:ss') EFF_START_DATE


, TO_CHAR(EFF_END_DATE,'mm/dd/yyyy hh24:mi:ss') EFF_END_DATE


, TO_CHAR(NEXT_START,'mm/dd/yyyy hh24:mi:ss') NEXT_START


FROM ALL_ WHERE (NEXT_START - EFF_END_DATE)*24*60*60 > 0

6

The result will show which record has a “gap” between its effective end date and the next effective start date. In this case I had to create the SQL using a WITH clause, because we cannot use “window” functions directly into the where clause. If we try to do it, we will get an ORA-30483 error:

7

Thanks all! I hope you have liked it! See you soon!

ORACLE SQL for EPM tips and tricks S01EP04!

Posted in ACE, Connect By, Oracle, Oracle Database, Performance, REGEXP, SQL, Tips and Tricks with tags , , , , , , on April 9, 2019 by RZGiampaoli

Continuing the Oracle SQL for EPM series, today we’ll extend a little bit the “Connect by” post to see some neat thing we can do when we mix regular expression with connect by.

First of all, let’s start explaining the basic of this approach. We can use Connect By and Level to generate data in Oracle like this:

This is very useful when you need to generate 2 rows for each row we have in a query, then you can do a Connect by Level <=2 and use it as a query without join, duplicating everything in the source query. Then you can use a Decode or Case to say, Case when level = 1 then x when level =2 then y….

But for us this is just the basic concept that we need to understand for our trick. let’s get back to our metadata table, the one with parent child relationship and with all metadata from all apps and all dims that we have. Let’s say we need to create an hierarchical table based in a parent child table. Some people does one union for each level that we have in the source table (too much work), another will do filter all over the places, I don’t, I just use connect by with regular expression. Let’s take a look.

As we can see, we have 17 levels or generations for the account dimension. quite big. First of all, let’s do a connect by path to flat the entire hierarchy in one row and remove the parent child relationship.

Now, as we can see, if we just do a connect by prior with the sys_connect_by _path, the query will generate 1 row for each level of data. We don’t want that, unless you want to have the ability to input data in the upper level of the hierarchy, that is usually not the case. What we want is to have one the entire path for each leaf member, then we need to filter just the leafs in our queries. In my case my table already have a column that says if a member has children or not, but I’ll proceed as if I don’t have it.

If we don’t have the leaf information we can generate it by using Connect_By_IsLeaf that basically return 0 if is a parent or 1 if is a leaf. Again, we are filtering this outside the connect by prior and now we have just the leafs and the entire hierarchy above him. Next step is to split this back into rows.

REGEXP_SUBSTR is the perfect tool to split the string we have into new columns. As you can see, we can use the follow expression: REGEXP_SUBSTR(Column, ‘[^|]+’, 1, 1) Basically the regexp uses what is in [] to find what yo want and split it in the way you want. In our case I’m looking for the first | (that is defined by the 1, 1) and get the first word after it. Then I have another regexp and I changed the 1, 1 to 1, 2 to get the second word and keep doing this until I get to the 17 generation (as we saw in the beginning).

Basically if you have 17 generation you need to have 17 columns in the table, then you need to have 17 REGEXP, one for each column, you just need to change the 1, 1, 1, 2, 1, 3… until the 1, 17.

Here’s when ODI come in handy because if you use ODI you just need to do a select in the source to figure out how many generations you will have and then generate dynamically the 17 REGEXP and pass this to the target dynamically. It’s very easy since the code itself is always the same and the only thing that changes is the second parameter, and it’s a sequential parameter.

I hope this can be as useful for you as it’s for me and see you soon.

ORACLE SQL for EPM tips and tricks S01EP02!

Posted in ACE, Connect By, EPM, Oracle Database, Performance, Query, SQL, Tips and Tricks, WITH Clause with tags , , , , , , , on March 21, 2019 by RZGiampaoli

hey guys how are you? Let’s continue the SQL for EPM series. Today I’ll continue to talk about With with a small bonus of Connect by :). let’s start.

A lot of people uses Connect By in a daily bases but as far I having seeing, most of then don’t know how to use it properly. I already lost count with the amount of people complaining about performance issue with Connect By.

The thing is, Connect By works a little different than everything else in Oracle. We can say that Connect By has 2 stages and we’ll see why I’m saying that with this example. Let’s get back to our metadata table and let’s do a Connect By to extract the Balance Sheet Hierarchy from the Juno application:

As we can see, inside this table we have more than one application and more than one hierarchies for each application. That’s ok, we just need to filter it in our SQL right?

If we filter the APP_NAME and the HIER_NAME we’ll get all accounts for that Application and this will generate 12,622 rows. By the way, this table has all metadata from all our applications and we always filter by APP_NAME and HIER_NAME to select what we want (the table is also partitioned and sub-partitioned by these 2 columns). It’s important to know that without filtering anything this table has:

Ok, now, if we want to get just the BS hierarchy we just need to do the Connect By right?

That works… perfect… or not? Well in fact, this the wrong way to use Connect by because what I said before, the 2 stages.

As you can see, this query took 25 sec just to return the first 50 rows. In a integration this will take way more time, in fact, if you join this table to a data table to do a SUM in the BS level, this will take ages to return.

The reason is that for the Connect by, first Oracle does everything that is after the word Connect by and after the word Start with and then, and only then, it does what is in the where condition. That means, first he did the connect by in those 2.260.372 rows (and they are all repeated) and then after all the processing, it filtered what we wanted, that is the APP_NAME and the HIER_NAME. Then the right way to use it is:

Now it looks way better. 0.375 seconds to do exactly the same thing as before, and the only thing I did was to move our filters to the right place. Now Oracle is filtering and doing the Connect by at same time.

Now, if you do a SYS_CONNECT_BY_PATH and want to get just the leaf (to have the complete path that the hierarchy does, you can filter the leafs in the where clause (and need to be there otherwise it’ll not have the entire hierarchy during the connect by). This is how:

Now you see that the connect by filtered what needs to be filter during the Connect by execution and afterwards, it filtered just the leafs (using the CONNECT_BY_ISLEAF that returns if a member is a leaf or not).

Also, i used the CONNECT_BY_ROOT to generate the Root member used in this query (BS) and the SYS_CONNECT_BY_PATH to generate the entire path of the metadata (Very useful to transform parent/child tables in generation tables using this Technic and a regexp [we’ll see this in another post]).

Ok, now that the “Bonus” is written, let’s talk about the WITH that was the main subject here. Even with this Connect by write in the right way with the filters in the right place, we can still improve the performance using WITH.

That’s right, the idea is to prepare our subset of data using WITH before we ask Oracle to do the Connect by and leave it as simple as possible. Let’s take a look:

This is by far the best way to use a Connect by clause. You can, instead of using WITH use a sub-query but I think this way is easier and more organised as well. Also, I know the time difference doesn’t look to big between the previous example and this one but when you join this with data and start to SUM everything, you’ll see a huge difference between this method and the previous one.

Also, some times Oracle get lost with the previous method making everything slower but with the WITH method, it never happens then I advise you start to use this.

I hope you guys enjoy this little tip and see you next time.

BI Community Lvl up! Enter Analytics Community

Posted in ACE, Career, DEVEPM, EPM, ODTUG, Oracle with tags , , , on February 22, 2019 by RZGiampaoli

Hey guys how are you?

Today I have a very exiting news for all of us. ODTUG just announced the re-branding of the BI Community, to the Analytics Community.
The change is not in name alone—it’s an effort to create a more dynamic and inclusive community to better serve the needs the of ODTUG members. What does this change mean? It means we encompass more than just BI. It means if you are interested in any of the tools that are used to analyze data and turn it into information, we are here for you! Have a question? Want to share knowledge you gained from a project? Connect with other like-minded analysts who explore data in myriad ways, both on premises and in the cloud. Join the Analytics Community to share your passion for all things analytics, whether you are using OAC, OBIEE, ML/AI, R, Python, Essbase, DV, Big Data, Data Warehousing, or other platforms. This page is your portal to all things Analytics– a blog feed, a list of upcoming Analytics-related events, and a look at the Analytics content located in the Technical Database. Check out the most recent blogs from industry experts. Like what you read? Click on the title to visit the author’s blog. Would you like to see your blog featured here? Drop a note to Haleigh@odtug.com. Let’s explore analytics together!

Check the Twitter, LinkedIn and Facebook page to learn more, join us and help us to create a better Analytic Community.

BI Playlist Banner2.jpg

ORACLE SQL for EPM tips and tricks S01EP01!

Posted in DEVEPM, ETL, Oracle, Oracle Database, Performance, SQL, Tips and Tricks, Uncategorized, WITH Clause with tags , , , , , , on January 21, 2019 by RZGiampaoli

Hey guys how are you? I decide to start a new series called ORACLE SQL for EPM tips and tricks. The idea here is to show the most useful SQL commands for EPM, how to improve performance, tips, tricks and everything that can be useful from a SQL point of view!

And to start well, I’ll show something very old but very useful that I don’t see too many people using these days. “WITH” clause.

I love to use “WITH” in my code. It helps organize the code, helps to optimize it and more important, to make it more efficient.

When you use “WITH” Oracle treats your query inside it as an inline view or resolved as a temporary table, making it easier and faster for Oracle to access that data if you need it again.

Simply putting, every time you needs to right a query that uses the same table over and over, it’ll probably be way more efficient if you use “WITH”.

The “WITH”clause works a little bit different from a regular SQL. We can say that we split the query in 2, one is the “WITH” declaration (That will behave like a table) and the other is the SQL that will Query the “WITH”.

WITH name_of_temp_table_here AS
(
    YOUR QUERY HERE
),
   name_of_temp_table_here2 AS
(
   SELECT *
   FROM name_of_temp_table_here, another_table...
)
SELECT *
FROM name_of_temp_table_here, name_of_temp_table_here2 

In the “WITH” you can have any kind of query you want. You can do joins, group by, you can also have more than one “WITH”, you can use the result of one “WITH” in the next “WITH”, you can do a lot of things.

But for now, lets take a look in a more real example. Let’s say that you have a table like I do, that contains all metadata from all yours applications:

Let’s say you want to get the Parent of a attribute that is associated with your Entity dimension. You probably will be doing something like this:

In the “FROM” we call the table 2 times and we join and filter everything we need. Since we don’t have attribute association in all levels we do a “Left Join” to make sure all Entities comes in the query. If we run a Explain Plan now we’ll get something like this:

As you can see, Oracle is querying the METADATA_EXTRACT table twice and each time it’s doing a FULL in one Partition (‘ENTITY’ and ‘PHYSICAL_GEOGRAPHY’ partitions).

Now, if we change the query (and we can do it in different ways, this is just one of them) to a “WITH” clause we ‘ll have something like this:

As you can see, we achieved the same results with the code a little bit different. Now I have all my filters in the “WITH” query and in the bottom I just call the “WITH” query 2 times and do what needs to be done.

If we run a Explain Plain now we will have:

As you can see, Oracle now is querying the METADATA_EXTRACT table just once and then his queries the SYS.SYS TEMP table twice. The only problem with this query and the way I did is that since we are creating a temporary table filtering 2 partitions and then later I’m filtering again, it’s basically doing 2 FULL scan in the same TEMP table, and even so, it’s a few seconds faster then the original query.

But this is just an example on how we can reduce the amount of times that Oracle needs to query a table. WITH is not a miracle clause or anything like that, is just another tool that we can use to optimize our code, and its performance needs to be evaluated in a case-by-case basis.

And even if the performance doesn’t change, I believe using “WITH” clause makes any query easier to ready, to test, to update and to right since you can divide your huge query in small bits and then join
everything in the bottom query.

“WITH” is a huge subject and we’ll be talking more about it in the next post, and this time we’ll be improving performance as well using “WITH” with “CONNECT BY”.