Fragmented and Aggregated tables in OBIEE using ODI Part 1/5: Creating the tables
Hey guys, how are you?
I normally don’t talk about OBIEE, despite the fact I work a lot with it, because there’s a lot of people talking about it. But recently I worked in a project and I decide to use fragmented tables and aggregated tables at same time and I found that we don’t have much specific information regarding this matter, and of course, zero information about using ODI to enable this project.
Then since this will be kind of big, I decide to split this in 5 parts:
Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and also how to further improve performance using partitioning and sub partitioning.
To start the series Fragmented and Aggregated tables in OBIEE, today we are talking about how to Creating the tables for our project.
In this project I had data from 4 different source system, and each source system had different amount of data. Dell data had 20 million rows per month, EMC data 30 million, DTC data 5 million and STAT data another 5 million.
The business wanted to have 15 quarters of data stored, then if we do a simple calculation we’ll going to have: 20.000.000 + 30.000.000 + 5.000.000 + 5.000.000 * 45 = 2,700,000,000 rows.
That’s a lot of data even for a partitioned table. Then I decide to split this tables in some way.
Analyzing the data, I saw that this tables contain 3 different account hierarchies and the users will only analyze one per time: BS, Income and PL2. I decide to split the tables by it. Then just with that now I had 3 fragmented tables. JAT_JE_DETAIL_BS, JAT_JE_DETAIL_INCOME and JAT_JE_DETAIL_PL2.
That’s nice, just with that I split that 2.7 trillion rows in 3. The problem now is that I have 4 different source system right, and even if I partition ate the table by source system, EMC is as big as the other 3 sources put together, and that would make that source slower than the other to retrieve.
With that I decided to split the tables by 2 again, one for EMC and another for the other sources. That makes the data to be distribute evenly. 20.000.000 + 5.000.000 + 5.000.000 in the D table and 30.000.000 in the E table (D and E is how I called them).
With that now we have 6 partitioned tables, right? 3 for EMC data (one for each Account Hierarchy) and another 3 tables for Dell, DTC and STAT data.
Now that we have a nice and even data distribution, we need to decide how we’ll going to create the partition and sub partition for these tables.
The partitions I always like to use the PERIOD_ID because it makes sense in most of cases (unless we are creating a Forecast app, in this case the users will want to see by scenario and not by one period).
Then for these tables JAT_JE_DETAIL_D_BS_FACT, JAT_JE_DETAIL_D_INCOME_FACT, JAT_JE_DETAIL_D_PL2_FACT we are going to partition by period. But what kind of partition we will going to use? Well, for this case, this tables contains Journal information at day level and the users want to drill from the month to the Journal rows of that mouth. What that means is that we’ll always going to work inside a month, then the best way to go is using a RANGE partition.
Range partitions, how the name already says, uses a Range of values to create the partition area. In our case it’ll be one month. The range can be built by a DATE column or a Number column, and the second one is our case since is easier to manipulate the periods using a number than a data.
(The period for this project is not a regular month, for example, the month of February FY20 start in Jan/30 and goes until Feb/2).
For the sub partitions is an easy call, we’ll going to use the Source System since we have 3 different one in this tables right (Dell, DTC and STAT), and the partition type will be LIST since we have a simple list of values. Also, for sub partitions is always easier to have values that don’t increase because the complexity of adding new sub partition values to a table.
What I mean is that to add a new partition you just need to use one ALTER TABLE ADD PARTITION statement and add any number of partitions you need. But for a sub partition is different. Every time you create a sub partition Oracle create a Sub Partition template. That template is used every time you create a Partition.
In our case for example, our Sub Partition Template would be DELL, DTC and STAT. What that means is that when you create another partition, this partition will be created with these 3 sub partitions automatically. Then if you want to add another value to a sub partition you need to do 2 things instead of just one (in partition case).
You need to change the sub partition template definition to add the new sub partition and you need add your new Sub partition in every single already existing Partitions (is not update automatically). After that all new partitions will contains all your sub partitions.
If you don’t update the template, every time you create a partition, you’ll not have the new value you want.
Enough of theory, let’s proceed with the other set of tables, JAT_JE_DETAIL_E_BS_FACT, JAT_JE_DETAIL_E_INCOME_FACT, JAT_JE_DETAIL_E_PL2_FACT. For these tables we’ll continue to have the partitions as a RANGE of the PERIOD_ID but the sub partition doesn’t make sense for us to use source system because we have only one source system.
And now is the time to explain another thing about partitions. For Oracle to use the partition you need to filter the column that is used to create the partition. For example, if we are creating a partition using the column PERIOD_ID, we must filter the data by PERIOD_ID and the ideal would be always have a filter in the Partition and Sub Partition column, because if you don’t, it’ll do a full table scan anyway. Then for a data retrieval point of view, the choice for the partitions are always made based in the Filter requirement, that means, we need to choose columns that we can force the user to choose in OBIEE. For an ETL point of view, the best would be to have partitions and sub partitions in columns that we can use to truncate before load.
For example, in our first set of tables I can easily and fast truncate a period and a source system before I load it back to the tables. That’s the beauty of partitions, I can drop, truncate and create on demand.
That said, you can imagine that some times what is good for the report is not good for the DW, but this is not our case today.
With this new piece of information in mind, the users are force in the OBIEE prompts to choose a Gen 2 Account, then this will be our sub partition for our second set of tables also with the partition type as LIST.
Before we proceed, let’s just see how everything is set then:
Table Name | Partition | Sub Partition |
JAT_JE_DETAIL_D_BS_FACT | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_INCOME_FACT | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_PL2_FACT | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_E_BS_FACT | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_INCOME_FACT | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_PL2_FACT | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List) |
Perfect, now the only thing that is missing here is the aggregate tables.
Aggregated table are tables that uses a dimension Drill to pre aggregate the vales that it contains. It’s an easy way to improve performance during a drill operation.
The most common aggregated table is for the Period dimension, and this is the one we’ll going to do here as well. Then in our case, as mentioned before, the 6 tables we have now contains day level data right. I want to have the data aggregated by Month and by Quarter.
What that means is that we’ll need to have one more table for each one of these 6 tables for Monthly level data and one more table for each one of these 6 tables for Quarterly level data. These tables will going to have the same Partitions and Sub Partitions definitions with only one difference, now instead of using a Range Partition, we’ll going to use a List Partition because we don’t have a range anymore, we’ll going to aggregate that range of days in one month.
Then let’s see how the tables needs to be set:
Table Name | Partition | Sub Partition |
JAT_JE_DETAIL_D_BS_FACT | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_INCOME_FACT | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_PL2_FACT | PERIOD_ID (Range) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_M_BS_FACT | PERIOD_ID (List) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_M_INCOME_FACT | PERIOD_ID (List) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_M_PL2_FACT | PERIOD_ID (List) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_Q_BS_FACT | PERIOD_ID (List) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_Q_INCOME_FACT | PERIOD_ID (List) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_D_Q_PL2_FACT | PERIOD_ID (List) | SOURCE_SYSTEM_ID (List) |
JAT_JE_DETAIL_E_BS_FACT | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_INCOME_FACT | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_PL2_FACT | PERIOD_ID (Range) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_M_BS_FACT | PERIOD_ID (List) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_M_INCOME_FACT | PERIOD_ID (List) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_M_PL2_FACT | PERIOD_ID (List) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_Q_BS_FACT | PERIOD_ID (List) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_Q_INCOME_FACT | PERIOD_ID (List) | ACCOUNT_GEN2_ID (List) |
JAT_JE_DETAIL_E_Q_PL2_FACT | PERIOD_ID (List) | ACCOUNT_GEN2_ID (List) |
Great, now we have 18 tables, 6 fragmented tables for daily level data, 6 fragmented tables for monthly aggregated level data (M) and 6 fragmented tables for quarterly aggregated level data (Q).
The last thing about tables is that we need a matching Period dimension (for this case) to match the data in each aggregated level. What that means is that, in this case, since we have 3 different level of aggregations, we need to have 3 Period dimensions.
OBIEE don’t allow us to join the same table using columns that are not an ID column. Then what we need is one PERIOD dimension (DIM_PERIOD) that contains the drill from Year to day (Period ID will be YYYYQMMDD number), another PERIOD Month Dimension (DIM_PERIOD_MONTH) that contains Drill from Year to Month (Period ID will be YYYYQMM number) and one last PERIOD Quarter table (DIM_PEIOD_QUARTER) that contains Drill from Year to Quarter (Period ID will be YYYYQ number).
Of course, the Aggregated FACT tables also will going to have a matching ID (and we need that to aggregate the data right).
Last thing to mentioning is that the Detail Fact table (the one in the Day level) it contains a lot of Journal details like Journal number, Description, Header ID and others. For the detail this is ok, but when we design the aggregated tables, only ID are allowed there for maximum aggregation. If the table contains Descriptions, nothing will be going to be aggregated right?
Don’t worry that OBIEE can handle this kind of situation very well.
That’s for now. I hope you guys enjoy it and found helpful.
See you soon.
February 10, 2020 at 8:41 am
[…] For a better EPM world « Fragmented and Aggregated tables in OBIEE using ODI Part 1/5: Creating the tables […]
February 11, 2020 at 7:43 am
[…] Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and … […]
February 12, 2020 at 11:57 am
[…] Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and … […]
February 13, 2020 at 6:47 am
[…] Creating the tables: I’ll explain the advantages and how you can create the fragmented tables and … […]