Archive for Tips and Tricks

ODI Hidden Gems – Temporary Indexes

Posted in ODI, ODI 12c, Tips and Tricks with tags , , on September 3, 2019 by Rodrigo Radtke de Souza

Hi all!

Today’s gem is indeed a very hidden one. ODI is known (unfortunately) to be “not intuitive” most of the times and I think that’s because we have many options that are scared across too many objects in the UI, which leads you to keep clicking on objects until you find what you need. To make things a little bit harder, you have the logical and physical tabs and each one of them has its own representation of the objects, so some of the options will be in the logical tab, some others in the physical tab. Lets talk today about the “Temporary Indexes” option that exists in some objects in the Physical tab of a mapping.

Very often you will load data from places that may not have an “index” concept, like files for example. You may create a mapping that will load any number of big files, put them in a staging area and filter/join them together to do some ETL. You may reach situations where those joins are not performing well since you don’t have an “index” on the files. Here is where ODI may help you with some Temporary Indexes creation. Let’s see this example (it’s very basic, but you will get the idea):

1

You join both files and load to a target table. When we execute this mapping, this is what we get:

2

Basically, you are loading both files to C$ tables and then you are joining those C$ tables when loading to the target.

7

If you think that this join would benefit from an index creation, you may configure ODI to create temporary indexes on that join. To do that, you will need to go to physical tab, click the “join” object and go to “Properties” as below. After you know where the option is, it seems pretty easy and obvious, however if you are not familiar with ODI, you will need some practice to actually figure out that you need to go to physical tab and then click on the specific object to get some unique properties of that object:

3

Change Index Type to one of the possible values (Bitmap, Non-Unique, Unique) and run the mapping again. You will see the index creation after it loads the data to C$:

4

This technique also works for filters objects. If we add a filter in any of the file columns and if you think that it would be beneficial to have an index on it, the steps to create it  would be the same as before:

5When executing, we can see all the indexes creation there:

6

If you go to Oracle documentation, there is a note there regarding using the Temporary Index creation:

  • The creation of temporary indexes may be a time-consuming operation in the overall flow. Oracle recommends reviewing execution statistics and comparing the execution time saved by the indexes to the time spent creating them.

It’s a very valid point. Most of the time we don’t need to create any temporary index, but if you end up in one of those cases that you need to create them, now you know where to find this option.

Thanks!

 

Advertisement

Oracle SQL for epm tips and tricks S01EP07

Posted in ACE, DEVEPM, EPM, Oracle, RANK, SQL with tags , , , , , on August 15, 2019 by RZGiampaoli

Continuing our Oracle SQL for EPM series, today we’ll start to talk about analytic functions and how can we use them for more than “just” analytics.

To start with, let’s talk about RANK(). As the name suggest, RANK() is used to rank our data based in something. It’s very useful to find out each data is more relevant than others. Let’s see a example:

Here we have a small table with 2 currencies and a few products. Let’s first start with the basic function of RANK() and see each product generated more income:

The basic syntax is RANK() OVER (ORDER BY COLUMN). Basically what you are saying to oracle is, rank my data based by a column (or multiple columns). Since I just ordered by data, the values of the RANK() got duplicated everything oracle finds the same value. This is because we have 2 currencies and they are both USD.

To fix data we can do 2 things: Or we can include currency in side the order by or we can use another more advanced use of RANK() that is OVER PARTITION.

Let’s see how it works:

If I just add another column in the ORDER BY, it’ll basically create the Rank based in the order of these 2 columns. It’s the same as do a ORDER BY and then follow the order of the data that returns. Then in this case, you can see that the products PR235 for Functional Data got Rank 1 and for USD rank 11, even both having the same value. By the way, you also can see that the Ranks is ordering in the opposite order that we would like to have. This was intentional to show you how the Rank is produce. To fix that we just need to put a DESC in the ORDER BY clause, like we would do in a normal ORDER BY.

Ok then let’s see the more advanced way to write this query:

Instead of inserting new columns in the ORDER BY we can use PARTITION BY instead. The results here is the same, but this can be used in other ways as well and I would say that this would be the best way to used it since is more clear what you want to do.

The PARTITION BY does exactly what the name says, it partition the data by the content of one or more columns. In fact, the PARTITION BY clause can be used in most off the analytics functions like MAX, SUM, MIN, AVG…. then it’s very powerful and the best thing is that, if you use it, you don’t need to use a group by (we’ll see that in the future).

Now, as I said before, we can have other uses for RANK than just ranking data. Let’s say that you have this table without the CURRENCIES column:

Without the CURRENCIES column we end up with duplicate data in the table right? In this case we could do just a distinct and use the data as is, but let’s say you want to create the CURRENCIES column based in the data that we have, and the rule would be, the first data you find is USD and the second (if exists) would be Function. We can use Rank for that too:

Since here the data is the same for the same product, the only thing that could differentiate them was the ROWNUM (or ROWID, that would be better to make sure each one was the first one, but harder to see the example) I used it to create a Rank that shows each row has the lowest ROWNUM and that would have the Rank 1, the second one will be 2 and with this information, I just did a decode to make the 1 USD and the 2 Functional (Also a NA in case we have more than 2 duplicated rows).

This can be used in exactly the same way if you have a metadata table without the datastorage information and you want to create it. Then the first member you find (Trough our friend CONNECT BY PRIOR) will be the Prototype (Store or never Share or Dynamic Calc and Store) and the other would be Shared members.

Of coarse there’s way more ways to use this function, and we’ll see more of them with the other analytics functions that we’ll going to see here.

See you soon guys.

ODI Hidden Gems – Setting custom Warnings and Errors

Posted in ODI, Tips and Tricks with tags , on August 7, 2019 by Rodrigo Radtke de Souza

Hi all! ODI developers often create their own ODI procedures that contains any kind of specific logic/technology on it. Since it’s a custom code, we want to be sure that they are efficient, but also easy to read or show errors/warnings when something wrong happens. Let’s see an example of what can be done in ODI regarding this topic.

I’ve created a simple procedure and put it in a package. If I execute it and something goes wrong, the entire procedure gives an error and it is propagated to the main scenario:

1

Now imagine that this procedure is not that critical and that we could ignore the errors when it happens. We just need to open the ODI procedure and click the “Ignore Errors” option.

2

When we execute again, this is what happens:

3

My procedure step finished with a warning, which is good, but the main package finished with a success, which may be bad. If someone looks at the operator only at the package level, they may never look to see why that internal step has a warning, since the parent is “green”. If the warning is somehow important and we want to propagate it to the parent step, we will first need to understand what ODI checks to set a parent step to warning.

For ODI to cascade the warning to its parent, one of its child steps needs to have the “Nº of Errors” greater than zero. In our case, when the error was triggered and set to ignore, the “Nº of Errors” in the step was never change (it remained 0) as you can see below:

4

Fortunately, it’s very easy for us to manipulate all those record statistics numbers in ODI. You can use one of five methods below (one for each statistic):

  • setNbInsert()
  • setNbUpdate()
  • setNbDelete()
  • setNbErrors()
  • setNbRows()

You may create a task in a procedure with Jython technology and just add odiRef.setNbErrors(1) to set the number of errors of that step to 1. Just to get it easier to understand, lets remove the Ignore step option from our example and create a new one procedure, just to set the error number, as below:

5

6

When we ran our package now, we can see the following:

7

Our procedure thrown an error, we caught with a red arrow in the flow and then set a 1 to the number of errors. ODI understands that, when the error number is greater than 0, then it must set the parent icon as a Warning. Depending on the code/technology in your procedure, you may even include the setNbErrors inside of your own code, so you don’t need a separated procedure for that.

8

Thanks all! See you soon!

 

 

 

ODI Hidden Gems – Exception Handling – Timeout(s)

Posted in ODI, Tips and Tricks with tags , on July 26, 2019 by Rodrigo Radtke de Souza

Hi all,

Today’s hidden gem is the “Exception Handling – Timeout(s)” option which is located at the Load Plan steps:

1

There are certain situations where we may have a very strict load window, which we cannot go over a certain limit of time. If this situation happens, the data load should abort before something bad happens. Luckily, ODI Load Plans have a very easy mechanism to handle timeout situations and most people are not aware of it. In every load plan step, we may add a timeout value which is the maximum time (in seconds) that this step takes before it is aborted by the Load Plan. When a timeout is reached, the step is marked in error and the Exception step (if defined) is executed.

It seems simple, but it can be very powerful, since this setting may be applied to any parent step (even the root step). In this case, we may have a safeguard to avoid a potential long running/overlapping situation for the entire load plan. As for example, if you want to be sure that the entire load finishes within 8 hours, just add a timeout value to the root step (28800 seconds) and it will stop in case it reaches this value. In daily execution load plans, you may set it to 86400 seconds (24 hours), so it does not overlap with the next daily execution. In the following screen, I set a 10 seconds timeout, and this is the error that is triggered when it reaches the timeout setting value.

2

That’s if folks, see ya!

ODI Hidden Gems – Log Steps in the Journal

Posted in ODI, Tips and Tricks with tags , on July 11, 2019 by Rodrigo Radtke de Souza

Hi all,

Today we will talk about “Log Steps in the Journal” option. This one resides the “Advanced” tab of every step inside an ODI package, as you can see below. Every step in a package appears in the execution log while being executed, but we may define whether the step should be kept in the journal after its execution is finished or not. The available options are:

  • Never: the step is deleted from the journal.
  • Always: the step is always kept in the journal.
  • Errors: the step is kept in the journal only if it failed. Otherwise, it is deleted.

1

Although it seems a very simple option, it has some good PROS and CONS about using it. Let’s talk about the PROS first. Imagine that the above scenario was going to loop the same procs 10 times. You would end up with a log like this:

2

You may want to keep the log of all those procedures executions, so you know what they did (like how many rows did they insert/delete/update). However, all the steps related to the loop variable are kind of useless, since they are only used to control the “loop” over the steps. In this case, if you wish to keep a cleaner log, you may set both variable steps to “Never”, like below:

3

Now, when you execute this package, you will have a much cleaner log:

4

However, this option comes with a bad CONS: I already saw several situations where people were trying to debug an execution in Operator and they were not understanding how the values were being assigned or they were not sure how some stuff were being populated if they don’t see any step in Operator related to that. After some time lost wondering about it and then double checking the same package in development, they would realize that someone had put that step to never log (sometimes even by accident). So, anytime that you are trying to debug something in Operator, and it seems weird or missing pieces, please make sure to look on the development package as well, since some steps may be set to Never log.

That’s it folks. A quick post today. See ya!

ODI Hidden Gems – Degree of Parallelism for Target (DOP)

Posted in ODI, Tips and Tricks with tags , on July 5, 2019 by Rodrigo Radtke de Souza

Hi all,

If you read our posts, you know that we like to write “series” of them. We think it’s a good motivation for us to focus on some topic and keep writing about it. So, let me begin with a brand-new series called “ODI Hidden Gems”. We will be talking about those small configurations, check boxes and settings that most of the people just ignore them or don’t even know that they exist, but they can be of great value.

ODI is a great tool, it has a lot of options and anyone may survive without knowing all of them, however, there are some that may shine and gives you better data load performance, tool development usability and so on. Without further delay, let’s talk about Degree of Parallelism for Target (DOP).

1

This setting resides on ODI Data Server component within Topology together with Array Fetch Size and Batch Update Size and all the three are often misleading. First thing to notice is that each of those settings happens in either on SOURCE, TEMP (C$) or TARGET databases (and here is where the mislead happens).

  • Array Fetch Size: This setting is only used when the data server is used as a SOURCE. When reading large volumes of data from a data server, Oracle Data Integrator fetches successive batches of records. This value is the number of rows (records read) requested by Oracle Data Integrator on each communication with the data server.
  • Batch Update Size: This setting is only used when the data server is used as a TARGET. When writing large volumes of data into a data server, Oracle Data Integrator pushes successive batches of records. This value is the number of rows (records written) in a single Oracle Data Integrator INSERT command.
  • Degree of Parallelism for Target: Although the name suggests TARGET, this setting is only used on the TEMP (C$) part when the data server is used as a TARGET. Indicates the number of threads allowed for a loading task, in other words, in C$ population from the source database to the target database. Default value is 1. Maximum number of threads allowed is 99.

So, when you want to optimize all three parameters, you will probably change in two different data servers (source and C$/target) and not only in one data server, as most people try to do. Also, when we talk about ODI Data Server DOP, which is a number that represents the number of parallel threads, we are talking exclusively about the C$ piece of the integration, so it’s not related to SOURCE/TARGET at all.

Let me give you one example to make it clearer. If you are not aware from where I’m getting the following details, please notice that, every time you have a “Load Data” step from Server A to Server B, ODI creates a “Details” tab at the Operator Task Level with a lot of useful information. This is also another free Hidden Gem.

Source: SQL Server, TABLE_A has 2,261,393 rows
Target: Oracle, TABLE_B will be loaded from TABLE_A.
Topology: Target Data Server is set to DOP 1.

3

We can see that it took 75 seconds to load this data, but the Wait time on source connection was 27 seconds. It means that, ODI was able to get data fast from the source database, but it needed to wait for the target thread to be available, so it could send more data in. Also, target DOP is one, so only one thread worked to load this data to the C$ table.

Source: SQL Server, TABLE_A has 2,261,393 rows
Target: Oracle, TABLE_B will be loaded from TABLE_A.
Topology: Target Data Server is set to DOP 16.

5

Now we see some gain. The wait time is 0.610 seconds in the source and the target threads were able to load all of them to C$ table in 35 seconds, running 16 threads in parallel. You may even try to reduce this load times further by changing Array Fetch Size (in the source Data Server) and Batch Update Size (in the target Data Server), but those two settings I’ll leave to another Hidden Gem post.

See you later!

ORACLE SQL for EPM tips and tricks S01EP06!

Posted in Oracle, Tips and Tricks with tags , on May 6, 2019 by Rodrigo Radtke de Souza

Hi all,

Today’s post is about two cool Oracle analytics functions that are powerful and awesome, but few people use them, which is LEAD and LAG. LEAD function lets you query more than one row in a table at a time without having to join the table to itself. It returns values from the next row in the table. LAG does the same thing but returning the previous row. It may sound weird when you just read its descriptions, so let’s get some real examples.

Imagine that we have the following data:

1

I need to do a query that shows the percentage that DATA column increased over the periods in a single row. For example, in PERIOD 2 I need to show one row with the previous and current period values and how much it increased over the period. I see in a lot of places people just querying the same table twice, joining by its key columns (in this case ACCOUNT and PERIOD) and then doing the Percentage calculation. However, we don’t need to go over all this trouble, since it is very easy to accomplish the same result using LAG function as showed below:

SELECT

ACCOUNT,


PERIOD,


LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) as PREVIOUS_DATA,


DATA as PERIOD_DATA,


ROUND(DATA/LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) *100,2) PERCENTAGE


FROM T$_LEAD_LAG

2

LAG and LEAD syntax are basically the same:

LAG/LEAD ( expression [, offset [, default] ] )

OVER ( [ query_partition_clause ] order_by_clause )

In our example, I’m querying the table only once and I’m “LAGing” for 1 previous row, partitioned by ACCOUNT and ordering by PERIOD. So, for each distinct account value, Oracle will order the rows by period and we will access its values as a normal column. We may do this as many times as we want, for example if we want a two-month comparison:

SELECT

ACCOUNT,


PERIOD,


DATA as PERIOD_DATA,


LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) as PREVIOUS_PERIOD_DATA,


LAG (DATA,2) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) as PREVIOUS_TWO_PERIODS_DATA,


ROUND((DATA - LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) )/ LAG (DATA,1) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) *100,2) PERCENTAGE_PREVIOUS_PERIOD,


ROUND((DATA - LAG (DATA,2) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) )/ LAG (DATA,2) OVER (PARTITION BY ACCOUNT ORDER BY PERIOD) *100,2) PERCENTAGE_PREVIOUS_TWO_PERIODS


FROM T$_LEAD_LAG

3

Another example using LEAD can be used to check data accuracy between “linked” rows, often seen in tables that contains SCD (Slowly Changing Dimension) behavior. Let’s get the below example:

4

In a SCD table, we have the effective start and end date for each one of the records that belongs to the same key. These dates creates a “link” between the records, where one effective date starts where another effective date ends. The above picture is an example where all records looks good, having each effective date ending and starting correctly. Now see example below:

5

We can see that there is a one-day gap between 10/08 and 11/08, which may cause a lot of trouble if the application tries to see which record was effective right between those two days (it would return null).

In order to search for those kinds of gaps between the records, we may write a simple and elegant LEAD statement that will search for all records that has a “gap” between them. The statement would look like this:

WITH ALL_ AS (

SELECT RECORD_KEY


, TO_DATE(EFF_START_DATE,'mm/dd/yyyy hh24:mi:ss') EFF_START_DATE


, TO_DATE(EFF_END_DATE,'mm/dd/yyyy hh24:mi:ss') EFF_END_DATE


, CURRENT_FLAG


, LEAD (TO_DATE(EFF_START_DATE,'mm/dd/yyyy hh24:mi:ss'),1) OVER (PARTITION BY RECORD_KEY ORDER BY TO_DATE(EFF_START_DATE,'mm/dd/yyyy hh24:mi:ss')) AS NEXT_START


FROM T$_POST
)

SELECT RECORD_KEY


, TO_CHAR(EFF_START_DATE,'mm/dd/yyyy hh24:mi:ss') EFF_START_DATE


, TO_CHAR(EFF_END_DATE,'mm/dd/yyyy hh24:mi:ss') EFF_END_DATE


, TO_CHAR(NEXT_START,'mm/dd/yyyy hh24:mi:ss') NEXT_START


FROM ALL_ WHERE (NEXT_START - EFF_END_DATE)*24*60*60 > 0

6

The result will show which record has a “gap” between its effective end date and the next effective start date. In this case I had to create the SQL using a WITH clause, because we cannot use “window” functions directly into the where clause. If we try to do it, we will get an ORA-30483 error:

7

Thanks all! I hope you have liked it! See you soon!

ORACLE SQL for EPM tips and tricks S01EP05!

Posted in ACE, Connect By, DEVEPM, Dimensions, EPM, ETL, Oracle, Oracle 11.2.0, Oracle 11.2.0.4, Oracle Database, Query, REGEXP, SQL, Tips and Tricks with tags , , , , on May 3, 2019 by RZGiampaoli

Continuing the Oracle SQL for EPM series today we’ll see another usage of Connect by. I’m talking a lot about connect by because is a very useful function and we should use it a lot in the analytic space.

Let’s imagine that we are working with a planning application and we want to figure out in a query what is the region of each cost center and to what top level that cost centers belong. In this app we have the Entity dimension with Support Geography as a attribute.

The support geography has the region on the generation 2 of the hierarchy. How can we do that. Well, connect by to the help.

First of all we need to get the entire physical geography hierarchy. To do that we’ll do a query like this:

This is a simple query that joins the HSP_ATTRIBUTE_MEMBER table (this table contains all the attributes from all attribute dimensions) that’s why we have that sub-query to select just the Support Geography members (1).

Then we join this table with the HSP_OBJECT that contains all metadata from everything in Planning: Forms, Dimensions, Flows, everything. We do that to get the name of the members. And finally we do a connect by to rebuild the hierarchy (2).

With the hierarchy re-built we can use the Connect by root to figure out to witch top level each member belongs. (3)

With this query we know what’s the leaf member of the Support Geography dimensions (ID) and to what Region that member belong. Now we need to do the same for the Entity dimension.

With this query we are filtering the OBJECT_TYPE=33, that means Entity (1) (If you want to know what are the possible object types you can query the HSP_OBJECT_TYPE table to check that out), and then we do a Connect By to re-built the hierarchy and then use some Connect by Root to get the parents as well the Sys connect By Path to get the Path of the hierarchy.

Now we just need to join everything under the same query to get all the Cost Centers and to witch top level each one belongs and to witch region they are part of as well.

Another thing that I like a lot about the WITH clause is that is very easy to create nested queries. In this case I created 2 different queries, each one with a WITH and a select based in that WITH. Now everything I need to do is put everything under the same with by:

We can see that now I just got the Select that was under the WITH clause and just created a nested WITH by just creating the step 1 and 2 and now I have instead of 2 WITH queries I have just 1 WITH with 4 Clauses under it being 2 of then a nested one based in the previous one (Connect by from the filtered query).

All we need to do is putting everything together by joining the ENT_HIER and the SG_HIER using the HSP_MEMBER_TO_ATTRIBUTE table. This table basically is the map between the ENTITY dimension and the Attribute dimensions, in this case Support Geography.

The results is the Cost Center, the Path of that member in the hierarchy and with this we can use the REGEXP to extract any level of the hierarchy, and finally the region that cost center belong.

You may thing why we need to do that. well, this is a query I used to join with the HSP_GROUP table to get the groups and the members from each region and then create the security dynamically for each user. That means, I have a Planning application with Entities that has cost centers from different regions and I’m using the Attribute dimension to generate the security.

If a User has access only to AMER data, it’ll only see the AMER cost centers. This is just one example of what we can achieve using Connect by.

I hope you guys enjoy. Next time we’ll talk about another very very useful function that I really love it.

See you soon guys.

ODI “Command on Source” buffered behavior

Posted in ODI, Tips and Tricks with tags , on March 28, 2019 by Rodrigo Radtke de Souza

Hi all! This post was created based on a friend’s question to me a couple of days ago. He asked me the following:

  • I know that we may create a procedure with a SQL in “Command on Source” that would return N rows and trigger an OS command in the “Command on Target” tab for each of those rows, passing the results as a parameter. My process takes a while for each row, so I was wondering if I could insert more rows in the table that is being read on “Command on Source” while it is still executing, so it would pick the new rows as well in the same execution?

In other words, he wanted to trigger the ODI procedure once and keeps “feeding” the “Command on Source” table many times, so all his OS Commands would get executed in one procedure run. Instinctively I said no, because ODI needs to somehow “buffer” the “Command on Source” results (which may be a result of a SQL statement with different tables) and then start to run the “Command on Target” commands. He agreed with me and moved on. However, that keep in my head: what would happen if the source table somehow changes while the ODI procedure is running? What if more rows were inserted or if the table was truncated/deleted? I did some tests to make sure I gave him the right answer.

I created a procedure that contains only an ODI sleep command in the Target that will be executed for each row that comes from the source, like this:

1

Then in the Source I added the following:

2

I populated this table with 20 records and I executed the proc. As expected, it took 20 seconds to complete:

3

Then I did the following test: I executed the proc and right away I inserted 20 more rows in the target. As I thought, the procedure ended again in 20 seconds, not 40, which means that ODI really buffers the results before executing the Command on Target:

4

But this is a small number and maybe ODI can buffer all of it right away and maybe that’s why it worked. I looked in the Topology and the Array Fetch Size for this connection was set to 250:

5

I did another test with 1000 rows and decreasing the wait time from 1000 to 100 to see how it goes. Both executions (with 1000 rows and then with adding more 1000 rows in between) ended in 104 seconds (the four extra seconds may be a delay due to network, ODI usage and so on):

6

So, I changed my approach and tried something different. What if I add many rows (100,000), change the delay to ‘1’ and in the meanwhile I truncate the table? The result kind of surprised me:

7

Truncate is considered a DDL command, so that’s why the error is saying that the object does not exists anymore, although it’s still there in the DB (it does not exist in that session, as it was modified by a DDL command in another session). This test was not exactly what I wanted to test and yet it does surprises me because I was expecting that ODI would have already buffered all the results (the error happened after 51 seconds only) and it would not be sensitive to certain DDL commands. However, this may indicate that ODI does not buffer it all at once and when it tried to read the table again, it was already truncated. So, let’s do another test.

Next test I doubled the row amount and inserted 200,000 and kept the delay to ‘1’ and ran the procedure. While it was running, I tried to delete the rows instead of truncating the table. The delete ran fine, it took 18 seconds to delete (less than 51 seconds from truncate) and no errors happened this time. So, it seems that ODI buffered all the results (which were doubled) before the 18 seconds.

8

After the deleting and committing, the proc continued to run and finished around 200 seconds, as expected.

9

I did one more test to see if the time that it took to fail in the truncate test would increase if I increase the number of rows. I inserted again 200K rows (again, double amount from the previous truncate test), ran the process and truncated the table. It took the same 51 seconds. So, I believe that, although ODI can buffer all the results before 51 seconds, Oracle somehow tells the process that the table was changed by a DDL command and sends a “stop” signal to the connection from a specific amount of time. I don’t see any other explanation for this behavior.

I could run some other tests, especially to see how large is the ODI buffer size, but as for now I’m ok with the results. In resume we figure out that:

  • ODI does buffer the results in the “Command on Source/Target” and we cannot modify them once it starts;
  • Although the results are buffered, the ODI procedure may fail if DDL commands are issued to the source tables;
  • DML commands does not seem to affect the buffered results, as expected;

If any of you has done some tests like this, please share with us! This kind of things are never documented, so we need to keep testing to see how they work behind the scenes.

Thanks! See you soon!

ORACLE SQL for EPM tips and tricks S01EP03!

Posted in Query, SQL, Tips and Tricks with tags , , on March 26, 2019 by Rodrigo Radtke de Souza

Hi all! Continuing the Oracle SQL for EPM series, today’s post is quite simple, but it may consume an extreme amount of time when we are requested to troubleshoot “why these numbers does not match” type of scenarios. Its related to UNION and UNION ALL operations. Let me describe what happened to me in one of those situations.

The client had a table with several columns that would calculate some metrics related to their

business. It was a “cumulative” type of table, where metrics were being aggregated by each previous period’s numbers. In a very resumed way, lets use the following example:

1

So, for Feb-19, the SUM would be 150 for Account 1 and 60 for Account 2. Next month, he would get the following:

2

His logic was summing the March period in Account 1 correctly (30) and summing it to previous 150. However, since Account 2 was not coming in March, his SQL was not reporting Account 2 in March. To make the calculations easier, he decided to add a “dummy” metric for all existing Accounts as 0, so his logic would calculate it correctly even it the record did not exist for that period. Something like that:

3

The process would still give his correct value of 30 in Account 1 for March and 0 for Account 2, which would then sum against the previous periods. It all worked fine, until someday someone complained that the numbers could not be right and some numbers were missing. When I checked the code, I quickly realized his mistake: he created his “dummy” metrics using a UNION in Oracle against his periodic metric and his “dummy” metric. But why it was giving the wrong numbers? Oracle explains:

  • UNION combines the results of two queries, which eliminates duplicate selected rows. The UNION operator returns only distinct rows that appear in either result.

Let’s picture the problem. His logic worked fine for Feb and Mar, but in Apr, something like this happened:

4

If you sum Apr period for Account 1, the number should 80, but he was getting only 60 as below:

5

This is due to UNION’s behavior: It will run an implicit distinct in the combined dataset, which in this case is eliminating good data. I went ahead and changed the UNION to UNION ALL, which Oracle states:

  • The UNION operator returns only distinct rows that appear in either result, while the UNION ALL operator returns all rows. The UNION ALL operator does not eliminate duplicate selected rows.

The result is the following:

6

Now it looks correct: 80 for Account 1 and 0 for Account 2.

That’s it folks! Simple things that may give us enormous headaches and wrong numbers, so please always check out when you see an UNION in the queries! It may be implicitly omitting some good data there.

See ya!