Hi all! This post was created based on a friend’s question to me a couple of days ago. He asked me the following:
- I know that we may create a procedure with a SQL in “Command on Source” that would return N rows and trigger an OS command in the “Command on Target” tab for each of those rows, passing the results as a parameter. My process takes a while for each row, so I was wondering if I could insert more rows in the table that is being read on “Command on Source” while it is still executing, so it would pick the new rows as well in the same execution?
In other words, he wanted to trigger the ODI procedure once and keeps “feeding” the “Command on Source” table many times, so all his OS Commands would get executed in one procedure run. Instinctively I said no, because ODI needs to somehow “buffer” the “Command on Source” results (which may be a result of a SQL statement with different tables) and then start to run the “Command on Target” commands. He agreed with me and moved on. However, that keep in my head: what would happen if the source table somehow changes while the ODI procedure is running? What if more rows were inserted or if the table was truncated/deleted? I did some tests to make sure I gave him the right answer.
I created a procedure that contains only an ODI sleep command in the Target that will be executed for each row that comes from the source, like this:
Then in the Source I added the following:
I populated this table with 20 records and I executed the proc. As expected, it took 20 seconds to complete:
Then I did the following test: I executed the proc and right away I inserted 20 more rows in the target. As I thought, the procedure ended again in 20 seconds, not 40, which means that ODI really buffers the results before executing the Command on Target:
But this is a small number and maybe ODI can buffer all of it right away and maybe that’s why it worked. I looked in the Topology and the Array Fetch Size for this connection was set to 250:
I did another test with 1000 rows and decreasing the wait time from 1000 to 100 to see how it goes. Both executions (with 1000 rows and then with adding more 1000 rows in between) ended in 104 seconds (the four extra seconds may be a delay due to network, ODI usage and so on):
So, I changed my approach and tried something different. What if I add many rows (100,000), change the delay to ‘1’ and in the meanwhile I truncate the table? The result kind of surprised me:
Truncate is considered a DDL command, so that’s why the error is saying that the object does not exists anymore, although it’s still there in the DB (it does not exist in that session, as it was modified by a DDL command in another session). This test was not exactly what I wanted to test and yet it does surprises me because I was expecting that ODI would have already buffered all the results (the error happened after 51 seconds only) and it would not be sensitive to certain DDL commands. However, this may indicate that ODI does not buffer it all at once and when it tried to read the table again, it was already truncated. So, let’s do another test.
Next test I doubled the row amount and inserted 200,000 and kept the delay to ‘1’ and ran the procedure. While it was running, I tried to delete the rows instead of truncating the table. The delete ran fine, it took 18 seconds to delete (less than 51 seconds from truncate) and no errors happened this time. So, it seems that ODI buffered all the results (which were doubled) before the 18 seconds.
After the deleting and committing, the proc continued to run and finished around 200 seconds, as expected.
I did one more test to see if the time that it took to fail in the truncate test would increase if I increase the number of rows. I inserted again 200K rows (again, double amount from the previous truncate test), ran the process and truncated the table. It took the same 51 seconds. So, I believe that, although ODI can buffer all the results before 51 seconds, Oracle somehow tells the process that the table was changed by a DDL command and sends a “stop” signal to the connection from a specific amount of time. I don’t see any other explanation for this behavior.
I could run some other tests, especially to see how large is the ODI buffer size, but as for now I’m ok with the results. In resume we figure out that:
- ODI does buffer the results in the “Command on Source/Target” and we cannot modify them once it starts;
- Although the results are buffered, the ODI procedure may fail if DDL commands are issued to the source tables;
- DML commands does not seem to affect the buffered results, as expected;
If any of you has done some tests like this, please share with us! This kind of things are never documented, so we need to keep testing to see how they work behind the scenes.
Thanks! See you soon!