Archive for the ODI Category

ODI Hidden Gems – Log Steps in the Journal

Posted in ODI, Tips and Tricks with tags , on July 11, 2019 by radk00

Hi all,

Today we will talk about “Log Steps in the Journal” option. This one resides the “Advanced” tab of every step inside an ODI package, as you can see below. Every step in a package appears in the execution log while being executed, but we may define whether the step should be kept in the journal after its execution is finished or not. The available options are:

  • Never: the step is deleted from the journal.
  • Always: the step is always kept in the journal.
  • Errors: the step is kept in the journal only if it failed. Otherwise, it is deleted.

1

Although it seems a very simple option, it has some good PROS and CONS about using it. Let’s talk about the PROS first. Imagine that the above scenario was going to loop the same procs 10 times. You would end up with a log like this:

2

You may want to keep the log of all those procedures executions, so you know what they did (like how many rows did they insert/delete/update). However, all the steps related to the loop variable are kind of useless, since they are only used to control the “loop” over the steps. In this case, if you wish to keep a cleaner log, you may set both variable steps to “Never”, like below:

3

Now, when you execute this package, you will have a much cleaner log:

4

However, this option comes with a bad CONS: I already saw several situations where people were trying to debug an execution in Operator and they were not understanding how the values were being assigned or they were not sure how some stuff were being populated if they don’t see any step in Operator related to that. After some time lost wondering about it and then double checking the same package in development, they would realize that someone had put that step to never log (sometimes even by accident). So, anytime that you are trying to debug something in Operator, and it seems weird or missing pieces, please make sure to look on the development package as well, since some steps may be set to Never log.

That’s it folks. A quick post today. See ya!

Advertisements

ODI Hidden Gems – Degree of Parallelism for Target (DOP)

Posted in ODI, Tips and Tricks with tags , on July 5, 2019 by radk00

Hi all,

If you read our posts, you know that we like to write “series” of them. We think it’s a good motivation for us to focus on some topic and keep writing about it. So, let me begin with a brand-new series called “ODI Hidden Gems”. We will be talking about those small configurations, check boxes and settings that most of the people just ignore them or don’t even know that they exist, but they can be of great value.

ODI is a great tool, it has a lot of options and anyone may survive without knowing all of them, however, there are some that may shine and gives you better data load performance, tool development usability and so on. Without further delay, let’s talk about Degree of Parallelism for Target (DOP).

1

This setting resides on ODI Data Server component within Topology together with Array Fetch Size and Batch Update Size and all the three are often misleading. First thing to notice is that each of those settings happens in either on SOURCE, TEMP (C$) or TARGET databases (and here is where the mislead happens).

  • Array Fetch Size: This setting is only used when the data server is used as a SOURCE. When reading large volumes of data from a data server, Oracle Data Integrator fetches successive batches of records. This value is the number of rows (records read) requested by Oracle Data Integrator on each communication with the data server.
  • Batch Update Size: This setting is only used when the data server is used as a TARGET. When writing large volumes of data into a data server, Oracle Data Integrator pushes successive batches of records. This value is the number of rows (records written) in a single Oracle Data Integrator INSERT command.
  • Degree of Parallelism for Target: Although the name suggests TARGET, this setting is only used on the TEMP (C$) part when the data server is used as a TARGET. Indicates the number of threads allowed for a loading task, in other words, in C$ population from the source database to the target database. Default value is 1. Maximum number of threads allowed is 99.

So, when you want to optimize all three parameters, you will probably change in two different data servers (source and C$/target) and not only in one data server, as most people try to do. Also, when we talk about ODI Data Server DOP, which is a number that represents the number of parallel threads, we are talking exclusively about the C$ piece of the integration, so it’s not related to SOURCE/TARGET at all.

Let me give you one example to make it clearer. If you are not aware from where I’m getting the following details, please notice that, every time you have a “Load Data” step from Server A to Server B, ODI creates a “Details” tab at the Operator Task Level with a lot of useful information. This is also another free Hidden Gem.

Source: SQL Server, TABLE_A has 2,261,393 rows
Target: Oracle, TABLE_B will be loaded from TABLE_A.
Topology: Target Data Server is set to DOP 1.

3

We can see that it took 75 seconds to load this data, but the Wait time on source connection was 27 seconds. It means that, ODI was able to get data fast from the source database, but it needed to wait for the target thread to be available, so it could send more data in. Also, target DOP is one, so only one thread worked to load this data to the C$ table.

Source: SQL Server, TABLE_A has 2,261,393 rows
Target: Oracle, TABLE_B will be loaded from TABLE_A.
Topology: Target Data Server is set to DOP 16.

5

Now we see some gain. The wait time is 0.610 seconds in the source and the target threads were able to load all of them to C$ table in 35 seconds, running 16 threads in parallel. You may even try to reduce this load times further by changing Array Fetch Size (in the source Data Server) and Batch Update Size (in the target Data Server), but those two settings I’ll leave to another Hidden Gem post.

See you later!

DEVEPM in the land of the free!

Posted in Career, ODI with tags , on April 29, 2019 by radk00

Hi all,

Today’s quick post is not a technical one, but it is just to share with you some good news. I (Rodrigo) have recently moved from Brazil to USA and I’m very excited about it! This means new challenges, new opportunities and probably some new content to the blog 😊

I have accepted an offer to work for Innive Inc, an IT consulting company that you may read more about it in our site. I’ll still work a lot with data integration, but I’ll probably move away a little bit from EPM. However, it doesn’t mean that I won’t play around and do some tests when new EPM stuff comes out 😉

Thanks everyone! And don’t forget that we will be at Kscope19 this year, so if you want to talk to us there, please feel free to send us a message!

See ya!

ODI “Command on Source” buffered behavior

Posted in ODI, Tips and Tricks with tags , on March 28, 2019 by radk00

Hi all! This post was created based on a friend’s question to me a couple of days ago. He asked me the following:

  • I know that we may create a procedure with a SQL in “Command on Source” that would return N rows and trigger an OS command in the “Command on Target” tab for each of those rows, passing the results as a parameter. My process takes a while for each row, so I was wondering if I could insert more rows in the table that is being read on “Command on Source” while it is still executing, so it would pick the new rows as well in the same execution?

In other words, he wanted to trigger the ODI procedure once and keeps “feeding” the “Command on Source” table many times, so all his OS Commands would get executed in one procedure run. Instinctively I said no, because ODI needs to somehow “buffer” the “Command on Source” results (which may be a result of a SQL statement with different tables) and then start to run the “Command on Target” commands. He agreed with me and moved on. However, that keep in my head: what would happen if the source table somehow changes while the ODI procedure is running? What if more rows were inserted or if the table was truncated/deleted? I did some tests to make sure I gave him the right answer.

I created a procedure that contains only an ODI sleep command in the Target that will be executed for each row that comes from the source, like this:

1

Then in the Source I added the following:

2

I populated this table with 20 records and I executed the proc. As expected, it took 20 seconds to complete:

3

Then I did the following test: I executed the proc and right away I inserted 20 more rows in the target. As I thought, the procedure ended again in 20 seconds, not 40, which means that ODI really buffers the results before executing the Command on Target:

4

But this is a small number and maybe ODI can buffer all of it right away and maybe that’s why it worked. I looked in the Topology and the Array Fetch Size for this connection was set to 250:

5

I did another test with 1000 rows and decreasing the wait time from 1000 to 100 to see how it goes. Both executions (with 1000 rows and then with adding more 1000 rows in between) ended in 104 seconds (the four extra seconds may be a delay due to network, ODI usage and so on):

6

So, I changed my approach and tried something different. What if I add many rows (100,000), change the delay to ‘1’ and in the meanwhile I truncate the table? The result kind of surprised me:

7

Truncate is considered a DDL command, so that’s why the error is saying that the object does not exists anymore, although it’s still there in the DB (it does not exist in that session, as it was modified by a DDL command in another session). This test was not exactly what I wanted to test and yet it does surprises me because I was expecting that ODI would have already buffered all the results (the error happened after 51 seconds only) and it would not be sensitive to certain DDL commands. However, this may indicate that ODI does not buffer it all at once and when it tried to read the table again, it was already truncated. So, let’s do another test.

Next test I doubled the row amount and inserted 200,000 and kept the delay to ‘1’ and ran the procedure. While it was running, I tried to delete the rows instead of truncating the table. The delete ran fine, it took 18 seconds to delete (less than 51 seconds from truncate) and no errors happened this time. So, it seems that ODI buffered all the results (which were doubled) before the 18 seconds.

8

After the deleting and committing, the proc continued to run and finished around 200 seconds, as expected.

9

I did one more test to see if the time that it took to fail in the truncate test would increase if I increase the number of rows. I inserted again 200K rows (again, double amount from the previous truncate test), ran the process and truncated the table. It took the same 51 seconds. So, I believe that, although ODI can buffer all the results before 51 seconds, Oracle somehow tells the process that the table was changed by a DDL command and sends a “stop” signal to the connection from a specific amount of time. I don’t see any other explanation for this behavior.

I could run some other tests, especially to see how large is the ODI buffer size, but as for now I’m ok with the results. In resume we figure out that:

  • ODI does buffer the results in the “Command on Source/Target” and we cannot modify them once it starts;
  • Although the results are buffered, the ODI procedure may fail if DDL commands are issued to the source tables;
  • DML commands does not seem to affect the buffered results, as expected;

If any of you has done some tests like this, please share with us! This kind of things are never documented, so we need to keep testing to see how they work behind the scenes.

Thanks! See you soon!

DEVEPM will be at Kscope19!

Posted in DEVEPM, Kscope, Kscope 19, ODI with tags , , on March 14, 2019 by radk00

Hi all, how are you doing? We are very happy to announce that once again DEVEPM will be at KScope! We are very honored to be selected to present on the best EPM conference in the word! We got one presentation and a panel in, so here is what we are going present at Kscope19:

OAC and ODI! A Match Made in…the cloud?

  • OAC stands for Oracle Analytics Cloud Services, and it’s another cloud solution offered by Oracle. It provides you a lot of analytic tools for your data. The question is, do you need to be 100% cloud to use OAC services?
    Well, with ODI we always have options, and for OAC that is not an exception.
    In this presentation we’ll take a look at three different ways to use ODI to integrate all your data with OAC, ranging from using your existing on-premises environment to a 100% cloud solution (no ODI/DB footprint in your environment).

205, Level 2 => Tue, Jun 25, 2019 (09:00 AM – 10:00 AM)

EPM Data Integration Panel

Is there a functional issue that you’ve been trying to solve with Cloud Data Management or FDMEE that you just can’t seem to break through on? Are you about to kick off a new project or phase and need to validate that Data Management or FDMEE is the right tool for your needs? Are you about to subscribe to an Oracle EPM SaaS offering and want to know your data integration options? Or do you just want to give some feedback about the product and features that will help you increase your utilization of it?

201, Level 2 => Wed, Jun 26, 2019 (11:45 AM – 12:45 PM)

Kscope is the largest EPM conference in the world and it will be held in Seattle on June 2019. It will feature more than 300 technical sessions, five symposiums, deep dive sessions, and hands on labs over the course of five days.

ys3header

Got interested? If you register by March 31st you’ll take advantage of the Kscope early bird rates. Don’t waste more time and let’s be part of the greatest EPM event in the world. If you are still unsure about it, read our post about how Kscope/ODTUG changed our lives! Kscope is indeed a life changer event!

Playing with ODI and Groovy – Part 3 – Comparing ODI Scenarios from two environments

Posted in GROOVY, Java, ODI, ODI SDK with tags , , on February 1, 2019 by radk00

Hi all!

Now that we can connect to any ODI environment and list all sorts of ODI objects, its time to compare them and find its differences. Today’s example will do a simple check on ODI Scenarios between two environments based on its name and version. We will increment our code logic in later posts by adding different types of filters to make our code more efficient, but as for now let’s do the basic stuff just for us the get the main idea.

One important thing to notice here is that we will compare the objects only by its name and version number, not by its content. In other words, if you have the same ODI object regenerated with the same version number over and over again, this code won’t probably work in the way you are expecting too, since it may tells you that SCENARIO_001 is the same as SCENARIO_001, but in fact it may not be if you regenerated the source scenario more than once.

Let’s pause and talk briefly about ODI Scenario versions

Just a small comment on using different ODI scenario version numbers whenever the code has changed: I’m a huge fan of that and I always do a big effort to implement this concept in every project that I work on. The concept is very simple, but a lot of people don’t do it:

  • Every time that the scenario leaves the DEV environment to be imported in a subsequent environment, it should have an incremental version number.

For example: when you create a new ODI scenario in DEV, you may keep regenerating it repeatedly with the same number, let’s say 1_00_00. After some time, you will deploy it to test environment using version 1_00_00. Now you need to do some code fix or enhancement to it. Instead of regenerate it as 1_00_00 and move it forward, you should increment a version number, like 1_00_01. Your test environment will have both scenarios, but you may call them always using “-1” in the Load Plans or Scenario calls in procedures, which will ensure that ODI will always use the latest one (1_00_01). You may end up with several test interactions/fixes and end up with 10 versions of the same scenario, but that is fine. Once test finishes, you move just the latest one to Production, which will be 1_00_10.

Some people don’t like this approach and their main argument is that it creates a bunch of “repetitive” scenarios over the environments. However, they don’t realize (at first) that this approach creates a lot of benefits over time, such as:

  • Traceability: if you have a defect tracking tool, you may add a comment in the defect (or even in the ODI scenario comments) identifying which defect generated which scenario version, so you may answer all those answers like: “is the code fix xxxx that was generated in version xxxx already in test environment?”. It’s impossible to answer that question if the version is always 001;
  • Control: you may easily see how many defects/interactions a particular ODI scenario had in that development cycle and who did each one of them, so you may have a sense if a code (or even a developer) is having problems to deliver things properly;
  • Roll back is a breeze: if you add something to production but needs to roll it back due to an unexpected behavior, you just remove the latest scenario and that’s it. ODI will pick the old scenario and you are good to go. No need to export the ODI scenario first as a backup, then import it again in case of any problem;
  • Number of scenarios is manageable: if someone still argues about the number of similar objects over the environments, we may always adopt “the last three” versions approach, where we maintain only the last three versions of a scenario in each environment. Three is a good number because its very unlikely that you will want to rollback your code to older than three versions.

Getting back to our main topic

Getting back to our post (and really hoping that you start to follow the incremental version approach from now on, if you dont already), lets change our code to connect to two repositories instead of one.  We already have a code to connect to a source repository, so its just a matter to duplicate it to a target repository (or you may create a function as well to reuse some of the code there):

targetUrl = "jdbc:oracle:thin:@YOUR_SERVER_INFO"
targetSchema = "DEV_ODI_REPO"
targetSchemaPwd = "XXXXXXXX"
targetWorkrep = "WORKREP"
targetOdiUser = "XXXXXXXX"
targetOdiUserPwd = "XXXXXXXX"

targetMasterInfo = new  MasterRepositoryDbInfo(targetUrl, driver, targetSchema, targetSchemaPwd.toCharArray(), new PoolingAttributes())
targetWorkInfo = new WorkRepositoryDbInfo(targetWorkrep, new PoolingAttributes())

targetOdiInstance = OdiInstance.createInstance(new OdiInstanceConfig(targetMasterInfo, targetWorkInfo))
targetAuth = targetOdiInstance.getSecurityManager().createAuthentication(targetOdiUser, targetOdiUserPwd.toCharArray())
targetOdiInstance.getSecurityManager().setCurrentThreadAuthentication(targetAuth)

Next thing is that we will slightly change our listObjects function to receive a list object that will hold our source and target ODI scenarios:

def listObjects (odiInstance,odiClass,listOfObjects) {
	odiObjects = odiInstance.getTransactionalEntityManager().getFinder(odiClass).findAll().sort{it.name}
	for (Object odiSingleObject: odiObjects)
		listOfObjects.add(odiSingleObject.getName() + " - " + (odiSingleObject.getClass()==OdiScenario.class? odiSingleObject.getVersion() : "NA") )
}

We will call it for both source and target connections:

println("Creating Source Scenarios List")
sourceScenarios = []
listObjects (sourceOdiInstance,OdiScenario.class,sourceScenarios)

println("Creating Target Scenarios List")
targetScenarios = []
listObjects (targetOdiInstance,OdiScenario.class,targetScenarios)

Now that we have both lists, is just a matter to compare those. Its very easy to compare and even filter the results in Groovy. The code below will get all ODI scenarios that exists in the source environment but do not exists in target and will filter this list to retrieve only the scenarios that begins with TEST. This kind of filtering is useful, since sometimes you just want to search for certain scenarios, not all of them (again, we will increase the filtering options on later posts):

diffScenarios = (sourceScenarios-targetScenarios).findAll(){sourceScenarios -> sourceScenarios.toUpperCase().startsWith('TEST')}
println("The difference (with the filter) is:")
println(diffScenarios)

If we execute our code, the result will be something like this:

post1

TEST2 scenario was already imported in my target environment, so the result says that TEST1/TEST3/TEST4 exists in source but not in target.

That’s it for today folks. Next post we will learn how do we get this list of different objects and import them in our target environment. If you want the code that was presented in this post, please click here.

Thanks!

Playing with ODI and Groovy – Part 2 – Listing all kinds of ODI objects

Posted in Java, ODI, ODI SDK with tags , , on January 22, 2019 by radk00

Today’s post is short as we will learn how to list any kind of ODI objects using ODI SDK. Although it is simple, it can be used for several different reasons in your daily activities and we will use it to list all the existing scenarios, load plans and folders in our ODI utility. ODI SDK has a very simple way to search for its objects as we can see below:

odi.getTransactionalEntityManager().getFinder(odiClass).findAll()

From “odi” instance object, we get an Entity Manager, which provides the methods to interact with the persistence context in order to make CRUD (Create, Read, Update, Delete) operations against IOdiEntity instances. ODI entity is any object that reside in an ODI repository and so is capable of being persisted (like scenarios, load plans, folders, etc).

From Entity Manager, we may get a Finder, that will receive an ODI Class as a parameter and will return a collection of all objects that belongs to that class. You can “find” any object that implements the IOdiEntity interface. Some examples of ODI classes that you can use are:

  • OdiDataStore
  • OdiFolder
  • OdiIKM
  • OdiLKM
  • OdiLoadPlan
  • OdiLogicalSchema
  • OdiModel
  • OdiPackage
  • OdiPhysicalSchema
  • OdiProcedure
  • OdiScenario
  • OdiScenarioFolder
  • OdiSession
  • OdiSessionFolder
  • OdiUser

So, lets create a procedure in our code that will list all the correspondent ODI objects from a given ODI instance object and a class:

def listObjects (odi,odiClass) {
	odiObjects = odi.getTransactionalEntityManager().getFinder(odiClass).findAll().sort{it.name}
	if (odiObjects.size() > 0) {
		for (int i = 0; i < odiObjects.size(); i++) {
			odiSingleObject = odiObjects.toArray()[i]
			println(odiSingleObject.getName() + " - " + (odiSingleObject.getClass()==OdiScenario.class? odiSingleObject.getVersion() : "NA") )
		}
	}
}

A couple of things about this code. You can see that I’m sorting all the objects that will be displayed by its name. But if I needed something more complex, like sort by name and version number, I could write something like this:

sort {a, b -> a.name.toLowerCase() <=> b.name.toLowerCase() ?: a.version  b.version}

However, this sort wouldn’t work for all classes, since we are using VERSION, which may not be applicable to all ODI objects, like folders. On those cases, we may do a simple check to see if that object belongs to a specific class or not:

odiSingleObject.getClass()==OdiScenario.class? odiSingleObject.getVersion() : "NA"

This one is checking if the current class is OdiScenario. If true, then it will get its version value, otherwise it will just get “NA”.

To run our procedure, it is just a matter to do something like this:

try {
	listObjects (sourceOdiInstance,OdiScenario.class)
	listObjects (sourceOdiInstance,OdiLoadPlan.class)
	listObjects (sourceOdiInstance,OdiScenarioFolder.class)
}
catch (e){
	println(e)
}

The result will be a print of the list of objects:

1

2

That’s it for today folks. You can look at the code in this link (I’ll add one for each post, so its easier for the readers to follow).

See ya!