Archive for December, 2022

ODI Hidden Gems – SNP tables: Query to get load plan execution steps

Posted in ODI, Query on December 6, 2022 by Rodrigo Radtke de Souza

Hi all,

Today is a short post, but a very useful one. I often get asked how to write SNP table queries to get information out of ODI repositories. I posted one about "executed code" here. Today, here is another example: a query that returns the load plan execution steps that were executed. I run this one at the end of every load plan execution, so it creates a mini report of how each step performed in that load plan. This is useful for ODI Operators, so they can analyze how well their jobs are doing.

This query was created using ODI 12.2.1.4.

WITH 
LP_STEPS AS (
SELECT SLS.I_LOAD_PLAN
     , LP.LOAD_PLAN_NAME
     , I.I_LP_INST
     , SLS.LP_STEP_NAME
     , SLS.I_LP_STEP
     , SLS.PAR_I_LP_STEP
     , SLS.STEP_ORDER
     , LEVEL AS LEVEL_
     , SYS_CONNECT_BY_PATH(SLS.LP_STEP_NAME, ' -> ') AS STEP_NAME_PATH
     , SYS_CONNECT_BY_PATH(SLS.STEP_ORDER, ' ') AS STEP_ORDER_PATH
     , SYS_CONNECT_BY_PATH(SLS.IND_ENABLED, ' ') AS IND_ENABLED_PATH
     , SYS_CONNECT_BY_PATH(DECODE(SLS.LP_STEP_TYPE, 'EX', 1, 0), ' ') AS EXCEPTION_STEP_PATH
     , CONNECT_BY_ISLEAF IS_LEAF
     , SLS.IND_ENABLED
  FROM SNP_LOAD_PLAN LP
    INNER JOIN SNP_LP_STEP SLS
        ON SLS.I_LOAD_PLAN = LP.I_LOAD_PLAN
    INNER JOIN SNP_LP_INST I
        ON SLS.I_LOAD_PLAN = I.I_LOAD_PLAN
        AND I.GLOBAL_ID = '<LOAD_PLAN_INSTANCE_GLOBAL_ID_HERE>'
 CONNECT BY PRIOR SLS.I_LP_STEP = SLS.PAR_I_LP_STEP 
  START WITH SLS.PAR_I_LP_STEP IS NULL
)  
, ALL_ AS (
SELECT L.I_LOAD_PLAN
     , L.LOAD_PLAN_NAME
     , L.I_LP_INST
     , L.LP_STEP_NAME
     , L.I_LP_STEP
     , L.STEP_NAME_PATH
     , L.STEP_ORDER_PATH
     , CASE WHEN L.IND_ENABLED_PATH LIKE '%0%' THEN 'N' ELSE 'Y' END ENABLED
     , MAX(IL.SESS_NO) AS SESS_NO
  FROM LP_STEPS L
    LEFT JOIN SNP_LPI_STEP_LOG IL
        ON L.I_LP_INST = IL.I_LP_INST
        AND L.I_LP_STEP = IL.I_LP_STEP
 WHERE 1 = 1
   AND L.IS_LEAF  = 1
   AND EXCEPTION_STEP_PATH NOT LIKE '%1%'
 GROUP BY L.I_LOAD_PLAN
        , L.LOAD_PLAN_NAME
        , L.I_LP_INST
        , L.LP_STEP_NAME
        , L.I_LP_STEP
        , L.STEP_NAME_PATH
        , L.STEP_ORDER_PATH 
        , L.IND_ENABLED_PATH
) 
SELECT A.I_LOAD_PLAN
     , A.I_LP_STEP
     , A.LOAD_PLAN_NAME
     , A.LP_STEP_NAME
     , A.STEP_NAME_PATH
     , A.SESS_NO
     , CASE WHEN A.ENABLED = 'N' THEN 'Step has been disabled'
            WHEN A.SESS_NO IS NULL THEN 'Step did not run'
            WHEN S.SESS_STATUS = 'D' THEN 'Step finished successfully'
            WHEN S.SESS_STATUS = 'E' THEN 'Step failed'
            WHEN S.SESS_STATUS = 'W' THEN 'Step finished with warnings'
            ELSE 'Other status not relevant'
        END STATUS
     , TO_CHAR(S.SESS_BEG, 'DD-MON-YYYY HH24:MI:SS') AS SESS_BEG
     , TO_CHAR(S.SESS_END, 'DD-MON-YYYY HH24:MI:SS') AS SESS_END
     , S.SESS_DUR
     , REPLACE(REPLACE(REPLACE(SUBSTR(S.ERROR_MESSAGE, 1, INSTR(S.ERROR_MESSAGE, CHR(9) || 'at ')), CHR(10), ' '), CHR(13), ' '), CHR(9), ' ') AS ERROR_MESSAGE
  FROM ALL_ A
    LEFT JOIN SNP_SESSION S
        ON A.SESS_NO = S.SESS_NO
 ORDER BY A.STEP_ORDER_PATH;
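
To find the GLOBAL_ID value to plug into the filter above, a minimal helper sketch (using only the same SNP_LP_INST and SNP_LOAD_PLAN tables and columns already referenced in the query) could be something like this, listing the most recent instances first:

SELECT LP.LOAD_PLAN_NAME
     , I.I_LP_INST
     , I.GLOBAL_ID
  FROM SNP_LP_INST I
    INNER JOIN SNP_LOAD_PLAN LP
        ON I.I_LOAD_PLAN = LP.I_LOAD_PLAN
 ORDER BY I.I_LP_INST DESC;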

I hope you like it! See you soon!


ODI in the hybrid database world – Google BigQuery – gsutil + bq

Posted in BigQuery, Cloud, Google, ODI on December 5, 2022 by Rodrigo Radtke de Souza

As we saw in our last post, ODI + BigQuery + Simba JDBC didn't give us the performance that we were expecting. But we can apply the same concept that we used for Snowflake: we create a file, compress it (using GZIP), send it to a storage location (using gsutil) and load it into a table (using bq). Luckily, this is easy to do with ODI and BigQuery. First, we need to install the Google Cloud CLI, which enables you to use both gsutil and bq. You may read about it here.
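
As a side note, a minimal sanity check after installing the CLI (the project ID and bucket name below are placeholders, not values from this example) could be to authenticate, set the project and confirm that both tools respond:

gcloud auth login
gcloud config set project <YOUR_GCP_PROJECT_ID>
gsutil ls gs://<YOUR_GCS_BUCKET_NAME_HERE>
bq ls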

Next, we will follow the same steps that we did for Snowflake. First, we create a copy of an existing IKM; in this case, I duplicated "IKM SQL to File Append" and gave it a new name:

On the Tasks tab, I kept some of the original steps (which create a text file for us) and added three new ones: GZIP, Copy and Load.

The first new step is GZIP, because we want to compress our file before moving it to the cloud. Google BigQuery can load compressed files directly from Cloud Storage, but they need to be in GZIP format. So, on "Target Command" I added this:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.err"

"C:\Program Files\7-Zip\7z.exe" a "<%=odiRef.getTable("TARG_NAME")%>.GZ" "<%=odiRef.getTable("TARG_NAME")%>.txt"

For the sake of simplicity in this example, the path is hardcoded, but in a real-world project, we would make this path dynamic.
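For example, a rough sketch of the same step with the 7-Zip location read from a hypothetical IKM option (here called ZIP_UTILITY_PATH, which you would need to add to the KM yourself) could look like this:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.err"

"<%=odiRef.getOption("ZIP_UTILITY_PATH")%>" a "<%=odiRef.getTable("TARG_NAME")%>.GZ" "<%=odiRef.getTable("TARG_NAME")%>.txt"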

The second one is the GCS Copy step. Here we call the "gsutil cp" command, which takes a file (in this case, the GZIP file we created in the previous step) and sends it to a bucket in Google Cloud Storage:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.err"
gsutil cp "<%=odiRef.getTable("TARG_NAME")%>.GZ" gs://<YOUR_GCS_BUCKET_NAME_HERE>

The last step uses the "bq load" command, which takes the file stored in your bucket and loads it into a table in BigQuery:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_bq.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_bq.err"
echo bq load --source_format=CSV --skip_leading_rows=1 --field_delimiter="|" GOOGLE_TARGET.SAMPLE_DATA gs://<YOUR_GCS_BUCKET_NAME_HERE>/SAMPLE_DATA.GZ| C:\WINDOWS\system32\cmd.exe /k ""C:\Users\<YOUR_USER>\AppData\Local\Google\Cloud SDK\cloud_env.bat""

This command is a little messy, but it works. In your project you may do something more elaborate, but for the sake of this example, it will do. Basically, I'm using the echo trick to call two command lines at once. First I run "cloud_env.bat", which sets my cloud environment variables (it comes with the Google Cloud CLI installation). Then I call "bq load", passing as parameters the source format, the number of header rows to skip and the field delimiter. In this case, we are using pipe "|" as our column delimiter. After that, we pass which dataset/table we want to load and from which file (in this case, our GZ file).
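If you want to test this pattern outside of ODI first, you can try the same echo trick from a plain command prompt, for example just listing your datasets with "bq ls" after loading the Cloud SDK environment (same path and placeholders as above):

echo bq ls | C:\WINDOWS\system32\cmd.exe /k ""C:\Users\<YOUR_USER>\AppData\Local\Google\Cloud SDK\cloud_env.bat""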

It seems complicated at first, but the data load speed is worth it. And since all of this lives in an IKM, you build it once and reuse it repeatedly.

Before we create a mapping, we need to remember that we are no longer loading BigQuery directly; instead, we load a text file first. So we need to create a new model pointing to a File logical schema that will hold our temporary files. For this example, I created a pipe-delimited file with a header, and I defined all the columns as Strings.

Now we can create our mapping. In this case, it's pretty simple: we just add our Oracle source table and our File target table.

When we execute it, we can see that it loaded 1,077,500 rows in 76 seconds. Not bad at all:

Once again, we saw that working with files and command line tools is way faster than JDBC connections. Also, if you think about it a little, it has other cool benefits. If you treat all your cloud integrations as file transfers, you can create an architecture that works across multiple cloud vendors. Your on-premises output will always be compressed files that are loaded to cloud systems using different ODI Knowledge Modules, one for each vendor. Since all of it is text files, ODI mappings can implement all sorts of complex logic using your on-premises database as staging and then only push the result to the cloud.

I hope you have enjoyed it. See you soon!