Archive for the Google Category

ODI in the hybrid database world – Google BigQuery – gsutil + bq

Posted in BigQuery, Cloud, Google, ODI with tags , , on December 5, 2022 by Rodrigo Radtke de Souza

As we saw in our last post, ODI + BigQuery + Simba Jdbc didn’t give us the performance that we were expecting. But we may apply the same concept that we did for Snowflake: we may create a file, zip it (using GZIP), send it to a storage location (using gsutil) and load it into a table (using bq). Luckily, this is easy to do in ODI and Bigquery. First, we will need to install Google Cloud CLI, which will enable you to use both gsutil and bq. You may read about it here.

Next, we will follow the same steps that we did for Snowflake, first we will create a copy of an existing IKM. In this case, I duplicated IKM SQL to File Append and gave it a new name:

On Tasks, I left some original ones (which will create a text file for us) and I created three new steps: GZIP, Copy and Load.

The first new step is GZIP, because we want to compress our file before moving it to the cloud. Google BigQuery may load zipped files directly from Cloud Storage, but they need to be in a GZIP format. So, on “Target Command” I added this:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_zip.err"

"C:\Program Files\7-Zip\7z.exe" a "<%=odiRef.getTable("TARG_NAME")%>.GZ" "<%=odiRef.getTable("TARG_NAME")%>.txt"

For the sake of simplicity in this example, the path is hardcoded, but in a real-world project, we would make this path dynamic.

Second one is GCS Copy. In this step, we will call “gsutil cp” command that will get a file (in this case GZIP file that we created in the previous step) and send it to a bucket in Google Cloud Storage:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_cp.err"
gsutil cp "<%=odiRef.getTable("TARG_NAME")%>.GZ" gs://<YOUR_GCS_BUCKET_NAME_HERE>

Last step is using “bq load” command that will get the file that is stored in your bucket and load it to a table in BigQuery:

OdiOSCommand "-OUT_FILE=<%=odiRef.getTable("TARG_NAME")%>_bq.log" "-ERR_FILE=<%=odiRef.getTable("TARG_NAME")%>_bq.err"
echo bq load --source_format=CSV --skip_leading_rows=1 --field_delimiter="|" GOOGLE_TARGET.SAMPLE_DATA gs:// <YOUR_GCS_BUCKET_NAME_HERE>/SAMPLE_DATA.GZ| C:\WINDOWS\system32\cmd.exe /k ""C:\Users\<YOUR_USER>\AppData\Local\Google\Cloud SDK\cloud_env.bat""

This command is a little messy, but it works. In your project you may do something more elaborate, but for the sake of this example, it will work. Basically, I’m using the echo trick to call two command lines at once. The first one I run “cloud_env.bat”, which will set my cloud environment variables (this comes with Google CLI installation). Then, I call “bq load”, passing as parameters the source format, the number of heading lines and the field delimiter. In this case, we are using pipe “|” as our column delimiter. After that we pass which Dataset/Table we want to load and from which file (in this case our GZ file).

Seems complicated at first, but the data load speed is worth it. And since all this is in an IKM, you will do it once and reuse it repeatedly.

Before we create a mapping, we need to remember that now we won’t be loading BigQuery directly, but instead we will be loading a text file first. So, we need to create a new model pointing to a File logical schema that will be used to hold our temporary files. For this example, I’ve created a delimited file using pipe and with header. For the columns, I created all of them as Strings.

Now we may create our mapping. In this case, its pretty simple. We just add our Oracle source table and our File target table.

When we execute it, we can see that I was able to load 1,077,500 rows in 76 seconds. Not bad at all:

Once again, we saw that working with files and command line tools were way faster than JDBC connections. Also, if you think a little bit about it, it has other cool benefits. If you threat all your cloud integrations as file transfers, you may create an architecture that can be used across multiple cloud vendors. Your on-premises output will always be compressed files that will be loaded to cloud systems using different ODI Knowledge modules, one for each vendor. Since all of it are text files, ODI mappings may be used to create all sorts of complex logic using your on-premises database as staging and then only pushing the result to the cloud.

I hope you have enjoyed. See you soon!

Advertisement

Pushing files to GCS (Google Cloud Storage) using ODI and GSUTIL

Posted in Cloud, GCS, Google, ODI with tags , , , on January 6, 2020 by Rodrigo Radtke de Souza

Hi all, today’s post is a short one, but since I didn’t find any other post related to that in the Internet, I found it could be useful for someone else. Recently I came across a requirement to push CSV files to GCS using ODI. After some research, I saw that Google has a utility called “gsutil”, that is a tool that enables you to access Cloud Storage from the command-line. So, if something is accessible from a command-line, ODI can also do that for sure.

First we need to have gsutil installed in the server that will be used by ODI to run the OS Command (in my case it was Windows). After that, we need to configure how gsutil will connect to our GCS instance. Probably you will create a service account in GCS to be used by ODI that will contain the correct read/write permissions and from it you may download its private key file. This is a json file that contains a private key and all the login information that will be used to configure gsutil, so it may authenticate with GCS. Save the json file in the ODI agent server and run the following command there:

gcloud auth activate-service-account <ACCOUNT_NAME> –key-file=”<JSON_FILE_LOCATION>” –project=<GCS_PROJECT_NAME>

Now we are all set to start using ODI to issue commands to GCS. For us to push a csv file to the cloud, we would only need to create an ODI procedure, select “Operating System” as a Technology and write a simple CP command on it, like below:

gsutil cp “<CSV_FILE_TO_BE_UPLOADED>” gs://<GCS_BUCKET_NAME>

a

When you run the proc, the files will be pushed to the cloud. Very simple, but powerful example on what we may accomplish with ODI + gsutil. For a complete list of gsutil commands, please see its documentation page.

b

Hope you liked it. Thanks!