How to execute jobs in an iterative way with Pentaho Data Integration (PDI)

Using Job executors
The Job Executor is a PDI step that lets you execute a Job several times, simulating a loop. The executor receives a dataset and then executes the Job once for each row, or once for each set of rows, of the incoming dataset.

To understand how this works, we will build a very simple example. The Job that we will execute has two parameters: a folder and a file. It creates the folder and then creates an empty file inside the new folder. Both the name of the folder and the name of the file are taken from the parameters. The main transformation executes the Job iteratively for a list of folder and file names.

Let’s start by creating the Job:

1. Create a new Job.
2. Double-click on the work area to bring up the Job properties window. Use it to define two named parameters: FOLDER_NAME and FILE_NAME.
3. Drag a START, a Create a folder, and a Create file entry to the work area and link them in that order: START, then Create a folder, then Create file.
4. Double-click the Create a folder entry. As Folder name, type ${FOLDER_NAME}.

5. Double-click the Create file entry. As File name, type ${FOLDER_NAME}/${FILE_NAME}.

6. Save the Job and test it, providing values for the folder and filename. The Job should create a folder with an empty file inside, both with the names that you provide as parameters.
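
As a rough analogy outside PDI, the behavior of this Job can be sketched in Python. This is not PDI code: the function name run_job is hypothetical, and the sketch assumes that creating a folder that already exists counts as a failure.

```python
import os
import tempfile


def run_job(folder_name: str, file_name: str) -> str:
    """Analogy for the sample Job: create a folder, then an empty file inside it."""
    # "Create a folder" entry: os.mkdir raises FileExistsError if the folder exists.
    os.mkdir(folder_name)
    # "Create file" entry: the path mirrors ${FOLDER_NAME}/${FILE_NAME}.
    file_path = os.path.join(folder_name, file_name)
    with open(file_path, "x"):
        pass  # mode "x" creates a new, empty file
    return file_path


if __name__ == "__main__":
    base = tempfile.mkdtemp()  # scratch location so the demo leaves the work dir alone
    created = run_job(os.path.join(base, "my_folder"), "my_file.txt")
    print(created)
```

The two arguments play the role of the FOLDER_NAME and FILE_NAME named parameters.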

Now create the main Transformation:

1. Create a Transformation.
2. Drag a Data Grid step to the work area and define a single field named foldername. The type should be String.
3. Fill the Data Grid with a list of folders to be created, as shown in the next example.
4. As the name of the file, you can use any name of your choice. As an example, we will create a random name. For this, we use a Generate random value step and a User Defined Java Expression (UDJE) step, and configure them as shown.
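
The loop that the main Transformation simulates can be sketched in Python as an analogy. Nothing here is PDI code: run_job, random_file_name, and run_transformation are hypothetical names, the folder list stands in for the Data Grid, and the file name format is an assumption, since the original configuration of the random value is not specified.

```python
import os
import random
import tempfile


def run_job(folder_name, file_name):
    """Stand-in for the Job: create the folder, then an empty file inside it."""
    os.mkdir(folder_name)
    path = os.path.join(folder_name, file_name)
    open(path, "x").close()
    return path


def random_file_name(rng):
    """Stand-in for Generate random value + UDJE (the format is an assumption)."""
    return f"file_{rng.randint(0, 999999)}.txt"


def run_transformation(base_dir, folder_names, seed=None):
    """Execute the Job once per incoming row, as the Job Executor does by default."""
    rng = random.Random(seed)
    created = []
    for foldername in folder_names:  # each list item is one Data Grid row
        row = {"foldername": foldername, "filename": random_file_name(rng)}
        folder_path = os.path.join(base_dir, row["foldername"])
        created.append(run_job(folder_path, row["filename"]))
    return created


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    for path in run_transformation(base, ["folder_a", "folder_b", "folder_c"]):
        print(path)
```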
Configuring the executors with advanced settings
Like the Transformation Executor that you already know, the Job Executor can be configured with advanced settings. These let you customize the behavior and the output of the Job to be executed. Let’s summarize the options.

Getting the results of the execution of the job
The Job Executor doesn’t cause the Transformation to abort if the Job that it runs has errors. To verify this, run the sample transformation again. As the folders already exist, you can expect each individual execution to fail. However, the Job Executor ends without error. To capture the errors in the execution of the Job, you have to get the execution results. This is how you do it.
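
As an analogy outside PDI, the following Python sketch shows the idea: a failure in one execution does not stop the loop, and the outcome of each execution is emitted as a result row that a later step can inspect. The names run_job and execute_for_each_row, and the result keys success, nr_errors, and error, are hypothetical and are not the field names that PDI produces.

```python
import os
import tempfile


def run_job(folder_name, file_name):
    """Stand-in for the Job: fails with FileExistsError if the folder exists."""
    os.mkdir(folder_name)
    open(os.path.join(folder_name, file_name), "x").close()


def execute_for_each_row(rows):
    """Run the Job once per row; record failures instead of aborting."""
    results = []
    for row in rows:
        result = {"foldername": row["foldername"], "success": True,
                  "nr_errors": 0, "error": None}
        try:
            run_job(row["foldername"], row["filename"])
        except OSError as exc:
            # The loop keeps going; the error is only visible in the result row.
            result.update(success=False, nr_errors=1, error=str(exc))
        results.append(result)
    return results


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    rows = [{"foldername": os.path.join(base, name), "filename": "f.txt"}
            for name in ("folder_a", "folder_b")]
    execute_for_each_row(rows)              # first run creates the folders
    for r in execute_for_each_row(rows):    # second run fails for every row
        print(r["foldername"], r["success"], r["nr_errors"])
```

Filtering the result rows on the error count is then enough to detect the failed executions.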
Working with groups of data
As you know, jobs don’t work with datasets. Transformations do. However, you can still use the Job Executor to send the rows to the Job. Then, any transformation executed by your Job can get the rows using a Get rows from result step.

By default, the Job Executor executes the Job once for every row in your dataset, but you can change this behavior in the Row Grouping tab of the configuration window:

You can send groups of N rows, where N is greater than 1
You can pass a group of rows based on the value in a field
You can send groups of rows based on the time the step collects rows before executing the Job
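
The first two grouping options can be illustrated with a short Python sketch. This is an analogy, not PDI code: group_by_count and group_by_field are hypothetical helpers, and the field-based version assumes that a new group starts whenever the value changes between consecutive rows. Time-based grouping is not sketched.

```python
from itertools import groupby, islice


def group_by_count(rows, n):
    """Yield consecutive groups of at most n rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def group_by_field(rows, field):
    """Yield groups of consecutive rows that share the same value in `field`."""
    for _, grp in groupby(rows, key=lambda row: row[field]):
        yield list(grp)


if __name__ == "__main__":
    rows = [
        {"region": "north", "foldername": "a"},
        {"region": "north", "foldername": "b"},
        {"region": "south", "foldername": "c"},
    ]
    for group in group_by_count(rows, 2):
        print("by count:", [r["foldername"] for r in group])
    for group in group_by_field(rows, "region"):
        print("by field:", [r["foldername"] for r in group])
```

In both cases the Job would run once per group rather than once per row.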
Using variables and named parameters
If the Job has named parameters, as in the example that we built, you provide values for them in the Parameters tab of the Job Executor step. For each named parameter, you can assign either the value of a field or a fixed (static) value. If you execute the Job for a group of rows instead of a single one, the parameters take their values from the first row of data sent to the Job.
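
The rule about the first row can be made concrete with a small Python sketch. It is an analogy only: resolve_parameters and the ("field", ...) / ("static", ...) mapping format are hypothetical and do not reflect how PDI stores the Parameters tab.

```python
def resolve_parameters(group, mapping):
    """Build the parameter values for one Job execution.

    group:   non-empty list of rows (dicts) sent to the Job
    mapping: parameter name -> ("field", field_name) or ("static", value)
    """
    first_row = group[0]  # field-based parameters read only the first row
    params = {}
    for name, (kind, value) in mapping.items():
        params[name] = first_row[value] if kind == "field" else value
    return params


if __name__ == "__main__":
    group = [
        {"foldername": "folder_a"},
        {"foldername": "folder_b"},
    ]
    mapping = {
        "FOLDER_NAME": ("field", "foldername"),
        "FILE_NAME": ("static", "fixed_name.txt"),
    }
    print(resolve_parameters(group, mapping))
```

Here FOLDER_NAME resolves to the value in the first row of the group, while FILE_NAME keeps its static value for every execution.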

Capturing the result filenames
The Job Executor can also output the result filenames. Let’s modify the Transformation that we created to show an example of this kind of output:

1. Open the transformation created at the beginning of the section.
2. Drag a Write to log step to the work area.
3. Create a hop from the Job Executor toward the Write to log step. When asked for the kind of hop, select the option named This output will contain the result file names after execution. Your transformation will look as follows:
