Monday, April 22, 2013

Pretty print XML and JSON files with PDI (2/2)

Today I'm back with the second part of my recipe on how to pretty print XML and JSON files with PDI. You can find the first part here.

To pretty print any JSON stream, a good starting point is the GSON library. It is a handy component that lets you

  1. serialize a set of Java objects to a JSON stream, or
  2. convert a JSON stream into an equivalent set of Java objects

The setup

To prepare PDI to run this example you must:

  1. Download the GSON library from the following link. In my case I downloaded version 2.2.3, but the same steps apply to other versions of the library.
  2. Unzip the file into a temporary directory
  3. Copy the gson-2.2.3.jar file to the <PDI_HOME>/libext directory
  4. Restart PDI 

The how-to

First I needed an ugly JSON sample stream to format. To get one, I built a new transformation that reuses the input files of the sample multilayer XML file transformation and produces a simple JSON stream. The interesting part is at the very end of this transformation: once again, a User Defined Java Class step contains all the code that does the dirty work for you.

1:  import com.google.gson.Gson;
2:  import com.google.gson.GsonBuilder;
3:  import com.google.gson.JsonParser;
4:  import com.google.gson.JsonElement;
5:  String jsonOutputField;
6:  String jsonPPField;
7:  public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
8:  {
9:      // First, get a row from the default input hop
10:     //
11:     Object[] r = getRow();
12:     // If the row object is null, we are done processing.
13:     //
14:     if (r == null) {
15:         setOutputDone();
16:         return false;
17:     }
18:     // Let's look up parameters only once for performance reason.
19:     //
20:     if (first) {
21:         jsonOutputField = getParameter("JSONOUTPUT_FIELD");
22:         jsonPPField = getParameter("JSONPP_FIELD");
23:         first = false;
24:     }
25:     // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
26:     // enough to handle any new fields you are creating in this step.
27:     //
28:     Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
29:     logBasic("Input row size: " + r.length);
30:     logBasic("Output row size: " + data.outputRowMeta.size());
31:     String jsonOutput = get(Fields.In, jsonOutputField).getString(r);
32:     Gson gson = new GsonBuilder().setPrettyPrinting().create();
33:     JsonParser jp = new JsonParser();
34:     JsonElement je = jp.parse(jsonOutput);
35:     String jsonpp = gson.toJson(je);
36:     // Set the value in the output field
37:     //
38:     get(Fields.Out, jsonPPField).setValue(outputRow, jsonpp);
39:     // putRow will send the row on to the default output hop.
40:     //
41:     putRow(data.outputRowMeta, outputRow);
42:     return true;
43: }

This time the interesting code is between lines 32 and 35:

  1. The Gson object is created with pretty printing enabled (line 32).
  2. The incoming JSON stream is read and parsed into a JsonElement (lines 33-34).
  3. A new, pretty-printed JSON stream is built and assigned to jsonpp (line 35), which then fills the output field (line 38).
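
If you want to see what the pretty printing step amounts to without PDI or the GSON jar at hand, here is a minimal, dependency-free sketch. It is not how GSON works internally and it is not part of the sample transformation; JsonIndenter and its methods are names made up for this illustration. It assumes the input is already valid JSON (nothing is validated) and uses two-space indentation with a space after each colon, which is similar to the layout GSON produces.

```java
// Hypothetical helper for illustration only: re-indents an already valid JSON string.
final class JsonIndenter {

    static String prettyPrint(String json) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                out.append(c);
                if (c == '\\' && i + 1 < json.length()) {
                    out.append(json.charAt(++i)); // copy the escaped character untouched
                } else if (c == '"') {
                    inString = false;             // closing quote
                }
                continue;
            }
            if (Character.isWhitespace(c)) {
                continue; // drop the existing layout outside strings
            }
            switch (c) {
                case '"':
                    inString = true;
                    out.append(c);
                    break;
                case '{':
                case '[': {
                    out.append(c);
                    int next = skipWhitespace(json, i + 1);
                    boolean empty = next < json.length()
                            && json.charAt(next) == (c == '{' ? '}' : ']');
                    if (empty) {
                        out.append(json.charAt(next)); // keep {} and [] on one line
                        i = next;
                    } else {
                        newLine(out, ++depth);
                    }
                    break;
                }
                case '}':
                case ']':
                    newLine(out, --depth);
                    out.append(c);
                    break;
                case ',':
                    out.append(c);
                    newLine(out, depth);
                    break;
                case ':':
                    out.append(": ");
                    break;
                default:
                    out.append(c); // digits and the literals true, false, null
            }
        }
        return out.toString();
    }

    // Returns the index of the first non-whitespace character at or after 'from'.
    private static int skipWhitespace(String s, int from) {
        int i = from;
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // Starts a new line indented by two spaces per nesting level.
    private static void newLine(StringBuilder out, int depth) {
        out.append('\n');
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    public static void main(String[] args) {
        System.out.println(prettyPrint("{\"name\":\"PDI\",\"tags\":[\"xml\",\"json\"],\"meta\":{}}"));
    }
}
```

Running the main method prints the sample object over several lines, with each nested level indented by two more spaces and the empty object kept as {}. For real work inside PDI the GSON approach shown above remains the better choice, since it also validates the stream while parsing it.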

Next, the pretty-printed JSON stream is saved to a .js file, and that's all.

You can download the sample transformation from this link. I hope you enjoyed this two-part article and that you find it useful.
