Thursday, December 10, 2009

Patching JPivot to filter by multiple members belonging to the same dimension

A very annoying limitation of the JPivot release shipped with the current release of Pentaho is that you can only select one member of a cube dimension at a time as a filter. But frequently we would like to select more than one member at a time. So, what a good opportunity to build a patched release of JPivot and make that happen!

The first thing I did was test the idea by writing a sample MDX query on the SteelWheels cube that had, as a filter, a set of members belonging to the same dimension. It worked. In fact, Mondrian, the underlying OLAP engine, supports the compound slicers that make this possible. After this very brief verification I was convinced to continue my adventure, and the next step was to find out where the latest JPivot sources were. As stated by Will Gorman in post #7 of this thread, the latest and best sources are in the obvious place... the CVS repository of the JPivot project on SourceForge. So I got them and started my work.
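To illustrate, a query of the kind I tested might look like the following. The cube and member names come from the standard SteelWheels sample, but treat the exact query as a sketch rather than the one I actually ran; the WHERE clause holds a set of two members from the same [Time] dimension, which is what makes it a compound slicer:

```
SELECT
  {[Measures].[Sales]} ON COLUMNS,
  {[Product].[Line].Members} ON ROWS
FROM [SteelWheelsSales]
WHERE {[Time].[2003], [Time].[2004]}
```

Running something like this directly against Mondrian confirms the engine handles multiple slicer members; the limitation was purely in the JPivot UI.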

The patch wasn't so difficult to write and I reached my goal easily. It was a good opportunity to dig into the internals of JPivot and learn about them. The new PAT is not around the corner, so my opinion is that JPivot will remain the production OLAP viewer of choice for Pentaho Community Edition for a few more months. So, I think, this exercise is not a waste of time.

If you want to try it, you can find the patched JPivot jar file here. The setup takes just one minute. Move your original JPivot library from <biserver_home>/tomcat/webapps/pentaho/WEB-INF/lib to a different directory (just to be safe). Unzip the archive you just downloaded and copy the extracted jar file to the location specified before. Restart the Pentaho BI Server and then... here we go!!

Friday, November 27, 2009

Accessing Mondrian cubes through Pentaho Report Designer

Today I had the opportunity to design some complex reports using the cubes published in my customer's Pentaho BI Server as datasources. Using a cube as a datasource is good, in my opinion, because it gives you an easy way to build reports that, for example, compare data over different periods.

A good sample for what we're going to discuss here is the Top N Analysis report you can find in Pentaho's Steel Wheels samples. For brevity I'll refer to this report simply as "the sample". If you open it and have a look at the defined datasources, you can clearly see that it takes the data it needs from the steelwheels.mondrian.xml cube schema. So it is a good fit for us.

Publish the report to Pentaho BI Server

Before thinking about publishing your report to your running BI Server instance, you have to think about how Pentaho will access the schema your report needs. The reporting plugin tries two possible paths:

  • Firstly, it tries to access the Analysis Schema file using the path you specified in the datasource definition in Pentaho Report Designer. In the Pentaho BI Server execution environment, every relative path is resolved against <BISERVER_HOME>/tomcat/bin. That means that if you reference your Analysis Schema as a relative path in your report datasource definition (as the sample mentioned in my opening does), Pentaho will look for your schema file by computing its absolute path relative to <BISERVER_HOME>/tomcat/bin. So you need to be sure that your file is in the right place before the system tries to access it. I think this way is not as good because it depends on your BI Server filesystem layout.
  • Secondly, it tries to access the schema as an XMLA datasource. That is, in my opinion, the more elegant way to make the schema available to the reporting engine.

How to add a new XMLA datasource to Pentaho BI Server

To define a new XMLA datasource in our Pentaho BI Server environment we have to update the datasources.xml file in <BISERVER_HOME>/pentaho-solutions/system/olap.

This file contains the definitions of all the XMLA datasources available in the system. We can add a new datasource definition using one of these two ways:
  1. Manually add a new Catalog element to configure a new Mondrian catalog
  2. While publishing the Analysis Schema from the Schema Workbench, flag the Enable JNDI datasource option and set the JNDI Data Source field appropriately. You can find the procedure to publish the schema clearly explained in Pentaho's wiki.
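As a sketch of option 1, a manually added Catalog element sits inside the Catalogs section of datasources.xml. The catalog name, JNDI datasource name, and schema path below are illustrative (here they mirror the standard Steel Wheels sample), so adapt them to your own solution:

```xml
<Catalogs>
   <Catalog name="SteelWheels">
      <DataSourceInfo>Provider=mondrian;DataSource=SampleData</DataSourceInfo>
      <Definition>solution:steel-wheels/analysis/steelwheels.mondrian.xml</Definition>
   </Catalog>
</Catalogs>
```

The solution: prefix in the Definition element tells Pentaho to resolve the schema path relative to the pentaho-solutions directory, which avoids the filesystem-layout dependency described above.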

Thursday, October 29, 2009

Mondrian cubes debugging: how to display SQL queries

These days I've had the interesting need to look at the queries that Mondrian generates while the user navigates the OLAP cube. The idea came to me when I decided to check whether the indexes applied to my tables give me the best performance possible. To decide which indexes are worth applying, my strategy is to
  • collect some queries, and then
  • get the query plan of each query and check whether the indexes are actually used.
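For the second step, once a statement has been captured from the logs you can prefix it with your database's plan command. A hypothetical example for MySQL follows; the table and column names are invented for illustration and are not taken from any real Mondrian-generated query:

```sql
-- Paste a SQL statement captured from sql.log after the EXPLAIN keyword
EXPLAIN
SELECT time_by_day.the_year, SUM(sales_fact.sales)
FROM sales_fact
JOIN time_by_day ON sales_fact.time_id = time_by_day.time_id
GROUP BY time_by_day.the_year;
```

In the resulting plan, look at the key column (or the equivalent in your database) to verify that each join and filter is served by one of your indexes rather than a full table scan.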

It's really useful to look at the Mondrian log files because they give us a lot of useful information about how our system is behaving. We can
  • look at SQL statements and MDX queries,
  • get some profiling information on the queries that are executed,
  • get other useful debugging information.
The following paragraphs illustrate how to enable Mondrian debug logging by adding some properties to the Mondrian configuration file.
After that, we'll configure two new log4j appenders to have the desired log files properly written to our filesystem.

Enable Mondrian debug log
Mondrian has a big set of configuration settings that can be modified. In our case, to enable Mondrian debug information, follow the steps detailed below:

Open the mondrian.properties file located in <bi-server_home>/pentaho-solutions/system/mondrian and add the following line.
You can find the complete set of configuration settings here.
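The exact property line did not survive in this copy of the post. The setting in mondrian.properties most relevant to SQL debugging is, to the best of my knowledge, the one below; treat the property name as an assumption and verify it against the configuration reference linked above:

```
# Assumption: produces readable, formatted SQL in the log output
mondrian.rolap.generate.formatted.sql=true
```

Note that the actual emission of SQL and MDX to the log files is controlled by the log4j categories configured in the next section; this property only affects how the SQL is formatted.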

Update log4j configuration
At this point we're going to modify the log4j configuration file, adding the required appenders and categories to have our logging information written properly.

Open the log4j.xml file located in <bi-server_home>/tomcat/webapps/pentaho/WEB-INF/classes

Based on what you want to log, add one or both of the following snippets to the file. They will create two new RollingFileAppenders. You're free to use the kind of appender you prefer. In case you need further information about log4j and its configuration parameters, you can have a look here.
IMPORTANT: The location of the produced files is relative to the <bi-server_home>/tomcat/bin directory. You can put the generated log files wherever you want in the filesystem, but always remember this important consideration.

   <!-- Add the following appender only if you're interested in logging SQL statements -->

   <appender name="SQLLOG" class="org.apache.log4j.RollingFileAppender">
     <param name="File" value="sql.log"/>
     <param name="Append" value="false"/>
     <param name="MaxFileSize" value="500KB"/>
     <param name="MaxBackupIndex" value="1"/>

     <layout class="org.apache.log4j.PatternLayout">
       <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
     </layout>
   </appender>

   <!-- Add the following appender only if you're interested in logging MDX statements -->

   <appender name="MONDRIAN" class="org.apache.log4j.RollingFileAppender">
     <param name="File" value="mondrian.log"/>
     <param name="Append" value="false"/>
     <param name="MaxFileSize" value="500KB"/>
     <param name="MaxBackupIndex" value="1"/>

     <layout class="org.apache.log4j.PatternLayout">
       <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
     </layout>
   </appender>

Add the following new categories to the log4j.xml file according to your logging needs.

   <!-- logs SQL statements to the SQLLOG appender only -->
   <category name="mondrian.sql">
      <priority value="DEBUG"/>
      <appender-ref ref="SQLLOG"/>
   </category>

   <!-- logs MDX statements to the MONDRIAN appender only -->
   <category name="mondrian">
      <priority value="DEBUG"/>
      <appender-ref ref="MONDRIAN"/>
   </category>

Enable the new log settings
To enable the new log settings, restart the Pentaho BI Server instance. Remember, as soon as you have satisfied your debugging needs, to disable the trace logging because it has a severe impact on system performance.