
Business Intelligence

Business intelligence is one of the practices at VibrantWorx. We keep costs under control and deliver scalable, advanced solutions for organizations that want a versatile solution built on an open architecture.

Below is an excerpt from Gartner’s report (Magic Quadrant for Business Intelligence Platforms), which speaks favorably of the Pentaho platform as a comprehensive, feature-rich solution built on an open-source foundation.



Gartner’s Review of Products in the BI Space (Pentaho’s Competitors)

“Pentaho, after just four years in existence, has put together a comprehensive open-source BI platform that includes data integration and data mining capabilities. In 2008, Pentaho was noticeably more aggressive, openly competing against traditional BI platform vendors. Like Jaspersoft, Pentaho is affordable and also offers a subscription-based model that avoids an initial large payment for the software license. Some of the significant features Pentaho introduced in 2008 include an automatic table designer that analyses relational schemas and data patterns, performs a cost-benefit analysis of aggregation at different levels, and generates and populates those aggregate tables. Despite a handful of large customers, Pentaho reference survey respondents more frequently indicated that they had more departmental deployments (versus enterprise wide) and smaller data volumes compared with the other vendors.”

Pentaho is also being adopted by various financial firms such as “Aberdeen Group”, which has selected Pentaho for its business intelligence needs, a big win for Pentaho. See the report on how Aberdeen Group has integrated Pentaho into its product portfolio.

Pentaho is well suited as a platform because it is platform independent: it can run on any hardware platform that has a JRE installed. On top of platform independence, Pentaho also decouples the BI layer from changes in the underlying database. For example, your product may change in the future, and you want the BI layer to remain independent of the underlying database.




A CTO generally worries about how the data warehouse will be affected when the underlying schema changes in subsequent releases. With the metadata architecture on which the design is built, the CTO can rest easy, because the business intelligence component is built on two non-physical layers.
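To make the decoupling idea concrete, here is a minimal sketch of a mapping layer between business field names and physical columns. It is not Pentaho Metadata's actual API; the table names, column names, and the `resolve` helper are all hypothetical and only illustrate why a schema change need not touch the report definition.

```python
# Hypothetical physical mappings for two releases of the same product.
# Only these dictionaries change when the underlying schema changes.
PHYSICAL_V1 = {"Revenue": "sales.amt", "Product Type": "product.type_cd"}
PHYSICAL_V2 = {
    "Revenue": "sales_fact.revenue_amount",
    "Product Type": "product_dim.product_type",
}

# The report is defined purely in business terms and never changes.
REPORT_FIELDS = ["Product Type", "Revenue"]


def resolve(business_fields, mapping):
    """Translate business field names into physical column references."""
    return [mapping[name] for name in business_fields]


columns_v1 = resolve(REPORT_FIELDS, PHYSICAL_V1)
columns_v2 = resolve(REPORT_FIELDS, PHYSICAL_V2)
print(columns_v1)
print(columns_v2)
```

The report keeps asking for "Product Type" and "Revenue" in both releases; only the mapping is updated.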

What are the steps for undertaking a Business Intelligence project, and how does the Pentaho BI Suite of products help in achieving this task?

Each Business Intelligence project has an ETL process that pushes data from an existing database to another database using a script, either in real time [Embedded] or as an end-of-day (EOD) process. The EOD process can push the existing real-time data into a staging area. From there the data is filtered and loaded into a star schema, where the facts are calculated and stored for a particular set of dimensions.
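The EOD flow described above can be sketched in a few lines of Python using the standard-library `sqlite3` module. This is only an illustration, not how Pentaho implements ETL: the table names (`staging_sales`, `dim_product`, `fact_sales`), the sample rows, and the `run_eod_load` function are assumptions made for the example.

```python
import sqlite3

# Hypothetical source rows standing in for the operational database.
SOURCE_ROWS = [
    ("2009-01-05", "Soap", "FMCG", 120.0),
    ("2009-01-05", "Shirt", "Menswear", 300.0),
    ("2009-01-06", "Soap", "FMCG", 80.0),
    ("2009-01-06", "Shirt", "Menswear", None),  # bad row: missing amount
]


def run_eod_load(conn, source_rows):
    """Stage the rows, filter them, then load a small star schema."""
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE staging_sales (
            sale_date TEXT, product TEXT, product_type TEXT, amount REAL);
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, product TEXT UNIQUE,
            product_type TEXT);
        CREATE TABLE fact_sales (
            sale_date TEXT, product_id INTEGER, revenue REAL);
    """)
    # Extract: copy the raw rows into the staging area unchanged.
    cur.executemany("INSERT INTO staging_sales VALUES (?, ?, ?, ?)", source_rows)
    # Filter: drop rows that fail a basic quality check.
    cur.execute("DELETE FROM staging_sales WHERE amount IS NULL")
    # Load the dimension: one row per distinct product.
    cur.execute("""
        INSERT OR IGNORE INTO dim_product (product, product_type)
        SELECT DISTINCT product, product_type FROM staging_sales
    """)
    # Load the fact table: revenue calculated per date and product.
    cur.execute("""
        INSERT INTO fact_sales (sale_date, product_id, revenue)
        SELECT s.sale_date, d.product_id, SUM(s.amount)
        FROM staging_sales s
        JOIN dim_product d ON d.product = s.product
        GROUP BY s.sale_date, d.product_id
    """)
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]


conn = sqlite3.connect(":memory:")
fact_rows = run_eod_load(conn, SOURCE_ROWS)
total_revenue = conn.execute("SELECT SUM(revenue) FROM fact_sales").fetchone()[0]
print(fact_rows, total_revenue)
```

Keeping the staging table separate from the star schema means a bad source row is filtered out before it can affect the facts.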

So how does the Pentaho product stack up for ETL? Pentaho has Spoon as an editor, a standalone application in which the user can create extraction and transformation rules. The editor generates the desired result and transforms the data from various data sources into the staging DB. It can extract from various sources, namely flat files, Excel files, and various databases, and perform actions on them. Pentaho also comes with a Pentaho Data Integration layer, which performs this transformation.

Once the data is secured, it is mined along the various dimensions of the management view. These contain the analysis logic that populates the fact tables of the star schema.



In the example above, the fact table stores the analysis numbers for each set of dimensions. For example, to see the numbers from the products perspective, the join would be on products, and the numbers are stored in the fact table. We then have predefined dimensions on product types; for example, we have $X revenue from FMCG and $X from men’s wear. The sales facts are readily calculated because these products are already preconfigured. The other dimensions work in a similar way.
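As a rough illustration of the product-type view, the sketch below joins a fact table to a product dimension and sums revenue per product type. The schema, the sample figures, and the `revenue_by_product_type` function are invented for the example; they stand in for the $X values mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, product TEXT, product_type TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, revenue REAL);
    INSERT INTO dim_product VALUES
        (1, 'Soap', 'FMCG'),
        (2, 'Toothpaste', 'FMCG'),
        (3, 'Shirt', 'Menswear');
    INSERT INTO fact_sales VALUES
        (1, 200.0), (2, 150.0), (3, 300.0), (3, 100.0);
""")


def revenue_by_product_type(conn):
    """Join facts to the product dimension and total revenue per type."""
    rows = conn.execute("""
        SELECT d.product_type, SUM(f.revenue)
        FROM fact_sales f
        JOIN dim_product d ON d.product_id = f.product_id
        GROUP BY d.product_type
        ORDER BY d.product_type
    """).fetchall()
    return dict(rows)


totals = revenue_by_product_type(conn)
print(totals)
```

Any other dimension (region, time period, sales unit) would follow the same pattern: join the fact table to that dimension and group by the attribute of interest.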

Now we have the numbers. Presenting these numbers in a dashboard requires a lot of features, and Pentaho provides plenty of goodies for dashboard preparation. In the dashboard, the user selects and aggregates based on these dimensions to see how the facts change for a given combination of dimension values.
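The select-and-aggregate behaviour a dashboard offers can be approximated with a small grouping function. This is a plain Python sketch under assumed data, not the way Pentaho's dashboards are implemented; `FACTS`, the dimension names, and `aggregate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows, already joined to their dimension attributes.
FACTS = [
    {"region": "North", "product_type": "FMCG", "revenue": 100.0},
    {"region": "North", "product_type": "Menswear", "revenue": 250.0},
    {"region": "South", "product_type": "FMCG", "revenue": 75.0},
    {"region": "South", "product_type": "Menswear", "revenue": 50.0},
]


def aggregate(facts, group_by, filters=None):
    """Sum revenue per combination of the chosen dimensions.

    `filters` restricts the rows first (a slice); `group_by` picks the
    dimensions to break the total down by (a dice or drill-down).
    """
    filters = filters or {}
    totals = defaultdict(float)
    for row in facts:
        if all(row[dim] == value for dim, value in filters.items()):
            key = tuple(row[dim] for dim in group_by)
            totals[key] += row["revenue"]
    return dict(totals)


# Top-level view: revenue per region.
by_region = aggregate(FACTS, ["region"])
# Drill down into one region and break it out by product type.
north_detail = aggregate(FACTS, ["product_type"], filters={"region": "North"})
print(by_region)
print(north_detail)
```

Changing the `group_by` list or the `filters` dictionary corresponds to the user picking different dimensions or conditions in the dashboard.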

Pentaho has a dashboard that is rendered using GWT/JPivot/Google Maps/Servlet, and these reports are secured using the administration module. This module is responsible for rendering and slicing/dicing, and it has a built-in drill-down facility. Companies are free to enhance this functionality and add features to the existing utilities that come with Pentaho.



Management could also be shown a Google map representing the sales figures each unit is bringing in. This helps top management focus on and drill down into each location.