Friday, November 11, 2016

OBIEE - Selecting date range by selecting a column

Customers often work with specific date ranges: "last 3 days", "last week", "this month", "last 7 working days"... While it's easy to create a filter that does this and reuse it, they prefer an easier method.
Adding those option in the tool by default is problematic, since users always invent new range needs. We need something the local IT can manage without too much work for each new idea.
 
I covered how to do it with prompts in the post: "OBIEE Prompts - Last Year, Last Month and Custom Dates", but the request is to give easy access to date range selection in Answers (and VA) as well.
Sorry, it's going to be a long post.

My basic assumption is: you have a date dimension based on dates table. I will cover the following options:
  • Reusing a filter.
  • Adding date range columns to date table.
  • Create a date_range table and joining it to date dimension.   
  • Having multiple data sources on the same physical table with various "where" conditions in the content as a date dimension and specific column.

In all examples I will use 3 ranges "last 3 days", "last month", "this year". Assuming today is 11-Nov-2016:
last 3 days is 10-Nov-2016, 9-Nov-2016, 8-Nov-2016
last month is all dates from 11-Oct-2016 till 10-Nov-2016
this year are all the dates from 1-Jan-2016 till 11-Nov-2016. 

I'll describe the pros and cons of each method with the following criteria:
  • Ease of use for customer.
  • Requires RPD change for each additional date range.
  • Requires daily ETL.
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days).
  • Works with VA as well or Answers only.
  • Has performance impact.
I use very simple model with measure value 1 in each day of fact.

Reusing Filter

I can create a filter from within an analysis or from menu New / Filter.

 For last 3 days I create a filter "DATE_D"."DATEVALUE" BETWEEN TIMESTAMPADD(SQL_TSI_DAY, -3,CURRENT_DATE) AND TIMESTAMPADD(SQL_TSI_DAY, -1,CURRENT_DATE)
I can save the filter for reuse

 or save the entire analysis.
To use the filter, the user has to select it from the filters list of the Catalog (left panel):





Some users find it easier to use the analysis itself as filter, they believe they have better control this way:
 In a similar manner I can create the other date range filter / Analysis:
Last Month: "DATE_D"."DATEVALUE" BETWEEN TIMESTAMPADD(SQL_TSI_MONTH, -1,CURRENT_DATE) AND TIMESTAMPADD(SQL_TSI_DAY, -1,CURRENT_DATE)
 This Year: "DATE_D"."DATEVALUE" BETWEEN TIMESTAMPADD(SQL_TSI_DAY, (1-DAYOFYEAR(CURRENT_DATE)) ,CURRENT_DATE) AND CURRENT_DATE
Only the analysis based filter let us create reverse selection:






Summary:

The best option for IT and OK but not so great for Users.
  • Ease of use for User is OK but not great.
  • Requires RPD change for each additional date range?  NO.
  • Requires daily ETL? NO.
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days)? Yes, with filter based on analysis only.
  • Works with VA? NO.
  • Has performance impact? NO.



Adding date range columns to date table

This option requires change in the dates table. I'll add 3 columns (last_3_days, last_month, this_year). For each column I have to update daily the Boolean value (in my case Yes/No).


This is the Subject area:


The user has very simple task; Filtering Yes/No on the column:





The reverse option is easy to create:

It works with VA as well:


Summary:

The worst option for IT and great for Users.
  • Ease of use for User? YES.
  • Requires RPD change for each additional date range?  YES.
  • Requires daily ETL? YES.
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days)? YES.
  • Works with VA? YES.
  • Has performance impact? NO.





Create a date_range table with line for each range and joining it to date dimension

The simple option here is to create a table with 3 columns: Range_name, From_date, Till_date and join them to the Date dimension. This sort of join has bad impact on performance. The table also requires ETL for updating the date_range table. We can have a work around for the ETL by creating a dynamic view (Oracle SQL code):
create or REPLACE view date_range_line as
select 'Last 3 days' as Range, trunc(sysdate-3) as from_date, trunc(sysdate-1) as till_date from dual
Union ALL
select 'Last month' as Range, trunc(add_months(sysdate,-1)) as from_date, trunc(sysdate-1) as till_date from dual
Union ALL
select 'This Year' as Range, TRUNC(TO_DATE(sysdate),'YEAR') as from_date, trunc(sysdate) as till_date from dual
This is easy to update for each change or new requirement, in the future. 
So I created it as a View, imported into repository and joined it to date table. 

"ORCL".""."SH"."DATE_RANGE_LINE"."FROM_DATE" <=  "ORCL".""."SH"."DATE_D"."DATEVALUE"   AND   "ORCL".""."SH"."DATE_RANGE_LINE"."TILL_DATE" >= "ORCL".""."SH"."DATE_D"."DATEVALUE"




Now it's working:
Unfortunately, the reverse option returns wrong results (the last 3 days were not removed):


(Actually it can be done using minus option that is under the union option in the criteria, but that's a bit advanced).

For performance we see the select statement from the log:
WITH 
SAWITH0 AS (select sum(T909.MES) as c1
from 
     DATE_RANGE_LINE T1138,
     DATE_D T895,
     TEST_FACT T909
where  ( T895.DATEKEY = T909.DATEKEY and T1138.RANGE = 'Last 3 days' 
and T895.DATEVALUE between T1138.FROM_DATE and T1138.TILL_DATE ) )
select D1.c1 as c1, D1.c2 as c2 from ( select distinct 0 as c1,
     D1.c1 as c2
from 
     SAWITH0 D1 ) D1 where rownum <= 65001
As expected we have 3 tables joined instead of 2.

Summary:

Good option for everyone, if you don't mind the performance problem and the wrong results of reverse or multiple range selection.
  • Ease of use for User? YES.
  • Requires RPD change for each additional date range?  NO.
  • Requires daily ETL? NO (for the view option).
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days)? NO.
  • Works with VA? YES.
  • Has performance impact? YES.But can be solved with driving table as described bellow, in the update.

For this option I also considered creating a date_range table with a line for each date and range combination and joining it to date dimension as driving table. 
This is ETL intensive option where for example for last 3 days I have 3 rows of data, the range column is "last 3 days" and the date column has each of the 3 days that fit the range. For last month, up to 31 rows... We can see that dates such as '10-Nov-2016' appear both in 'last 3 days' and 'last Month' :

This way the join with date table is simple.



UPDATE 

To solve the performance problem we can use a feature named "driving table" in OBIEE. The idea of driving table described here and here. This way OBIEE runs a separate query for the date_range table and places the result into the date table filter. Since the date ranges are usually small (if you have a year, then you should change MAX_PARAMETERS_PER_DRIVE_JOIN parameter). Unfortunately since driving table makes it own optimization, it didn't act in my tests. 
OBIEE prefers to use driving table only with "cross-database joins ".
If you can't beat them, join them.To overcame the optimizer I duplicated my physical data source (now it's cross-database join):

And used the date_range table from the second source (ORCL#1). Set this date_range as driving table in the join at the business model:
Now the performance is fine!

This is from the log:

select T1287.RANGE as c1,
     T1287.DATES as c2
from 
     DATE_RANGE T1287
where  ( T1287.RANGE = 'last 3 days' ) 
order by c2

]]

select  /*+ no_merge */  sum(T909.MES) as c1,
     T909.DATEKEY as c2,
     T895.DATEVALUE as c3
from 
     TEST_FACT T909,
     DATE_D T895
where  ( T895.DATEKEY = T909.DATEKEY and (T895.DATEVALUE in 
(:PARAM1, :PARAM2, :PARAM3, :PARAM4, :PARAM5, :PARAM6, :PARAM7, :PARAM8, :PARAM9, 
:PARAM10, :PARAM11, :PARAM12, :PARAM13, :PARAM14, :PARAM15, :PARAM16, :PARAM17, 
...
:PARAM100)) ) 
group by T895.DATEVALUE, T909.DATEKEY 
group by T895.DATEVALUE, T909.DATEKEY
order by c3




Having multiple data sources on the same physical table with various "where" conditions in the content as a date dimension and specific column

This is a nice option for performance and demands no ETL.
For each range I duplicate the date data source and add a column that is specific for this data source. Selecting that column or filtering on it, enables the filter.

The example:

In the Business Model, in the date dimension table I duplicate the data source (it is still the same physical date_d table):


Rename the duplicate data surce to "last 3 days" and use a where condition, very similar to the one in the first example of "Reusing a filter" in Content Tab:

 "ORCL".""."SH"."DATE_D"."DATEVALUE"  BETWEEN TIMESTAMPADD(SQL_TSI_DAY, -3,CURRENT_DATE) AND TIMESTAMPADD(SQL_TSI_DAY, -1,CURRENT_DATE)

Now I add a new Logical data column to the Date, rename it

 
Set the column source to be from physical mapping from the "last 3 days" data source. It is critical that no other data source will be mapped to this column.


I enter the data source and Map this column:

We don't need a Physical Table, I just write 'Last 3 Days' (or whatever I want) in the Expression. We do need to mark "Show unmapped columns".

The order of data sources is important. We want the default date to be without constraints and place that data source first (by default OBIEE starts with the first data source and goes top down). If you want extra caution, you can work with data source priority group as well.  
Oracle help: "To assign priority group numbers, rank your logical table sources in numeric order, with 0 being the highest-priority source. You can assign the same number to multiple sources. For example, you can have two logical table sources in priority group 0, two logical table sources in priority group 1, and so on. Often, only two priority groups are necessary (0 and 1)."

Here in a similar way I add "Last Month" data source add it's condition in content tab and add the "Last Month column" for that data source:


I map the column to physical source in "Last Month" data source only. In a similar way you can see "Last 3 days" has no mapping in the "Last Month" data source.

After I complete the set this is what I have:
 
When working with Analysis, not using any of the range column returns the usual data:

In our case all the dates:

Adding the "Last 3 Days" column, activates the filtering. Same would happen if it's added hidden or used as filter.

Attempt of reverse option wouldn't work.
In this case when I attempt to add the filter "Last 3 Days is not equal / is not in Last 3 Days" it returns empty set as expected (once I use the "last 3 days" data set, the "Last 3 Days" column value is always 'Last 3 Days').


Summary:

  • Ease of use for User? YES.
  • Requires RPD change for each additional date range?  YES, but it's rather easy.
  • Requires daily ETL? NO.
  • Supports easy reverse selection and combinations (such as: last month and not last 3 days)? NO.
  • Works with VA? YES.
  • Has performance impact? NO.



Final Summary:

The easiest option for IT is the first, "Reusing a filter". It's also the worst for the User.
The best for the user seems to be the "Adding date range columns to date table" option, it's also the worst for IT.
 You can consider the other options as well.






Monday, October 31, 2016

Data Visualization Desktop (DVD) 12.2.2 - New and Custom (SDK) visualization

I want to talk about the new visualization options in DVD 12.2.2.0 and about adding custom visualizations to DVD. The one thing I want to remind myself is the location:  C:\Users\YOUR_USER_NAME\AppData\Local\DVDesktop where to place the plugins. I will get back to this later.


New visualization types

You can see a good YouTube video here about the new visualization types.

I marked the new visualizations here:


Lets see them.

List and Timeline.
The list lets you get a list of values. In my case I selected Year and Month. and changed the layout in properties to horizontal. After turning on the "Use as filter" option I described here, any click on list values filter the other visualizations, in this case the timeline.
The timeline shows event on a timeline. So only dates and date dimensions can be there at the Category (Time).

For timeline we can control the time axis using properties, but I guess most of us will just use the zoom in / out buttons and drag the line left and right.

Tree and Sankey:
Here is a Tree (left) and Sankey (right) visualization of "Product Category" and "Product Sub Category" with Color by "Profit" and Size by "Sales". Both Color and Size must be measures.



Circular Network, Network and Chord diagram

Here is the relations between "Product Category" and "Customer Segment" based on Color by "Profit" and Size by "Sales".Both Color and Size must be measures. Using Circular Network (left) and Chord diagram (right).

Here is Network visualization of the same:

Parallel Coordinates

Here is Parallel Coordinates of relations between "Product Category" and "Customer Segment" and "Quantity Ordered" with line color by "Profit":




Custom Visualizations


In the new DVD there is an option of custom visualizations. We can create new visualizations using the SDK. Serious people can see the YouTube video here, read the documentation here or see tutorial here. I'm sure I will do it when I grow up (or have to), but at the moment I wanted to use custom visualization someone else prepared.

Like the ones marked here:




You can see a video about available plugins here.
Plugins can be found at the public store.
(Or actually, if you want the full path: 
If you look at the DVD community space here, there is a link from there to samples here. The actual link is at the middle of the screen and points to the public store, here.)
At the moment you can find the following plugin visualizations there:
Now we come to the point, that forced me to write this blog. WHERE to put the zip files after I download them? It took me several annoying minutes to find.
All we have to do is place the zip files, as they are (no need to unzip) at the following location on your PC: C:\Users\YOUR_USER_NAME\AppData\Local\DVDesktop.
Since "YOUR_USER_NAME" is probably not your user name on the computer, you should replace it with the actual one.
Once the zip files are there, restart DVD and the new visualizations are ready to use.


Here are few examples:

Calendar Heatmap (naturally has to do with calendars and dates): Here profit by Order date:


Boxplot, that is good for data scientist to see the statistical distribution of the data:


Circle Pack, in this case Profit with Circles by Product Sub Categories, colored by Customer Segment.


Candle Sticks, Oracle stock behavior, 2008-2012. The bottom graph is a scrolling bar for the 2 upper.  


Sunday, October 30, 2016

Data Visualization Desktop (DVD) 12.2.2 - Master Detail (Use as Filter)

In the blog Oracle Underground BI & Dataviz there is a nice post about Oracle DV - New filtering options in Oracle DVD 12.2.2. I want to talk a little more about the master detail option we can see there.

Lets create a project with the default "Sample order lines".
For example, I'll take "Product Category", "Order Year" and "Profit" and use default visualization (mark the columns with Ctrl and drag them to the right):
the result is:

 Next I'll add 2 more default visualizations of "Product sub category", one with "# of Orders" and the other with "Profit". Then right click and sort each of them:
 The result is:
Did you notice the new filter option of DVD 12.2.2 for each visualization? 
I can drag columns here, but I don't (look Ma, no hands).
What I'm doing instead is going to the top visualization, the one I want to be Master here, and in the menu:
 mark the "Use as Filter" option:

That is all I have to do. Now the top visualization attributes are automatically added to the visualization filter in the 2 other visualizations (after I click something).
We can see the change in the background of the Master visualization and the label:
Each click does the master detail reaction:
Before:

After one click:
 
After marking 2 (I prefer Ctrl Click):
Very nice and easy to use.

You want to stop Master - Detail effect without unchecking the "Use as Filter" option? Click on empty space of the master visualization.



Now if you are a malicious type and delete the filters in one or both of the bottom visualizations, it wouldn't help. They will return after each click in the master visualization. Only turning off the "Use as Filter" option will make them disappear.

What is possible is doing some temporary checking. For example:

I clicked on (category Technology & year 2014)

In the # of orders visualization I can remove the "Product Category" from the filter

The change will only influence the # of orders visualization


Another click on master will revive the filters - I clicked on (category Technology & year 2015)