How to identify duplicate rows using custom expressions in TIBCO Spotfire

How to identify duplicate rows using custom expressions in TIBCO Spotfire

book

Article ID: KB0071047

calendar_today

Updated On:

Products Versions
Spotfire Analyst All

Description

Very often it is required to exclude duplicate rows from a calculation in order to get an accurate result. You can identify duplicates within a specified group using a custom expression that uses the Rank() function, to which you also pass the columns you want to use for grouping. The 'duplicate' values can be determined based on any number of specified columns (i.e., if 3 columns are unique, consider this a duplicate). You can use this method to include or exclude duplicates, depending upon your requirements.

Issue/Introduction

Identifying duplicate rows using custom expressions.

Resolution

Use the following custom expression to identify duplicates within your data set:
If(Rank(Baserowid(),"asc",[myGroup1])=1,"First row of group","Duplicate")

If you want to group by multiple columns, then include those columns in the call to the Rank() function. For example:
If(Rank(Baserowid(),"asc",[myGroup1],[myGroup2],[myGroup3])=1,"First row of group","Duplicate")

You can also use the If() and Rank() logic shown above directly within calculations to include or exclude duplicates, as required:

Avg(if(Rank(baserowid(),"asc",[myGroup1])=1,[Value],Null))

Also, starting in Spotfire version 10.1, there is a "Filter Rows" transformation which allows you to remove any duplicates from the data table completely and only retain one row per defined group. This can be accomplished via:
  • Data > Transform data > Transformation: Filter Rows > Insert
Using the same type of expression like:
Rank(baserowid(),"asc",[myGroup1])=1


EXAMPLE
As an example, say you have the data set below and you want to perform a calculation using only a single row per group:

 

 myGroup Value
 Yellow 6
 Yellow 6
 Yellow 6
 Blue 3
 Red 9
 Red 9

 


Using just an "Avg([Value])" custom expression, the Avg() function will average all rows resulting in:
(6+6+6+3+9+9)/6 = 6.5

To find only a single row per group, you can use the above mentioned custom expression:
if(Rank(baserowid(),"asc",[myGroup])=1,"First row of group","Duplicate")

To result in the following data:
 myGroup Value Duplicate?
 Yellow 6 First row of group
 Yellow 6 Duplicate
 Yellow 6 Duplicate
 Blue 3 First row of group
 Red 9 First row of group
 Red 9 Duplicate

 

Using this technique you can perform an average that will only look at a single value per grouping.  For example:
 

 myGroup Value
 Yellow 6
 Blue 3
 Red 9

 

And the average you would like to calculate would be:
(6+3+9)/3 = 6

The following custom expression will then average only those rows that are tagged as the first within the group and will not count subsequent rows within that group:
Avg(if(Rank(baserowid(),"asc",[myGroup])=1,[Value],Null))

This will result in the following calculation, as desired:
(6+3+9)/3 = 6

Additional Information

Doc: Details on Custom Expression Doc: Details on Filter Rows Transformation