How to Calculate the Top Contributing Groups to the Top N% of Aggregated Results

How to Calculate the Top Contributing Groups to the Top N% of Aggregated Results

book

Article ID: KB0080802

calendar_today

Updated On:

Products Versions
Spotfire Analyst All Versions

Description

It is sometimes needed to calculate the groups who contribute most to a total, and specifically those groups whose aggregated results make up the top % of the total. Assuming a data set like the the BASEBALL demo data set, for example, you can calculate the % of total with an expression like:
Sum([HOME_RUNS]) OVER ([TEAM]) / Sum([HOME_RUNS])

Which can give you the Team's % of the Total:
TEAM    Team Total    Team's % of Total
N.Y.    322.00    9.16 %
Chi.    251.00    7.14 %
Min.    190.00    5.40 %
Det.    183.00    5.20 %
Tex.    177.00    5.03 %
Bal.    164.00    4.66 %
Cle.    153.00    4.35 %
Sea.    153.00    4.35 %
Tor.    152.00    4.32 %
Oak.    148.00    4.21 %
...

This article explains a method to further calculate which team's make up the Top N% of that total. So top 30% of HOME_RUNS came from N.Y., Chi., Min., Det., Tex (totaling 31.94% of all home runs).

Issue/Introduction

How to calculate the top contributing groups to the top N% of aggregated results

Resolution

In order to calculate which team's make up the Top N% of that total, you will need to:
  • Sort that team's % of total, descending
  • Create a cumulative sum (over that descending order) of all previous team's total %
  • Identify those teams where the cumulative sum is under N%

This can be accomplished with the following calculated columns:
1. Select File > Add Data Tables...
	Type: Information Link
	Path: /Information Models/dbo/BASEBALL
	Id: 49ccc04a-76fe-46f1-9982-5f65c51d5135
	Last reload: 5/31/2017 4:10 PM
2. Insert > Calculated Column...
	Column name: % of Total
	Expression: Sum([HOME_RUNS]) OVER ([TEAM]) / Sum([HOME_RUNS])
3. Insert > Calculated Column...
	Column name: DescendingIndex
	Expression: DenseRank([% of Total],"desc")
4. Insert > Calculated Column...
	Column name: Cumulative Sum
	Expression: Sum(If(Rank(baserowid(),"asc",[TEAM])=1,[% of Total])) OVER (AllPrevious([DescendingIndex]))
5. Insert > Calculated Column...
	Column name: Contributes to Top N %
	Expression: case  
when [Cumulative Sum]<=0.3 then "Top N % Contributor" 
when ([Cumulative Sum]>0.3) and (Min([Cumulative Sum]) OVER (Previous([DescendingIndex]))<0.3) then "Top N % Contributor"
else "Nope"
end



Resulting in:
TEAM    Team Total    Team's % of Total    Cumulative % Sum    Teams Contributing to the Top 30%
N.Y.    16874.53    9.16 %    9.16 %    Top N % Contributor
Chi.    11841.67    7.14 %    16.30 %    Top N % Contributor
Min.    5272.00    5.40 %    21.70 %    Top N % Contributor
Det.    5473.81    5.20 %    26.91 %    Top N % Contributor
Tex.    3423.50    5.03 %    31.94 %    Top N % Contributor
Bal.    6935.00    4.66 %    36.60 %    Nope
Cle.    5805.00    4.35 %    45.31 %    Nope
Sea.    3107.50    4.35 %    45.31 %    Nope
Tor.    6422.50    4.32 %    49.63 %    Nope
Oak.    4315.00    4.21 %    53.84 %    Nope

See example attached 'Top Contributing Groups.dxp' file.

Additional Information

KB: Identifying duplicate rows using custom expressions.

Attachments

How to Calculate the Top Contributing Groups to the Top N% of Aggregated Results get_app