Why Pivot operator fails with message "The pivot column name has more than 1000 distinct values"?

book

Article ID: KB0073004

calendar_today

Updated On:

Products	Versions
Spotfire Data Science Team Studio	6.5.0, 6.6.0

Description

This article explains why the Pivot operator fails with the message "The pivot column name has more than 1000 distinct values" and how to resolve it.

Environment

Linux.

Resolution

By default, Spark allows pivot column containing less than 1000 distinct values. If the pivot column value exceeds this limit user will get the following error message:

Execution of 'Pivot' failed. Error details: The pivot column name has more than 1000 distinct values, this could indicate an error. If this was intended, set spark.sql.pivotMaxValues to at least the number of distinct values of the pivot column

The user can extend this default pivot column value limit by adding the following additional parameter with desired value:
spark.sql.pivotMaxValues= 3000

To add this additional parameter follow the steps given below:
1. Right-click on the Pivot operator.
2. Click on Edit Operator Properties.
3. Set Advanced Setting Automatic Optimization to "No".
4. Click on the Edit Setting button.
5. Click on Add Parameter.
6. Now add the parameter "spark.sql.pivotMaxValues" and set it to the desired value

User-added image

7. Click on Ok and step run the Pivot operator.

Issue/Introduction

This article explains why the Pivot operator fails with the message "The pivot column name has more than 1000 distinct values" and how to resolve it.

Feedback

thumb_up Yes

thumb_down No

Welcome to "KB Articles"