How can the user optimize query execution time, having scalar sub-queries while using Cloudera Impala within TIBCO Data Virtualization?

How can the user optimize query execution time, having scalar sub-queries while using Cloudera Impala within TIBCO Data Virtualization?

book

Article ID: KB0070877

calendar_today

Updated On:

Products Versions
TIBCO Data Virtualization -

Description

Whenever there is a scalar sub-query implemented in our query, the total query execution takes significantly longer than if the query is directly executed on the Cloudera Impala/Hive database via any 3rd party client tool.
An example of such a subquery is as below:
SELECT * FROM /shared/scalar/test/"default"/tab1
WHERE dt = (SELECT dt2 FROM /shared/scalar/test/"default"/tab2
WHERE dt_key2= '2014-08-20')
In the above example, the 2nd SELECT statement forms a scalar sub-query.

If we check the query execution plan, it mentions a No Push Reason as "No Push Reason: Cannot push sub query because data source Impala does not support scalar sub queries". 
And since the query is not pushed onto the database and all the execution takes place inside TIBCO Data Virtualization (TDV) itself, TDV fetches the data twice, once per SELECT statement. This results in high execution time.

Issue/Introduction

This article will explain how the user can directly push the scalar sub-queries onto the Cloudera Impala database.

Environment

All Supported Environments.

Resolution

  • To resolve this and push the query to the database, we can add and make changes to the 'cloudera_impala_<version> _values.xml' which is located at "<TDV_Server_Install_Directory>\conf\adapters\system".
           The below lines need to be added to the file:
<common:attribute xmlns:common="http://www.compositesw.com/services/system/util/common">
    <common:name>/system/ds.scalar_subqueries</common:name>
    <common:type>BOOLEAN</common:type>
    <common:value>true</common:value>
    <common:configID>jdbcds.scalar_subqueries</common:configID>
</common:attribute>
  • After adding these lines and saving the file, a TDV Server restart needs to be performed to apply the changes.
  • Now the query is being pushed onto the Cloudera Impala database.