Databases - Connecting Spotfire Data Science to Impala via JDBC

Products	Versions
Spotfire Data Science	6.x

Description

Connecting Spotfire Data Science to Impala via JDBC

Resolution

Follow the steps below to connect Spotfire Data Science to Impala. This example uses Apache Impala 2.2 with JDBC API version 4.1.

1. Download the Impala JDBC driver (made up of several JAR files) from Cloudera, selecting the version that matches your environment: http://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-5.html. Copy the JAR files to both the $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc_driver/Public and $CHORUS_HOME/shared/libraries directories, then change the ownership of the copies to the user who runs Spotfire Data Science (usually 'chorus').

2. Create a new Impala directory, $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc/impala, and copy the driver.properties file from the $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc/default directory into it.
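
Steps 1 and 2 can also be scripted. The following is a minimal Ruby sketch, not part of the product: the method name install_impala_driver and its arguments are hypothetical, and it assumes the driver JAR files have already been unpacked into a local directory. Run it as a user who is allowed to write to $CHORUS_HOME and, if the owner argument is used, to change file ownership.

```ruby
require "fileutils"

# Hypothetical helper covering steps 1 and 2.
# chorus_home:    the value of $CHORUS_HOME
# jar_source_dir: directory holding the unpacked Impala driver JAR files (assumption)
# owner:          optional user name to own the copied JARs, e.g. "chorus"
def install_impala_driver(chorus_home, jar_source_dir, owner: nil)
  repo = File.join(chorus_home, "shared", "ALPINE_DATA_REPOSITORY")
  jars = Dir.glob(File.join(jar_source_dir, "*.jar"))
  raise ArgumentError, "no JAR files found in #{jar_source_dir}" if jars.empty?

  # Step 1: copy every JAR into both driver directories.
  targets = [
    File.join(repo, "jdbc_driver", "Public"),
    File.join(chorus_home, "shared", "libraries")
  ]
  targets.each do |dir|
    FileUtils.mkdir_p(dir) # does nothing if the directory already exists
    FileUtils.cp(jars, dir)
    next unless owner

    # Changing ownership usually requires root privileges.
    copies = jars.map { |jar| File.join(dir, File.basename(jar)) }
    FileUtils.chown(owner, nil, copies)
  end

  # Step 2: create jdbc/impala and copy the default driver.properties into it.
  impala_dir = File.join(repo, "jdbc", "impala")
  FileUtils.mkdir_p(impala_dir)
  FileUtils.cp(File.join(repo, "jdbc", "default", "driver.properties"), impala_dir)
  File.join(impala_dir, "driver.properties")
end
```

For example, install_impala_driver(ENV.fetch("CHORUS_HOME"), "/tmp/impala_jdbc", owner: "chorus") would copy the JARs from the placeholder directory /tmp/impala_jdbc and return the path of the new driver.properties file, ready for editing in step 3.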

3. Edit the $CHORUS_HOME/shared/ALPINE_DATA_REPOSITORY/jdbc/impala/driver.properties file so that it sets driverClass to the Impala driver class and includes the identifierQuotation entry:

# Specify the JDBC class driver for the desired database type.
# Examples:
# Oracle = oracle.jdbc.driver.OracleDriver
# Greenplum = org.postgresql.Driver
# DB2 = com.ibm.db2.jcc.DB2Driver
# Netezza = org.netezza.Driver
# PostgreSQL = org.postgresql.Driver
# SQLServer = com.microsoft.sqlserver.jdbc.SQLServerDriver
# MySQL = com.mysql.jdbc.Driver
# Teradata = com.teradata.jdbc.TeraDriver
# Vertica = com.vertica.jdbc.Driver
# Sybase = com.sybase.jdbc2.jdbc.SybDriver
# Informix = com.informix.jdbc.IfxDriver
# SAPDB = com.sap.dbtech.jdbc.DriverSapDB
# InterBase = interbase.interclient.Driver
# HSqlDB = org.hsqldb.jdbcDriver
# MariaDB = org.mariadb.jdbc.Driver
# MySQL = com.mysql.jdbc.Driver
# Make sure to use your specific JDBC API Version
driverClass = com.cloudera.impala.jdbc41.Driver
# Add this so that double quotes can be used
identifierQuotation=
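
To confirm the edit took effect, the file can be checked with a short script. This is a sketch under stated assumptions: the method names read_driver_properties and check_impala_properties are hypothetical, and the parser only handles simple "key = value" lines and '#' comments, not the full Java properties syntax.

```ruby
# Hypothetical helper: reads simple "key = value" lines from a properties file,
# skipping blank lines and '#' comments.
def read_driver_properties(path)
  File.readlines(path, chomp: true).each_with_object({}) do |line, props|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    key, value = line.split("=", 2)
    props[key.strip] = value.to_s.strip
  end
end

# Returns a list of problems; an empty list means the file matches step 3.
# The default expected class is the JDBC 4.1 driver used in this article.
def check_impala_properties(path, expected_class: "com.cloudera.impala.jdbc41.Driver")
  props = read_driver_properties(path)
  problems = []
  unless props["driverClass"] == expected_class
    problems << "driverClass is #{props['driverClass'].inspect}, expected #{expected_class}"
  end
  problems << "identifierQuotation is not set" unless props.key?("identifierQuotation")
  problems
end
```

If you use a different JDBC API version of the driver, pass the matching class name through expected_class.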

4. Edit the additional_jdbc_drivers.rb file (its path is similar to /usr/local/chorus/releases/5.9.1.0.3973-5d95f7c97/components/core/app/mixins/sequel/extensions/additional_jdbc_drivers.rb) and add an entry for the Impala driver class so that the content looks similar to this:

module Sequel
  module AdditionalJdbcDrivers
    MAP = {
        mariadb: ->(db) { org.mariadb.jdbc.Driver },
        teradata: ->(db) { com.teradata.jdbc.TeraDriver },
        vertica: ->(db) { com.vertica.jdbc.Driver },
        hive2: ->(db) { org.apache.hive.jdbc.HiveDriver },
        hive: ->(db) { org.apache.hadoop.hive.jdbc.HiveDriver },
        impala: ->(db) { com.cloudera.impala.jdbc41.Driver }
    }

    MAP.each do |key, driver|
      ::Sequel::JDBC::DATABASE_SETUP[key] = driver
    end
  end
end

Note: The change to the additional_jdbc_drivers.rb file must be reapplied after each upgrade of Spotfire Data Science.
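
Because the change has to be reapplied after upgrades, a quick check can show whether the entry is still present. The sketch below is illustrative only: the method name impala_driver_registered? is hypothetical, and it scans the file as text for an entry in the same form as the one shown above rather than loading the file.

```ruby
# Hypothetical helper: returns true if the file at `path` contains an
# "impala: ->(db) { <driver class> }" entry. It only reads the text.
def impala_driver_registered?(path, driver_class: "com.cloudera.impala.jdbc41.Driver")
  pattern = /^\s*impala:\s*->\(db\)\s*\{\s*#{Regexp.escape(driver_class)}\s*\}/
  File.foreach(path).any? { |line| line.match?(pattern) }
end
```

Pass the full path of additional_jdbc_drivers.rb in the current release directory. A result of false after an upgrade suggests that step 4 needs to be repeated.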

5. Restart Spotfire Data Science, then configure the data connection with a URL similar to the following (you can copy your existing Impala connection URL):

jdbc:impala://myServer:21050
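
If the URL is assembled by a script, for example when setting up several environments, a small helper keeps the format consistent. This is a minimal sketch: the method name impala_jdbc_url is hypothetical, and the default port 21050 is simply taken from the example URL above, so adjust it to match your cluster.

```ruby
# Hypothetical helper: builds a URL in the jdbc:impala://host:port form.
def impala_jdbc_url(host, port = 21050)
  host = host.to_s.strip
  raise ArgumentError, "host must not be empty" if host.empty?

  port = Integer(port) # accepts 21050 or "21050"; raises on non-numeric input
  raise ArgumentError, "port out of range: #{port}" unless (1..65_535).cover?(port)

  "jdbc:impala://#{host}:#{port}"
end
```

Calling impala_jdbc_url("myServer") produces the example URL shown above.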

6. Make sure that the MEM_LIMIT setting in your Impala configuration allows adequate memory. For more information, see: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_config_options.html#config_options
