Products | Versions |
---|---|
Spotfire Statistica | 12.7 and higher |
The Apache Hive provides a data warehouse layer on top of Hadoop Distributed File System (HDFS). It is designed to enforce and project structure onto the data and provides SQL-like language for querying. Documentation and developers community are available through the official website here
This document is based on Cloudera distribution of HDFS and its ecosystem (v4.3). It is expected, however, that the general workflow is applicable to other distributions.
Working with Hive:
Creating Connection to Hive from Statistica
Configuration of the Hive connection includes 1.configure a system DSN connection 2. create a new database connection. Below are more details:
Step 1. Open ODBC Data Source Administrator (run a command odbcad32.exe). Switch to System DSN tab and click on “Add…” button.
Step 2. Select a appropriate driver should be available for selection.
In this example we select “Cloudera ODBC Driver for Apache Hive” and click “Finish”.
Step 3. Define parameters of the connection.
Data Source – is the name of the ODBC connection as it is stored in the system.
Description – is information field.
Host – is the hostname or IP address of the Hive server
Port – specific port used by Hive. Typically, it is 10000, which is default setting.
Database – default database selection.
Hive Server Type – “Hive Server 2” should be selected for the ODBC connections.
Provide necessary Authentication information following your Hive configuration.
Detailed instructions are available e.g. here
Once the settings are defined, click the “Test” button to check connection.
Once you click OK on the “Cloudera ODBC Driver for Apache Hive DSN setup” a new connection becomes available under System Data Sources.
Step 4. Open Statistica and select “Create Query”
On the next window, select “New…” to create a new connection. On the “Data Link Properties” select Microsoft OLE DB Provider for ODBC Drivers, click next and select a data source from a drop down menu.
Click Test Connection to check that everything is properly configured.
On the next screen specify the name of this connection.
Now everything is ready to creating HiveQL queries and extracting data to Statistica.
Click “Return data to Statistica” button (green arrow) to get data from Hive imported into Statistica.
Note: to create a database connection in Statistica enterprise, one needs to complete similar steps (1-4).