Operator HDFS Processing Tool Support

Article ID: KB0082646

Products: Spotfire Data Science
Versions: 6.x

Resolution

The following tables show which data processing tool (Pig, MapReduce, Spark, or Sqoop) each HDFS-supported Operator uses:

Note: Operators marked with "n/a" do not use any of the HDFS processing tools. 

Data Load Operators (for HDFS):

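| Operator         | Pig | MapReduce | Spark | Sqoop           |
|------------------|-----|-----------|-------|-----------------|
| Copy to Database |     |           |       | (parallel mode) |
| Copy to Hadoop   |     |           |       | X               |
| Hadoop File      |     | X         |       |                 |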


Explore Operators (for HDFS):

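| Operator            | Pig | MapReduce | Spark | Sqoop |
|---------------------|-----|-----------|-------|-------|
| Bar Chart           | X   |           |       |       |
| Box Plot            | X   |           |       |       |
| Correlation         |     | X         |       |       |
| Frequency           | X   |           |       |       |
| Histogram           | X   |           |       |       |
| Scatter Plot Matrix | X   |           |       |       |
| Summary Statistics  | X   |           | X     |       |
| T-Tests             |     |           | X     |       |
| Variable Selection  |     | X         |       |       |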


Transform Operators (for HDFS):  

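| Operator               | Pig | MapReduce | Spark | Sqoop |
|------------------------|-----|-----------|-------|-------|
| Aggregation            | X   |           |       |       |
| Collapse               |     | X         |       |       |
| Column Filter          | X   |           |       |       |
| Distinct               |     | X         |       |       |
| Join                   | X   | X         |       |       |
| Normalization          | X   |           |       |       |
| Null Value Replacement | X   |           |       |       |
| Pivot                  |     | X         |       |       |
| Row Filter             | X   |           |       |       |
| Set Operations         |     | X         |       |       |
| Variable               | X   |           |       |       |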

Sample Operators (for HDFS):  

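| Operator        | Pig | MapReduce | Spark | Sqoop |
|-----------------|-----|-----------|-------|-------|
| Random Sampling |     | X         |       |       |
| Sample Selector | n/a | n/a       | n/a   | n/a   |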

Model Operators (for HDFS):   

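| Operator                         | Pig | MapReduce | Spark | Sqoop |
|----------------------------------|-----|-----------|-------|-------|
| Alpine Forest                    |     | X         | X     |       |
| Alpine Forest Regression         |     | X         | X     |       |
| Decision Tree                    |     | X         |       |       |
| Gradient Boosting Classification |     |           | X     |       |
| Gradient Boosting Regression     |     |           | X     |       |
| K-Means                          |     | X         | X     |       |
| Linear Regression                |     | X         | X     |       |
| Logistic Regression              |     | X         | X     |       |
| Naive Bayes                      |     | X         |       |       |
| PCA                              |     | X         |       |       |
| SVM Classification               |     | X         |       |       |
| Time Series                      |     | X         |       |       |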


Predict Operators (for HDFS):

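| Operator              | Pig | MapReduce | Spark | Sqoop |
|-----------------------|-----|-----------|-------|-------|
| Classifier            |     | X         |       |       |
| Predictor             |     | X         |       |       |
| PCA Apply             |     | X         |       |       |
| Time Series Predictor | n/a | n/a       | n/a   | n/a   |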


Model Validate Operators (for HDFS): 

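| Operator                | Pig | MapReduce | Spark | Sqoop |
|-------------------------|-----|-----------|-------|-------|
| Alpine Forest Evaluator |     | X         |       |       |
| Confusion Matrix        |     | X         |       |       |
| Goodness of Fit         |     | X         |       |       |
| Lift                    |     | X         |       |       |
| Regression Evaluator    |     |           | X     |       |
| ROC                     |     | X         |       |       |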

Tool Operators (for HDFS): 

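| Operator        | Pig | MapReduce | Spark | Sqoop |
|-----------------|-----|-----------|-------|-------|
| Export Operator | n/a | n/a       | n/a   | n/a   |
| Flow Control    | X   |           |       |       |
| Note            | n/a | n/a       | n/a   | n/a   |
| Pig Execute     | X   |           |       |       |
| R Execute       | n/a | n/a       | n/a   | n/a   |
| Sub-Flow*       |     |           |       |       |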

*Note: For the Flow Control and Sub-Flow Operators, the HDFS data processing tool used depends on the customer-specific implementation.
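
One practical use of the tables above is checking which processing tools a workflow may touch on the cluster. The following is a minimal sketch only: it hand-transcribes a small subset of the matrix into a Python dictionary, and the names `OPERATOR_TOOLS` and `tools_for_workflow` are hypothetical and not part of any Spotfire Data Science API. Where an Operator lists more than one tool, the sketch simply reports all of them.

```python
# Hypothetical lookup built by hand from a subset of the tables above.
# These names are illustrative; they are not a Spotfire Data Science API.
OPERATOR_TOOLS = {
    "Copy to Hadoop": {"Sqoop"},
    "Bar Chart": {"Pig"},
    "Summary Statistics": {"Pig", "Spark"},
    "T-Tests": {"Spark"},
    "Join": {"Pig", "MapReduce"},
    "K-Means": {"MapReduce", "Spark"},
    "Sample Selector": set(),  # marked n/a: uses no HDFS processing tool
}


def tools_for_workflow(operators):
    """Return the union of tools listed for the given Operators.

    Raises KeyError for an Operator that is not in the lookup, so that
    a missing entry is noticed rather than silently ignored.
    """
    tools = set()
    for name in operators:
        tools |= OPERATOR_TOOLS[name]
    return tools


print(sorted(tools_for_workflow(["Bar Chart", "Join", "T-Tests"])))
# → ['MapReduce', 'Pig', 'Spark']
```

Sets are used so that an Operator marked "n/a" contributes nothing and tools shared by several Operators are counted once.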