Products | Versions |
---|---|
Spotfire Data Science | 6.5.0 |
In Team Studio, the Hadoop operators can implement several algorithms, and how they run depends on the conditions and configuration of the cluster. When these operators are used in Team Studio, can they run on differently configured Hadoop clusters?
That is:
1) Can these operators run on a single node as well as on multiple nodes?
2) Can you force an operator to run in a specific way, for example by copying all of the data to one node and computing there, without distributing the computation or using the processing power of the other nodes in the cluster?
Users can write their own operators that run on the server where Team Studio is installed (in-memory PCA is an example), but this is not recommended because the Team Studio application itself consumes a lot of that server's resources.
In Spark, you can force all of the computation onto a single node by using a single partition for your DataFrame (`df.repartition(1)`).
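To make the partition-to-task relationship concrete, here is a plain-Python model rather than Spark code. It assumes, as Spark does, that each partition is processed by exactly one task; the helper names `repartition` and `run_stage` are hypothetical, and task placement is simplified to round-robin over executors (real Spark scheduling also considers data locality and free cores). In PySpark itself, the call is `df.repartition(1)`, and `df.rdd.getNumPartitions()` reports the resulting partition count.

```python
from collections import defaultdict

# Plain-Python model of Spark partitions and tasks (not the Spark API).
# Helper names are hypothetical; task placement is simplified to round-robin.

def repartition(rows, num_partitions):
    """Round-robin rows into num_partitions partitions, roughly like df.repartition(n)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions

def run_stage(partitions, num_executors):
    """Run one task per partition and return the number of rows processed per executor."""
    rows_per_executor = defaultdict(int)
    for task_id, partition in enumerate(partitions):
        executor = task_id % num_executors  # simplified task placement
        rows_per_executor[executor] += len(partition)
    return dict(rows_per_executor)

rows = list(range(1000))
print(run_stage(repartition(rows, 8), num_executors=4))  # {0: 250, 1: 250, 2: 250, 3: 250}
print(run_stage(repartition(rows, 1), num_executors=4))  # {0: 1000}
```

With eight partitions the work spreads over all four executors; with one partition, a single task on a single executor processes every row, so the other nodes sit idle.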
In MapReduce, you can force all of the computation onto a single node by using a single mapper and a single reducer.
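The same idea for MapReduce, again as a plain-Python model rather than the Hadoop API (`split_input`, `map_task`, `shuffle`, `reduce_task`, and `run_job` are hypothetical names). It assumes a word-count job in which each input split becomes one map task and map output is hash-partitioned across the reducers, as Hadoop's default partitioner does. In an actual Hadoop job, the reducer count is set with `job.setNumReduceTasks(1)` (or the `mapreduce.job.reduces` property), while the mapper count follows the number of input splits the InputFormat produces.

```python
from collections import defaultdict

# Plain-Python model of a MapReduce word count (not the Hadoop API).
# All function names here are hypothetical.

def split_input(lines, num_mappers):
    """Cut the input into at most num_mappers splits; each split is one map task."""
    size = max(1, -(-len(lines) // num_mappers))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_task(split):
    """Mapper: emit (word, 1) for every word in the split."""
    return [(word, 1) for line in split for word in line.split()]

def shuffle(pairs, num_reducers):
    """Hash-partition map output so every key lands in exactly one reducer bucket."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash(key) % num_reducers][key].append(value)
    return buckets

def reduce_task(bucket):
    """Reducer: sum the counts collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}

def run_job(lines, num_mappers, num_reducers):
    """Return (map tasks, reduce tasks, word counts) for one job run."""
    splits = split_input(lines, num_mappers)
    map_output = [pair for split in splits for pair in map_task(split)]
    buckets = shuffle(map_output, num_reducers)
    counts = {}
    for bucket in buckets:
        counts.update(reduce_task(bucket))  # a key never appears in two buckets
    return len(splits), len(buckets), dict(sorted(counts.items()))

lines = ["a b a", "b c", "a c c", "d"]
print(run_job(lines, num_mappers=4, num_reducers=2))  # (4, 2, {'a': 3, 'b': 2, 'c': 3, 'd': 1})
print(run_job(lines, num_mappers=1, num_reducers=1))  # (1, 1, {'a': 3, 'b': 2, 'c': 3, 'd': 1})
```

Both configurations produce identical counts; only the number of parallel tasks changes, which is why code written for a cluster can still run on a single node.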
As for the operators that ship with the product: no, we do not artificially restrict anything to run on a single node, but the code is structured so that it will run on a single node if it has to.