| Products | Versions |
|---|---|
| Spotfire Enterprise Runtime for R | All |
data.frame
When using data.frame objects:
matrix
When using matrix objects:
matrix was the fastest method overall for computing mutual information, for both TERR and R.
data.table
When using data.table objects:
data.table and OpenMP Interaction in TERR
The data.table package includes its own parallel processing capabilities, built on OpenMP. Because TERR is also configured to use OpenMP, a conflict or interaction between TERR's OpenMP implementation and data.table's internal OpenMP usage is suspected of contributing to TERR's slower performance with data.table objects.
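One way to probe this suspicion is to restrict data.table to a single thread and repeat the timing. The following is a minimal sketch, not part of the original analysis: it assumes the data.table package is installed and that `setDTthreads()` behaves in TERR as it does in open-source R. The table `dt` and the grouped mean are hypothetical stand-ins for the real workload.

```r
# Assumption: data.table is installed; the block is skipped otherwise.
if (requireNamespace("data.table", quietly = TRUE)) {
  old_threads <- data.table::getDTthreads()  # remember the current thread setting
  data.table::setDTthreads(1)                # limit data.table to one OpenMP thread

  # Hypothetical workload standing in for the real computation
  dt <- data.table::data.table(g = rep(1:4, 250), x = rnorm(1000))
  elapsed <- system.time(dt[, mean(x), by = g])["elapsed"]

  data.table::setDTthreads(old_threads)      # restore the previous setting
}
```

If the single-threaded timing in TERR moves closer to the R timing, that would be consistent with an OpenMP interaction, although it would not prove it.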
parallel_TERR_vs_R.html (Contains the detailed analysis of the results)
time_all.rds (R object containing the raw timing data)
The findings from this comparison suggest the following strategies for achieving optimal performance in parallel computations:
When working with tabular data where data.frame semantics are sufficient, TERR may offer a speed advantage.
For matrix-based computations, R generally performs better, and using a matrix data structure is the most efficient approach overall in both TERR and R.
When using data.table objects, R currently demonstrates better performance. Consider this if data.table specific features and performance are critical.
Regardless of the environment (TERR or R) or data structure, prefer a single parallel loop with a standard nested loop over nested parallel loops for the type of computations tested, as it yields better performance.
To reduce computation time, leverage multiple CPUs for parallel tasks whenever possible.
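The last two recommendations can be sketched as follows, using the `parallel` package: one parallel loop over columns, with an ordinary `for` loop nested inside each worker, applied to a matrix. This is an illustrative sketch rather than the code used in the comparison. The helper `mutual_info()`, the random matrix `m`, and the cap of two workers are all assumptions made for the example.

```r
library(parallel)

# Mutual information (in nats) between two discrete vectors.
mutual_info <- function(x, y) {
  pxy <- table(x, y) / length(x)  # joint distribution
  px <- rowSums(pxy)              # marginal of x
  py <- colSums(pxy)              # marginal of y
  nz <- pxy > 0                   # skip empty cells to avoid log(0)
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))
}

set.seed(1)
# Hypothetical data: 200 observations of 6 discrete variables
m <- matrix(sample(1:3, 200 * 6, replace = TRUE), ncol = 6)

# Use the available CPUs, capped at 2 for this small example
n_workers <- max(1, min(2, detectCores(), na.rm = TRUE))
cl <- makeCluster(n_workers)
clusterExport(cl, c("m", "mutual_info"))

# Single parallel loop over column i; standard sequential loop over column j
rows <- parLapply(cl, seq_len(ncol(m)), function(i) {
  out <- numeric(ncol(m))
  for (j in seq_len(ncol(m))) {
    out[j] <- mutual_info(m[, i], m[, j])
  }
  out
})
stopCluster(cl)

mi <- do.call(rbind, rows)  # 6 x 6 matrix of pairwise mutual information
```

Parallelizing only the outer loop keeps each worker busy with a full row of results, which avoids the scheduling and communication overhead of launching a second level of parallel tasks.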
This article summarizes a performance comparison between Spotfire Enterprise Runtime for R (TERR) and open-source R, focusing on parallel computations with varying numbers of CPUs and different data object types. The analysis involved timing computations for mutual information using both nested parallel loops and single parallel loops with a standard nested loop.