Products | Versions |
---|---|
Spotfire Data Science | 6.x |
Java heap space error in the Hadoop job logs
If an operator fails with an error such as:
Execution of 'Logistic Regression' failed. Error details: Map reduce job [AlpineLogi_Distinct_Job] has failed. Please refer to the logs for more information.
and the Hadoop job logs show a Java heap space error:
...
2015-08-17 13:05:47,710 FATAL [IPC Server handler 0 on 56369] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1439829714853_0018_m_000000_0 - exited : Java heap space
2015-08-17 13:05:47,711 INFO [IPC Server handler 0 on 56369] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1439829714853_0018_m_000000_0: Error: Java heap space
2015-08-17 13:05:47,715 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1439829714853_0018_m_000000_0: Error: Java heap space
2015-08-17 13:05:47,730 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1439829714853_0018_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
...
then add the following parameters to the data source connection and tune their values for your cluster:
mapreduce.map.memory.mb=3192
mapreduce.reduce.memory.mb=3192
mapreduce.map.java.opts=-Xmx2872m
mapreduce.reduce.java.opts=-Xmx2872m
mapreduce.task.io.sort.mb=512
mapred.child.java.opts=-Xmx1024m
Note: This example uses the parameter names from HDP 2.1. Check your Hadoop distribution's documentation to find out which parameters control job memory allocation, and use the current parameter names rather than the deprecated ones.
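Make sure that the heap sizes (-Xmx) specified in "mapreduce.map.java.opts" and "mapreduce.reduce.java.opts" are about 10% smaller than the values configured for "mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb", as in the example above (2872 MB of heap for a 3192 MB container). The container limit applies to the whole JVM process, so the heap needs to leave room for non-heap memory.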
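The relationship between the container sizes and the heap sizes can be worked out with a few lines of Python. This is only a minimal sketch for deriving candidate values: the function names `heap_opts` and `memory_settings` are hypothetical helpers, not part of Spotfire Data Science or Hadoop, and the 90% heap share is an assumption taken from the rule of thumb above.

```python
def heap_opts(container_mb: int, heap_percent: int = 90) -> str:
    """Return a -Xmx option sized as a percentage of the container memory.

    heap_percent=90 mirrors the "10% less" rule of thumb; it is an
    assumption, not a value mandated by Hadoop.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_percent < 100:
        raise ValueError("heap_percent must be between 1 and 99")
    # Integer arithmetic rounds down, so the heap never exceeds the target share.
    heap_mb = container_mb * heap_percent // 100
    return f"-Xmx{heap_mb}m"


def memory_settings(map_mb: int, reduce_mb: int) -> dict:
    """Build the four memory-related properties for map and reduce tasks."""
    return {
        "mapreduce.map.memory.mb": str(map_mb),
        "mapreduce.reduce.memory.mb": str(reduce_mb),
        "mapreduce.map.java.opts": heap_opts(map_mb),
        "mapreduce.reduce.java.opts": heap_opts(reduce_mb),
    }


if __name__ == "__main__":
    # Print key=value lines in the same form as the parameter list.
    for key, value in memory_settings(3192, 3192).items():
        print(f"{key}={value}")
```

With 3192 MB containers, this yields -Xmx2872m for both map and reduce tasks, matching the example values. Treat the output as a starting point and adjust it to the memory actually available on your cluster nodes.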