Products | Versions |
---|---|
Spotfire Data Science | All supported versions |
The following additional parameters need to be configured when you add a Hadoop data source to TIBCO Spotfire Data Science.
Kerberos Related:
alpine.principal=alpine/chorus.alpinenow.local@ALPINENOW.LOCAL
alpine.keytab=/home/chorus/keytab/alpine.keytab
dfs.datanode.kerberos.principal=hdfs/_HOST@TDS.LOCAL
dfs.namenode.kerberos.principal=hdfs/_HOST@TDS.LOCAL
yarn.resourcemanager.principal=yarn/_HOST@TDS.LOCAL (The Kerberos principal for the resource manager.)
mapreduce.jobhistory.principal=mapred/_HOST@TDS.LOCAL
Note: Hadoop replaces the _HOST placeholder with the fully qualified domain name of the local host at runtime, so any host can authenticate with this principal.
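Before configuring the data source, you can verify that the keytab and principal are valid from the Spotfire Data Science server's command line. This is an optional sanity check using standard Kerberos tools, with the path and principal taken from the values above:
# List the principals stored in the keytab
klist -kt /home/chorus/keytab/alpine.keytab
# Request a ticket with the keytab to confirm the principal authenticates against the KDC
kinit -kt /home/chorus/keytab/alpine.keytab alpine/chorus.alpinenow.local@ALPINENOW.LOCAL
# Display the ticket cache to confirm a ticket was granted
klist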
Protections:
spark.hadoop.hadoop.rpc.protection=privacy
hadoop.security.authentication=kerberos (only when Kerberos is enabled)
When data-in-transit encryption is enabled on the CDH cluster:
hadoop.rpc.protection=privacy (the default value is authentication)
dfs.data.transfer.protection=privacy
YARN Parameters:
You can get the following parameters from yarn-site.xml on the Hadoop server.
yarn.app.mapreduce.am.staging-dir=/tmp
yarn.resourcemanager.admin.address=cdh516dare.tds.local:8033 (the address of the Resource Manager admin interface)
yarn.resourcemanager.resource-tracker.address=cdh516dare.tds.local:8031 (the address of the resource tracker interface)
yarn.resourcemanager.scheduler.address=cdh516dare.tds.local:8030 (the address of the scheduler interface)
yarn.resourcemanager.webapp.address=cdh516dare.tds.local:8088 (the HTTP address of the Resource Manager web application)
yarn.resourcemanager.webapp.https.address=cdh516dare.tds.local:8090 (the HTTPS address of the Resource Manager web application)
yarn.application.classpath= (you get this value by running the command yarn classpath on the CDH server's command line; see the example below)
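For example, on the CDH node (the output shown is illustrative; the actual value depends on your installation layout):
# Print the classpath required by YARN applications and paste the output into yarn.application.classpath
yarn classpath
# Illustrative output: /etc/hadoop/conf:/opt/cloudera/parcels/CDH/lib/hadoop/*:...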
High availability:
You can get the following parameters from hdfs-site.xml on the Hadoop server. The first parameter assigns a logical name to the nameservice:
dfs.nameservices=nameservice1
dfs.ha.namenodes.[nameservice ID] - unique identifiers for each NameNode in the nameservice. Configure a comma-separated list of NameNode IDs; DataNodes use this list to determine all the NameNodes in the cluster. For example, if you used nameservice1 as the nameservice ID above and want namenode64 and namenode72 as the individual NameNode IDs, configure it as follows:
dfs.ha.namenodes.nameservice1=namenode64,namenode72
Nodes communicate over RPC, and the following parameters establish that communication. For each of the previously configured NameNode IDs, set the full address and RPC port of the NameNode process:
dfs.namenode.rpc-address.nameservice1.namenode64=nn1.alpinenow.local:8020
dfs.namenode.rpc-address.nameservice1.namenode72=nn2.alpinenow.local:8020
dfs.client.failover.proxy.provider.[nameservice ID] - the Java class that HDFS clients use to contact the active NameNode. This class determines which NameNode is currently active and therefore serving client requests. The only implementation that ships with Hadoop is ConfiguredFailoverProxyProvider, so use it unless you have a custom one.
dfs.client.failover.proxy.provider.nameservice1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.namenode.http-address.[nameservice ID].[name node ID] - the fully qualified HTTP address that each NameNode listens on. As with the rpc-address parameters above, set the addresses of both NameNodes' HTTP servers:
dfs.namenode.http-address.nameservice1.namenode64=nn1.alpinenow.local:50070
dfs.namenode.http-address.nameservice1.namenode72=nn2.alpinenow.local:50070
If HTTPS is enabled on the CDH cluster:
dfs.namenode.https-address.nameservice1.namenode64=nn1.alpinenow.local:50470
dfs.namenode.https-address.nameservice1.namenode72=nn2.alpinenow.local:50470
Set the service RPC address for each NameNode (used by internal HDFS services such as DataNodes):
dfs.namenode.servicerpc-address.nameservice1.namenode64=nn1.alpinenow.local:8022
dfs.namenode.servicerpc-address.nameservice1.namenode72=nn2.alpinenow.local:8022
dfs.ha.automatic-failover.enabled.nameservice1=true
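After the parameters are in place, you can cross-check them against the live cluster from any Hadoop client node. These are standard HDFS commands; the nameservice and NameNode IDs must match those in your hdfs-site.xml:
# Confirm the logical nameservice name
hdfs getconf -confKey dfs.nameservices
# Confirm the NameNode IDs registered under the nameservice
hdfs getconf -confKey dfs.ha.namenodes.nameservice1
# Check which NameNode is currently active (run once per NameNode ID)
hdfs haadmin -getServiceState namenode64
hdfs haadmin -getServiceState namenode72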
If the Resource Manager is configured for high availability:
You can get the following parameters from yarn-site.xml on the Hadoop server.
yarn.resourcemanager.ha.rm-ids=rm60,rm70
yarn.resourcemanager.webapp.https.address.rm70=nn2.alpinenow.local:8090
yarn.resourcemanager.webapp.address.rm70=nn2.alpinenow.local:8088
yarn.resourcemanager.admin.address.rm70=nn2.alpinenow.local:8033
yarn.resourcemanager.resource-tracker.address.rm70=nn2.alpinenow.local:8031
yarn.resourcemanager.scheduler.address.rm70=nn2.alpinenow.local:8030
yarn.resourcemanager.address.rm70=nn2.alpinenow.local:8032
yarn.resourcemanager.webapp.https.address.rm60=nn1.alpinenow.local:8090
yarn.resourcemanager.webapp.address.rm60=nn1.alpinenow.local:8088
yarn.resourcemanager.admin.address.rm60=nn1.alpinenow.local:8033
yarn.resourcemanager.resource-tracker.address.rm60=nn1.alpinenow.local:8031
yarn.resourcemanager.scheduler.address.rm60=nn1.alpinenow.local:8030
yarn.resourcemanager.address.rm60=nn1.alpinenow.local:8032
yarn.resourcemanager.zk-address=cm.alpinenow.local:2181,nn1.alpinenow.local:2181,nn2.alpinenow.local:2181
yarn.resourcemanager.recovery.enabled=true
yarn.resourcemanager.ha.automatic-failover.embedded=true
yarn.resourcemanager.ha.automatic-failover.enabled=true
yarn.resourcemanager.ha.enabled=true
failover_resource_manager_hosts=cdh516node1.tds.local,cdh516node2.tds.local
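You can confirm the Resource Manager IDs and their current states with the standard yarn rmadmin tool; the IDs below are the ones configured above:
# Check which Resource Manager is currently active (run once per RM ID)
yarn rmadmin -getServiceState rm60
yarn rmadmin -getServiceState rm70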
MapReduce:
You can get the following parameters from mapred-site.xml on the Hadoop server.
mapreduce.job.map.output.collector.class=org.apache.hadoop.mapred.MapTask$MapOutputBuffer (the MapOutputCollector implementation(s) to use; this may be a comma-separated list of class names, in which case the map task tries to initialize each collector in turn and uses the first one that initializes successfully)
mapreduce.job.reduce.shuffle.consumer.plugin.class=org.apache.hadoop.mapreduce.task.reduce.Shuffle (the name of the class whose instances send shuffle requests for the reduce tasks of this job; the class must implement org.apache.hadoop.mapred.ShuffleConsumerPlugin)
mapreduce.jobhistory.address=cdh6dite.tds.local:10020 (MapReduce JobHistory Server IPC host:port)
mapreduce.jobhistory.webapp.address=cdh6dite.tds.local:19888 (MapReduce JobHistory Server Web UI host:port)
mapreduce.application.classpath= (you get this value by running the command hadoop classpath on the CDH server's command line; see the example below)
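As with yarn.application.classpath above, you can capture this value on the CDH node:
# Print the classpath required by MapReduce applications and paste the output into mapreduce.application.classpath
hadoop classpath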
If using Hive:
hive.metastore.client.connect.retry.delay=1 (the number of seconds to wait between metastore connection retry attempts)
hive.metastore.client.socket.timeout=600 (the metastore client socket timeout, in seconds)
Hive with Kerberos:
hive.hiveserver2.uris=jdbc:hive2://cm.alpinenow.local:10000/default
hive.metastore.kerberos.principal=hive/_HOST@ALPINENOW.LOCAL (substitute your own Kerberos realm name)
hive.server2.authentication.kerberos.principal=hive/_HOST@ALPINENOW.LOCAL
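To confirm that HiveServer2 accepts Kerberos connections with these values, you can test from a node that holds a valid Kerberos ticket. This is a sketch; the host and realm come from the parameters above, and the principal in the JDBC URL must be the HiveServer2 service principal:
# Connect to HiveServer2 with the service principal appended to the JDBC URL
beeline -u "jdbc:hive2://cm.alpinenow.local:10000/default;principal=hive/cm.alpinenow.local@ALPINENOW.LOCAL"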
Spark History Server:
The following parameters need to be added when the Spark service is running on the Hadoop cluster.
spark.yarn.historyServer.address=http://172.27.0.3:18088
spark.eventLog.dir=hdfs://172.27.0.3:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
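To confirm the Spark History Server is reachable at the configured address, a simple HTTP check works (assuming curl is available on the server):
# A 200 response code indicates the History Server web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://172.27.0.3:18088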