Products | Versions |
---|---|
Spotfire Data Science | 6.x |
Configure Spotfire Data Science to work with Data at Rest Encryption (KeyTrustee KMS)
The following steps describe how to connect Spotfire Data Science to a Kerberized CDH 5.x cluster that has Data at Rest Encryption enabled. In this setup, key management is handled by Key Trustee Server (installed on a separate CDH cluster, which can be managed by the same Cloudera Manager) and Key Trustee KMS (running on the main Kerberized cluster and acting as a client of Key Trustee Server).
1. After completing the standard connection configuration steps listed on this page, add the following extra parameters to the Spotfire Data Science data connection for the Kerberized CDH 5.x cluster that runs Key Trustee KMS:
Parameter | Value |
---|---|
hadoop.kms.key.provider.uri | /var/lib/kms-keytrustee |
hadoop.kms.authentication.kerberos.principal | HTTP/_HOST@ALPINENOW.LOCAL |
dfs.encryption.key.provider.uri | kms://http@kms_hostname:16000/kms |
hadoop.security.key.provider.path | kms://http@kms_hostname:16000/kms |
Replace 'kms_hostname' with the actual KMS hostname, and use 'https' instead of 'http' if TLS/SSL is enabled for Key Trustee KMS. If there are multiple KMS hosts, replace 'kms_hostname' with a semicolon-separated list of those KMS hostnames. Also, replace 'ALPINENOW.LOCAL' with your own Kerberos realm.
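Because these substitutions are mechanical, a small helper can assemble the four parameter values from a list of KMS hosts, a Kerberos realm, and a TLS flag. The following is a minimal Python sketch, not part of Spotfire Data Science: the function name `kms_connection_params` and the hostnames `kms01.example.com`/`kms02.example.com` and realm `EXAMPLE.COM` are hypothetical, the port 16000 is simply the one used in the table, and the value of hadoop.kms.key.provider.uri is copied unchanged from the table.

```python
from typing import Dict, Sequence


def kms_connection_params(
    kms_hosts: Sequence[str],
    realm: str,
    tls_enabled: bool,
    port: int = 16000,  # port taken from the example table; adjust if your KMS uses another one
) -> Dict[str, str]:
    """Build the extra data-connection parameters for a Key Trustee KMS cluster.

    Hypothetical helper: it only performs the substitutions described in step 1.
    """
    if not kms_hosts:
        raise ValueError("at least one KMS host is required")
    # Use 'https' when TLS/SSL is enabled for Key Trustee KMS, otherwise 'http'.
    scheme = "https" if tls_enabled else "http"
    # Multiple KMS hosts replace 'kms_hostname' as a semicolon-separated list.
    hosts = ";".join(kms_hosts)
    provider_uri = f"kms://{scheme}@{hosts}:{port}/kms"
    return {
        "hadoop.kms.key.provider.uri": "/var/lib/kms-keytrustee",
        # The example realm ALPINENOW.LOCAL is replaced by the caller's realm.
        "hadoop.kms.authentication.kerberos.principal": f"HTTP/_HOST@{realm}",
        "dfs.encryption.key.provider.uri": provider_uri,
        "hadoop.security.key.provider.path": provider_uri,
    }


if __name__ == "__main__":
    # Hypothetical hosts and realm, for illustration only.
    params = kms_connection_params(
        ["kms01.example.com", "kms02.example.com"], "EXAMPLE.COM", tls_enabled=True
    )
    for name, value in params.items():
        print(f"{name} | {value} |")
```

Both dfs.encryption.key.provider.uri and hadoop.security.key.provider.path receive the same URI, matching the table above.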
2. Add the following three properties, each with the value '*', to the Key Trustee KMS configuration (kms-site.xml), and then restart the Key Trustee KMS service:
<property>
<name>hadoop.kms.proxyuser.alpine.users</name>
<value>*</value>
</property>
<property>
<name>hadoop.kms.proxyuser.alpine.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.kms.proxyuser.alpine.hosts</name>
<value>*</value>
</property>
Replace "alpine" in the property names with the Spotfire Data Science service account that was used to create the Kerberos principal and keytab (the account referenced by the 'alpine.principal' and 'alpine.keytab' parameters of the Spotfire Data Science data connection).
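If your service account is not named "alpine", the three properties can be generated for the correct name instead of being edited by hand. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree; the function name `proxyuser_properties` and the account name `sds_svc` are hypothetical. It prints each property as a single-line XML fragment for pasting into kms-site.xml.

```python
import xml.etree.ElementTree as ET
from typing import List


def proxyuser_properties(service_account: str) -> List[ET.Element]:
    """Return the three hadoop.kms.proxyuser.* properties from step 2 for one account."""
    props = []
    for suffix in ("users", "groups", "hosts"):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = f"hadoop.kms.proxyuser.{service_account}.{suffix}"
        # '*' lets the account impersonate any user or group member, connecting from any host.
        ET.SubElement(prop, "value").text = "*"
        props.append(prop)
    return props


if __name__ == "__main__":
    # 'sds_svc' is a hypothetical name; use the account behind alpine.principal.
    for prop in proxyuser_properties("sds_svc"):
        print(ET.tostring(prop, encoding="unicode"))
```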
3. Add the Spotfire Data Science service account (in this example, 'alpine') and the group that Spotfire Data Science users belong to (in this example, 'engineering') to the Key Trustee KMS access control lists (ACLs). The kms-acls.xml file might look like this:
<property>
<name>hadoop.kms.acl.CREATE</name>
<value>keyadmin,alpine keyadmingroup,engineering</value>
<description>ACL for create-key operations. If the user is not in the GET ACL, the key material is not returned as part of the response.</description>
</property>
<property>
<name>hadoop.kms.acl.DELETE</name>
<value>keyadmin,alpine keyadmingroup,engineering</value>
<description>ACL for delete-key operations.</description>
</property>
<property>
<name>hadoop.kms.acl.ROLLOVER</name>
<value>keyadmin,alpine keyadmingroup,engineering</value>
<description>ACL for rollover-key operations. If the user is not in the GET ACL, the key material is not returned as part of the response.</description>
</property>
<property>
<name>hadoop.kms.acl.GET</name>
<value></value>
<description>ACL for get-key-version and get-current-key operations.</description>
</property>
<property>
<name>hadoop.kms.acl.GET_KEYS</name>
<value>keyadmin,alpine keyadmingroup,engineering</value>
<description>ACL for get-keys operations.</description>
</property>
<property>
<name>hadoop.kms.acl.GET_METADATA</name>
<value>hdfs</value>
<description>ACL for get-key-metadata and get-keys-metadata operations.</description>
</property>
<property>
<name>hadoop.kms.acl.SET_KEY_MATERIAL</name>
<value></value>
<description>Complementary ACL for CREATE and ROLLOVER operations to allow the client to provide the key material when creating or rolling a key.</description>
</property>
<property>
<name>hadoop.kms.acl.GENERATE_EEK</name>
<value>hdfs</value>
<description>ACL for generateEncryptedKey CryptoExtension operations.</description>
</property>
<property>
<name>hadoop.kms.acl.DECRYPT_EEK</name>
<value></value>
<description>ACL for decryptEncryptedKey CryptoExtension operations.</description>
</property>
<property>
<name>hadoop.kms.blacklist.CREATE</name>
<value>hdfs</value>
</property>
<property>
<name>hadoop.kms.blacklist.DELETE</name>
<value>hdfs</value>
</property>
<property>
<name>hadoop.kms.blacklist.ROLLOVER</name>
<value>hdfs</value>
</property>
<property>
<name>hadoop.kms.blacklist.GET</name>
<value>*</value>
</property>
<property>
<name>hadoop.kms.blacklist.GET_KEYS</name>
<value></value>
</property>
<property>
<name>hadoop.kms.blacklist.SET_KEY_MATERIAL</name>
<value>*</value>
</property>
<property>
<name>hadoop.kms.blacklist.DECRYPT_EEK</name>
<value>hdfs</value>
</property>
<property>
<name>keytrustee.kms.acl.UNDELETE</name>
<value></value>
<description>ACL that grants access to the UNDELETE operation on all keys. Only used by Key Trustee KMS.</description>
</property>
<property>
<name>keytrustee.kms.acl.PURGE</name>
<value></value>
<description>ACL that grants access to the PURGE operation on all keys. Only used by Key Trustee KMS.</description>
</property>
<property>
<name>default.key.acl.MANAGEMENT</name>
<value></value>
<description>Default ACL that grants access to the MANAGEMENT operation on all keys.</description>
</property>
<property>
<name>default.key.acl.GENERATE_EEK</name>
<value></value>
<description>Default ACL that grants access to the GENERATE_EEK operation on all keys.</description>
</property>
<property>
<name>default.key.acl.DECRYPT_EEK</name>
<value></value>
<description>Default ACL that grants access to the DECRYPT_EEK operation on all keys.</description>
</property>
<property>
<name>default.key.acl.READ</name>
<value></value>
<description>Default ACL that grants access to the READ operation on all keys.</description>
</property>
<property>
<name>default.key.acl.MIGRATE</name>
<value></value>
<description>Default ACL that grants access to the MIGRATE operation on all keys.</description>
</property>
<property>
<name>whitelist.key.acl.MANAGEMENT</name>
<value>keyadmin,alpine keyadmingroup,engineering</value>
<description>Whitelist ACL for MANAGEMENT operations for all keys.</description>
</property>
<property>
<name>whitelist.key.acl.READ</name>
<value>hdfs</value>
<description>Whitelist ACL for READ operations for all keys.</description>
</property>
<property>
<name>whitelist.key.acl.GENERATE_EEK</name>
<value>hdfs</value>
<description>Whitelist ACL for GENERATE_EEK operations for all keys.</description>
</property>
<property>
<name>whitelist.key.acl.DECRYPT_EEK</name>
<value>keyadmin,alpine keyadmingroup,engineering</value>
<description>Whitelist ACL for DECRYPT_EEK operations for all keys.</description>
</property>
Note: In the configuration above, the value keyadmin,alpine keyadmingroup,engineering lists the users 'keyadmin' and 'alpine' (comma-separated), then a single space, then the groups 'keyadmingroup' and 'engineering' (comma-separated). If you make bulk changes to this setup, follow the same format.
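To double-check an ACL value before editing kms-acls.xml, the following Python sketch parses a string in the "users groups" format described in the note and tests whether a given user, with a given set of groups, would match it. It is a simplified, hypothetical model (the class `KmsAcl` and the user 'bob' and group 'sales' are illustrative only): Hadoop's own org.apache.hadoop.security.authorize.AccessControlList is authoritative, and this sketch does not model the hadoop.kms.blacklist.* properties, which deny access even when an ACL allows it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


def _split_csv(part: str) -> FrozenSet[str]:
    """Split a comma-separated list, ignoring empty entries and stray whitespace."""
    return frozenset(item.strip() for item in part.split(",") if item.strip())


@dataclass(frozen=True)
class KmsAcl:
    """Simplified model of an ACL value such as 'keyadmin,alpine keyadmingroup,engineering'."""

    allow_all: bool
    users: FrozenSet[str]
    groups: FrozenSet[str]

    @classmethod
    def parse(cls, value: str) -> "KmsAcl":
        # A lone '*' grants access to everyone.
        if value.strip() == "*":
            return cls(True, frozenset(), frozenset())
        # Users precede the first space and groups follow it; each side is comma-separated.
        # A value that starts with a space therefore lists groups only.
        user_part, _, group_part = value.partition(" ")
        return cls(False, _split_csv(user_part), _split_csv(group_part))

    def allows(self, user: str, user_groups: Iterable[str] = ()) -> bool:
        """True if the user is listed directly or belongs to one of the listed groups."""
        return self.allow_all or user in self.users or any(g in self.groups for g in user_groups)


if __name__ == "__main__":
    acl = KmsAcl.parse("keyadmin,alpine keyadmingroup,engineering")
    print(sorted(acl.users), sorted(acl.groups))
    # 'bob' and 'sales' are placeholders for an arbitrary user and group.
    print(acl.allows("alpine"), acl.allows("bob", ["engineering"]), acl.allows("bob", ["sales"]))
```

An empty value (as used for several properties above) parses to an ACL that matches no one.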
4. Cloudera recommends enabling TLS/SSL for all of the services mentioned above. You might also need to enable TLS/SSL for HttpFS (in the HDFS service configuration).
Note: Currently, this setup works for only a couple of hours; after that, Spotfire Data Science (or the YARN service) must be restarted to clear the "kerberos tgt" error.