Products | Versions |
---|---|
Spotfire Data Science | 6.2.2 |
Configure Hive HDFS Permissions
Create a /user/chorus directory with the owner:group set to chorus:supergroup.
hdfs dfs -mkdir -p /user/chorus
This directory is used to cache the uploaded JAR files, such as spark-assembly.jar.
The /user/chorus directory should have read, write, and execute permissions set for the chorus user.
hdfs dfs -chown chorus:supergroup /user/chorus
hdfs dfs -chmod 777 /user/chorus
The staging directory is typically set to /user. If it is not, create the chorus directory under the modified path: /<stagingdirectory>/chorus.
To run Pig jobs, the Spotfire Data Science application attempts to create a folder called /user/<username> as the AD user. By default, the permissions are set to hdfs:supergroup:drwxr-xr-x, which prevents Spotfire Data Science from creating that folder. Change the permissions to grant the AD users running the Spotfire Data Science application write access to that folder (use drwxrwxr-x or drwxrwxrwx).
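The symbolic modes above correspond to octal 775 (drwxrwxr-x) and 777 (drwxrwxrwx). A quick local-filesystem sketch of that mapping; on HDFS you would pass the same octal values to hdfs dfs -chmod:

```shell
# Demonstrate the symbolic/octal mapping on a local scratch directory.
d=$(mktemp -d)
chmod 775 "$d"
stat -c '%A' "$d"   # drwxrwxr-x  (group-writable)
chmod 777 "$d"
stat -c '%A' "$d"   # drwxrwxrwx  (world-writable)
rmdir "$d"
# HDFS equivalent (illustrative): hdfs dfs -chmod 775 /user/<username>
```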
To run YARN, Pig, and similar jobs, each user may need to write temporary files to the temporary directories. There are many Hadoop temp directories, such as hadoop.tmp.dir and pig.tmp.dir, and by default all of them are based off the /tmp directory. Therefore, /tmp must be writable by everyone so that all users can run their jobs, and it must be executable by everyone so that users can recurse the directory tree. Set the /tmp permissions with the following command:
hdfs dfs -chmod 777 /tmp
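You can confirm the resulting mode with a directory listing (illustrative; the owner, group, and timestamp columns vary by cluster):

```shell
# Verify /tmp is now world-writable and world-executable;
# the listing for /tmp should begin with drwxrwxrwx.
hdfs dfs -ls -d /tmp
```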
Spotfire Data Science uses several temporary directories on HDFS. These directories and files are created by the hdfs, yarn, mapred, and other users.
The temporary directories must be made accessible at the base level to the user alpine and other relevant users.
Note: Only individual directories for the specified user can be viewed by that user.
These directories are:
@default_tmpdir/tsds_out/<user_name>/<workflow_name>/
@default_tmpdir/tsds_runtime/<user_name>/<workflow_name>/
@default_tmpdir/tsds_model/<user_name>/<workflow_name>/
The /tmp directory should be readable and writable. The /tmp/hadoop-yarn directory should be readable and writable for Spark jobs.
Create the Spotfire Data Science folders and assign permissions to them to avoid permission failures.
hdfs dfs -mkdir /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model
hdfs dfs -chown chorus /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model
hdfs dfs -chmod 777 /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model
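A quick sanity check that the three folders exist with the intended owner and mode (illustrative; listing columns vary by Hadoop version):

```shell
# hdfs dfs -test -d returns 0 when the directory is present.
for d in /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model; do
  hdfs dfs -test -d "$d" || echo "missing: $d"
done
# Each listed entry should show mode drwxrwxrwx and owner chorus.
hdfs dfs -ls -d /tmp/tsds_out /tmp/tsds_runtime /tmp/tsds_model
```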
To run Hive operators and jobs, you need to set up an Access Control List (ACL) for the Hive user. The Hive user should have read, write, and execute access to /tmp and all Spotfire Data Science folders.
hdfs dfs -setfacl -m default:user:hive:rwx /tmp
hdfs dfs -setfacl -m user:hive:rwx /tmp
hdfs dfs -setfacl -R -m default:user:hive:rwx /tmp/tsds_*
hdfs dfs -setfacl -R -m user:hive:rwx /tmp
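To confirm the ACL entries took effect, hdfs dfs -getfacl prints the access and default entries for a path (a sketch; your output will also include the owner, group, and mask entries):

```shell
# After the -setfacl commands above, the listing for /tmp should
# include both an access entry and a default entry for hive:
#   user:hive:rwx
#   default:user:hive:rwx
hdfs dfs -getfacl /tmp
```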
If you're upgrading Spotfire Data Science from a previous version to 6.2 or later, you'll need to perform these actions as well:
Change the /tmp/alpine_* directories to have full permissions so that everyone can read, write, and execute.
hdfs dfs -chmod -R 777 /tmp/alpine_out /tmp/alpine_runtime /tmp/alpine_model
hdfs dfs -setfacl -R -m default:user:hive:rwx /tmp
hdfs dfs -setfacl -R -m user:hive:rwx /tmp
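On a fresh 6.2+ install the legacy alpine_* directories may not exist, in which case the chmod above fails for the missing paths. A hedged guard using hdfs dfs -test:

```shell
# Only chmod the legacy directories that actually exist
# (hdfs dfs -test -d returns 0 when the directory is present).
for d in /tmp/alpine_out /tmp/alpine_runtime /tmp/alpine_model; do
  hdfs dfs -test -d "$d" && hdfs dfs -chmod -R 777 "$d"
done
```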
With the following settings, users can customize their permissions for the Spotfire Data Science user folders, workflow folders, operator folders, and output files.
There are three configuration options you can set in alpine.conf:
alpine.hdfs.userDirPerms – sets permissions for the user folders @default_tmpdir/alpine_*/<user>
alpine.hdfs.dirPerms – sets permissions for the workflow folders and the operator folders in @default_tmpdir/alpine_*/<user>
alpine.hdfs.filePerms – sets permissions for Spotfire Data Science output files.
Each of these must be set to a 10-character permission string. Here are the default settings:
alpine.hdfs.userDirPerms = "-rwxrwxrwx"
alpine.hdfs.dirPerms = "-rwxrwxrwx"
alpine.hdfs.filePerms = "-rwxr-x---"
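Each value is a 10-character symbolic mode string: one file-type character followed by three rwx triplets. A small local sketch (not part of the product) for checking a candidate string before putting it in alpine.conf; the helper name valid_perm_string is hypothetical:

```shell
# Hypothetical validator: exit 0 if $1 is a 10-character symbolic
# permission string such as "-rwxr-x---" (file type, then 3 triplets).
valid_perm_string() {
  printf '%s' "$1" | grep -Eq '^[-d]([r-][w-][x-]){3}$'
}

valid_perm_string "-rwxr-x---" && echo "ok"
valid_perm_string "-rwxr-x--"  || echo "rejected: only 9 characters"
```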
@default_tmpdir is set to /tmp initially. You can change this for individual workflows using Workflow Variables, or for all newly created workflows using Workflow Preferences.
Spotfire Data Science overwrites @default_tmpdir/alpine_* files when users re-run workflows.
Spotfire Data Science users can clear selected @default_tmpdir/alpine_out files using Clear Temporary Data.
Hadoop administrators can safely clear @default_tmpdir/alpine_runtime from HDFS, as this directory stores information only for workflows where Spotfire Data Science users have chosen the option "Store Results = False".
Handle @default_tmpdir/alpine_model with caution, as Spotfire Data Science users may need to export models from this directory.