File Encoding for imported data
Article ID: KB0080524
Description
When importData() is used to import a file encoded in Latin-1 or UTF-8, some characters are displayed as /xxx instead, where "xxx" is a numeric value.
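The numbers come from the raw bytes of the accented characters, which fall outside the 7-bit ASCII range (0-127). The short shell sketch below is not part of S+, and the function name octal_bytes is hypothetical. It prints the bytes that make up "é" in each encoding, which shows that the same letter is stored differently in Latin-1 and UTF-8. S+ may print these values in a different form, so treat them as illustrative.

```shell
# Hypothetical helper: print each byte read from standard input as a
# three-digit octal number, separated by single spaces.
octal_bytes() {
  od -An -to1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# "é" in Latin-1 (ISO-8859-1): one byte, octal 351 (decimal 233).
printf '\351' | octal_bytes; echo
# "é" in UTF-8: two bytes, octal 303 251 (decimal 195 169).
printf '\303\251' | octal_bytes; echo
```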
Issue/Introduction
File Encoding when importing data
Environment
Product: TIBCO Spotfire S+
Version: All supported versions
OS: All supported operating systems
--------------------
Resolution
By default, S+ uses the "C" locale, which supports only 7-bit ASCII characters and US-style numbers:
> Sys.getlocale()
[1] "C"
To switch S+ to the locale that your operating system is using, run:
Sys.setlocale(locale="")
For example, on my Windows XP machine, this returns:
> Sys.setlocale(locale="")
[1] "English_United States.1252"
At this point, you should be able to import your file, and the characters should display correctly.
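If the characters still do not display correctly, the file's encoding may not match the operating system's character set. One rough way to check whether a data file is UTF-8 (an assumption, not part of the original procedure) is to ask iconv to validate it. The function name guess_encoding is hypothetical, and the check cannot distinguish Latin-1 from other single-byte encodings.

```shell
# Hypothetical helper: report whether a file is valid UTF-8.
# iconv exits non-zero on byte sequences that are not valid UTF-8.
# Plain 7-bit ASCII also passes, because ASCII is a subset of UTF-8.
guess_encoding() {
  command -v iconv >/dev/null 2>&1 || { echo "iconv not found" >&2; return 2; }
  if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
    echo "UTF-8 (or plain ASCII)"
  else
    echo "not UTF-8 (possibly a single-byte encoding such as Latin-1)"
  fi
}

# Demo on two throwaway files containing "café" in each encoding.
latin1_file=$(mktemp)
utf8_file=$(mktemp)
printf 'caf\351\n' > "$latin1_file"
printf 'caf\303\251\n' > "$utf8_file"
guess_encoding "$latin1_file"
guess_encoding "$utf8_file"
rm -f "$latin1_file" "$utf8_file"
```

To check a real file, run guess_encoding with its path and compare the result with the output of `locale charmap`, which reports the character set of the current locale.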
However, if the file's encoding differs from the character set your operating system is using, you must first change the operating system's locale. For example, running "locale" at the command prompt on a Linux machine returns the following:
[root@localhost ~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
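On Linux, Sys.setlocale(locale="") takes its value from the environment. Assuming S+ resolves the empty name the same way the C library does (this article does not confirm the details), the variables shown above decide the result. The LC_* values are quoted because glibc's locale utility quotes categories that are not set individually and are derived from LANG; the empty LC_ALL overrides nothing. The hypothetical function effective_ctype below sketches the POSIX precedence for the character-encoding category, LC_CTYPE.

```shell
# Hypothetical helper: which value governs character encoding (LC_CTYPE)?
# POSIX precedence: a non-empty LC_ALL wins, then a non-empty LC_CTYPE,
# then LANG. If none is set, the implementation default applies
# (assumed here to be "C").
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf '%s\n' "C"
  fi
}

effective_ctype
```

In the output above, LC_ALL is empty and LC_CTYPE is not set separately, so LANG alone decides the encoding. That is why changing LANG is enough.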
To use your location's non-UTF-8 character encoding instead, reset the LANG environment variable to your language_country code without the .UTF-8 suffix. For example, to use the character encoding for English in the United States (ISO-8859-1), run:
[root@localhost ~]# export LANG=en_US
Because export changes LANG only for the current shell and the programs started from it, the "locale" command run in that shell no longer reports UTF-8:
[root@localhost ~]# locale
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=
Now start S+ from the same shell and run Sys.setlocale(locale="") so that S+ uses the locale set by the operating system. When you then import the data, the accented letters should display correctly:
[root@localhost ~]# Splus8.2 -e
TIBCO Software Inc. Confidential Information
Copyright (c) 1988-2010 TIBCO Software Inc. ALL RIGHTS RESERVED.
TIBCO Spotfire S+ Version 8.2.0 for Linux 2.6.9-34.EL, 64-bit : 2010
Working data will be in /root/MySwork
> Sys.getlocale()
[1] "C"
> Sys.setlocale(locale="")
[1] "en_US"
You may also need to set your terminal's character encoding to match. In the example above, the console's Terminal -> Set Character Encoding option was set to Western (ISO-8859-1) rather than Current Locale (UTF-8). This setting also appears to affect how characters are printed in the S+ Linux console.
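A quick way to check the terminal setting is to print the same word in both encodings and see which line renders correctly. This is a plain shell sketch, and the function names are hypothetical. On a terminal set to Western (ISO-8859-1), only the Latin-1 line should read "café" (the UTF-8 line shows two stray characters instead of "é"). On a UTF-8 terminal, only the UTF-8 line should.

```shell
# Hypothetical helpers: emit "café" as Latin-1 bytes and as UTF-8 bytes.
latin1_cafe() { printf 'caf\351\n'; }      # é = one byte, octal 351
utf8_cafe()   { printf 'caf\303\251\n'; }  # é = two bytes, octal 303 251

printf 'Latin-1: '; latin1_cafe
printf 'UTF-8:   '; utf8_cafe
```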