File Encoding for imported data

Article ID: KB0080524

Products Versions
Spotfire S+ -

Description

When importData() is used to import a file encoded in Latin1 or UTF-8, some non-ASCII characters (such as accented letters) are displayed as /xxx, where "xxx" is a numeric value, instead of as the characters themselves.

Issue/Introduction

File Encoding when importing data

Environment

Product: TIBCO Spotfire S+
Version: All supported versions
OS: All supported operating systems

Resolution

By default, S+ uses the "C" locale, the locale of the C language (7-bit ASCII characters and US-style numbers):

> Sys.getlocale()
[1] "C"

To switch to the locale that your operating system is using, run:

Sys.setlocale(locale="")

For example, on my Windows XP machine, this returns:

> Sys.setlocale(locale="")
[1] "English_United States.1252"

At this point you should be able to import your file and the characters should display correctly.
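
Whether this is enough depends on the encoding the file actually uses. As a rough check on a Unix-like system, the following sketch tests whether a file decodes cleanly as UTF-8. It assumes the iconv utility is installed; guess_encoding and mydata.csv are hypothetical names used only for illustration and are not part of S+.

```shell
# Hypothetical helper: report whether a file decodes cleanly as UTF-8.
# iconv exits with a non-zero status when it meets a byte sequence
# that is not valid in the source encoding.
guess_encoding() {
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        echo "UTF-8"
    else
        echo "not UTF-8"
    fi
}

# Example call with a hypothetical data file name:
# guess_encoding mydata.csv
```

A plain ASCII file also reports "UTF-8", since ASCII is a subset of UTF-8. A "not UTF-8" result does not prove that the file is Latin1; it only rules out UTF-8.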

However, if the file encoding differs from the language/character set that the operating system is using, you will first need to change the locale set in your operating system. For example, running "locale" at the command prompt on a Linux machine returns the following:

[root@localhost ~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

To change this to the character encoding for your location instead of UTF-8, reset the LANG environment variable to your specific language-country code. For example, to use the character encoding for English in the United States (which uses ISO-8859-1), run:

[root@localhost ~]# export LANG=en_US

The "locale" command then no longer reports UTF-8:

[root@localhost ~]# locale
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=
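
Setting LANG only helps if the named locale is actually installed on the machine. A minimal sketch for checking this before switching, assuming a system that provides the standard "locale -a" listing (locale_available is a hypothetical helper name):

```shell
# Hypothetical helper: succeed only if the given locale name is listed
# by `locale -a` (case-insensitive, whole-line, fixed-string match).
locale_available() {
    locale -a 2>/dev/null | grep -qixF -- "$1"
}

# Switch LANG only when the locale exists; otherwise leave it unchanged.
if locale_available en_US; then
    export LANG=en_US
else
    echo "en_US is not installed; LANG left unchanged" >&2
fi
```

If the locale is missing, it usually has to be generated or installed through the operating system's own tools before the export has the intended effect.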

At this point, you can call Sys.setlocale(locale="") in S+ to use the locale set by the operating system, and then import the data with the accented letters displaying correctly:

[root@localhost ~]# Splus8.2 -e
TIBCO Software Inc. Confidential Information
Copyright (c) 1988-2010 TIBCO Software Inc. ALL RIGHTS RESERVED.
TIBCO Spotfire S+ Version 8.2.0 for Linux 2.6.9-34.EL, 64-bit : 2010
Working data will be in /root/MySwork
> Sys.getlocale()
[1] "C"
> Sys.setlocale(locale="")
[1] "en_US"



Please note that you may also need to set your console to the matching character encoding. For the above example, my console had Terminal -> Set Character Encoding set to Western (ISO-8859-1) instead of Current Locale (UTF-8). This setting also appears to affect how characters are printed in the Linux console within S+.
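
To see which encoding the shell's current locale expects, so that it can be compared with the terminal setting, the standard "locale charmap" command prints the name of the locale's character map. A small sketch (current_charmap is a hypothetical wrapper; the exact name printed varies between systems, so no output is shown):

```shell
# Hypothetical wrapper: print the character map (encoding) of the
# locale currently in effect for this shell.
current_charmap() {
    locale charmap
}

echo "Locale encoding: $(current_charmap)"
```

If the name printed here does not correspond to the encoding selected in the terminal, characters may still display incorrectly even though the locale itself is set as intended.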