Text Miner/Web Crawling error: Failed to index document while pointing it to URLs

Article ID: KB0079601

Product: Spotfire Statistica
Versions: 13.3.0 and later

Description

In Statistica 13.3.1, when Text Miner is pointed to a file containing URLs, indexing fails with the error "Failed to index document [temp file path]".

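This error is expected: several GPL-licensed components were removed from the product in accordance with TIBCO policy, so the tools Text Miner relies on to retrieve web pages are no longer bundled with Statistica.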

Issue/Introduction

When Statistica Text Miner is pointed to a document containing URLs, indexing fails with the error "Failed to index document [temp file path]".

Resolution

WORKAROUND:
The workaround is to obtain the tools Statistica needs (the Lynx text-mode web browser) and place them where Statistica expects to find them.

1. Close all instances of Statistica.

2. Download the Lynx installer from http://invisible-island.net/datafiles/release/lynx-cs-setup.exe and install it on the machine.

3. Copy the following files from "C:\Program Files (x86)\Lynx - web browser" to the [Statistica Installation Folder]\support\lynx directory:

    lynx.exe
    lynx.cfg
    libbz2.dll

4. In the destination folder (next to lynx.exe), create a file named lynx.cmd with the following content:

 @echo off
 rem %1 = input page or file to render, %2 = output file that receives the plain-text dump
 set "LYNX=%~dp0lynx.exe"
 set "CFG=%~dp0lynx.cfg"
 "%LYNX%" -dump -cfg "%CFG%" -nolist -hiddenlinks ignore -display_charset UTF-8 -force_html "%~1" > "%~2"

5. Restart Statistica. Indexing in Statistica Text Miner will now succeed.
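To confirm the setup before re-running Text Miner, a small batch file like the sketch below can exercise lynx.cmd directly. It is a hypothetical helper (the name check-lynx.cmd is an assumption), and the LYNXDIR path is a placeholder that you must replace with your actual [Statistica Installation Folder]\support\lynx directory. It checks that the three copied files and lynx.cmd are present, writes a tiny HTML page to %TEMP%, converts it through lynx.cmd, and prints the resulting text.

```batch
@echo off
rem check-lynx.cmd: hypothetical helper to verify the Lynx workaround.
rem Assumption: replace LYNXDIR with your [Statistica Installation Folder]\support\lynx path.
set "LYNXDIR=C:\Program Files\Statistica\support\lynx"

rem Report any file from steps 3 and 4 that is not in place.
for %%F in (lynx.exe lynx.cfg libbz2.dll lynx.cmd) do (
    if not exist "%LYNXDIR%\%%F" echo Missing: %%F
)

rem Write a minimal HTML page (^ escapes the angle brackets for echo).
> "%TEMP%\lynx-test.html" echo ^<html^>^<body^>^<p^>Hello from Lynx^</p^>^</body^>^</html^>

rem Convert the page through the wrapper; CALL returns control here afterwards.
call "%LYNXDIR%\lynx.cmd" "%TEMP%\lynx-test.html" "%TEMP%\lynx-test.txt"

rem Show the plain-text result produced by Lynx.
type "%TEMP%\lynx-test.txt"
```

Run it from a Command Prompt after completing step 4. If the output contains "Hello from Lynx", the Lynx binaries and the wrapper are working; a "Missing:" line names a file that was not copied in step 3 or created in step 4.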

Additional Information

Lynx project homepage: https://lynx.browser.org/
Win32 installers: http://invisible-island.net/lynx/