How to define reg_exp and options in RE_REPLACE,RE_MATCH,RE_SEARCH in Statistica
book
Article ID: KB0081852
calendar_today
Updated On:
Products
Versions
Spotfire Statistica
10 and later versions
Description
Statistica has 3 text functions in the Function browser named : RE_REPLACE,RE_MATCH and RE_SEARCH. This article details the input parameters format for reg_exp,replace string and options. In general, the format acceptable is PERL format for these parameters.
Resolution
A few examples of the functions before discussing the parameters specifically :
A valid RE_REPLACE can be : =RE_REPLACE(V1, "([0-9]*)", "", "g") Replaces all numbers in Variable 1 with nothing. The "g" option means global replace so it replaces not just the first occurrence of a number but all of them. This one replaces numbers with nothing
=RE_REPLACE(V1, "([^0-9]*)", " ", "g") reg_exp [^0-9] ^ inverts it and makes it match characters NOT in the range 0-9. So this one replaces all non numbers with a space, thus stripping out the non-numeric characters and leaving only the numbers
=RE_REPLACE("hello 123 there", ".*?([0-9]+).*", "$1", "") extracts and returns 123 in this regex, the [.*?] at the beginning means match 0 or more characters, with the ? meaning minimal match; i.e., match the smallest possible thing before a number. ([0-9]+) matches one or more digits and puts it in match group $1. Then .* at the end matches everything after the number. The replace string is $1, the number.
=RE_REPLACE(V1, "([0-9]*)", "$1", "c") Can be used for extracting the number out of a alphanumeric variable () in the regex create groups you can refer to by number, $1. Here ([0-9]*) matches any number of consecutive digits and assigns it to match group 1, and we replace the whole string with $1, the desired output. The last parameter is for options - "c" means don't copy to the output portions of the string which did not match, so we get just the part that did match (namely the number)
=RE_REPLACE(V1, "([0-9]*)", "$`$'", "c") To extract all special characters from a variable that has special characters and numbers.$` (backtick) is everything before the match. $' is everything after the match.The number is the match so it effectively simply removes that part
reg_ex/replace string : Regular expressions are a wonderful kind of black magic, but no single example is going to explain them fully.You may define either a variable here or regular expression patterns formed with meta characters and repetition definitions used in PERL :
Some examples of patterns that can be formed are :
expression
matches...
abc
abc (that exact character sequence, but anywhere in the string)
^abc
abc at the beginning of the string
abc$
abc at the end of the string
a|b
either of a and b
^abc|abc$
the string abc at the beginning or at the end of the string
ab{2,4}c
an a followed by two, three or four b’s followed by a c
ab{2,}c
an a followed by at least two b’s followed by a c
ab*c
an a followed by any number (zero or more) of b’s followed by a c
ab+c
an a followed by one or more b’s followed by a c
ab?c
an a followed by an optional b followed by a c; that is, either abc or ac
a.c
an a followed by any single character (not newline) followed by a c
a\.c
a.c exactly
[abc]
any one of a, b and c
[Aa]bc
either of Abc and abc
[abc]+
any (nonempty) string of a’s, b’s and c’s (such as a, abba, acbabcacaa)
[^abc]+
any (nonempty) string which does not contain any of a, b and c (such as defg)
\d\d
any two decimal digits, such as 42; same as \d{2}
More examples can be online for Regular expressions in PERL or in Statistica help : Under Working with Spreadsheets >>Using Spreadsheets>>Using Formulas : Spreadsheet Formulas - Regular Expression Functions Regular Expression Overview Spreadsheet Formulas - Regular Expression Function Examples
options:
The following options are available for RE_SEARCH, RE_MATCH, and RE_REPLACE:
b: Use POSIX basic grammar rather than PERL-like grammar. Similar to sed and grep.
x: Use POSIX extended grammar (similar to egrep). Note : b and x are mutually exclusive; x overrides b.
i: Ignore case.
e: Allow the pattern to match an empty string.
s: Allow '.' to match newlines (similar to PERL's /s).
m: Allow the use of ^, $ as anchors to match start and end of a line, respectively (similar to PERL's /m).
g: Replace globally (similar to PERL's /g). In a RE_REPLACE, replaces all matches not just the first.
c: In a RE_REPLACE, do not copy to the output string portions of the input that did not match the regular expression.
Issue/Introduction
RE_REPLACE,RE_MATCH,RE_SEARCH accept reg_exp and options as parameters. What format should be used for defining these parameters ?