How to define reg_exp and options in RE_REPLACE,RE_MATCH,RE

Products	Versions
Spotfire Statistica	10 and later versions

Description

Statistica has 3 text functions in the Function browser named : RE_REPLACE,RE_MATCH and RE_SEARCH. This article details the input parameters format for reg_exp,replace string and options. In general, the format acceptable is PERL format for these parameters.

Resolution

A few examples of the functions before discussing the parameters specifically :

A valid RE_REPLACE can be :
=RE_REPLACE(V1, "([0-9]*)", "", "g")
Replaces all numbers in Variable 1 with nothing. The "g" option means global replace so it replaces not just the first occurrence of a number but all of them. This one replaces numbers with nothing

=RE_REPLACE(V1, "([^0-9]*)", " ", "g")
reg_exp [^0-9] ^ inverts it and makes it match characters NOT in the range 0-9. So this one replaces all non numbers with a space, thus stripping out the non-numeric characters and leaving only the numbers

=RE_REPLACE("hello 123 there", ".*?([0-9]+).*", "$1", "")
extracts and returns 123
in this regex, the [.*?] at the beginning means match 0 or more characters, with the ? meaning minimal match; i.e., match the smallest possible thing before a number. ([0-9]+) matches one or more digits and puts it in match group $1. Then .* at the end matches everything after the number. The replace string is $1, the number.

=RE_REPLACE(V1, "([0-9]*)", "$1", "c")
Can be used for extracting the number out of a alphanumeric variable
() in the regex create groups you can refer to by number, $1. Here ([0-9]*) matches any number of consecutive digits and assigns it to match group 1, and we replace the whole string with $1, the desired output. The last parameter is for options - "c" means don't copy to the output portions of the string which did not match, so we get just the part that did match (namely the number)

=RE_REPLACE(V1, "([0-9]*)", "$`$'", "c")
To extract all special characters from a variable that has special characters and numbers.$` (backtick) is everything before the match. $' is everything after the match.The number is the match so it effectively simply removes that part

reg_ex/replace string :
Regular expressions are a wonderful kind of black magic, but no single example is going to explain them fully.You may define either a variable here or regular expression patterns formed with meta characters and repetition definitions used in PERL :

Metacharacters

char	meaning
`^`	beginning of string
`$`	end of string
`.`	any character except newline
`*`	match 0 or more times
`+`	match 1 or more times
`?`	match 0 or 1 times; or: shortest match
`\|`	alternative
`( )`	grouping; “storing”
`[ ]`	set of characters
`{ }`	repetition modifier
`\`	quote or special

Repetition

`a*`	zero or more `a`’s
`a+`	one or more `a`’s
`a?`	zero or one `a`’s (i.e., optional `a`)
`a{m}`	exactly `m` `a`’s
`a{m,}`	at least `m` `a`’s
`a{m,n}`	at least `m` but at most `n` `a`’s

Some examples of patterns that can be formed are :

expression	matches...
`abc`	`abc` (that exact character sequence, but anywhere in the string)
`^abc`	`abc` at the beginning of the string
`abc$`	`abc` at the end of the string
`a\|b`	either of `a` and `b`
`^abc\|abc$`	the string `abc` at the beginning or at the end of the string
`ab{2,4}c`	an `a` followed by two, three or four `b`’s followed by a `c`
`ab{2,}c`	an `a` followed by at least two `b`’s followed by a `c`
`ab*c`	an `a` followed by any number (zero or more) of `b`’s followed by a `c`
`ab+c`	an `a` followed by one or more `b`’s followed by a `c`
`ab?c`	an `a` followed by an optional `b` followed by a `c`; that is, either `abc` or `ac`
`a.c`	an `a` followed by any single character (not newline) followed by a `c`
`a\.c`	`a.c` exactly
`[abc]`	any one of `a`, `b` and `c`
`[Aa]bc`	either of `Abc` and `abc`
`[abc]+`	any (nonempty) string of `a`’s, `b`’s and `c’s` (such as `a`, `abba`, `acbabcacaa`)
`[^abc]+`	any (nonempty) string which does not contain any of `a`, `b` and `c` (such as `defg`)
`\d\d`	any two decimal digits, such as `42`; same as \d{2}

More examples can be online for Regular expressions in PERL or in Statistica help : Under Working with Spreadsheets >>Using Spreadsheets>>Using Formulas :
Spreadsheet Formulas - Regular Expression Functions
Regular Expression Overview
Spreadsheet Formulas - Regular Expression Function Examples

options:

The following options are available for RE_SEARCH, RE_MATCH, and RE_REPLACE:

b: Use POSIX basic grammar rather than PERL-like grammar. Similar to sed and grep.

x: Use POSIX extended grammar (similar to egrep).
Note : b and x are mutually exclusive; x overrides b.

i: Ignore case.

e: Allow the pattern to match an empty string.

s: Allow '.' to match newlines (similar to PERL's /s).

m: Allow the use of ^, $ as anchors to match start and end of a line, respectively (similar to PERL's /m).

g: Replace globally (similar to PERL's /g). In a RE_REPLACE, replaces all matches not just the first.

c: In a RE_REPLACE, do not copy to the output string portions of the input that did not match the regular expression.

Issue/Introduction

RE_REPLACE,RE_MATCH,RE_SEARCH accept reg_exp and options as parameters. What format should be used for defining these parameters ?

Additional Information

https://www.boost.org/doc/libs/1_50_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

Welcome to "KB Articles"