Article ID: KB0076099
Description
The TERR operator only returns a single "result" field. The TERR script has multiple outputs. Is there a way to return multiple results from the TERR operator into StreamBase?
Issue/Introduction
The result may contain a data.frame or lists of various types.
Resolution
TERR supports both "data.frame" and "list" types, either of which can be used to return multiple scalars (single values) and vectors (arrays).
For example:
result<-data.frame(...,stringsAsFactors=FALSE)
or
result<-list(...)
These two types correspond to the "dataFrame" and "list" fields of the terrResult field output from the TERR operator, which has this schema:
terrResult (
double (names list(string), values list(double)),
integer (names list(string), values list(int)),
boolean (names list(string), values list(boolean)),
string (names list(string), values list(string)),
dataFrame (
names list(string),
integers list((names list(string), values list(int))),
doubles list((names list(string), values list(double))),
logicals list((names list(string), values list(boolean))),
factors list((names list(string), indexes list(int), levels list(string))),
strings list((names list(string), values list(string))),
bytes list((names list(string), values list(int)))),
byte (names list(string), values list(int)),
list (
names list(string),
integers list((names list(string), values list(int))),
doubles list((names list(string), values list(double))),
logicals list((names list(string), values list(boolean))),
factors list((names list(string), indexes list(int), levels list(string))),
strings list((names list(string), values list(string))),
bytes list((names list(string), values list(int)))),
factor (names list(string), indexes list(int), levels list(string))
)
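As a minimal sketch of the two forms above, the following builds one list and one data.frame from several outputs. The variable names and values (avgPrice, tickCount, symbols, prices) are hypothetical stand-ins for a script's outputs, and the sketch assumes the operator returns the variable named "result".

```r
# Hypothetical values standing in for a script's multiple outputs.
avgPrice  <- 101.25                      # scalar double
tickCount <- 3L                          # scalar integer
symbols   <- c("AAA", "BBB", "CCC")      # vector of strings
prices    <- c(100.5, 101.0, 102.25)     # vector of doubles

# A list may hold elements of different types and lengths.
resultAsList <- list(avgPrice = avgPrice, tickCount = tickCount,
                     symbols = symbols, prices = prices)

# A data.frame needs columns of equal length.
resultAsFrame <- data.frame(symbols = symbols, prices = prices,
                            stringsAsFactors = FALSE)

# Assumption: the value assigned to "result" is what the operator returns.
result <- resultAsList
```

With the list form, each element would be expected to arrive under the sub-field that matches its type (doubles, integers, strings) inside the "list" field of terrResult.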
The TERR value may be more complex than the terrResult schema supported by StreamBase. For example, the statement:
result<-list(fit$q,fit$sigma,fit$Yx,fit$d,fit$seq,plist)
may emit a "null" for fit$Yx instead of a list of doubles if Yx is not a simple list of doubles but a list of sub-lists of doubles (here Y10, Y50, and Y90).
Instead, pass each sub-list individually, like so:
tempYx<-fit$Yx
result<-list(fit$q,fit$sigma,tempYx$Y10,tempYx$Y50,tempYx$Y90,fit$d,fit$seq,plist)
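When a nested element has many sub-lists, the same flattening can be done without naming each one. This is only a sketch: the "fit" object below is a hypothetical stand-in with the same shape as the one in this article, not real model output.

```r
# Hypothetical stand-in for the fit object described above.
fit <- list(q = c(0.1, 0.5, 0.9),
            sigma = 1.5,
            Yx = list(Y10 = c(1, 2), Y50 = c(3, 4), Y90 = c(5, 6)))

# Keep the simple elements and splice the sub-lists of Yx in at the top
# level; c() on two lists concatenates their elements and keeps names.
result <- c(fit[c("q", "sigma")], fit$Yx)

names(result)
```

The resulting list has one level only, with the elements q, sigma, Y10, Y50, and Y90.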
Another way to see why complex variables must be dereferenced is to consider that the schema of:
list(fit$Yx)
is effectively:
A. terrResult:
  list: <- extra level
    names list(string), ["Yx"]
    list:
      names list(string), ["Y10","Y50","Y90"]
      doubles:
        names list(string), [null]
        values list(doubles)
...which cannot fit into:
B. terrResult:
  list:
    names list(string),
    integers, doubles, logicals, factors, strings, bytes:
      names list(string),
      values list(type)
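One way to spot elements that carry the extra level before returning them is to test each element with is.list(). This is a sketch with a hypothetical "candidate" list. Note that is.list() is also TRUE for a data.frame placed inside a list.

```r
# Hypothetical result candidate: q is a plain vector, Yx is a list of sub-lists.
candidate <- list(q = c(0.1, 0.9),
                  Yx = list(Y10 = c(1, 2), Y50 = c(3, 4)))

# TRUE for every element that is itself a list (one level too deep).
isNested <- vapply(candidate, is.list, logical(1))

# Names of the elements that need to be dereferenced before returning.
nestedNames <- names(candidate)[isNested]
nestedNames
```

Here nestedNames contains only "Yx", the element that would need to be split into its sub-lists.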
To return the result as a data.frame use:
result<-data.frame(fit$q,fit$sigma,tempYx$Y10,tempYx$Y50,tempYx$Y90,fit$d,fit$seq,plist)
A data.frame, however, requires the result set to be rectangular: every column must have the same number of elements. Because of this, the individual columns do not need their own "names" lists. If the data.frame would not be rectangular, TERR reports an error through the operator, for example:
2016-08-03 13:30:26.291-0400 [Thread- ThreadPool - 9] INFO - TERR engine error com.streambase.sb.StreamBaseException: TERR: execution error Error in data.frame(...) : arguments imply differing number of rows: 200, 200, 5000, ...
To determine the maximum length of all lists to be combined into a data.frame, use:
maxLength<-max(length(fit$q),length(fit$sigma),length(tempYx$Y10),length(tempYx$Y50),length(tempYx$Y90),length(fit$d),length(fit$seq),length(plist))
and then fill in 'NA' values for the missing elements, like this:
outFitQ<-c(fit$q,rep(NA,maxLength-length(fit$q)))
outFitSigma<-c(fit$sigma,rep(NA,maxLength-length(fit$sigma)))
outY10<-c(tempYx$Y10,rep(NA,maxLength-length(tempYx$Y10)))
outY50<-c(tempYx$Y50,rep(NA,maxLength-length(tempYx$Y50)))
outY90<-c(tempYx$Y90,rep(NA,maxLength-length(tempYx$Y90)))
outFitD<-c(fit$d,rep(NA,maxLength-length(fit$d)))
outFitSeq<-c(fit$seq,rep(NA,maxLength-length(fit$seq)))
outPList<-c(plist,rep(NA,maxLength-length(plist)))
result<-data.frame(outFitQ,outFitSigma,outY10,outY50,outY90,outFitD,outFitSeq,outPList, stringsAsFactors = FALSE)
Using 'stringsAsFactors = FALSE' in the data.frame call is also recommended, because unpacking factors in StreamBase EventFlow can be complex.
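The padding step can also be written once and applied to every column. The sketch below uses a hypothetical helper, padToLength, and a hypothetical set of unequal-length columns. Neither is part of TERR or StreamBase.

```r
# Hypothetical columns of unequal length.
cols <- list(q = c(0.1, 0.5, 0.9), sigma = 1.5, Y10 = c(1, 2))

# Longest column determines the number of rows.
maxLength <- max(vapply(cols, length, integer(1)))

# Hypothetical helper: append NA until the vector has n elements.
padToLength <- function(x, n) c(x, rep(NA, n - length(x)))

# Pad every column, then build the rectangular data.frame.
padded <- lapply(cols, padToLength, n = maxLength)
result <- data.frame(padded, stringsAsFactors = FALSE)
```

Each padded column keeps its original values first, followed by NA entries up to maxLength rows.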
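To illustrate the difference, the sketch below builds the same hypothetical column both ways. A factor column is stored as integer indexes plus a levels vector, which corresponds to the "factors" entry (indexes, levels) in the schema above, while a character column is a plain vector of strings.

```r
# Hypothetical string column.
labels <- c("low", "high", "low")

asFactor <- data.frame(label = labels, stringsAsFactors = TRUE)
asString <- data.frame(label = labels, stringsAsFactors = FALSE)

# Factor: values are split into levels and integer indexes into them.
factorLevels  <- levels(asFactor$label)      # "high" "low"
factorIndexes <- as.integer(asFactor$label)  # 2 1 2

# Character: the values are kept as-is.
stringValues <- asString$label               # "low" "high" "low"
```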
To troubleshoot a TERR script, have the script write an intermediate result to a CSV file using:
write.csv(fit$Yx, file = "Yxdata.csv")
and then inspect the CSV file to see what data is actually held in the variable.
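If the variable is a list whose sub-lists have different lengths, write.csv can fail with the same "differing number of rows" error, because it first converts its input to a data.frame. A hedged alternative is to print the structure with str() and write each sub-list to its own file. The Yx value and the file names below are hypothetical, and the sketch assumes str() and tempdir() behave in TERR as they do in open-source R.

```r
# Hypothetical ragged list standing in for fit$Yx.
Yx <- list(Y10 = c(1, 2), Y50 = c(3, 4, 5))

# Print the nesting, type, and length of each element.
str(Yx)

# Write one CSV file per sub-list so unequal lengths are not a problem.
outDir <- tempdir()
for (name in names(Yx)) {
  write.csv(data.frame(value = Yx[[name]]),
            file = file.path(outDir, paste0("Yx_", name, ".csv")),
            row.names = FALSE)
}

writtenFiles <- list.files(outDir, pattern = "^Yx_.*\\.csv$")
```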