General information about the Predictive Model Markup Language (PMML) can be found here:
https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language
Its general structure is described by the DMG (Data Mining Group) consortium:
http://dmg.org/pmml/v4-2-1/GeneralStructure.html
The PMML example below is taken from the PMML node output of the Boosted Classification Tree model in the Basic_DM_Example workspace, and the TreeModel PMML reference (http://dmg.org/pmml/v4-2-1/TreeModel.html#xsdElement_TreeModel) is used as the guideline for the explanation.
------------------------------------------------------------------------------------------------------------
<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
<Header copyright="STATISTICA Data Miner, Copyright 1984-2018 TIBCO Software Inc. All rights reserved."></Header>
------------------------------------------------------------------------------------------------------------
This top section declares the PMML version (4.2) and its XML namespace, and the Header element records the copyright of the application that generated the model, in this case STATISTICA Data Miner.
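As a minimal sketch, assuming Python's standard xml.etree.ElementTree module, the version and copyright can be read from the root element. Every PMML element lives in the namespace declared on the root, so element lookups have to be namespace-qualified. The helper read_header is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <PMML> root element of the example.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Self-contained excerpt: only the root element and the Header are kept.
DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
<Header copyright="STATISTICA Data Miner, Copyright 1984-2018 TIBCO Software Inc. All rights reserved."></Header>
</PMML>"""


def read_header(xml_text):
    """Return (PMML version, Header copyright) of a PMML document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace-uri}LocalName".
    if root.tag != "{http://www.dmg.org/PMML-4_2}PMML":
        raise ValueError("not a PMML 4.2 document")
    header = root.find("pmml:Header", NS)
    copyright_text = header.get("copyright") if header is not None else None
    return root.get("version"), copyright_text


version, copyright_text = read_header(DOC)
print(version)  # 4.2
print(copyright_text)
```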
------------------------------------------------------------------------------------------------------------
<DataDictionary numberOfFields="18">
<DataField name="Credit Rating" optype="categorical" dataType="string">
<Value value="bad"></Value>
<Value value="good"></Value>
</DataField>
<DataField name="Duration of Credit" optype="continuous" dataType="double"></DataField>
<DataField name="Amount of Credit" optype="continuous" dataType="double"></DataField>
<DataField name="Age" optype="continuous" dataType="double"></DataField>
<DataField name="Balance of Current Account" optype="categorical" dataType="string">
<Value value="no running account"></Value>
<Value value="no balance"></Value>
<Value value="&lt;= $300"></Value>
<Value value=">$300"></Value>
......
</DataField>
</DataDictionary>
------------------------------------------------------------------------------------------------------------
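The DataDictionary section defines the fields selected by the user for use in the mining models, together with their types and value ranges. The optype attribute gives the operational type of a field (here categorical or continuous), dataType gives its data type, and the Value elements list the permitted categories of a categorical field. The numberOfFields attribute states how many DataField elements the dictionary contains (18 in this model; most of them are omitted above).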
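The sketch below, assuming Python's xml.etree.ElementTree, reads an abridged copy of the DataDictionary into a plain dictionary and checks numberOfFields against the number of DataField elements actually present. numberOfFields is set to 3 here because only three of the 18 fields are kept, and the helper name read_data_dictionary is hypothetical. Note that a literal "<" must be escaped as &lt; inside an XML attribute value, as in the "<= $300" category.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Abridged DataDictionary: 3 of the 18 fields, so numberOfFields is 3 here.
DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
<DataDictionary numberOfFields="3">
  <DataField name="Credit Rating" optype="categorical" dataType="string">
    <Value value="bad"/>
    <Value value="good"/>
  </DataField>
  <DataField name="Age" optype="continuous" dataType="double"/>
  <DataField name="Balance of Current Account" optype="categorical" dataType="string">
    <Value value="no running account"/>
    <Value value="no balance"/>
    <Value value="&lt;= $300"/>
    <Value value="&gt;$300"/>
  </DataField>
</DataDictionary>
</PMML>"""


def read_data_dictionary(xml_text):
    """Map each field name to its optype, dataType and listed category values."""
    dd = ET.fromstring(xml_text).find("pmml:DataDictionary", NS)
    fields = {}
    for df in dd.findall("pmml:DataField", NS):
        fields[df.get("name")] = {
            "optype": df.get("optype"),
            "dataType": df.get("dataType"),
            # Continuous fields have no Value children, so this list is empty.
            "values": [v.get("value") for v in df.findall("pmml:Value", NS)],
        }
    declared = dd.get("numberOfFields")  # optional attribute
    if declared is not None and int(declared) != len(fields):
        raise ValueError(f"numberOfFields={declared}, found {len(fields)} DataField elements")
    return fields


fields = read_data_dictionary(DOC)
print(fields["Balance of Current Account"]["values"])
# ['no running account', 'no balance', '<= $300', '>$300']
```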
------------------------------------------------------------------------------------------------------------
<MiningSchema>
<MiningField name="Credit Rating" usageType="predicted" />
...
</MiningSchema>
------------------------------------------------------------------------------------------------------------
The MiningSchema is the gatekeeper of its model: all data entering a model must pass through its MiningSchema. When a PMML document contains multiple models (for example, the individual trees of a boosted ensemble), each model has its own MiningSchema. In contrast, the DataDictionary holds the data definitions, which do not vary by model. Keep in mind that the MiningSchema lists the fields that have to be provided in order to apply the model (i.e., to score new data with the PMML document). A target variable is identified by a usageType of "predicted" or "target", and an independent variable by a usageType of "active", which is also the default when usageType is omitted.
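A minimal sketch, again assuming Python's xml.etree.ElementTree: it separates target fields from active inputs by usageType and lists the inputs that a record to be scored does not supply. Only the "Credit Rating" entry comes from the excerpt above; the two active MiningField entries ("Most Valuable Assets" and "Age") are assumed for illustration because the real list is elided, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# Hypothetical MiningSchema: the two active fields are assumed for illustration.
DOC = """<TreeModel xmlns="http://www.dmg.org/PMML-4_2" functionName="regression">
<MiningSchema>
  <MiningField name="Credit Rating" usageType="predicted"/>
  <MiningField name="Most Valuable Assets" usageType="active"/>
  <MiningField name="Age"/>
</MiningSchema>
</TreeModel>"""

TARGET_USAGE = {"predicted", "target"}


def split_schema(model_xml):
    """Return (target fields, active input fields) from one model's MiningSchema."""
    schema = ET.fromstring(model_xml).find("pmml:MiningSchema", NS)
    targets, inputs = [], []
    for mf in schema.findall("pmml:MiningField", NS):
        usage = mf.get("usageType", "active")  # "active" is the PMML default
        if usage in TARGET_USAGE:
            targets.append(mf.get("name"))
        elif usage == "active":
            inputs.append(mf.get("name"))
    return targets, inputs


def missing_inputs(record, inputs):
    """List the active fields that a record to be scored does not supply."""
    return [name for name in inputs if name not in record]


targets, inputs = split_schema(DOC)
print(targets)  # ['Credit Rating']
print(inputs)   # ['Most Valuable Assets', 'Age']
print(missing_inputs({"Age": 35.0}, inputs))  # ['Most Valuable Assets']
```

A real scoring engine may still accept a record with missing inputs if the MiningField defines a missing-value treatment; the check above only reports which fields are absent.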
------------------------------------------------------------------------------------------------------------
<TreeModel modelName="BoostTreeModel" functionName="regression" algorithmName="BoostedTrees" splitCharacteristic="multiSplit">
...
<Node score="-8.52058874993726e-003">
<True></True>
<Node score="-3.55882666891956e-002">
<SimplePredicate field="Most Valuable Assets" operator="equal" value="no assets"></SimplePredicate>
</Node>
<Node score="6.77198232180596e-003">
<CompoundPredicate booleanOperator="or">
<SimplePredicate field="Most Valuable Assets" operator="equal" value="life insurance"></SimplePredicate>
<SimplePredicate field="Most Valuable Assets" operator="equal" value="car"></SimplePredicate>
<SimplePredicate field="Most Valuable Assets" operator="equal" value="ownership of house or land"></SimplePredicate>
</CompoundPredicate>
</Node>
</Node>
</TreeModel>
</Segment>
<Segment id="57">
------------------------------------------------------------------------------------------------------------
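The TreeModel element defines one tree of the boosted ensemble; in this excerpt each TreeModel is wrapped in its own Segment (the excerpt ends as segment 57 begins). Each Node contains a predicate that decides whether a record reaches that node, followed by its child Nodes, and its score attribute holds the node's predicted value. Here the root Node has the predicate True, so every record enters it, and its two children split on "Most Valuable Assets": records with "no assets" reach the leaf scored -0.0356, while records with "life insurance", "car", or "ownership of house or land" satisfy the CompoundPredicate (booleanOperator="or") and reach the leaf scored 0.00677. The score of the leaf that a record reaches is this tree's prediction.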
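To show how these rules are applied, here is a minimal tree-scoring sketch assuming Python's xml.etree.ElementTree. It evaluates only the predicate types used in the example (True, False, SimplePredicate with equal/notEqual, CompoundPredicate with or/and), follows the first matching child at each level, and returns None when no child matches, which corresponds to PMML's default noTrueChildStrategy ("returnNullPrediction"). Missing-value strategies are not modelled, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_2"
PREDICATES = {"True", "False", "SimplePredicate", "CompoundPredicate"}

# The tree from the example, wrapped in a TreeModel so it parses on its own.
TREE = """<TreeModel xmlns="http://www.dmg.org/PMML-4_2" functionName="regression" splitCharacteristic="multiSplit">
<Node score="-8.52058874993726e-003">
  <True/>
  <Node score="-3.55882666891956e-002">
    <SimplePredicate field="Most Valuable Assets" operator="equal" value="no assets"/>
  </Node>
  <Node score="6.77198232180596e-003">
    <CompoundPredicate booleanOperator="or">
      <SimplePredicate field="Most Valuable Assets" operator="equal" value="life insurance"/>
      <SimplePredicate field="Most Valuable Assets" operator="equal" value="car"/>
      <SimplePredicate field="Most Valuable Assets" operator="equal" value="ownership of house or land"/>
    </CompoundPredicate>
  </Node>
</Node>
</TreeModel>"""


def local(elem):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return elem.tag.split("}")[-1]


def matches(pred, record):
    """Evaluate a predicate; only the operators used in the example are supported."""
    kind = local(pred)
    if kind in ("True", "False"):
        return kind == "True"
    if kind == "SimplePredicate":
        op, value = pred.get("operator"), record.get(pred.get("field"))
        if op == "equal":
            return value == pred.get("value")
        if op == "notEqual":
            return value != pred.get("value")
        raise NotImplementedError(op)
    if kind == "CompoundPredicate":
        results = [matches(child, record) for child in pred]
        op = pred.get("booleanOperator")
        if op == "or":
            return any(results)
        if op == "and":
            return all(results)
        raise NotImplementedError(op)
    raise NotImplementedError(kind)


def predicate_of(node):
    """Return the predicate element of a Node (its first predicate child)."""
    return next(child for child in node if local(child) in PREDICATES)


def score_tree(tree_model, record):
    """Descend through the first matching child at each level; return the leaf score."""
    node = tree_model.find(f"{{{PMML_NS}}}Node")
    if not matches(predicate_of(node), record):
        return None
    while True:
        children = [child for child in node if local(child) == "Node"]
        if not children:
            return float(node.get("score"))
        for child in children:
            if matches(predicate_of(child), record):
                node = child
                break
        else:
            return None  # no child matched: null prediction


tree = ET.fromstring(TREE)
print(score_tree(tree, {"Most Valuable Assets": "car"}))        # approx. 0.00677
print(score_tree(tree, {"Most Valuable Assets": "no assets"}))  # approx. -0.0356
print(score_tree(tree, {"Most Valuable Assets": "other"}))      # None
```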
------------------------------------------------------------------------------------------------------------
<Output>
<OutputField name="TreePredictedValue156" optype="continuous" dataType="double" feature="predictedValue" />
<OutputField name="UpdatedPredictedValue156" optype="continuous" dataType="double" feature="transformedValue">
<Apply function="+">
<FieldRef field="UpdatedPredictedValue155" />
<FieldRef field="TreePredictedValue156" />
</Apply>
</OutputField>
</Output>
------------------------------------------------------------------------------------------------------------
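The Output section describes the results that a model returns. TreePredictedValue156 (feature="predictedValue") holds the score produced by the current tree, and UpdatedPredictedValue156 (feature="transformedValue") is computed by an Apply element that adds this score to UpdatedPredictedValue155, which, judging by its name, is the running total after the previous tree. Repeating this pattern from tree to tree accumulates the boosted prediction one tree at a time.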
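To make the chaining concrete, here is a minimal sketch, assuming Python's xml.etree.ElementTree, that evaluates this Output block for a single tree. It supports only the FieldRef and Apply function="+" expressions that appear above. The values 0.5 for UpdatedPredictedValue155 and 0.125 for the tree's prediction are hypothetical; in the full document the former would come from the previous segment and the latter from scoring the current tree. The helper names are also hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

OUTPUT = """<Output xmlns="http://www.dmg.org/PMML-4_2">
<OutputField name="TreePredictedValue156" optype="continuous" dataType="double" feature="predictedValue" />
<OutputField name="UpdatedPredictedValue156" optype="continuous" dataType="double" feature="transformedValue">
  <Apply function="+">
    <FieldRef field="UpdatedPredictedValue155" />
    <FieldRef field="TreePredictedValue156" />
  </Apply>
</OutputField>
</Output>"""


def evaluate(expr, env):
    """Evaluate the FieldRef and Apply(+) expressions used in this Output."""
    kind = expr.tag.split("}")[-1]
    if kind == "FieldRef":
        return env[expr.get("field")]
    if kind == "Apply" and expr.get("function") == "+":
        left, right = (evaluate(arg, env) for arg in expr)
        return left + right
    raise NotImplementedError(kind)


def compute_outputs(output_xml, tree_prediction, env):
    """Fill in each OutputField in document order; later fields may use earlier ones."""
    env = dict(env)  # do not mutate the caller's values
    for field in ET.fromstring(output_xml).findall("pmml:OutputField", NS):
        feature = field.get("feature")
        if feature == "predictedValue":
            env[field.get("name")] = tree_prediction
        elif feature == "transformedValue":
            env[field.get("name")] = evaluate(field[0], env)
    return env


# Hypothetical inputs: running total after tree 155 and the score of tree 156.
env = compute_outputs(OUTPUT, 0.125, {"UpdatedPredictedValue155": 0.5})
print(env["UpdatedPredictedValue156"])  # 0.625
```

Passing each UpdatedPredictedValue on as the starting value for the next tree's Output repeats this addition across the whole ensemble.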