This paper presents a novel approach to generate data-driven regression models that not only give reliable prediction of the observed data but also have smoother response surfaces and extra generalization capabilities with respect to extrapolation. These models are obtained as solutions of a genetic programming (GP) process, where selection is guided by a tradeoff between two competing objectives - numerical accuracy and the order of nonlinearity. The latter is a novel complexity measure that adopts the notion of the minimal degree of the best-fit polynomial, approximating an analytical function with a certain precision. Using nine regression problems, this paper presents and illustrates two different strategies for the use of the order of nonlinearity in symbolic regression via GP. The combination of optimization of the order of nonlinearity together with the numerical accuracy strongly outperforms ldquoconventionalrdquo optimization of a size-related expressional complexity and the accuracy with respect to extrapolative capabilities of solutions on all nine test problems. In addition to exploiting the new complexity measure, this paper also introduces a novel heuristic of alternating several optimization objectives in a 2-D optimization framework. Alternating the objectives at each generation in such a way allows us to exploit the effectiveness of 2-D optimization when more than two objectives are of interest (in this paper, these are accuracy, expressional complexity, and the order of nonlinearity). Results of the experiments on all test problems suggest that alternating the order of nonlinearity of GP individuals with their structural complexity produces solutions that are both compact and have smoother response surfaces, and, hence, contributes to better interpretability and understanding.
|Journal||IEEE Transactions on Evolutionary Computation|
|Publication status||Published - 2009|