- Oral presentation
A new approach to kernel based data analysis algorithms
Chemistry Central Journal volume 3, Article number: O6 (2009)
Kernel based methods (KBMs) [1, 2] are arguably the best data analysis techniques currently available [3, 4]. Unlike neural networks, whose training problem has several local minima besides the global minimum, a kernel based fitting or classification problem is a convex optimization problem with a single minimum. However, finding this minimum (and thereby obtaining the optimal parameters of a given observational model) in practice requires the manipulation, such as inversion, of large matrices. This has been challenging even when the number of data points is just over a few thousand.
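As an illustration of why training reduces to a matrix problem, the following minimal sketch fits a kernel ridge regression model, a standard convex kernel method used here as a stand-in and not the scheme proposed in this abstract. The Gaussian kernel, the parameters `gamma` and `lam`, and all function names are assumptions made for the example. The unique minimum is obtained by solving the linear system (K + lam I) alpha = y with a direct Cholesky factorization.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two scalars (an assumed kernel choice)."""
    return math.exp(-gamma * (x - y) ** 2)


def cholesky_solve(a, b):
    """Solve a @ x = b for a symmetric positive-definite matrix a."""
    n = len(a)
    # Factor a = L L^T, storing the lower triangle L in `low`.
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    # Forward substitution: L z = b.
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    # Back substitution: L^T x = z.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def fit_kernel_ridge(xs, ys, gamma=1.0, lam=1e-3):
    """Return the coefficients alpha solving (K + lam * I) alpha = ys."""
    n = len(xs)
    gram = [
        [gaussian_kernel(xs[i], xs[j], gamma) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return cholesky_solve(gram, ys)


def predict(xs, alpha, x_new, gamma=1.0):
    """Evaluate the fitted model at x_new as a weighted sum of kernels."""
    return sum(a * gaussian_kernel(x, x_new, gamma) for a, x in zip(alpha, xs))


# Toy data (hypothetical): five samples of sin(x).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(x) for x in xs]
alpha = fit_kernel_ridge(xs, ys)
```

The N x N Gram matrix is built and factorized in full here, which is exactly the step that becomes the bottleneck as the number of data points grows.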
The well-established direct methods for updating or inverting huge matrices fail because of the large increase in core-memory storage and CPU time they require, even for moderately sized systems. The root of the problem is that direct methods have O(N²) core-memory storage requirements and a CPU time that scales as O(N³), where N is the dimension of the matrix (here, the number of data points). Despite the advances in computer power, "conventional" computers can only solve relatively small problems (N ≈ 10⁴ to 10⁵).
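The scaling can be made concrete with a short back-of-the-envelope calculation. The sketch below assumes a dense matrix stored in double precision (8 bytes per entry) and counts only the leading N³ term of the operation count, ignoring the method-dependent constant factor; the function names are illustrative.

```python
def dense_matrix_gib(n, bytes_per_entry=8):
    """Memory in GiB for a dense n x n matrix (assumes 8-byte doubles)."""
    return n * n * bytes_per_entry / 2**30


def cubic_operations(n):
    """Leading-order operation count of a direct method, constant omitted."""
    return n ** 3


for n in (10**3, 10**4, 10**5):
    print(f"N = {n:>6}: {dense_matrix_gib(n):10.3f} GiB, "
          f"~{cubic_operations(n):.1e} operations")
```

Under these assumptions the matrix alone occupies about 0.75 GiB at N = 10⁴ and about 75 GiB at N = 10⁵, while each tenfold increase in N multiplies the operation count by 1000.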
Another outstanding drawback of KBMs is the difficulty of choosing an appropriate kernel function for a given data set.
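A common baseline for this choice, shown here only for illustration and not the systematic approach of this work, is to compare candidate kernels by leave-one-out cross-validation. The sketch below does so for a Nadaraya-Watson kernel regression estimator; the two candidate kernels, the bandwidth grid, the toy data, and all function names are assumptions.

```python
import math


def gaussian(u):
    """Gaussian kernel profile."""
    return math.exp(-0.5 * u * u)


def epanechnikov(u):
    """Epanechnikov kernel profile (zero outside |u| < 1)."""
    return max(0.0, 1.0 - u * u)


def nw_predict(xs, ys, x_new, kernel, h):
    """Nadaraya-Watson estimate at x_new with bandwidth h."""
    weights = [kernel((x_new - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # No training point carries weight: fall back to the mean response.
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


def loo_error(xs, ys, kernel, h):
    """Mean squared leave-one-out prediction error."""
    err = 0.0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        err += (nw_predict(rest_x, rest_y, xs[i], kernel, h) - ys[i]) ** 2
    return err / len(xs)


def select_kernel(xs, ys, candidates):
    """Return the (name, kernel, h) candidate with the lowest LOO error."""
    return min(candidates, key=lambda c: loo_error(xs, ys, c[1], c[2]))


# Toy data (hypothetical): samples of sin(x) on [0, 2].
xs = [i / 10 for i in range(21)]
ys = [math.sin(x) for x in xs]
candidates = [
    (name, kernel, h)
    for name, kernel in (("gaussian", gaussian), ("epanechnikov", epanechnikov))
    for h in (0.1, 0.3, 1.0)
]
best = select_kernel(xs, ys, candidates)
```

Such a search evaluates every candidate on every held-out point, so its cost grows quickly with both the number of candidates and the number of data points.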
In this paper we propose a computationally efficient training scheme for KBMs that obtains the global minimum. We also present a systematic approach to selecting appropriate kernel functions. Some preliminary results on chemical data sets are presented.
1. Nadaraya EA: Theory Prob Appl. 1964, 10: 186. doi:10.1137/1110024.
2. Watson GS: Sankhya Ser A. 1964, 26: 359.
3. Vapnik V: The Nature of Statistical Learning Theory. 1995, Springer-Verlag, New York.
4. Shawe-Taylor J, Cristianini N: Kernel Methods for Pattern Analysis. 2004, Cambridge University Press.
5. Chua KS: Pattern Recognition Letters. 2003, 24: 75. doi:10.1016/S0167-8655(02)00190-3.
6. Mangasarian OL, Musicant DR: J Mach Learn Res. 2001, 1: 161. doi:10.1162/15324430152748218.