= ⟨(x − x̄)(x − x̄)′⟩ + (x̄)(x̄)′

where the last term can be defined as a “bias matrix”?

thanks!
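Reading the (partly garbled) formula above as the standard second-moment decomposition ⟨xx′⟩ = ⟨(x − x̄)(x − x̄)′⟩ + x̄x̄′, the identity is easy to check numerically; this sketch (mine, not from the comment) does so on random data:

```python
import numpy as np

# Hedged check of one reading of the formula above: the second moment
# of x decomposes as  <xx'> = <(x - xbar)(x - xbar)'> + xbar xbar',
# where the last term -- the outer product of the mean with itself --
# is the "bias matrix" the commenter asks about.
rng = np.random.default_rng(0)
X = rng.normal(loc=[1.0, -2.0], scale=1.5, size=(10000, 2))

second_moment = X.T @ X / len(X)        # <xx'>
xbar = X.mean(axis=0)
centered = X - xbar
cov = centered.T @ centered / len(X)    # <(x - xbar)(x - xbar)'>
bias_matrix = np.outer(xbar, xbar)      # xbar xbar'

print(np.allclose(second_moment, cov + bias_matrix))  # True
```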

http://www2.sas.com/proceedings/forum2008/360-2008.pdf

Show a training set for logistic regression where

a. convergence to a global maximum occurs;

b. no convergence occurs;

c. the weights tend to +∞;

d. the weights tend to −∞.
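One of these cases can be probed with a quick sketch (mine, not from the thread): gradient ascent on the logistic log-likelihood over a one-dimensional, linearly separable training set. Because every increase of the weight pushes the predicted probabilities closer to the labels, the likelihood has no finite maximizer and the weight grows without bound (case c):

```python
import numpy as np

# Linearly separable 1-D training set: all negatives left of 0,
# all positives right of 0.
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = 0.0
lr = 0.5
history = []
for step in range(2000):
    p = sigmoid(w * X)
    grad = np.sum((y - p) * X)  # gradient of the log-likelihood
    w += lr * grad
    history.append(w)

# The gradient stays positive for every finite w, so w keeps growing
# (roughly logarithmically) instead of converging.
print(history[99], history[1999])
```

Flipping the labels drives the weight to −∞ instead (case d); adding overlapping points of both classes on the same side of 0 makes the maximizer finite (case a); and a sufficiently oversized step size can make the iterates oscillate without settling (one way to get case b).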

This is the same for both linear and non-linear SVMs, except that for non-linear SVMs the primal formulation is very rarely used.

For a linear SVM, the weight vector w corresponding to the separating hyperplane is computed from the training data, but after that the training data is no longer needed. Classifying a point x can be done using the inner product of w and x.

For non-linear SVM, you are correct that the training data is needed in order to classify a point, but only the support vectors are needed. One hopes that the number of support vectors is small compared to the size of the entire training set. The reason the support vectors are needed is because the classification is being done with respect to a hyperplane in a higher dimensional space, and the weight vector for that hyperplane does not in general correspond to a single vector in the original space.
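The contrast above can be sketched in a few lines. The support vectors, dual coefficients (αᵢyᵢ), and bias below are made-up values for illustration; in a real SVM they come out of the training QP:

```python
import numpy as np

# Assumed (hypothetical) training results, for illustration only.
support_vectors = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])
dual_coef = np.array([0.8, -0.5, -0.3])  # alpha_i * y_i
b = 0.1

def rbf(u, v, gamma=0.5):
    return np.exp(-gamma * np.sum((u - v) ** 2))

def decision_kernel(x, kernel):
    # f(x) = sum_i alpha_i y_i K(sv_i, x) + b : only the support
    # vectors are needed at classification time, not the full set.
    return sum(c * kernel(sv, x) for c, sv in zip(dual_coef, support_vectors)) + b

x = np.array([1.5, -0.5])

# Linear kernel: the sum collapses into one weight vector w, so the
# support vectors can be discarded after training.
w = dual_coef @ support_vectors
linear_val = w @ x + b
kernel_val = decision_kernel(x, lambda u, v: u @ v)
print(np.isclose(linear_val, kernel_val))  # True: identical classifier

# RBF kernel: no single vector in the input space reproduces the
# decision function, so the support vectors themselves must be kept.
rbf_val = decision_kernel(x, rbf)
```

The linear-kernel collapse is exactly why the answer above says the training data can be thrown away for a linear SVM but not for a non-linear one.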
