A tutorial on support vector regression

Abstract

In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
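
To make the objects behind this summary concrete: the standard ε-SV regression primal, which the tutorial develops, trades the flatness of the regressor against deviations from the targets larger than ε,

$$
\min_{w,\,b,\,\xi,\,\xi^*} \quad \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{\ell} (\xi_i + \xi_i^*)
\qquad \text{subject to} \qquad
\begin{cases}
y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \\
\langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \\
\xi_i,\ \xi_i^* \ge 0,
\end{cases}
$$

and the snippet below is a minimal sketch of fitting such a machine on a one-dimensional toy problem. It assumes scikit-learn's SVR as an off-the-shelf solver for the underlying quadratic program (an SMO-style method of the kind the tutorial surveys); the data, kernel, and parameter values are illustrative choices, not anything prescribed by the paper.

```python
# A minimal epsilon-SV regression sketch on noisy 1-D data.
# Assumes scikit-learn's SVR as the QP solver; illustrative only.
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3.0, 3.0, size=(50, 1)), axis=0)
y = np.sinc(X).ravel() + 0.1 * rng.randn(50)  # sinc target plus Gaussian noise

# RBF kernel; C trades flatness against deviations beyond the epsilon-tube,
# epsilon sets the width of the insensitive zone around the regressor.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=0.5)
svr.fit(X, y)

# Training points on or outside the tube become support vectors;
# typically only a fraction of the data ends up in this set.
print("number of support vectors:", len(svr.support_))
print("first five predictions:", svr.predict(X[:5]))
```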

Cite this article

Smola, A.J., Schölkopf, B. A tutorial on support vector regression. Statistics and Computing 14, 199–222 (2004). https://doi.org/10.1023/B:STCO.0000035301.49549.88
