• DocumentCode
    60506
  • Title
    Second Order Methods for Optimizing Convex Matrix Functions and Sparse Covariance Clustering
  • Author
    Chin, Gillian M. ; Nocedal, Jorge ; Olsen, Peder A. ; Rennie, Steven J.
  • Author_Institution
    Dept. of Ind. Eng. & Manage. Sci., Northwestern Univ., Evanston, IL, USA
  • Volume
    21
  • Issue
    11
  • fYear
    2013
  • fDate
    Nov. 2013
  • Firstpage
    2244
  • Lastpage
    2254
  • Abstract
    A variety of first-order methods have recently been proposed for solving matrix optimization problems arising in machine learning. The premise for utilizing such algorithms is that second-order information is too expensive to employ, and so simple first-order iterations are likely to be optimal. In this paper, we argue that second-order information is in fact efficiently accessible in many matrix optimization problems, and can be effectively incorporated into optimization algorithms. We begin by reviewing how certain Hessian operations can be conveniently represented in a wide class of matrix optimization problems, and provide the first proofs for these results. Next we consider a concrete problem, namely the minimization of the ℓ1-regularized Jeffreys divergence, and derive formulae for computing Hessians and Hessian-vector products. This allows us to propose various second-order methods for solving the Jeffreys divergence problem. We present extensive numerical results illustrating the behavior of the algorithms and apply the methods to a speech recognition problem. We compress full-covariance Gaussian mixture models utilized for acoustic models in automatic speech recognition. By discovering clusters of (sparse inverse) covariance matrices, we can reduce the number of covariance parameters by a factor exceeding 200, while still outperforming the word error rate (WER) performance of a diagonal covariance model that has 20 times fewer covariance parameters than the original acoustic model.
  • Keywords
    Gaussian processes; covariance analysis; iterative methods; learning (artificial intelligence); optimisation; speech recognition; ℓ1 regularized Jeffreys divergence; Hessian operations; Hessian vector products; Jeffreys divergence problem; acoustic model; automatic speech recognition; convex matrix functions; covariance Gaussian mixture models; covariance parameters; diagonal covariance model; first-order iterations; first-order methods; machine learning; matrix optimization problems; optimization algorithms; second order information; second order methods; second-order information; sparse covariance clustering; speech recognition problem; Large scale systems; Optimization; Pattern recognition; Sparse matrices; Convexity; FISTA; Hessian structure; Jeffreys divergence; Kullback Leibler divergence; LASSO; Newton's method; clustering
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2013.2263142
  • Filename
    6516015