Section: New Results
Structured Penalties for Log-linear Language Models
Participants: Anil Nelakanti [correspondent], Cédric Archambeau, Francis Bach, Guillaume Bouchard.
Language models can be formalized as log-linear regression models where the input features represent previously observed contexts up to a certain length.
Language models are crucial components of advanced natural language processing pipelines, such as speech recognition [45], machine translation [47], or information retrieval [92]. When a sequence of symbols is observed, a language model predicts the probability of occurrence of the next symbol in the sequence. Models based on so-called back-off smoothing have shown good predictive power [60]. In particular, Kneser-Ney (KN) smoothing and its variants [66] still achieve state-of-the-art results more than a decade after they were originally proposed. Smoothing methods are in fact clever heuristics that require their parameters to be tuned in an ad-hoc fashion. Hence, more principled ways of learning language models have been proposed, based on maximum entropy [50] or conditional random fields [81], or by adopting a Bayesian approach [94].
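To make the log-linear formalization above concrete, the sketch below (an illustration, not the implementation developed in this work) builds a multinomial logistic model over a toy corpus: every suffix of the context up to a maximum length m is a binary feature, and the next-symbol distribution is a softmax over the summed weights of the active features. The toy corpus, the context length m = 3, and the plain (unpenalized) gradient-ascent loop are assumptions made purely for the example.

```python
# Minimal sketch of a log-linear language model with suffix-context features.
# Corpus, context length m, and the training loop are illustrative assumptions.
import numpy as np
from collections import defaultdict

corpus = list("abracadabra")           # toy symbol sequence (assumption)
vocab = sorted(set(corpus))
V = len(vocab)
sym2id = {s: i for i, s in enumerate(vocab)}
m = 3                                  # maximum context length (assumption)

# Each observed suffix of length 1..m of the context is one feature dimension.
feat2id = defaultdict(lambda: len(feat2id))

def context_features(history, grow=True):
    """Indices of the suffix features of `history`, up to length m."""
    feats = []
    for k in range(1, min(m, len(history)) + 1):
        key = tuple(history[-k:])
        if grow or key in feat2id:     # at test time, ignore unseen suffixes
            feats.append(feat2id[key])
    return feats

# Training examples: (active feature indices, id of the next symbol).
examples = [(context_features(corpus[:t]), sym2id[corpus[t]])
            for t in range(1, len(corpus))]
d = len(feat2id)                       # number of observed features

W = np.zeros((d, V))                   # one weight vector per feature and output symbol

def next_symbol_probs(active):
    """Softmax over scores obtained by summing the weights of the active features."""
    scores = W[active].sum(axis=0)
    scores -= scores.max()             # numerical stability
    p = np.exp(scores)
    return p / p.sum()

# A few epochs of gradient ascent on the (unpenalized) log-likelihood.
lr = 0.5
for _ in range(50):
    for active, y in examples:
        p = next_symbol_probs(active)
        grad = -p
        grad[y] += 1.0                 # gradient of log p(y | context) w.r.t. the scores
        W[active] += lr * grad         # same gradient for every active suffix feature

probs = next_symbol_probs(context_features(list("abr"), grow=False))
print({s: round(float(probs[i]), 3) for s, i in sym2id.items()})
```

On the toy corpus the probability mass after the context "abr" concentrates on "a", the symbol that always follows it in training; the number of suffix features d grows quickly with m, which is what motivates the penalized estimation discussed next.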
We focus on penalized maximum likelihood estimation in log-linear models.
In contrast to language models based on unstructured penalties such as the ℓ1 or squared ℓ2 norms, we show that structured tree norms provide an efficient framework for language modeling.
Furthermore, we give the first efficient algorithm for learning with such structured tree penalties, making it possible to leverage longer contexts.
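As an illustration of a structured tree penalty (a sketch under assumptions, not the exact penalty or weights used in this work), the snippet below arranges a handful of context suffixes in a suffix trie, where the descendants of a suffix are the longer contexts that extend it on the left, and sums a weighted ℓ2 group norm over every subtree. The toy feature set, the uniform group weights, and the choice of the ℓ2 inner norm are assumptions for the example.

```python
# Hedged sketch of an ell_2 tree norm over context features in a suffix trie.
# Feature set, weights, and inner norm are illustrative assumptions.
import numpy as np

V = 4                                    # vocabulary size (assumption)
# Toy suffix features; ('a', 'b', 'r') extends ('b', 'r'), which extends ('r',).
features = [('a',), ('b',), ('r',), ('b', 'r'), ('a', 'b', 'r')]
feat2id = {f: i for i, f in enumerate(features)}

rng = np.random.default_rng(0)
W = rng.normal(size=(len(features), V))  # one weight vector per suffix feature

def subtree_rows(suffix):
    """Row indices of `suffix` and of all observed suffixes extending it on the left."""
    return [i for f, i in feat2id.items() if f[len(f) - len(suffix):] == suffix]

def l2_tree_norm(W, weight=1.0):
    """Sum over trie nodes of the (uniformly weighted) ell_2 norm of the subtree parameters."""
    return sum(weight * np.linalg.norm(W[subtree_rows(f)]) for f in feat2id)

print(l2_tree_norm(W))
```

Because each group contains a node together with all of its descendants, driving a group to zero removes an entire subtree, so parameters of long contexts tend to be kept only when their shorter suffixes are also active; this is the kind of structure a tree norm imposes on sequential data.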