Why synchronous tree substitution grammars

[Figure: square boxes represent data; rounded boxes represent programs.]

Restricting the problem to sentence planning also means that our system [...]. At a high level, our system uses heuristic alignments between individual nodes of these trees to initialize the model and then iteratively samples possible alternative and novel alignments to determine the best set of synchronous derivations for TP and LF trees.

[Figure 3: Possible elementary trees for the TP (top row) and LF (bottom row) of Figure 2, omitting some detail for simplicity.]

There are several options for segmenting these trees into synchronous tree substitution grammar rules. First, we could simply have this entire tree memorized in our grammar as an elementary tree. This would make the derivation trivial but would also result in a totally ungeneralizable rule. On the other hand, we could have the equivalent of a CFG derivation for the tree, consisting of one-level rules that would be applied repeatedly to expand non-terminal nodes into a complete tree. The third option, illustrating the appeal of using a tree substitution grammar, involves elementary trees of intermediate size, like those in Figure 3. In these elementary trees, the empty node sites at the end of an arc represent substitution sites, where another elementary tree must be expanded for a complete derivation.
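To ground the terminology, here is a minimal Python sketch of elementary trees with open substitution sites. The Node and substitute names, the dictionary encoding of arcs, and the labels are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    # Maps an outgoing arc label either to a child Node or to None,
    # marking an open substitution site at the end of that arc.
    children: dict = field(default_factory=dict)

    def frontier(self):
        """Yield (node, arc_label) locations of open substitution sites."""
        for arc, child in self.children.items():
            if child is None:
                yield (self, arc)
            else:
                yield from child.frontier()

def substitute(site, subtree):
    """Expand an open substitution site with the root of another elementary tree."""
    node, arc = site
    assert node.children[arc] is None, "site already filled"
    node.children[arc] = subtree

# A TP elementary tree rooted at contrast with two open Arg sites,
# loosely in the spirit of Figure 3 (labels invented for illustration).
contrast = Node("contrast", {"Arg1": None, "Arg2": None})
substitute(next(contrast.frontier()), Node("assert_food_quality"))
```

A derivation is complete exactly when frontier() yields nothing, matching the requirement that every substitution site be expanded.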

3 Synchronous TSGs

Synchronous tree substitution grammars (TSGs) are a subset of synchronous tree adjoining grammars, both of which represent the relationships between pairs of trees (Shieber and Schabes; Eisner). Figure 2 shows the text plan and logical form trees for the sentence "Sonia Rose has very good food quality, but Bienvenue has excellent food quality." In our synchronous TSGs, a tree location is a node paired with the label of its incoming arc. A synchronous tree substitution grammar, then, consists of pairs of elementary trees with aligned substitution sites, as described in Section 5. For example, we can combine the TP elementary tree rooted at contrast with the LF elementary tree rooted at but, aligning each ⟨contrast, Arg⟩ substitution site in the TP to the ⟨but, First⟩ and ⟨but, Next⟩ sites in the LF. A plate diagram of the model is presented in Section 5.

4 Dirichlet Processes

The Dirichlet process (DP) provides a natural way to trade off between prior expectations and observations. As we observe more data, however, we rely less on our priors in general. Hierarchical DPs (HDPs) let lower-level DPs draw their base distributions from a shared higher-level DP; this allows us to learn more informative prior distributions for the lower-level DPs and to improve the quality of our predictions.
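This trade-off is visible in the DP's posterior predictive rule. The sketch below is a generic Chinese-restaurant-process formulation; the concentration parameter alpha and base distribution P0 are stand-ins rather than the paper's notation.

```python
from collections import Counter

class DirichletProcess:
    """Posterior predictive of a DP over a discrete space (e.g. elementary trees)."""

    def __init__(self, alpha, base_prob):
        self.alpha = alpha          # concentration parameter
        self.base_prob = base_prob  # P0(x): the base distribution
        self.counts = Counter()
        self.total = 0

    def prob(self, x):
        # (count(x) + alpha * P0(x)) / (n + alpha): with few observations
        # the prior P0 dominates; as n grows, empirical counts take over.
        return (self.counts[x] + self.alpha * self.base_prob(x)) / (self.total + self.alpha)

    def observe(self, x):
        self.counts[x] += 1
        self.total += 1
```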

We begin by defining TSG base distributions for text plans and logical forms independently. Our generative story begins with sampling an elementary tree for the root of the tree and then repeating this sampling procedure for each frontier node. Since the tree locations l corresponding to frontier nodes are completely determined by the current expansion of the tree, we only need to define a distribution over possible elementary trees e conditioned on the tree location, P(e | l). These probabilities are each modeled as a DP with a uniform prior over possible tree locations.
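A sketch of this top-down generative story, reusing the Node and substitute helpers from the earlier sketch; sample_elementary_tree is a stand-in for a draw from P(e | l), which could for instance be backed by one DirichletProcess per tree location as above.

```python
def sample_derivation(sample_elementary_tree, root_location):
    """Sample a complete derivation by repeatedly expanding frontier sites."""
    root = sample_elementary_tree(root_location)
    derivation = [(root_location, root)]
    agenda = list(root.frontier())               # frontier is determined by the expansion
    while agenda:
        site = agenda.pop()
        subtree = sample_elementary_tree(site)   # draw from P(e | l)
        substitute(site, subtree)                # fill the open site
        derivation.append((site, subtree))
        agenda.extend(subtree.frontier())        # newly opened sites
    return derivation
```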

Our Gibbs sampler adapts the blocked sampling approach of Cohn et al. For each text in the corpus, we resample a synchronous derivation for the entire text before updating the associated model parameters.
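A sketch of that blocked update loop; the model object and its methods (initial_derivation, remove_counts, resample_derivation, add_counts) are placeholders for the model-specific machinery, not the authors' code.

```python
def blocked_gibbs(corpus, model, iterations=1000):
    """Blocked Gibbs sampling over synchronous derivations, one text at a time."""
    derivations = {text: model.initial_derivation(text) for text in corpus}
    for _ in range(iterations):
        for text in corpus:
            model.remove_counts(derivations[text])               # exclude this text's counts
            derivations[text] = model.resample_derivation(text)  # whole-text block
            model.add_counts(derivations[text])                  # update model parameters
    return derivations
```

Resampling the whole derivation as one block, rather than one rule at a time, helps the sampler escape locally coherent but globally poor segmentations.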

[...] parsing the other. This option leverages a large amount of manual domain knowledge engineering and is not in general amenable to latent variable problems.

A simpler alternative to this two-step approach is to use a generative model of synchronous derivation and simultaneously segment and weight the elementary tree pairs to maximize the probability of the training data under that model; the simplest exemplar of this approach uses expectation maximization (EM) (Dempster et al.).

This approach has two frailties. First, EM search over the space of all possible rules is computationally impractical. Second, even if such a search were practical, the method is degenerate, pushing the probability mass towards larger rules in order to better approximate the empirical distribution of the data (Goldwater et al.). Indeed, the optimal grammar would be one in which each tree pair in the training data is its own rule.

Therefore, proposals for using EM for this task start with a precomputed subset of rules, with EM used just to assign weights within this grammar. A nonparametric Bayesian treatment can instead learn the segmentation itself, trading off fit to the training data against the complexity of the grammar. Such models have been used as generative solutions to several other segmentation problems, such as word segmentation (Goldwater et al.). A Dirichlet process (DP) prior is typically used to achieve this interplay.

In this work, we use an extension of the aforementioned models of generative segmentation for STSG induction, and describe an algorithm for posterior inference under this model that is tailored to the task of extractive sentence compression. This task is characterized by the availability of word alignments, providing a clean testbed for investigating the effects of grammar extraction. We achieve substantial improvements against a number of baselines, including EM, support vector machine (SVM) based discriminative training, and variational Bayes (VB).

By comparing our method to a range of other methods that are subject differentially to the two problems, we can show that both play an important role in performance limitations, and that our method helps address both as well. We then describe the experiments in extractive sentence compression and present our results in contrast with alternative algorithms. We conclude by giving examples of compression patterns learned by the Bayesian method.

In extractive sentence compression, which we focus on in this paper, an order-preserving subset of the words in the sentence is selected to form the summary; that is, we summarize by deleting words (Knight and Marcu). In supervised sentence compression, the goal is to generalize from a parallel training corpus of sentences (source) and their compressions (target) to unseen sentences in a test set, predicting their compressions.
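Concretely, a compression is just a keep/delete mask over the source tokens; the sketch below is illustrative only, reusing the example sentence from earlier on this page with an invented mask.

```python
def compress(words, keep):
    """Extractive compression: keep an order-preserving subset of the words."""
    return [w for w, k in zip(words, keep) if k]

source = "Sonia Rose has very good food quality".split()
keep_mask = [True, True, True, False, True, True, True]  # hypothetical decision
print(" ".join(compress(source, keep_mask)))
# -> Sonia Rose has good food quality
```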

See Figure 1 for an example of how an STSG with these rules would operate in synchronously generating our example sentence pair. STSG is a convenient choice of formalism for a number of reasons.

First, it eliminates the isomorphism and strong independence assumptions of SCFGs. Second, the ability to have rules deeper than one level provides a principled way of modeling lexicalization, whose importance has been emphasized (Galley and McKeown; Yamangil and Nelken). Similarly, [...]

[Figure 2: Gibbs sampling updates.]

We describe such a process for sampling a corpus of tree pairs t.

We then sample a sequence of elementary tree pairs to serve as a derivation for each observed derived tree pair.

If the decision to expand was made, we sample an appropriate rule from a PCFG, which we estimate ahead of time from the training corpus. We expand the nonterminal using this rule, and then repeat the same procedure for every generated child that is a nonterminal, until there are no generated nonterminal children left. This is done independently for both e_s and e_t.
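The sketch below illustrates this expansion step for one side of a rule. The toy PCFG table, the expand_prob parameter standing in for the expansion decision, and the tuple encoding of trees are all assumptions for illustration.

```python
import random

# A toy PCFG, assumed to be estimated ahead of time from the training
# corpus; any symbol missing from this table is treated as a terminal.
PCFG = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("Sonia",), 0.5), (("food",), 0.5)],
    "VP": [(("has", "NP"), 1.0)],
}

def grow_side(symbol, expand_prob=0.5):
    """Grow one side (e_s or e_t) of an elementary tree top-down."""
    if symbol not in PCFG:                 # terminal: nothing to expand
        return symbol
    if random.random() > expand_prob:      # leave as a frontier (substitution) site
        return (symbol, None)
    rules, weights = zip(*PCFG[symbol])    # sample a rule for this nonterminal
    children = random.choices(rules, weights=weights, k=1)[0]
    return (symbol, [grow_side(c, expand_prob) for c in children])
```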

Finally, we sample an alignment between the frontier nodes uniformly at random out of all possible alignments. The insertion rule case is symmetric. The base distribution generates e_s using the same process described for synchronous rules above. This simple base distribution does nothing to enforce an alignment between the internal nodes of e_s and e_t. One may come up with more sophisticated base distributions.
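The uniform alignment step might be realized as below. The exact alignment space is not recoverable from this excerpt, so the sketch assumes an alignment is an injective map from the smaller frontier to the larger one, with leftover sites unaligned.

```python
import random

def sample_alignment(source_sites, target_sites):
    """Draw one of the possible frontier alignments uniformly at random."""
    if len(source_sites) <= len(target_sites):
        # random.sample returns a uniformly random ordered selection,
        # so pairing it with the fixed source order is a uniform draw.
        chosen = random.sample(target_sites, k=len(source_sites))
        return list(zip(source_sites, chosen))
    chosen = random.sample(source_sites, k=len(target_sites))
    return list(zip(chosen, target_sites))
```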

In this article, inspired by lexicalized PCFG, which is widely used in monolingual parsing, we propose to upgrade the STSG (synchronous tree substitution grammar) based syntax translation model with bilingually lexicalized STSG.

Using the string-to-tree translation model as a case study, we present generative and discriminative models to integrate lexicalized STSG into the translation model. Both small- and large-scale experiments on Chinese-to-English translation demonstrate that the proposed lexicalized STSG provides superior rule selection in decoding and substantially improves translation quality.


