Key words: Chinese parsing, Chinese generation, reversible grammar, HPSG
This paper presents a reversible Chinese unification grammar named CPSG. The lexicalized and integrated design of CPSG embodies the general spirit of the modern linguistic theory Head-driven Phrase Structure Grammar (HPSG; Pollard & Sag 1987, 1994). We have implemented a prototype of CPSG using the ALE formalism in Prolog (Carpenter & Penn 1994).
CPSG covers Chinese morphology, syntax and semantics in a novel integrated language model (Figure 1; for the interface between morphology and syntax, see Li 1997; for the interface between syntax and semantics, see Li 1996). The CPSG model stands in sharp contrast to the conventional design, in which clear-cut grammar components are applied successively (Figure 2; see the survey in Feng 1996). We will show that our model is better suited to, and more efficient for, Chinese analysis and generation.
Grammar reversibility is a highly desirable feature for multi-lingual machine translation applications (Hutchins & Somers 1992, Huang 1986, 1987). To test its reversibility, we have applied the CPSG prototype to an experiment in bi-directional machine translation between English and Chinese. The machine translation engine developed in our Natural Language Lab is based on the shake-and-bake design, a novel approach to machine translation suited to unification grammars (Whitelock 1992, 1994, Beaven 1992, Brew 1992). The experimental results meet our design objective and verify the feasibility of the CPSG approach.
Notes for NWLC-97, UBC, Vancouver
Outline of An HPSG-style Chinese Reversible Grammar
Wei LI (firstname.lastname@example.org)
Linguistics Department, Simon Fraser University
Key words: lexicalist approach, integrated language model, HPSG,
reversible grammar, bi-directional machine translation,
Chinese computational grammar,
Chinese word identification, Chinese parsing
1.1. design philosophy
Two major obstacles in writing a Chinese computational grammar:
(1) lack of serious study of the Chinese lexical base
    a well-designed lexicon is crucial for a successful computational system
    theoretical linguists have made fruitful efforts (e.g. Li Linding), but their results lack formalization
    computational linguists need more patience in adapting and formalizing these results:
    it is a huge task, but it has to be done if a non-toy system is the target
(2) lack of effective interaction between morphology, syntax and semantics
    ambiguity in word identification makes it hard to interface morphology and syntax:
    a theoretical defect of a morphology preprocessor (segmenter)
    e.g. ABC: ABC, or A | BC, or AB | C, or A | B | C?
    active and passive patterns share the same surface form, so semantic constraints are needed in parsing NP Vt: is the NP the subject or the object?
Solution: the lexicalized and integrated design of Chinese grammar
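The word identification ambiguity in the ABC example can be made concrete with a minimal Python sketch. The function name `segmentations` and the toy lexicon are assumptions made for illustration; they are not part of CPSG. The sketch only enumerates the candidate segmentations that a stand-alone segmenter would have to choose among without syntactic information.

```python
def segmentations(text, lexicon):
    """Return every way of splitting `text` into items found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words at all
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this candidate word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical lexicon in which every substring of "ABC" except "AC" is a word.
toy_lexicon = {"A", "B", "C", "AB", "BC", "ABC"}
for candidate in segmentations("ABC", toy_lexicon):
    print(" | ".join(candidate))
```

With this toy lexicon the four readings listed in the example are all produced, which is why the choice among them is better left to a parser that also sees syntactic and semantic constraints.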
1.2. major theoretical foundation:
HPSG: lexicalist theory encouraging integration of different components
a desirable framework that matches our design philosophy
CPSG: HPSG-style unification grammar
CPSG: reversible grammar suited for both parsing and generation
CPSG: formalized grammar, a description that does not rely on undefined notions
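As a rough illustration of what a unification grammar computes, the following Python sketch unifies two feature structures represented as nested dictionaries. It is a deliberate simplification and not the ALE mechanism: it has no types and no structure sharing (reentrancy), and the feature names in the usage lines are hypothetical. Because unification is declarative and order-independent, the same constraints can in principle serve both parsing and generation, which is the sense of reversibility intended here.

```python
def unify(a, b):
    """Unify two feature structures; return None when they clash.

    A feature structure is either an atomic value (a string) or a dict
    mapping feature names to feature structures.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values under the same feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values unify only when they are identical.
    return a if a == b else None


# Hypothetical feature names, for illustration only.
verb = {"CATEGORY": "v", "COMP1": {"CATEGORY": "n"}}
constraint = {"COMP1": {"ANIMATE": "yes"}}
print(unify(verb, constraint))                       # compatible: information is merged
print(unify({"CATEGORY": "v"}, {"CATEGORY": "n"}))   # clash: None
```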
2. integrated language model
2.1. CPSG versus conventional Chinese grammar
in CPSG, the parse tree embodies both morphological and syntactic structures
3. lexicalized formal grammar
3.1. formalized grammar, as required of a computational grammar: the formulation of CPSG
readily implementable (theories, principles, rules, etc.);
precise definitions of the most basic notions (e.g. sign, morpheme, word, phrase, sentence, NP, VP, etc.), rules (PS rules and lexical rules), lexical items (lexical hierarchy) and typology (hierarchy embodied in feature structures)
(4.) Definition: sign
A sign is the most fundamental concept of grammar. Formally, a sign is defined by the type [a_sign], which introduces a set of linguistic features for its description, as shown below.
(5.) Definition: word
In CPSG, a word is a sign satisfying the following two conditions: (1) its obligatory morphological expectations have all been saturated; (2) it is not the mother of any syntactic structure and hence has no syntactic daughters. Formally, a word is defined as shown below.
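The two conditions in the definition of a word can be read as a simple predicate over signs. The Python sketch below is only an informal paraphrase, not the CPSG type definition: the class `Sign` and its fields `unsaturated_morph_expectations` and `syntactic_daughters` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sign:
    form: str
    # Obligatory morphological expectations still unsatisfied (hypothetical field).
    unsaturated_morph_expectations: List[str] = field(default_factory=list)
    # Daughters combined by syntactic rules (hypothetical field).
    syntactic_daughters: List["Sign"] = field(default_factory=list)


def is_word(sign: Sign) -> bool:
    """Condition (1): no pending obligatory morphological expectation.
    Condition (2): no syntactic daughters."""
    return not sign.unsaturated_morph_expectations and not sign.syntactic_daughters


bound_morpheme = Sign("affix", unsaturated_morph_expectations=["stem"])
free_word = Sign("word")
phrase = Sign("word word", syntactic_daughters=[free_word, Sign("word")])

print(is_word(bound_morpheme), is_word(free_word), is_word(phrase))
```

Under these assumptions a bound morpheme fails condition (1), a phrase fails condition (2), and only the saturated, daughterless sign counts as a word.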
3.2. lexicalized grammar
CPSG consists of two parts:
(1) a minimized general grammar:
only 11 phrase structure rules
(covering complement structure, modifier structure,
conjunctive structure and morphological structure)
(2) a feature-enriched lexicon:
lexical hierarchy and a set of lexical rules
(capturing lexical generalizations).
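One way to picture how a lexical hierarchy captures lexical generalizations is feature inheritance: a feature stated once on a general type is shared by every entry of a more specific type. The Python sketch below assumes a hypothetical three-type hierarchy and hypothetical feature values; the actual CPSG hierarchy and its lexical rules are not reproduced here.

```python
# type name -> (parent type or None, features introduced at that type)
# All type names and feature values are assumptions made for illustration.
HIERARCHY = {
    "sign": (None, {}),
    "verb": ("sign", {"CATEGORY": "v"}),
    "transitive_verb": ("verb", {"COMP1": "np", "COMP2": "np"}),
}


def inherited_features(type_name, hierarchy=HIERARCHY):
    """Collect features from the root type down to `type_name`."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = hierarchy[current][0]
    features = {}
    for name in reversed(chain):  # root first, so subtypes can override
        features.update(hierarchy[name][1])
    return features


def lexical_entry(form, type_name, **own_features):
    """Build an entry from inherited features plus entry-specific ones."""
    entry = inherited_features(type_name)
    entry.update(own_features)
    entry["FORM"] = form
    return entry


# The entry itself states only its form and type; the rest is inherited.
print(lexical_entry("chi", "transitive_verb"))
```

The design point is that the individual entry stays small while the general grammar stays minimal, since shared information lives in the hierarchy.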
(7.) comp0 PS rule [feature structure, partially recovered]
    LEFTMOD    category
    RIGHTMOD   category
    LEFTCOMP   category
    RIGHTCOMP  category
    COMP1      ~obligatory
    COMP2      ~obligatory
    EXPECTED   a_sign
    MYROLE     comp_role
(8.) lexical entry: chi [feature structure, partially recovered]
    INDEX0  index
    INDEX1  index
4. Implementation and Application of CPSG
CPSG prototype implemented in ALE and Prolog; it has parsed a corpus of 200 sentences of various types
ALE and Prolog: suitable for unification grammar
ALE: mechanism for typed feature structures: type polymorphism
a powerful tool in language modeling
CPSG prototype adapted for bi-directional MT; it has generated the same corpus of 200 sentences
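The generation side of shake-and-bake translation can be caricatured as generate-and-test: the target-language lexical items form an unordered bag, and generation looks for an ordering that the target grammar accepts. The Python sketch below is a toy under strong assumptions. The function names `accepts` and `bake`, the role labels, and the one-pattern "grammar" are all hypothetical; a real system would use the reversible unification grammar itself as the acceptor and would avoid brute-force enumeration.

```python
from itertools import permutations


def accepts(sequence):
    """Toy target grammar: an agent NP, then a verb, then a patient NP."""
    return [role for _, role in sequence] == ["agent", "verb", "patient"]


def bake(bag):
    """Return the first ordering of the bag that the toy grammar accepts."""
    for order in permutations(bag):
        if accepts(order):
            return [form for form, _ in order]
    return None  # no ordering is licensed by the grammar


# A bag of (form, role) pairs, as if delivered by a bilingual lexicon.
bag = [("apples", "patient"), ("eats", "verb"), ("John", "agent")]
print(bake(bag))
```

Trying every permutation is factorial in the size of the bag, so this sketch only shows the idea, not a practical generator.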
Beaven, John L. (1992): “Shake and Bake Machine Translation”, Proceedings of the 14th International Conference on Computational Linguistics, pp. 603-609, Nantes, France.
Brew, Chris (1992): “Letting the Cat out of the Bag: Generation for Shake-and-bake MT”, Proceedings of the 14th International Conference on Computational Linguistics, pp. 610-616, Nantes, France.
Carpenter, B. & Penn, G. (1994): ALE, The Attribute Logic Engine, User’s Guide.
Feng, Z. (1996): “COLIPS Lecture Series – Chinese Natural Language Processing”, Communications of COLIPS, Vol.6, No.1 1996, Singapore (http://www.iscs.nus.sg/~colips/commcolips/paper/p96.html)
Huang, X-M. (1986): “A Bidirectional Grammar for Parsing and Generating Chinese”. Proceedings of the International Conference on Chinese Computing, Singapore, pp. 46-54
Huang, X-M. (1987): XTRA: The Design and Implementation of A Fully Automatic Machine Translation System, Doctoral dissertation, University of Essex.
Hutchins, W.J. & H.L. Somers (1992): An Introduction to Machine Translation. London, Academic Press.
Li, W. (1996): “Interaction of Syntax and Semantics in Parsing Chinese Transitive Patterns”, Proceedings of International Conference on Chinese Computing (ICCC’96), Singapore.
Li, W. (1997): “Chart Parsing Chinese Character Strings”, Proceedings of The Ninth North American Conference on Chinese Linguistics (NACCL-9, to be available), Victoria, Canada.
Pollard, C. & I. Sag (1987): Information-Based Syntax and Semantics, Vol. 1: Fundamentals. Center for the Study of Language and Information, Stanford University, CA.
Pollard, C. & I. Sag (1994): Head-Driven Phrase Structure Grammar. Center for the Study of Language and Information, Stanford University, CA.
Whitelock, Pete (1992): “Shake and Bake Translation”, Proceedings of the 14th International Conference on Computational Linguistics, pp. 784-790, Nantes, France.
Whitelock, Pete (1994): “Shake and Bake Translation”, in C.J. Rupp, M.A. Rosner, and R.L. Johnson (eds.), Constraints, Language and Computation, pp. 339-359, London, Academic Press.