ABSTRACT
Key words: Chinese parsing, Chinese generation, reversible grammar, HPSG
This paper presents a reversible Chinese unification grammar named CPSG. The lexicalized and integrated design of CPSG embodies the general spirit of the modern linguistic theory Head-driven Phrase Structure Grammar (HPSG; Pollard & Sag 1987, 1994). We have implemented a prototype of CPSG using the ALE formalism in Prolog (Carpenter & Penn 1994).
CPSG covers Chinese morphology, syntax and semantics in a novel integrated language model (Figure 1; for the interface between morphology and syntax, see Li 1997; for the interface between syntax and semantics, see Li 1996). The CPSG model stands in sharp contrast to the conventional design, in which grammar components are applied as clear-cut successive stages (Figure 2; see the survey in Feng 1996). We will show that our model is better suited to, and more efficient for, Chinese analysis and generation.
Grammar reversibility is a highly desirable feature for multilingual machine translation applications (Hutchins & Somers 1992, Huang 1986, 1987). To test its reversibility, we have applied the CPSG prototype to an experiment in bi-directional machine translation between English and Chinese. The machine translation engine developed in our Natural Language Lab is based on the shake-and-bake design, a novel approach to machine translation suited to unification grammars (Whitelock 1992, 1994, Beaven 1992, Brew 1992). The experimental results meet our design objective and verify the feasibility of the CPSG approach.
~~~~~~~~~~~~~~~~~~~~~
Notes for NWLC-97, UBC, Vancouver
Outline of An HPSG-style Chinese Reversible Grammar
Wei LI
Linguistics Department, Simon Fraser University
Key words: lexicalist approach, integrated language model, HPSG,
reversible grammar, bi-directional machine translation,
Chinese computational grammar,
Chinese word identification, Chinese parsing,
Chinese generation
1. background
1.1. design philosophy
Two major obstacles in writing a Chinese computational grammar:
(1) lack of serious study of the Chinese lexical base
    a well-designed lexicon is crucial for a successful computational system
    theoretical linguists have made fruitful efforts (e.g. Li Linding), but their results lack formalization
    computational linguists need more patience in adapting and formalizing these results:
    it is a huge amount of work, but it has to be done if a non-toy system is targeted
(2) lack of effective interaction between morphology, syntax and semantics
    e.g.
    ambiguity in word identification makes it hard to interface morphology and syntax:
    a theoretical defect of the morphology preprocessor (segmenter)
    e.g. ABC: ABC or A | BC or AB | C or A | B | C?
    active/passive isomorphic phenomena make semantic constraints necessary in parsing:
    NP Vt: subject NP or object NP?
Solution: the lexicalized and integrated design of Chinese grammar
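The word identification ambiguity noted above can be made concrete with a minimal Python sketch. It is not part of CPSG: the function name `segmentations` and the toy lexicon are assumptions made for illustration. With a lexicon in which A, B, C, AB, BC and ABC are all entries, it enumerates every segmentation of the string ABC, i.e. the candidates a stand-alone segmenter would have to choose among without any syntactic information.

```python
from typing import List, Set


def segmentations(text: str, lexicon: Set[str]) -> List[List[str]]:
    """Return every way to split `text` into items found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            # Keep the prefix as a word and segment the remainder recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical lexicon: every contiguous substring of "ABC" is an entry.
LEXICON = {"A", "B", "C", "AB", "BC", "ABC"}
ALL_SPLITS = segmentations("ABC", LEXICON)

for split in ALL_SPLITS:
    print(" | ".join(split))
```

All four candidates survive the lexicon check, so the choice among them has to come from later knowledge; this is the motivation for letting the parser, rather than a preprocessor, decide.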
1.2. major theoretical foundation:
HPSG: lexicalist theory encouraging integration of different components
a desirable framework that matches our design philosophy
CPSG: HPSG-style unification grammar
CPSG: reversible grammar suited for both parsing and generation
CPSG: formalized grammar, a description that does not rely on undefined notions
2. integrated language model
2.1. CPSG versus conventional Chinese grammar
parse tree embodies both morphological and syntactic structures in CPSG
3. lexicalized formal grammar
3.1. formalized grammar, as required by a computational grammar:
the formulation of CPSG (theories, principles, rules, etc.) is readily implementable;
precise definitions are given for the very basic notions (e.g. sign, morpheme, word, phrase, sentence, NP, VP), rules (PS rules and lexical rules), lexical items (lexical hierarchy) and typology (hierarchy embodied in feature structures)
(4.) Definition: sign
A sign is the most fundamental concept of grammar. Formally, a sign is defined by the type [a_sign], which introduces a set of linguistic features for its description, as shown below.
a_sign
INDEX index
KANJI kanji
MORPH1 expected
MORPH2 expected
CATEGORY category
COMP0 expected
COMP1 expected
COMP2 expected
MOD expected
KNOWLEDGE knowledge
CONTENT content
INDEX0 index
INDEX1 index
INDEX2 index
DTR dtr
(5.) Definition: word
In CPSG, a word is a sign satisfying the following two conditions: (1) its obligatory morphological expectations have all been saturated; (2) it is not the mother of any syntactic structure, and hence has no syntactic daughters. Formally, a word is defined as shown below.
(6.) word
a_sign
MORPH1 ~obligatory
MORPH2 ~obligatory
DTR no_syn_dtr
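As a rough illustration of definition (6.), the sketch below models a sign as a small Python record and the word condition as a predicate. The class `Sign`, the function `is_word`, the reduced feature set and the three sample signs are assumptions made for this example, and `~obligatory` is read simply as "any value other than obligatory"; the actual CPSG definition is a type constraint in ALE, not a Python function.

```python
from dataclasses import dataclass


@dataclass
class Sign:
    """A much-reduced stand-in for a_sign, keeping only the features used in (6.)."""
    kanji: str
    category: str
    morph1: str = "saturated"   # status of the first morphological expectation
    morph2: str = "saturated"   # status of the second morphological expectation
    dtr: str = "no_syn_dtr"     # kind of daughter structure, if any


def is_word(sign: Sign) -> bool:
    """Condition (1): no obligatory morphological expectation is left open.
    Condition (2): the sign has no syntactic daughters."""
    return (
        sign.morph1 != "obligatory"
        and sign.morph2 != "obligatory"
        and sign.dtr == "no_syn_dtr"
    )


# Hypothetical sample signs.
CHI = Sign(kanji="chi", category="v")                               # free morpheme
BOUND = Sign(kanji="bound_x", category="n", morph1="obligatory")    # still expects a stem
PHRASE = Sign(kanji="chi fan", category="v", dtr="comp0")           # built by a syntactic rule

print(is_word(CHI), is_word(BOUND), is_word(PHRASE))
```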
3.2. lexicalized grammar
CPSG consists of two parts:
(1) a minimized general grammar:
only 11 phrase structure rules
(covering complement structure, modifier structure,
conjunctive structure and morphological structure)
(2) a feature enriched lexicon:
lexical entries;
lexical hierarchy and a set of lexical rules
(capturing lexical generalizations).
(7.) comp0 PS rule
MOTHER a_sign
    COMP0 saturated
    COMP1 [1]
    COMP2 [2]
    DTR comp0
        MYSISTER [6]
        LEFTMOD [7] category
        RIGHTMOD [8] category
        LEFTCOMP [9] category
        RIGHTCOMP [10] category
===>
EXPECTING a_sign
    COMP0 a_expected
        DIRECTION left
        ROLE [3]
        SIGN [4]
    COMP1 [1] ~obligatory
    COMP2 [2] ~obligatory
    INDEX [5]
    DTR dtr
        LEFTMOD [7]
        RIGHTMOD [8]
        RIGHTCOMP [10]
EXPECTED a_sign [4]
    CONTENT content
        MYHEAD [5]
        MYROLE [3] comp_role
    INDEX [6]
    CATEGORY [9]
PRINCIPLE #head_feature
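The core of the comp0 rule in (7.) can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions, not the ALE rule itself: feature structures are plain nested dicts, the structure sharing marked by tags such as [1] and [4] is replaced by copying values, an expectation that is still a dict is treated as open (and hence as blocking the rule, standing in for the `~obligatory` test), and the names `unify`, `comp0_rule` and the sample signs are hypothetical.

```python
from typing import Any, Optional

FS = Any  # a feature structure: a nested dict or an atomic string


def unify(a: FS, b: FS) -> Optional[FS]:
    """Unify two feature structures; return None if any feature clashes."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            if key in out:
                merged = unify(out[key], value)
                if merged is None:
                    return None
                out[key] = merged
            else:
                out[key] = value
        return out
    return a if a == b else None


def comp0_rule(expected: dict, expecting: dict) -> Optional[dict]:
    """Combine a head (`expecting`) with the sign it expects on its left."""
    expectation = expecting.get("COMP0")
    if not isinstance(expectation, dict) or expectation.get("DIRECTION") != "left":
        return None
    # Assumption: COMP1/COMP2 must no longer be open expectations.
    if any(isinstance(expecting.get(f), dict) for f in ("COMP1", "COMP2")):
        return None
    filler = unify(expectation["SIGN"], expected)
    if filler is None:
        return None
    return {
        "CATEGORY": expecting["CATEGORY"],  # head feature principle
        "COMP0": "saturated",
        "COMP1": expecting.get("COMP1"),
        "COMP2": expecting.get("COMP2"),
        "DTR": {"TYPE": "comp0", "EXPECTED": filler, "EXPECTING": expecting},
    }


# Hypothetical signs: a noun and a verb phrase whose object slot is already filled.
NOUN = {"KANJI": "wo", "CATEGORY": "n"}
VP = {
    "KANJI": "chi fan",
    "CATEGORY": "v",
    "COMP0": {"DIRECTION": "left", "SIGN": {"CATEGORY": "n"}},
    "COMP1": "saturated",
    "COMP2": "null",
}
SENTENCE = comp0_rule(NOUN, VP)
```

The mother inherits its category from the head and records COMP0 as saturated, mirroring the MOTHER side of the rule; a full implementation would also propagate the shared indices and the modifier and complement categories.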
(8.) lexical entry: chi
a_sign
    KANJI one_character
        H1 chi
    CATEGORY v
    INDEX0 [1] index
    INDEX1 [2] index
    COMP0 a_expected
        DIRECTION left
        SIGN a_sign
            CATEGORY n
            INDEX [1]
    COMP1 a_expected
        DIRECTION right
        SIGN a_sign
            CATEGORY n
            INDEX [2]
    KNOWLEDGE eat
        U_OBJECT food
            MALE none
            PERSON 3
            SINGULAR bin
        U_SUBJECT animate
            MALE bin
            PERSON tri
            SINGULAR bin
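The KNOWLEDGE feature of entry (8.) is what lets semantics take part in parsing the NP Vt pattern. The sketch below shows one way such a check could work; it is an illustration only. The small semantic hierarchy `ISA`, the nouns placed in it and the function names are assumptions, while the constraints `U_SUBJECT animate` and `U_OBJECT food` are taken from the entry.

```python
from typing import Dict, List, Set

# Hypothetical semantic hierarchy: each type maps to its immediate supertypes.
ISA: Dict[str, Set[str]] = {
    "animate": set(),
    "food": set(),
    "human": {"animate"},
    "rice": {"food"},
    "chicken": {"animate", "food"},  # can eat and can be eaten
}

# Semantic expectations of chi (eat), as in the KNOWLEDGE feature of (8.).
EAT = {"U_SUBJECT": "animate", "U_OBJECT": "food"}


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is `general` or one of its subtypes."""
    if general == specific:
        return True
    return any(subsumes(general, parent) for parent in ISA.get(specific, ()))


def roles_for_preverbal_np(np_type: str, knowledge: Dict[str, str]) -> List[str]:
    """Return the roles a pre-verbal NP may fill in the pattern NP Vt."""
    roles = []
    if subsumes(knowledge["U_SUBJECT"], np_type):
        roles.append("subject")
    if subsumes(knowledge["U_OBJECT"], np_type):
        roles.append("object")
    return roles


for noun in ("human", "rice", "chicken"):
    print(noun, roles_for_preverbal_np(noun, EAT))
```

For a human NP only the subject reading survives and for rice only the object reading; an NP such as chicken satisfies both constraints, so the pattern stays ambiguous and further context is needed.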
4. Implementation and Application of CPSG
a CPSG prototype has been implemented in ALE and Prolog; it has parsed a corpus of 200 sentences of various types
ALE and Prolog: suitable for unification grammar
ALE: mechanism for typed feature structures: type polymorphism
a powerful tool in language modeling
the CPSG prototype has been adapted for application to bi-directional MT; it has generated the same corpus of 200 sentences
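To indicate why a reversible grammar matters for shake-and-bake MT, the sketch below shows the naive generate-and-test view of shake-and-bake generation: the target sentence is any ordering of a bag of target-language signs that the grammar accepts. This is an illustrative toy, not the engine used in the experiment; the one-pattern acceptor `accepts` stands in for a full grammar, and the function names, the bag and the words wo and fan are assumptions. The indices mirror INDEX0 and INDEX1 of entry (8.).

```python
from itertools import permutations
from typing import Dict, List, Sequence

Sign = Dict[str, object]


def accepts(order: Sequence[Sign]) -> bool:
    """Toy grammar: accept 'n v n' where the verb's indices match its arguments."""
    if len(order) != 3:
        return False
    left, head, right = order
    return (
        left["CATEGORY"] == "n"
        and head["CATEGORY"] == "v"
        and right["CATEGORY"] == "n"
        and head["INDEX0"] == left["INDEX"]    # left NP is the expected COMP0
        and head["INDEX1"] == right["INDEX"]   # right NP is the expected COMP1
    )


def generate(bag: List[Sign]) -> List[List[str]]:
    """Try every ordering of the bag and keep those the grammar accepts."""
    return [
        [str(sign["KANJI"]) for sign in order]
        for order in permutations(bag)
        if accepts(order)
    ]


# Hypothetical bag produced by lexical transfer; word order is not given.
BAG: List[Sign] = [
    {"KANJI": "fan", "CATEGORY": "n", "INDEX": 2},
    {"KANJI": "chi", "CATEGORY": "v", "INDEX0": 1, "INDEX1": 2},
    {"KANJI": "wo", "CATEGORY": "n", "INDEX": 1},
]
SENTENCES = generate(BAG)
print(SENTENCES)
```

Because acceptance is decided by the same grammar that is used for parsing, no separate generation grammar is needed; the cost is the factorial search over orderings, which work such as Brew (1992) aims to reduce.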
References
Beaven, John L. (1992): "Shake and Bake Machine Translation", Proceedings of the 14th International Conference on Computational Linguistics, pp. 603-609, Nantes, France.
Brew, Chris (1992): "Letting the Cat out of the Bag: Generation for Shake-and-bake MT", Proceedings of the 14th International Conference on Computational Linguistics, pp. 610-616, Nantes, France.
Carpenter, B. & Penn, G. (1994): ALE, The Attribute Logic Engine, User's Guide
Feng, Z. (1996): "COLIPS Lecture Series - Chinese Natural Language Processing", Communications of COLIPS, Vol.6, No.1 1996, Singapore (http://www.iscs.nus.sg/~colips/commcolips/paper/p96.html)
Huang, X-M. (1986): "A Bidirectional Grammar for Parsing and Generating Chinese". Proceedings of the International Conference on Chinese Computing, Singapore, pp. 46-54
Huang, X-M. (1987): XTRA: The Design and Implementation of A Fully Automatic Machine Translation System, Doctoral dissertation, University of Essex.
Hutchins, W.J. & H.L. Somers (1992): An Introduction to Machine Translation. London, Academic Press.
Li, W. (1996): "Interaction of Syntax and Semantics in Parsing Chinese Transitive Patterns", Proceedings of the International Conference on Chinese Computing (ICCC'96), Singapore.
Li, W. (1997): "Chart Parsing Chinese Character Strings", Proceedings of the Ninth North American Conference on Chinese Linguistics (NACCL-9, to be available), Victoria, Canada.
Pollard, C. & I. Sag (1987): Information-based Syntax and Semantics, Vol. 1: Fundamentals. Centre for the Study of Language and Information, Stanford University, CA.
Pollard, C. & I. Sag (1994): Head-Driven Phrase Structure Grammar. Centre for the Study of Language and Information, Stanford University, CA.
Whitelock, Pete (1992): "Shake and Bake Translation", Proceedings of the 14th International Conference on Computational Linguistics, pp. 784-790, Nantes, France.
Whitelock, Pete (1994): "Shake and Bake Translation", in C.J. Rupp, M.A. Rosner and R.L. Johnson (eds.), Constraints, Language and Computation, pp. 339-359, London, Academic Press.