Notes for An HPSG-style Chinese Reversible Grammar

ABSTRACT

Key words: Chinese parsing, Chinese generation, reversible grammar,  HPSG

This paper presents a reversible Chinese unification grammar named CPSG. The lexicalized and integrated design of CPSG embodies the general spirit of the modern linguistic theory Head-driven Phrase Structure Grammar (HPSG, Pollard & Sag 1987, 1994). Using ALE formalism in Prolog (Carpenter & Penn 1994), we have implemented a prototype of CPSG.

CPSG covers Chinese morphology, Chinese syntax and semantics in a novel integrated language model (Figure 1, for interface between morphology, see Li 1997; for interface between syntax and semantics, see Li 1996). CPSG model is in sharp contrast to the conventional clear-cut successive design of grammar components (Figure 2, see survey in Feng 1996). We will show that our model is much better suited and more efficient for Chinese analysis (or generation).

 

cpsg

Grammar reversibility is a highly desired feature for multi-lingual machine translation application (Hutchins & Somers 1992, Huang 1986, 1987). To test its reversible features, we have applied the CPSG prototype to an experiment of bi-directional machine translation between English and Chinese. The machine translation engine developed in our Natural Language Lab is based on shake-and-bake design, a novel approach to machine translation suited for unification grammars (Whitelock 1992, 1994, Beaven 1992, Brew 1992). The experimental results meet our design objective and verify the feasibility of CPSG approach.

~~~~~~~~~~~~~~~~~~~~~

Notes for NWLC-97, UBC, Vancouver

Outline of An HPSG-style Chinese Reversible Grammar

Wei LI   (lio@sfu.ca)

Linguistics Department, Simon Fraser University

 

 Key words:          lexicalist approach, integrated language model, HPSG,

                                reversible grammar,  bi-directional machine translation, 

                                Chinese computational grammar,

                                Chinese word identification, Chinese parsing,
Chinese generation

 

  1. background

1.1. design philosophy

Two major obstacles in writing Chinese computational grammar:

lacking in serious study on Chinese lexical base

well designed lexicon is crucial for a successful computational system

theoretical linguists have made fruitful efforts (e.g. Li Linding) but lack formalization

computational linguists require more patience in adapting and formalizing the fruits:

it is huge work, but has to be done if a non-toy system is targeted

lack of effective interaction between morphology, syntax and semantics.

e.g.

ambiguity in word identification makes it hard to interface morphology & syntax:

a theoretical defect of morphology preprocessor (segmenter)

e.g. ABC: ABC or A | BC or AB | C or A | B | C?

active/passive isomorphic phenomena make semantic constraint a desired need in parsing NP Vt: subject NP or object NP?

Solution: the lexicalized and integrated design of Chinese grammar

1.2. major theoretical foundation:

HPSG:       lexicalist theory encouraging integration of different components

a desired framework matching our design philosophy

CPSG: HPSG-style unification grammar

CPSG: reversible grammar suited for both parsing and generation

CPSG: formalized grammar, a description that does not rely on undefined notions

  1. integrated language model

2.1. CPSG versus conventional Chinese grammar

 

 

parse tree embodies both morphological and syntactic structures in CPSG

  1. lexicalized formal grammar

3.1. formalized grammar, as required by a computational grammar: formulation of CPSG

readily implementable (theories, principles, rules, etc.);

precise definition for the very basic notions (e.g. sign, morpheme, word, phrase, sentence, NP, VP, etc.), rules (PS rules and lexical rules), lexical items (lexical hierarchy), typology (hierarchy embodied in feature structures)

(4.)       Definition: sign

A sign is the most fundamental concept of grammar. Formally, a sign is defined by the type [a_sign], which introduces a set of linguistic features for its description, as shown below.

a_sign
INDEX index
KANJI kanji
MORPH1 expected
MORPH2 expected
CATEGORY category
COMP0 expected
COMP1 expected
COMP2 expected
MOD expected
KNOWLEDGE knowledge
CONTENT content
INDEX0 index
INDEX1 index
INDEX2 index
DTR dtr

(5.)       Definition: word

In CPSG, a word is a sign satisfying the following two conditions: (1) its obligatory morphological expectation has all been saturated; (2) it is not a mother of any syntactic structures, hence no syntactic daughters. Formally, a word is defined as shown below.

(6.)       word

a_sign
MORPH1 ~obligatory
MORPH2 ~obligatory
DTR no_syn_dtr

3.2. lexicalized grammar

CPSG consists of two parts:

(1) a minimized general grammar:

only 11 phrase structure rules
(covering complement structure, modifier structure,
conjunctive structure and morphological structure)

(2) a feature enriched lexicon:

lexical entries;
lexical hierarchy and a set of lexical rules
(capturing lexical generalizations).

 

(7.)          comp0 PS rule

MOTHER               a_sign
COMP0 saturated
COMP1 [1]
COMP2 [2]
DTR comp0
MYSISTER [6]
LEFTMOD [7] category
RIGHTMOD [8] category
LEFTCOMP [9] category
RIGHTCOMP [10] category

===>

EXPECTING          a_sign
COMP0 a_expected
DIRECTION left
ROLE [3]
SIGN [4]
COMP1 [1] ~obligatory
COMP2 [2] ~obligatory
INDEX [5]
DTR dtr
LEFTMOD [7]
RIGHTMOD [8]
RIGHTCOMP [10]

EXPECTED            a_sign [4]
CONTENT content
MYHEAD [5]
MYROLE [3] comp_role
INDEX [6]
CATEGORY [9]

PRINCIPLE            #head_feature

(8.)          lexical entry: chi

a_sign
KANJI one_character
H1 chi
CATEGORY v
INDEX0 [1] index
INDEX1 [2] index
COMP0 a_expected
DIRECTION left
SIGN a_sign
CATEGORY n
INDEX [1]
COMP1 a_expected
DIRECTION right
SIGN a_sign
CATEGORY n
INDEX [2]
KNOWLEDGE eat
U_OBJECT food
MALE none
PERSON 3
SINGULAR bin
U_SUBJECT animate
MALE bin
PERSON tri
SINGULAR bin

  1. Implementation and Application of CPSG

CPSG prototype implemented in ALE and Prolog, having parsed a corpus of 200 various types of sentences

ALE and Prolog: suitable for unification grammar
ALE:         mechanism for typed feature structures: type polymorphism
a powerful tool in language modeling

CPSG prototype adapted for application to bi-directional MT, having generated the same corpus of 200 sentences

References

Beaven, John L. (1992): “Shake and Bake Machine Translation”, Proceedings of the 15th International Conference on Computational Linguistics, pp. 603-609, Nantes, France.

Brew, Chris (1992): “Letting the Cat out of the Bag: Generation for Shake-and-bake MT”, Proceedings of the 15th International Conference on Computational Linguistics, pp. 610-616, Nantes, France.

Carpenter, B. & Penn, G. (1994): ALE, The Attribute Logic Engine, User’s Guide

Feng, Z.  (1996): “COLIPS Lecture Series – Chinese Natural Language Processing”,  Communications of COLIPS, Vol.6, No.1 1996, Singapore (http://www.iscs.nus.sg/~colips/commcolips/paper/p96.html)

Huang, X-M. (1986): “A Bidirectional Grammar for Parsing and Generating Chinese”.  Proceedings of the International Conference on Chinese Computing, Singapore, pp. 46-54

Huang, X-M. (1987): XTRA: The Design and Implementation of A Fully Automatic Machine Translation System, Doctoral dissertation, University of Essex.

Hutchins, W.J. & H.L. Somers (1992): An Introduction to Machine Translation. London, Academic Press.

Li, W. (1996): Interaction of Syntax and Semantics in Parsing Chinese Transitive Patterns. Proceedings of International Conference on Chinese Computing (ICCC’96), Singapore

Li, W. (1997): Chart Parsing Chinese Character Strings. Proceedings of The Ninth North American Conference on Chinese Linguistics (NACCL-9, to be available), Victoria, Canada

Pollard, C.  & I. Sag (1987): Information based Syntax and Semantics Vol. 1: Fundamentals. Centre for the Study of Language  and Information, Stanford University, CA

Pollard, C.  & I. Sag (1994): Head-Driven Phrase Structure Grammar,  Centre for the Study of Language and Information, Stanford University, CA

Whitelock, Pete (1992): “Shake and Bake Translation”, Proceedings of the 14th International Conference on Computational Linguistics, pp. 784-790, Nantes, France.

Whitelock, Pete (1994). “Shake and Bake Translation”, C.J. Rupp, M.A. Rosner, and R.L. Johnson (eds.), Constraints, Language and Computation, pp. 339-359, London, Academic Press.

 

[Related]

Outline of an HPSG-style Chinese reversible grammar

PhD Thesis: Morpho-syntactic Interface in CPSG (cover page)

PhD Thesis: Chapter I Introduction

PhD Thesis: Chapter II Role of Grammar

PhD Thesis: Chapter III Design of CPSG95

PhD Thesis: Chapter IV Defining the Chinese Word

PhD Thesis: Chapter V Chinese Separable Verbs

PhD Thesis: Chapter VI Morpho-syntactic Interface Involving Derivation

PhD Thesis: Chapter VII Concluding Remarks

Overview of Natural Language Processing

Dr. Wei Li’s English Blog on NLP

发布者

liweinlp

立委博士,弘玑首席科学家,自然语言处理(NLP)资深架构师。前讯飞AI研究院副院长,研发支持对话的多语言平台,前京东主任科学家, 主攻深度解析和知识图谱及其应用。Netbase前首席科学家,期间指挥研发了18种语言的理解和应用系统。特别是汉语和英语,具有世界一流的解析(parsing)精度,并且做到鲁棒、线速,scale up to 大数据,语义落地到数据挖掘和问答产品。Cymfony前研发副总,曾荣获第一届问答系统第一名(TREC-8 QA Track),并赢得17个小企业创新研究的信息抽取项目(PI for 17 SBIRs)。立委NLP工作的应用方向包括大数据舆情挖掘、客户情报、信息抽取、知识图谱、问答系统、智能助理、语义搜索等等。

发表评论

电子邮件地址不会被公开。

此站点使用Akismet来减少垃圾评论。了解我们如何处理您的评论数据