PhD Thesis: Chapter VII Concluding Remarks

This chapter summarizes the research conducted in this dissertation, including its contributions and its limitations.

7.0. Summary

The goal of this dissertation is to explore effective ways of formally approaching the Chinese morpho-syntactic interface in a phrase structure grammar.  This research has led to the following results:  (i) the design of a Chinese grammar, namely CPSG95, which enables flexible coordination and interaction of morphology and syntax;  (ii) solutions, formulated in CPSG95, to a series of long-standing problems at the Chinese morpho-syntactic interface.

CPSG95 was designed in the general framework of HPSG (Pollard and Sag 1987, 1994).  The sign-based mono-stratal design of HPSG has the advantage of accommodating information from different components of a grammar and making it accessible.  One crucial feature of CPSG95 is its introduction of morphological expectation feature structures and the corresponding morphological PS rules into HPSG.  As a result, CPSG95 has been demonstrated to provide a favorable environment for solving morpho-syntactic interface problems.

Three types of morpho-syntactic interface problems have been studied extensively: (i) the segmentation ambiguity in Chinese word identification;  (ii) Chinese separable verbs, a borderline problem between compounding and syntax; and (iii) borderline phenomena between derivation morphology and syntax.

In the context of the CPSG95 design, segmentation ambiguity is no longer a problem, as morphology and syntax are designed system-internally in the grammar to support morpho-syntactic parsing based on non-deterministic tokenization (W. Li 1997, 2000).  In other words, the design of CPSG95 itself entails an adequate solution to this long-standing problem, which has been a central topic in Chinese NLP for the last two decades.  This is possible because a full grammar, including both morphology and syntax, is accessible in the integrated process of Chinese parsing and word identification, while traditional word segmenters can at best access partial grammar knowledge.[1]
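The idea of letting the parse, rather than a separate segmenter, choose among segmentations can be illustrated with a minimal Python sketch.  Everything in it is an assumption made for illustration: the five-entry lexicon, the category labels and the one-pattern "grammar" (a verb followed by one or more nouns) are hypothetical stand-ins and are not the CPSG95 lexicon or rules.  The sketch only shows the control flow: all lexicon-compatible segmentations are kept, and only those that support an analysis of the whole string survive.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> category); not taken from CPSG95.
LEXICON: Dict[str, str] = {
    "研究": "V",    # 'to study'
    "研究生": "N",  # 'graduate student'
    "生命": "N",    # 'life'
    "命": "N",      # 'fate, life'
    "起源": "N",    # 'origin'
}


def segmentations(text: str, lexicon: Dict[str, str]) -> List[List[str]]:
    """Non-deterministic tokenization: every split of text into lexicon words."""
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([word] + rest)
    return results


def toy_parse_ok(words: List[str], lexicon: Dict[str, str]) -> bool:
    """Toy stand-in for a full parse: accept a verb followed by one or more nouns."""
    cats = [lexicon[w] for w in words]
    return len(cats) >= 2 and cats[0] == "V" and all(c == "N" for c in cats[1:])


def analyse(text: str, lexicon: Dict[str, str] = LEXICON) -> List[List[str]]:
    """Keep only the segmentations that the toy grammar can analyse as a whole."""
    return [seg for seg in segmentations(text, lexicon) if toy_parse_ok(seg, lexicon)]


if __name__ == "__main__":
    for seg in analyse("研究生命起源"):
        print(" / ".join(seg))
```

For the input 研究生命起源, the tokenizer proposes both 研究/生命/起源 and 研究生/命/起源; under the toy grammar only the first is analysable, so the ambiguity disappears without a separate disambiguation step.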

The second problem involves an interesting case between compounding and syntax:  different types of Chinese separable verbs demonstrate various degrees of separability in syntax while all these verbs, when used contiguously, are part of the Chinese verb vocabulary.  For each type of separable verb, arguments were presented for the proposed linguistic analysis, and a solution to the problem was then formulated in CPSG95 based on the analysis.  All the proposed solutions provide a way of capturing the link between the separated use and the contiguous use of separable verbs.  They are shown to be better solutions than previous approaches in the literature, which either cannot link the separated use and the contiguous use in the analysis or are not formalized.

The third problem, at the interface of derivation and syntax, involves two issues: (i) a considerable amount of ‘quasi-affix’ data, and (ii) the intriguing case of zhe-suffixation, which demonstrates an unusual combination of a phrase with a bound morpheme.  A generic analysis of Chinese derivation has been proposed in CPSG95.  This analysis has also been demonstrated to be effective in handling both quasi-affixation and zhe-suffixation.

7.1. Contributions

The specific contributions are reflected in the study of the following five topics, each constituting a chapter.

On the topic of the Role of Grammar, the investigation leads to the central argument that knowledge from both morphology and syntax is required to properly handle the major types of morpho-syntactic interface problems.  This establishes the foundation for the general design of CPSG95 as consisting of morphology and syntax in one grammar formalism.

An in-depth study has been conducted in the area of the segmentation ambiguity in Chinese word identification.  The most important discovery from the study is that the disambiguation involves the analysis of the entire input string.  This means that the availability of a grammar is key to the solution of this problem.  A natural solution to this problem is the use of grammatical analysis to resolve, and/or prepare the basis for resolving, the segmentation ambiguity.

On the topic of the Design of CPSG95, a mono-stratal Chinese phrase structure grammar has been established in the spirit of the HPSG theory.  Components of a grammar such as morphology, syntax and semantics are all accommodated in distinct features of a sign.  CPSG95 is designed to provide a framework and means for formalizing the analysis of the linguistic problems at the morpho-syntactic interface.

The essential part of this work is the design of expectation feature structures.  Expectation feature structures are generalized from the HPSG feature structures for syntactic subcategorization and modification.  One characteristic of the CPSG95 structural expectation is the design of morphological expectation features to incorporate Chinese productive derivation, which covers a wide range of linguistic phenomena in Chinese word formation.

In order to meet the requirements induced by introducing morphology into the general grammar and by accommodating linguistic characteristics of Chinese, modifications to standard HPSG are proposed in CPSG95.  The rationale and arguments for these modifications have been presented.  The design of CPSG95 is demonstrated to be a successful application of HPSG in the study of Chinese morpho-syntactic phenomena.

On the topic of Defining the Chinese Word, efforts have been made to reach a better understanding of Chinese wordhood in theory, methodology and formalization.

The theoretical inquiry follows the insight from Di Sciullo and Williams (1987) and Lü (1989).  Two notions of word, namely grammar word and vocabulary word, have been examined and distinguished.  While vocabulary word is easy to define once a lexicon is given, the object for linguistic study and generalization is actually grammar word.  Unfortunately, as there are a considerable number of borderline phenomena between Chinese morphology and syntax, no precise definition of Chinese grammar word has been available across systems.  Therefore, an argument in favor of the system-internal wordhood definition and interface coordination within a grammar has been made.  This leads to a case-by-case approach to the analysis of specific Chinese morpho-syntactic interface problems.

On the other hand, three useful wordhood judgment methods have also been proposed as a complementary means to the case-by-case analysis.  These methods are (i) a syntactic process test involving passivization and topicalization; (ii) keyword-based judgment patterns for verbs; and (iii) a general expansion test named X-insertion.  These methods are demonstrated to be fairly operational and easy to apply.

In terms of formalization, a system-internal representation of word has been defined in CPSG95 feature structures.  This definition distinguishes a grammar word from both bound morphemes and syntactic constructions.  The formalization effort is necessary for the rigorous study of Chinese morpho-syntactic problems and ensures the implementability of the solutions to these problems as proposed in the dissertation.

On the topic of Chinese Separable Verbs, the task is to coordinate the idiomatic nature of separable verbs and their separated uses in various syntactic patterns.

Since there are different degrees of ‘separability’ for different types of Chinese separable verbs, there is no uniform analysis which can handle all separable verbs properly.  A case-by-case study of each type of separable verb has been conducted.  An essential part of this study consists of the arguments for the wordhood judgment for each type.  In the light of this judgment, CPSG95 provides formalized analyses of separable verbs which satisfy two criteria:  (i) they all capture both structural and semantic aspects of the constructions at issue; (ii) they all provide a way of capturing the link between the separated use and the contiguous use.

Finally, on the topic of Morpho-syntactic Interface Involving Derivation, a general approach to Chinese derivation has been proposed.  This approach not only enables us to handle quasi-affix phenomena, but is also flexible enough to provide an adequate treatment of the special problem in zhe-suffixation.

In the CPSG95 analysis, the affix serves as head of a derivative and can impose various constraints in the lexicon on its expected stem sign for the morphological expectation.  Coupled with only two PS rules formulated in the general grammar (Prefix PS Rule and Suffix PS Rule), this lexicalized treatment has been shown to capture various Chinese affixation phenomena equally well.  The PS rules ensure that all the lexical constraints are observed before the affix and the stem combine and that the output of derivation is a word.
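The division of labor described above (constraints stated on the affix in the lexicon, two general rules that check them) can be pictured with a small Python sketch.  It is only an illustration under stated assumptions: the classes Sign and Affix, the field names, the flat category labels and the two sample entries (化 and 非) are hypothetical simplifications, not the CPSG95 feature structures or lexical entries.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Sign:
    form: str
    cat: str              # flat category label, e.g. "N", "V", "A", "VP"
    is_word: bool = True  # False would mark a bound morpheme or a phrase


@dataclass(frozen=True)
class Affix:
    form: str
    position: str                  # "prefix" or "suffix"
    expected_cats: FrozenSet[str]  # lexical constraint on the expected stem
    result_cat: str                # category of the derived word


def _derive(affix: Affix, stem: Sign, position: str) -> Optional[Sign]:
    """Combine affix and stem only if the affix's lexical constraints hold."""
    if affix.position != position or stem.cat not in affix.expected_cats:
        return None
    if position == "prefix":
        form = affix.form + stem.form
    else:
        form = stem.form + affix.form
    # The output of derivation is always marked as a word.
    return Sign(form, affix.result_cat, is_word=True)


def prefix_rule(affix: Affix, stem: Sign) -> Optional[Sign]:
    """Toy counterpart of a prefix rule: affix on the left, stem on the right."""
    return _derive(affix, stem, "prefix")


def suffix_rule(stem: Sign, affix: Affix) -> Optional[Sign]:
    """Toy counterpart of a suffix rule: stem on the left, affix on the right."""
    return _derive(affix, stem, "suffix")


# Hypothetical sample entries, for illustration only.
HUA = Affix("化", "suffix", frozenset({"N", "A"}), "V")  # roughly '-ize'
FEI = Affix("非", "prefix", frozenset({"N"}), "N")       # roughly 'non-'

if __name__ == "__main__":
    print(suffix_rule(Sign("工业", "N"), HUA))
    print(prefix_rule(FEI, Sign("金属", "N")))
    print(suffix_rule(Sign("跑", "V"), HUA))
```

Because the constraint lives on the affix entry, an affix that expects a phrasal stem could be modeled in this sketch simply by listing "VP" among its expected categories, with the two rules left unchanged.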

As for the quasi-affixation problem, based on the observation that there is no fundamental structural difference between quasi-affixation and other affixation, a proper treatment of ‘quasi-affixes’ can be established in the same way as other affixes are handled in CPSG95; the individual difference in semantics is shown to be capturable in the lexicon.

The study of zhe-suffixation started with arguments for analyzing it as VP+-zhe.  This is an unsolvable problem in any system which enforces sequential processing of morphology before syntax.  The solution which CPSG95 offers demonstrates the power of designing derivation morphology and syntax in a mono-stratal grammar.  With this novel design in modeling Chinese grammar, the CPSG95 general approach to derivation readily applies to the tough case of zhe-suffixation.  This is possible because an affix is able to place any lexicalized constraint, VP in this case, on the expected stem for morphological expectation.  In addition, the proposed lexicalized solution also captures the building of the semantic content for this morpho-syntactic borderline phenomenon.

7.2. Limitations

The major limitations of the work reported in this thesis lie in the following two aspects.

Limited by space, the thesis has only presented sample formulations of typical affixes and quasi-affixes to demonstrate the proposed general approach to Chinese derivation morphology.  As many affixes and quasi-affixes have their own distinctive semantic properties, a reader who wishes to experiment with this proposal in an implementation still has to work out the technical details for each affix.  However, it is believed that the general strategy has been presented in sufficient detail to allow for easy accommodation of individual aspects of an affix which have not been specifically addressed in the thesis.

Limited by the focus on a handful of major morpho-syntactic interface problems, the thesis has not treated reduplication and unlisted proper names as special topics for in-depth exploration.  They are only briefly discussed in Chapter II (Section 2.2) as cases of productive word formation that require syntax when they involve segmentation ambiguity at their boundaries.  However, they are also long-standing word identification problems which affect the morpho-syntactic interface when segmentation ambiguity is involved.  In particular, it is felt that the treatment of transliterated foreign names requires further research before a satisfactory solution can be found in the framework of CPSG95.[2]

7.3. Final Notes

This final section places the research reported in this thesis in a larger context.

Chinese NLP has reached a new stage marked by the publication of Guo’s series of papers on Chinese tokenization (Guo 1997a,b,c,d, Guo 1998).  There are signs that the major research focus is shifting from word segmentation to grammar design and development.  In this process, the morpho-syntactic interface will remain a hot topic for quite some time to come.  The work on CPSG95 can be seen as one of the efforts in this direction.

The design of CPSG95, a formal grammar capable of representing both morphology and syntax in a uniform formalism, is one successful application of the modern linguistic theory HPSG in the area of Chinese morpho-syntactic interface research.  However, this is by no means a claim that CPSG95 is the only or the best framework for capturing the morpho-syntactic problems.  It is only one approach which has been shown to be feasible and effective.  Other equally good or better approaches may exist.

In terms of future directions, constraints from semantics and discourse should be made available in the grammatical analysis.  In Chapter II (Section 2.4), we have seen problems whose ultimate solutions depend on access to semantic or discourse constraints.  It is believed that the sign-based mono-stratal design of CPSG95 will be extensible to accommodate these constraints.  However, years of future research will be required before these constraints can be formally modeled and properly introduced into the grammar.



[1] As a matter of fact, the CPSG95 experiment shows that most segmentation ambiguity is resolved automatically as a by-product of morpho-syntactic parsing and the remaining ambiguity is embodied in the multiple syntactic trees as the results of the analysis.

[2] However, in the CPSG95 implementation, the problem of handling Chinese person names, a special case of compounding, has been solved fairly satisfactorily.  The proposal is to use the surname as the head sign to expect the given name (of one or two characters) on its right to form potential full names.  As the right boundary of a person name is difficult to define without the support of sentential analysis, the conventional word segmenter frequently segments such cases wrongly.  In contrast, the approach implemented in CPSG95 is free from this problem because whether a potential name proposed by the surname ultimately survives as a proper name is decided by whether it contributes to a valid parse for the processed sentence.  In the last few years, there has been rapid progress on proper name identification in the area of information extraction, called named entity tagging (MUC7 1998; Chen et al 1997).
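The person-name strategy in note [2] can be sketched in Python as follows.  This is a minimal illustration under assumptions, not the CPSG95 implementation: the surname list, the function names and the parse_ok predicate (a caller-supplied stand-in for "this candidate contributes to a valid parse of the sentence") are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical surname list, for illustration only.
SURNAMES: Set[str] = {"李", "王", "张"}

Span = Tuple[int, int, str]  # (start, end, text)


def candidate_names(sentence: str, surnames: Set[str] = SURNAMES) -> List[Span]:
    """Each surname proposes a given name of one or two characters on its right."""
    candidates: List[Span] = []
    for i, ch in enumerate(sentence):
        if ch in surnames:
            for given_len in (1, 2):
                end = i + 1 + given_len
                if end <= len(sentence):
                    candidates.append((i, end, sentence[i:end]))
    return candidates


def surviving_names(sentence: str,
                    parse_ok: Callable[[str, Span], bool]) -> List[str]:
    """Keep only the candidates that the (assumed) sentence-level check accepts."""
    return [text for (start, end, text) in candidate_names(sentence)
            if parse_ok(sentence, (start, end, text))]


if __name__ == "__main__":
    # Toy check: the remainder after the name must be a known predicate.
    known_predicates = {"来了"}  # 'has come'

    def toy_parse_ok(sentence: str, span: Span) -> bool:
        return sentence[span[1]:] in known_predicates

    print(candidate_names("李小明来了"))
    print(surviving_names("李小明来了", toy_parse_ok))
```

The right boundary of the name is never fixed in advance: both 李小 and 李小明 are proposed, and the choice between them is left to the sentence-level check.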



