PhD Thesis: Chapter V Chinese Separable Verbs


5.0. Introduction

This chapter investigates the phenomena usually referred to as separable verbs (离合动词 lihe dongci) in the form V+X.  Separable verbs constitute a significant portion of Chinese verb vocabulary.[1]  These idiomatic combinations seem to show dual status (Z. Lu 1957; L. Li 1990).  When V+X is not separated, it is like an ordinary verb.   When V is separated from X, it seems to be more like a phrasal combination.  The co-existence of both the separated use and contiguous use for these constructions is recognized as a long-standing problem at the interface of Chinese morphology and syntax (L. Wang 1955;  Z. Lu 1957; Chao 1968; Lü 1989; Lin 1983;  Q. Li 1983; L. Li 1990; Shi 1992; Dai 1993; Zhao and Zhang 1996).

Some linguists (e.g. L. Li 1990; Zhao and Zhang 1996) have made efforts to classify different types of separable verbs and demonstrated different linguistic facts about these types.  There are two major types of separable verbs:  V+N idioms with the verb-object relation and V+A/V idioms with the verb-modifier relation – when X is A or non-conjunctive V.[2]

The V+N idiom is a typical case which demonstrates the mismatch between a vocabulary word and a grammar word.  There have been three different views on whether V+N idioms are words or phrases in Chinese grammar.

Given the fact that the V and the N can be separated in usage, the most popular view (e.g. Z. Lu 1957; L. Li 1990; Shi 1992) is that they are words when V+N are contiguous and they are phrases otherwise.  This analysis fails to account for the link between the separated use and the contiguous use of the idioms.  For V+N idioms like 洗澡 xi zao (wash-bath: take a bath), this analysis also fails to explain why contiguous V+N idioms listed in the lexicon should receive a structural analysis different from that given to equally contiguous but non-listable combinations of V and N (e.g. 洗碗 xi wan ‘wash dishes’).[3]  As will be shown in Section 5.1, the structural distribution of this type of V+N idiom is identical to that of the corresponding non-listable combinations.

Other grammarians argue that V+N idioms are not phrases (Lin 1983;  Q. Li 1983; Zhao and Zhang 1996).  They insist that they are words, or a special type of words.  This argument cannot explain the demonstrated variety of separated uses.

There are scholars (e.g. Lü 1989; Dai 1993) who indicate that idioms like 洗澡 xi zao are phrases.  Their judgment is based on their observation of the linguistic variations demonstrated by such idioms.  But they have not given detailed formal analyses which account for the difference between these V+N idioms and the non-listable V+NP constructions in the semantic compositionality.  That seems to be the major reason why this insightful argument has not convinced people with different views.

As for V+A/V idioms, Lü (1989) offers a theory that these idioms are words and the insertable signs between V and A/V are Chinese infixes.  This is an insightful hypothesis.  But as in the case of the analyses proposed for V+N idioms, no formal solutions have been proposed based on the analyses in the context of phrase structure grammars.  As a general goal, a good solution should not only be implementable, but also offer an analysis which captures the linguistic link, both structural and semantic, between the separated use and the contiguous use of separable verbs.  The analyses reported in the literature still fall short of this goal of formally capturing the linguistic generality.

Three types of V+X idioms can be classified based on their different degrees of ‘separability’ between V and X, to be explored in three major sections of this chapter.  Section 5.1 studies the first type of V+N idioms like 洗澡 xi zao (wash-bath: take a bath).  These idioms are freely separable.  It is a relatively easy case.  Section 5.2 investigates the second type of the V+N idioms represented by 伤心 shang xin (hurt-heart: sad or heartbroken).  These idioms are less separable.  This category constitutes the largest part of the V+N phenomena.  It is a more difficult borderline case.  Section 5.3 studies the V+A/V idioms.  These idioms are least separable:  only the two modal signs 得 de3 (can) and 不 bu (cannot) can be inserted inside them, and nothing else.  For all these problems, arguments for the wordhood judgment will be presented first.  A corresponding morphological or syntactic analysis will be proposed, together with the formulation of the solution in CPSG95 based on the given analysis.

5.1. Verb-object Idioms: V+N I

The purpose of this section is to analyze the first type of V+N idioms, represented by 洗澡 xi zao (wash‑bath: take a bath).  The basic arguments to be presented are that they are verb phrases in Chinese syntax and the relationship between the V and the N is syntactic.  Based on these arguments, formal solutions to the problems involved in this construction will be presented.

The idioms like 洗澡 xi zao are classified as V+N I, to be distinguished from another type of idioms V+N II (see 5.2).  The following is a sample list of this type of idioms.

(5-1.) V+N I: xi zao type

洗澡 xi (wash) zao (bath #)              take a bath
擦澡 ca (scrub) zao (bath #)             clean one’s body by scrubbing
吃亏 chi (eat) kui (loss #)                   get the worst
走路 zou (go) lu (way $)                      walk
吃饭 chi (eat) fan (rice $)                    have a meal
睡觉 shui (V:sleep) jiao (N:sleep #)   sleep
做梦 zuo (make) meng (N:dream)     dream (a dream)
吵架  chao (quarrel) jia (N:fight #)    quarrel (or have a row)
打仗 da (beat) zhang (battle)              fight a battle
上当 shang (get) dang (cheating #)                be taken in
拆台 chai (pull down) tai (platform #)          pull away a prop
见面 jian (see) mian (face #)                            meet (face to face)
磕头 ke (knock) tou (head)                              kowtow
带头 dai (lead) tou (head $)                            take the lead
帮忙 bang (help) mang (business #)              give a hand
告状 gao (sue) zhuang (complaint #)            lodge a complaint

Note: Many nouns (marked with # or $) in this type of construction cannot be used independently of the corresponding V.[4]  But those with the mark $ have no such restriction in their literal sense.  For example, when the sign fan means ‘meal’, as it does in the idiom, it cannot be used in a context other than the idiom chi-fan (have a meal).  Only when it stands for the literal meaning ‘rice’ does it not have to co-occur with chi.

There is ample evidence for the phrasal status of the combinations like 洗澡 xi zao.  The evidence is of three types.  The first comes from the free insertion of some syntactic constituent X between the idioms in the form V+X+N: this involves keyword-based judgment patterns and other X‑insertion tests proposed in Chapter IV.  The second type of evidence resorts to some syntactic processes for the transitive VP, namely passivization and long-distance topicalization.  The V+N I idioms can be topicalized and passivized in the same way as ordinary transitive VP structures do.  The last piece of evidence comes from the reduplication process associated with this type of idiom.   All the evidence leads to the conclusion that V+N I idioms are syntactic in nature.

The first evidence comes from using the wordhood judgment pattern: V(X)+zhe/guo –> word(X).  It is a well-observed syntactic fact that Chinese aspectual markers appear right after a lexical verb (and before the direct object).  If 洗澡 xi zao were a lexical verb, the aspectual markers would appear after the combinations, not inside them.  But that is not the case, as shown by the ungrammaticality of the example in (5-2b).  A productive transitive VP example is given in (5-3) to show its syntactic parallelism with V+N I idioms.

(5-2.) (a)      他正在洗着澡
ta       zheng-zai    xi      zhe    zao.
he      right-now    wash ZHE   bath
He is taking a bath right now.

(b) *   他正在洗澡着。
ta       zheng-zai    xizao         zhe.
he      right-now    wash-bath   ZHE

(5-3.) (a)      他正在洗着衣服。
ta       zheng-zai    xi      zhe    yi-fu.
he      right-now    wash ZHE   clothes
He is washing the clothes right now.

(b) *   他正在洗衣服着。
ta       zheng-zai    xi      yi-fu           zhe.
he      right-now    wash clothes        ZHE

The above examples show that the aspectual marker 着 zhe (ZHE) should be inserted inside the V+N idiom, just as it is in an ordinary transitive VP structure.
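The judgment pattern V(X)+zhe/guo –> word(X) can be illustrated with a minimal sketch. The function name `passes_word_test`, the toy judgment table, and the use of plain substring matching are assumptions made for illustration only; they are not part of CPSG95. The table simply records the acceptability judgments of (5-2) and (5-3).

```python
# Toy acceptability judgments copied from (5-2) and (5-3); True = grammatical,
# False = starred in the text.
JUDGMENTS = {
    "他正在洗着澡": True,     # (5-2a)
    "他正在洗澡着": False,    # (5-2b)
    "他正在洗着衣服": True,   # (5-3a)
    "他正在洗衣服着": False,  # (5-3b)
}

ASPECT_MARKERS = ("着", "过")  # zhe, guo


def passes_word_test(x, judgments=JUDGMENTS):
    """Return True if some grammatical sentence contains X directly
    followed by zhe/guo, i.e. the pattern V(X)+zhe/guo -> word(X) applies.

    Substring matching is a crude stand-in for real syntactic analysis.
    """
    return any(
        ok and (x + marker) in sentence
        for sentence, ok in judgments.items()
        for marker in ASPECT_MARKERS
    )


print(passes_word_test("洗"))      # the bare verb takes zhe directly
print(passes_word_test("洗澡"))    # the V+N I idiom does not
print(passes_word_test("洗衣服"))  # neither does the ordinary VP
```

On this toy data the idiom 洗澡 patterns with the ordinary VP 洗衣服 rather than with the lexical verb 洗, which is the point of the argument above.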

Further evidence for X-insertion is given below.   This comes from the post-verbal modifier of ‘action-times’ (动量补语 dongliang buyu) like ‘once’, ‘twice’, etc.  In Chinese, action-times modifiers appear after the lexical verb and aspectual marker (but before the object), as shown in (5-4a) and (5-5a).

(5-4.) (a)      他洗了两次澡。
ta       xi      le       liang  ci       zao.
he      wash LE     two    time   bath
He has taken a bath twice.

(b) *   他洗澡了两次。
ta       xizao         le       liang  ci.
he      wash-bath   LE     two    time

(5-5.) (a)      他洗了两次衣服。
ta       xi      le       liang  ci       yi-fu.
he      wash LE     two    time   clothes
He has washed the clothes twice.

(b) *   他洗衣服了两次。
ta       xi      yi-fu           le       liang  ci.
he      wash clothes        LE     two    time

So far, evidence has been provided of syntactic constituents which are attached to the verb in the V+N I idioms.  To further argue for the VP status of the whole idiom, it will be demonstrated that the N in the V+N I idioms in fact fills the syntactic NP position in the same way as all other objects do in Chinese transitive VP structures.  In fact, N in the V+N I does not have to be a bare N:  it can be legitimately expanded to a full-fledged NP (although it does not normally do so).  A full-fledged NP in Chinese typically consists of a classifier phrase (and modifiers like de-construction) before the noun.  Compare the following pair of examples.  Just like an ordinary NP 一件崭新的衣服 yi jian zhan-xin de yi-fu (one piece of brand-new clothes), 一个痛快的澡 yi ge tong-kuai de zao (a comfortable bath) is a full-fledged NP.

(5-6.)           他洗了一个痛快的澡。
ta       xi      le       yi       ge      tong-kuai     de      zao.
he      wash LE     one    CLA   comfortable DE     bath
He has taken a comfortable bath.

(5-7.)           他洗了一件崭新的衣服。
ta       xi      le       yi       jian    zhan-xin       de      yi-fu.
he      wash LE     one    CLA   brand-new  DE     clothes
He has washed one piece of brand-new clothes.

It requires attention that the above evidence is directly against the following widespread view, i.e. signs like 澡 zao, marked with # in (5-1), are ‘bound morphemes’ or ‘bound stems’ (e.g. L. Li 1990; Zhao and Zhang 1996).  As shown, like every other free morpheme noun (e.g. yi-fu), zao holds a lexical position in the typical Chinese NP sequence ‘determiner + classifier + (de-construction) + N’, e.g. 一个澡 yi ge zao (a bath), 一个痛快的澡 yi ge tong-kuai de zao (a comfortable bath).[5]  In fact, as long as the ‘V+N I phrase’ arguments are accepted (further evidence to come), by definition ‘bound morpheme’ is a misnomer for 澡 zao.  As a part of morphology, a bound morpheme cannot play a syntactic role:  it is inside a word and cannot be seen in syntax.  The analysis of 洗xi (…) zao as a phrase entails the syntactic roles played by 澡 zao:  (i) 澡 zao is a free morpheme noun which fills the lexical position as the final N inside the possibly full-fledged NP;  (ii) 澡zao plays the object role in the syntactic transitive structure 洗澡xi zao.

This bound morpheme view is an argument used for demonstrating  the relevant V+N idioms to be words rather than phrases (e.g. L. Li 1990).  Further examination of this widely accepted view will help to strengthen the counter-arguments that all V+N I idioms are phrases.

Labeling signs like 澡zao (bath) as bound morphemes seems to come from an inappropriate interpretation of the statement that bound morphemes cannot be ‘freely’, or ‘independently’, used in syntax.[6]  This interpretation equates the idiomatic co-occurrence constraint with ‘not being freely used’.  It is true that 澡zao is not an ordinary noun to be used in isolation.  There is a co-occurrence constraint in effect:  澡zao cannot be used without the appearance of 洗xi (or 擦ca).  However, the syntactic role played by 澡zao, the object in the syntactic VP structure, has the full potential of being used as ‘freely’ as any other Chinese NP object:   it can even be placed before the verb in long-distance constructions, as shall be shown shortly.  A more proper interpretation of ‘not being freely used’ in terms of defining bound morphemes should be that a genuine bound morpheme, e.g. the suffix 性 -xing ‘-ness’, has to attach to another sign contiguously to form a word.

A comparison with similar phenomena in English may be helpful.  English also has similar idiomatic VPs, such as kick the bucket.[7]  For the same reason, it cannot be concluded that bucket (or the bucket) is a bound morpheme only because it demonstrates necessary co-occurrence with the verb literal kick.  Signs like bucket, zao (bath) are not of the same nature as bound morphemes like –less, -ly, un-, ‑xing (-ness), etc.

The second type of evidence shows some pattern variations for the V+N I idioms.  These variations are typical syntactic patterns for the transitive V+NP structure in Chinese.  One of the most frequently used patterns for transitive structures is the topical pattern of long-distance dependency.  This provides strong evidence for judging the V+N I idioms as syntactic rather than morphological.  For, with the exception of clitics, morphological theories in general conceive of the parts of a word as being contiguous.[8]  Both the V+N I idiom and the normal V+NP structure can be topicalized, as shown in (5-8b) and (5-9b) below.

(5-8.) (a)      我认为他应该洗澡。
wo     ren-wei        ta       ying-gai       xi zao.
I         think           he      should        wash-bath
I think that he should take a bath.

(b)      澡我认为他应该洗
zao    wo     ren-wei        ta       ying-gai       xi.
bath  I         think           he      should        wash
The bath I think that he should take.

(5-9.) (a)       我认为他应该洗衣服。
wo     ren-wei        ta       ying-gai       xi      yi-fu.
I         think           he      should        wash clothes
I think that he should wash the clothes.

(b)      衣服我认为他应该洗。
yi-fu           wo     ren-wei        ta       ying-gai       xi.
clothes        I         think           he      should        wash
The clothes I think that he should wash.

The minimal pair of passive sentences in (5-10) and (5‑11) further demonstrates the syntactic nature of the V+N I structure.

(5-10.)         澡洗得很干净。
zao             xi      de3    hen    gan-jing.
bath            wash DE3   very   clean
A good bath was taken so that one was very clean.

(5-11.)         衣服洗得很干净。
yi-fu           xi      de3    hen    gan-jing.
clothes        wash DE3   very   clean
The clothes were washed clean.

The third type of evidence involves the nature of reduplication associated with such idioms.  For idioms like 洗澡 xi zao (take a bath), the first sign can be reduplicated to denote the shortness of the action:  洗澡 xi zao (take a bath) –> 洗洗澡 xi xi zao (take a short bath).  If 洗澡 xi zao is a word, by definition, 洗xi is a morpheme inside the word and 洗洗澡 xi-xi-zao belongs to morphological reduplication (AB–>AAB type).  However, this analysis fails to account for the generality of such reduplication:  it is a general rule in Chinese grammar that a verb reduplicates itself contiguously to denote the shortness of the action.  For example, 听音乐 ting (listen to) yin-yue (music) –> 听听音乐 ting ting yin-yue (listen to music for a while); 休息 xiu-xi (rest) –> 休息休息 xiu-xi xiu-xi (have a short rest), etc.  On the other hand, when we accept that 洗澡 xi zao is a verb-object phrase in syntax and the nature of this reduplication is accordingly judged as syntactic,[9] we come to a satisfactory and unified account for all the related data.  As a result, only one reduplication rule is required in CPSG95 to capture the general phenomena;[10]  there is no need to do anything special for V+N  idioms.
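The unified treatment of reduplication can be sketched as follows. This is only an illustration under the phrasal analysis argued for above, in which each lexical word is a separate token; the function name `reduplicate_verb` and the list-of-tokens representation are assumptions, not the CPSG95 rule itself.

```python
from typing import List


def reduplicate_verb(tokens: List[str], verb_index: int = 0) -> List[str]:
    """Apply the general rule V -> V V to the lexical verb at verb_index.

    All other tokens (e.g. the object) are left untouched, so the same rule
    serves idiomatic V+N I phrases, ordinary VPs, and disyllabic verbs.
    """
    verb = tokens[verb_index]
    return tokens[:verb_index] + [verb, verb] + tokens[verb_index + 1:]


# 洗 and 澡 are two lexical words under the phrasal analysis.
print("".join(reduplicate_verb(["洗", "澡"])))      # 洗洗澡
# Ordinary transitive VP: 听 + 音乐.
print("".join(reduplicate_verb(["听", "音乐"])))    # 听听音乐
# Disyllabic lexical verb 休息 is a single token.
print("".join(reduplicate_verb(["休息"])))          # 休息休息
```

Had 洗澡 been stored as a single token, the same rule would wrongly yield 洗澡洗澡, and a separate AB –> AAB rule would be needed.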

This AB ‑‑> AAB type reduplication problem for the V+N idioms poses a big challenge to traditional word segmenters (Sun and Huang 1996).  Moreover, even when a word segmenter successfully incorporates some procedure to cope with this problem, the essentially same rule has to be repeated in the grammar for the general VV reduplication.  This is not desirable in terms of capturing the linguistic generality.

All the evidence presented above indicates that idioms like 洗澡xi zao, no matter whether V and N are used contiguously or not, are not words, but phrases.  The idiomatic nature of such combinations seems to be the reason why most native speakers, including some linguists, regard them as words.  Lü (1989: 113-114) suggests that vocabulary words like 洗澡 xi zao should be distinguished from grammar words.  He was one of the first Chinese grammarians who found that the V+N relation in the idioms like 洗澡 xi zao is a syntactic verb-object relation.  But he did not provide full arguments for his view, nor did he offer a precise formalized analysis of this problem.[11]

As shown in the previous examples, the V+N I idioms do not differ from other transitive verb phrases in all major syntactic behaviors.   However, due to their idiomatic nature, the V+N I idioms are different from ordinary transitive VPs in the following two major aspects.  These differences need to be kept in mind when formulating the grammar to capture the phenomena.

  • Semantics:  the semantics of the idiom should be given directly in the lexicon, not as a result of the computation of the semantics of the parts based on some general principle of compositionality.
  • Co-occurrence requirement:  洗 xi (or 擦 ca) and 澡 zao must co-occur with each other;  走 zou (go) and 路 lu (way) must co-occur; etc.  This is a requirement specific to the idioms at issue.  For example, 洗 xi and 澡 zao must co-occur in order to stand as an idiom to mean ‘take a bath’.

Based on the study above, the CPSG95 solution to this problem is described below.  In order to enforce the co-occurrence of the V+N I idioms, it is specified in the CPSG95 lexicon that the head V obligatorily expects as its object an NP headed by a specific literal.  This treatment originates from the practice of handling collocations in HPSG.  In HPSG, there are features designed to enable the subcategorization for particular words, or phrases headed by particular words.  For example, the feature [NFORM there] and [NFORM it] refer to the expletive there and it respectively for the special treatment of existential constructions, cleft constructions, etc. (Pollard and Sag 1987:62).  The values of the feature PFORM distinguish individual prepositions like for, on, etc.  They are used in phrasal verbs like rely on NP, look for NP, etc.  In CPSG95, this approach is being generalized, as described below.

As presented before, the feature for orthography [HANZI] records the Chinese character string for each lexical sign.  When a specific lexical literal is required in an idiomatic expectation, the constraint is directly placed on the value of the feature [HANZI] of the expected sign, in addition to possible other constraints.  It is standard practice in a lexicalized grammar that the expected complement (object) for the transitive structure be coded directly in the entry of the head V in the lexicon.  Usually, the expected sign is just an ordinary NP.  In the idiomatic VP like 洗 xi (…) 澡 zao, one further constraint is placed:  the expected NP must be headed by the literal character 澡zao.  This treatment ensures that all pattern variations for transitive VP such as passive constructions, topicalized constructions, etc. in Chinese syntax will equally apply to the V+N I idioms.[12]

The difference in semantics is accommodated in the feature [CONTENT] of the head V with proper co-indexing.  In ordinary cases like 洗衣服 xi yi-fu (wash clothes), the argument structure is [vt_semantics] which requires two arguments, with the role [ARG2] filled by the semantics of the object NP.  In the idiomatic case 洗澡 xi zao (take a bath), the V and N form a semantic whole, coded as [RELN take_bath].[13]  The V+N I idioms are formulated like intransitive verbs in terms of composing the semantics – hence coded as [vi_semantics], with only one argument to be co-indexed with the subject NP.  Note that there are two lexical entries in the lexicon for the verb 洗 xi (wash), one for the ordinary use and the other for the idiom, shown in (5-12) and (5-13).
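The two-entry treatment of 洗 xi described above can be approximated in a small sketch. The feature names HANZI, COMP1_RIGHT, vt_semantics, vi_semantics and the relation take_bath come from the text; the Python class layout, the relation name `wash`, and the helper `select_entry` are hypothetical and stand in for the actual feature structures, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expectation:
    category: str                 # category of the expected complement, e.g. "np"
    hanzi: Optional[str] = None   # required head literal; None means any head


@dataclass(frozen=True)
class VerbEntry:
    hanzi: str                # [HANZI] of the verb itself
    comp1_right: Expectation  # expected object to the right
    semantics: str            # "vt_semantics" or "vi_semantics"
    reln: str                 # [RELN] value in [CONTENT]


LEXICON = (
    # Ordinary transitive use: any NP object, two-argument semantics.
    VerbEntry("洗", Expectation("np"), "vt_semantics", "wash"),
    # Idiomatic use: the object NP must be headed by the literal 澡.
    VerbEntry("洗", Expectation("np", hanzi="澡"), "vi_semantics", "take_bath"),
)


def select_entry(verb, object_head, lexicon=LEXICON):
    """Pick the idiomatic entry if its HANZI constraint matches the head
    noun of the object NP; otherwise fall back to the unconstrained entry."""
    candidates = [e for e in lexicon if e.hanzi == verb]
    for entry in candidates:
        if entry.comp1_right.hanzi == object_head:
            return entry
    for entry in candidates:
        if entry.comp1_right.hanzi is None:
            return entry
    raise LookupError(f"no entry for {verb!r}")


print(select_entry("洗", "澡").reln)     # take_bath
print(select_entry("洗", "衣服").reln)   # wash
```

Because the constraint is stated on the head of the expected NP, an expanded object such as 一个痛快的澡 would still satisfy the idiomatic entry, in line with (5-6).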


The above solution takes care of the syntactic similarity of the V+N I idioms and ordinary V+NP structures.  It is also detailed enough to address their major differences.  In addition, the associated reduplication process (i.e. V+N –> V+V+N) is no longer a problem once this solution is adopted.  As the V in the V+N idioms is judged and coded as a lexical V (word) in this proposal, the reduplication rule which handles V –> VV will equally apply here.

5.2. Verb-object Idioms: V+N II

The purpose of this section is to provide an analysis of another type of V+N idiom and present the solution implemented in CPSG95 based on the analysis.

Examples like 洗澡 xi zao (take a bath) are in fact easy cases to judge.   There are more marginal cases.  When discussing Chinese verb-object idioms, L. Li (1990) and Shi (1992) indicate that the boundary between a word and a phrase in Chinese is far from clear-cut.  There is a remarkable “gray area” in between.  Examples in (5-14) are V+N II idioms, in contrast to the V+N I type, classified by L. Li (1990).

(5-14.) V+N II: 伤心 shang xin type

伤心 shang (hurt) xin (heart)             sad or break one’s heart
担心 dan (carry) xin (heart)               worry
留神 liu (pay) shen (attention)           pay attention to
冒险 mao (take) xian (risk)                 take the risk
借光 jie (borrow) guang (light)           benefit from
劳驾 lao (bother) jia (vehicle)             beg the pardon
革命 ge (change) ming (life)                 make revolution
落后 luo (lag) hou (back)                      lag behind
放手 fang (release) shou (hand)          release one’s hold

Compared with V+N I (洗澡xi zao type), V+N II has more characteristics of a word.  The lists below given by L. Li (1990) contrast their respective characteristics.[14]

(5-15.) V+N I (based on L. Li 1990:115-116)

as a word
(a1) corresponds to one generalized sense (concept)
(a2) usually contains ‘bound morpheme(s)’

as a phrase
(b1) may insert an aspectual particle (X=le/zhe/guo)
(b2) may insert all types of post-verbal modifiers (X=BUYU)
(b3) may insert a pre-nominal modifier de-construction (X=DEP)

(5-16.) V+N II (based on L. Li 1990:115)

as a word
(a1) corresponds to one generalized sense (concept)
(a2) usually contains ‘bound morpheme(s)’
(a3) (some) may be followed by an aspectual particle (X=le/zhe/guo)
(a4) (some) may be followed by a post-verbal modifier of duration or number of times (X=BUYU)
(a5) (some) may take an object (X=BINYU)

as a phrase
(b1) may insert an aspectual particle (X=le/zhe/guo)
(b2) may insert all types of post-verbal modifiers (X=BUYU)
(b3) may insert a pre-nominal modifier de-construction (X=DEP)

For V+N I, the previous text has already given detailed analysis and evidence and decided that such idioms are phrases, not words.  This position is not affected by the demonstrated features (a1) and (a2) in (5‑15);  as argued before,  (a1) and (a2) do not contribute to the definition of a grammar word.

However, (a3), (a4) and (a5) are all syntactic evidence showing that V+N II idioms can be inserted in lexical positions.   On the other hand, these idioms also show the similarity with V+N I idioms in the features (b1), (b2) and (b3) as a phrase.  In particular, (a3) versus (b1) and (a4) versus (b2) demonstrate a ‘minimal pair’ of phrase features and word features.  The following is such a minimal pair example (with the same meaning as well) based on the feature pairs (a3) versus (b1), with a post-verbal modifier 透tou (thorough) and aspectual particle 了le (LE).  It demonstrates the borderline status of such idioms.  As before, a similar example of an ordinary transitive VP is also given below for comparison.

(5-17.)         V+N II: word or phrase?

伤心:sad; heart-broken
shang          xin
hurt            heart

(a)      我伤心透了
wo     shang-xin  tou              le.
I         sad              thorough     LE
I was extremely sad.

(b)      我伤透了心
wo     shang         tou              le       xin.
I         break          thorough     LE     heart
I was extremely sad.

(5-18.)         Ordinary V+NP phrase: 恨hen (hate) 他ta (he)

(a) *   我恨他透了
wo     hen   ta      tou              le.
I         hate   he      thorough     LE

(b)      我恨透了他
wo     hen   tou              le       ta.
I         hate   thorough     LE     he
I thoroughly hate him.

As shown in (5-18), in the common V+NP structure, the post-verbal modifier 透 tou (thorough) and the aspectual particle 了 le (perfect aspect) can only occur between the lexical V and NP.  But in many V+N II idioms, they may occur either after the V+N combination or in between.  In (5‑17a), 伤心 shang xin is in the lexical position because Chinese syntax requires that the post-verbal modifier attach to the lexical V, not to a VP, as indicated in (5-18a).  Following the same argument, 伤 shang (hurt) alone in (5-17b) must be a lexical V as well.  The sign 心 xin (heart) in (5‑17b) establishes itself in syntax as object of the V, playing the same role as 他ta (he) in (5-18b).  These facts show clearly that V+N II idioms can be used both as lexical verbs and as transitive verb phrases.   In other words, before entering a context, while still in the lexicon, one cannot rule out either possibility.

However, there is a clear-cut condition for distinguishing its use as a word from its use as a phrase once a V+N II idiom is placed in a context.   It is observed that the only time a V+N II idiom assumes the lexical status is when V and N are contiguous.  In all other cases, i.e. when V and N are not contiguous, they behave essentially like the V+N I type.
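The contiguity condition can be stated as a very small decision procedure. The sketch below is an assumption-laden simplification: the function name `v_n_ii_use` is hypothetical, and raw character matching replaces the parsing that a real grammar would perform.

```python
def v_n_ii_use(sentence: str, v: str, n: str) -> str:
    """Classify the use of a V+N II idiom in a sentence.

    'word'   : V and N are contiguous (lexical use)
    'phrase' : both occur but are separated or reordered (phrasal use)
    'absent' : the idiom does not occur
    """
    if v + n in sentence:
        return "word"
    if v in sentence and n in sentence:
        return "phrase"
    return "absent"


print(v_n_ii_use("我伤心透了", "伤", "心"))   # (5-17a): contiguous
print(v_n_ii_use("我伤透了心", "伤", "心"))   # (5-17b): separated
```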

In addition to the examples in (5-17) above, two more examples are given below to demonstrate the separated phrasal use of V+N II.  The first is the case V+X+N where X is a possessive modifier attached to the head N.  Note also the post-verbal position of 透 tou (thorough) and 了le (LE).  The second is an example of passivization when N occurs before V.  These examples provide strong evidence for the syntactic nature of V+N II idioms when V and N are not used contiguously.

(5-19.) (a) *   你伤他的心透了
ni       shang         ta       de      xin    tou              le.
you    hurt            he      DE     heart thorough     LE

(b)      你伤透了他的心
ni       shang         tou              le       ta       de      xin.
you    hurt            thorough     LE     he      DE     heart
You broke his heart.

(5-20.)         V+N II: instance of passive with or without 被 bei (BEI)

心(被)伤透了
xin    (bei)   shang         tou              le.
heart BEI    break          thorough     LE
The heart was completely broken.
or: (Someone) was extremely sad.

Based on the above investigation, it is proposed in CPSG95 that two distinct entries be constructed for each such idiom, one as an inseparable lexical V, and the other as a transitive VP just like that of V+N I.  Each entry covers its own part of the phenomena.  In order to capture the semantic link between the two entries, a lexical rule called V_N_II Rule is formulated in CPSG95, shown in (5-21).


The input to the V_N_II Lexical Rule is an entry with [CATEGORY v_n_ii] where [v_n_ii] is a given sub-category in the lexicon for V+N II type verbs.  The output is another entry with the same information except for three features [HANZI], [CATEGORY] and [COMP1_RIGHT].  The new value for [HANZI] is a list concatenating the old [HANZI] and the [HANZI] for the expected [COMP1_RIGHT].  The new [CATEGORY] value is simply [v].  The value for [COMP1_RIGHT] becomes [null].  The outlines of the two entries captured by this lexical rule are shown in (5-22) and (5-23).
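The effect of the lexical rule, as described in the preceding paragraph, can be sketched in a few lines. The feature names HANZI, CATEGORY and COMP1_RIGHT and the values v_n_ii, v and null are taken from the text; the `Entry` class, the `reln` field with its value `sad`, and the representation of COMP1_RIGHT by the literal of the expected object are hypothetical simplifications.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    hanzi: Tuple[str, ...]       # [HANZI] as a list of characters
    category: str                # [CATEGORY]
    comp1_right: Optional[str]   # HANZI of the expected object; None stands for [null]
    reln: str                    # semantics, carried over unchanged


def v_n_ii_rule(entry: Entry) -> Entry:
    """Derive the inseparable lexical verb from the phrasal V+N II entry."""
    if entry.category != "v_n_ii" or entry.comp1_right is None:
        raise ValueError("rule applies only to [CATEGORY v_n_ii] entries")
    return replace(
        entry,
        hanzi=entry.hanzi + (entry.comp1_right,),  # concatenate the two HANZI values
        category="v",                              # plain lexical verb
        comp1_right=None,                          # no object expected any more
    )


shang_phrasal = Entry(hanzi=("伤",), category="v_n_ii", comp1_right="心", reln="sad")
shang_xin_word = v_n_ii_rule(shang_phrasal)
print(shang_xin_word)
```

Everything not named in the rule, here the `reln` value, is copied from input to output, which is how the semantic link between the two entries is preserved.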


It needs to be pointed out that the definition of [CATEGORY v_n_ii] in CPSG95 is narrower than L. Li’s definition of V+N II type idioms.  As indicated by L. Li (1990), not all V+N II idioms share the same set of lexical features (a3), (a4) and (a5) as a word.  The definition in CPSG95 does not include the idioms which share the lexical feature (a5), i.e. taking a syntactic object.  These are idioms like 担心 dan-xin (carry-heart: worry about).  For such idioms, when they are used as inseparable compound words, they can take a syntactic object.  This is not possible for the other V+N idioms, as shown below.

(5-24.) (a)     她很担心你
ta       hen    dan-xin                ni.
she     very   worry (about)        you
She is very concerned about you.

(b) *   他很伤心你
ta       hen    shang-xin            ni.
he      very   sad                       you

In addition, these idioms do not demonstrate the full distributional potential of transitive VP constructions.  The separated uses of these idioms are far more limited than other V+N idioms.  For example, they can hardly be passivized or topicalized as other V+N idioms can, as shown by the following minimal pair of passive constructions.

(5-25.)(a) *   心(被)担透了
xin    (bei)   dan             tou              le.
heart BEI    carry           thorough     LE

(b)      心(被)伤透了
xin    (bei)   shang         tou              le.
heart BEI    break          thorough     LE
The heart was completely broken.
or: (Someone) was extremely sad.

In fact, the separated use (‘phrasal use’) for such V+N idioms seems to be limited to certain types of X-insertion, typically the appearance of aspect signs between V and N.[15]  Such separated use is the only thing shared by all V+N idioms, as shown below.

(5-26.)(a)     他担过心
ta       dan             guo    xin
he      carry           GUO  heart
He (once) was worried.

(b)      他伤过心
ta       shang         guo    xin
he      break          GUO  heart
He (once) was heart-broken.

To summarize, the V+N idioms like 担心 dan-xin which can take a syntactic object do not share sufficient generality with other V+N II idioms for a lexical rule to capture.  Therefore, such idioms are excluded from the [CATEGORY v_n_ii] type.  This makes these idioms not subject to the lexical rule proposed above.  It is left for future research to answer the question whether there is enough generality among this set of idioms to justify some general approach to this problem, say, another lexical rule or some other way of generalizing the phenomena.  For the time being, CPSG95 simply lists both the contiguous and separated uses of these idioms in the lexicon.[16]

It is worth noticing that leaving such idioms aside, this lexical rule still covers large parts of V+N II phenomena.  The idioms like 担心dan-xin only form a very small set which are in the state of transition to words per se (from the angle of language development) but which still retain some (but not complete) characteristics of a phrase.[17]

5.3. Verb-modifier Idioms: V+A/V

This section investigates the V+X idioms in the form of V+A/V.  The data for the interaction of V+A/V idioms and the modal insertion are presented first.  The subsequent text will argue for Lü’s infix hypothesis for the modal insertion and accordingly propose a lexical rule to capture the idioms with or without modal insertion.

The following is a sample list of V+A/V idioms, represented by kan jian (look-see: have seen).

(5-27.) V+A/V: kan jian type

看见    kan (look) jian (see)          have seen
看穿    kan (look) chuan (through)     see through
离开    li (leave) kai (off)           leave
打倒    da (beat) dao (fall)           down with
打败    da (beat) bai (fail)           defeat
打赢    da (beat) ying (win)           fight and win
睡着    shui (sleep) zhao (asleep)     fall asleep
进来    jin (enter) lai (come)         enter
走开    zou (go) kai (off)             go away
关上    guan (close) shang (up)        close

In the V+A/V idiom kan jian (have-seen), the first sign kan (look) is the head of the combination while the second jian (see) denotes the result.  So when we say, wo (I) kan-jian (see) ta (he), even without the aspectual marker le (LE) or guo (GUO), we know that it is a completed action:  ‘I have seen him’ or ‘I saw him’.[18]

Idioms like kan-jian (have-seen) function just like a lexical whole (a transitive verb).  When there is an aspect marker, it is attached immediately after the idiom, as shown in (5‑28).  This is strong evidence for judging V+A/V idioms as words, not as syntactic constructions.

(5-28.)         我看见了他
wo     kan jian     le       ta.
I         look-see       LE     he
I have seen him.

The only observed separated use is that such idioms allow for two modal signs 得 de3 (can) and 不 bu (cannot) in between, shown by (5-29a) and (5-29b).  But no other signs, operations or processes can enter the internal structure of these idioms.

(5-29.) (a)     我看不见他
wo     kan bu jian         ta.
I         look cannot see     he
I cannot see him.

(b)      你看得见他吗?
ni       kan de3 jian       ta       me?
you    look can see          he      ME
Can you see him?

Note that the English modal verbs ‘can’ and ‘cannot’ are used to translate these two modal signs.  In fact, Contemporary Mandarin also has corresponding modal verbs (能愿动词 neng-yuan dong-ci):  能 neng (can) and 不能 bu neng (cannot).  The major difference between the Chinese modal verbs 能 neng / 不能 bu neng and the modal signs 得 de3 / 不 bu lies in their syntactic distribution.  The use of the modal signs 得 de3 (can) and 不 bu (cannot) is extremely restricted:  they have to be inserted into V+BUYU combinations.  Chinese modal verbs, by contrast, can be used before any VP structure.  It is interesting to see the cases where the two are used together in one sentence, as shown in the (a+b) examples of (5-30) and (5-31) below.  Note that the meaning difference between the modal verbs and the modal signs is subtle, as shown in the examples.

(5-30.)(a)     你看得见他吗?
ni       kan de3 jian         ta       me?
you    look can see          he      ME
Can you see him? (Is your eye-sight good enough?)

(b)      你能看见他吗?
ni       neng kan jian      ta       me?
you    can    see              he      ME
Can you see him?
(Note: This is used in more general sense. It covers (a) and more.)

(a+b)  你能看得见他吗?
ni       neng kan de3 jian         ta       me?
you    can    look can see          he      ME
Can you see him? (Is your eye-sight good enough?)

(5-31.)(a)     我看不见他
wo     kan bu jian           ta
I         look cannot see     he
I cannot see him. (My eye-sight is too poor.)

(b)      我不能看见他
wo     bu     neng kan jian      ta
I         not    can    see              he
I cannot see him. (Otherwise, I will go crazy.)

(a+b) 我不能看不见他
wo     bu     neng kan bu jian           ta.
I         not    can    look cannot see     he
I cannot stand not being able to see him.
(I have to keep him always within the reach of my sight.)

Lü (1989:127) indicates that the modal signs are in fact the only two infixes in Contemporary Chinese.  This infix hypothesis provides a good account of all the data above:  the V+A/V idioms are V+BUYU compound words subject to modal infixation.  The phenomena of 看得见 kan-de3-jian (can see) and 看不见 kan-bu-jian (cannot see) are therefore morphological by nature.  However, Lü did not offer a formal analysis of these idioms.

Thompson (1973) first proposed a lexical rule to derive the potential forms V+de3/bu+A/V from the V+A/V idioms.  The lexical rule approach seems to be the most suitable one for capturing the regularity of the V+A/V idioms and their infixation variants V+de3/bu+A/V.  The approach taken in CPSG95 is similar to Thompson’s proposal.  More precisely, two lexical rules are formulated in CPSG95 to handle the infixation in V+A/V idioms.  CPSG95 thus simply lists all V+A/V idioms in the lexicon as V+A/V type compound words, coded as [CATEGORY v_buyu].[19]  Such entries cover all the contiguous uses of the idioms.  It is up to the two lexical rules to produce the two infixed entries that cover the separated uses of the idioms.

The infixed entries differ from the original entry in the semantic contribution of the modal signs.  This is captured in the lexical rules in (5-32) and (5-33).  In the case of V+de3+A/V, the Modal Infixation Lexical Rule I in (5-32) assigns the value [can] to the feature [MODAL] in the semantics.  As for V+bu+A/V, the setting [POLARITY minus] is used to represent the negation in the semantics, as shown in (5-33).[20]


The following lexical entry shows the idiomatic compound 看见 kan-jian as coded in the CPSG95 lexicon (leaving some irrelevant details aside).   This entry satisfies the necessary condition for the proposed infixation lexical rules.


The modal infixation lexical rules will take this [v_buyu] type compound as input and produce two V+MODAL+BUYU entries.  As a result, new entries 看得见 kan-de3-jian (can see) and 看不见 kan-bu-jian (cannot see) as shown below are added to the lexicon.[21]



The above proposal offers a simple, effective way of capturing the linguistic data of the interaction of V+A/V idioms and the modal insertion, since it eliminates the need for any change of the general grammar in order to accommodate this type of separable verbs interacting with 得 de3 / 不 bu, the only two infixes in Chinese.
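
The procedural view of the two modal infixation rules can be illustrated informally in Python.  This is only a sketch under stated assumptions and not the CPSG95 encoding in ALE:  the feature names CATEGORY, MODAL and POLARITY and the type v_buyu come from the text, while the dictionary layout, the ORTH, PARTS and RELATION keys and all function names are hypothetical.  The sketch also assumes that the 不 bu rule carries [MODAL can] in addition to [POLARITY minus], which the text does not state explicitly.

```python
from copy import deepcopy

# Simplified, hypothetical stand-in for the lexical entry of 看见 kan-jian.
KAN_JIAN = {
    "ORTH": "看见",
    "PARTS": ("看", "见"),              # V + BUYU
    "CATEGORY": "v_buyu",
    "SEMANTICS": {"RELATION": "see"},   # placeholder semantics (assumption)
}


def _infix(entry, modal_sign):
    """Copy a contiguous [v_buyu] entry and insert modal_sign between V and BUYU.

    Returns None when the entry does not meet the input condition.
    """
    if entry.get("CATEGORY") != "v_buyu" or len(entry["PARTS"]) != 2:
        return None
    verb, buyu = entry["PARTS"]
    derived = deepcopy(entry)           # the input entry itself is left untouched
    derived["PARTS"] = (verb, modal_sign, buyu)
    derived["ORTH"] = verb + modal_sign + buyu
    return derived


def modal_infixation_rule_1(entry):
    """V+A/V -> V+de3+A/V, adding [MODAL can] to the semantics."""
    derived = _infix(entry, "得")
    if derived is not None:
        derived["SEMANTICS"]["MODAL"] = "can"
    return derived


def modal_infixation_rule_2(entry):
    """V+A/V -> V+bu+A/V, adding [POLARITY minus] to the semantics."""
    derived = _infix(entry, "不")
    if derived is not None:
        derived["SEMANTICS"]["MODAL"] = "can"        # assumption, see lead-in
        derived["SEMANTICS"]["POLARITY"] = "minus"
    return derived


def expand_lexicon(lexicon):
    """Return the listed entries followed by every entry the two rules derive."""
    derived = []
    for entry in lexicon:
        for rule in (modal_infixation_rule_1, modal_infixation_rule_2):
            output = rule(entry)
            if output is not None:
                derived.append(output)
    return list(lexicon) + derived


if __name__ == "__main__":
    for item in expand_lexicon([KAN_JIAN]):
        print(item["ORTH"], item["SEMANTICS"])
```

In this sketch only the contiguous compound is listed by hand, and the infixed forms are derived from it, so the general grammar never needs to look inside the compound.  Entries of any other category are simply passed over by both rules.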

5.4. Summary

This chapter has conducted an inquiry into the linguistic phenomena of Chinese separable verbs, a long-standing difficult problem at the interface of Chinese compounding and syntax.   For each type of separable verb, arguments for the wordhood judgment have been presented.  Based on this judgment, CPSG95 provides analyses which capture both structural and semantic aspects of the constructions at issue.  The proposed solutions are formal and implementable.  All the solutions provide a way of capturing the link between the separated use and contiguous use of the V+X idioms.  The proposals presented in this chapter cover the vast majority of separable verbs.  Some unsolved rare cases or potential problems are also identified for further research.



[1] They are also called phrasal verbs (duanyu dongci) or compound verbs (fuhe dongci) among Chinese grammarians.  For linguists who believe that they are compounds, the V+N separable verbs are often called verb object compounds and the V+A/V separable verbs resultative compounds.  The want of a uniform term for such phenomena reflects the borderline nature of these cases.  According to Zhao and Zhang (1996), out of the 3590 entries in the frequently used verb vocabulary, there are 355 separable V+N idioms.

[2] As the term ‘separable verbs’ gives people an impression that these verbs are words (which is not necessarily true), they are better called V+X (or V+N or V+A/V) idioms.

[3] There is no disagreement among Chinese grammarians about verb-object combinations like xi wan:  they are analyzed as transitive verb phrases in all analyses, no matter whether the head V and the N are contiguous (e.g. xi wan ‘wash dishes’) or not (e.g. xi san ge wan ‘wash three dishes’).

[4] Such signs as zao (bath), which are marked with # in (5-1), are often labeled as ‘bound morphemes’ among Chinese grammarians, appearing only in idiomatic combinations like xi zao (take a bath), ca zao (clean one’s body by scrubbing).  As will be shown shortly, bound morpheme is an inappropriate classification for these signs.

[5] It is widely acknowledged that the sequence num+classifier+noun is one typical form of Chinese NP in syntax.  The argument that zao is not a bound morpheme does not rely on any particular analysis of such Chinese NPs.  The fact that such a combination is generally regarded as syntactic ensures the validity of this argument.

[6] The notion ‘free’ or ‘freely’ is linked to the generally accepted view of the word as a minimal ‘free’ form, which can be traced back to classic works in linguistics such as Bloomfield (1933).

[7] It is generally agreed that idioms like kick the bucket are not compounds but phrases (Zwicky 1989).

[8] That is the rationale behind the proposal of inseparability as an important criterion for wordhood judgment in Lü (1989).

[9] In Chinese, reduplication is a general mechanism used both in morphology and syntax.  This thesis only addresses certain reduplication issues when they are linked to the morpho-syntactic problems under examination, but cannot elaborate on the Chinese reduplication phenomena in general.  The topic of Chinese reduplication deserves the study of a full-length dissertation.     

[10] In the ALE implementation of CPSG95, there is a VV Diminutive Reduplication Lexical Rule in place for phenomena like xi zao (take a bath) → xi xi zao (take a short bath);  ting yin-yue (listen to music) → ting ting yin-yue (listen to music for a while);  xiu-xi (rest) → xiu-xi xiu-xi (have a short rest).

[11] He observes that there are two distinct principles on wordhood.  The vocabulary principle requires that a word represent an integrated concept, not the simple composition of its parts.  Associated with the above is a tendency to regard as a word a relatively short string.  The grammatical principle, however, emphasizes the inseparability of the internal parts of a combination.  Based on the grammatical principle, xi zao is not a word, but a phrase.  This view is very insightful.

[12] The pattern variations are captured in CPSG95 by lexical rules following the HPSG tradition.  It is out of the scope of this thesis to present these rules in the CPSG95 syntax.  See W. Li (1996) for details.

[13] In the rare cases when the noun zao is realized in a full-fledged phrase like yi ge tong-kuai de zao (a comfortable bath), we may need some complicated special treatment in the building of the semantics.  Semantically, xi (wash) yi (one) ge (CLA) tong‑kuai (comfortable) de (DE) zao (bath): ‘take a comfortable bath’ actually means tong‑kuai (comfortable) de2 (DE2) xi (wash) yi (one) ci (time) zao (bath): ‘comfortably take a bath once’.  The syntactic modifier of the N zao is semantically a modifier attached to the whole idiom.  The classifier phrase of the N becomes the semantic ‘action-times’ modifier of the idiom.  The elaboration of semantics in such cases is left for future research.

[14] The two groups classified by L. Li (1990) are not restricted to the V+N combinations.  In order not to complicate the case, only the comparison of the two groups of V+N idioms is discussed here.  Note also that in the tables, he used the term ‘bound morpheme’ (inappropriately) to refer to the co-occurrence constraint of the idioms.

[15] Another type of X-insertion is that N can occasionally be expanded by adding a de‑phrase modifier.  However, this use is really rare.

[16] Since they are only a small, easily listable set of verbs, and they only demonstrate limited separated uses (instead of full pattern variations of a transitive VP construction), to list these words and all their separated uses in the lexicon seems to be a better way than, say, trying to come up with another lexical rule just for this small set.  Listing such idiosyncratic use of language in the lexicon is common practice in NLP.

[17] In fact, this set has been shrinking because some idioms, say zhu-yi ‘focus-attention: pay attention to’, which used to be in this set, have already lost all separated phrasal uses and have become words per se.  Other idioms, including dan-xin (worry about), are in the process of transition (called ionization by Chao 1968), being used as words with increasing frequency.  There is a fairly obvious tendency for their parts to combine more and more closely as words and to become transparent to syntax.  It is expected that some, or all, of them will ultimately become words proper in the future, just as zhu-yi did.

[18] In general, one cannot use kan-jian to translate the English future tense ‘will see’; instead, one should use the single-morpheme word kan:  I will see him → wo (I) jiang (will) kan (see) ta (he).

[19] Of course, [v_buyu] is a sub-type of verb [v].

[20] The use of this feature for representing negation was suggested in Footnote 18 in Pollard and Sag (1994:25).

[21] This is the procedural perspective of viewing the lexical rules.  As pointed out by Pollard and Sag (1987:209), “Lexical rules can be viewed from either a declarative or a procedural perspective: on the former view, they capture generalizations about static relationships between members of two or more word classes; on the latter view, they describe processes which produce the output from the input form.”





