RESOLVING AMBICATEGORICALITY IN LANGUAGE ACQUISITION: THE ROLE OF PERCEPTUAL CUES

BY ERIN R. CONWELL
S.B., MASSACHUSETTS INSTITUTE OF TECHNOLOGY, 2003

A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN THE PROGRAM IN COGNITIVE AND LINGUISTIC SCIENCES: COGNITIVE SCIENCE AT BROWN UNIVERSITY

PROVIDENCE, RHODE ISLAND
MAY 2009

© Copyright 2009 by Erin R. Conwell

This dissertation by Erin R. Conwell is accepted in its present form by the Department of Cognitive and Linguistic Sciences as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Date _________________ ____________________________________ James L. Morgan, Advisor

Recommended to the Graduate Council

Date _________________ ____________________________________ Katherine Demuth, Reader
Date _________________ ____________________________________ Julie C. Sedivy, Reader

Approved by the Graduate Council

Date _________________ ____________________________________ Sheila Bonde, Dean of the Graduate School

Curriculum Vitae

Erin Conwell
Department of Cognitive and Linguistic Sciences
Brown University
Box 1978
Providence, RI 02912
E-mail: Erin_Conwell@brown.edu

EDUCATION

Ph.D., Cognitive and Linguistic Sciences, 2008 (anticipated)
Brown University
Dissertation title: Cross-category word use in acquisition: A preliminary investigation
Dissertation advisor: James L. Morgan

S.B., Brain and Cognitive Sciences, 2003
Massachusetts Institute of Technology
Research supervisor: Kenneth Wexler

HONORS AND AWARDS

Peder Estrup Graduate Research Fellowship, Brown University, 2007-2008
NSF Graduate Research Fellowship, Honorable Mention, 2003
Robert C. Byrd Academic Scholarship, 1999-2003

PUBLICATIONS

Conwell, E. & Morgan, J. (under revision). Is it a noun or is it a verb? Resolving the ambicategoricality problem.

Soderstrom, M., Conwell, E., Feldman, N. & Morgan, J. (forthcoming). The learner as statistician: Three principles of computational success in language acquisition. Developmental Science.

Soderstrom, M., White, K. S., Conwell, E. & Morgan, J. L. (2007). Receptive grammatical knowledge of familiar content words and inflection in 16-month-olds. Infancy, 12, 1-29.

Conwell, E. & Demuth, K. (2007). Early syntactic productivity: Evidence from dative shift. Cognition, 103, 163-179.

CONFERENCE PROCEEDINGS PUBLICATIONS

Conwell, E. & Balas, B. J. (2007). Assessing the efficacy of transitional probabilities for learning syntactic categories. In D.S. McNamara & J.G. Trafton (Eds.), Proceedings of the 29th Annual Meeting of the Cognitive Science Society (pp. 893-898). Austin, TX: Cognitive Science Society.

Conwell, E. & Morgan, J. (2007). Resolving grammatical category ambiguity in acquisition. In H. Caunt-Nulton, S. Kulatilake and I. Woo (Eds.), Proceedings of the 31st Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.

Balas, B. J., Cox, D. & Conwell E. (2006). The effect of personal familiarity on the speed of face recognition. In R. Sun (Ed.), Proceedings of the 28th Annual Meeting of the Cognitive Science Society (pp. 36-41). Mahwah, NJ: Erlbaum.

Conwell, E. (2006). The role of semantic generality in verb acquisition. In D. Bamman, T. Magnitskaia and C. Zaller (Eds.), Proceedings of the 30th Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.

PRESENTATIONS AND TALKS

Conwell, E. & Morgan, J. (March, 2008). Learning about cross-category word use: The role of phonetic cues. Poster presented at the 16th International Conference on Infant Studies, Vancouver, BC.

Soderstrom, M. & Conwell, E. (March, 2008). How infants acquire grammatical categories: The role of distributional, prosodic and phonotactic information in the acquisition of noun and verb categories. Symposium presented at the 16th International Conference on Infant Studies, Vancouver, BC.
Conwell, E. (January, 2008). Everything you ever wanted to know about language development and shouldn’t be afraid to ask. Guest lecture for “Growth and Development,” Dr. J. Schindelheim, Tufts University School of Medicine, Boston, MA.

Conwell, E. (January, 2008). The Brain: A walking tour. Guest lecture for “Cognition,” Dr. M. Soderstrom. Northeastern University, Boston, MA.

Conwell, E. (January, 2008). Verbing nouns and nouning verbs: Resolving the ambicategoricality problem in acquisition. Department of Psychology, University of South Florida, Tampa, FL.

Conwell, E. (October, 2007). Verbing nouns and nouning verbs: Why language is not a complete impediment to acquisition. Department of Cognitive and Linguistic Sciences Colloquium, Brown University, Providence, RI.

Conwell, E. (October, 2007). Neural correlates of perception. Guest lecture for “Cognition,” Dr. M. Soderstrom. Northeastern University, Boston, MA.

Conwell, E. (September, 2007). The Brain: A walking tour. Guest lecture for “Cognition,” Dr. M. Soderstrom. Northeastern University, Boston, MA.

Conwell, E. (April, 2007). Resolving the problem of category ambiguity in language acquisition. CogLunch, Department of Brain and Cognitive Science, MIT, Cambridge, MA.

Conwell, E. (December, 2005). Problems in the acquisition of verb argument structure. Guest lecture for “Current issues in speech and language pathology,” Prof. K. Froud. Teachers’ College, Columbia University, New York, NY.

Soderstrom, M., White, K. & Conwell, E. (November, 2005). Evidence for grammatical knowledge of content words and inflection in 16-month-olds. 30th Annual Boston University Conference on Language Development, Boston, MA.

Conwell, E. & Demuth, K. (July, 2005). Verb productivity and dative shift. 10th International Congress for the Study of Child Language, Berlin.

Soderstrom, M., White, K., Conwell, E. & Morgan, J. (July, 2005). Sixteen-month-olds are beginning to form categories of “noun” and “verb.” 10th International Congress for the Study of Child Language, Berlin.

TEACHING EXPERIENCE

Instructor: Language in the Mind, Summer 2007; Summer 2008
Teaching Assistant: Perception, Illusion and the Visual Arts (Prof. W. Warren), Spring 2007
Teaching Assistant: Introduction to Linguistic Theory (Prof. J. Sedivy), Fall 2006
Teaching Assistant: Language in the Mind (Prof. J. Morgan), Spring 2005
Teaching Assistant: Children’s Thinking: Introduction to Cognitive Development (Prof. D. Sobel), Fall 2004

ACKNOWLEDGEMENTS

The research presented here was funded by NIH grant HD-32005 to James Morgan. In my last year of preparation, I was supported by the Peder Estrup Graduate Research Fellowship at Brown University. The Demuth Providence Corpus, which is used in every study in this dissertation, was made possible by National Institute of Mental Health Grant #1R01 MH60922 to Katherine Demuth.

I would like to thank my readers, Katherine Demuth and Julie Sedivy, for their very helpful comments over the last few years. Although she is not on my committee, Polly Jacobson deserves recognition for nurturing my inner linguist and for her feedback on my work. Jim Morgan, my advisor, did not have to take me into his lab during my third year at Brown, but I will always be grateful that he did. Thank you, Jim, for all of your support. Working with you has made me a much better researcher, writer and presenter.

A lot of people made this work possible, but none so directly as Lori Rolfe. I could write another thesis on everything she did to help. Megan Blossom, Glenda Molina and Elena Tenenbaum also deserve recognition for all of their help with subjects. My other “sisters,” Naomi Feldman, Melanie Soderstrom and Katherine White, all gave me very helpful feedback at every stage of this process, which I appreciate deeply.
Rushen Shi, a slightly older “sister,” was the source of very interesting conversation on prosody and ambicategoricality. Finally, the undergraduates who call for subjects have been so very helpful during the last few years. I wish I had the space to thank them all.

The friends I have made at Brown have made all of this much more tolerable. My first-year cohort, Socrates Dimitriadis, Justin Owens and Jae Yung Song, has been wonderful to be around and great fun at karaoke. My fellow Infant Labbers, already mentioned above, deserve another mention for also being great friends. Non-Brown friends, including Kate Baker and Andrew Thomas, Chris, Aurora and Max Connor, Mathieu Chaize and Julie Muller, Camilo Aladro and many others, have politely put up with me yammering on about lexical categories for years now without ever telling me to shut up. Friends listen; good friends listen to you talk about your dissertation.

I heard someone say once that Midwesterners are of the opinion that “it’s not that big a deal.” When my Midwestern family tells me something is a big deal, I know they really mean it. A huge thank you to them for thinking this is a big deal. Thank you particularly to my mom, for always asking about my work, my dad, for sending me so many newspaper clippings, and my brother and sister, for keeping my feet on the ground.

My cats, Mia and Leila, made several interesting additions to this manuscript. I hope I managed to remove them all. Any remaining errors are entirely their fault.

Traditionally, the last person to be acknowledged is the author’s loving and supportive spouse, for being loving and supportive. In addition to those things, Ben Balas has been my best friend, my collaborator, my running buddy, my Matlab guru, my chef, my improv muse, my sanity check, my partner in crime and a thousand other things. I probably could have done all of this without him, but I’m so glad I didn’t have to. Thank you, Ben, for everything.
TABLE OF CONTENTS

1. Introduction
1.1. Learning grammatical categories
1.1.1. Phonological cues to lexical category
1.1.2. Referential cues to lexical category
1.1.3. Distributional cues to lexical category
1.2. The problem of ambicategoricality in language development
2. Parental use of ambicategorical words
2.1. Ambicategoricality in speech to children
2.1.1. Study 1a: The noun/verb ambiguity
2.1.2. Study 1b: The verb/adjective ambiguity
2.1.3. Study 1c: The noun/adjective ambiguity
2.1.4. Discussion
2.2. Prosodic cues to category
2.3. General Discussion
3. Infant sensitivity to prosodic cues
3.1. Infants’ perception of prosodic cues to the noun/verb ambiguity
3.2. Infants’ perception of prosodic cues to the verb/adjective ambiguity
3.3. Infants’ perception of prosodic cues to the noun/adjective ambiguity
3.4. General Discussion
4. Child use of ambicategorical words
4.1. Ambicategoricality in children’s speech
4.1.1. Study 1a: The noun/verb ambiguity
4.1.2. Study 1b: The verb/adjective ambiguity
4.1.3. Study 1c: The noun/adjective ambiguity
4.1.4. Discussion
4.2. The relationship between maternal and child usage
4.2.1. Study 2a: The noun/verb ambiguity
4.2.2. Study 2b: The verb/adjective ambiguity
4.2.3. Study 2c: The noun/adjective ambiguity
4.2.4. Discussion
4.3. General Discussion
5. Resolving the ambicategoricality problem
5.1. Ambicategoricality and grammatical category development
5.2. Conclusions
6. References
7. Appendices

LIST OF TABLES

Tables 2-1 through 2-6
Tables 4-1 through 4-5

LIST OF FIGURES

Figure 1-1
Figures 2-1 through 2-8
Figures 3-1 through 3-3
Figures 4-1 through 4-6

CHAPTER 1

When children learn a language, one of their major tasks is to determine which words behave alike syntactically. That is, they must determine which words are nouns, which are verbs, etc., in order to productively use the language. This must be learned, as a particular sequence of sounds may be a noun in one language and a verb in another. Knowing the grammatical category of a word provides considerable information regarding that word’s syntactic properties and allows a language user to extend his or her use of that word beyond those types of sentences in which it has been used before. Without such knowledge, use of a word should be restricted to only those syntactic contexts in which it has been heard before or, alternatively, extensions should be haphazard in nature. It is evident that children learn grammatical categories, as neither of these limitations is a characteristic of child language (Bowerman, 1974; Conwell & Demuth, 2007; Marcus, Pinker, Ullman, Hollander, Rosen & Xu, 1992). How children learn grammatical categories, however, remains an open question for researchers of language acquisition.

Unfortunately for language learners, the grammatical category of a word is defined by its syntactic properties, that is, the set of contexts in which the word can appear. The set of contexts in which a given word can appear is what learners are trying to discern in the first place.
The circularity of this problem (often called the “bootstrapping problem”) requires learners to have some means of “breaking in” to the system before knowing the syntax of the language. The solution to this problem is likely based on aspects of the language that are related to syntax, but not dependent on it.

A further problem for language learners is that particular word types do not necessarily adhere to a single lexical category. Words may be ambicategorical, appearing, for example, as both noun and verb or verb and adjective. If children are to solve the problem of lexical categorization, they must have some means of coping with this ambiguity. How they do so remains a central problem for many theories of grammatical category development.

This chapter will begin by examining the research to date on lexical category learning and then turn more specifically to explore the potential problems posed by category ambiguity in language learning, as well as reviewing the available literature on that problem.

1.1 Learning grammatical categories

Traditional analyses of the problem of learning grammatical categories conclude that grammatical categories are unlearnable on the basis of language input alone (e.g., Chomsky, 1965). Learning these categories would require understanding the complexities of the syntax and could not be deduced from positive evidence. The infinite, recursive and hierarchical structure of language means that learners would need some knowledge of the syntax of their language to even guess at appropriate categories for words. However, empirical examination of the input to language learners, as well as research into the capacities that children bring to the language learning problem, indicates that the situation is not as dire as it might seem.

A complete knowledge of syntax may be necessary for parsing and understanding very complex sentences, but a variety of possible cues for solving the bootstrapping problem have been put forth in the literature.
While these are not perfectly reliable, they are correlated with grammatical category and may be useful for learners in the very earliest stages of language acquisition. Local co-occurrence cues (Maratsos & Chalkley, 1980; Mintz, 2003; Mintz, Newport & Bever, 2002; Redington, Chater & Finch, 1998), phonotactic restrictions (Kelly, 1992; Monaghan, Chater & Christiansen, 2005; Morgan, Shi & Allopenna, 1996; Shi, Morgan & Allopenna, 1998) and meaning (Bowerman, 1973; Pinker, 1989) are the most commonly studied means of categorizing words without recourse to full knowledge of syntax. Each of these cues has its advantages and disadvantages and learners might be able to use all three to varying degrees over the course of language learning. In morphologically rich languages, such as Russian, morphology may also provide a highly reliable cue to grammatical category, one that infants have been shown to use in laboratory settings (Gerken, Wilson & Lewis, 2005). Meaning cues may be further subdivided into lexical semantic cues (Bowerman, 1973) and pragmatic cues, such as conditions of use and referential context (Tomasello & Akhtar, 1995), which may be effective for categorization to different degrees. When considered at this level, there is no shortage of information that could be used to solve the bootstrapping problem. However, to determine how and when particular cues are used, a more systematic analysis of both their efficacy and their accessibility is necessary.

I will now examine the current research on phonotactic, semantic and syntactic cues to lexical category. Most theories of how children learn lexical categories do not directly account for how cross-category usage might be incorporated into the learner’s grammatical system. Still, as I consider each potential cue to lexical category, I will also examine how it might interact with ambicategoricality.

1.1.1 Phonological cues to lexical category

A newborn infant has heard relatively little language.
Likewise, newborns have immature visual and social abilities. These two facts suggest that distributional and referential cues to lexical category may be unavailable to the youngest language learners. Does this mean that lexical categorization must wait until learners are older? Not necessarily, as even very young infants have access to a potentially useful source of information regarding grammatical categories. Children must learn the phonological, phonetic and prosodic properties of their language in order to segment words from fluent speech and acquire stable lexical representations. Since these properties of words also correlate to some degree with part of speech, infants who are already attending to these aspects of the speech signal may be able to use this information to begin sorting words into categories. This strategy requires no knowledge of either the meaning or the distribution of the words themselves. An infant who groups words based on their acoustic or phonological properties only needs access to the speech stream.

Two fundamental, if coarse-grained, category distinctions have phonetic and phonological correlates. Content words, such as nouns and verbs, can be distinguished from function words, such as determiners and prepositions, based on their phonological properties. Beyond this basic dichotomy, the content word category can be further broken down into such classes as noun, verb and adjective. Some phonological cues to the noun/verb distinction exist in English, although they are less robust than those that separate content and function words.

Function words and content words differ along several phonological and acoustic dimensions. Function words often have reduced vowels, null codas or onsets (or both), as well as overall lower amplitude and simpler syllabic structures than content words. Content words tend to be longer than function words in terms of both duration and number of segments.
Examining a corpus of infant-directed English, Shi and colleagues (Morgan, Shi & Allopenna, 1996; Shi, 1995) found that no one of these cues resulted in better than about 65% correct categorization of function and content words. However, when taken together, they yielded 83-90% accuracy in this categorization. Moving beyond English, Shi, Morgan and Allopenna (1998) found similar effects in Mandarin Chinese and Turkish, although the classification was slightly less accurate in Turkish than in the other two languages (80-85% vs. 80-90%). Because these two languages are typologically distinct from English, these results show that the relevant phonological and acoustic properties co-vary across languages in general rather than being a property of a single language or language family. Newborn infants are sensitive to these cues to the content/function distinction, regardless of the language they have been exposed to (Shi, Werker & Morgan, 1999).

These findings have two important implications. First, acoustic and phonological cues to the function/content distinction must be used as a group because no one of them alone categorizes much better than chance. In fact, some of the individual cues that are useful for English do not hold for Mandarin and Turkish (e.g., content words in Mandarin are also very short). However, taken together, these cues yield excellent categorization across languages. Secondly, children do not need to know anything language-specific to use them. The covariance of these cues allows learners to determine which are useful in the language they are exposed to without first having to learn the language.

Acoustic and phonological cues may be useful for the function/content distinction, but lexical categorization does not stop at the level of “function” and “content”. Nouns, verbs, adjectives and adverbs all have distinct syntactic properties.
Kelly and colleagues (Cassidy & Kelly, 1991; Kelly & Bock, 1988; see Kelly, 1992, for a review) have shown that there are acoustic and phonological cues to these distinctions as well. For example, nouns in English tend to have initial stress while verbs tend to have final stress (e.g., a REcord vs. to reCORD). Nouns also tend to contain more syllables than verbs in English. High-frequency nouns are more likely to have back vowels while high-frequency verbs tend to have front vowels (Sereno & Jongman, 1990). Even in cases where the phonotactic properties of a noun and verb are identical (i.e., a word, such as hug, can be both a noun and a verb), prosodic properties such as duration and pitch distinguish noun from verb uses in both adult and child-directed speech (Sorenson, Cooper & Paccia, 1978; Shi & Moisan, 2008). Phonological cues may also be useful for learning subcategories of noun and verb. Noun gender in a number of languages, including Hebrew and French, correlates with certain phonological properties. Phonology has also been implicated in the distinction between alternating and non-alternating dative verbs in English (Kelly, 1992).

Little research has been done on whether infants use phonological cues for categorization at the level of the noun/verb distinction. However, Cassidy and Kelly (2001) demonstrated that 4-year-old children use the number of syllables in a nonce word to assign it to either an action or an object. Children interpreted trisyllabic words as object labels, while monosyllables were more likely to be interpreted as denoting actions. This study does not comment directly on whether 4-year-olds know that the number of syllables in a word correlates with grammatical class, but it does show that they recognize the relationship between a word’s phonology and its intended referent.

Adults are also aware of the phonological correlates of these categories.
English-speaking adults use syntactic information to determine the placement of stress in disyllabic nonsense words (Kelly & Bock, 1988) and will use a non-word as a noun if it is polysyllabic (Cassidy & Kelly, 1991). Vowel features affect the speed and accuracy of adults’ classifications, indicating that speakers are aware of the relationship between a word’s phonology and its lexical category (Sereno & Jongman, 1990).

None of these potential cues perfectly distinguishes nouns from verbs. Nouns are more likely than verbs to have back vowels and be polysyllabic, but leaf is still a noun. Monaghan, Chater and Christiansen (2005) evaluated the effectiveness of these various cues, both individually and as a group, for categorizing nouns and verbs. While phonological and acoustic cues to the content/function distinction result in 80-90% accuracy when taken together, the acoustic and phonological cues to the noun/verb distinction categorize only 65% of words accurately. Still, this categorization is better than would be expected by chance, suggesting that these cues are available and useful for at least coarse categorization. Like the cues to the function/content distinction, a set of phonotactic cues to the noun/verb distinction also exists in languages other than English (Monaghan, Christiansen & Chater, 2007).

Taken together, these findings suggest that phonological information regarding lexical category is not only available in speech to children, but also accessible to language learners. The acoustic/phonological information in the speech signal may allow very young children to begin categorizing words. While these cues to grammatical category allow for only very coarse grouping of words, they might be exactly what young learners need to break into the language system and begin the complex process of forming lexical categories.

Naturally, the scope of this approach is limited.
Acoustic and phonological information categorizes words with varying degrees of accuracy, particularly at finer-grained levels such as noun and verb. These cues are likely most useful in early development when coarse categorization may be sufficient to begin the process of lexical categorization. On its own, phonological information will not result in an adult-like category system. Furthermore, children would need to transition between the categories that they form using phonological cues to categories that more directly reflect something about syntax. Proposed mechanisms for this process tend to be vague and almost all proponents of this approach posit a later interaction with other kinds of information, particularly distributional cues, to achieve adult-like categorization (Monaghan et al., 2005; Morgan et al., 1996).

Ambicategoricality poses a very straightforward problem for the phonological bootstrapping approach. If the same phonotactic string is used in more than one category, its phonology is not a useful cue to category membership. However, because phonological bootstrapping is typically posited as a means of “breaking in” to the system, rather than a complete solution to the category learning problem, ambicategoricality may not fully disrupt its initial function. In particular, the function/content distinction, which is the category learning problem for which phonological bootstrapping has been most effective, is less affected by ambicategoricality than is the noun/verb distinction.

1.1.2 Referential cues to lexical category

“A noun is a word for a person, place or thing; a verb is a word for an action.” If this elementary school adage holds, learning about lexical categories should be no more difficult than learning the meanings of words, a task that learners must accomplish anyway. Unfortunately, reference does not perfectly correlate with grammatical class, nor is it completely independent.
Some nouns refer to highly abstract concepts (e.g., honesty) and the word action is itself a noun. Verbs may describe such unobservable mental states as thinking. Many words from both categories can be used in the other, further confusing the issue of how grammatical category and referent are related. The schoolteacher’s explanation has the direction of causation backward: object labels are typically nouns and actions are usually described by verbs. This does not, however, mean that all verbs are words for actions or that all nouns are words for objects. Still, children’s early language experience is replete with object and action labels, and they may be able to use this correlation to learn about category membership. Young children’s earliest categories could be based on the semantic properties of words, rather than on their syntactic properties.

Meaning may have some primacy over syntax in language development. Children show the first signs of understanding word meaning around the age of 6 months, well before they demonstrate any grammatical knowledge (Tincoff & Jusczyk, 1999). Likewise, although early utterances often consist of only a single word or are syntactically incomplete, they do have intended meaning. Meaning may also be more readily available to children than distributional information is: lexical meaning can sometimes be discerned from a single exposure while syntactic distribution requires more experience. Why would learners use information that is slow in coming instead of information that is readily available?

When preschoolers use words in a syntactically inappropriate context, their errors tend to respect broad categories of meaning while compromising more arbitrary syntactic subclasses. For example, a child might use an inchoative verb as a causative verb, as in I felt him better (vs. I made him feel better).
Subcategories such as inchoative verb and causative verb are formal, arbitrary properties of language, based primarily on syntactic distribution, although some fine-grained semantic correlates may exist (Pinker, 1989). These errors suggest that, while children treat action words differently than object words, they do not understand the more arbitrary grammatical distinctions within the class of noun or verb. Young children may lack a complete grasp of the more fine-grained syntactic subcategories of these words (Bowerman, 1974). Children’s utterances indicate that they attend to the meanings of words, over and above their syntactic properties. If children begin learning language in a meaning-driven way, why would they not use meaning as a basis for early category formation?

Children must eventually use syntactic distribution to form adult-like grammatical categories, but semantic categories may precede syntactic ones as a sort of stop-gap in the absence of sufficient information to warrant the formation of distributional categories. The self-referential nature of grammatical categories provides another argument in favor of early semantic categories. Nouns are defined in terms of their relationships with verbs, adjectives, and so forth, but verbs are described based on their relationships with nouns, adverbs and other categories. Because one needs to know something about grammatical categories to learn this, how could the process of category formation begin? Language learners could, however, begin by forming categories based on meaning, lexical categories, rather than strictly grammatical ones. Then they could associate these classes of meaning with particular distributions relative to other classes of meaning, which might avoid the problem of pure distributional analysis.

Some of the earliest formal claims that children’s categories are meaning-based came from Braine (1963; 1976).
After examining the early word combinations of several children learning a number of different languages, Braine (1976) concluded that children’s early utterances have positional consistency based on their intended meaning. That is, children use meaning to determine word order. Words are grouped into such semantic categories as actor and action which are productively combined in certain ways to express particular meanings. These categories do not correspond to adult syntactic categories or to parts of speech. Indeed, both action and object words may appear with the same function items (e.g., more milk and more read), suggesting that children have made certain semantic distinctions (e.g., only items or actions that can recur are ever used with more) rather than syntactic ones. Braine acknowledged that this position introduces a significant developmental discontinuity, namely how and when child grammar develops adult-like underlying structure.

Using diary data from both English and Finnish, Bowerman (1973a; 1973b; 1974) arrived at a description of early lexical categorization that was similar to Braine’s (1976). She found significant evidence for such categories as action, actor and result, but little support for grammatical notions such as subject. Children also use words across category boundaries, suggesting that their lexical categories are not based on syntax. In later work, she considered the problem of how learners eventually arrive at categories based on their syntactic distribution (Bowerman, 1982). She proposed that children use their semantic categories to identify relevant syntactic properties of lexical classes, thereby providing a means of breaking into the syntactic system.

While Braine and Bowerman emphasized the role of meaning in learning language from input alone, Pinker (1984; 1989) proposed that children form categories of meaning on the grounds that certain semantic categories are universal and, therefore, probably innate.
Under Pinker’s theory, human beings are born with a set of semantic primitives, such as agent, patient, transitive verb, and so on. Children learning language merely sort words into these primitive categories based on their meanings. Then, they use innate linking rules to associate these primitive semantic categories with innate syntactic categories. This version of the idea that children use meaning to categorize words is known as the Semantic Bootstrapping Hypothesis. Pinker noted that this strategy is particularly useful for learning the syntactic properties of verbs. While the syntactic properties of the noun category are mostly uniform (arbitrary distinctions such as count and mass or gender aside), the “category” of verb is composed of many overlapping subclasses, all of which have somewhat different syntactic behaviors (Levin, 1993). Each subcategory, however, has relatively coherent semantic characteristics. Rather than trying to sort out the syntax of verb subclasses based on their distribution, which could be misleading, children use verb meaning to help guide the formation of subcategories. Although his approach stems from a very different perspective on the nature of language development, Tomasello (1992; 2000; 2003) also assumes that children begin language learning by determining the meanings of phrases they hear. Under his theory, children use general social-cognitive abilities to understand sentences as they are uttered. On the other hand, learning distributional properties requires that children attend to very abstract syntactic properties such as sentential subject and relate them to abstract semantic properties such as agent. Under Tomasello’s account, every word begins as its own category (or “island”) with no connections between words; that is, very young children make no linguistic abstractions (Tomasello, 2000). Analogies between structures and words are made on the basis of intended meaning.
Early work by Tomasello and colleagues (Lieven, Pine & Baldwin, 1997; Olguin & Tomasello, 1993; Tomasello & Olguin, 1993) demonstrated that children do not appear to use words in a way that indicates categorization, particularly at the syntactic level. In these studies of both elicited and spontaneous productions, children up to 3 and a half years old fail to extend new verbs to grammatical forms in which they have not been attested in the input language. A new verb is not a member of a category with the rights and privileges of other verbs; it can only be used in previously modeled constructions. While the claim that children do not have syntactic categories is clear in these “usage-based” theories (Tomasello, 2003), the status of lexical categories in any form is not directly addressed. Tomasello makes no explicit claims regarding whether very young children form any lexical categories at all; instead, work has focused on demonstrating that they do not form abstract distributional categories. The argument in favor of early referential categories rests on two key points. First, learning lexical categories based on their distribution may be untenable (Pinker, 1987; Pinker, 1989). Young children may not have access to all of the necessary evidence to do so (Tomasello, 2000), and, even if they did, we do not know whether they are able to track this information over time. Second, children’s spontaneous utterances are best characterized in terms of the semantic, rather than syntactic, properties of the words they contain (Bowerman, 1973a; Braine, 1976). Based on the evidence presented thus far, we could conclude that children’s early categories are entirely driven by lexical semantics, but the data are not airtight.
Children learn a great deal about the meanings of words and may make errors based on meaning, but most of their early utterances correspond to the adult grammar, suggesting that they also know something about the syntactic behavior of the words in their lexicons (Bloom, Lightbown & Hood, 1975). In other words, early word order is more grammatical than we would expect if lexical categories were not, at least in part, distributional. The complex nature of the relationship between meaning and grammatical category also indicates that referential context cannot be the sole factor in the formation of lexical categories and might, in fact, do more harm than good. Proponents of the semantic bootstrapping approach argue that ambicategoricality poses less of a problem for their learning strategy than it does for phonological or distributional approaches to category learning (Pinker, 1987; Nelson, 1995). However, if children form categories such as “action word” and “object word,” how do they incorporate words that refer to both objects and actions (e.g., drink)? This system would also need to be robust to noun uses of action words (e.g., I went for a run). Pinker (1987) claims that children can tell the difference between noun uses and verb uses of the same word using subtle semantic differences between the two uses. Still, a word such as hug refers to the same event whether it is used as a noun or as a verb. Exactly how such words are incorporated into a semantic category system is underspecified at best. Most often, these words are simply used as an argument against a strong distributional or phonological learning approach.

1.1.3 Distributional cues to lexical category

Does the speech that children hear contain distributional information about category membership that could be useful for the purpose of forming accurate categories? Maratsos and Chalkley (1980) observed that adults’ lexical categories are distributional form classes (see also Harris, 1954).
To achieve adult-like grammar, children should eventually arrive at categories based on syntactic distribution; beginning with strictly semantic categories would result in a serious developmental discontinuity. Maratsos and Chalkley proposed that even the earliest lexical categories might be based on “primitive sequential and semantic properties,” as members of a grammatical category share some aspects of meaning, but also co-occurrence properties. However, Maratsos and Chalkley noted that they lacked sufficient empirical evidence to evaluate the possibility of forming grammatical classes based on either meaning or local distribution. Advances in computational resources and the increased availability of child speech corpora have the potential to provide exactly this kind of evidence. Although distributional cues could take a variety of forms, including such high-level information as the complete syntactic structure of the utterance, most research on the efficacy of distributional cues focuses on local bigram or trigram co-occurrence properties of words. One factor that may contribute to this tendency is the desire to minimize the hypothetical working memory load on learners; another is that sensitivity to local co-occurrence cues is relatively straightforward to test in infants. Redington, Chater and Finch (1998), using local co-occurrence cues in a large corpus of speech, found that such cues are excellent for categorizing nouns and good for categorizing verbs. Categorization improved when frequency of co-occurrence was factored in. Using corpora of child-directed speech, Mintz, Newport and Bever (2002) assessed how accurately words could be categorized based on the words that occurred immediately before or after them. The resulting categories were significantly more accurate than would be predicted by chance.
When such features as function words were used to limit the window of analysis, accuracy improved somewhat, suggesting that if young language learners can use cues available in the speech stream to limit their distributional analyses, their categorizations will improve. Conwell and Balas (2007) demonstrated that the transitional probability between a target word and the high frequency word that immediately precedes it allows for good categorization of nouns and verbs in an unsupervised model and better categorization in a supervised learning model. Mintz (2003) further proposed that information about trigram distributions might be more useful for lexical categorization than bigram distribution. He analyzed child-directed speech corpora for so-called frequent frames, or those pairs of words that typically appear with only one intervening element (e.g., the__and). Words appearing in the middle of a frequent frame are likely to be of the same grammatical category. When words in child-directed speech are categorized on these grounds, highly accurate, but somewhat incomplete, syntactic categories result. That is, many separate categories each consisted almost entirely of nouns, rather than the nouns forming a single category. To address this problem, Mintz proposes that a learner might collapse categories that overlap significantly in their membership. Unfortunately, collapsing categories that have overlapping membership becomes problematic when the same phonological strings can be used as both nouns and verbs (e.g., kiss). These category ambiguities pose a challenge to learners in general, but would be especially problematic if one’s learning strategy allowed two categories with sufficient overlap to be collapsed into one (see Pinker, 1987, for further discussion on this point). Can young children notice and use local co-occurrence cues to form categories when processing speech? One approach to answering this question is to evaluate learners’ performance on an artificial language.
For example, word categories in the artificial language may be based on co-occurrence with function words, such that all words occurring adjacent to some set of words a can be described as being of type X and all words occurring adjacent to a different set of words b can be said to belong to the class Y (sometimes called aX/bY or MN/PQ grammars). Learners can be said to generalize class membership based on these distributional properties if they can distinguish between novel combinations that are “grammatical” (e.g., a previously unattested aX pairing) and those that are “ungrammatical” (e.g., a bX pairing). In general, early work in this area found that, while adults could learn about categories in terms of features like absolute location and position relative to “marker” elements, they were unable to learn about dependencies between categories (Smith, 1969). That is, they were able to learn that a words and b words always occurred in initial position and X and Y words were always in final position, but they did not learn that X words must follow a words and Y words only follow b words. Adults cannot categorize words based on distribution alone, even in a situation that is much like a natural linguistic phenomenon (determiner-based gender marking). However, further studies with adults demonstrated that adding information about grouping or meaning supports this kind of learning (Moeser & Bregman, 1972; Morgan & Newport, 1981). For dependencies among categories to be learned from distribution, redundancy of cues is necessary. Braine, Brody, Brooks, Sudhalter, Ross, Catalano & Fisch (1990) used an artificial language in which categories were indicated by morphemes to examine school-aged children’s learning of meaning and syntax. Children used meaning-based morphemes appropriately, even when the morpheme had not been used with a particular lexical item, indicating that they had generalized that aspect of form and meaning.
They did not, however, learn arbitrary phonological changes to individual lexical items by rote and had difficulty learning subcategories that were built into the distribution of the language but had no corresponding semantic or phonological correlates. Again, some kind of cue redundancy was necessary for learning lexical categories. Combining artificial language methodologies with infant perception paradigms, Gómez and Gerken (1999) showed that 1-year-olds learn certain properties of a finite state grammar from distribution alone. In particular, they discriminate between novel grammatical strings and those with illegal endpoints (a violation in absolute string position) or illegal internal orders (a violation of transitional probabilities). Further work by Gómez and Lakusta (2004) demonstrated that children at about the same age learn distributional categories from auditory exposure as long as category-internal cues such as syllable number are also present. Gerken, Wilson and Lewis (2005) found that 17-month-olds will learn a subset of a real-world gender paradigm (the Russian gender system) from distribution if the set of words they hear has multiple cues to category (e.g., phonological features). Children can learn about grammatical categories in the absence of referential information as long as several cues indicate the relevant distributional features. Mintz (2002; 2006) demonstrated that infants and adults can use frequent frames to categorize words in the absence of referential or phonological information. These artificial language studies differ from natural language learning in at least one important way: learners receive extensive massed input. Because the natural language environment is less consistent and learning is more distributed, this raises the concern that, while one-year-olds can use these kinds of co-occurrence cues under constrained experimental conditions, they might not actually do so under more realistic conditions.
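As a rough illustration of the frequent-frame idea referred to above (Mintz, 2003), the following sketch groups words according to the frame, that is, the pair of immediately preceding and following words, in which they occur. The toy utterances, the function name and the `min_count` threshold are assumptions made purely for illustration; they are not taken from Mintz's corpora or from his procedure for selecting frames.

```python
from collections import Counter, defaultdict

def frequent_frame_categories(utterances, min_count=2):
    """Group words by the (preceding word, following word) frame they fill.

    min_count is a hypothetical cutoff for calling a frame "frequent".
    """
    frame_counts = Counter()
    frame_fillers = defaultdict(set)
    for utterance in utterances:
        words = utterance.lower().split()
        # Slide a three-word window over the utterance: left, middle, right.
        for left, middle, right in zip(words, words[1:], words[2:]):
            frame = (left, right)
            frame_counts[frame] += 1
            frame_fillers[frame].add(middle)
    # Keep only the frames that occur at least min_count times.
    return {frame: frame_fillers[frame]
            for frame, count in frame_counts.items()
            if count >= min_count}

# Invented toy input, not real child-directed speech.
toy_utterances = [
    "you want to go home",
    "you have to go now",
    "put the ball in the box",
    "put the cup in the sink",
    "you want to eat now",
]
categories = frequent_frame_categories(toy_utterances)
for frame, fillers in categories.items():
    print(frame, sorted(fillers))
```

In this toy input the frame you__to collects verbs and the frame the__in collects nouns, which is the kind of accurate but partial grouping described above. A word that filled both kinds of frame would link the two groups, which is where ambicategoricality becomes relevant.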
What do young children understand about the nature of distributional cues to lexical categories? Do they use this information to form categories from an early age? Höhle and colleagues demonstrated that 15-month-old German-learning children categorize novel words as nouns based on co-occurrence with a familiar determiner (Höhle, Weissenborn, Kiefer, Schulz & Schmitz, 2004). However, younger infants (12 months old) do not use this information to recognize a novel word as a noun, nor do children in either age group treat novel words following familiar subject pronouns as verbs. Gordon (1985) showed that preschool-aged children use syntactic cues to categorize a novel noun as “mass” or “count” (see also Barner & Snedeker, 2005). These studies indicate that children are sensitive to the distributional cues to lexical categories not only in artificial languages, but in natural language learning as well. Even very young children can use local distributional cues such as co-occurrence with function words to categorize words. Furthermore, such cues are available in the speech that children hear. Relying strictly on these cues will not result in complete, adult-like categories, but they can coarsely segregate nouns from verbs, a critical step in the acquisition of syntax. Nevertheless, distributional theories may be highly susceptible to the ambicategoricality problem. Distributional accounts of lexical category learning assume that language learners are capable of fairly sophisticated statistical processing, a not unwarranted assumption (Saffran, Aslin & Newport, 1996), and that they use this ability to detect frequencies and transitional probabilities to categorize the words that they hear (Maratsos & Chalkley, 1980; Redington, et al., 1998; Mintz, 2003; Conwell & Balas, 2007). Most accounts suggest that learning which “frames” or high frequency function words predict which categories is a central part of the learning process.
If words can be used in more than one category, there is a real risk that learners could conflate the co-occurrence properties of different categories. To be more concrete, Figure 1-1 shows two versions of the situation that a learner might encounter using a distributional bootstrapping approach. The diagram on the left is the category learning problem as it is usually described in the literature: one set of words appears in one set of contexts and another mutually exclusive set of words appears in a mutually exclusive set of contexts. In such a case, determining which words belong in which category is fairly straightforward. The diagram on the right, however, represents the grammatical category learning problem when ambicategorical words are introduced. In this diagram, there is no way to draw a line separating the contexts or the categories. This problem has not been seriously addressed in the literature on distributional bootstrapping. The current research on lexical category acquisition does not definitively favor any one account of the category learning process. Although the most recent research supports at least some role for local co-occurrence information in grammatical category learning, a complete account will almost certainly incorporate many different kinds of cues, as the convergence of cues has been shown to support language learning (Moeser & Bregman, 1973; Morgan & Newport, 1981; Monaghan, et al., 2005). Another issue that few accounts of category learning address is the level of specificity that learners might need in their categories. Some research examines only very coarse-grained categorizations such as the content/function distinction (e.g., Shi, et al., 1998) while other work addresses the acquisition of more fine-grained categories, such as count and mass nouns (e.g., Gordon, 1985). Evidence indicates that even fairly young children distinguish between verb subcategories (e.g., Kline, 2008).
What levels of subcategorization children have at different stages of development remains unresolved and research in this area is influenced by differing perspectives on whether children’s earliest categories are over- or underspecified. Furthermore, none of these accounts of grammatical category learning adequately addresses the potential problems posed by ambicategorical words. Before any theory of grammatical category development can move forward, it must account for how learners represent those words that can appear in more than one lexical category.

1.2 The problem of ambicategoricality in language development

Ambicategorical words pose a potentially very serious problem to learners trying to sort words into categories. Because the phonology of the word remains constant across categories, phonotactic cues to category are rendered ineffective. Furthermore, words that appear in multiple categories should make distributional cues to category less effective. Learners could conflate distributions and create a single category that contains both nouns and verbs (Pinker, 1987; Figure 1-1). For a more concrete example, a learner exposed to sentences (1a-c) should conclude that (1d) is a legitimate English sentence.

1. a. John likes fish.
   b. John likes rabbits.
   c. John can fish.
   d. *John can rabbits.

Although sometimes discussed as a better solution to this problem (Pinker, 1987; Nelson, 1995; Oshima-Tanake, et al., 2001), the semantic bootstrapping approach does not resolve the ambicategoricality problem either. The referent of a word may not change as it switches from one lexical category to another. For example, hug used as a noun has very nearly the same referent as hug used as a verb. Given that every cue that children might use to solve the bootstrapping problem suffers under ambicategoricality, there is a striking paucity of research on the nature of category ambiguity in speech to children.
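The conflation argument built on example (1) can be made concrete with a small sketch. It assumes a deliberately naive learner that records the frame preceding each sentence-final word and collapses any words that share a frame into one category. The function names and the merge-on-any-shared-frame rule are hypothetical simplifications for illustration, not a model proposed in the literature cited here.

```python
from collections import defaultdict

def learn_contexts(sentences):
    """Map each sentence-final word to the set of frames that preceded it."""
    contexts = defaultdict(set)
    for sentence in sentences:
        *frame, word = sentence.lower().split()
        contexts[word].add(" ".join(frame))
    return contexts

def merge_overlapping(contexts):
    """Naively collapse words that share at least one frame into a category."""
    categories = []  # list of (set of words, set of frames) pairs
    for word, word_frames in contexts.items():
        words, frames = {word}, set(word_frames)
        remaining = []
        for other_words, other_frames in categories:
            if frames & other_frames:
                # Shared frame: absorb the existing category.
                words |= other_words
                frames |= other_frames
            else:
                remaining.append((other_words, other_frames))
        categories = remaining + [(words, frames)]
    return categories

def licensed_sentences(categories):
    """Every frame + word combination the merged categories would allow."""
    return {f"{frame} {word}"
            for words, frames in categories
            for frame in frames
            for word in words}

# Sentences (1a-c) from the text.
heard = ["John likes fish", "John likes rabbits", "John can fish"]
licensed = licensed_sentences(merge_overlapping(learn_contexts(heard)))
print("john can rabbits" in licensed)
```

Because fish and rabbits share the frame John likes, this learner places them in one category and extends the frame John can to rabbits, producing the ungrammatical (1d). Any cue that kept the noun and verb uses of fish apart before merging would block the overgeneralization.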
Macnamara (1982) described attempts to teach his own son, Kieran, the same word as both a noun and a verb. He reported that, at 17 months of age, Kieran was able to learn the same word to refer to both an object and an action, but that he began to introduce phonological distinctions between the noun and verb forms. For example, within two weeks of being taught the nonsense word “bel” to refer to both an action and an unrelated object, Kieran used “bam” to refer to the action and “ban” to refer to the object. In a longitudinal study, Macnamara (1982) examined the use of words as noun and verb in the Sarah corpus (Brown, 1973). He found that adults did not seem to avoid cross-category use when talking to Sarah, but that Sarah failed to use any word as both a noun and a verb until the age of 30 months. Once she began using the same word in both categories, she primarily used object words to refer to actions characteristically performed with those objects. However, this study is limited to a single child and it is, therefore, difficult to assess how general the results are. Nelson (1995) examined the use of six ambicategorical word types in speech both to and by children. She found that mothers did use these words as both noun and verb when speaking to their children. On the other hand, the children in her study only used the six target words very occasionally, limiting her ability to make a general statement about children’s representations of these words. Her primary concern was the way in which these words affect the semantic bootstrapping hypothesis; that is, how children who are forming categories such as “action word” and “object word” incorporate words that refer to actions, but are used in object word positions.
Although her results were too limited to directly address the question of how children represent such words, she claimed that they do argue against a strict syntactic (or distributional) bootstrapping approach, as the use of some words as both noun and verb constitutes ambiguous evidence for such an approach. Work by Barner and colleagues (Barner, 2001; Oshima-Tanake, Barner, Elsabbagh & Guerriero, 2001) assessed how mothers and children use denominal verbs and deverbal nouns. These words are taken to have a basic form in one category (e.g., walk is typically a verb), but can appear in the other category with the use of null morphology (e.g., one can take a walk). Because there is no overt derivational morphology, the new form of the word is phonotactically identical to the root form. The similarity of the two forms creates an ambiguity regarding the appropriate lexical category of that phonotactic string. Barner and colleagues asked whether parents used denominal verbs and deverbal nouns in speech to children, as well as examining how children use these words. They found that both mothers and children use these word forms as noun and verb, although children use them somewhat less than their mothers do. No direct comparison of input and production was conducted. They conclude that the semantic properties of a word will support or hinder its use in more than one lexical category by children. Barner’s work does not directly address the issue of how ambicategoricality affects grammatical category learning. These analyses also only included derived forms, not those words that are accidental noun/verb homophones (e.g., a big brown bear vs. to bear a heavy burden).
In addition to focusing only on a subset of potentially ambiguous word forms, these previous studies do not address how language learners avoid conflating cross-category uses of the same words, nor do they examine the precise nature of the relationship between the input that a child receives and his or her later use of ambicategorical words. Additionally, these studies are most concerned with how the semantic relationships between the noun and verb forms affect children’s acquisition of these words. Only the noun/verb ambiguity is addressed, leaving open the possibility that the findings of these studies apply only to noun/verb ambiguous words, but not to other words, such as those that are verb/adjective or adjective/noun ambiguous. The research discussed thus far focuses solely on the nature of ambicategoricality in English. Is this an English-only phenomenon and, therefore, a problem with very limited scope? While little data is available on how widespread cross-category usage is among the world’s languages, ambicategoricality may be more prevalent in languages with a weak system of inflectional morphology. In languages with obligatory case marking, for example, nouns may only rarely appear as bare stems, reducing the likelihood that they might take the same form as a verb. However, in languages with a weak morphological system (such as English, but also including Mandarin, among others) or where ambiguities exist within the morphological system (e.g., French, in which the participle forms of many verbs can surface as adjectives), ambicategoricality is a potential problem for learners. This variability across languages indicates that, like other language learning problems (e.g., noun gender), whether a child must solve the ambicategoricality problem depends on which language s/he is learning.
For practical reasons, including the availability of corpora and of study participants, this dissertation examines the nature of ambicategoricality in the experience and production of English-learning children. Ambicategoricality is regularly mentioned as a potential problem for language learning (e.g., Macnamara, 1982; Pinker, 1987; Nelson, 1995); however, no large-scale empirical investigation of the phenomenon in children’s experience and production has been conducted. Ambicategoricality certainly could pose a problem to learners under certain circumstances. First, if cross-category usage of words is widespread in child-directed speech, learners might have difficulty determining which syntactic contexts predict which grammatical categories (see Figure 1-1). Furthermore, ambicategoricality could become a problem for learners if they are unable to distinguish between, for example, noun and verb uses of the same words. Finally, one might predict that the complex nature of the syntactic behavior of such words would delay their acquisition. This dissertation takes up the issue of how children hear, perceive and produce words that are ambiguous with regard to grammatical category with an eye toward evaluating the extent of the learning problem posed by such words. By considering input, perception and production, a more complete picture of learners’ experience with ambicategorical words will be possible. When all of these aspects of cross-category word use are examined together, it may become possible to assess whether the ambicategoricality “problem” is really a problem for learners at all. The rest of this dissertation is organized as follows. Chapter 2 will examine how English-speaking caregivers use ambicategorical words when speaking to their children. Both the frequency of cross-category usage and the presence of prosodic cues to category will be considered.
Chapter 3 will explore learners’ sensitivity to the prosodic cues to category available in ambiguous words. Chapter 4 will ask whether and how children use words across categories and examine the extent to which their use of such words is related to their caregivers’ patterns of use. Chapter 5 will synthesize the results of all of these studies and discuss the consequences of these findings for the theories of grammatical category acquisition presented in this introduction.

Figure 1-1. [Two diagrams, each linking Words 1-3 (nouns) and Words 4-6 (verbs) to Contexts 1-3.] These models represent two language learning situations. The one on the left, where a category boundary is easily drawn, represents the distributional bootstrapping approach as it is usually discussed by its proponents. The diagram to the right, however, represents the distributional bootstrapping approach when some words are used ambicategorically. Drawing a category boundary is virtually impossible.

CHAPTER 2

Grammatical categories allow language users productive control of their language. Because a word’s lexical category determines its syntactic privileges, awareness of the word’s category allows speakers to extend their use of a word beyond those contexts in which they have heard it used by other speakers. Whether lexical categories such as noun, verb and adjective are universal across languages, and therefore possibly innate, or whether lexical category variability across languages suggests that these categories must themselves be learned is a matter of some debate (Braine, 1987; Pinker, 1984; Tomasello, 2000; Fisher, 2002). Regardless, the same sequence of phonemes may correspond to very different referents across languages, which means that the category membership of a particular string must be learned.
For example, the phonotactic string /no/ is a negation operator in English, but the genitive case marker in Japanese. Because the grammatical category of a given word must be learned, a number of theories have been put forth to explain which aspects of a word learners use to determine its lexical category. Perhaps the two most predominant are a word’s syntactic distribution (Harris, 1954; Maratsos & Chalkley, 1980; Redington, Chater & Finch, 1998; Mintz, 2003) and its semantic referent (Bowerman, 1973; Pinker, 1984; Braine, 1976), although phonotactic properties of words also correspond somewhat to lexical category (Kelly, 1992; Monaghan, Chater & Christiansen, 2005; Monaghan, Christiansen & Chater, 2007; Sereno & Jongman, 1990). However, it is not clear how any of these theories allow learners to accommodate words that are used in more than one grammatical category: phonotactic properties of such words are identical regardless of the category of use, the referent often remains constant across such derivations and syntactic distribution is what children are trying to learn in the first place. There is also a very real possibility that such words would cause children to conflate the distributional properties that diagnose lexical category (see Chapter 1 for a full discussion). In spite of the theoretical problems raised by words that can be used in more than one grammatical category, children have never been reported to have difficulty learning such words and this kind of ambiguity is so robustly incorporated into the adult language system as to allow for productive cross-category derivation (Clark & Clark, 1979). This suggests that language learners have the ability to incorporate cross-category word use into their linguistic system. Such ability may arise from one (or both) of two sources: the language learning environment or the learner him/herself. This chapter examines the nature of the language learning environment with regard to cross-category word use.
Chapter 3 will ask what abilities learners themselves may bring to the problem and Chapter 4 will ask how these abilities play out in a natural language learning situation. The organization of this chapter is as follows. First, I will examine six longitudinal corpora of child-directed speech and assess how frequent ambicategoricality is in the language environment of English-learning children. These corpus analyses will consider three kinds of lexical category ambiguity: noun/verb, verb/adjective and noun/adjective. If children are not exposed to ambicategoricality until their language abilities are fairly advanced, then these words pose no problem to distributional or phonological theories of grammatical category learning. Perhaps parents actively avoid using words in multiple lexical categories until their children have linguistic representations that are robust enough to incorporate this kind of ambiguity without conflating categories. If, however, young children do hear the same word used in more than one grammatical context, they must avoid conflating uses in one category with uses in another. I will ask whether one possible cue, prosody, reliably distinguishes uses in one lexical category from uses in another in natural child-directed speech. I will discuss the results of all of these analyses in terms of their implications for lexical category learning, in particular those theories that suggest primary roles for distributional and phonotactic properties of words in determining grammatical category.

2.1 Ambicategoricality in speech to children

This study consists of three sub-studies, each addressing the same question: to what extent do children hear words used in more than one grammatical category? English has words that can be used as both noun and verb, words that can be used as both verb and adjective and also words that can be used as both noun and adjective.
Using the same methods, each sub-study will examine one of these sources of category ambiguity. The results of each analysis will be discussed separately. Then, the general problem of ambicategoricality in children’s language experience will be considered in light of all three sub-studies.

2.1.1 Study 1a: The noun/verb ambiguity

Many of the examples of cross-category word use center on words that can be used as both noun and verb. Pinker (1987) argued against the logical possibility of learning lexical categories from distribution by citing the example of the word fish, as in (1a, b), and suggesting that the facts about this word would lead children to take the evidence about the word rabbits in (1c) and conclude that (1d) is a grammatical English sentence.

1. a. John likes fish.
   b. John can fish.
   c. John likes rabbits.
   d. *John can rabbits.

This argument is potentially very problematic for distributional learning of lexical categories. To avoid producing the ungrammatical utterance in (1d), children who heard the first three sentences would have to know that fish is being used as a noun in (1a) and as a verb in (1b), and that not all words are allowed to appear in both contexts. But what if children never hear both (1a) and (1b)? Although many of the basic nouns in English have verb uses or homophones that are verbs and many of the basic verbs have noun uses or noun homophones, if children are not exposed to this kind of category ambiguity, the ambicategoricality “problem” is moot. To date, there has been surprisingly little research on whether parents use the same words as both noun and verb when talking to their children.

Previous corpus analyses have examined in a limited way whether children hear the same words used as both noun and verb. Nelson (1995) examined naturalistic corpora from 12 mother-child dyads for whether 6 words (call, drink, help, hug, kiss and walk) were used in both noun and verb contexts.
Each corpus consisted of 5 hour-long recordings. Nelson categorized each use of these words as either a noun use or verb use and then used the proportional use in each category to determine whether the word was used in both categories. She found that these words are, in fact, used as both noun and verb by mothers speaking to their children. Other work by Nelson and colleagues (Nelson, Hampson & Shaw, 1993) included two more words (bite and work) in a similar analysis and found that they, too, were used as both noun and verb in speech to children. However, these analyses are very limited. The corpora themselves are fairly brief and, because the study examined only six word types, it is not evident how generalizable these findings are. Perhaps these six word types are the only words used in more than one lexical category in speech to children. If parents use the preponderance of word types only in a single category when speaking to their children, then children might not encounter category ambiguity until their knowledge of language is robust enough to incorporate it. In that case, the scope of the ambicategoricality problem would be very limited.

Oshima-Takane, Barner, Elsabbagh and Guerriero (2001) conducted a somewhat broader analysis of how deverbal nouns and their verb roots are used in child-directed speech. They examined uses of 16 potentially ambiguous word types in speech to three children. Their primary interest was in how lexical semantics interacts with deverbalization, but their data also speak to the issue of whether words are used in multiple lexical categories at all in speech to children. Of the 16 word types that they analyzed, 14 were used at least once in each lexical category in the input to at least one child. Again, however, this analysis examines only a very small number of potentially ambiguous words in speech to only a small number of children.
Barner’s (2001) more extensive analysis examined all denominal verbs and deverbal nouns in nine corpora of mother/child speech. Like the work by Oshima-Takane and colleagues, his analysis focused primarily on the role of lexical semantics in use of words as both noun and verb. He found that adults and children use some words as both noun and verb, but to a lesser extent than they could. However, his analysis examined only data from Brown’s (1973) Stage 1 (mean length of utterance less than 2). It is possible that the rate of cross-category word use by both caregivers and children might increase with grammatical ability. Furthermore, restricting the analysis to denominal verbs and deverbal nouns neglects the potential contributions of words that are accidental homophones (e.g., fit, leaves, etc.) to the ambicategoricality problem.

To address the limitations in the scope of previous studies of noun/verb ambiguity in speech to children, Study 1a examines the use of several hundred word types in six longitudinal corpora of child-directed speech. By expanding both the number of word types and the amount of child-directed speech in the analysis, this study will give the most complete picture yet of noun/verb ambiguity in speech to young children.

Method

Corpora. Six longitudinal corpora of maternal speech were examined. Five of these corpora came from the Demuth Providence Corpus (Demuth, Culbertson & Alter, 2006). The sixth was the Nina corpus (Suppes, 1974) from the CHILDES database (MacWhinney, 2000), which was included to provide evidence that these results generalize beyond the dialect of English spoken in Providence, Rhode Island. The ages and number of recordings for each corpus are presented in Table 2-1. Children in the Providence corpus were recorded every other week for 2-3 years, beginning as soon as they uttered their first words.
The Lily corpus is an exception, as a sudden, rapid increase in her language production created a need for weekly recordings approximately a year after recording commenced. For completeness, all of the Lily files are included in this analysis. Nina was recorded approximately weekly. In total, these corpora comprise approximately 330 hours of mother/child interaction. In all cases, the child’s mother is the primary caregiver and interlocutor. This age range (approximately 1-3 years) is of particular interest because it provides a comprehensive view of the child’s language experience from the time s/he utters his/her very first words to the time that s/he is speaking in complete, well-formed sentences. These corpora will capture changes in parental speech that may accompany the child’s shift from language receiver to active conversationalist.

Procedure. For each corpus, the number of maternal uses of each word type was counted, with morphologically complex words treated as individual types (e.g., run, runs and running were each counted separately). Because each corpus contained over 3,000 word types, it was impractical to examine every single one for cross-category use. Therefore, three frequency ranges were chosen as “core samples” for analysis. High frequency words were those used more than 150 times by the mother, middle frequency words were those used 40-60 times and low frequency words were those used 3-10 times. Within each frequency range, every word type was placed in one of two categories: “noun or verb” and “neither noun nor verb”. Then, all those words that were nouns or verbs were further categorized as potentially ambicategorical or not. Whether or not a word was potentially ambicategorical was based on an analysis of the Brown Corpus (Francis & Kucera, 1983). Words that were used at least once as a noun and at least once as a verb in the Brown Corpus were considered potentially ambiguous.¹
For a complete list of potentially ambiguous nouns and verbs in these corpora, see Appendix A.

¹ The Brown Corpus consists of written texts, which limits its accuracy in reflecting typical adult-directed speech, and there are many words that are not used ambicategorically in the Brown Corpus that have very natural cross-category uses in adult speech (e.g., comb). However, there exist no corpora of spoken adult language that are comparably large.

For every word type that was potentially ambicategorical, each utterance including one or more tokens of that type was extracted from the corpus, and each token was classified by hand as a noun, a verb or “other.” Single word utterances, proper nouns and metalinguistic uses were classified as “other.” A token was considered a noun if it was modified by an adjective, appeared as the head of a noun phrase, was an argument of a verb or could be replaced with a pronoun. A token was counted as a verb if it was modified by an adverb, took noun phrase or prepositional phrase arguments or could be replaced with a pro-verb. When context was ambiguous, a token was coded as “other.” The breakdown of number of types analyzed in each corpus is shown in Table 2-2. Classification was done by trained coders. To assess the consistency of the classifications, 5% of all word types were reclassified by a second coder. Reliability between coders was very high (Cohen’s κ = .93).

The total proportion of potentially ambicategorical words that were actually used across category was calculated for each mother as the number of words used at least once as both noun and verb divided by the total number of potentially ambiguous words analyzed. To obtain a better idea of how ambicategoricality relates to frequency of use, for each frequency range for each mother, the same kind of calculation was done on only those word types within a given frequency range. These numbers provide an estimate of how many of the word types that each child heard were used across category boundaries at least once.

Results

Four of the six mothers used at least one quarter of the potentially ambiguous words at least once in both categories. The proportions of ambicategorical use for all types used by a given mother ranged from .19-.32. This overall rate of cross-category usage is a bit higher than that found by Barner (2001). Figure 2-1 shows the results broken over frequency ranges and also the total over all three frequency ranges. Because the particular word types within each frequency range are different for each mother, these data cannot be directly compared. However, all mothers showed a similar relationship between the frequency of a potentially ambicategorical word and the likelihood that it would be used as both noun and verb. Specifically, words in the high and middle frequency ranges were more likely to be used across category than were words in the low frequency range. In the speech of three of the mothers, words in the middle frequency range were the most likely to be used across category. The other three mothers used words in the high frequency range across category more than words in the other frequency ranges. Interestingly, the sex of the child being spoken to predicts whether high frequency or middle frequency words will be used more across category. However, these results are only from 6 speakers; whether such sex differences hold over a larger sample is a question for future research.

Those words that were used as both noun and verb were only rarely used equally as both. That is, many words were only used once or twice in their minority category. Figure 2-2 shows the percent noun use of high and middle frequency words. Words used as nouns 100% of the time are unambiguously nouns, while those used as nouns 0% of the time are unambiguously verbs. As previously discussed, such words constitute the majority of potentially ambiguous words in child-directed speech.
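The per-mother proportion defined in the Method for Study 1a can be illustrated with a minimal sketch. It assumes the hand-coded token labels for one mother are available as a mapping from word type to a list of labels; the function name, the data layout and the toy values are hypothetical and are not taken from the corpora.

```python
def cross_category_proportion(token_codes, cat_a="noun", cat_b="verb"):
    """Proportion of potentially ambiguous word types used at least once
    in each of two categories.

    token_codes: dict mapping word type -> list of hand-coded labels
    (e.g. 'noun', 'verb', 'other'). The layout is an assumption made
    for illustration only.
    """
    if not token_codes:
        return 0.0
    used_in_both = sum(
        1
        for labels in token_codes.values()
        if cat_a in labels and cat_b in labels
    )
    return used_in_both / len(token_codes)


# Hypothetical toy codes for four potentially ambiguous types.
toy_codes = {
    "kiss": ["noun", "verb", "verb"],   # used across category
    "walk": ["verb", "verb", "other"],  # verb uses only
    "drink": ["noun", "noun"],          # noun uses only
    "comb": ["other", "verb"],          # verb uses only
}
toy_proportion = cross_category_proportion(toy_codes)
```

Restricting `token_codes` to the types in one frequency range before calling the function gives the per-range version of the same calculation.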
Words that were used at least once as both noun and verb were sorted into three bins: words that were predominantly used as nouns, with some verb uses (99-66% noun use), words that were predominantly verbs with some noun uses (1-33% noun use) and words that were used roughly evenly in both categories (33-66% noun use). The patterns of use across mothers are consistent. All mothers use very few words equally in both categories. This shows that young children do not hear many words that are perfectly ambiguous between noun and verb. Rather, words appear in a single category the majority of the time with a few uses in the alternate category. For all mothers, verbs are more likely to be occasionally used across category than nouns are. This may be due to the high frequency of “light verb” constructions in speech to children (Barner, 2001; Theakston, Lieven, Pine & Rowland, 2004).

These findings indicate that use of words as both noun and verb is not as prevalent in speech to young children as it might be, given that roughly one third of the nouns and verbs they hear can be used across category. Neither, however, is it so rare as to be irrelevant to the problem of language learning. Because the noun/verb ambiguity is just one potential source of ambicategoricality in speech to children, I turn now to another potential category ambiguity: those words that can be used as both verb and adjective.

2.1.2 Study 1b: The verb/adjective ambiguity

Although the noun/verb distinction is the most often discussed in the literature on grammatical category learning, there are many more grammatical categories that children must learn. Among these is the adjective category, which has primarily been studied in terms of semantic development, rather than syntactic (e.g., Waxman & Booth, 2001; Mintz, 2005).
Adjectives are interesting to students of lexical semantic development because they can refer to properties that are independent of an object’s identity (like verbs), but often have concrete or durative referents (like nouns). This intermediate semantic status also confers the ability to shift between lexical categories. Study 1b examines the nature of verb/adjective ambiguity in speech to children. Study 1c will address the issue of noun/adjective ambiguity.

Verb/adjective ambiguity may result from homophony or derivation, much like the noun/verb ambiguity, but with the added complication that some bound morphemes in English may be used to derive a form that is ambiguous between adjective and verb, even though the verb root itself is not (e.g., -ing, -ed, although stress patterns help to disambiguate some verb/adjective pairs derived through the -ed suffix). A similar pattern emerges in other languages, including French, in which the past participle form of many verbs also has an adjectival use. There is virtually no previous work on young children’s experience with words that are ambiguous between verb and adjective. However, the logical arguments for why such ambiguity may be problematic for learners hold here as well. A learner hearing (2a, b, c) would draw the conclusion that (2d) is grammatical.

2. a. The dog is running along the beach.
   b. The running dog barks.
   c. The brown dog barks.
   d. *The dog is brown along the beach.

Indeed, somewhat similar errors are found in the early utterances of French-learning children (Pinker, 1989). I now ask whether verb/adjective ambicategoricality, like noun/verb ambicategoricality, is present in speech to children and, therefore, poses a potential problem for lexical category learning.

Method

Corpora. The six corpora analyzed in Study 1a were also used for this study. In this analysis, only the maternal speech from each corpus was considered. Again, for all six children, the mother is the primary caregiver and interlocutor.
Procedure. Drawing on the maternal frequency counts calculated for Study 1a, words from three frequency ranges were analyzed. High frequency words were those used more than 150 times by the mother, middle frequency words were those used 40-60 times and low frequency words were those used 3-10 times. Within each frequency range, every word type was placed in one of two categories: “adjective or verb” and “neither adjective nor verb”. Then, all those words that were adjectives or verbs were further categorized as potentially ambicategorical or not. Whether or not a word was potentially ambicategorical was based on an analysis of the Brown Corpus (Francis & Kucera, 1983). Words that were used at least once as an adjective and at least once as a verb in the Brown Corpus were considered potentially ambiguous. For a complete list of potentially ambiguous verbs and adjectives in these corpora, see Appendix B.

For every word type that was potentially ambicategorical, each utterance including one or more tokens of that type was extracted from the corpus, and each token was classified by hand as an adjective, a verb or “other.” Single word utterances and metalinguistic uses were classified as “other.” A token was considered an adjective if it modified a noun or stood as the head of a predicate adjective phrase. No distinction was made among the various subclasses of adjectives, as no such distinction was made in Study 1a regarding subclasses of verbs. A token was counted as a verb if it was modified by an adverb, took noun phrase or prepositional phrase arguments or could be replaced with a pro-verb. In cases of ambiguous contexts, tokens were classified as “other.” The breakdown of number of types analyzed in each corpus is shown in Table 2-3. Classification was done by trained coders.
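The frequency-range “core samples” used in these studies can be sketched as follows. This is only an illustration of the sampling rule stated in the Procedure (more than 150, 40-60 and 3-10 maternal uses); the function names, the flat token list and the toy tokens are hypothetical and do not come from the corpora.

```python
from collections import Counter


def frequency_band(count):
    """Map a maternal token count to a sampled frequency range, or None
    if the count falls outside all three ranges."""
    if count > 150:
        return "high"
    if 40 <= count <= 60:
        return "middle"
    if 3 <= count <= 10:
        return "low"
    return None


def core_samples(maternal_tokens):
    """Group word types by frequency range. Inflected forms such as
    'run' and 'runs' are kept as separate types, as in the Procedure."""
    counts = Counter(maternal_tokens)
    samples = {"high": set(), "middle": set(), "low": set()}
    for word, n in counts.items():
        band = frequency_band(n)
        if band is not None:
            samples[band].add(word)
    return samples


# Hypothetical toy input, not data from the corpora.
toy_tokens = (
    ["the"] * 200 + ["kiss"] * 45 + ["comb"] * 5 + ["run"] * 20 + ["runs"] * 4
)
samples = core_samples(toy_tokens)
```

In this toy input, run (20 uses) falls between the sampled ranges and is therefore not analyzed, which mirrors the fact that the three ranges deliberately leave gaps.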
The total proportion of potentially ambicategorical words that were actually used across category was calculated for each mother as the number of words used at least once as both adjective and verb divided by the total number of potentially ambiguous words analyzed. To obtain a better idea of how ambicategoricality relates to frequency of use, for each frequency range for each mother, the same kind of calculation was done on only those word types within a given frequency range. These numbers provide an estimate of how many of the word types that are potentially ambiguous between verb and adjective were used across category boundaries at least once in speech to a given child.

Results

The proportions of ambicategorical use for all potentially verb/adjective ambiguous types used by a given mother ranged from .17-.26. Only one mother used more than a quarter of the potentially ambicategorical words across category boundaries. Figure 2-3 shows the results broken over frequency ranges. Because the particular word types within each frequency range are different for each mother, these data cannot be directly compared. However, the relationship between a word’s frequency and the likelihood that it would be used in both categories was not consistent across mothers. These data may be somewhat skewed by the very small number of ambiguous word types in some frequency ranges for some mothers.

In Study 1a, those words that were used as both noun and verb were only rarely used equally as both. To assess whether this pattern is also present in those words that can be used as both verb and adjective, a similar comparison was done with these data. Figure 2-4 shows the percent verb use of high and middle frequency words. Words used as verbs 100% of the time are unambiguously verbs, while those used as verbs 0% of the time are unambiguously adjectives. The majority of potentially ambiguous words in child-directed speech are used in an unambiguous way.
Words that were used at least once as both verb and adjective were sorted into three bins: words that were predominantly used as verbs, with some adjective uses (99-66% verb use), words that were predominantly adjectives with some verb uses (1-33% verb use) and words that were used roughly evenly in both categories (33-66% verb use). The patterns of use across mothers are not consistent. Although most mothers use very few words equally in both categories, some mothers use more words primarily as verbs but occasionally as adjectives than strictly as verbs. This suggests that young children’s experience with verb/adjective ambiguity is qualitatively somewhat different from their experience with noun/verb ambiguity. Although verb/adjective ambiguity is less common, words that do display this ambiguity are more likely to be used evenly in both categories than are words that are ambiguous between noun and verb.

Like noun/verb ambicategoricality, use of a single word as both verb and adjective is less prevalent than it might be. Nevertheless, this kind of ambiguity is not so rare as to be irrelevant to the problem of language learning. The next study will examine a third possible source of ambicategoricality in the environment of a language learner: noun/adjective ambiguity.

2.1.3 Study 1c: The noun/adjective ambiguity

Like verbs, adjectives refer to properties that are independent of an object’s identity. However, adjectives are also like nouns in that their referents may be physical and durative in the world, as opposed to verb referents, which tend to be non-ostensive. Some adjectives, primarily color and material words, also have noun uses, creating another potential source of ambicategoricality in child-directed language. As with the verb/adjective ambiguity, there is no previous research on the use of words as both noun and adjective in speech to children. Unlike noun/verb and verb/adjective ambiguity, almost all noun/adjective ambiguous words are derived forms.
That is, there is a systematic semantic link between the two uses and many of the words that can undergo these derivations are from a few semantic classes. However, these semantic links alone cannot be relied on to avoid the potential problem posed by ambicategoricality. The arguments put forth by Pinker (1987) and others regarding the interaction of distributional factors to lead to ungrammatical utterances hold for these forms as well. For example, the sentences in (3a-c) might lead a child to conclude that (3d) is grammatical.

3. a. John broke the glass vase.
   b. John broke the new vase.
   c. This glass is broken.
   d. *This new is broken.

This study asks whether young children encounter noun and adjective uses of the same word form in their language environments.

Method

Corpora. The maternal speech from the six corpora analyzed in Studies 1a and 1b was also analyzed for this study.

Procedure. Drawing on the maternal frequency counts calculated for Study 1a, words from three frequency ranges were analyzed. High frequency words were those used more than 150 times by the mother, middle frequency words were those used 40-60 times and low frequency words were those used 3-10 times. Within each frequency range, every word type was placed in one of two categories: “noun or adjective” and “neither noun nor adjective”. Then, all those words that were adjectives or nouns were further categorized as potentially ambicategorical or not. Whether or not a word was potentially ambicategorical was based on an analysis of the Brown Corpus (Francis & Kucera, 1983). Words that were used at least once as an adjective and at least once as a noun in the Brown Corpus were considered potentially ambiguous. For a complete list of potentially ambiguous nouns and adjectives in these corpora, see Appendix C.
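The criterion for potential ambicategoricality (at least one attested use in each of two categories in a tagged reference corpus) can be sketched as below. The sketch assumes the reference corpus has already been reduced to (word, coarse category) pairs; that reduction, the case folding, the function name and the toy pairs are all assumptions made for illustration and do not reproduce the Brown Corpus tag set or its contents.

```python
from collections import defaultdict


def potentially_ambiguous(tagged_tokens, cat_a, cat_b):
    """Return the set of word types attested at least once in cat_a and
    at least once in cat_b.

    tagged_tokens: iterable of (word, category) pairs with coarse
    category labels such as 'noun' or 'adjective' (an assumed format).
    """
    categories_seen = defaultdict(set)
    for word, category in tagged_tokens:
        # Case folding is an assumption; the source does not specify it.
        categories_seen[word.lower()].add(category)
    return {
        word
        for word, cats in categories_seen.items()
        if cat_a in cats and cat_b in cats
    }


# Hypothetical toy reference data.
toy_reference = [
    ("glass", "noun"),
    ("glass", "adjective"),
    ("vase", "noun"),
    ("new", "adjective"),
    ("Orange", "noun"),
    ("orange", "adjective"),
]
toy_candidates = potentially_ambiguous(toy_reference, "noun", "adjective")
```

Only the word types returned by such a filter would then be hand-coded token by token in the child-directed corpora.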
For every word type that was potentially ambicategorical, each utterance including one or more tokens of that type was extracted from the corpus, and each token was classified by hand as a noun, an adjective or “other.” Single word utterances, proper nouns and metalinguistic uses were classified as “other.” A token was considered an adjective if it modified a noun or stood as the head of a predicate adjective phrase. This potentially includes uses of nouns as noun modifiers; however, this distinction can only be made over a number of observations. To approach the problem with the kind of information a learner might have, all noun modifiers were considered adjectives for the purposes of coding. A token was considered a noun if it was modified by an adjective, appeared as the head of a noun phrase, was an argument of a verb or could be replaced with a pronoun. Where context was ambiguous, tokens were classified as “other.” The breakdown of number of types analyzed in each corpus is shown in Table 2-4. Classification was done by trained coders.

The total proportion of potentially ambicategorical words that were actually used across category was calculated for each mother as the number of words used at least once as both noun and adjective divided by the total number of potentially ambiguous words analyzed. To obtain a better idea of how ambicategoricality relates to frequency of use, for each frequency range for each mother, the same kind of calculation was done on only those word types within a given frequency range. These numbers provide an estimate of how many of the word types that are potentially ambiguous between adjective and noun were used across category boundaries at least once in speech to a given child.

Results

The proportion of cross-category use was much higher among those words that are ambiguous between noun and adjective than among the other two comparisons, ranging from .21-.42. Figure 2-5 shows the results broken over frequency ranges.
Because the particular word types within each frequency range are different for each mother, these data cannot be directly compared. However, all mothers showed a similar relationship between the frequency of a potentially ambicategorical word and the likelihood that it would be used as both noun and adjective. Specifically, words in the high and middle frequency ranges were more likely to be used across category than were words in the low frequency range. Five of the six mothers in this study were more likely to use high frequency words as both noun and adjective than words from the other two frequency ranges.

In the two previous studies, only a few of the potentially ambiguous words were used equally often in both categories. To assess whether this pattern is also present in those words that can be used as both noun and adjective, a similar comparison was done with these data. Figure 2-6 shows the percent noun use of high and middle frequency words. Words used as nouns 100% of the time are unambiguously nouns, while those used as nouns 0% of the time are unambiguously adjectives. The majority of potentially ambiguous words in child-directed speech are used in an unambiguous way. Words that were used at least once as both noun and adjective were sorted into three bins: words that were predominantly used as nouns, with some adjective uses (99-66% noun use), words that were predominantly adjectives with some noun uses (1-33% noun use) and words that were used roughly evenly in both categories (33-66% noun use). The patterns of use across mothers are not consistent. Some mothers show the same v-shaped pattern found in the data on words that are noun/verb ambiguous. Other mothers show an inverted pattern, with a high proportion of words being used in each category with roughly equal frequency. This shows that some children hear many words that are perfectly ambiguous between noun and adjective while others hear very few words used in a perfectly ambiguous way.
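The sorting of word types by percent noun use can be sketched as a small binning function. The bin edges follow the ranges given in the text (99-66%, 33-66%, 1-33%); because those ranges share the values 33 and 66, the sketch assumes that exactly 33% and exactly 66% fall in the “roughly even” bin. That boundary choice, the function name and the bin labels are assumptions, not part of the reported method.

```python
def usage_bin(noun_uses, alternate_uses):
    """Assign a word type to a usage bin from its coded token counts.

    noun_uses: number of tokens coded as noun.
    alternate_uses: number of tokens coded in the other category
    (verb or adjective). Tokens coded 'other' are assumed to be
    excluded before calling this function.
    """
    total = noun_uses + alternate_uses
    if total == 0:
        raise ValueError("word type has no codable tokens")
    if alternate_uses == 0:
        return "noun only"          # 100% noun use
    if noun_uses == 0:
        return "alternate only"     # 0% noun use
    percent_noun = 100.0 * noun_uses / total
    if percent_noun > 66:
        return "mostly noun"        # some uses in the alternate category
    if percent_noun < 33:
        return "mostly alternate"   # some noun uses
    return "roughly even"           # 33-66% noun use (assumed inclusive)


# Hypothetical counts for illustration.
example_bins = {
    "glass": usage_bin(9, 1),
    "orange": usage_bin(5, 5),
    "new": usage_bin(0, 7),
}
```

The same function applies to Study 1b if the first argument is read as the verb count, since the bins there are defined over percent verb use.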
These findings indicate that noun/adjective ambicategoricality in speech to young children may be somewhat different from noun/verb and verb/adjective ambicategoricality. Some mothers use over 40% of potentially ambiguous words in both categories. These data suggest that children may treat the noun/adjective ambiguity differently than the other two potential sources of ambiguity that these studies have examined.

2.1.4 Discussion

This set of studies was intended to quantify young children’s experience with ambicategoricality. The three sub-studies each examined a different potential source of ambicategoricality in natural speech to six children. The naturalistic, longitudinal nature of these corpora and the inclusion of homophonous forms, as well as those that are derived with null morphology, make this set of studies the most comprehensive examination yet of cross-category usage in speech to children. Furthermore, no previous work has looked beyond the noun/verb distinction to assess whether cross-category usage may also pose a problem for the learning of other lexical categories, such as adjective.

The findings indicate that children hear far less cross-category usage than they could, especially with regard to noun/verb and verb/adjective ambiguity. Nevertheless, they do not hear so little as to make the phenomenon uninteresting or trivial. Roughly a quarter of the words that could be used as both nouns and verbs are used in both categories in speech to children. A slightly smaller proportion of the words that can be used as both verb and adjective appear in both categories in child-directed speech. Although the situation is not as dire as the sort of worst-case scenario described by Pinker (1987), in which cross-category use wreaks havoc with the very possibility of distributional category learning, it is also not the case that young children never hear words used across category.
One question that these results raise is whether the overall rate of cross-category usage in speech to children is higher or lower than that in speech to adults. Because a word’s potential for ambicategoricality in these studies was based on a large corpus of adult-directed written language, words are clearly used less often across categories in speech to children than they are in written language directed at adults. Unfortunately, no large corpus of natural adult-directed speech that is comparable to these corpora of child-directed speech (e.g., both speakers are in the same location, the speakers are highly familiar to one another, etc.) is available. Therefore, it is not possible to address whether cross-category usage occurs more or less often in speech to children than in speech to adults. Should a comparable corpus become available, this is a very interesting area for further research.

Another aspect of these studies to consider is that words were selected for analysis on the basis of their frequency in the corpus. This was motivated partially by practical reasons, as these corpora contain too many word types to allow for analysis of every word type. However, using frequency to select words for analysis also allows for assessment of how commonly ambicategorical words appear in speech to children. If only very low frequency words are used ambiguously, learners might have very limited exposure to category ambiguity. The evidence presented here suggests that learners have more experience with category ambiguity in high and middle frequency words. That is, ambicategoricality is not a property of rare or obscure words that children hear only occasionally; even very highly frequent words are used in more than one category in speech to children. Other factors could have been used to select words for analysis, including semantic factors such as concreteness of meaning or developmental factors such as age of acquisition.
Use of semantic factors is potentially difficult, as they frequently fall along a spectrum and may be subjective, although there is consistency across raters (Cortese & Fugett, 2004). Age of acquisition norms might bring an interesting perspective on whether words that can be used ambiguously are acquired later or earlier than words that are not ambiguous. However, this causal relationship may affect the results. If only words that are acquired early are examined and words that are ambiguous are acquired late, selecting words on the basis of age of acquisition may underestimate the frequency of ambicategoricality in speech to children.

In many cases, even those words that are used in both of the categories being considered are not used equally across the two. This may be interpreted in one of two ways. First, one could suggest that because use in the alternate category is rare, these words need not pose major difficulty for learners; they may simply ignore those uses that do not conform to their expectations about how the word should behave. This kind of argument presupposes that children have some sense of which contexts go with which category, that is, which uses are the anomalous ones. If this were the case, the general problem of category learning, regardless of ambiguity, would be much more straightforward than it really is. Learners must not only determine which words belong to which categories, but they must also induce which contexts predict which categories. Neither of these pieces of information is given directly to the learner. A second possible interpretation of these facts about how many uses in each category a word is likely to have is that words with fewer uses in the alternate category are no easier to incorporate into the system of lexical categories than those words with an equal number of uses in both categories.
In either case, the learner must somehow deduce that the word in question is being used in more than one category, whether the cross-category use is frequent or rare. If, however, a word is frequently used in more than one category, learners might notice that the word appears in two distinct referential contexts or with two distinct prosodic forms, allowing them to begin segregating uses. Words that have only one or two uses in the alternate category are potentially more difficult for learners, who must somehow avoid including the occasional cross-category use in their assessment of its syntactic behavior, or, alternatively, flag those cross-category uses as somehow distinct from the others.

When the noun/adjective ambiguity is brought into the picture, the situation becomes more interesting. Although there are fewer word types that can cross this category boundary, they are fairly frequent in speech to children and most such word types are used in both categories. Also, unlike words that can be used as both noun and verb or as both verb and adjective, these words are more likely to be used roughly equally in both categories. This suggests that the noun/adjective ambiguity might be a sort of middle ground between the noun/verb and verb/adjective ambiguities. Like the verb/adjective ambiguity, relatively few words are ambicategorical between noun and adjective, but like the noun/verb ambiguity, a decent proportion of those words that can be used across category are.

Together, these three studies suggest that ambicategoricality is, in fact, a potential problem for grammatical category learning. This raises the issue of how language learners might cope with this ambiguity. One possibility is that learners would use the changes in referent that accompany changes in lexical category to distinguish cross-category uses of the same word (e.g., Pinker, 1987).
However, this strategy only works for those words for which use in a different category really has a distinct referent (e.g., fit); it is not clear how such a strategy would apply to words that refer to basically the same event regardless of lexical category (e.g., hug). A second strategy would be to use morphological cues. Given that these corpus analyses were based on morphologically unanalyzed words, however, it would appear that morphology is not a particularly strong cue to disambiguating across categories. For words that are ambiguous between verb and adjective, morphology creates more ambiguity than it resolves. Another possible cue to distinguish use of a word in one category from use in another category is prosody. Because different lexical categories appear in different phrasal and clausal positions, the prosodic properties associated with those positions may help learners to distinguish cross-category uses of the same word. The next study evaluates the availability of such cues in speech to young English-learning children.

2.2 Prosodic cues to category

Grammatical categories are defined by their privileges of occurrence in sentences. Nouns are words that behave like nouns: appearing as arguments of verbs and prepositions, taking adjectival modifiers, etc.; verbs act like verbs: taking noun phrase and prepositional phrase arguments, etc. These occurrence properties correlate with particular sentential (and phrasal and clausal) positions: verbs are more likely than nouns to be sentence-medial, as opposed to sentence-final, adjectives are only rarely phrase final, etc. Various sentential positions have distinct prosodic properties; in particular, phrase and utterance final words tend to be lengthened relative to phrase and utterance medial words (Sorenson, Cooper & Paccia, 1978).
Given these facts about lexical category distribution and how it might interact with prosody, perhaps language learners use the prosodic properties of individual lexical tokens to distinguish cross-category uses of a single word type. The same word type has different prosodic properties depending on its grammatical category (Sorenson, et al., 1978; Shi & Moisan, 2008). In adult-directed English, these cues are based solely on sentential position. That is, when the same word type is used as both noun and verb, noun tokens are longer than verb tokens; however, this effect disappears when sentential position is controlled for. The prosodic properties of speech to young children tend to be more exaggerated than the prosodic properties of adult-directed speech (Ferguson, 1964; Fernald, Taeschner, Dunn, Papousek, Boysson-Bardies, & Fukui, 1989; Fisher & Tokura, 1996). Therefore, we might expect prosodic cues to the lexical category of a token of an ambicategorical word to be more apparent in child-directed speech (Kelly, 1992). Indeed, in a study of elicited infant-directed French, Shi and Moisan (2008) found that mothers reliably distinguish noun and verb tokens of the same word using prosodic cues, independent of sentential position. In that study, noun tokens of disyllabic nonce words had longer duration than did verb tokens and the duration of each syllable also varied depending on the lexical category. The generalizability of these results is limited by two factors. First, these analyses were based on mothers reading text to their children; read speech to children has somewhat different phonetic properties than spontaneous speech, including increased duration and expanded vowel space (Song, 2005). Second, these effects may be related to the disyllabic nature of the nonce word stimuli.
While disyllabic words are frequent in French and these cues may be available to French-learning infants, relatively few words in speech to young English-learning children have disyllabic roots. An examination of the Demuth Providence Corpus (Demuth, et al., 2006) revealed that fewer than 20% of the words in the high, middle and low frequency ranges defined in Study 2-1 had di- or multi-syllabic roots.

To evaluate whether such prosodic cues to lexical category might be available to English-learning children, Study 2-2 examines tokens of ambicategorical words in five corpora of natural, child-directed English. Each token is measured along six acoustic dimensions and a k-means clustering algorithm is applied to those measurements to assess whether those dimensions reliably distinguish noun and verb tokens of the same word type. I first consider each dimension separately and then consider them together, as multiple cues in conjunction with one another have been shown to provide better categorization than any individual cue to, for example, the function/content word distinction (Shi, Morgan & Allopenna, 1998). This study will determine whether prosodic cues that differentiate cross-category uses of the same word are available in speech to language learners. Whether or not learners are sensitive to these cues will be assessed in Chapter 3.

Method

Corpora. Five longitudinal corpora of maternal speech were examined. All five of these corpora came from the Demuth Providence Corpus (Demuth, Culbertson & Alter, 2006). These are the same children whose input was analyzed in Study 2-1, excluding the Nina corpus, as those audio files are not available. The ages and number of recordings for each corpus are presented in Table 2-1.

Procedure. Based on the analysis from Study 2-1a, all words with at least 10 noun uses and 10 verb uses in a given corpus were evaluated for prosodic cues to category.
The analysis was conducted within a corpus, as individual mothers’ speech may have somewhat different prosodic properties. The word types and number of tokens in each category are listed in Table 2-5. For each word type with at least 10 noun and 10 verb uses, all tokens were extracted from the accompanying video and digitized. This process was done using QuickTime Pro in conjunction with the CLAN program. The audio track was extracted from the video using SoundConverter. To facilitate comparison with chance, an equal number of tokens were selected from each category. If a corpus contained more tokens of a word in one lexical category than another, tokens were randomly selected for analysis prior to being extracted or measured. Sentential or phrasal position was not taken into account during the extraction process.

After tokens were extracted from the corpus and converted to sound files, six prosodic properties of each token were measured using PRAAT (Boersma & Weenink, 2008): token duration (ms), vowel duration (ms), mean pitch (Hz), minimum pitch (Hz), maximum pitch (Hz) and pitch change (ST). All measurements except vowel duration were taken from the onset of the word to the offset of the word. Vowel duration was measured on the basis of the formant transitions marking the onset and offset of the vowel.

To assess the reliability of these prosodic cues for categorizing word tokens as either noun or verb, we plotted each token of a word type along an axis. Then, we performed a linear discriminant analysis with spherical pooled covariance. This procedure fits a Gaussian probability distribution function to each category of labeled data and determines which distribution is the most likely source of each data point. This analysis was conducted within word type, as tokens of the same word type from different categories have more similar durations than do tokens of different word types from the same category.
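The classification step described above can be sketched in Python. This is a minimal illustration rather than the analysis code used in the study: the function names (fit_spherical_lda, predict, resubstitution_accuracy) are hypothetical, NumPy is assumed, and accuracy is assumed to be computed on the same tokens used to fit the model, which the text implies but does not state outright. Each category gets its own mean, a single variance is pooled over categories and dimensions (the "spherical pooled" assumption), and each token is assigned to the category whose Gaussian gives it the higher posterior probability.

```python
import numpy as np

def fit_spherical_lda(X, y):
    """Fit one Gaussian per category with a shared spherical covariance."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # a single cue, e.g. token duration
        X = X[:, None]
    classes, idx = np.unique(np.asarray(y), return_inverse=True)
    idx = idx.ravel()
    # One mean vector per category (noun, verb).
    means = np.stack([X[idx == k].mean(axis=0) for k in range(len(classes))])
    # Pooled spherical variance: squared deviations from each token's own
    # category mean, averaged over tokens and dimensions.
    resid = X - means[idx]
    dof = max(X.shape[0] - len(classes), 1) * X.shape[1]
    var = max(float((resid ** 2).sum()) / dof, 1e-12)
    priors = np.bincount(idx, minlength=len(classes)) / len(idx)
    return classes, means, var, priors

def predict(model, X):
    """Assign each token to the category with the highest log posterior."""
    classes, means, var, priors = model
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    sq_dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_post = -0.5 * sq_dist / var + np.log(priors)
    return classes[np.argmax(log_post, axis=1)]

def resubstitution_accuracy(X, y):
    """Proportion of tokens classified into their labeled category."""
    model = fit_spherical_lda(X, y)
    return float(np.mean(predict(model, X) == np.asarray(y)))
```

With equal numbers of noun and verb tokens the priors are equal, so the rule reduces to assigning each token to the nearer category mean, and chance is 50%. Passing a single column gives the one-cue analysis; passing all six measurements as columns gives the combined analysis. Because a spherical covariance treats every dimension on the same scale, whether the cues were rescaled before being combined matters for the multi-dimensional case; the sketch uses the values as given.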
The issue at hand is whether tokens of the same word might be different depending on syntactic category. Therefore, the critical comparison is within type, rather than between. The linear discriminant analysis was conducted on every measurement of each word type for each mother.

Previous studies of prosodic cues to lexical categorization found that multiple cues provide more accurate categorization than any single cue (Shi, et al., 1998). To assess whether this is also the case for these data, we created a multi-dimensional space for each word type, using each of the six prosodic measurements as a dimension. This procedure captures the power of all of the cues together, as opposed to the individual contributions of each. Like the uni-dimensional analysis, this procedure was conducted within word type and within mother.

Although Study 2-1 examined three possible sources of ambicategoricality in speech to children, this analysis focuses only on the noun/verb ambiguity. This is due in part to the limited number of words, both types and tokens, used across the verb/adjective boundary and in part to the limited number of word types used as both noun and adjective. Even the noun/verb distinction provided somewhat limited data for this analysis.

Results

To evaluate which individual cue is most effective for categorization, the percentage of tokens correctly categorized by the linear discriminant was calculated; these data are shown in Figure 2-7. Although the measurements were evaluated within word type, the accuracy data are collapsed over types. Because an equal number of noun and verb tokens were measured for each word type, chance is 50%. For all mothers, durational cues are most reliable for categorization, as compared with pitch cues. However, there is variability among mothers regarding which duration cue (vowel or overall token) is most useful. Mothers also vary in terms of how reliably they use prosodic cues to distinguish noun and verb tokens of the same word.
For example, Ethan’s mother did not show good separation of noun and verb tokens using any one of these prosodic cues. On the other hand, Alex’s mother used duration very consistently to distinguish noun tokens and verb tokens of the same word. When these results are combined over mothers, highest accuracy of categorization is achieved using token duration. Pitch cues are not much better than chance.

One possible source of these prosodic cues is utterance position. Because nouns may occur more frequently in utterance final position, the longer duration of noun tokens in this study may simply be the result of utterance-final lengthening. To assess this possibility, token duration was entered into a univariate ANOVA with utterance position (medial or final) and grammatical category (noun or verb) as fixed factors. The small number of verb tokens in utterance-final position for some mothers required that all observations be collapsed into a single ANOVA for adequate statistical power to be obtained. Table 2-6 shows the number of tokens of each category appearing in each utterance position. The ANOVA indicates that noun tokens are not longer than verb tokens in utterance final position, but noun tokens are significantly longer than verb tokens in utterance medial position (p<.05). Mean durations and standard error for these comparisons are shown in Figure 2-8. These results suggest that noun tokens of words have longer duration than do verb tokens of those same words, but this effect is neutralized by phrase-final lengthening.

Perhaps, however, the best categorization will be achieved when all possible cues are considered. Indeed, there is precedent for such a pattern in cues to the function/content word distinction (Shi, et al., 1998). Taken together, the prosodic cues examined in this study correctly categorized 73% of word tokens as noun or verb. Differences were seen among mothers in terms of how reliably this set of prosodic cues distinguished nouns from verbs.
The percentages of tokens correctly categorized using all of the cues are shown in Figure 2-7 (“combined”); accuracy is shown both within and across mothers. Again, these accuracy data have been collapsed across word types. Considering all cues together results in a slight increase in accuracy within the speech of each mother, as well as over the entire dataset. The accuracy when all cues are considered is 5-10% better than the accuracy for any single cue.

Discussion

The work presented here demonstrates that noun and verb tokens of the same word can be distinguished on the basis of their prosodic properties at a better than 70% rate. However, there is variability among mothers in terms of how reliably they use individual cues in spontaneous speech. This variability might predict variation among children’s use of these words across categories. The corpora examined here do not contain sufficient tokens of noun/adjective or verb/adjective ambiguous words for a reliable linear discriminant analysis to be performed. Very small numbers of observations tend to result in overfitting of the linear discriminant and produce unreliable results.

Overall, noun tokens have longer durations than verb tokens of the same word types. Although there is variability in the duration ranges for different word types across mothers, due to such factors as number of phonemes and maternal speaking rate, duration is the most consistent cue available for distinguishing noun and verb uses of the same word. Nouns are more likely than verbs to appear in utterance final position, but durational differences between nouns and verbs persist in utterance medial position. That is, even when they are not in utterance final position, noun tokens of words are longer than verb tokens of words. This suggests that prosodic cues to category are available over and above positional cues to category.
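The positional asymmetry between nouns and verbs can be made concrete from the token counts reported in Table 2-6. The short Python sketch below only restates those counts as proportions; the dictionary layout and the function name final_proportion are illustrative choices, not part of the original analysis.

```python
# Token counts by category and utterance position, as reported in Table 2-6.
counts = {
    "noun": {"medial": 334, "final": 270},
    "verb": {"medial": 505, "final": 99},
}

def final_proportion(cell):
    """Proportion of a category's tokens that are utterance final."""
    return cell["final"] / (cell["medial"] + cell["final"])

for category, cell in counts.items():
    total = cell["medial"] + cell["final"]
    print(f"{category}: {cell['final']}/{total} utterance final "
          f"({final_proportion(cell):.1%})")
```

On these counts, roughly 45% of noun tokens but only about 16% of verb tokens are utterance final, which is why utterance position had to be separated from grammatical category before attributing the duration difference to category.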
Prosodic cues to lexical category are just one set of cues that might be available to learners to distinguish noun and verb uses of the same word. Nevertheless, they are a potentially powerful source of information. Children do not need to know the semantic distinctions between noun and verb uses of the same word to use these cues; rather, the information is available in the speech signal. In this way, learners might differentiate between the two uses and learn two word forms that have distinct prosodic and syntactic properties, rather than a single ambicategorical word.

Extending these analyses to words that are noun/adjective or verb/adjective ambiguous would help to elucidate the nature of these cues as well as assess their availability to children. Unfortunately, the available corpus data do not provide adequate tokens of such words to allow linear discriminant analyses of this type. While statistical analyses such as t-tests might provide some indication of differences in the mean values of various prosodic measurements by category, differences in the mean alone do not indicate that the two groups are sufficiently separable to allow for categorization of individual observations. Therefore, an analysis of the availability of prosodic cues to category for words that are verb/adjective or adjective/noun ambiguous cannot be included in this dissertation.

The presence of a cue in speech to children does not mean that children are able to use it for language learning purposes. To be useful for learning, a cue must not merely be available in speech; learners must also be sensitive to that cue. Infants have been shown to be sensitive to prosodic information for segmenting speech (Jusczyk, Houston & Newsome, 1999; Thiessen & Saffran, 2003) as well as for detecting phrase and clause boundaries (Nazzi, Kemler Nelson, Jusczyk & Jusczyk, 2000; Soderstrom, Seidl, Kemler Nelson, & Jusczyk, 2003).
Because infants are sensitive to prosodic information in fluent speech, it is possible that they may also be able to use that information to distinguish two similar lexical forms. Whether they are able to do so is an empirical question that will be taken up in the next chapter.

2.3 General Discussion

This chapter has examined the nature of cross-category word use in speech to children in two ways. First, it describes how frequently words that are potentially ambiguous are actually used in more than one lexical category in speech to children. This analysis includes not only the well-known noun/verb ambiguity, but also the verb/adjective and noun/adjective ambiguities. Next, it asks whether prosodic cues to category are available in natural, child-directed speech by examining the acoustic/prosodic properties of noun/verb ambiguous words extracted from child-directed speech corpora.

The distribution of potentially ambiguous words in the corpora differs depending on the nature of the ambiguity. Words that are noun/verb ambiguous are typically only used in a single category, and those types that are used across category in speech to children are rarely used equally in both categories. Conversely, words that are noun/adjective ambiguous are very often used in both categories in child-directed speech and may appear roughly equally in both categories. It is difficult to draw conclusions regarding the nature of the verb/adjective ambiguity in children’s linguistic experience because there are very few such word types in these corpora. At the very least, however, we can conclude that such words are rare in the experience of most children.

The fact that some words are used in more than one lexical category in speech to children means that learners must somehow incorporate these words into their grammar without disrupting their system of lexical categories. One means of accomplishing this task would be to somehow identify when words were used in more than one category.
Alternatively, learners might simply learn two distinct forms, one that appears in the contexts associated with one category and another that appears in contexts associated with another category. Learners need not recognize a relationship between these word forms, at least early in development.

The data from Study 2 show that learning two distinct forms, rather than one that is ambicategorical, may be possible. Noun uses of a word are prosodically distinct from verb uses of the same word. The prosodic differences between nouns and verbs may arise in part from the different sentence positions in which they appear, as is the case in adult-directed speech (Sorenson, et al., 1978). However, work on Canadian French indicates that these prosodic differences between noun and verb forms of the same word are independent of sentential position in infant-directed speech (Shi & Moisan, 2008). Because Study 2 examined naturally produced language, it was not possible to control for sentential position, nor were there sufficient data to systematically examine the role of sentential position in creating these prosodic cues. Nevertheless, these prosodic cues to category do exist. If they are correlated with sentential position, learners may be able to use both kinds of cues, prosodic information and sentential position, to help them distinguish cross-category uses of a word. The idea that learners might use correlated cues to learn lexical categories has been discussed elsewhere in the literature (Shi, Morgan & Allopenna, 1998; Monaghan, Chater & Christiansen, 2005, 2007) and studies of artificial language learning in infants find that the presence of multiple correlated cues improves category learning (Gómez & Lakusta, 2004; Gerken, Wilson & Lewis, 2005).

The studies reported in this chapter were intended to describe the nature of the ambicategoricality problem in speech to children.
The findings suggest that the situation is not as dire as some theorists have suggested (e.g., Pinker, 1987), but also indicate that children hear a considerable amount of cross-category word use in their language environments. Still, this cross-category usage need not pose a major problem to learners, as prosodic cues help to differentiate noun and verb uses of the same words.

These findings raise two questions. First, are young language learners sensitive to this prosodic information? If they are, then they might be able to create two distinct word forms. If they are not sensitive to these cues, however, they probably cannot use prosody to help solve the ambicategoricality problem. This question will be addressed in Chapter 3. The second issue that these findings raise is how learners interpret and use these words in their own productions. Do children restrict words to a single category in their speech or do they produce words in more than one category? If they do produce words in more than one category, how are their productions related to those of their parents? These questions will be taken up in Chapter 4.
Table 2-1

Child     Sex   Age Range (years; months)   # of Files
Alex      M     1;5-3;5                     52
Ethan     M     0;11-2;11                   50
Lily      F     1;1-4;0                     80
Nina      F     1;11-3;3                    52
Violet    F     1;2-3;11                    52
William   M     1;4-3;4                     44

Table 2-2

          # Noun or Verb Types          # Potentially Ambicategorical   # Used Across Categories
          High  Middle  Low   Total     High  Middle  Low   Total       High  Middle  Low   Total
Alex      63    81      780   924       27    36      208   271         9     10      45    64
Ethan     72    101     938   1111      28    39      210   277         10    14      40    64
Lily      185   179     1652  2016      39    46      291   376         17    26      76    119
Nina      75    103     677   855       30    45      175   250         6     13      28    47
Violet    47    77      1042  1166      18    35      266   319         4     13      69    86
William   45    73      717   835       21    32      193   246         8     10      46    64

Table 2-3

          # Verb or Adjective Types     # Potentially Ambicategorical   # Used Across Categories
          High  Middle  Low   Total     High  Middle  Low   Total       High  Middle  Low   Total
Alex      63    65      451   579       6     9       19    34          2     5       1     8
Ethan     63    68      554   685       7     8       20    35          2     2       2     6
Lily      132   108     962   1202      15    11      32    58          5     6       4     15
Nina      65    68      398   531       5     4       16    25          0     2       0     2
Violet    43    53      618   714       3     4       30    37          1     0       6     7
William   40    48      414   502       3     6       24    33          1     2       4     7

Table 2-4

          # Noun or Adjective Types     # Potentially Ambicategorical   # Used Across Categories
          High  Middle  Low   Total     High  Middle  Low   Total       High  Middle  Low   Total
Alex      46    51      506   603       10    8       31    49          8     4       7     19
Ethan     65    106     783   954       13    10      27    50          10    6       5     21
Lily      154   157     1377  1688      21    17      46    84          16    9       10    35
Nina      67    97      571   735       6     5       17    28          3     2       1     6
Violet    40    77      966   1083      5     4       48    57          2     2       12    16
William   36    68      572   676       5     9       30    44          4     5       6     15

Table 2-5

          Word      Tokens in each category
Alex      Drink     12
          Help      23
          Kiss      11
          Turn      59
Ethan     Help      18
          Paint     20
          Scoop     14
          Whistle   22
Lily      Can       27
          Clip      12
          Color     30
          Dance     21
          Fit       11
          Fly       12
          Help      31
          Kiss      35
          Love      17
          Show      20
          Slide     13
          Sound     22
          Stick     11
          Work      50
Violet    Ride      16
William   Catch     14
          Color     49
          Help      12
          Saw       15
          Sound     17
          Try       13

Table 2-6

                   Noun Tokens   Verb Tokens
Utterance Medial   334           505
Utterance Final    270           99

Figure 2-1
[Bar chart. Y-axis: proportion used across category (0 to 1). X-axis: each mother. Bars: High, Medium, Low, Total.]
The proportions of potentially ambiguous words that mothers actually use as both noun and verb are shown, broken down by frequency range (high, medium and low), as well as collapsed over all three frequency ranges (total).

Figure 2-2
[Bar chart. Y-axis: proportion of ambicategorical words (0 to 1). X-axis: each mother. Bars: 100% Noun, 66-99% Noun, 33-66% Noun, 1-33% Noun, 0% Noun.]
The proportions of potentially ambiguous words used a given percentage of the time as nouns by each mother are shown above. Words used as nouns 0% of the time are always used as verbs.

Figure 2-3
[Bar chart. Y-axis: proportion used across category (0 to 1). X-axis: each mother. Bars: High, Medium, Low, Total.]
The proportions of potentially ambiguous words that mothers actually use as both verb and adjective are shown, broken down by frequency range (high, medium and low), as well as collapsed over all three frequency ranges (total).

Figure 2-4
[Bar chart. Y-axis: proportion of ambicategorical words (0 to 1). X-axis: each mother. Bars: 100% Verb, 66-99% Verb, 33-66% Verb, 1-33% Verb, 0% Verb.]
The proportions of potentially ambiguous words used a given percentage of the time as verbs by each mother are shown above. Words used as verbs 0% of the time are always used as adjectives.

Figure 2-5
[Bar chart. Y-axis: proportion used across category (0 to 1). X-axis: each mother. Bars: High, Medium, Low, Total.]
The proportions of potentially ambiguous words that mothers actually use as both noun and adjective are shown, broken down by frequency range (high, medium and low), as well as collapsed over all three frequency ranges (total).

Figure 2-6
[Bar chart. Y-axis: proportion of ambicategorical words (0 to 1). X-axis: each mother. Bars: 100% Noun, 66-99% Noun, 33-66% Noun, 1-33% Noun, 0% Noun.]
The proportions of potentially ambiguous words used a given percentage of the time as nouns by each mother are shown above. Words used as nouns 0% of the time are always used as adjectives.

Figure 2-7
[Bar chart. Y-axis: percent correctly classified (0% to 100%). X-axis: Alex's, Ethan's, Lily's, Violet's and William's mothers. Bars: Token Duration, Vowel Duration, Mean Pitch, Minimum Pitch, Maximum Pitch, Pitch Change, Combined.]
Accuracy data from the linear discriminant analysis on noun and verb tokens of the same words. Chance is 50%; the most accurate cues to lexical category are durational. When all cues are considered together (“combined”), accuracy does not improve significantly over the most accurate single cue for most mothers.

Figure 2-8
[Bar chart. Y-axis: token duration (ms), 200 to 500. X-axis: Utterance Medial, Utterance Final. Bars: Noun, Verb.]
Mean durations for tokens of each category appearing in one of two utterance positions. Noun tokens are longer than verb tokens in utterance medial position. The two categories do not differ in duration when they appear in utterance final position.

CHAPTER 3

When words are used in more than one lexical category, learners may assume that all uses are from a single category. If this is the case, children could conflate lexical categories and the syntactic contexts in which they occur. (For a more complete discussion of this problem, see Chapter 1 of this dissertation; Pinker, 1987). If, however, uses of a word type in one category are distinct in some way from uses of that word in another category, children may be able to segregate tokens of a word into noun uses or verb uses or adjective uses. Two cues that could be used to make this distinction are meaning and prosody.
The role of meaning in distinguishing cross-category uses of the same word type has been discussed primarily in the context of the semantic bootstrapping hypothesis (Pinker, 1987; Pinker, 1989; Nelson, 1995; Barner, 2001; Oshima-Takane, Barner, Elsabbagh & Guerriero, 2001). However, there is little empirical work on exactly how reliably meaning might distinguish cross-category uses and how available such information is to children. Certainly, counter-examples abound, as many words have similar referents regardless of the grammatical category in which they are used. Because Chapter 2 of this dissertation found reliable prosodic cues to noun- and verbhood in cases of ambicategoricality, this chapter will focus on the nature of prosodic cues to category and infants’ sensitivity to them.

Prosody has been shown to distinguish noun and verb uses of the same word type in both adult and infant-directed speech (Sorenson, Cooper & Paccia-Cooper, 1978; Shi & Moisan, 2008; Chapter 2 of this dissertation). In adult-directed speech, this information arises from the characteristic sentence positions of nouns and verbs (Sorenson, et al., 1978). The examination of natural child-directed speech presented in Chapter 2 found that mothers reliably use prosody to distinguish between noun and verb uses of the same words, although that study lacked sufficient data to address how the result interacted with sentential position. However, in a study of read (as opposed to spontaneous) speech to infants, Shi and Moisan (2008) showed that the prosodic cues that differentiate noun and verb homophones in Quebecois French are independent of sentential position.

The availability and reliability of prosodic cues to category in the case of ambicategorical words suggests that learners might be able to use these cues to distinguish cross-category uses.
Perhaps learners even have two prosodically distinct representations of such words, one that appears in one category and one that appears in the other. For this to be possible, however, learners must be sensitive to these prosodic differences. Learners have been shown to be sensitive to prosodic information for such purposes as finding syntactically relevant boundaries (Nazzi, Kemler Nelson, Jusczyk, & Jusczyk, 2000; Soderstrom, Seidl, Kemler Nelson, & Jusczyk, 2003). Perhaps they are able to use this information to distinguish between cross-category uses as well.

One reason to suspect that infants may not be able to incorporate subphonemic variation into their lexical representations comes from work on the phonological specificity of children’s early words. Although studies of early word recognition indicate that young infants are sensitive to a wide variety of linguistically irrelevant acoustic information, including speaker affect, speaker gender and emphatic stress, most studies find that these sources of information are no longer used in word recognition tasks by the end of the first year of life (Houston & Jusczyk, 2000; Singh, Morgan & White, 2004; Bortfeld & Morgan, submitted; but see Gorkin, 2008, for a counter-example). Learners also lose sensitivity to non-native phonemic contrasts over the first year of life (Werker & Tees, 1984; Kuhl, Williams, Lacerda, Stevens & Lindblom, 1992). They do, however, show sensitivity to single-feature mispronunciations of familiar words, with increased processing cost for increased mismatch of features (Swingley & Aslin, 2000; White & Morgan, 2008). Whether infants also remain sensitive to sub-phonemic variation for some word recognition tasks has not yet been established.

Prosodic information may be the reason that ambicategoricality causes relatively few processing difficulties for adults.
Previous research indicates that adults use prosody to mark syntactic boundaries in a way that disambiguates many potentially ambiguous sentences (Cooper & Paccia-Cooper, 1980; Watson & Gibson, 2004). Not only do adults produce prosody that is useful for these purposes, they are also sensitive to it as listeners (Schafer, Carter, Clifton & Frazier, 1996; Snedeker & Trueswell, 2003). The use of prosody for disambiguation by adults suggests that children may also be able to use such information to partially resolve the ambicategoricality problem. Indeed, because adults use prosody to disambiguate sentences, use of prosody by children would be especially promising as support for developmental continuity in sentence processing.

This chapter contains three related studies regarding infants’ sensitivity to prosodic information for distinguishing cross-category uses of the same words. Using a habituation paradigm, I first ask whether infants can distinguish noun and verb uses of the same word types on the basis of prosody alone. Because tokens are presented in isolation, infants receive no syntactic or semantic information and any distinction that they are able to make must be on the basis of word-level prosody alone. Using the same methods, I also ask whether infants can use prosody to distinguish verb and adjective uses of words or noun and adjective uses of the same words. The well-established reliability of prosodic cues to noun- or verbhood for ambicategorical words suggests that infants should be sensitive to these cues. However, as there is no empirical work on the prosodic cues that might be available in the cases of the verb/adjective and noun/adjective ambiguities, making explicit predictions about infants’ performance in those studies is more difficult.
3.1 Infants’ perception of prosodic cues to the noun/verb ambiguity

The presence of reliable prosodic cues to lexical category in the case of noun/verb ambiguous words suggests that young children might be able to use this information to dissociate uses of these words in one category from uses in another. If children are sensitive to this prosodic information, they might be able to learn two distinct forms of the same word: one that occurs in one category and one that occurs in the other. This study asks whether young language learners are sensitive to the prosodic cues that distinguish noun and verb uses of the same word in natural child-directed speech.

Method

Participants. A total of 36 13-month-old infants from the Providence, Rhode Island, area participated (12 male and 24 female). The mean age was 391 days (range 358-432 days). Previous work has shown that infants at this age are able to categorize words based on distribution (Gómez & Gerken, 1999; Mintz, 2006). If they distinguish between noun and verb uses of the same word at this age, they may be able to use such information in real-world lexical categorization. An additional 15 infants participated in the study but were excluded due to excessive fussiness or squirminess (9), failure to habituate (1) or a looking time on either test trial that was more than two standard deviations from the group mean (5).

Procedure. Infants’ ability to distinguish noun and verb uses of the same words was tested via an infant-controlled habituation paradigm. Each infant was seated in a testing room on a caregiver’s lap, while the caregiver listened to masking music over headphones. The infant’s gaze was coded by an experimenter observing via video camera from a separate control room where the audio stimuli could not be heard.
At the beginning of each trial, a computer monitor mounted on the wall in front of the infant displayed a flashing yellow ball to attract the infant’s attention. Once the infant oriented toward the monitor, the yellow ball was replaced with a static black and white checkerboard pattern and the audio stimulus began to play. The audio stimulus was contingent on the infant’s looking and played only when the infant looked at the monitor. Each trial lasted for a minimum of 2 seconds and a maximum of 15 seconds or until the infant looked away for at least 2 continuous seconds, whichever came first. The average looking time on the first three habituation trials was the baseline looking time for the infant. The habituation criterion was reached when the average looking time on three contiguous trials (not including the baseline trials) declined to less than 65% of the baseline looking time. Two test trials followed the same format as the habituation trials. The dependent measure was the length of time the infant listened to each of the two test trials.

Design. For each of 7 monosyllabic word types, 4 noun tokens and 4 verb tokens were extracted from the audio recordings of the mother in the Lily corpus (Demuth, Culbertson & Alter, 2006). The word types were dance, drink, help, kiss, rest, slide and swing. Tokens were selected on the basis of having little extraneous noise and low co-articulation with surrounding words. The audio track for each token was extracted from the video using SoundConverter and edited using PRAAT (Boersma & Weenink, 2008). Intensity was normalized across tokens. Two sets of noun stimuli were created by randomly assigning two noun tokens of each word type to each set. Two sets of verb stimuli were created the same way. All stimulus sets contained unique, isolated tokens of the same word types. Each infant was habituated to one stimulus set. One-quarter of the participants was habituated to each of the four stimulus sets.
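The habituation criterion described in the Procedure can be illustrated with a minimal sketch. The function name `habituation_trial` and its parameters are hypothetical and do not come from the software used in the study; the sketch assumes only the rule as stated: the baseline is the mean looking time on the first three trials, and the criterion is met when the mean of any three contiguous post-baseline trials falls below 65% of that baseline.

```python
from statistics import mean


def habituation_trial(looking_times, window=3, criterion=0.65):
    """Return the 1-based trial number on which the criterion is met, or None.

    looking_times: per-trial looking times in seconds, in presentation order.
    window: trials in the baseline and in each comparison window (3 in the text).
    criterion: proportion of baseline the window mean must fall below (0.65).
    This is an illustrative reconstruction, not the original lab software.
    """
    if len(looking_times) < 2 * window:
        return None  # not enough trials for a baseline plus one full window
    baseline = mean(looking_times[:window])
    # The comparison window may not overlap the baseline trials, so the
    # earliest window ends on trial 2 * window.
    for end in range(2 * window, len(looking_times) + 1):
        if mean(looking_times[end - window:end]) < criterion * baseline:
            return end
    return None


# Hypothetical looking times (seconds) for one infant.
example = [10, 12, 8, 9, 7, 6, 5, 4]
print(habituation_trial(example))
```

With these made-up looking times the baseline is 10 s, so the criterion is 6.5 s; trials 5 through 7 average 6 s, and the function returns trial 7.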
In each trial, a new ordering of tokens was presented, created by randomly sampling from the token set without replacement. An interstimulus interval of 500 ms was used. When the infant reached the habituation criterion, two test trials were presented. On the “same” test trial, the infant heard the tokens from the other stimulus set that were of the same grammatical category. On the “switch” test trial, the infant heard the tokens from one of the stimulus sets containing items from the non-habituated category. Importantly, all tokens in both test trials were novel, but in the “same” trial they were tokens of the habituated category and in the “switch” trial, they were tokens of the other category. Order of test trials was counterbalanced across subjects.

Results

The results of this study are presented in Figure 3-1. Infants listened to “switch” test trials for a mean of 5.4 s (SD=2.6) and to “same” test trials for a mean of 4.4 s (SD=1.9). This difference is significant (t(35)=1.89, p=.033, one-tailed, d=.44). A 2 (test trial) by 2 (habituated category) by 2 (sex) ANOVA revealed no three-way interaction (F(1, 1, 34)=1.42, p=.242), no interaction of trial by habituated category (F(1, 34)=.688, p=.413) and no interaction of trial by sex (F(1, 34)=.087, p=.77). These results show that infants can discriminate between noun and verb tokens of the same words using only the acoustic cues available in the words themselves.

When the stimuli themselves are examined, it appears that the infants may be making this discrimination based on both duration and pitch cues. Using PRAAT (Boersma & Weenink, 2008), mean pitch, minimum pitch and maximum pitch were measured in Hertz and token duration as well as vowel length were measured in milliseconds. Pitch change was calculated in semitones, using the minimum and maximum pitch measured for each token. Noun tokens were reliably longer (mean duration=468 ms; SD=193) than verb tokens (mean duration=366 ms; SD=192; t(27)=1.99, p=.05).
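The pitch-change measure can be made concrete with a short sketch. The text states only that pitch change was calculated in semitones from each token's minimum and maximum pitch; the sketch below assumes the standard conversion of a frequency ratio to semitones, 12 times the base-2 logarithm of the ratio, which may differ in detail from the computation actually used. The function name `pitch_change_semitones` is hypothetical.

```python
import math


def pitch_change_semitones(f_min_hz, f_max_hz):
    """Pitch range of one token in semitones, given min and max F0 in Hz.

    Assumes the standard conversion: 12 semitones per doubling of frequency.
    """
    if f_min_hz <= 0 or f_max_hz <= 0:
        raise ValueError("pitch values must be positive frequencies in Hz")
    return 12.0 * math.log2(f_max_hz / f_min_hz)


# Hypothetical token whose pitch spans one octave (200 Hz to 400 Hz).
print(pitch_change_semitones(200.0, 400.0))
```

Because the measure is computed per token and then averaged, the mean pitch change for a category cannot in general be recovered from the category's mean minimum and mean maximum pitch.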
Mean pitch (242.8 Hz; SD=77.75) and minimum pitch (163.97 Hz; SD=75.55) of noun tokens were lower than mean pitch (280.5 Hz; SD=84.56) and minimum pitch (202.4 Hz; SD=75.94) of verb tokens. These differences are marginal (t(27)=1.74, p=.088; t(27)=1.90, p=.063, respectively). Noun tokens had greater pitch change (13.6 ST; SD=9.92) than did verb tokens (9.53 ST; SD=7.48), although this difference was also marginal (t(27)=1.75, p=.086). These cues may arise from the characteristic syntactic positions in which nouns and verbs occur. Nouns are more likely than verbs to be phrase and clause final and therefore are more likely to be subject to phrase final lengthening and falling pitch (Sorenson et al., 1978). However, in this study, the tokens from both categories came from both sentence-medial and sentence-final positions and previous work has demonstrated that, in child-directed speech, prosodic cues to grammatical category may be available regardless of sentential position (Shi & Moisan, 2008).

Discussion

Because infants can distinguish noun and verb uses of the same words based only on prosodic cues, it is possible that they could use this information to avoid the problem of ambicategoricality in acquisition. Rather than learning a single word that can be used as both a noun and a verb, they might learn two distinct, homophonous forms, one that appears in noun environments and one that appears in verb environments. This would not be a conscious strategy; rather, the consistent perceptual differences between noun and verb forms of the same word might result in the formation of two distinct lexical forms. The results of this study show that learners are able to tell the difference between noun and verb uses of the same words at the categorical level, as opposed to on an item-by-item basis.

One potential concern regarding these results is that the cues available in the stimuli are somewhat stronger than those described in Chapter 2.
That is, the naturally produced tokens used as stimuli in this study provide more prosodic cues to the noun/verb distinction than were found in several naturalistic longitudinal corpora. The corpus study in Chapter 2 found that durational cues most reliably distinguished noun and verb uses of the same words, while pitch cues were not particularly informative. In this study, duration was a reliable cue to category membership, but some pitch information was also marginally reliable. Although infants are able to distinguish noun and verb tokens of the same words, the fact that the stimuli in this study contained more prosodic cues to category than might actually be available in a natural language learning situation limits the generalizability of the results to a real world learning situation.

Another issue that must be seriously considered regarding the application of these findings to a natural language learning situation is the simplicity of the study. Participants were exposed to isolated tokens of potentially ambiguous words and, essentially, provided with information regarding category membership, in the sense that they first hear a set of words from only one category and are then asked to judge whether or not a new set of words belongs to that same category. However, the actual experience of a language learner is somewhat different. Words are not presented as isolated tokens, but embedded in sentences. Likewise, mothers do not necessarily cluster all noun uses of a word together temporally and then introduce a cluster of verb uses. Whether the ability that infants demonstrate in this study scales up to the real life problems presented to language learners is a question that can only be answered with further study, including how learners respond to these words when they are embedded in sentences and also how learners use these words in their own productions, an issue that will be taken up in Chapter 4.
Still, this study indicates that learners are able to differentiate noun and verb uses of the same words on the basis of prosodic information alone. This ability may allow learners to create distinct lexical entries, one for nouns and one for verbs. If children are learning distinct forms, ambicategorical words should pose no problems for language learning. If they do not, however, such words should be more difficult to learn or may be used in only one category. One larger scale test of whether children learn multiple word forms to avoid ambicategoricality is how they use these words in their own productions. This question is addressed in Chapter 4.

This study shows that learners can use the prosodic cues available in natural child-directed speech to distinguish noun and verb uses of the same words. However, learners also encounter ambicategoricality between verbs and adjectives as well as nouns and adjectives. The next two studies ask whether learners can distinguish between uses in those categories, a question that may illuminate where these prosodic cues come from and also whether this distinction is about properties of nouns and verbs or properties of arguments and predicates.

3.2 Infants’ perception of prosodic cues to the verb/adjective ambiguity

Young language learners are sensitive to the prosodic cues that distinguish noun and verb uses of the same words. Because the noun/verb ambiguity is not the only case of ambicategoricality that children are exposed to, I now ask whether infants are also able to distinguish cross-category uses of words that are ambiguous between verb and adjective on the basis of prosody alone. If they can, this might facilitate learning of these words by allowing for two distinct word forms, one that is used in each category. If they cannot make this distinction, however, one would predict that these words might be more difficult to learn or that children would produce them in only one category.
Study 3-1 was partially motivated by the existence of naturally-occurring prosodic cues differentiating noun and verb tokens of the same words. Unfortunately, the corpora examined in Chapter 2 did not contain sufficient tokens of enough lexical types to allow for an analysis of whether there are cues to category in words that are ambiguous between verb and adjective. If, however, the prosodic cues to category among words that are noun/verb ambiguous are primarily the result of sentential position, there should not be particularly strong cues to category among words that are ambicategorical between verb and adjective. Verbs and adjectives tend to be sentence and phrase medial, although they can appear in phrase or utterance final position. Likewise, if the prosodic differentiation of noun and verb tokens is the result of some higher-level linguistic distinction, such as predicate and argument, one would not expect the same kinds of prosodic cues to be available for the verb/adjective distinction.

This study has two main goals. The first is to assess, indirectly, whether prosodic cues to category exist for words that are ambiguous between verb and adjective. The second is to determine whether the cues that infants used in Study 3-1 are due to the sentential and phrasal position of nouns and verbs, as verbs and adjectives appear in more similar sentential positions than do nouns and verbs. Using the same methods as in Study 3-1, I now evaluate whether infants can distinguish verb and adjective uses of the same words.

Method

Participants. A total of 36 13-month-old infants from the Providence, Rhode Island, area participated (21 male and 15 female). The mean age was 395 days (range 380-414 days). The previous study showed that infants at this age can distinguish between noun and verb uses of the same word using only prosodic information. This study asks whether they are able to do so for verb and adjective uses of the same word.
An additional 9 infants participated in the study but were excluded due to excessive fussiness or squirminess (4), failure to habituate (3) or a looking time on either test trial that was more than two standard deviations from the group mean (2).

Procedure. Infants’ ability to distinguish verb and adjective uses of the same words was tested via an infant-controlled habituation paradigm. The procedure was the same as in Study 3-1.

Design. For each of 5 monosyllabic word types, 4 verb tokens and 4 adjective tokens were extracted from the audio recordings of the mother in the Lily corpus. The word types were caught, clean, closed, looking, and mean. Tokens were selected on the basis of having little extraneous noise and low co-articulation with surrounding words. The audio track for each token was extracted from the video using SoundConverter and edited using PRAAT. Intensity was normalized across tokens. Two sets of verb stimuli were created by randomly assigning two verb tokens of each word type to each set. Two sets of adjective stimuli were created the same way. All stimulus sets contained unique, isolated tokens of the same word types. Each infant was habituated to one stimulus set. One-quarter of the participants was habituated to each of the four stimulus sets. In each trial, a new ordering of tokens was presented, created by randomly sampling from the token set without replacement. An interstimulus interval of 500 ms was used. When the infant reached the habituation criterion, two test trials were presented. On the “same” test trial, the infant heard the tokens from the other stimulus set that were of the same grammatical category. On the “switch” test trial, the infant heard the tokens from one of the stimulus sets containing items from the non-habituated category. Importantly, all tokens in both test trials were novel, but in the “same” trial they were tokens of the habituated category and in the “switch” trial, they were tokens of the other category.
Order of test trials was counterbalanced across subjects.

Results

The results of this study are presented in Figure 3-2. Infants listened to “switch” test trials for a mean of 5.2 s (SD=2.4) and to “same” test trials for a mean of 4.8 s (SD=2.5). This difference is not significant (t(35)=0.757, p=.23, one-tailed, d=.16). A 2 (test trial) by 2 (habituated category) by 2 (sex) ANOVA revealed no three-way interaction (F(1, 1, 34)=1.94, p=.173), no interaction of trial by habituated category (F(1, 34)=.542, p=.467) and no interaction of trial by sex (F(1, 34)=.207, p=.653). These results show that infants cannot discriminate between adjective and verb tokens of the same words using only the acoustic cues available in the words themselves.

When the stimuli themselves are examined, it appears that the infants’ inability to distinguish verb and adjective tokens of the same words may be attributed to the lack of prosodic cues differentiating verb and adjective tokens. Using PRAAT, mean pitch, minimum pitch and maximum pitch were measured in Hertz and token duration as well as vowel length were measured in milliseconds. Pitch change was calculated in semitones, using the minimum and maximum pitch measured for each token. The overall duration of verb tokens was not significantly different from that of adjective tokens (t(19)=1.5, p=.14); likewise, there was no difference in vowel duration between the two categories (t(19)=1.59, p=.12). None of the pitch cues were significantly different between the two categories (Mean Pitch t(19)=.02, p=.98; Maximum Pitch t(19)=.08, p=.93; Minimum Pitch t(19)=.35, p=.65; Pitch Change t(19)=.35, p=.72). The lack of cues may be due to the characteristic syntactic positions in which adjectives and verbs occur. Adjectives and verbs both occur utterance and phrase medially far more often than they occur utterance or phrase finally.
The similarity of distribution of adjectives and verbs may underlie the lack of prosodic cues to this distinction. Alternatively, speakers may be less concerned with emphasizing verbs and adjectives, as opposed to nouns. This may also reduce the prosodic differences between verb and adjective tokens.

Discussion

This study found that infants are not able to distinguish verb and adjective uses of the same words on the basis of prosody alone. Prosodic cues to category were not reliable in the tokens used in this study, although some weak durational cues were present. These results show that infants will not distinguish cross-category uses of the same words unless the cues to category are highly reliable, as in Study 3-1. Furthermore, the random selection of tokens for use as stimuli suggests that, although there are not sufficient types and tokens to fully evaluate the presence of prosodic cues to category over the longitudinal corpora, such cues are not very reliable in the case of verb/adjective ambiguity.

These results may be interpreted as showing that the prosodic cues that supported infants’ performance in Study 3-1 are due to the sentential positions characteristic of nouns and verbs. Verbs and adjectives typically appear sentence and phrase medially, although both categories may appear utterance finally. Conversely, nouns are much more likely than verbs to appear in utterance final position and may, therefore, be more prone to such prosodic phenomena as phrase final lengthening. Alternatively, these prosodic distinctions may arise from a linguistic property such as argument or predicate status. Verbs and adjectives may both act as predicates of nouns, while nouns act as arguments of verbs. This study cannot distinguish between these two explanations, as the predicate/argument distinction tends to align with sentential position in English. However, studies of languages with more flexible word order may illuminate this point.
Infants cannot differentiate verb and adjective uses of the same words using prosody. Based on this finding, one might predict that language learners should have more difficulty with these words or that they might be more prone to use these words in only a single category. Alternatively, learners might make more errors of overgeneralization along the verb/adjective distinction because they do not recognize that these words are not part of the same category. These hypotheses will be addressed in the analysis of children’s cross-category word use in Chapter 4.

The final source of ambicategoricality that this dissertation is concerned with is the noun/adjective ambiguity. Because infants can differentiate noun and verb uses of the same words, but show no differentiation of verb and adjective uses, one might predict that they will be able to distinguish noun and adjective uses, as adjective uses are not prosodically distinct from verb uses. However, very different words participate in each of these kinds of category ambiguity, so such transitive logic may not apply. Study 3-3 will evaluate whether infants are able to differentiate noun and adjective uses of the same words on the basis of prosody.

3.3 Infants’ perception of prosodic cues to the noun/adjective ambiguity

Young language learners are sensitive to the prosodic cues that distinguish noun and verb uses of the same words, but they cannot differentiate verb and adjective uses of ambicategorical words. This raises the issue of whether the perceptual cues that are available to distinguish noun and verb uses of the same words are really about lexical categories such as noun, verb and adjective, or about sentential position, which in English results from argument and predicate status. The corpora examined in Chapter 2 did not contain sufficient tokens of enough lexical types to allow for an analysis of whether there are cues to category in words that are ambiguous between noun and adjective.
If, however, the prosodic cues to category among words that are noun/verb ambiguous and the lack of such cues in words that are verb/adjective ambiguous are primarily the result of sentential position, there should be perceptible cues to category among words that are ambicategorical between noun and adjective. Adjectives appear in sentential contexts that are more similar to verbs than to nouns; nouns, unlike verbs and adjectives, often appear at the ends of phrases and utterances. If prosodic cues to the adjective/noun distinction do exist for ambiguous words, the results from Study 3-1 suggest that infants should be sensitive to them. However, the prosodic cues that distinguish noun and adjective uses of words may be different, either qualitatively or quantitatively, from those that distinguish noun and verb uses of words. These differences may reduce infants’ sensitivity to those prosodic cues. Alternatively, there may be no reliable prosodic cues to this distinction. This study asks whether infants can differentiate noun and adjective uses of the same words on the basis of prosody alone as well as whether such cues are available in tokens of noun/adjective ambiguous words drawn from natural child-directed speech.

Method

Participants. A total of 28 13-month-old infants from the Providence, Rhode Island, area participated (20 male and 8 female). The mean age was 396 days (range 381-412 days). The previous studies showed that infants at this age can distinguish between noun and verb uses of the same word using only prosodic information, but that this ability does not extend to verb and adjective uses of the same word. This study asks whether they can distinguish noun and adjective uses of words based solely on prosodic cues.
An additional 8 infants participated in the study but were excluded due to excessive fussiness or squirminess (3), failure to habituate (1) or a looking time on either test trial that was more than two standard deviations from the group mean (4).

Procedure. Infants’ ability to distinguish noun and adjective uses of the same words was tested via an infant-controlled habituation paradigm. The procedure was the same as in Study 3-1.

Design. For each of 7 monosyllabic word types, 4 noun tokens and 4 adjective tokens were extracted from the audio recordings of the mother in the Lily corpus. The word types were beach, glass, green, school, snack, snow and stone. Tokens were selected on the basis of having little extraneous noise and low co-articulation with surrounding words. The audio track for each token was extracted from the video using SoundConverter and edited using PRAAT. Intensity was normalized across tokens. Two sets of noun stimuli were created by randomly assigning two noun tokens of each word type to each set. Two sets of adjective stimuli were created the same way. All stimulus sets contained unique, isolated tokens of the same word types. Each infant was habituated to one stimulus set. One-quarter of the participants was habituated to each of the four stimulus sets. In each trial, a new ordering of tokens was presented, created by randomly sampling from the token set without replacement. An interstimulus interval of 500 ms was used. When the infant reached the habituation criterion, two test trials were presented. On the “same” test trial, the infant heard the tokens from the other stimulus set that were of the same grammatical category. On the “switch” test trial, the infant heard the tokens from one of the stimulus sets containing items from the non-habituated category.
Importantly, all tokens in both test trials were novel, but in the “same” trial they were tokens of the habituated category and in the “switch” trial, they were tokens of the other category. Order of test trials was counterbalanced across subjects.

Results

The results of this study are presented in Figure 3-3. Infants listened to “switch” test trials for a mean of 4.0 s (SD=1.4) and to “same” test trials for a mean of 3.8 s (SD=1.2). This difference is not significant (t(27)=0.667, p=.25, one-tailed, d=.15). A 2 (test trial) by 2 (habituated category) by 2 (sex) ANOVA revealed no three-way interaction (F(1, 1, 26)=0.34, p=.57), no interaction of trial by habituated category (F(1, 26)=.39, p=.54) and no interaction of trial by sex (F(1, 26)=2.14, p=.16). These results show that infants cannot discriminate between adjective and noun tokens of the same words using only the acoustic cues available in the words themselves.

When the stimuli themselves are examined, the fact that infants are unable to distinguish these two categories is a bit surprising. Noun and adjective tokens were reliably different from one another in terms of both duration and pitch. Using PRAAT, mean pitch, minimum pitch and maximum pitch were measured in Hertz and token duration as well as vowel length were measured in milliseconds. Pitch change was calculated in semitones, using the minimum and maximum pitch measured for each token. Noun tokens were reliably longer (mean duration=531.85 ms; SD=202.59) than adjective tokens (mean duration=368.34 ms; SD=120.8; t(27)=3.67, p<.001). Maximum pitch of noun tokens (mean=287.95 Hz; SD=116.88) was significantly higher than that of adjective tokens (mean=227.8 Hz; SD=92.04; t(27)=2.12, p=.04). Mean pitch (mean=208.81 Hz; SD=89.4) of noun tokens was higher than mean pitch (mean=172.55 Hz; SD=60.82) of adjective tokens, although this difference is marginal (t(27)=1.76, p=.08).
Noun tokens also had significantly greater pitch change (13.5 ST; SD=6.55) than did adjective tokens (8.90 ST; SD=8.95; t(27)=2.15, p=.04). Minimum pitch was not significantly different between the noun and adjective tokens used in this study (t(27)=.012, p=.99). These cues may arise from the characteristic syntactic positions in which nouns and adjectives occur. Nouns are more likely than adjectives to be phrase and clause final and therefore are more likely to be subject to phrase final lengthening and falling pitch. However, in this study, the tokens from both categories came from both sentence-medial and sentence-final positions and previous work has demonstrated that, in child-directed speech, prosodic cues to grammatical category may be available regardless of sentential position (Shi & Moisan, 2008; Chapter 2 of this dissertation).

Discussion

The finding that infants cannot distinguish between noun and adjective uses of the same words on the basis of prosody alone is somewhat surprising. Abundant prosodic cues to this distinction were present in the stimuli, and these cues were stronger than those available in the stimuli for Study 3-1, in which infants were able to make the distinction between noun and verb uses of the same words. One especially curious finding is that looking time on both test trials decreased relative to the habituated looking time, indicating that infants in this study found neither same nor switch test trials more interesting than those items to which they had already been habituated. Furthermore, the average initial looking times in this study were shorter than those in Studies 3-1 and 3-2. These discrepancies suggest that infants may have found these stimuli less compelling to listen to than those from previous studies.

Alternatively, this study had a greater imbalance in terms of sex than either of the two previous studies, with only about 29% female participants (8 of 28).
Although the ANOVA found no effect of sex on the results of this study, perhaps male infants are less sensitive to these prosodic differences than female infants are or perhaps they are less attentive overall in studies of this type. However, the lack of sex effects in any of the studies presented here suggests that this is not the case.

Another factor that may have played a role in infants’ failure to detect these cues is total time to habituation, that is, the total amount of exposure to the habituation stimuli. Perhaps infants need a certain amount of exposure to the habituation stimuli to achieve the habituation/dishabituation effect seen in Study 3-1. A comparison of infants’ total exposure time during habituation finds no significant difference in exposure times for infants in Study 3-1 and those in this study (t(62)=.996, p=.32, two-tailed), suggesting that the amount of exposure to the stimuli during habituation does not account for the difference in looking times at test for these two studies.

Because study-internal factors do not seem to account for the discrepancies between the results in Study 3-1 and the results of this study, perhaps external factors bear on these results. In particular, children’s experience with the word types used in this study and those used in Study 3-1 may affect their performance. Based on the MacArthur Communicative Development Inventory (Dale & Fenson, 1996), the word types used in this study are acquired an average of 3 months later than the word types used in Study 3-1. That is, infants in this study were less likely to be familiar with the word types they heard than were infants in Study 3-1. This difference may account for the differences in performance between the two studies. Another possibility is that the combination of “true” adjectives, such as green, and noun modifiers, such as beach, influenced children’s performance.
The nature of the habituation paradigm makes item analyses impossible, but this possibility also suggests that infants’ prior experience with the word types used in this study influenced their performance.

3.4 General Discussion

The three studies in this chapter indicate that infants are able to distinguish between uses of a word in one lexical category and uses in another category on the basis of prosodic information, but only when that information is available and statistically reliable. These studies find a difference between infants’ capacity to differentiate noun and verb uses of words and their ability to differentiate verb and adjective uses or adjective and noun uses of words. This difference may arise for a variety of reasons, including the availability of cues to the distinction. Still, these findings have important ramifications for theories of how learners might incorporate ambicategorical words into their lexicon and grammar.

If children use their ability to distinguish noun and verb tokens of the same word types in real-world learning contexts, they may be able to create two prosodically distinct representations of these words, one that is used as a noun and one that is used as a verb. This would allow learners to avoid the problem of ambicategoricality, at least until their grammatical systems are a little more robust. Rather than having a single word that is ambicategorical, they would have two distinct forms, one that appears in each category. In this way, learners might avoid conflating noun and verb contexts and therefore not make the kinds of errors that a strict interpretation of the syntactic bootstrapping hypothesis would predict (e.g., Pinker, 1987). The ability to make this distinction also allows learners to avoid another potential pitfall of ambicategoricality: the problem of homonymy.
Young language learners prefer to avoid using the same form for multiple functions (Slobin, 1973) and also tend to avoid homonymy (Markman & Wachtel, 1988; Clark, 1988; Golinkoff, Mervis & Hirsh-Pasek, 1994). One finding that does not follow from these well-established trends in child language is that children will readily accept homonyms if they believe that the two forms are members of different grammatical categories (Casenhiser, 2005). Perhaps the prosodic differences inherent to noun/verb homophones allow children to violate these otherwise strongly held principles. A more conclusive test of this hypothesis will come with the examination of children’s own use of ambicategorical words in Chapter 4. One factor that must be considered when evaluating infants’ performance in these studies is that all tokens in these experiments were presented in isolation. Participants did not have access to syntactic or semantic information that might have helped them distinguish between categories. However, they also did not have any prosodic information from the rest of the sentence. This eliminates such cues as relative amplitude (which might indicate stress) and the speaking rate over the whole utterance (which might alter perception of duration). By eliminating these additional sources of prosodic information, these studies may underestimate infants’ ability to use prosodic cues to distinguish cross-category uses of the same words. One prediction that might be made is that these cues may be more perceptible in a sentential context. To test this, one might splice a noun token, for example, into a verb context and a novel verb token into that same context. If infants respond differently to these two manipulations, such results would suggest that their sensitivity to prosodic cues that differentiate noun and verb tokens of the same words scales up to the more natural situation of hearing words in sentences.
However, this manipulation is outside the scope of the present research. The inability of infants to distinguish noun and adjective or adjective and verb uses of words suggests that they may not be as able to separate cross-category uses of these words in natural language learning. One possible ramification of this is that children may be more prone to make the kinds of errors predicted by Pinker (1987) with verbs and adjectives or adjectives and nouns. Indeed, Pinker (1989) reports some instances of children using verbs as adjectives in non-adult ways. Likewise, an examination of children’s own use of words that are verb/adjective or adjective/noun ambicategorical may find that children are more prone to use these words in a single category than those words that are noun/verb ambiguous. Because they cannot separate uses in one category from uses in another, they may be unable to support two distinct representations of such words, one for use in one category and one for use in the other. Instead, they may adhere to such principles as one form, one function (Slobin, 1973) and use these words only in a single category. The evidence presented in this chapter indicates that young language learners are able to distinguish noun tokens of a word from verb tokens of that same word on the basis of prosodic information alone. However, this ability does not extend to words that are noun/adjective or verb/adjective ambiguous. Infants may not be able to make this distinction in the case of verb/adjective ambiguous words due to an absence of reliable prosodic cues to that distinction. However, no reason for children’s inability to distinguish noun and adjective tokens on the basis of prosody has been firmly established. How children’s ability to make the noun/verb distinction and inability to make the verb/adjective and adjective/noun distinctions affect their productions of ambicategorical words will be examined in the next chapter.
Figure 3-1. [Plot not reproduced. Y-axis: Looking Time (ms); x-axis: Initial, Habituated, Test; series: Switch, Same.] Results of the noun/verb habituation study indicate that infants prefer word usages from a new category over those from the habituated one (t(35)=1.89, p=.033, one-tailed).

Figure 3-2. [Plot not reproduced. Y-axis: Looking Time (ms); x-axis: Initial, Habituated, Test; series: Switch, Same.] Results of the verb/adjective habituation study indicate that infants do not prefer word usages from a new category over those from the habituated one (t(35)=0.757, p=.23, one-tailed). This suggests that they are unable to distinguish between the habituated and novel categories.

Figure 3-3. [Plot not reproduced. Y-axis: Looking Time (ms); x-axis: Initial, Habituated, Test; series: Switch, Same.] Results of the noun/adjective habituation study indicate that infants do not prefer word usages from a new category over those from the habituated one (t(27)=0.667, p=.25, one-tailed). This suggests that they are unable to distinguish between the habituated and novel categories.

CHAPTER 4

Thus far, this dissertation has examined the nature of ambicategoricality in speech to children (Chapter 2) and how infants’ perceptual abilities might alleviate the problem of ambicategoricality in speech to children (Chapter 3). Examining children’s early productions and usage of ambicategorical words will make it possible to integrate those findings. Taking into account the nature of the input, the perceptual capabilities of the learner and the learner’s own production will provide as complete a picture as possible of the nature of ambicategoricality in children’s early linguistic representations.

Ambicategoricality poses an interesting problem to children learning to speak. Learners are exposed to significant amounts of cross-category usage, even very early in language development.
However, if learners are really trying to sort words into grammatical categories such as noun, verb and adjective, such cross-category usage should constitute irregular evidence, similar to that received by children whose parents are not native speakers of a language. Previous work with children who receive such variable input indicates that they regularize these irregularities, creating rules or schema where there are none (Goldin-Meadow & Mylander, 1984; Singleton & Newport, 2004; Hudson Kam & Newport, 2005). Alternatively, learners may simply avoid using such words across category because of their well-documented tendency to restrict a single form to a single purpose and to avoid assigning more than one referent to the same word (e.g., Slobin, 1973; Markman & Wachtel, 1988; Clark, 1988; Golinkoff, Mervis & Hirsh-Pasek, 1994, but see Casenhiser, 2005). Most previous work suggests that learners should not use words in multiple grammatical categories. Nevertheless, some work has shown that young children will use words as both noun and verb, even in their earliest productions. Anecdotal reports of children spontaneously creating cross-category uses of words have appeared in the literature for a long time (e.g., Clark, 1983; Kuczaj, 1978). However, such reports do not speak to how children use words that are ambicategorical for adults, nor do they address the relationship between parental cross-category usage of words and children’s use of those same words. Early work on this issue was somewhat limited in scope. Macnamara (1982) found that Sarah, one of Brown’s (1973) subjects, used only a very few words flexibly as both noun and verb. Nelson (1995) examined six word types in speech by 12 children over five recordings and found that, in general, children used very few of them as both noun and verb.
In somewhat more comprehensive studies, Barner and colleagues (Barner, 2001; Oshima-Takane, Barner, Elsabbagh & Guerriero, 2001) found that learners will use denominal verbs and deverbal nouns, although most words are restricted to a single category. Children are also less likely to use words as both noun and verb than are their caregivers. Barner and colleagues’ analyses focused primarily on the semantic nature of words that are used in more than one category and how such phenomena as light verb constructions interact with cross-category usage. However, they examined a relatively restricted age range (Brown’s (1973) Stage 1) and only considered noun/verb pairs for which one usage is derived from the other with null-morphology. These analyses should be expanded to include more longitudinal corpora and also to consider words that are verb/adjective and noun/adjective ambiguous.

A second question, raised if children do use words in more than one category, is why they do so. Do children use words in more than one category because they have learned from their mothers that some words behave this way? Or do they spontaneously create cross-category usages to fill lexical gaps? Previous work on children’s use of words in more than one category often focuses on the latter possibility. Reports of children’s spontaneous cross-category usages abound (e.g., Clark, 1983; Kuczaj, 1978) and elicited production studies have found that children will use a noun as a verb to describe a novel action (Bushnell & Maratsos, 1984). Again, these studies do not look at how children use words that are ambicategorical to adults, nor do they ask what the relationship is between a child’s use of a word and his or her caregivers’ usage. This chapter will address the issue of how and whether children use words in more than one lexical category.
First, it will examine children’s use of words that are ambicategorical for adults, using the same methods described in Chapter 2 for analyzing maternal utterances. Then it will ask whether children’s use of ambicategorical words is well-predicted by their mothers’ usage. These two studies will allow for a closer examination of children’s understanding of ambicategoricality.

4.1 Ambicategoricality in children’s speech

To assess whether children do, in fact, use the same word in more than one lexical category across development, this study will look at children’s cross-category use not just of words that are noun/verb ambiguous, but also those that are verb/adjective ambiguous and those that are adjective/noun ambiguous. All of these analyses will be conducted on the same corpora used to evaluate maternal cross-category use in Chapter 2, except now the focus will be on the children’s productions. Using the same corpora will facilitate comparison between children’s rate of cross-category word use and that of their caregivers. Although anecdotal reports (e.g., Clark, 1983) and some systematic analyses indicate that children will use words in more than one category (Barner, 2001; Oshima-Takane et al., 2001), this study represents the first longitudinal, systematic evaluation of this phenomenon that includes not only derived noun/verb pairs, but also those cross-category pairs that are unrelated homophones. Furthermore, this study will move beyond the noun/verb ambiguity to include two other potential sources of cross-category use, verb/adjective and noun/adjective ambiguity. The results of these corpus analyses will provide evidence regarding how children incorporate ambiguity into their own lexicons.

4.1.1 Study 1a: The noun/verb ambiguity

Previous research indicates that children will use some words as both noun and verb.
Early reports of this phenomenon focused primarily on children’s spontaneous creation of nouns from verbs and verbs from nouns, typically to fill a lexical gap (e.g., Clark, 1982; Bushnell & Maratsos, 1984). That research does not, however, provide information about how children use words that are ambicategorical for adults. More recent work systematically looks at children’s early use of denominal verbs and deverbal nouns (Barner, 2001; Oshima-Takane et al., 2001), but does not take into account how children use noun/verb homophones that do not have a systematic semantic link and is also limited to very early productions. Those studies also used pragmatic and morphological cues, as well as syntactic information, to categorize uses as noun or verb. Use of such information leaves open the possibility of the researchers’ subjective interpretations influencing the outcome. Because the question at hand is the nature of children’s early syntactic representations, more appropriate criteria for categorizing individual uses are the syntactic contexts in which a token appears. This study will explore children’s use of words that are ambicategorical between noun and verb, regardless of the semantic relationship between the two forms, from the onset of first words and continuing for two to three years. The examination of noun/verb ambiguity in the maternal speech to six children is reported in Chapter 2. That study found that words that are noun/verb ambiguous are used in both categories in speech to children, although not to the degree that they could be. This study will ask whether children also use these words in more than one category, like their mothers do, or whether they restrict words to a single category early in development, as might be predicted by previous work showing that children tend to regularize irregular input (e.g., Goldin-Meadow & Mylander, 1984; Singleton & Newport, 2004; Hudson Kam & Newport, 2005).

Method

Corpora.
Six longitudinal corpora of child speech were examined. Five of these corpora came from the Demuth Providence Corpus (Demuth, Culbertson & Alter, 2006). The sixth was the Nina corpus (Suppes, 1974) from the CHILDES database (MacWhinney, 2000), which was included to provide evidence that these results generalize beyond the dialect of English spoken in Providence, Rhode Island. The ages and number of recordings for each corpus are presented in Table 4-1. Children in the Providence corpus were recorded every other week for 2-3 years, beginning as soon as they uttered their first words. The Lily corpus is an exception, as a sudden, rapid increase in her language production created a need for weekly recordings approximately a year after recording commenced. For completeness, all of the Lily files are included in this analysis. Nina was recorded approximately weekly. In total, these corpora comprise approximately 330 hours of mother/child interaction. This age range (approximately 1-3 years) is of particular interest because it provides a comprehensive view of the child’s language experience from the time s/he utters his/her very first words to the time that s/he is speaking in complete, well-formed sentences.

Procedure. For each corpus, the number of child uses of each word type was counted, with morphologically complex words treated as individual types (e.g., run, runs and running were each counted separately). Because each corpus contained between 1,800 and 3,700 word types, examining every word type for cross-category use was untenable. Therefore, three frequency ranges were chosen as “core samples” for analysis. High frequency words were those used more than 150 times by the child, middle frequency words were those used 40-60 times and low frequency words were those used 3-10 times. Within each frequency range, every word type was placed in one of two categories: “noun or verb” and “neither noun nor verb”.
Then, all those words that were nouns or verbs were further categorized as potentially ambicategorical or not. Whether or not a word was potentially ambicategorical was based on an analysis of the Brown Corpus (Francis & Kucera, 1983). Words that were used at least once as a noun and at least once as a verb in the Brown Corpus were considered potentially ambiguous. [Footnote 2: The Brown Corpus consists of written texts, which limits its accuracy in reflecting typical adult-directed speech, and there are many words that are not used ambicategorically in the Brown Corpus that have very natural cross-category uses in adult speech (e.g., comb). However, there exist no corpora of spoken adult language that are comparably large.] For every word type that was potentially ambicategorical, each utterance including one or more tokens of that type was extracted from the corpus, and each token was classified by hand as a noun, a verb or “other.” Single word utterances, proper nouns and metalinguistic uses were classified as “other.” A token was considered a noun if it was modified by an adjective, appeared as the head of a noun phrase, was an argument of a verb or could be replaced with a pronoun. A token was counted as a verb if it was modified by an adverb, took noun phrase or prepositional phrase arguments or could be replaced with a pro-verb. The breakdown of number of types analyzed in each corpus is shown in Table 4-2. Classification was done by trained coders. To assess the consistency of the classifications, 5% of all word types were reclassified by a second coder. Reliability between coders was very high (Cohen’s κ = .81). The total proportion of potentially ambicategorical words that were actually used across category was calculated for each child as the number of words used at least once as both noun and verb divided by the total number of potentially ambiguous words analyzed.
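The inter-coder reliability statistic mentioned above can be illustrated with a minimal sketch. It assumes that each coder's decisions are stored as a list of labels ("noun", "verb" or "other") over the same tokens; the function name and the ten example labels are hypothetical and are not data from the corpora.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same tokens."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of tokens")
    n = len(coder_a)
    # Observed agreement: proportion of tokens given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of each coder's label rates.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten tokens (illustration only, not corpus data).
first = ["noun", "noun", "verb", "verb", "other",
         "noun", "verb", "noun", "other", "verb"]
second = ["noun", "noun", "verb", "noun", "other",
          "noun", "verb", "noun", "other", "verb"]
print(round(cohens_kappa(first, second), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because one label may dominate the tokens of a given word type.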
To obtain a better idea of how ambicategoricality relates to frequency of use, for each frequency range for each child, the same kind of calculation was done on only those word types within a given frequency range. These numbers provide an estimate of how many of the potentially ambiguous word types that each child used were actually used across category boundaries at least once.

Results

Overall, children use a smaller proportion of words in both categories than do their mothers. For children, the proportion of potentially ambiguous words that were actually used across categories ranged from .12 to .17. This overall rate of cross-category usage is lower than that of their mothers, whose proportional cross-category use ranged from .19 to .32. Figure 4-1 shows the results broken down by frequency range and also the total over all three frequency ranges. Because the particular word types within each frequency range are different for each child, these data cannot be directly compared. However, there is no consistent relationship between a word’s frequency and the likelihood that it would be used in both categories. The very high or very low proportions of cross-category use in some frequency ranges by some children may merely be a product of the relatively small number of potentially ambiguous words used in that frequency range. For example, Ethan used 75% of high frequency words across category, while Violet used no high frequency words in this way. However, Ethan had only 4 words in this frequency range and Violet had only 5. When considered in terms of absolute number, the difference between Ethan’s cross-category use and Violet’s usage appears less significant. Only in the case of low frequency words and when all words are considered are there sufficient word types to reduce this problem.
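The proportion described in the Procedure, and the small-denominator caveat just noted, can be sketched as follows. This is an illustrative sketch only: the function names and the individual token counts are hypothetical, and the range boundaries are assumed to be inclusive (40-60 and 3-10) with "more than 150" read as strictly greater. Only the totals for Ethan (3 of 4 high frequency types) and Violet (0 of 5) follow from the figures reported in the text.

```python
def frequency_range(count):
    """Map a child's token count for a word type to a core-sample range, or None."""
    if count > 150:
        return "high"
    if 40 <= count <= 60:
        return "middle"
    if 3 <= count <= 10:
        return "low"
    return None  # outside the three sampled ranges


def cross_category_proportions(types):
    """Tally cross-category use for potentially ambiguous word types.

    `types` is an iterable of (token_count, used_as_noun, used_as_verb).
    Returns {range: (n_cross, n_types, proportion)} plus an "all" entry;
    the proportion is None when a range contains no types.
    """
    tallies = {"high": [0, 0], "middle": [0, 0], "low": [0, 0], "all": [0, 0]}
    for count, as_noun, as_verb in types:
        rng = frequency_range(count)
        if rng is None:
            continue
        for key in (rng, "all"):
            tallies[key][1] += 1
            if as_noun and as_verb:
                tallies[key][0] += 1
    return {k: (c, n, c / n if n else None) for k, (c, n) in tallies.items()}


# Hypothetical token counts; only the 3-of-4 and 0-of-5 totals come from the text.
ethan_high = [(200, True, True), (180, True, True),
              (160, True, True), (155, True, False)]
violet_high = [(300, True, False), (250, False, True), (220, True, False),
               (190, True, False), (170, False, True)]
print(cross_category_proportions(ethan_high)["high"])
print(cross_category_proportions(violet_high)["high"])
```

Returning the raw counts alongside each proportion keeps the denominator visible, so a value such as .75 can be read as three word types out of four rather than as a large effect.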
Likewise, the very small number of word types in the high and middle frequency ranges makes examining the distribution of uses in each category more difficult (but see Chapter 2 for this analysis of maternal cross-category usage). These findings indicate that children do use some words as both noun and verb. However, they do not do so as often as their mothers do, nor do they use the majority of words in this way. This analysis does not allow the children’s cross-category use to be directly compared to that of the mothers because the word types in a given frequency range for a child are not necessarily the same word types in that frequency range for his/her mother. These data can only be used for a rough comparison of overall frequency of cross-category use by both child and mother. Study 2 will make a more direct comparison of the use of individual words across categories by mother and child.

These findings further indicate that children are willing to use the same phonotactic string as both noun and verb, a somewhat surprising result, given children’s widely reported tendency to regularize irregular input (e.g., Goldin-Meadow & Mylander, 1984; Singleton & Newport, 2004; Hudson Kam & Newport, 2005) and the general linguistic principle of one-form/one-function (Slobin, 1973). That is, young language learners tend to avoid irregularity and homonymy in their language. Why might children be willing to tolerate this kind of irregularity or homonymy? Perhaps they do not recognize it as homonymy. Chapter 3 demonstrated that infants are able to distinguish noun and verb uses of the same word types on the basis of prosodic information alone. In this way, a noun use of hug and a verb use of hug may actually be distinct word forms, not homonyms, for a language learner.
Furthermore, the ability to make this distinction would allow learners to avoid conflating lexical categories on the basis of the conflicting information that ambicategoricality introduces into the category learning problem. The noun/verb ambiguity is the most studied source of ambicategoricality, but it is not the only one. English contains many words that can also be used as both adjective and verb. Infants, however, are not able to discriminate between verb and adjective uses of the same word types as they are for noun/verb ambiguous words. Perhaps this, combined with their tendency to avoid using a single word for more than one grammatical purpose, limits children’s ability to use a single word form as both verb and adjective.

4.1.2 Study 1b: The verb/adjective ambiguity

Although the noun/verb ambiguity is the most obvious and most studied potential source of category ambiguity, an examination of maternal speech (see Chapter 2) found that children also hear words used as both verb and adjective. Words that can be used in this way are less frequent in speech to children than are words that are potentially ambicategorical between noun and verb and are less likely than noun/verb ambiguous words to actually be used in both categories. However, children are more likely to hear verb/adjective ambiguous words used equally often in both categories than noun/verb ambiguous words. Furthermore, infants are not able to discriminate verb uses of a word from adjective uses of that same word type on the basis of prosodic features alone (see Chapter 3). This would suggest that children might have greater trouble distinguishing between verb and adjective uses of the same word in day-to-day experience. Such difficulty may influence children’s ability to use these words in more than one category, given their well-known tendency to avoid using a single form for more than one grammatical function (Slobin, 1973).
Although they show this ability for noun/verb ambiguous words, that ability may be due more to the ability to prosodically distinguish noun and verb uses of ambiguous words, an ability that does not extend to verb/adjective ambiguous words. If children are able to use the same word types as both verb and adjective, in spite of their inability to distinguish between cross-category uses, they may be using a cue other than prosody, such as referential context, to form two representations of the same word form: one that behaves as a verb and the other that behaves like an adjective.

Method

Corpora. The six corpora analyzed in study 1a were also used for this study. In this analysis, only the child speech from each corpus was considered.

Procedure. Drawing on the child frequency counts calculated for Study 1a, words from three frequency ranges were analyzed. High frequency words were those used more than 150 times by the child, middle frequency words were those used 40-60 times and low frequency words were those used 3-10 times. Within each frequency range, every word type was placed in one of two categories: “adjective or verb” and “neither adjective nor verb”. Then, all those words that were adjectives or verbs were further categorized as potentially ambicategorical or not. Whether or not a word was potentially ambicategorical was based on an analysis of the Brown Corpus (Francis & Kucera, 1983). Words that were used at least once as an adjective and at least once as a verb in the Brown Corpus were considered potentially ambiguous. For every word type that was potentially ambicategorical, each utterance including one or more tokens of that type was extracted from the corpus, and each token was classified by hand as an adjective, a verb or “other.” Single word utterances and metalinguistic uses were classified as “other.” A token was considered an adjective if it modified a noun or stood as the head of a predicate adjective phrase.
No distinction was made among the various subclasses of adjectives, as no such distinction was made in Study 1a regarding subclasses of verbs. A token was counted as a verb if it was modified by an adverb, took noun phrase or prepositional phrase arguments or could be replaced with a pro-verb. The breakdown of number of types analyzed in each corpus is shown in Table 4-3. Classification was done by trained coders. The total proportion of potentially ambicategorical words that were actually used across category was calculated for each child as the number of words used at least once as both adjective and verb divided by the total number of potentially ambiguous words analyzed. To obtain a better idea of how ambicategoricality relates to frequency of use, for each frequency range for each child, the same kind of calculation was done on only those word types within a given frequency range. These numbers provide an estimate of how many of the potentially ambiguous word types that each child used were, in fact, used across category boundaries at least once.

Results

Children’s proportional use of verb/adjective ambiguous words in both categories ranged from .10 to .23. As with the noun/verb ambiguity, children’s overall rate of cross-category use of these words was somewhat lower than that of their mothers, whose proportion of cross-category use ranged from .17 to .26. Figure 4-2 shows the results broken down by frequency range. Because the particular word types within each frequency range are different for each child, these data cannot be directly compared. However, the relationship between a word’s frequency and the likelihood that it would be used in both categories was not consistent across children. Like the noun/verb data, these data may be somewhat skewed by the very small number of ambiguous word types in some frequency ranges for some children.
Again, these data cannot be directly compared to the maternal data in Chapter 2, as the word types in a given frequency range for a child may not be the same word types in that frequency range for the mother. However, the overall proportion of use for each child is somewhat lower than that for his/her mother. A more direct comparison of cross-category use of individual word types will be made in Study 2. This study has found that, as with words that are ambicategorical between noun and verb, children do demonstrate cross-category use of some words that are potentially ambiguous between verb and adjective. Like their mothers, however, they only use a minority of such words in both categories. Again, children do not appear to be restricting these words to a single function, but rather allowing them to play two distinct grammatical roles. The next substudy will examine a third possible source of lexical category ambiguity, the noun/adjective ambiguity.

4.1.3 Study 1c: The noun/adjective ambiguity

Thus far, these studies have established that, like their mothers, children do use some word types as both noun and verb and others as both verb and adjective. Still, the majority of such words are used only in a single category by both mother and child. In speech to children, the noun/adjective ambiguity behaves very differently than the noun/verb and verb/adjective ambiguities. Specifically, mothers use a far greater proportion of potentially noun/adjective ambiguous words in both categories than they do noun/verb or verb/adjective ambiguous words. This might increase the likelihood that children will also use noun/adjective ambiguous words in both categories. However, data from perceptual studies suggest that infants are not able to distinguish noun and adjective uses of the same words on the basis of prosody alone. Perhaps this inability will reduce children’s use of such words in both categories.
Alternatively, another cue, such as referential context, may help children resolve this ambiguity and allow them to use noun/adjective ambiguous words in both categories.

Method

Corpora. The child speech from the six corpora analyzed in studies 1a and 1b was also analyzed for this study.

Procedure. Drawing on the child frequency counts calculated for Study 1a, words from three frequency ranges were analyzed. High frequency words were those used more than 150 times by the child, middle frequency words were those used 40-60 times and low frequency words were those used 3-10 times. Within each frequency range, every word type was placed in one of two categories: “noun or adjective” and “neither noun nor adjective”. Then, all those words that were adjectives or nouns were further categorized as potentially ambicategorical or not. Whether or not a word was potentially ambicategorical was based on an analysis of the Brown Corpus (Francis & Kucera, 1983). Words that were used at least once as an adjective and at least once as a noun in the Brown Corpus were considered potentially ambiguous. For every word type that was potentially ambicategorical, each utterance including one or more tokens of that type was extracted from the corpus, and each token was classified by hand as a noun, an adjective or “other.” Single word utterances, proper nouns and metalinguistic uses were classified as “other.” A token was considered an adjective if it modified a noun or stood as the head of a predicate adjective phrase. A token was considered a noun if it was modified by an adjective, appeared as the head of a noun phrase, was an argument of a verb or could be replaced with a pronoun. The breakdown of number of types analyzed in each corpus is shown in Table 4-4. Classification was done by trained coders.
The total proportion of potentially ambicategorical words that were actually used across category was calculated for each child as the number of words used at least once as both noun and adjective divided by the total number of potentially ambiguous words analyzed. To obtain a better idea of how ambicategoricality relates to frequency of use, for each frequency range for each child, the same kind of calculation was done on only those word types within a given frequency range. These numbers provide an estimate of how many of the word types that are potentially ambiguous between adjective and noun were used across category boundaries at least once in the child’s speech.

Results

The proportion of cross-category use was much higher among those words that are ambiguous between noun and adjective than among the other two comparisons, ranging from .10 to .52. Figure 4-3 shows the results broken down by frequency range. Because the particular word types within each frequency range are different for each child, these data cannot be directly compared. However, four children (Alex, Ethan, Lily and Nina) showed a similar relationship between the frequency of a potentially ambicategorical word and the likelihood that it would be used as both noun and adjective. Specifically, words in the high and middle frequency ranges were more likely to be used across category than were words in the low frequency range. However, the low number of word types in some of these frequency ranges may account for the high proportion of cross-category use. Unlike the noun/verb and verb/adjective ambiguities, some children actually showed higher proportional cross-category use of noun/adjective ambiguous words than did their mothers. In particular, Alex and Nina used a larger proportion of word types in both categories than their mothers did.
Low numbers of word types may be responsible for this skew in the data and only the relative frequency of cross-category use can be compared, as these frequency ranges may not contain the same words for both mother and child. A direct comparison of cross-category use of individual word types will be the subject of Study 2.

4.1.4 Discussion

The three substudies presented in this section indicate that children, like their mothers, will use some words in more than one lexical category, although the majority of potentially ambiguous words are restricted to a single category. Children’s overall use of words in more than one category is somewhat lower than that of their mothers for the noun/verb and verb/adjective ambiguities, but a bit higher for the noun/adjective ambiguity. Nevertheless, children do use words in more than one lexical category in their own speech. Although children’s usage of words as both noun and verb has been discussed in previous literature, these studies are the first to examine cross-category usage of three different types in speech by children. The noun/verb ambiguity is attested in speech to and by children, indicating that they are able to incorporate this kind of ambiguity into their linguistic system without conflating the two categories. Infants’ performance in the perceptual studies reported in Chapter 3 suggests that the difficulties that noun/verb ambiguity might present to learners could be ameliorated by the prosodic differences between noun and verb uses of the same word. However, infants are not able to distinguish between verb and adjective uses or adjective and noun uses of the same words on the basis of prosody alone. Nevertheless, they are able to use some words as both verb and adjective and as both adjective and noun. This suggests a role for a cue other than prosody, perhaps referential context, in distinguishing between cross-category uses of the same word forms.
Beyond the category learning difficulties that ambicategoricality poses, there are reasons to be surprised that learners will use words across category boundaries at all. Anecdotal reports of cross-category usage tend to be of spontaneous, creative cross-category usage, often to fill a lexical gap (Clark, 1983; Bushnell & Maratsos, 1984). Children’s tendency to regularize irregular input is well-reported in the language development literature (Goldin-Meadow & Mylander, 1984; Singleton & Newport, 2004; Hudson Kam & Newport, 2005). Furthermore, learners tend to restrict a single lexical or grammatical form to a single function (Slobin, 1973) and avoid assigning the same label to more than one referent (Markman & Wachtel, 1988; Clark, 1988; Golinkoff, Mervis & Hirsh-Pasek, 1994; but see Casenhiser, 2005, for a counter-example). Ambicategoricality in children’s speech seems to violate both of these widely attested principles of language development. If, however, children have some means of segregating cross-category uses of particular word types, they might be able to create distinct, if related, word forms, thereby avoiding irregularity and/or homonymy in their language. One cue that might be useful for noun/verb ambiguous words is prosody. However, as infants do not appear to be sensitive to prosodic differences between verb and adjective or adjective and noun uses of the same word forms, other cues may also be at work. The results of all three studies must be interpreted carefully for several reasons. First, very small numbers of word types in some of these categories may skew the proportions. Therefore, it may be informative to compare the absolute number of words used across categories (presented in Tables 4-2 through 4-4) to the proportions presented in the figures for perspective on why some proportions are so high and others are so low.
The small number of word types in some frequency ranges for some children also limits the ability to examine the frequency of use in one category or the other, as was done with the noun/verb ambiguity in maternal speech (see Chapter 2). This limitation means that these data lack perspective on whether use in more than one category is the exception or the rule for individual word types. Another issue that must be considered is that these studies used only syntactic information to categorize a particular use of a word as noun, verb or adjective, as opposed to work by Barner (2001) and Oshima-Tanake and colleagues (2001), who used pragmatic context and morphology, in addition to syntactic information, to categorize uses of deverbal nouns. Because single word utterances contain no syntactic information, all such utterances were excluded from the analysis. Children’s earliest utterances, however, tend to consist of a single word. This means that, although the five corpora from the Providence Corpus begin with the child’s first words, the evaluation of their cross-category use could not begin until they were producing multi-word utterances. The status of ambicategorical words in children’s very earliest language development cannot be addressed using these methods. A final reason that these results must be interpreted carefully is that the frequency ranges not only contain different words for each child, but the words in a child’s frequency range may also not be the same types that are included in that frequency range for the child’s mother. Therefore, direct comparison of how many words the child uses across category and how many his/her mother uses cannot be made. The general quality of these results, that is, that some words, but not most, are used across category, can be compared, but a direct, quantitative comparison is not possible from this analysis.
To address this issue, the next study will take up a more direct comparison, asking how a child’s proportional cross-category use of a word relates to his/her mother’s cross-category use of that same word. The results of that study will elucidate whether children are learning about ambicategoricality from their parents or spontaneously creating it on their own.

4.2 The relationship between maternal and child usage

Study 1 demonstrated that children do use words in more than one grammatical category, but it could not address how children’s cross-category use is related to that of the adults in their environment. Because the same word types were not necessarily examined for both mother and child, only relative frequency of cross-category use could be discussed. A more direct comparison of cross-category use of individual word types by mother and child will indicate whether children are learning about cross-category usage of specific words from their parents or whether children simply use words in more than one category somewhat arbitrarily. This study, which will be presented in three parts, asks whether children’s cross-category usage of words is related to that of their mothers. These comparisons have two possible outcomes: either children’s use of a particular word type across categories is closely related to their mothers’ use of those words or it is unrelated. If the latter is the case, one might argue that children “invent” cross-category usage. That is, children use words in more than one category, not because they have learned that some words behave this way, but rather to fill lexical gaps. Adults are more than capable of such innovation (Clark & Clark, 1979). Likewise, children have been shown to spontaneously use words across category in elicitation studies (Bushnell & Maratsos, 1984) and isolated examples from natural speech have also been reported (Clark, 1982; Kuczaj, 1978).
Of course, a lack of a relationship between child and maternal usage of ambicategorical words may also be due to the noisy nature of naturalistic corpus data. If, however, children’s use of ambicategorical words is tightly related to that of their mothers, this would suggest that children learn about ambicategoricality from their mothers. In particular, it is not clear how such behavior would surface if children cannot dissociate uses in different categories. Exactly what means children use to dissociate use in one category from that in another cannot be determined by a corpus study alone. Factors such as prosodic information (as discussed in Chapter 2), reference and context may all play a role. However, differences in the relationship between mother and child use among the different forms of ambicategoricality may help to elucidate the underlying cause. Each substudy will examine one possible source of ambicategoricality. In every case, the primary question will be the relationship between maternal proportion of use in each category and child proportion of use in each category. The first source of ambicategoricality to be considered will be the noun/verb ambiguity.

4.2.1 Study 2a: The noun/verb ambiguity

Previous work on the noun/verb ambiguity in language development has examined maternal use of words as both noun and verb, as well as children’s use of words in both categories, occasionally discussing the relationship between maternal use of a particular word type and the child’s use of that same word type. Oshima-Tanake and colleagues (2001) described the number of uses of some denominal verb types in both the input and productions of three children and found that children typically only use denominal verbs if they have been attested in their input. In addition to examining only twelve word types, that study did not assess how the frequency of use in one category or another influenced the child’s usage of that word.
Barner (2001) provided data regarding children and their caregivers’ frequency of use of deverbal nouns and denominal verbs in each category, but did not directly compare rates of cross-category use for a large number of potentially ambicategorical words. The following study is the first large-scale, longitudinal examination of how a child’s use of a given word type as both noun and verb is related to his/her parent’s use of that word. By directly comparing maternal use of individual word types to child use of individual word types, it is possible to determine whether child cross-category usage is something that has been learned from experience or a spontaneous invention on the part of the child (e.g., Clark, 1982). This study asks whether child use of ambicategorical words is well-predicted by maternal use of those words or whether child use is unrelated to maternal use.

Method

This study uses data collected in Study 1a, as well as that from the study of noun/verb ambiguity in maternal speech described in Chapter 2 (Study 1a in that chapter). Within a mother-child dyad, all tokens of each word type in the high and middle frequency ranges for the mother were extracted from the child’s speech and coded as noun, verb or other, as described above. Likewise, all tokens of each word type in the high and middle frequency ranges for the child were extracted from the mother’s speech and coded as described above. This allowed for calculation of the proportion of noun uses of each of these word types for each speaker. Proportion of noun use for a given word type was calculated as the number of noun uses of that type divided by the total number of noun and verb uses of that word type. If a word was only used as a noun, it would have a proportional noun use of 1. Words used only as verbs would have a proportional noun use of 0.
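The proportion of noun use defined above can be sketched in a few lines. This is an illustration only: the label strings, the function name and the example tokens are all hypothetical and are not taken from the corpora.

```python
from collections import Counter

# Illustrative sketch: labels, names and example data are hypothetical.


def proportion_noun_use(labels):
    """Noun uses / (noun uses + verb uses); tokens coded "other" are ignored."""
    counts = Counter(labels)
    total = counts["noun"] + counts["verb"]
    if total == 0:
        return None  # no codable noun or verb tokens for this word type
    return counts["noun"] / total


# Hypothetical hand-coded tokens for three word types from one speaker.
coded_tokens = {
    "drink": ["noun", "verb", "verb", "verb", "other"],
    "ball": ["noun", "noun", "other"],
    "go": ["verb", "verb"],
}

noun_proportions = {
    word: proportion_noun_use(labels) for word, labels in coded_tokens.items()
}
print(noun_proportions)
```

A word used only as a noun comes out at 1 and a word used only as a verb at 0, as in the definition. Returning `None` for a word with no noun or verb tokens is a design choice of this sketch, so that such words can be dropped before any correlation is computed.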
For each word type, this calculation was performed on maternal tokens to obtain maternal proportion of noun use and on child tokens to obtain child proportion of noun use. A correlation analysis on these values within a dyad will reveal the extent to which maternal use of a word in a given category predicts child use in a category.

Results

The results of the analysis comparing children’s cross-category use of particular lexical items to that of their mothers are presented in Figure 4-4. For all children, use of a particular word was well-predicted by their mother’s use of that word (all R>.94, all p<.001, two-tailed). However, it is possible that because most potentially ambicategorical words were not used across category by either the mother or the child, these words are driving the correlation. That children’s very early utterances do not include spontaneous (as opposed to attested) cross-category use is unsurprising. Overgeneralizations and creative word use often do not appear until the third or fourth year of life (Clark, 1982; Tomasello, 2000). Therefore, an important test of the extent to which maternal use of a word across category boundaries predicts the child’s use of the word is whether these correlations remain strong when only those words that are used as both noun and verb are included in the analysis. To this end, words that are used in only one category by both the mother and the child were removed from the data set and the correlations were recalculated. For five of the six children, this had only a small effect on the correlations (all R>.91, all p<.001, two-tailed). However, in the William corpus, the correlation decreased notably, although it remained highly significant (R=.73, p<.001, two-tailed).
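The two-step analysis just described (a correlation over all word types, then a recalculation after removing words that both speakers restrict to one category) can be sketched as follows. This is a minimal illustration under stated assumptions: the word list and proportions are invented, the helper names are hypothetical, Pearson's r is assumed as the correlation measure, and the significance test is not shown.

```python
import math

# Illustrative sketch: the data and all names are hypothetical.


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def is_single_category(p):
    """True when a word was used only as a noun (1) or only as a verb (0)."""
    return p in (0.0, 1.0)


# Hypothetical (mother, child) proportions of noun use per word type.
dyad = {
    "ball": (1.0, 1.0),
    "go": (0.0, 0.0),
    "drink": (0.30, 0.25),
    "walk": (0.20, 0.0),
    "kiss": (0.60, 0.70),
    "bite": (0.45, 0.40),
}

all_pairs = list(dyad.values())
# Drop only the words that BOTH speakers restrict to a single category.
cross_pairs = [
    (m, c) for m, c in all_pairs
    if not (is_single_category(m) and is_single_category(c))
]

r_all = pearson_r(*zip(*all_pairs))
r_cross = pearson_r(*zip(*cross_pairs))
print(f"all words: r = {r_all:.2f}; cross-category words only: r = {r_cross:.2f}")
```

In this sketch a word such as "walk", which the hypothetical mother uses in both categories but the child uses only as a verb, stays in the filtered set, because only words restricted to one category by both speakers are removed.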
While it is difficult to tell exactly why William’s cross-category word use was less correlated with his mother’s than that of the other children, it is important to note that, of all the children analyzed, William was the only one with siblings much older than himself. It is possible that the presence of more interlocutors resulted in a more variable linguistic environment for William, which would explain the lower correlation of his word use with that of his mother. A second possible driving force for these results is that some words are simply more likely to be used in more than one category. In this case, the high correlation of maternal and child noun use of a given word is not caused by children learning to use words in this way from their mothers, but rather by some semantic or cultural pressure that makes these words better candidates for cross-category use. To determine whether this is the case, the proportional noun use for each word type for each child was compared to the proportional noun use of each word type used by a different mother. That is, correlations between maternal and child noun use of the same word types were calculated again, but each child was paired with a new mother. Because there are gender-specific cross-category uses of some words (e.g., dress), children were paired with a mother whose child was of the same sex. The results of these correlations as well as the correlations of each child with his/her own mother are presented in Table 4-5. For all children, the correlation of noun use of a word with maternal noun use of that word decreased when the child was not paired with his/her own mother. However, all of these correlations remain significant (p<.001, two-tailed). These results indicate that children’s cross-category use of words that are noun/verb ambiguous is driven at least partially by the way in which their mothers use these words.
Although a child’s cross-category use of words also correlated with that of a different mother, these correlations were not as strong, indicating some role of environment in determining how children use particular words. The fact that children not only use words as both noun and verb, but that they do so in a way that mirrors their mothers’ usage, suggests that they can distinguish between verb uses and noun uses of the same words. These findings cannot be used to address the issue of how children make this distinction, but it is not clear how learners would so precisely mirror the statistics of their environment were they not somehow sensitive to the difference between a noun use and a verb use of the same word.

4.2.2 Study 2b: The verb/adjective ambiguity

Children’s proportional noun use of noun/verb ambiguous words correlates strongly with that of their mothers. This result may be due in part to children’s ability to distinguish noun and verb uses of the same words on the basis of prosody. If that is the case, children should show a weaker correlation with their mothers’ use of verb/adjective ambiguous words. Experimental work with 13-month-old infants shows that they are unable to distinguish verb and adjective uses of the same words using prosody alone, in part because reliable prosodic cues to this distinction do not exist. Other cues may help children distinguish verb and adjective uses of the same words and allow them to replicate the statistics of their mothers’ usage to some extent. Still, if prosody plays a significant role in allowing children to mirror their mothers’ usage of words that are noun/verb ambiguous, the absence of prosodic cues to the verb/adjective distinction may reduce the correlation of child and maternal use. This study asks whether children’s proportional verb use of verb/adjective ambiguous words is correlated with that of their mothers.
Method

This study uses data collected in Study 1b, as well as that from the study of verb/adjective ambiguity in maternal speech described in Chapter 2 (Study 1b in that chapter). Within a mother-child dyad, all tokens of each word type in the high and middle frequency ranges for the mother were extracted from the child’s speech and coded as adjective, verb or other, as described above. Likewise, all tokens of each word type in the high and middle frequency ranges for the child were extracted from the mother’s speech and coded as described above. This allowed for calculation of the proportion of verb uses of each of these word types for each speaker. Proportion of verb use for a given word type was calculated as the number of verb uses of that type divided by the total number of adjective and verb uses of that word type. If a word was only used as a verb, it would have a proportional verb use of 1. Words used only as adjectives would have a proportional verb use of 0. For each word type, this calculation was performed on maternal tokens to obtain maternal proportion of verb use and on child tokens to obtain child proportion of verb use. A correlation analysis on these values within a dyad will reveal the extent to which maternal use of a word in a given category predicts child use in a category.

Results

The results of the analysis comparing children’s cross-category use of particular lexical items to that of their mothers are presented in Figure 4-5. Because words that are used only in a single category by both mother and child comprise a majority of the word types in this analysis, these words could drive a correlation. To this end, words that are used in only one category by both the mother and the child were removed from the data set and the correlations were recalculated. These correlations are reported in Table 4-5.
Each child’s proportional verb use of words was correlated with his/her mother’s proportional verb use, although to a lesser extent than found in Study 2a for words that are noun/verb ambiguous. For all children except Nina, this correlation was significant (p<.05, two-tailed). These results suggest that children are able to distinguish verb and adjective uses of the same words in natural language learning. The correlation between a child’s usage patterns and his or her mother’s usage pattern indicates that the child is able to detect the frequency with which these words are used in each category. Because the perceptual studies in Chapter 3 did not indicate that infants were able to detect prosodic differences between verb and adjective uses of the same words, this pattern of results cannot be attributed to prosodic information alone. Furthermore, these results suggest that the high correlations found for child and maternal use of words that are noun/verb ambiguous may be due in part to prosody, but that other cues may also play a critical role. The existence of stronger mother/child correlations in the use of words that are noun/verb ambiguous suggests that prosodic information is at least helpful to learners in detecting cross-category usage.

4.2.3 Study 2c: The noun/adjective ambiguity

Children’s usage patterns of words that are noun/verb ambiguous are highly correlated with their mothers’ usage patterns of those words. Likewise, children show correlations with their mothers’ use of verb/adjective ambiguous words, although these correlations are weaker than those found for noun/verb ambiguous words. These differences could be attributed in part to the differences in the availability and accessibility of prosodic cues to category in these two types of ambiguity. Prosodic cues to the noun/verb ambiguity exist and are perceived by infants, while prosodic cues to the verb/adjective ambiguity are less available and are not perceived by infants.
An interesting dissociation was found for the noun/adjective ambiguity in Chapter 3. In the case of the noun/adjective ambiguity, prosodic cues to grammatical category are available in the input, but are not perceived by infants in a habituation study. This finding raises two possibilities. Either infants simply cannot perceive the prosodic differences between noun and adjective uses of the same words, or they were unable to do so in the experiment for unknown task-specific reasons. (See Chapter 3 for further discussion.) If the first possibility is the cause of infants’ performance in the habituation task, correlations with maternal use of noun/adjective ambiguous words should behave much like correlations of words that are verb/adjective ambiguous. Alternatively, if infants are sensitive to these cues, but were unable to use them in the habituation study, children’s usage of noun/adjective ambiguous words should correlate with their mothers’ usage of these words in the same way that their usage of noun/verb ambiguous words correlates. To assess which of these possible accounts is more likely, this study asks whether children’s use of noun/adjective ambiguous words correlates with their mothers’ usage of such words.

Method

This study uses data collected in Study 1c, as well as that from the study of noun/adjective ambiguity in maternal speech described in Chapter 2 (Study 1c in that chapter). Within a mother-child dyad, all tokens of each word type in the high and middle frequency ranges for the mother were extracted from the child’s speech and coded as adjective, noun or other, as described above. Likewise, all tokens of each word type in the high and middle frequency ranges for the child were extracted from the mother’s speech and coded as described above. This allowed for calculation of the proportion of noun uses of each of these word types for each speaker.
Proportion of noun use for a given word type was calculated as the number of noun uses of that type divided by the total number of adjective and noun uses of that word type. If a word was only used as a noun, it would have a proportional noun use of 1. Words used only as adjectives would have a proportional noun use of 0. For each word type, this calculation was performed on maternal tokens to obtain maternal proportion of noun use and on child tokens to obtain child proportion of noun use. A correlation analysis on these values within a dyad will reveal the extent to which maternal use of a word in a given category predicts child use in a category.

Results

The results of the analysis comparing children’s cross-category use of particular lexical items to that of their mothers are presented in Figure 4-5. Because words that are used only in a single category by both mother and child comprise a majority of the word types in this analysis, these words could drive a correlation. To this end, words that are used in only one category by both the mother and the child were removed from the data set and the correlations were recalculated. These correlations are reported in Table 4-5. Each child’s proportional noun use of noun/adjective ambiguous words was significantly correlated with his/her mother’s proportional noun use (p<.05, two-tailed). For most children, usage patterns of noun/adjective ambiguous words correlated as well or slightly less strongly with maternal usage than for noun/verb ambiguous words, but more strongly than for verb/adjective ambiguous words. Violet and William are the exceptions to this pattern, although the very small number of word types produced by Violet and William (as compared to the other children) suggests that their language may be developing at a different pace.
These results suggest that learners are able to differentiate noun and adjective uses of the same words sufficiently well that their usage patterns mirror the statistics of their environments. How such a pattern would manifest if children were unable to make this distinction is unclear. Still, the weaker correlation for the noun/adjective ambiguity, combined with infants’ failure to detect prosodic cues to lexical category in words that are noun/adjective ambiguous, suggests that children do not use prosodic information about category membership in the same way for these words.

4.2.4 Discussion

Children’s usage of ambicategorical words is well-correlated with their mothers’ usage patterns of those same words. The data from the three substudies presented in this section indicate that language learners are highly sensitive to the statistics of their language environments, even in cases of ambicategoricality. Replication of statistical patterns by children is not unheard of in the language development literature; children’s productions are often closely related to the language that they hear (Demuth, Machobane & Moloi, 2003; Lieven, Pine & Baldwin, 1997; Tomasello, 1992). However, close adherence to language statistics in cases of irregularity or inconsistency runs counter to much previous work on creolization and learning from non-native speakers (Goldin-Meadow & Mylander, 1984; Singleton & Newport, 2004; Hudson Kam & Newport, 2005). The results from these studies suggest that, for one reason or another, learners do not treat ambicategoricality as a case of variability or irregularity. One way to reconcile these findings with previous work on the regularization of irregular input is to consider the possibility that language learners are treating ambicategorical words not as one word used two ways, but as two distinct forms. In this way, they may be able to avoid the problem of irregularity presented by ambicategorical words.
Such words, if they have two distinct representations, are not used irregularly at all, but rather each form has a consistent usage. The data from these studies alone cannot address the issue of how children form these distinct representations, but taken in conjunction with the studies reported in Chapters 2 and 3, prosody may be one way in which the two forms are distinguished. The prosodic story is particularly strong for words that are noun/verb ambiguous, as consistent prosodic cues to category not only exist in such words, but are detected by infants. However, prosody cannot explain the correlations between maternal and child usage of verb/adjective or adjective/noun ambiguous words. Still, the higher correlations in the case of the noun/verb ambiguity suggest that prosody is helpful for learning about ambicategoricality and for segregating cross-category uses of words. Other cues, including referential context, may also play a role in this learning process and may help to account for the correlations seen for noun/adjective and verb/adjective ambiguous words.

4.3 General Discussion

The two studies presented here address the issue of how children use words that are ambicategorical to adults. Unlike previous research, this work includes not only derived forms, but cases of accidental ambicategoricality. It also considers words that are verb/adjective ambiguous and noun/adjective ambiguous, in addition to the more frequently studied noun/verb ambiguous words. Beyond describing how many words children use in more than one category, this work also directly compares children’s use of particular word types to that of their mothers, thereby addressing the question of what gives rise to ambicategoricality in children’s speech. The findings of Study 1 indicate that children do use words in more than one lexical category.
Although previous work has indicated that this is the case for the noun/verb ambiguity (Macnamara, 1982; Barner, 2001; Oshima-Tanake, et al., 2001), this study comprises a more longitudinal sample of child speech and also considers the verb/adjective and noun/adjective ambiguities. Rates of cross-category usage of noun/verb ambiguous words by children are similar to those reported in previous studies (Barner, 2001) with slightly lower rates of cross-category usage of verb/adjective ambiguous words and slightly higher rates of cross-category usage of noun/adjective ambiguous words. These findings are a bit surprising given children’s well-documented tendency to restrict a single form to a single function and to avoid homonymy (Slobin, 1973; Macnamara, 1982; Markman & Wachtel, 1988; Clark, 1988; Golinkoff, Mervis & Hirsh-Pasek, 1994). However, Casenhiser (2005) showed that children will accept the same label for more than one referent if a grammatical category change is indicated. Taken with the results of the research presented here, as well as with previous corpus work (Barner, 2001; Oshima-Tanake, 2001; Nelson, 1995), these findings suggest that uses across lexical categories are an exception to the principle of one form/one function. Why children make this exception is not clear. One possibility is that words are represented distinctly in each grammatical category. On the basis of the findings in Chapters 2 and 3 of this dissertation, one might conclude that at least some of the distinction between the noun form of a word and the verb form of a word is prosodic in nature. However, not all of the ambiguities described here are well-supported by prosodic differences (see Chapter 3). Other cues, such as referential context, may also play a role. Indeed, multiple cues often aid learning, especially of grammar (Moeser & Bregman, 1972; Morgan & Newport, 1981; Gerken, Wilson & Lewis, 2005).
The second study presented in this chapter shows that language learners replicate their mothers’ usage of ambicategorical words. The way in which a mother uses a given ambicategorical word strongly predicts the way in which her child will use that word. The very high correlations between mother and child usage of noun/verb and noun/adjective ambiguous words, in particular, suggest that children’s use of words across category is not haphazard or spontaneous, but rather learned from the environment. Although children are known to use statistical processes in language learning (Saffran, Aslin & Newport, 1996; Demuth, Machobane & Moloi, 2003; Lieven, Pine & Baldwin, 1997; Tomasello, 1992), abundant evidence suggests that children should regularize input that is irregular (Goldin-Meadow & Mylander, 1984; Singleton & Newport, 2004; Hudson Kam & Newport, 2005). A word that is sometimes a noun and sometimes a verb is certainly irregular in terms of its syntactic behavior, yet children not only fail to regularize their usage of such words, they actually mirror the irregularity of the input language. Again, this discrepancy between previous work and the findings presented here may be resolved if learners’ representations of ambicategorical words are distinct in some way. Although this work cannot rule conclusively on what might differentiate uses in one category from uses in another, the correlations between maternal and child usage are stronger for words that are noun/verb ambiguous and words that are noun/adjective ambiguous than for those that are verb/adjective ambiguous. Verb/adjective ambiguous words were the only ones to contain no reliable prosodic cues to lexical category in Chapter 3.
Although infants in the studies presented in Chapter 3 were not sensitive to the prosodic cues differentiating noun and adjective uses of the same words, prosodic cues to this distinction do exist and may be used by infants in a natural language learning context, if not in laboratory studies.

The work presented in this chapter represents the first large-scale, longitudinal examination of children’s use of ambicategorical words. Not only do children use words in more than one category, they do so in a way that mirrors their mothers’ usage patterns. For these results to manifest, children must distinguish cross-category uses of the same words in natural language learning. Evidence presented in Chapter 3 indicates that one cue that may play a role in making these distinctions is prosody. Still, prosody alone cannot account for all of these results, which suggests that other cues may also be at work. Converging cues have been found to support language learning in other contexts (e.g., Moeser & Bregman, 1972; Morgan & Newport, 1981; Gerken et al., 2005; Monaghan, Chater & Christiansen, 2005). How prosody interacts with other cues, such as referential context or morphology, is a question for future research.
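The mother-child comparison summarized above can be illustrated with a minimal sketch. It assumes that, for each ambiguous word type, usage is reduced to the proportion of tokens used as a noun, and that the mother's and child's proportions are then compared with Pearson's r over the word types both speakers use. The word counts below are invented for illustration and are not taken from the corpora; the function names are likewise hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def noun_proportions(counts):
    """Map each word type to the proportion of its tokens used as a noun."""
    return {
        word: nouns / (nouns + verbs)
        for word, (nouns, verbs) in counts.items()
        if nouns + verbs > 0
    }


# Hypothetical (noun tokens, verb tokens) counts, invented for illustration.
mother_counts = {"kiss": (12, 28), "drink": (20, 20), "walk": (5, 45), "brush": (30, 10)}
child_counts = {"kiss": (3, 9), "drink": (6, 4), "walk": (1, 19), "brush": (8, 2)}

mother = noun_proportions(mother_counts)
child = noun_proportions(child_counts)

# Compare only the word types that both speakers actually use.
shared = sorted(set(mother) & set(child))
r = pearson_r([mother[w] for w in shared], [child[w] for w in shared])
print(f"Pearson r over {len(shared)} shared word types: {r:.2f}")
```

Pairing a child's proportions with those of a different child's mother, as in the second comparison reported in Table 4-5, would only require swapping in the other mother's counts.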
Table 4-1

Child     Sex   Age Range (years;months)   # of Files
Alex      M     1;5-3;5                    52
Ethan     M     0;11-2;11                  50
Lily      F     1;1-4;0                    80
Nina      F     1;11-3;3                   52
Violet    F     1;2-3;11                   52
William   M     1;4-3;4                    44

Table 4-2

           # Noun or Verb Types            # Potentially Ambicategorical   # Used Across Categories
Child       High  Middle     Low   Total    High  Middle     Low   Total    High  Middle     Low   Total
Alex          22      32     374     428       7      14      96     117       4       5      11      20
Ethan         17      38     646     701       4      15     170     189       3       4      24      31
Lily          30      58     718     806       8      22     178     208       3       3      19      25
Nina          43      73     563     679      16      31     147     194       3       6      19      28
Violet        13      11     460     484       5       6     131     142       0       0      17      17
William       15      21     394     430       5       9     108     122       0       6      12      18

Table 4-3

           # Verb or Adjective Types       # Potentially Ambicategorical   # Used Across Categories
Child       High  Middle     Low   Total    High  Middle     Low   Total    High  Middle     Low   Total
Alex          17      22     197     236       2       1       9      12       0       0       2       2
Ethan          8      28     360     396       3       1      28      32       1       0       3       4
Lily          23      41     392     456       1       4      22      27       0       1       3       4
Nina          32      50     301     383       4       3      24      31       1       0       2       3
Violet         8      12     264     284       1       2      18      21       0       1       4       5
William       10      15     225     250       1       1      15      17       0       1       2       3

Table 4-4

           # Noun or Adjective Types       # Potentially Ambicategorical   # Used Across Categories
Child       High  Middle     Low   Total    High  Middle     Low   Total    High  Middle     Low   Total
Alex          22      30     366     418       6       3      12      21       6       2       3      11
Ethan         13      33     584     630       3       5      30      38       2       3       4       9
Lily          27      55     666     748       3       4      28      35       3       1       4       8
Nina          35      70     509     614       4       7      20      31       3       6       6      15
Violet        13      11     420     444       0       4      19      23       0       2       2       4
William       13      23     369     405       0       3      17      20       0       0       2       2

Table 4-5

          Noun/Verb       Noun/Verb        Verb/Adjective   Noun/Adjective
          Correlation     Correlation      Correlation      Correlation
Child     (Own Mother)    (Other Mother)   (Own Mother)     (Own Mother)
Alex      .91             .67              .77              .84
Ethan     .92             .73              .70              .85
Lily      .95             .87              .90              .81
Nina      .91             .76              .49              .91
Violet    .96             .93              .90              .59
William   .73             .71              .98              .87

Figure 4-1
[Chart not recoverable from the extracted text. Y-axis: proportion of potentially ambiguous words used in both categories (0 to 1). X-axis: Alex, Ethan, Lily, Nina, Violet, William. Series: High, Middle, Low, Total, Mother Total.]
The proportions of potentially ambiguous words that children actually use as both noun and verb are shown, broken down by frequency range (high, medium and low), as well as collapsed over all three frequency ranges (total). Overall maternal cross-category use is included for comparison (see Chapter 2).

Figure 4-2
[Chart not recoverable from the extracted text. Y-axis: proportion of potentially ambiguous words used in both categories (0 to 1). X-axis: Alex, Ethan, Lily, Nina, Violet, William. Series: High, Medium, Low, Total, Mother Total.]
The proportions of potentially ambiguous words that children use as both verb and adjective are shown, broken down by frequency range (high, medium and low), as well as collapsed over all three frequency ranges (total). Maternal total is included for comparison (see Chapter 2).

Figure 4-3
[Chart not recoverable from the extracted text. Y-axis: proportion of potentially ambiguous words used in both categories (0 to 1). X-axis: Alex, Ethan, Lily, Nina, Violet, William. Series: High, Medium, Low, Total, Mother Total.]
The proportions of potentially ambiguous words that children use as both noun and adjective are shown, broken down by frequency range (high, medium and low), as well as collapsed over all three frequency ranges (total). Maternal total is included for comparison (see Chapter 2).

Figure 4-4
[Plots not recoverable from the extracted text. Six panels, one per child: Alex, Ethan, Lily, Nina, Violet, William. X-axis: Mother Noun Proportion (0 to 1). Y-axis: Child Noun Proportion (0 to 1).]
This figure shows proportion of child use of noun/verb ambiguous word types as nouns as a function of maternal proportion of noun use. All correlations are highly significant; values of Pearson’s R are presented in Table 4-5.
Figure 4-5
[Plots not recoverable from the extracted text. Six panels, one per child: Alex, Ethan, Lily, Nina, Violet, William. X-axis: Mother Verb Proportion (0 to 1). Y-axis: Child Verb Proportion (0 to 1).]
This figure shows proportion of child use of verb/adjective ambiguous word types as verbs as a function of maternal proportion of verb use. Values of Pearson’s R are presented in Table 4-5.

Figure 4-6
[Plots not recoverable from the extracted text. Six panels, one per child: Alex, Ethan, Lily, Nina, Violet, William. X-axis: Mother Noun Proportion (0 to 1). Y-axis: Child Noun Proportion (0 to 1).]
This figure shows proportion of child use of noun/adjective ambiguous word types as nouns as a function of maternal proportion of noun use. Values of Pearson’s R are presented in Table 4-5.

CHAPTER 5

This dissertation has examined the nature of ambicategoricality in language development. Ambicategoricality, the ability of a word to appear in more than one grammatical category, poses a potentially major problem to any theory of lexical category development, one that no theory adequately addresses. With an eye toward how grammatical category learning might go forward in spite of category ambiguity, this dissertation has asked three questions regarding the nature of ambicategoricality in language development. First, what is the nature of grammatical category ambiguity in speech to children?
Do children encounter grammatical ambiguity and, if they do, are there cues available in the speech that they hear that might help them to resolve the ambiguity? Second, do learners possess perceptual abilities that might alleviate the ambicategoricality problem by allowing them to distinguish cross-category uses of the same words? Finally, how do children use words that are ambiguous with regard to grammatical category? Do they restrict such words to a single grammatical category or do they use words ambicategorically? Is their use of ambicategorical words related to that of their caregivers or do they use words across category in a spontaneous or haphazard manner? By answering these questions, I hope to provide a partial resolution to the ambicategoricality problem in theories of grammatical category learning.

Chapter 2 of this dissertation focused on the nature of ambicategoricality in speech to children. Examining longitudinal samples of maternal speech to six young children, these studies found that mothers do use words in more than one grammatical category when speaking to their toddlers. Roughly 25% of words that can be used as both noun and verb are actually used in both categories by mothers. A smaller percentage of words (~20%) that can be used as both verb and adjective are used across category boundaries, while a larger percentage (~35%) of those words that are ambiguous between noun and adjective appear in both categories in child-directed speech. Almost none of the words that are used ambicategorically appear equally often in both categories. Because a comparable corpus of adult-directed speech is not available, it is not possible to determine whether mothers use fewer words ambicategorically when speaking to their children than they do when speaking to adults. However, the number of words that are used in more than one category is not insignificant and learners do encounter some ambicategoricality in the language that they hear.
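The percentages reported here are proportions of types: the number of potentially ambicategorical word types attested in both categories, divided by the number of potentially ambicategorical types in the sample. The maternal counts behind these figures are not reproduced in this section, so the minimal sketch below applies the same arithmetic to the children's noun/verb totals from Table 4-2. The function and variable names are illustrative assumptions, not part of the original analysis.

```python
# Totals from Table 4-2 (noun/verb ambiguity), per child:
# (potentially ambicategorical types, types actually used across categories)
table_4_2_totals = {
    "Alex": (117, 20),
    "Ethan": (189, 31),
    "Lily": (208, 25),
    "Nina": (194, 28),
    "Violet": (142, 17),
    "William": (122, 18),
}


def cross_category_rate(potential, used_across):
    """Proportion of potentially ambicategorical types used in both categories."""
    if potential <= 0:
        raise ValueError("need at least one potentially ambicategorical type")
    if not 0 <= used_across <= potential:
        raise ValueError("used_across must lie between 0 and potential")
    return used_across / potential


rates = {
    child: cross_category_rate(potential, used)
    for child, (potential, used) in table_4_2_totals.items()
}

for child, rate in rates.items():
    print(f"{child}: {rate:.1%} of potentially noun/verb ambiguous types used in both categories")
```

Substituting the totals from Table 4-3 or Table 4-4 gives the corresponding verb/adjective and noun/adjective rates.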
A further examination of words that mothers use as both noun and verb showed that mothers differentiate prosodically between noun and verb uses of the same words. In general, noun uses are longer than verb uses and, when length cues are considered in conjunction with pitch information, prosodic information can be used to correctly categorize 70-80% of tokens of ambicategorical words. Because chance categorization is 50%, this level of accuracy suggests that prosody may be a very useful cue for categorizing individual tokens of noun/verb ambiguous words. Unfortunately, the available corpora did not contain sufficient tokens of verb/adjective or adjective/noun ambiguous words to assess the efficacy of prosodic information for disambiguating uses of those words. Still, some prosodic information regarding lexical category is available in the speech that children hear.

Chapter 3 asked whether young language learners are sensitive to the prosodic information that might distinguish between cross-category uses of the same words. Infants were habituated to isolated tokens of ambicategorical words, all from the same category, and tested on novel exemplars of the same word types from both the habituated category (same trials) and the other category in which the word could be used (switch trials). When the words were ambiguous between noun and verb, infants were able to detect the differences between the two sets on the basis of prosodic information alone. However, infants did not show this ability with words that were verb/adjective or noun/adjective ambiguous. Because the corpora analyzed in Chapter 2 did not contain enough tokens to evaluate the reliability of prosodic cues to the verb/adjective and noun/adjective ambiguities, it is not possible to say whether infants failed in these tasks because they lacked the perceptual sensitivity to succeed or because there was not enough prosodic information to disambiguate the two sets.
Although the noun and adjective tokens used in these studies differed significantly on a variety of prosodic measures, perhaps infants’ prior experience with these words impacted their performance in these tasks. Because infants can use prosody to differentiate noun and verb uses of the same word types, they may use this information in real-world language learning. Their inability to distinguish verb and adjective and noun and adjective tokens of the same words on the basis of prosody alone suggests that their representations of words that are ambicategorical in these ways may be qualitatively different from their representations of noun/verb ambiguous words. Specifically, language learners may be able to segregate noun and verb tokens of the same words more easily in natural language learning contexts and use that information to learn which words are ambicategorical and which are not. We might also expect fewer errors in noun/verb cross-category usage than in other types of cross-category usage, as learners are less likely to conflate noun and verb contexts if they can distinguish between noun and verb uses of the same word types.

To evaluate children’s representations of ambicategorical words, Chapter 4 asked how young language learners use words that can appear in more than one lexical category. First, it found that children use noun/verb ambiguous words in more than one category less freely than their mothers do, but that they do not restrict words to a single category. This is also true for words that are verb/adjective ambiguous, although such words are used slightly less often in both categories by children (and by their mothers). Noun/adjective ambiguous words are used quite freely across categories by children, with as many as half of the potentially noun/adjective ambiguous words appearing in both categories for some children.
Children’s overall rate of cross-category usage of noun/adjective ambiguous words is somewhat higher than that of their mothers. Although these proportions indicate that language learners will use words in more than one category, they do not address how or whether children’s use of particular word types across categories is related to that of their caregivers. To this end, proportional use of specific word types in each category was directly compared between mother and child. Nearly all correlations were significant. When each child’s cross-category usage of specific words was compared with that of another child’s mother, the correlations decreased but remained significant. These findings suggest that children learn a considerable amount about cross-category usage from their caregivers, but that there may be other influences on cross-category usage as well, including the semantic properties of a word (Barner, 2001; Oshima-Tanake, Barner, Elsabbagh & Guerriero, 2001). Because close correlations in terms of frequency of use in each category are unlikely to manifest if children are unable to distinguish between uses in one category and uses in another, these results indicate that learners are somehow able to tell a noun use of a word from a verb use of a word (and a verb use from an adjective use and an adjective use from a noun use). Given the results of the analyses in Chapter 2 and the experiments in Chapter 3, one cue that learners might use is prosody. However, because prosodic cues to category may not be available and are not accessible to learners for words that are noun/adjective and verb/adjective ambiguous, the high correlations between mother and child use of those words must be attributed to other cues, such as referential context.

The findings presented in this dissertation represent the first large-scale study of the problem of ambicategoricality in language development.
This research covers the nature of the input, the perceptual abilities of learners and the ways in which children use ambicategorical words, providing converging evidence regarding the problems that ambicategoricality does and does not present in language learning. Having summarized the findings, I now turn to the implications of these results for theories of lexical category development.

5.1 Ambicategoricality and grammatical category development

As described in Chapter 1, ambicategoricality poses a potentially serious problem to all major accounts of how grammatical categories develop. If one task facing a language learner is to sort the words of the language into part of speech categories such as noun, verb and adjective, any theory of this process must account for how words that can be both noun and verb (or both verb and adjective, etc.) are incorporated into the grammatical category system. The “bootstrapping problem” that learners must solve is this: Grammatical categories are defined in terms of their distributions and dependencies relative to other grammatical categories. Because the circularity of this definition makes the system virtually impossible to break into at a high level, most theories of lexical category development suggest that learners may use other sources of information to begin forming categories. While not strictly diagnostic, cues such as phonotactic structure, word meaning and local co-occurrence information may be correlated with particular lexical categories. Ambicategoricality poses a potential problem for each of these accounts; I will now examine the implications of my findings for each of these theories of grammatical category development.

Phonotactic structure correlates to some extent with noun and verb categories (Sereno & Jongman, 1990; Kelly, 1992). Because learners must acquire the phonological structure of words anyway, perhaps they use this information to begin forming lexical categories.
Ambicategoricality poses a fairly obvious problem for this approach. Although differences in stress patterns indicate noun or verbhood for some disyllabic ambicategorical words in English, most words that children hear (over 80% of nouns and verbs) have monosyllabic roots. Words that have the phonotactic properties of one category, but can be used in both, may provide mixed information regarding the syntactic privileges of words with those phonotactic patterns. In other words, the problem that ambicategorical words pose to phonological bootstrapping is at the stage of connecting phonotactic categories with syntactic information. The availability of prosodic information to dissociate noun and verb uses of the same phonotactic string may help learners avoid this problem (see also Shi & Moisan, 2008). The fact that these cues are not available to infants for words that are noun/adjective and verb/adjective ambiguous suggests that such words should continue to pose a problem for the phonological bootstrapping account. Specifically, the phonological bootstrapping account, taken together with these findings, would predict that children should make more errors indicating conflation of verb and adjective or noun and adjective categories than errors suggesting a conflation of noun and verb categories.

The semantic bootstrapping hypothesis suggests that learners use the meanings of words to sort them into rudimentary categories, such as “action word” and “object word” that then roughly map on to grammatical categories such as verb and noun. Proponents of this theory invoke ambicategoricality to argue against strong versions of the distributional bootstrapping hypothesis (Pinker, 1987; Nelson, 1995), but exploration of the real consequences of “action words” being used as nouns has been limited. If words for actions map to verbs, cross-category usage of such words should pose a serious problem to “linking” of semantic and syntactic categories.
The findings presented in this dissertation could be applied to the semantic bootstrapping account by suggesting that prosodically distinct representations may alleviate some of these problems. However, such an account would not cover cases of noun/adjective and verb/adjective ambiguity. Barner and colleagues (Barner, 2001; Oshima-Tanake et al., 2001) present some evidence that lexical meaning influences the likelihood of a word being used as both noun and verb by children and their caregivers, suggesting that there is also a role for semantic information in resolving ambicategoricality in language development. Perhaps prosodic cues combined with subtle semantic cues could aid language learners in distinguishing cross-category uses of the same words, especially in the case of the noun/adjective and verb/adjective ambiguities. Prosodic and semantic information have both been shown to improve learning of artificial grammars (Moeser & Bregman, 1972; Morgan & Newport, 1981; Braine, Brody, Brooks, Sudhalter, Ross, Catalano & Fisch, 1990).

The approach to grammatical category learning that may be most impacted by a solution to the ambicategoricality problem is distributional bootstrapping. The distributional approach posits that learners track local co-occurrence information and extend words to new contexts on the basis of how other words from the same category are distributed. For example, a learner who has rudimentary categories such as “words that appear after the” and “words that appear after a” might notice the overlap in these categories and extend membership in one to members of the other. In other words, the learner might assume that all words that appear after the can also appear after a once s/he notices that the categories overlap significantly (Mintz, 2003).
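The overlap-and-extend step in this example can be made concrete with a toy sketch. It is not the procedure of Mintz (2003) or of any other published model: the overlap measure (Jaccard similarity), the merge threshold of 0.5 and the four utterances are all assumptions chosen only to illustrate the idea.

```python
def context_sets(utterances, contexts):
    """Collect, for each context word, the set of words that immediately follow it."""
    following = {context: set() for context in contexts}
    for utterance in utterances:
        tokens = utterance.lower().split()
        for previous, current in zip(tokens, tokens[1:]):
            if previous in following:
                following[previous].add(current)
    return following


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_if_overlapping(a, b, threshold=0.5):
    """Return the merged category if the overlap meets the threshold, else None."""
    return a | b if jaccard(a, b) >= threshold else None


# Invented toy input; not drawn from any corpus.
utterances = [
    "the dog saw a ball",
    "a dog chased the ball",
    "the cat wants a drink",
    "a cat found the cup",
]

sets = context_sets(utterances, ["the", "a"])
overlap = jaccard(sets["the"], sets["a"])
merged = merge_if_overlapping(sets["the"], sets["a"])

print("after 'the':", sorted(sets["the"]))
print("after 'a':  ", sorted(sets["a"]))
print("merged:     ", sorted(merged) if merged is not None else None)
```

In this toy input, a word heard only after one of the two contexts is licensed in the other context once the categories are merged, which is the kind of extension the distributional account describes.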
However, if a word is ambicategorical, it will have the co-occurrence properties of both its categories, which might lead a learner to conflate categories or combine categories that should remain separate. The findings reported in this dissertation can partially resolve this issue. If children form distinct representations for cross-category uses of the same words, the problem of conflating categories is lessened. In a sense, ambicategorical words are not one word that appears in two categories, but two forms, one that occurs in each category. Some of the evidence presented here suggests that, in the case of the noun/verb ambiguity, prosodic differences between the two categories may allow learners to form distinct representations for uses in each category. The lack of prosodic cues to the verb/adjective and noun/adjective ambiguities implies that learners do not use prosody to make these kinds of distinctions. Learners may maintain distinct forms of noun/adjective or verb/adjective ambiguous words by incorporating other information, such as lexical meaning or referential context. Alternatively, words that are ambiguous between verb and adjective or noun and adjective may be more prone than noun/verb ambiguous words to the kinds of category conflation that strict adherence to the distributional bootstrapping approach would predict.

5.2 Conclusions

Language learners encounter ambicategoricality regularly. Words are used in more than one lexical category in speech to children, but this problem may be partially mitigated by prosodic differences between noun uses and verb uses of ambiguous words. Mothers reliably produce these cues in their natural speech and infants are sensitive to this kind of prosodic information. Children also produce ambicategorical words in their own speech in a way that mirrors the statistics of their language environments. These findings indicate that the ambicategoricality problem in language development may be more apparent than real.
Learners are well-equipped to distinguish cross-category uses of words and to use that distinction in their language development.

REFERENCES

Barner, D. (2001). Light verbs and the flexible use of words as noun and verb in early language learning. Unpublished Master’s thesis, McGill University.
Barner, D. & Snedeker, J. (2005). Quantity judgments and individuation: Evidence that mass nouns count. Cognition, 97, 41-66.
Bloom, L., Lightbown, P. & Hood, L. (1975). Structure and variation in child language. Monographs of the Society for Research in Child Development, 40.
Boersma, P. & Weenink, D. (2008). Praat: Doing phonetics by computer (Version 5.0.15). http://www.praat.org.
Bortfeld, H. & Morgan, J. L. (submitted). Early word recognition may be stress-full.
Bowerman, M. (1973a). Early syntactic development: A cross-linguistic study with special reference to Finnish. Cambridge, UK: Cambridge University Press.
Bowerman, M. (1973b). Structural relationships in children’s utterances: Syntactic or semantic? In Moore, T. E. (ed.), Cognitive development and the acquisition of language. New York: Academic Press.
Bowerman, M. (1974). Learning the structure of causative verbs: A study in the relationship of cognitive, semantic and syntactic development. Papers and Reports on Child Language Development, 8, 142-179.
Bowerman, M. (1982). Reorganizational processes in lexical and syntactic development. In Wanner, E. & Gleitman, L. R. (eds.), Language acquisition: The state of the art. Cambridge, UK: Cambridge University Press.
Braine, M. D. S. (1963). The ontogeny of English phrase structure: The first phase. Language, 39, 1-13.
Braine, M. D. S. (1976). Children’s first word combinations. Monographs of the Society for Research in Child Development, 41.
Braine, M. D. S. (1987). What is learned in acquiring word classes: A step toward an acquisition theory. In B. MacWhinney (Ed.), Mechanisms of language acquisition. Hillsdale, NJ: Erlbaum.
Braine, M. D. S., Brody, R. E., Brooks, P. J., Sudhalter, V., Ross, J. A., Catalano, L. & Fisch, S. M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses and correction. Journal of Memory and Language, 29, 591-610.
Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
Bushnell, E. W. & Maratsos, M. P. (1984). “Spooning” and “basketing”: Children’s dealing with accidental gaps in the lexicon. Child Development, 55, 893-902.
Casenhiser, D. M. (2005). Children’s resistance to homonymy: An experimental study of pseudohomonyms. Journal of Child Language, 32, 319-343.
Cassidy, K. W. & Kelly, M. H. (1991). Phonological information for grammatical category assignments. Journal of Memory and Language, 30, 348-369.
Cassidy, K. W. & Kelly, M. H. (2001). Children’s use of phonology to infer grammatical class in vocabulary learning. Psychonomic Bulletin and Review Journal, 8, 519-523.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Clark, E. V. (1982). The young word maker: A case study of innovation in the child’s lexicon. In Wanner, E. & Gleitman, L. R. (eds.), Language acquisition: The state of the art. New York: Cambridge University Press.
Clark, E. V. (1988). On the logic of contrast. Journal of Child Language, 15, 317-335.
Clark, E. V. & Clark, H. H. (1979). When nouns surface as verbs. Language, 55, 767-811.
Conwell, E. & Balas, B. J. (2007). Assessing the efficacy of transitional probabilities for learning syntactic categories. In D.S. McNamara & J.G. Trafton (Eds.), Proceedings of the 29th Annual Meeting of the Cognitive Science Society (pp. 893-898). Austin, TX: Cognitive Science Society.
Conwell, E. & Demuth, K. (2007). Early syntactic productivity: Evidence from dative shift. Cognition, 103, 163-179.
Cooper, W. E. & Paccia-Cooper, J. (1980). Syntax and speech. Cambridge, MA: Harvard University Press.
Cortese, M. J. & Fugett, A. (2004). Imageability ratings for 3,000 monosyllabic words. Behavior Research Methods, Instruments and Computers, 36, 384-387.
Dale, P. S. & Fenson, L. (1996). Lexical development norms for young children. Behavior Research Methods, Instruments and Computers, 28, 125-127.
Demuth, K., Culbertson, J. & Alter, J. (2006). Word-minimality, epenthesis and coda licensing in the acquisition of English. Language and Speech, 49, 137-174.
Demuth, K., Machobane, M., & Moloi, F. (2003). Rules and construction effects in learning the argument structure of verbs. Journal of Child Language, 30, 1-25.
Ferguson, C. A. (1964). Baby talk in six languages. American Anthropologist, 66, 103-114.
Fernald, A., Taeschner, T., Dunn, J., Papousek, M., Boysson-Bardies, B., & Fukui, I. (1989). A cross-language study of prosodic modifications in mothers’ and fathers’ speech to preverbal infants. Journal of Child Language, 16, 477-501.
Fisher, C. (2002). The role of abstract syntactic knowledge in language acquisition: A reply to Tomasello (2000). Cognition, 82, 259-278.
Fisher, C. & Tokura, H. (1996). Acoustic cues to grammatical structure in infant-directed speech: Cross-linguistic evidence. Child Development, 67, 3192-3218.
Francis, W. N. & Kucera, H. (1983). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
Gerken, L., Wilson, R. & Lewis, W. (2005). Infants can use distributional cues to form syntactic categories. Journal of Child Language, 32, 249-268.
Goldin-Meadow, S. & Mylander, C. (1984). Gestural communication in deaf children: The effects and noneffects of parental input on early language development. Monographs of the Society for Research in Child Development, 49(3/4), 1-151.
Golinkoff, R. M., Mervis, C. V., & Hirsh-Pasek, K. (1994). Early object labels: The case for a developmental lexical principles framework. Journal of Child Language, 21, 125-155.
Gómez, R. & Gerken, L. (1999). Artificial grammar learning by one-year-olds leads to specific and abstract knowledge. Cognition, 70, 109-135.
Gómez, R. L. & Lakusta, L. (2004). A first step in form-based category abstraction by 12-month-old infants. Developmental Science, 7, 567-580.
Gordon, P. (1985). Evaluating the semantic categories hypothesis: The case of the count/mass distinction. Cognition, 20, 209-242.
Gorkin, J. R. (2008). Word recognition and lexical stress: When the emPHAsis is on the wrong sylLAble. Unpublished undergraduate honors thesis, Brown University.
Harris, Z. S. (1954). Distributional structure. Word, 10, 140-162.
Höhle, B., Weissenborn, J., Keifer, D., Schulz, A. & Schmitz, M. (2004). Functional elements in infants’ speech processing: The role of determiners in syntactic categorization of lexical elements. Infancy, 5, 341-353.
Houston, D. M. & Jusczyk, P. W. (2000). The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance, 26, 1579-1582.
Hudson Kam, C. & Newport, E. (2005). Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1, 151-195.
Jusczyk, P. W., Houston, D. M., & Newsome, M. (1999). The beginnings of word segmentation in English-learning infants. Cognitive Psychology, 39, 159-207.
Kelly, M. H. & Bock, J. K. (1988). Stress in time. Journal of Experimental Psychology: Human Perception and Performance, 14, 389-403.
Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review, 99, 349-364.
Kuczaj, S. (1978). Why do children fail to overgeneralize the progressive inflection? Journal of Child Language, 5, 167-171.
Kuhl, P. K., Williams, K. A., Lacerda, F., Stevens, K. N. & Lindblom, B. (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255, 606-608.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago: University of Chicago Press.
Lieven, E. V. M., Pine, J. M. & Baldwin, G. (1997). Lexically-based learning and early grammatical development. Journal of Child Language, 24, 187-219.
Macnamara, J. (1982). Names for things: A study of human learning. Cambridge, MA: MIT Press.
MacWhinney, B. J. (2000). The CHILDES project: Tools for analyzing talk. 3rd edition. Mahwah, NJ: Erlbaum.
Maratsos, M. P. & Chalkley, M. A. (1980). The internal language of children’s syntax: The ontogenesis and representation of syntactic categories. In Nelson, K. (ed.), Children’s language, Vol. 2. New York: Gardner Press.
Marcus, G. F., Pinker, S., Ullman, M., Hollander, M., Rosen, T. J. & Xu, F. (1992). Overregularization in language acquisition. Monographs of the Society for Research in Child Development, 57.
Markman, E. & Wachtel, G. (1988). Children’s use of mutual exclusivity to constrain the meanings of words. Cognitive Psychology, 20, 121-157.
Mintz, T. H. (2002). Category induction from distributional cues in an artificial language. Memory and Cognition, 30, 678-686.
Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90, 91-117.
Mintz, T. H. (2005). Linguistic and conceptual influences on adjective acquisition in 24- and 36-month olds. Developmental Psychology, 41, 17-29.
Mintz, T. H. (2006). Finding the verbs: Distributional cues to categories available to young learners. In Hirsh-Pasek, K. & Golinkoff, R. M. (eds.), Action meets word: How children learn verbs. Oxford: Oxford University Press.
Mintz, T. H., Newport, E. L. & Bever, T. G. (2002). The distributional structure of grammatical categories in speech to young children. Cognitive Science, 26, 393-424.
Moeser, S. D. & Bregman, A. (1972). The role of reference in children’s acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 11, 759-769.
Monaghan, P., Chater, N. & Christiansen, M. H. (2005). The differential contribution of phonological and distributional cues in grammatical categorization. Cognition, 96, 143-182.
Monaghan, P., Christiansen, M. H. & Chater, N. (2007). The phonological-distributional coherence hypothesis: Cross-linguistic evidence in language acquisition. Cognitive Psychology, 55, 259-305.
Morgan, J. L. & Newport, E. L. (1981). The role of constituent structure in the induction of an artificial language. Journal of Verbal Learning and Verbal Behavior, 20, 67-85.
Morgan, J. L., Shi, R. & Allopenna, P. (1996). Perceptual bases of rudimentary grammatical categories: Toward a broader conceptualization of bootstrapping. In Morgan, J. L. & Demuth, K. (eds.), Signal to Syntax. Hillsdale, NJ: Erlbaum.
Nazzi, T., Kemler Nelson, D. G., Jusczyk, P. W., & Jusczyk, A. M. (2000). Six-month-olds’ detection of clauses in continuous speech: Effects of prosodic well-formedness. Infancy, 1, 123-147.
Nelson, K. (1995). The dual category problem in the acquisition of action words. In Tomasello, M. & Merriman, W. E. (eds.), Beyond names for things: Young children’s acquisition of verbs. Hillsdale, NJ: Erlbaum.
Nelson, K., Hampson, J., & Shaw, L. K. (1993). Nouns in early lexicons: Evidence, explanations and implications. Journal of Child Language, 20, 61-84.
Olguin, R. & Tomasello, M. (1993). Twenty-five-month-old children do not have a grammatical category of verb. Cognitive Development, 8, 245-272.
Oshima-Tanake, Y., Barner, D., Elsabbagh, M. & Guerriero, A. M. S. (2001). Learning of deverbal nouns. In Almgren, M., Barreña, A., Ezeizabarrena, M-J., Idiazabal, I. & MacWhinney, B. (Eds.), Research in language acquisition: Proceedings of the 8th Congress of the International Association for the Study of Child Language. Somerville, MA: Cascadilla Press.
Pinker, S. (1984). Language learnability and language development. Cambridge, MA: Harvard University Press.
Pinker, S. (1987). The bootstrapping problem in language acquisition. In MacWhinney, B. (ed.), Mechanisms of language acquisition. Hillsdale, NJ: Erlbaum.
Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.
Redington, M., Chater, N. & Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22, 425-469.
Saffran, J. R., Aslin, R. N. & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.
Schafer, A., Carter, J., Clifton, C. & Frazier, L. (1996). Focus in relative clause construal. Language and Cognitive Processes, 11, 135-163.
Sereno, J. A. & Jongman, A. (1990). Phonological and form class relations in the lexicon. Journal of Psycholinguistic Research, 19, 387-404.
Shi, R. & Moisan, A. (2008). Prosodic Cues to Noun and Verb Categories in Infant-Directed Speech. In Chan, H., Jacob, H. & Kapia, E. (Eds.), Proceedings of the 32nd Annual Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
Shi, R. (1995). Perceptual correlates of content words and function words in early language input. Unpublished PhD. dissertation, Brown University.
Shi, R., Morgan, J. L. & Allopenna, P. (1998). Phonological and acoustic bases for earliest grammatical category assignment: A cross-linguistic perspective. Journal of Child Language, 25, 169-201.
Shi, R., Werker, J. F. & Morgan, J. L. (1999). Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition, 72, B11-B21.
Singh, L., Morgan, J. L. & White, K. S. (2004). Preference and processing: The role of speech affect in early spoken word recognition. Journal of Memory and Language, 51, 173-189.
Singleton, J. L. & Newport, E. L. (2004). When learners surpass their models: The acquisition of American Sign Language from inconsistent input. Cognitive Psychology, 49, 370-407.
Slobin, D. I. (1973). Cognitive prerequisites for the development of grammar. In Ferguson, C. A. & Slobin, D. I. (eds.), Studies in child language development. New York: Holt, Rinehart, Winston.
Smith, K. H. (1969). Learning co-occurrence restrictions: Rule learning or rote learning. Journal of Verbal Learning and Verbal Behavior, 8, 319-321.
Snedeker, J. & Trueswell, J. (2003). Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language, 48, 103-130.
Soderstrom, M., Seidl, A., Kemler Nelson, D. G. & Jusczyk, P. W. (2003). The prosodic bootstrapping of phrases: Evidence from prelinguistic infants. Journal of Memory and Language, 49, 249-267.
Song, J. Y. (2005). The phonetic properties of vowels in child-directed read and spontaneous speech. Poster presented at the 30th Annual Boston University Conference on Language Development, Boston, MA.
Sorenson, J. M., Cooper, W. E. & Paccia, J. M. (1978). Speech timing of grammatical categories. Cognition, 6, 135-153.
Suppes, P. (1974). The semantics of children’s language. American Psychologist, 29, 103-114.
Swingley, D. & Aslin, R. N. (2000). Spoken word recognition and lexical representation in very young children. Cognition, 76, 147-166.
Theakston, A. L., Lieven, E. V. M., Pine, J. M. & Rowland, C. F. (2004). Semantic generality, input frequency and the acquisition of syntax. Journal of Child Language, 31, 61-99.
Thiessen, E. D. & Saffran, J. R. (2003). When cues collide: Use of statistical and stress cues to word boundaries by 7- and 9-month-old infants. Developmental Psychology, 39, 706-716.
Tincoff, R. & Jusczyk, P. W. (1999). Some beginnings of word comprehension in six-month-olds. Psychological Science, 10, 172-175.
Tomasello, M. (1992). First verbs: A case study of early grammatical development. Cambridge, UK: Cambridge University Press.
Tomasello, M. (2000). Do young children have adult syntactic competence? Cognition, 74, 209-253.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Tomasello, M. & Akhtar, N. (1995). Two-year-olds use pragmatic cues to differentiate reference to objects and actions. Cognitive Development, 10, 201-224.
Tomasello, M. & Olguin, R. (1993). Twenty-three-month-old children have a grammatical category of noun. Cognitive Development, 8, 451-464.
Watson, D. & Gibson, E. (2004). The relationship between intonational phrasing and syntactic structure in language production. Language and Cognitive Processes, 19, 713-755.
Waxman, S. R. & Booth, A. E. (2001). Seeing pink elephants: Fourteen-month-olds’ interpretation of novel nouns and adjectives. Cognitive Psychology, 43, 217-242.
Werker, J. F. & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7, 49-63.
White, K. S. & Morgan, J. L. (2008). Sub-segmental detail in early lexical representations. Journal of Memory and Language, 59, 114-132.

APPENDIX A

The following is a list of potentially ambiguous nouns and verbs examined in at least one of the six maternal speech corpora, as described in Chapter 2. Those words in bold print are used ambiguously by at least one mother.
ache         bench        brave        case
act          bend         break        cash
acting       bending      breaking     catch
address      bent         breaks       catches
adventure    bet          bridge       catching
alarm        birdie       bridges      cause
angle        bit          bringing     cement
answer       bite         brush        center
answers      bites        brushing     chalk
arm          black        buckle       champion
attack       blame        bugs         chance
back         blanket      build        changes
backs        blind        building     changing
bag          block        bully        charge
baking       bloom        bum          chase
balance      blow         bump         chasing
balancing    blowing      burn         check
bang         blows        burst        checking
banging      board        bust         cheer
bark         boot         buy          chewing
barrel       borders      buying       chicken
bars         boring       call         chin
base         bottles      calling      circle
bases        bounce       calls        clapping
bat          bouncing     calm         class
bats         bow          camp         clean
bear         bows         can          cleaning
beat         box          care         clearing
beginning    branch       cart         climb

climbing     dance        dust         finishes
close        dances       eating       fire
closing      dancing      echo         fishes
clouds       darn         edge         fishing
coat         date         end          fit
collar       dawn         ends         fits
color        deal         escape       fix
coloring     deck         exchange     fixing
colors       delight      excuse       flag
coming       desert       exercise     flags
contest      designs      exercises    flash
control      dig          exercising   flies
cook         dining       exhibit      flower
cooking      disguise     exhibits     flush
cool         dishes       express      fly
copy         dive         face         flying
cost         diving       faces        foil
cough        dogs         fake         fold
coughing     doing        fall         following
count        dot          falling      fool
counter      double       falls        fork
cover        down         fan          frame
covering     drag         fancy        freeze
covers       draw         fashion      fuss
crack        drawing      father       gallop
cracks       dream        favor        game
crash        dreams       feather      garage
crawl        dress        feed         garden
credit       dressing     feeding      gas
cross        drill        feeling      gathering
crossing     drink        felt         gaze
crowd        drinking     field        getting
crown        drinks       fight        give
crush        drive        fighting     giving
cry          drives       fights       glow
cup          driving      figure       go
curse        drop         file         going
cushion      drove        film         grab
cut          drying       find         grant
cuts         duck         finding      grill
cutting      dump         fine         grin
dam          dumping      finish       grip

ground       hug          leads        matches
group        hum          leap         matter
growing      hunt         leaps        mean
grunt        hunting      learning     means
guess        hurry        leaves       measure
guide        hurt         left         measures
hail         index        let          meet
hammer       invite       letting      meeting
hand         iron         level        mention
handle       jam          license      mess
hanging      joke         lie          mind
happening    journey      lift         mirror
harbor       judge        lifting      miss
hatch        jump         lifts        misses
hatching     jumping      lights       missing
hate         jumps        line         mistake
haul         keeping      lines        mix
head         keeps        list         mixing
heading      key          lit          molding
heads        kick         lives        moon
hearing      kicking      living       mop
heat         kicks        load         mother
help         kid          loading      motor
helps        kidding      lock         mount
herd         kids         logging      move
hide         kiss         look         moves
hiding       kisses       looking      moving
hike         knock        looks        mustard
hiking       knocking     loose        name
hint         knot         love         names
hit          know         lug          neck
hold         label        lunch        need
holding      land         mail         needs
holds        landing      make         nest
hole         lands        make         nip
home         lap          makes        nod
honor        last         making       nose
hook         laugh        making       note
hop          laughs       mark         notes
hope         launch       mask         notice
house        lays         master       number
houses       lead         match        numbers

offer        plan         race         runs
open         plane        radio        saddle
opening      planning     rain         sail
order        plans        rains        sailing
pack         plant        raise        sample
package      play         rake         sauce
pad          playing      reach        saw
page         plays        reading      say
paint        pleasure     reason       saying
painting     plug         record       says
paints       point        register     scare
pan          points       rent         schedule
panic        polish       repair       school
paper        pool         repeat       scoop
parade       pop          rescue       scratch
park         position     rest         scream
parking      post         ride         screaming
part         pound        rides        screen
party        power        riding       screw
pass         powers       ring         scrub
pat          practice     ringing      scrubbing
pattern      present      rings        sea
paw          presents     rinse        seal
pay          press        rise         seals
peck         price        roar         search
pedal        pride        roast        searching
pen          print        rock         seat
people       prize        rocking      seed
phone        project      rocks        sell
photograph   promise      roll         selling
pick         prop         rolls        sending
picking      pull         room         sense
picks        pulls        root         service
picture      pump         rose         set
pictures     punch        round        sets
piece        punching     row          setting
pile         push         rub          shade
pilot        pushes       ruin         shadow
pin          puzzle       rule         shake
pinch        question     rules        shaking
place        questions    run          shame
places       quiet        runaway      shape

shapes       smile        stays        telescope
share        snap         step         telling
sharing      sneak        steps        test
shed         snoring      stick        thinking
sheds        snow         sticks       thump
shift        snowballs    still        thunder
shine        snows        sting        tick
ship         sock         stink        tie
shock        soil         stirring     ties
shoot        sort         stop         time
shop         sound        stops        times
shopping     sounds       store        tip
shore        space        storm        tire
shot         speaking     strap        tires
shoulder     spell        stream       toe
shout        spelling     stretch      top
shouting     spice        strike       tops
shouts       spin         stuff        tossing
show         spit         suck         total
shower       splinter     suit         touch
showing      sponge       suits        toys
shows        sports       sun          trace
shriek       spot         supplies     track
sigh         spray        surprise     trade
sight        spread       swallow      trail
sign         spring       sweep        train
signs        sprinkle     swim         trains
singing      spy          swing        traps
single       square       switch       travel
sink         squeeze      take         treat
sinking      stains       taking       treats
size         stamp        talk         tries
skiing       stampede     talking      trim
sleep        stand        talks        trip
sleeping     standing     tan          trot
slice        stands       tap          trouble
slip         start        taste        truck
slips        starting     tastes       trumpet
slug         starts       teaching     trust
smash        state        tear         try
smell        states       tease        tune
smells       stay         telephone    tunnel

turn         wink
turning      wins
turns        wish
twist        woman
twisting     wonder
type         wondering
upset        work
uses         works
vacation     worry
view         wound
visit        wrapping
wait         wreck
waiting      writing
wake         yawn
walk         yell
walks        yelling
wall         yellow
want
wants
war
warning
wash
washing
waste
watch
watches
watching
water
wave
waving
wax
wear
welcome
well
while
whisper
whispering
whistle
will
win
wind
wing

APPENDIX B

The following is a list of potentially ambiguous verbs and adjectives examined in at least one of the six maternal speech corpora, as described in Chapter 2. Those words in bold print are used ambiguously by at least one mother.
abstract     drawing      like         sound
appropriate  drop         live         sour
awake        dry          long         spare
back         even         looking      sparkling
bake         exciting     loose        square
bare         express      lower        still
base         fake         marked       stirring
beat         fancy        master       suggested
better       fell         matching     sure
black        fine         mean         surprise
blind        fit          moving       swollen
born         flush        near         tan
brave        fool         open         telling
broke        free         own          thin
bum          frustrating  pointed      tired
busy         further      present      top
calm         gentle       pushing      tops
chance       graduate     quiet        total
clean        head         ready        touching
clear        hurt         rid          trace
close        insulting    rose         trim
collar       interested   rough        trying
complete     iron         round        using
cool         key          runaway      warm
crack        knowing      scrub        welcome
crash        lay          selling      well
cross        lean         separate     wet
dam          learned      shocking     wonder
double       left         single       yellow
down         level        slow

APPENDIX C

The following is a list of potentially ambiguous nouns and adjectives examined in at least one of the six maternal speech corpora, as described in Chapter 2. Those words in bold print are used ambiguously by at least one mother.

abstract     concrete     flat         looking
adult        cool         flush        loose
ancient      cooler       fool         low
attic        crack        fool         mad
back         crash        frank        magic
bad          cream        front        main
base         cross        full         male
bass         dam          future       master
beat         dark         general      material
black        darling      giant        mean
blank        dead         gold         medium
blind        dear         good         middle
blonde       deep         graduate     miniature
blue         double       gray         minute
brave        down         green        mobile
brown        drawing      grey         moving
bum          drop         gross        national
calm         due          head         net
capital      electric     hip          north
chance       evil         holy         nuts
cherry       express      hurt         olive
chief        fair         inside       open
chill        fake         iron         opposite
choice       fancy        key          original
clean        fast         kind         outside
cleaner      favorite     large        oval
close        female       left         overhead
cold         fine         level        particular
collar       firm         lighter      past
common       fit          liquid       patient

pedestrian   round        stiff        trim
pink         runaway      still        true
plain        safe         stirring     turquoise
plastic      scrub        stranger     uniform
pleasant     secret       submarine    upstairs
positive     selling      sudden       welcome
present      side         super        well
private      silver       surprise     white
public       single       tan          whole
purple       ski          telling      wonder
quick        sound        thick        worth
quiet        special      tool         wrong
red          square       top          yellow
right        stable       tops         young
romantic     steep        total
rose         stereo       trace