Abstract
A procedure is presented that processes a corpus of text and produces, for each word, a numeric vector containing information about its meaning. The procedure is applied to a large corpus of natural language text taken from Usenet, and the resulting vectors are examined to determine what information they contain. These vectors provide the coordinates in a high-dimensional space in which word relationships can be analyzed. Analyses of both vector similarity and multidimensional scaling demonstrate that the vectors carry significant semantic information. A comparison of vector similarity with human reaction times in a single-word priming experiment is presented. These vectors provide the basis for a representational model of semantic memory, hyperspace analogue to language (HAL).
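The general idea of deriving word vectors from lexical co-occurrence can be illustrated with a minimal sketch. The abstract does not give the procedure's details, so everything specific here is an assumption made for illustration: the function names `cooccurrence_vectors` and `cosine`, the symmetric window of two words, the unweighted counts, and the use of cosine as the similarity measure. It should not be read as the authors' implementation.

```python
import math


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a vector of co-occurrence counts.

    Dimension i counts how often vocabulary word i appears within
    `window` positions of the target word, on either side.
    The window size and the unweighted counting are assumptions.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    vectors = {word: [0.0] * len(vocab) for word in vocab}
    for pos, word in enumerate(tokens):
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for other_pos in range(lo, hi):
            if other_pos != pos:
                vectors[word][index[tokens[other_pos]]] += 1.0
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for illustration only.
tokens = "the cat drinks milk the dog drinks water".split()
vectors = cooccurrence_vectors(tokens, window=2)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["water"]))
```

In this toy corpus, "cat" and "dog" occur in similar contexts, so their vectors end up closer to each other than the vectors for "cat" and "water". A realistic corpus would need a far larger vocabulary and a sparse representation.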
Additional information
This research was supported by an NSF Presidential Faculty Fellow award (SBR-9453406) to C.B.
Cite this article
Lund, K., Burgess, C. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers 28, 203–208 (1996). https://doi.org/10.3758/BF03204766
DOI: https://doi.org/10.3758/BF03204766