DCPP Notes
2. Advanced
a. Advanced: linguistically motivated, more complex
implementations
b. Phrase/name identification
i. Goal is to use phrases as text units
ii. Statistical approach: find all pairs of adjacent words
(‘bigrams’). The number of elements explodes, which makes
this infeasible. It also adds a lot of nonsense phrases
iii. NLP Approach
1. Run of words
2. Sentence parsing
3. Statistical models
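A minimal sketch of the statistical (bigram) approach above, assuming simple whitespace tokenization. The helper name `adjacent_bigrams`, the sample text, and the frequency threshold of 2 are illustrative assumptions, not part of the notes.

```python
from collections import Counter

def adjacent_bigrams(tokens):
    """Return every pair of adjacent tokens, in order of appearance."""
    return list(zip(tokens, tokens[1:]))

# Hypothetical sample text; whitespace splitting stands in for a real tokenizer.
text = "the new york times reported that new york is busy"
tokens = text.split()

# Every adjacent pair becomes a candidate phrase, so n tokens give n - 1 bigrams.
bigram_counts = Counter(adjacent_bigrams(tokens))

# One simple way to drop nonsense pairs: keep only bigrams seen at least twice.
# The threshold is an assumption chosen for this tiny example.
frequent = [pair for pair, count in bigram_counts.items() if count >= 2]
print(frequent)
```

Even this ten-word text yields nine candidate pairs, most of them not real phrases (for example "reported that"), which shows why the raw bigram list grows quickly and needs filtering.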
c. Sentence segmentation: split text into sentences at specific
punctuation marks such as the period, question mark, and exclamation mark
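A rough sketch of punctuation-based sentence segmentation, assuming a sentence ends at a period, question mark, or exclamation mark followed by whitespace. The function name `split_sentences` and the sample text are hypothetical.

```python
import re

def split_sentences(text):
    """Split text after '.', '?' or '!' when followed by whitespace."""
    # The lookbehind keeps the punctuation mark attached to its sentence.
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]

# Hypothetical sample; note the abbreviation "Dr." at the start.
sample = "Dr. Rao arrived late. Was the talk over? No! It had just begun."
sentences = split_sentences(sample)
for sentence in sentences:
    print(sentence)
```

The rule wrongly splits after the abbreviation "Dr.", so punctuation alone is not a reliable boundary signal; practical segmenters add abbreviation lists or trained models on top of it.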
d. Code-mixing and code-switching:
i. Code-switching: shifting from one linguistic code (a
language or dialect) to another, depending on the
social context or conversational setting
ii. Code-mixing: placing or mixing of various linguistic
units from two different grammatical systems within
the same sentence and speech context
iii. Code-switching is inter-sentential, while
code-mixing is intra-sentential
iv. Language in online social media: extensive use of
transliteration
v. Rise of Hinglish, Singlish, etc.
e. Word sense disambiguation
f. Lexical acquisition
g. Parts of speech
h. Synonym expansion