Language and Linguistic Diversity in the U.S.: An Introduction, Dec 1, 2014
This highly engaging textbook presents a linguistic view of the history, society, and culture of the United States. It discusses the many languages and forms of language that have been used in the US – including standard and nonstandard forms of English, creoles, Native American languages, and immigrant languages from across the globe – and shows how this distribution and diversity of languages has helped shape and define America as well as an American identity. The volume introduces the basic concepts of sociolinguistics and the politics of language through cohesive, up-to-date and accessible coverage of such key topics as dialectal development and the role of English as the majority language, controversies concerning language use in society, languages other than English used in the US, and the policies that have directly or indirectly influenced language use.
These topics are presented in such a way that students can examine the inherent diversity of the communicative systems used in the United States as both a form of cultural enrichment and as the basis for socio-political conflict. The author team outlines the different viewpoints on contemporary issues surrounding language in the US and contextualizes these issues within linguistic facts, to help students think critically and formulate logical discussions. To provide opportunities for further examination and debate, chapters are organized around key misconceptions or questions ("I don't have an accent" or "Immigrants don't want to learn English"), bringing them to the forefront for readers to address directly.
Language and Linguistic Diversity in the US is a fresh and unique take on a widely taught topic. It is ideal for students from a variety of disciplines or with no prior knowledge of the field, and a useful text for introductory courses on language in the USA, American English, language variation, language ideology, and sociolinguistics.
This presentation investigates the social and linguistic distribution of syntactic doubling phenomena (such as personal datives, double modals, and double complementizers) in the Linguistic Atlas of the Middle Rockies. It finds that sex and education level play a role in the types of doubling features used by informants and the frequency with which they use them.
Ain'thology: The Life and History of a Taboo Word, 2015
This study investigates the distribution of ain’t in the Linguistic Atlas of the Middle Rockies, a collection of interviews conducted in Colorado, Utah, and Wyoming toward a Linguistic Atlas of the Western States. Both the linguistic and social distributions of this English shibboleth show that ain’t has a limited distribution in the dataset: with respect to its linguistic distribution, the term is used by informants in just under one-third of the interviews and often co-occurs with other nonstandard variants in sentences and small chunks of discourse; in terms of its social distribution, ain’t is significantly correlated with the educational level of informants and is used more by males than females in the collection. Additionally, ain’t is frequently used in idiomatic expressions in the corpus, suggesting that its productivity is limited.
This study examines comparative constructions that appear in the Linguistic Atlas of the Middle Rockies, a linguistic survey conducted with native informants in Colorado, Utah, and Wyoming from 1988 to 2004. In particular, it analyzes responses to three prompts targeting expressions used to describe extremes in temperature and aridity in the region. As part of this investigation, linguistic patterns in the range of responses to each prompt are identified, and several social characteristics of the informants, including sex and religion, are tested for correlations with individual linguistic items as well as groups of categories extracted via the linguistic analysis. While the word hell is pervasive in these responses (e.g., the weather is hotter than hell), and thus has not become obsolete as some earlier scholars predicted, variation in its distribution among social groups and metadata suggests that the word is still considered taboo by some speakers.
This study uses the tools of corpus linguistics to investigate ascending kinship terminology in the Linguistic Atlas of the Middle Rockies, a collection of interviews gathered in Colorado, Utah, and Wyoming as part of a dialectological survey of the American West. Relying in part on the framework of Dahl and Koptjevskaja-Tamm (2001), particularly with respect to their notion of a parental kin prototype, the study examines lexical and grammatical variation in the use of terms for parents and grandparents in different interviewing contexts in an effort to identify patterns in these distributions. The study finds important quantitative differences in the distribution of mother and father, as well as differences in the grammatical behavior of these and other kinship variants. While these results provide some support for a parental kin prototype, they also suggest the benefits that survey data collected within a variationist framework offer such a prototype, both with respect to the counterexamples to broad generalizations that such datasets inevitably include and to the variable patterns that often emerge from such data but might go unobserved using formal methods.
Objective
Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers.
Design
We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo! Answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics of the two different types of professional questions (PointCare vs. OnlinePractice) may affect classification performance. We also explored information gain for feature reduction and back-off linguistic category features.
Results
10-fold cross-validation results showed the best F1-measures of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice, respectively, and the best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset.
Conclusion
Healthcare consumer questions posted in Yahoo! online communities can be reliably distinguished from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit further development in automated consumer question answering.
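To make the classification setup concrete, the sketch below trains a minimal bag-of-words Naive Bayes classifier to separate consumer-style from professional-style questions. This is an illustration only, not the models reported above: the study used richer linguistic and statistical features, and the tiny training set here is invented for the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase bag-of-words features; the actual study also used
    # statistical and back-off linguistic category features.
    return text.lower().split()

def train(examples):
    """examples: list of (question, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return word_counts, label_counts, vocab

def classify(question, model):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior plus add-one-smoothed log likelihoods.
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(question):
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data, invented for illustration only.
examples = [
    ("what is the best dose of metformin for type 2 diabetes", "professional"),
    ("contraindications for beta blockers in asthma patients", "professional"),
    ("why does my head hurt when i wake up", "consumer"),
    ("is it normal to feel tired all the time", "consumer"),
]
model = train(examples)
print(classify("recommended dose of metformin", model))  # → "professional"
```

In practice the paper evaluated its models with 10-fold cross-validation and F1-measure; a sketch like this would be assessed the same way once given real labeled questions.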
OBJECTIVE:
Clinical questions are often long and complex and take many forms. We have built a clinical question answering system named AskHERMES to perform robust semantic analysis on complex clinical questions and output question-focused extractive summaries as answers.
DESIGN:
This paper describes the system architecture and a preliminary evaluation of AskHERMES, which implements innovative approaches in question analysis, summarization, and answer presentation. Five types of resources were indexed in this system: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines and Wikipedia articles.
MEASUREMENT:
We compared the AskHERMES system with Google (Google and Google Scholar) and UpToDate and asked physicians to score the three systems by ease of use, quality of answer, time spent, and overall performance.
RESULTS:
AskHERMES allows physicians to enter a question in a natural way with minimal query formulation and to navigate efficiently among all the answer sentences to quickly meet their information needs. In contrast, physicians need to formulate queries to search for information in Google and UpToDate. The development of the AskHERMES system is still at an early stage, and its knowledge resources are limited compared with Google or UpToDate. Nevertheless, the evaluation results show that AskHERMES' performance is comparable to that of the other systems. In particular, when answering complex clinical questions, it demonstrates the potential to outperform both Google and UpToDate.
CONCLUSIONS:
AskHERMES, available at http://www.AskHERMES.org, has the potential to help physicians practice evidence-based medicine and improve the quality of patient care.
Journal of The American Medical Informatics Association, 2010
Objective: We present Lancet, a supervised machine-learning system that automatically extracts medication events consisting of medication names and information pertaining to their prescribed use (dosage, mode, frequency, duration and reason) from lists or narrative text in medical discharge summaries.
Design: The Lancet system incorporates three supervised machine-learning models: a conditional random fields (CRF) model for tagging individual medication names and associated fields, an AdaBoost model with the decision stump algorithm for determining which medication names and fields belong to a single medication event, and a support vector machines (SVM) disambiguation model for identifying the context style (narrative or list).
Measurements: We participated in the third i2b2 shared task on challenges in natural language processing for clinical data: the medication extraction challenge. Using the performance metrics provided by the i2b2 Challenge, we report micro F1 (precision/recall) scores at both the horizontal and vertical levels.
Results: Among the top ten teams, the Lancet system achieved the highest precision at 90.4% with an overall F1 score of 76.4% (horizontal system level with exact match), a gain of 11.2% and 12%, respectively, compared to the rule-based baseline system jMerki. By combining the two systems, the hybrid system further increased the F1 score by 3.4% from 76.4% to 79.0%.
Conclusions: We conclude that supervised machine-learning systems with minimal external knowledge resources can achieve high precision with a competitive overall F1 score. The lightweight learning framework makes the system more scalable and adaptable to other tasks, even in other domains. The system is available online at http://code.google.com/p/lancet/.
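As a rough illustration of the medication-event structure Lancet targets, the sketch below fills the i2b2-style fields with simple regular expressions. This is a hypothetical stand-in for exposition only: the actual system learns its tagging with a CRF model rather than hand-written patterns, and the patterns, example sentence, and function name here are invented.

```python
import re

# Toy patterns standing in for the learned tagger. The field names
# follow the i2b2 medication schema (dosage, mode, frequency, duration);
# the regexes themselves are illustrative, not from the paper.
PATTERNS = {
    "dosage":    r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml|units?)\b",
    "mode":      r"\b(?:p\.?o\.?|orally|iv|intravenous(?:ly)?|topical(?:ly)?)\b",
    "frequency": r"\b(?:once|twice|three times)\s+(?:daily|a day)\b",
    "duration":  r"\bfor\s+\d+\s+(?:days?|weeks?|months?)\b",
}

def tag_medication_event(sentence, med_name):
    """Return a dict describing one medication event found in a sentence."""
    event = {"name": med_name}
    for field, pattern in PATTERNS.items():
        m = re.search(pattern, sentence, flags=re.IGNORECASE)
        if m:
            event[field] = m.group(0)
    return event

text = "The patient was started on lisinopril 10 mg orally once daily for 30 days."
event = tag_medication_event(text, "lisinopril")
print(event)
```

In Lancet's design, a step like this is only the first stage; a second model then groups tagged names and fields into single medication events, and a third decides whether the surrounding context is a list or narrative text.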
Question answering is different from information retrieval in that it attempts to answer questions by providing summaries drawn from numerous retrieved documents rather than by simply providing a list of documents that requires users to do additional work. However, the quality of the answers that question answering provides has not been investigated extensively, and the practical approach to presenting answers still needs more study. In addition to factoid answering using phrases or entities, most question answering systems use a sentence-based approach to generating answers. Many sentences, however, are only meaningful or understandable in context, and a passage-based presentation can often provide richer, more coherent context; on the other hand, passage-based presentations may introduce additional noise that places a greater burden on users. In this study, we performed a quantitative evaluation of the two kinds of presentation produced by our online clinical question answering system, AskHERMES (http://www.AskHERMES.org). The overall finding is that, although irrelevant context can hurt the quality of an answer, the passage-based approach is generally more effective in that it provides richer context and matching across sentences.
Discussions of the Louisiana Purchase tend to focus on the delta region now bearing the state name of Louisiana or to emphasize its size by mentioning the states that were carved from the acquisition and now form an integral part of America's heartland, such as Kansas, Missouri, and Nebraska. Often ignored is the northwestern part of the purchase, which comprised large parts of present-day Colorado, Wyoming, and Montana. That all three of these states are strongly identified with their mountains also obscures the fact that ...
This article examines a-prefixing as it occurs in Mark Twain's The Adventures of Huckleberry Finn and compares it to the a-prefixing data collected by dialectologists and sociolinguists, particularly those presented by Wolfram in his work on nonstandard dialects of American English. Although the two sets of data prove to be similar in most respects, there is a difference in how a-prefixes pattern in coordinate constructions. In this article, I illustrate how this difference could occur as the result of the process of grammaticalization.
Book by Lamont Antieau
Papers by Lamont Antieau
Both healthcare professionals and healthcare consumers have information needs that can be met through the use of computers, specifically via medical question answering systems. However, the information needs of both groups are different in terms of literacy levels and technical expertise, and an effective question answering system must be able to account for these differences if it is to formulate the most relevant responses for users from each group. In this paper, we propose that a first step toward answering the queries of different users is automatically classifying questions according to whether they were asked by healthcare professionals or consumers.
Design
We obtained two sets of consumer questions (~10,000 questions in total) from Yahoo answers. The professional questions consist of two question collections: 4654 point-of-care questions (denoted as PointCare) obtained from interviews of a group of family doctors following patient visits and 5378 questions from physician practices through professional online services (denoted as OnlinePractice). With more than 20,000 questions combined, we developed supervised machine-learning models for automatic classification between consumer questions and professional questions. To evaluate the robustness of our models, we tested the model that was trained on the Consumer-PointCare dataset on the Consumer-OnlinePractice dataset. We evaluated both linguistic features and statistical features and examined how the characteristics in two different types of professional questions (PointCare vs. OnlinePractice) may affect the classification performance. We explored information gain for feature reduction and the back-off linguistic category features.
Results
10-fold cross-validation showed best F1-measures of 0.936 and 0.946 on Consumer-PointCare and Consumer-OnlinePractice, respectively, and a best F1-measure of 0.891 when testing the Consumer-PointCare model on the Consumer-OnlinePractice dataset.
Conclusion
Healthcare consumer questions posted in Yahoo! online communities can be reliably distinguished from professional questions posted by point-of-care clinicians and online physicians. The supervised machine-learning models are robust for this task. Our study will significantly benefit the further development of automated consumer question answering.
Clinical questions are often long and complex and take many forms. We have built a clinical question answering system named AskHERMES to perform robust semantic analysis on complex clinical questions and output question-focused extractive summaries as answers.
DESIGN:
This paper describes the system architecture and a preliminary evaluation of AskHERMES, which implements innovative approaches in question analysis, summarization, and answer presentation. Five types of resources were indexed in this system: MEDLINE abstracts, PubMed Central full-text articles, eMedicine documents, clinical guidelines and Wikipedia articles.
MEASUREMENT:
We compared the AskHERMES system with Google (Google and Google Scholar) and UpToDate and asked physicians to score the three systems by ease of use, quality of answer, time spent, and overall performance.
RESULTS:
AskHERMES allows physicians to enter a question in a natural way with minimal query formulation and allows physicians to efficiently navigate among all the answer sentences to quickly meet their information needs. In contrast, physicians need to formulate queries to search for information in Google and UpToDate. The development of the AskHERMES system is still at an early stage, and the knowledge resource is limited compared with Google or UpToDate. Nevertheless, the evaluation results show that AskHERMES' performance is comparable to the other systems. In particular, when answering complex clinical questions, it demonstrates the potential to outperform both Google and UpToDate systems.
CONCLUSIONS:
AskHERMES, available at http://www.AskHERMES.org, has the potential to help physicians practice evidence-based medicine and improve the quality of patient care.
Objective: To automatically extract medication names and details of their use (dosage, mode, frequency, duration and reason) from lists or narrative text in medical discharge summaries.
Design: The Lancet system incorporates three supervised machine-learning models: a conditional random fields (CRF) model for tagging individual medication names and associated fields, an AdaBoost model with decision stump algorithm for determining which medication names and fields belong to a single medication event, and a support vector machines (SVM) disambiguation model for identifying the context style (narrative or list).
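The middle stage of the pipeline, deciding which tagged medication names and fields belong to a single medication event, can be loosely illustrated with a proximity heuristic. The actual Lancet system learns this grouping with AdaBoost over extracted features; the tags and token positions below are invented for demonstration.

```python
# Each token is (position, tag); tags loosely follow the i2b2 field names.
tagged = [
    (0, "MED:lisinopril"), (1, "DOSE:10mg"), (2, "FREQ:daily"),
    (10, "MED:metformin"), (11, "DOSE:500mg"), (12, "FREQ:bid"),
]

def group_events(tokens, max_gap=3):
    # Start a new medication event whenever the gap between
    # consecutive tagged tokens exceeds max_gap positions.
    events, current = [], []
    for pos, tag in tokens:
        if current and pos - current[-1][0] > max_gap:
            events.append(current)
            current = []
        current.append((pos, tag))
    if current:
        events.append(current)
    return events

for event in group_events(tagged):
    print([tag for _, tag in event])
```

A learned grouper replaces the fixed `max_gap` threshold with features (distance, intervening punctuation, field types), which is what makes a classifier like AdaBoost preferable to a hand-set cutoff.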
Measurements: We participated in the medication extraction challenge of the third i2b2 shared task on natural language processing for clinical data. Using the performance metrics provided by the i2b2 Challenge, we report micro F1 (precision/recall) scores at both the horizontal and vertical levels.
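Micro-averaged F1 of the kind used in such scoring pools true positives, false positives, and false negatives across all documents before computing precision and recall. The sketch below shows the arithmetic; the per-document counts are invented for demonstration.

```python
def micro_f1(counts):
    # counts: list of (TP, FP, FN) tuples, one per document.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented (TP, FP, FN) counts per document.
docs = [(8, 1, 3), (5, 0, 2), (10, 2, 4)]
p, r, f = micro_f1(docs)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because counts are pooled before averaging, micro F1 weights documents by how many medication mentions they contain, unlike a macro average over per-document scores.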
Results: Among the top ten teams, the Lancet system achieved the highest precision, 90.4%, with an overall F1 score of 76.4% (horizontal system level with exact match), gains of 11.2% and 12%, respectively, over the rule-based baseline system jMerki. By combining the two systems, the hybrid system further increased the F1 score by 3.4%, from 76.4% to 79.0%.
Conclusions: We conclude that supervised machine-learning systems with minimal external knowledge resources can achieve high precision with a competitive overall F1 score. The lightweight learning framework makes the system scalable and adaptable to other tasks, even in other domains. The system is available online at http://code.google.com/p/lancet/.
Question answering differs from information retrieval in that it attempts to answer questions by providing summaries drawn from numerous retrieved documents rather than by simply providing a list of documents that requires users to do additional work. However, the quality of the answers that question answering provides has not been investigated extensively, and the practical approach to presenting answers still needs more study. In addition to factoid answering using phrases or entities, most question answering systems use a sentence-based approach for generating answers. However, many sentences are meaningful or understandable only in their context, and a passage-based presentation can often provide richer, more coherent context. On the other hand, passage-based presentations may introduce additional noise that places a greater burden on users. In this study, we performed a quantitative evaluation of the two kinds of presentation produced by our online clinical question answering system, AskHERMES (http://www.AskHERMES.org). The overall finding is that, although irrelevant context can hurt the quality of an answer, the passage-based approach is generally more effective in that it provides richer context and matching across sentences.
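The contrast between sentence-based and passage-based presentation can be loosely illustrated as follows. The overlap scoring, sample passages, and query are invented for demonstration and are not AskHERMES's actual retrieval or ranking.

```python
def score(text, query_terms):
    # Crude relevance: count of query terms appearing in the text.
    return len(set(text.lower().split()) & query_terms)

def answer(passages, query, mode="passage"):
    terms = set(query.lower().split())
    if mode == "sentence":
        # Sentence-based: return the single best-matching sentence.
        sents = [s.strip() for p in passages for s in p.split(".") if s.strip()]
        return max(sents, key=lambda s: score(s, terms))
    # Passage-based: return the whole containing passage, keeping context.
    return max(passages, key=lambda p: score(p, terms))

passages = [
    "Aspirin inhibits platelet aggregation. It is used for prevention.",
    "Warfarin requires INR monitoring. Dosing depends on the indication.",
]
q = "warfarin dosing monitoring"
print(answer(passages, q, mode="sentence"))
print(answer(passages, q, mode="passage"))
```

The sentence mode returns a compact answer that drops the neighboring sentence, while the passage mode keeps that context at the cost of extra text: exactly the trade-off the study above evaluates.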
Book Reviews by Lamont Antieau