My general research interests lie in the extraction of information from text. I have worked on question answering, named entity recognition, and summarisation.
Evidence-based medicine practice requires medical practitioners to rely on the best available evidence, in addition to their expertise, when making clinical decisions. The medical domain boasts a large amount of published medical research data, indexed in medical databases such as MEDLINE. As the size of this data grows, practitioners increasingly face the problem of information overload, and past research has established the time-associated obstacles faced by evidence-based medicine practitioners. In this paper, we focus on the problem of automatic text summarisation to help practitioners quickly find query-focused information from relevant documents. We utilise an annotated corpus that is specialised for the task of evidence-based summarisation of text. In contrast to past summarisation approaches, which mostly rely on surface-level features to identify the salient pieces of text that form the summaries, our approach focuses on the use of corpus-based statistics and domain s...
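The abstract above is truncated before it names the corpus statistics used, so the following is only a generic sketch of query-focused extractive summarisation by corpus-based word statistics, not the paper's actual method; the tokeniser, scoring weights, and example strings are all illustrative assumptions.

```python
# Sketch: rank sentences by query overlap plus a corpus-frequency term,
# then return the top-scoring sentences as the extractive summary.
# The 0.1 weight and whitespace tokenisation are arbitrary choices.
from collections import Counter

def summarise(query, sentences, top_n=1):
    corpus_freq = Counter(w for s in sentences for w in s.lower().split())
    query_words = set(query.lower().split())

    def score(sentence):
        words = sentence.lower().split()
        if not words:
            return 0.0
        overlap = sum(1 for w in words if w in query_words)
        avg_freq = sum(corpus_freq[w] for w in words) / len(words)
        return overlap + 0.1 * avg_freq

    return sorted(sentences, key=score, reverse=True)[:top_n]

print(summarise("treatment of migraine",
                ["migraine treatment includes triptans",
                 "the weather was fine",
                 "of the patients"]))
```

A real system would replace the whitespace tokeniser and the ad-hoc weighting with the specialised corpus statistics the paper describes.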
Background
Evidence-based medicine practice requires practitioners to obtain the best available medical evidence, and appraise the quality of the evidence when making clinical decisions. Primarily due to the plethora of electronically available data from the medical literature, the manual appraisal of the quality of evidence is a time-consuming process. We present a fully automatic approach for predicting the quality of medical evidence in order to aid practitioners at point-of-care.
Methods
Our approach extracts relevant information from medical article abstracts and utilises data from a specialised corpus to apply supervised machine learning for the prediction of quality grades. Following an in-depth analysis of their usefulness, features (e.g., article publication types) are extracted from the text via rule-based approaches and from the meta-data associated with the articles, and then applied in the supervised classification model. We propose a highly scalable and portable approach using a sequence of high-precision classifiers, and introduce a simple evaluation metric called average error distance (AED) that simplifies the comparison of systems. We also perform elaborate human evaluations to compare the performance of our system against human judgments.
Results
We test and evaluate our approaches on a publicly available, specialised, annotated corpus containing 1132 evidence-based recommendations. Our rule-based approach performs exceptionally well at the automatic extraction of publication types of articles, with F-scores of up to 0.99 for high-quality publication types. For evidence quality classification, our approach obtains an accuracy of 63.84% and an AED of 0.271. The human evaluations show that the performance of our system, in terms of AED and accuracy, is comparable to the performance of humans on the same data.
Conclusions
The experiments suggest that our structured text classification framework achieves evaluation results comparable to those of human performance. Our overall classification approach and evaluation technique are also highly portable and can be used for various evidence grading scales.
Proceedings of the fifth conference TALN 1998, Jun 1, 1998
Abstract. We describe in this article a system for automatic answer extraction. Automatic answer extraction (in the original French, "extraction automatique de réponses", EAR) aims to find the passages of a document that directly answer a question posed by a user. Answer extraction is more ambitious than information retrieval and information extraction in that the results of a search are sentences rather than whole documents, and in that questions can be freely formulated. It is, on the other hand, less ...
Abstract. We introduce an approach to question answering in the biomedical domain that utilises similarity matching of question/answer pairs in a document, or a set of background documents, to select the best answer to a multiple-choice question. We explored a range of possible similarity matching methods, ranging from simple word overlap, to dependency graph matching, to feature-based vector similarity models that incorporate lexical, syntactic and/or semantic features. We found that while these methods performed reasonably well ...
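The simplest method the abstract mentions, word overlap, can be sketched as follows. This is a toy illustration under stated assumptions (whitespace tokenisation, overlap normalised by the smaller set), not the paper's implementation; all example strings are invented.

```python
# Sketch: score each candidate answer by the word overlap between the
# question+candidate pair and the best-matching document sentence.
def tokens(text):
    return set(text.lower().split())

def overlap(a, b):
    """Word overlap normalised by the smaller set (assumes non-empty inputs)."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / min(len(ta), len(tb))

def best_answer(question, candidates, document_sentences):
    """Pick the candidate whose question/answer pair best matches any sentence."""
    def score(candidate):
        pair = question + " " + candidate
        return max(overlap(pair, s) for s in document_sentences)
    return max(candidates, key=score)

print(best_answer("what treats a headache",
                  ["aspirin", "sleep"],
                  ["aspirin treats headache effectively",
                   "patients need rest"]))
```

The dependency-graph and feature-based vector models the abstract goes on to list would replace `overlap` with richer similarity functions over the same question/answer pairs.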
Description: The ALTA shared tasks are programming competitions where all participants attempt to solve the same problem, and the winner is the system with the best results. The 2011 ALTA shared task is the second in the series and focuses on automatically grading the level of clinical evidence in medical research papers. In this paper we describe the task, present the results of several baselines, and the results of our method. We apply a sequence of high-precision machine learning classifiers with varying feature sets for each. ...
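The "sequence of high-precision classifiers" pattern described above can be sketched as a cascade: each stage commits to a label only when its confidence clears a high threshold, and otherwise passes the item on, with a fallback label at the end. The threshold, the toy keyword stages, and the label names are illustrative assumptions, not the actual trained models.

```python
# Sketch of a classifier cascade: try each stage in order and commit
# only on a confident call; anything unclaimed gets the fallback label.
def cascade_predict(stages, fallback, text):
    for classify in stages:
        label, confidence = classify(text)
        if confidence >= 0.9:  # high-precision threshold (illustrative)
            return label
    return fallback

# Toy stages keyed on abstract text; real stages would be trained models
# with per-stage feature sets, as the paper describes.
stage_a = lambda t: ("A", 0.95 if "systematic review" in t else 0.1)
stage_b = lambda t: ("B", 0.92 if "randomised" in t else 0.2)

print(cascade_predict([stage_a, stage_b], "C", "a randomised controlled trial"))
```

The design choice is that each stage can be tuned for precision on one grade without hurting recall overall, since undecided items simply fall through.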
The Australian Computational and Linguistics Olympiad (OzCLO) started in 2008 in only two locations and has since grown to a nationwide competition with almost 1500 high school students participating in 2013. An Australian team has participated in the International Linguistics Olympiad (IOL) every year since 2009. This paper describes how the competition is run (with a regional first round and a final national round) and the organisation of the competition (a National Steering Committee and Local Organising Committees for each region) and discusses the particular challenges faced by Australia (timing of the competition and distance between the major population centres). One major factor in the growth and success of OzCLO has been the introduction of the online competition, allowing participation of students from rural and remote country areas. The organisation relies on the goodwill and volunteer work of university and school staff but the strong interest amongst students and teachers shows that OzCLO is responding to a demand for linguistic challenges.
Papers by Diego Molla