Spotlights



Afra Alishahi

I am an associate professor at Tilburg University, the Netherlands. My research focuses on applying computational techniques to modeling human cognition, specifically human language acquisition and processing. I was born in Tehran, Iran, where I lived until 2002. I was always interested in math. Towards the end of high school, a favorite cousin of mine talked me into studying “computers”. There was no real computer science program offered in Iran when I started university, so I did a bachelor’s in software engineering and then a master’s in artificial intelligence before moving to Canada to do a PhD in computer science at the University of Toronto.

Over the course of my scientific career, I have worked on modeling several aspects of child language development, including the acquisition of words, lexical categories and argument structure constructions in monolingual and bilingual communities. More recently, I have been working on grounded models of language learning from naturalistic input (raw speech and naturalistic images). I believe my research, along with that of others working on similar topics, has contributed to our understanding of child language acquisition, and has also brought cognitive models of language more into focus in the NLP community in recent years. What I find challenging in NLP research is keeping up with the pace at which the field moves forward; at the same time, that pace keeps us alert and makes our research more exciting.

In my own field I suspect the next trend to be unsupervised learning of language from more realistic multimodal input such as video, situated in interactive environments. Much effort in NLP research is put into developing models and techniques that provide better results, but quantitative and qualitative analysis of what models actually learn does not receive enough attention. I hope more researchers focus on understanding the nature of linguistic knowledge learned by computational models.


Pascale Fung’s Human Language Technology Center

I co-founded the Human Language Technology Center back in 1997, 20 years ago. It is a joint center between the departments of Electronic and Computer Engineering and Computer Science and Engineering at the Hong Kong University of Science and Technology. We work on multilingual, multimodal and interactive aspects of speech and language processing. In particular, we were the first group to work on accented Mandarin speech recognition and colloquial Chinese language understanding. We built the first Chinese natural language search engine back in 2001 and the first Chinese-language virtual assistant on the smartphone in 2010. We also work on code-switching in speech and language modeling. More recently, we have been focusing on building interactive systems that combine traditional spoken language understanding with an additional empathy module.

The group has always been diverse, with current students from China, Indonesia, Singapore, Taiwan, Bangladesh, Korea, the Netherlands, and Italy, even though the norm for engineering groups in Hong Kong is for the majority of students to be from mainland China. So far, the group has had 8-9 women graduate students and a number of women undergraduate project students. With more cross-disciplinary topics, I find that I am attracting more female students who are interested in our area.

Computational linguistics is both a science, in which many linguists attempt to discover the underlying structures of languages, and an engineering discipline, in which many others attempt to build AI systems that complete certain tasks. A corollary of this challenge is that it is difficult to find students who excel in NLP: most are interested either in linguistics or in engineering. The kind of work we do in my group has always needed good engineers who nevertheless have some intuitions and insights about languages, if not linguistics per se. I find that having groups of students with diverse backgrounds working together definitely helps, though, as some of them are more mathematically inclined, others are better software engineers, and yet others are more intuitive and creative. These traits are not mutually exclusive, of course, but it is harder to find individual students who possess all of them. Students from different cultural backgrounds went through different education systems, which often trained them to be stronger in one of these areas.

I am most proud of the work we have done that has “gone where no man has gone before”: work that was not mainstream at the time but became more important later on. One early example was using signal processing for language processing; others include code-switched speech and multilingual language processing. Although much of our non-mainstream work paid off later on, there can be issues with being taken seriously initially. For instance, groups based in North America have more media exposure and access to policy makers. We have to make more of an effort to have an impact on the global stage. It can be done, however, through persistence.

Group members in the photo, in alphabetical order:

Dario Bertero
Chao Xianjin
Anik Dey
Akanksha Gupta (undergrad project student)
Onno Kampman
Nayeon Lee
Naziba Mostafa
Ji Ho Park
Qi Zihao
Jaemin Shin
Farhad Bin Siddique
Genta Indra Winata
Xu Peng
Emily Yang


Varada Kolhatkar

I was born in Pune, India. I came to the U.S. to do my M.Sc. in computer science at the University of Minnesota, Duluth and then moved to Canada for my Ph.D. in computational linguistics at the University of Toronto. I am now a postdoctoral researcher at Simon Fraser University.

I did not really think very deeply when I started my undergraduate degree in computer science. I always thought very highly of people who devoted their lives to solving hard problems, and I wanted to be one of them. So I decided to pursue my Master’s. My Master’s advisor, Ted Pedersen, introduced me to NLP. I was fascinated by the countless research opportunities in the field and decided to pursue a Ph.D. in NLP. Currently, I am working on developing computational methods for analyzing online text, in particular reader comments posted on news articles. I have also worked on anonymizing unstructured data, anaphora resolution, and word-sense disambiguation.

I think deeper analysis from the perspective of understanding language is not getting much attention in the field. In our papers, we say which features work well and which do not by carrying out ablation experiments, but we only provide relatively superficial explanations. We see many papers with sophisticated models that work well, but there is no deeper analysis of why these models work so well and what kinds of linguistic characteristics they are capturing. I also feel that proper evaluation does not get enough attention. I believe the next topics to get more attention will be filtering toxic text on the web, discourse processing, and argumentation mining.

I like the work I’ve done so far, but the proudest moment has yet to come. I feel that finding an appropriate job and building a network are among the main challenges for an early-career professional in NLP, challenges probably shared by all early-career researchers, not just women or other minority groups. I hope that initiatives like WiNLP will help tackle barriers for early-career professionals in NLP, especially women and other underrepresented groups in the field.


Barbara Plank

I am an assistant professor at the University of Groningen. Much of my work focuses on building Natural Language Processing (NLP) models that are more robust: models that work better on unexpected inputs (like new domains and languages) and can be trained from semi-automatically or weakly annotated data from a variety of sources. This includes leveraging what I call fortuitous data: sources that have so far been neglected or rest in non-obvious places.

I grew up in a German-speaking minority in the north of Italy. After attending a commercial high school, I studied computer science in Bozen-Bolzano, despite many people suggesting that I go abroad and study economics. During my CS bachelor studies I took an introductory class in computational linguistics and immediately got hooked: finally my studies were no longer about software engineering, interfaces and databases, but about language, with all the fascinating challenges that come with it. I then did the Language and Communication Technologies (LCT) European Masters Program, as one of its first students, which brought me to Amsterdam.

Throughout my academic career, I have had the opportunity to learn from and work with many great and inspiring people in the field across Europe (Denmark, Italy and the Netherlands), and I am proud of that. I am also proud that I co-taught a summer course at last year’s ESSLLI at the very same university I had graduated from nine years earlier. And finally, I am proud of being tenured at the age of 33.

I think one of the hottest research directions in NLP will be, and already is, transfer learning: the ability to transfer models to new conditions, which includes learning with limited (or no) annotated resources. In fact, most current approaches need abundant labeled training data and work well only in benchmarking scenarios, where future data comes from the same distribution.

What currently doesn’t get enough attention in NLP is evaluation. This includes both in-depth, thorough studies that shed light on why certain approaches do (or do not) work, and work that questions established evaluation measures. Such work is typically hard to publish.


Zornitsa Kozareva

I am a Manager of the AWS Deep Learning group at Amazon, which builds natural language processing and dialogue applications. Previously, I was a Senior Manager at Yahoo!, leading the Query Understanding and User Intent group that powered Mobile Search and Advertisement. Before that I wore an academic hat as a Research Professor in the University of Southern California CS Department, with an affiliation at the Information Sciences Institute, where I spearheaded research funded by DARPA and IARPA on learning to read, interpreting metaphors and building knowledge bases from the Web. My interests lie in building intelligent NLP systems that scale to billions of data points, work across different domains and verticals, and, most importantly, solve real-world problems for the end user.

Since I was a little kid, I have been very curious and passionate about science; I would regularly participate in math and physics Olympiads. My interest in Natural Language Processing began in 2003, when I was doing my undergraduate studies in computer science and was selected to conduct research at the New University of Lisbon, working on Multilingual Information Retrieval. I was mesmerized by the power of machine learning and its ability to solve natural language problems. From that moment, I knew that I wanted to work in this area and contribute towards scientific breakthroughs.

Throughout my career, I have felt extremely lucky to have worked with and learned from great leaders and visionaries such as Eduard Hovy, Jerry Hobbs, Ellen Riloff, Kevin Knight, Rada Mihalcea, Manuel Palomar, Andres Montoyo, and Dimitar Mekerov, among others. I was fortunate to be surrounded by so many smart people, from whom I learned a lot both as a mentee and as a mentor. I feel honored that in 2016 I was awarded the John Atanasoff Award, one of the highest recognitions given by the President of the Republic of Bulgaria, for my contributions and impact in science, education, and industry.

I think one challenge in NLP is that methods and algorithms shown to work well on small datasets sometimes lead to the misconception that a particular problem is solved. In practice, however, it is really hard to develop practical solutions that work and solve real-world problems at a large scale.

One area I am particularly excited about is open-domain, multi-modal conversational assistants that can understand and help humans in their daily lives. I think this research area will receive a lot of attention in the coming years. I envision such dialog systems being personalized and context-aware, with the capability of automatically learning about new intents directly from human conversations.


Yuki Arase

I am an associate professor at Osaka University, born in Tokushima, a small city in Japan. I work on phrase alignment, which can be applied to phrasal paraphrase and parallel phrase detection. More recently, I have also started working on conversation systems in collaboration with the team at Microsoft Research Asia that developed the popular chat-bot service Rinna. I see the future of NLP in advanced automatic conversation systems that really understand us.

I have always been interested in understanding human minds: how people perceive the world and what the mechanisms of thinking are. When I was a high school student, there was a social boom around intelligent robots, much like the recent AI enthusiasm, in which computer science plays an important role. That strongly motivated me to study computer science at university.

For my Ph.D., I pursued my interest in human perception through research on user interfaces for mobile devices. After graduation, I joined Microsoft Research Asia as an associate researcher, where, thanks to my line manager, I had the opportunity to explore all research fields at MSRA and find out what I truly wanted to work on. I ended up with NLP, following my natural interest in understanding the mechanisms of thinking.

What I find challenging as a researcher, though it is not unique to NLP, is the gap between product development and research. To deliver something valuable to people, we should use practical, sustainable, and scalable approaches to achieve our goals, but these are not necessarily academically appealing to researchers. We also put excessive emphasis on numbers, e.g., precision or BLEU scores, but neglect analysing data and outputs to understand research problems.

One of the strategies I have seen female computer scientists use to try to fit in is, intentionally or unintentionally, hiding preferences or characteristics thought of as stereotypically female in order to feel “included” in a male-centric community: for instance, following the stereotype of the hardworking male researcher by staying in the lab until late despite being tired, and pretending not to be interested in anything apart from work. On a superficial level this works, but it might lead to a feeling of losing one’s true self. Worse, we may distance ourselves from female friends in more balanced communities, because we behave differently from them. Increasing diversity in a community and promoting minorities through initiatives such as this workshop does help to change such situations, but it will take time. In the meantime, it is important to keep friends across communities so that we have a place where we can behave naturally.


Natalie Schluter

I was born in Vancouver, but now live in Denmark, where I’m an Assistant Professor at the IT University of Copenhagen. As part of my role, I helped establish the first Data Science BSc Program in Denmark and now act as its Head. This is a busy and thrilling part of my job, in which I directly help shape the data science competency pool in Denmark.

My focus is on algorithms for NLP: their theoretical and linguistic power and their resource complexity, especially for semantic processing. My research often carries a good portion of mathematical content. This is a central challenge, both for getting my individual contributions through the review stage and for finding people to collaborate with, because NLP researchers don’t have much patience for mathematics these days.

Related to this is the growing problem of recent state-of-the-art learning architectures that are prohibitively expensive to train, leaving many important contributions in NLP unverifiable by most of the research community. I think the next hot topic will have to be improving these big architectures, with more focus on experimental methodology and computational complexity: verifiability, truth, feasibility and sustainability.

I have several different degrees, which I worked really hard for and am quite proud of. At university, I wanted to study mathematics, but had a stronger desire to travel. That is when I started getting more interested in the structure of language. When I started studying math at UBC in Vancouver, I also took some linguistics courses, and then switched over to linguistics, finishing my first BA in that field. Desiring to understand the mathematics better, I then went to Montréal and finished a BSc in Mathematics. However, I felt a linguistics pang and finished an MSc in Linguistics, also at the University of Montréal. Of course, I then felt a pang for more mathematics and went on to finish an MSc in Mathematics at Trinity College, University of Dublin, Ireland. Natural language processing was a good field in which to reconcile my interests. I completed a PhD in Natural Language Processing at Dublin City University.

Surprisingly, my education is problematic for many, with some inferring that I kept switching degrees because I was not very good at any one thing or was confused. I have even had “her background is, well… broad” offered as a dismissive remark, and the only remark, in a formal presentation of my research in front of a group of peers. I believe that the perception of my academic achievements has been negative because I am a woman, black, or both. Rather than being a couple of random examples from our careers as minorities, such incidents are (always surprisingly) frequent. I believe that there is little awareness of their pervasiveness, and that if there were, the vast majority of the NLP community would support initiatives like WiNLP. However, WiNLP is, by virtue of its name, a closed community. We need broader community-wide outreach.


Dong Nguyen

I am a research fellow at the Alan Turing Institute, researching large-scale text analysis to shed light on social and cultural phenomena. Much of my work focuses on the social aspects of language use and fits within the emerging area of computational sociolinguistics. I am also very interested in methodological issues that arise with interdisciplinary research and specifically the interaction between ML/NLP and the social sciences.

I grew up in Deventer, in the Netherlands. I already had access to a computer as a kid because my father had studied computer science. My first exposure to programming was a language designed for kids, in which you control the movements of a turtle on a screen. Later on, during high school, I got very interested in web design. Despite that, I first started studying biomedical engineering, partly because I was put off by misleading computer science stereotypes. After one year I fortunately changed my mind and switched to computer science.

During the second year of my undergraduate studies, an award targeting minorities in the Netherlands gave me the opportunity to attend a summer school at UCLA, where I took a course in linguistics. Inspired by this, I pursued a Master’s degree at the Language Technologies Institute at CMU. Despite the incredible experience and the possibility of continuing as a PhD student, I decided to do my PhD closer to home, at the University of Twente. This was a very difficult decision for me. At the time I valued the prestige of institutions a lot, and quite a few people advised against my move. However, it turned out to be one of the best decisions I have ever made, enabling me to maintain a work-life balance that I am happy with.

I think computational sociolinguistics will be, or already is, one of the hottest areas in natural language processing. One of my proudest achievements so far has been an article published in Computational Linguistics, surveying the area of computational sociolinguistics, which I hope will become a useful resource. I also think that the interpretability and validation of NLP models will become more important in some domains, especially given the ongoing discussions regarding ethics, fairness etc. and the increasing societal impact of NLP.

I have found the NLP community to be very welcoming, and initiatives like WiNLP are a great way to reduce barriers for underrepresented groups.

One big challenge I see in NLP is the almost exclusive focus on task performance, measuring the quality of models using abstract metrics. Analysis studies are often undervalued and difficult to publish, even though thorough, insightful analyses are both valuable and difficult to carry out.


Wei Xu

I am an assistant professor in Computer Science at Ohio State University. I do research on semantics, social media data and natural language generation. I am extremely interested in the intersection of these three areas — paraphrases!

When I was a PhD student at New York University, I was intrigued by the many different and creative ways people can use language to express the same meaning, especially in user-generated text on Twitter. I switched my PhD research from information extraction to the less-studied problem of paraphrase acquisition. Unlike many NLP systems that frustratingly do not get close to human performance, the models I built often surprised and excited me by exceeding the paraphrasing capabilities of native speakers, from “Ezekiel Ansah wearing 3D glasses wout the lens” to “Ezekiel Ansah is wearing 3D movie glasses with the lenses knocked out”. So I was determined to design better models to capture all possible paraphrases, and to create useful paraphrase data resources to support other NLP researchers and applications.

For my postdoc, I moved to the University of Pennsylvania in 2014 and started to go deeper into the second branch of paraphrase research: generation. I decided to take on the text simplification task, as it is more technically challenging than sentence compression and practically useful for education applications. At the time, I was already very confident in my technical skills but struggled to push MT system performance on the simplification task. There must be something wrong, I thought, either in the data or in the evaluation. It turned out to be both. I wrote a TACL paper in 2015 showing that the standard setup, which had relied on Simple Wikipedia data since 2010, was not right for simplification. I was relieved when Shashi Narayan, who held the then best-performing system, told me that it was great work and that he shared the same concerns. I felt I had done justice to the many researchers who worked on simplification and likely struggled for months, and to the PhD students who had to abandon their projects. This is what motivates me to work, besides my students.

#IMHO The biggest challenge in NLP (and academia in general) is how to maintain high-quality peer review and reward quality work beyond publication counts, especially when the field is exploding. Additional effort is also needed to globalize the field. When I was an area chair for EMNLP, even as a native Chinese speaker I found it difficult to recruit program committee members from Chinese names in their English spellings. I am sure similar challenges apply to Vietnamese and some other nationalities.


Martha Palmer

I am a professor at the University of Colorado at Boulder, where I research abstract representations that can consistently and effectively capture key elements of meaning.

I grew up in Houston, TX, and discovered my passion for CS as a math major at the U of Texas while taking an Introduction to CS course. We fed punch cards into the CDC 6600/6400 in the basement of Pearce Hall. What I liked most was that the compiler found all my careless mistakes for me!

I changed my major to Philosophy, since there was no CS undergraduate major at the time. After spending a year in a Psychology Ph.D. program, I quickly switched back to an MA in CS, with Robert Simmons (semantic networks) as my advisor. Subsequently, I received my Ph.D. from Edinburgh University, where, in 1985, I became the first woman to receive a Ph.D. from the AI Department.

My biggest achievements in NLP research include the integration of semantic and pragmatic processing that I spearheaded for the PUNDIT/Kernel system in the 80s, as well as leading the development of PropBank, VerbNet and AMRs. While semantics is currently a hot topic, I anticipate that conversational speech systems that actually work will get more attention in the years to come. I hope that the importance of semantic representations and of speaker and hearer discourse models will be recognized as foundational for this work.

One of the biggest challenges NLP professionals currently face, in my experience, is that algorithms and mathematical approaches, basically quantitative approaches, receive most of the credit in NLP research. I see this bias in favor of mathematical solutions as a type of barrier in the NLP community. Even though studying how to develop and formalize abstract semantic representations is every bit as difficult, challenging and important as studying new algorithms, if not more so, it is rarely recognized as such.