Michal Ptaszynski

Kitami Institute of Technology, Department of Computer Science, Faculty Member

Books by Michal Ptaszynski

Automatic Cyberbullying Detection: Emerging Research and Opportunities

Due to the prevalence of social network services and social media, the problem of cyberbullying has risen to the forefront as a major social issue over the last decade. Internet hate, harassment, cyberstalking, cyberbullying: these terms, almost unknown 10 years ago, are in the everyday lexicon of all internet users today. Unfortunately, it is becoming increasingly difficult to undertake continuous surveillance of websites as new ones are appearing daily. Methods for automatic detection and mitigation of online bullying have become necessary in order to retain a comfortable user experience online.

Automatic Cyberbullying Detection: Emerging Research and Opportunities provides innovative insights into online bullying and methods of early identification, mitigation, and prevention of harassing speech and activity online. The book provides explanations and reasoning for each of these applied methods and discusses their pros and cons in the context of the language of online bullying. Also included are some generalizations of cyberbullying as a phenomenon and how to approach the problem from a practical technology-backed point of view. The content within this publication represents work spanning over ten years and covers a wide range of artificial intelligence, machine learning and natural language processing methods applied to the problem of automatic cyberbullying detection, such as traditional machine learning, deep learning, web mining, or language combinatorics. The book is designed for researchers, academicians, social media moderators, IT consultants, programmers and education administrators and covers topics centered on methods of detection and mitigation of cyberbullying, and surrounding problems such as internet hate and online harassment.

Emotion Awareness in Dialog Agents

This book describes my research on enhancing machines with Emotional Intelligence. I develop a set of affect analysis tools and propose methods for their efficient utilization. The first system, ML-Ask, separates emotive utterances from neutral ones and then searches the emotive utterances for expressions of specific emotion types. The second system, CAO, extracts emoticons from input and determines the emotion types they express. These systems are then utilized in two methods for enhancing Human-Computer Interaction. The first is a method for automatic evaluation of conversational agents. In this method, the information on user emotional engagement during conversation is reinterpreted to specify general attitudes to conversational agents. The second method determines whether the emotions expressed by a speaker are appropriate for the context of the conversation. The information on the affective states of the user-speaker is compared against a list, gathered from the Internet, of emotions that should be expressed at that moment. I conclude the book with a discussion of other applications for the proposed methods and further work needed for full implementation of Emotional Intelligence in machines.

Towards Socialized Machines: Emotions and Sense of Humour in Conversational Agents

by Michal Ptaszynski and Rafal Rzepka

Since the beginning of the computer era over half a century ago, humanity has been fascinated by the idea of creating a machine that could substitute for human mental capabilities. This New Age version of Mary Shelley's Frankenstein gave birth to S-F literature and was one of the motors for the development of our civilisation. The first mental functions to be digitalized were fast processing of large numbers and sophisticated formulas for specialized fields like mathematics or physics. These functions were the most troublesome for humans, but the easiest to process mechanically. Ironically, the mental functions said to be the most human-like, and thought of as the ones which make up a grown, well-socialized person, such as a sense of humour or understanding the emotions of others, were neglected in Computer Science for a long time as too subjective and therefore unscientific...

Papers by Michal Ptaszynski

Looking for Razors and Needles in a Haystack: Multifaceted Analysis of Suicidal Declarations on Social Media—A Pragmalinguistic Approach

by Michal Ptaszynski and Maciej Brochocki

International Journal of Environmental Research and Public Health

In this paper, we study the language used by suicidal users on the Reddit social media platform. To do that, we first collect a large-scale dataset of Reddit posts and have it annotated by highly trained expert annotators under a rigorous annotation scheme. Next, we perform a multifaceted analysis of the dataset, including: (1) the analysis of user activity before and after posting a suicidal message, and (2) a pragmalinguistic study of the vocabulary used by suicidal users. In the second part of the analysis, we apply LIWC, a dictionary-based toolset widely used in psychology and linguistic research, which provides a wide range of linguistic category annotations on text. However, since raw LIWC scores are not sufficiently reliable or informative, we propose a procedure that reduces the risk of unreliable LIWC scores leading to misleading conclusions by analyzing each category not separately, but in pairs with other categories. The analysis of the results supported t...

Improving Basic Natural Language Processing Tools for the Ainu Language

Information

Ainu is a critically endangered language spoken by the native inhabitants of northern Japan. This paper describes our research aimed at the development of technology for automatic processing of text in Ainu. In particular, we improved the existing tools for normalizing old transcriptions, word segmentation, and part-of-speech tagging. In the experiments we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments revealed that expanding the lexicon had a positive impact on the overall performance of our tools, especially with test data unrelated to any of the training sets used.

MiNgMatch—A Fast N-gram Model for Word Segmentation of the Ainu Language

Information

Word segmentation is an essential task in automatic language processing for languages where there are no explicit word boundary markers, or where space-delimited orthographic words are too coarse-grained. In this paper we introduce the MiNgMatch Segmenter, a fast word segmentation algorithm that reduces the problem of identifying word boundaries to finding the shortest sequence of lexical n-grams matching the input text. In order to validate our method in a low-resource scenario involving extremely sparse data, we tested it with a small corpus of text in the critically endangered language of the Ainu people living in the northern parts of Japan. Furthermore, we performed a series of experiments comparing our algorithm with systems utilizing state-of-the-art lexical n-gram-based language modelling techniques (namely, a Stupid Backoff model and a model with modified Kneser-Ney smoothing), as well as a neural model performing word segmentation as character sequence labelling. The experiment...
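
The core idea stated in this abstract, covering the input with the fewest lexical n-grams, can be illustrated with a small dynamic-programming sketch. This is not the MiNgMatch implementation and makes no claim about how the authors handle ties, unknown strings, or scoring. The function name `shortest_ngram_cover`, the lexicon format (unsegmented string mapped to its word list), and the toy English data are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

def shortest_ngram_cover(text: str, ngrams: Dict[str, List[str]]) -> Optional[List[str]]:
    """Segment `text` by covering it with the fewest lexicon entries.

    `ngrams` (hypothetical format) maps an unsegmented string to the
    list of words it consists of. Returns None if no full cover exists.
    """
    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i]: fewest entries covering text[:i]
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0
    max_len = max((len(key) for key in ngrams), default=0)

    for i in range(n):
        if best[i] == inf:
            continue  # text[:i] cannot be covered, so skip this start
        for j in range(i + 1, min(n, i + max_len) + 1):
            piece = text[i:j]
            if piece in ngrams and best[i] + 1 < best[j]:
                best[j] = best[i] + 1
                back[j] = (i, piece)

    if best[n] == inf:
        return None

    # Walk the back-pointers from the end to rebuild the word sequence.
    words: List[str] = []
    j = n
    while j > 0:
        i, piece = back[j]
        words[:0] = ngrams[piece]
        j = i
    return words

# Toy lexicon (illustrative English data, not Ainu).
TOY_NGRAMS = {
    "the": ["the"],
    "cat": ["cat"],
    "sat": ["sat"],
    "thecat": ["the", "cat"],  # a 2-gram stored in unsegmented form
}

print(shortest_ngram_cover("thecatsat", TOY_NGRAMS))  # ['the', 'cat', 'sat']
```

In this toy run the 2-gram "thecat" plus the unigram "sat" covers the input with two entries instead of three, so the cover with fewer, longer n-grams is chosen.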

How Religion and Morality Correlate in Age of Society 5.0: Statistical Analysis of Emotional and Moral Associations with Buddhist Religious Terms Appearing on Japanese Blogs

Cognitive Systems Research

ML-Ask: Open Source Affect Analysis Software for Textual Input in Japanese

Journal of Open Research Software

Automatically Annotating A Five-Billion-Word Corpus of Japanese Blogs for Sentiment and Affect Analysis

Computer Speech and Language (CSL)

Affect as Information about Users' Attitudes to Conversational Agents

Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Volume 03, 2008

Multi-humoroid: joking system that reacts with humor to humans' bad moods

Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, Volume 1, 2010

Language Combinatorics: A Sentence Pattern Extraction Architecture Based on Combinatorial Explosion

When Your Users Are Not Serious - Using Web-based Associations, Affect and Humor for Generating Appropriate Utterances for Inappropriate Input

Transactions of the Japanese Society for Artificial Intelligence, 2010

Crossing Word Borders - Towards Phrasal Pun Generation Engine

... In our previous works we showed that implementing a very simple pun generator into a chatterbot can visibly improve its performance. ... One of the first and probably most robust systems in the field of pun processing is Binsted's JAPE punning riddles generator (Binsted, 1996 ...

Machine Moral Development: Moral Reasoning Agent Based on Wisdom of Web-Crowd and Emotions

Contextual Valence Shifters Supporting Affect Analysis of Utterances in Japanese

Double Standpoint Evaluation Method for Affect Analysis Systems

Proceedings of the Annual Conference of the Japanese Society for Artificial Intelligence (人工知能学会全国大会論文集), 2008

Automatically annotating a five-billion-word corpus of Japanese blogs for affect and sentiment analysis

Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis, Jul 12, 2012

Science of Emoticons: Research Framework and State of the Art in Analysis of kaomoji-type Emoticons

A Survey on Large Scale Web Based Corpora

Conscience of Blogs: Verifying Contextual Appropriateness of Emotions Based on Blog Contents

Affecting Corpora: Experiments with Automatic Affect Annotation System - A Case Study of the 2channel Forum

Emotion Recognition in Humorous Human-Computer Communication: A Conversational System Using Humor According to User Emotions

Deep Learning Hybrid Models for Multilingual Cyberbullying Detection: Insights from Bangla and Chittagonian Languages

Improving Low-Resource Speech Recognition through Multilingual Fine-Tuning with Language Identifiers and Self-Training

Serious processing for frivolous purpose

AI Training for Thunderstorm Training: Better Situational Awareness for Disaster Tweets Using Context and Emotions

Reason based machine learning approach to detect Bangla abusive social media comments

Automatic sentiment score generation method for sightspots review system

Disaster Tweet Classification for Damage Assessment and Its Improvement with Feature Analysis

Improvement of Quantitative Learner's Motivation Method by Optimizing the Combination of the Elements

Supporting Inbound Tourism in Hokkaido: Keyword Extraction and Focus Point Analysis from Spot Reviews

Spicing up the game for underresourced language learning: Preliminary experiments with Ainu language-speaking Pepper robot

Epistolary Education in 21st Century: A System to Support Composition of E-mails by Students to Superiors in Japanese

Predicting University Students’ Public Transport Preferences for Sustainability Improvement

A Study in Practical Solutions to Sarcasm Detection with Machine Learning and Knowledge Engineering Techniques

A proposal of prediction method using word polarity information for future event prediction support system

Improving tokenization, transcription normalization and part-of-speech tagging of Ainu language through merging multiple dictionaries

Application of future sentence reference extraction in support of future event prediction

Combining multiple dictionaries to improve tokenization of Ainu language

Advances in Curling Game Information Analysis by Considering Starting Position

Quality improvement of a gear transmission by means of genetic algorithm

Emoji-Aware Attention-based Bi-directional GRU Network Model for Chinese Sentiment Analysis

Past, Present, and Future of Automatic Cyberbullying Detection Research

Expert-annotated dataset to study cyberbullying in Polish language

Classification of disaster tweets for damage assessment, and improvement by feature analysis

Automatic vulgar word extraction method with application to vulgar remark detection in Chittagonian dialect of Bangla

Cyberbullying detection for low-resource languages and dialects: Review of the state of the art

Emotive Information Discovery from User Textual Input Using Emotion Expression Element and Web Mining

Adaptation of a multilingual speech representation model for a new, underresourced language via multilingual fine-tuning and continued pretraining

Improving Polish to English Neural Machine Translation with Transfer Learning: Effects of Data Volume and Language Similarity

A new approach to extracting tourism focus points from Chinese inbound tourist reviews after COVID-19

Enhancing cross-lingual learning: Optimal transfer language selection with linguistic similarity

Towards Socialized Machines: Emotions and Sense of Humour in Conversational Agents

Zero-shot cross-lingual transfer language selection using linguistic similarity

Does change in ethical education influence core moral values? Towards culture-aware morality model

Adapting multilingual speech representation model for a new, underresourced language through multilingual fine-tuning and continued pretraining

Namespotting: Username toxicity and actual toxic behavior on Reddit

Transfer language selection for zero-shot cross-lingual abusive language detection

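Creation and Performance Evaluation of an ELECTRA Pretrained Language Model Based on the Large-Scale Japanese Blog Corpus YACIS (日本語大規模ブログコーパス YACIS に基づいた ELECTRA 事前学習済み言語モデルの作成及び性能評価)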

A Method of Supplementing Reviews to Less-Known Tourist Spots Using Geotagged Tweets

Using a Genetic Algorithm to Optimize a Quantitative Model of Student Motivation (Wykorzystanie algorytmu genetycznego do optymalizacji ilościowego modelu motywacji ucznia)

Language Sense and Communication on Computer

Annotating Japanese Blogs with Syntactic and Affective Information

Affect analysis of textual input utterance in Japanese and its application in human-computer interaction

Description and Initial Analysis of Cyberbullying Dataset

Brute Force Search Method for Cyberbullying Detection