We address the challenge of extracting structured information from business documents without detailed annotations. We propose Deep Conditional Probabilistic Context Free Grammars (DeepCPCFG) to parse two-dimensional complex documents and use Recursive Neural Networks to create an end-to-end system for finding the most probable parse that represents the structured information to be extracted. This system is trained end-to-end with scanned documents as input and only relational-records as labels. The relational-records are extracted from existing databases, avoiding the cost of annotating documents by hand. We apply this approach to extract information from scanned invoices, achieving state-of-the-art results despite using no hand-annotations.
Combinatorial optimization problems occur frequently as fundamental problems in operations research. The optimal solution to many problems in operations research can be derived from the optimal solution of basic combinatorial problems. Examples of such problems include the Traveling Salesman Problem (TSP), the Quadratic Assignment Problem (QAP), and the Job Shop Scheduling Problem (JSP).
Modern social networks often consist of multiple relations among individuals. Understanding the structure of such multi-relational networks is essential. In sociology, one way of performing structural analysis is to identify different positions and roles using blockmodels. In this paper, we generalize stochastic blockmodels to Generalized Stochastic Blockmodels (GSBM) for performing positional and role analysis on multi-relational networks.
Social media services such as Twitter generate a phenomenal volume of content for most real-world events on a daily basis. Digging through the noise and redundancy to understand the important aspects of the content is a very challenging task. We propose a search and summarization framework to extract relevant representative tweets from an unfiltered tweet stream in order to generate a coherent and concise summary of an event.
User-to-user interactions have become ubiquitous in Web 2.0. Users exchange emails, post on newsgroups, tag web pages, co-author papers, etc. Through these interactions, users co-produce or co-adopt content items (e.g., words in emails, tags in social bookmarking sites). We model such dynamic interactions as a user interaction network, which relates users, interactions, and content items over time. After some interactions, a user may produce content that is more similar to that produced by other users previously.
Diffusion of items occurs in social networks due to the spreading of items through word of mouth and exogenous factors. These items may be news, products, videos, advertisements or contagious viruses. When a user purchases or consumes one such item, we say that she adopts the item and she becomes an item adopter. Previous research has studied the diffusion process at both the macro and micro levels.
Many event monitoring systems rely on counting known keywords in streaming text data to detect sudden spikes in frequency. But the dynamic and conversational nature of Twitter makes it hard to select known keywords for monitoring. Here we consider a method of automatically finding noun phrases (NPs) as keywords for event monitoring in Twitter.
Users face many choices on the Web when it comes to choosing which product to buy, which video to watch, etc. In making adoption decisions, users rely not only on their own preferences, but also on friends. We call the latter social correlation, which may be caused by the homophily and social influence effects. In this paper, we focus on modeling social correlation on users' item adoptions.
In-game actions of real-time strategy (RTS) games are extremely useful in determining the players' strategies, analyzing their behaviors and recommending ways to improve their play skills. Unfortunately, unstructured sequences of in-game actions are hardly informative enough for these analyses. The inconsistency we observed in human annotation of in-game data makes the analytical task even more challenging.
Papers by Freddy C. Chua