Classifying Documents within Multiple Hierarchical Datasets using Multi-Task Learning

Naik, Azad; Charuvaka, Anveshi; Rangwala, Huzefa

arXiv:1706.01583 (cs)

[Submitted on 6 Jun 2017]

Abstract: Multi-task learning (MTL) is a supervised learning paradigm in which the prediction models for several related tasks are learned jointly to achieve better generalization performance. When there are only a few training examples per task, MTL considerably outperforms traditional single-task learning (STL) in terms of prediction accuracy. In this work we develop an MTL-based approach for classifying documents that are archived within dual concept hierarchies, namely DMOZ and Wikipedia. We solve the multi-class classification problem by defining a one-versus-rest binary classification task for each class across the two hierarchical datasets. Instead of learning a linear discriminant for each task independently, we use an MTL approach in which relationships between tasks across the datasets are established using a non-parametric, lazy nearest-neighbor approach. We also develop and evaluate a transfer learning (TL) approach and compare the MTL and TL methods against standard single-task learning and semi-supervised learning approaches. Our empirical results demonstrate the strength of the developed methods, which show an improvement especially when there are few training examples per classification task.
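
The pipeline outlined in the abstract (one-versus-rest tasks, nearest-neighbor task relationships, jointly learned linear discriminants) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' formulation: the abstract does not specify the loss, the regularizer, or how neighbors are computed. The sketch assumes that both datasets share one feature space, that tasks are related through cosine similarity of class centroids, and that related tasks are coupled by a squared-difference penalty on their weight vectors. All function names and parameters (`lam`, `ridge`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np

def one_vs_rest_labels(y, classes):
    """Return a (n_classes, n_samples) matrix of +1/-1 labels, one row per binary task."""
    return np.where(y[None, :] == np.asarray(classes)[:, None], 1.0, -1.0)

def class_centroids(X, y, classes):
    """Mean feature vector of each class (assumed representation of a task)."""
    return np.vstack([X[y == c].mean(axis=0) for c in classes])

def nearest_neighbor_tasks(cent_a, cent_b, k=1):
    """For each class centroid in A, indices of the k most cosine-similar centroids in B."""
    a = cent_a / (np.linalg.norm(cent_a, axis=1, keepdims=True) + 1e-12)
    b = cent_b / (np.linalg.norm(cent_b, axis=1, keepdims=True) + 1e-12)
    sim = a @ b.T
    return np.argsort(-sim, axis=1)[:, :k]

def logistic_grad(W, X, Y):
    """Gradient of the mean logistic loss for all tasks at once; W is (tasks, features)."""
    margins = np.clip(Y * (W @ X.T), -30.0, 30.0)
    coef = -Y / (1.0 + np.exp(margins))
    return coef @ X / X.shape[0]

def train_mtl(Xa, Ya, Xb, Yb, pairs, lam=1.0, ridge=0.1, lr=0.1, epochs=200):
    """Jointly fit linear discriminants for both datasets by gradient descent.

    Assumed objective: logistic loss + ridge penalty
    + (lam / 2) * ||w_a[i] - w_b[j]||^2 for every related task pair (i, j).
    """
    Wa = np.zeros((Ya.shape[0], Xa.shape[1]))
    Wb = np.zeros((Yb.shape[0], Xb.shape[1]))
    for _ in range(epochs):
        Ga = logistic_grad(Wa, Xa, Ya) + ridge * Wa
        Gb = logistic_grad(Wb, Xb, Yb) + ridge * Wb
        for i, j in pairs:
            diff = Wa[i] - Wb[j]
            Ga[i] += lam * diff  # pull task i in A toward its neighbor in B
            Gb[j] -= lam * diff  # and the neighbor toward task i
        Wa -= lr * Ga
        Wb -= lr * Gb
    return Wa, Wb
```

Setting `lam=0` in this sketch removes the coupling and reduces it to independent single-task training, which is the baseline the abstract compares against. A document would be assigned to the class whose discriminant scores highest, for example `np.argmax(X @ Wa.T, axis=1)`.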

Comments:	IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2013
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1706.01583 [cs.LG]
	(or arXiv:1706.01583v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1706.01583

Submission history

From: Azad Naik
[v1] Tue, 6 Jun 2017 02:17:40 UTC (100 KB)
