Kernel slicing: scalable online training with conjunctive features

Authors: Naoki Yoshinaga, Masaru Kitsuregawa

COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics

Pages 1245 - 1253

Published: 23 August 2010

Abstract

This paper proposes an efficient online method for training a classifier with many conjunctive features. We employ a kernel computation technique called kernel slicing, which explicitly considers conjunctions among frequent features when computing the polynomial kernel, to combine the merits of linear and kernel-based training. To improve the scalability of this training, we reuse the temporal margins of partial feature vectors and terminate unnecessary margin computations. Experiments on dependency parsing and hyponymy-relation extraction demonstrated that our method could train a classifier orders of magnitude faster than kernel-based online learning while retaining its space efficiency.
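The idea of combining linear and kernel-based margin computation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary features represented as Python sets, fixes the polynomial kernel to degree 2, and uses hypothetical function names (`explicit_weights`, `sliced_margin`, and so on) and toy data. It relies on the identity (1 + k)^2 = 1 + 3k + 2·C(k, 2) for k shared binary features, so conjunctions among frequent features can be scored with precomputed weights while the kernel handles only the remainder contributed by rare features.

```python
from itertools import combinations


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (1 + <x, z>)^2 for binary features stored as sets."""
    k = len(x & z)
    return (1 + k) ** 2


def explicit_weights(support_vectors, frequent):
    """Expand each support vector, restricted to frequent features, into
    weights over the bias, single features, and pairwise conjunctions."""
    w = {}
    for alpha, s in support_vectors:
        s_c = sorted(s & frequent)
        w[()] = w.get((), 0.0) + alpha
        for f in s_c:
            w[(f,)] = w.get((f,), 0.0) + 3 * alpha
        for pair in combinations(s_c, 2):
            w[pair] = w.get(pair, 0.0) + 2 * alpha
    return w


def linear_margin(w, x_c):
    """Margin of the frequent-feature part, computed as a plain weight lookup."""
    feats = sorted(x_c)
    m = w.get((), 0.0)
    m += sum(w.get((f,), 0.0) for f in feats)
    m += sum(w.get(p, 0.0) for p in combinations(feats, 2))
    return m


def sliced_margin(support_vectors, w, frequent, x):
    """Linear margin for frequent features plus a kernel-based correction
    that is needed only when x contains rare features."""
    x_c = x & frequent
    m = linear_margin(w, x_c)
    if x_c != x:
        m += sum(
            alpha * (poly2_kernel(s, x) - poly2_kernel(s, x_c))
            for alpha, s in support_vectors
        )
    return m


def kernel_margin(support_vectors, x):
    """Reference margin computed with the kernel alone."""
    return sum(alpha * poly2_kernel(s, x) for alpha, s in support_vectors)


if __name__ == "__main__":
    # Toy (alpha, feature set) pairs; feature ids 1-3 are assumed frequent.
    svs = [(0.5, {1, 2, 3, 7}), (-1.0, {2, 3, 9}), (0.25, {1, 9})]
    frequent = {1, 2, 3}
    w = explicit_weights(svs, frequent)
    x = {1, 2, 7, 9}
    print(sliced_margin(svs, w, frequent, x), kernel_margin(svs, x))
```

Both margins agree by construction. In this sketch, an example made up only of frequent features never touches the support vectors, which is where the speed of linear training would come from, while the weight table stays limited to conjunctions of frequent features.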

Cited By

Iwanari T, Yoshinaga N, Kaji N, Nishina T, Toyoda M, Kitsuregawa M (2016). Ordering concepts based on common attribute intensity. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 3747-3753. Online publication date: 9-Jul-2016.
https://dl.acm.org/doi/10.5555/3061053.3061143

Mu T, Miwa M, Tsujii J, Ananiadou S (2014). Discovering robust embeddings in dissimilarity space for high-dimensional linguistic features. Computational Intelligence, 30(2), pages 285-315. Online publication date: 1-May-2014.
https://dl.acm.org/doi/10.1111/j.1467-8640.2012.00452.x

Takaku Y, Kaji N, Yoshinaga N, Toyoda M, Tsujii J, Henderson J, Pasca M (2012). Identifying constant and unique relations by using time-series text. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 883-892. Online publication date: 12-Jul-2012.
https://dl.acm.org/doi/10.5555/2390948.2391044

Published In

COLING '10: Proceedings of the 23rd International Conference on Computational Linguistics

August 2010

1408 pages

General Chair: Aravind K. Joshi (University of Pennsylvania)
Program Chairs: Chu-Ren Huang (The Hong Kong Polytechnic University), Dan Jurafsky (Stanford University)

Publisher

Association for Computational Linguistics

United States

Qualifiers

Research-article

Acceptance Rates

Overall Acceptance Rate 1,537 of 1,537 submissions, 100%

Article Metrics

Total Citations: 3
Total Downloads: 205
Downloads (Last 12 months): 38
Downloads (Last 6 weeks): 4

Reflects downloads up to 05 Mar 2025
