research-article

Redefining class definitions using constraint-based clustering: an application to remote sensing of the earth's surface

Published: 25 July 2010

Abstract

Two aspects are crucial when constructing any real-world supervised classification task: the set of classes whose distinction might be useful for the domain expert, and the set of classifications that can actually be distinguished by the data. Often a set of labels is defined with some initial intuition, but these labels are not the best match for the task. For example, labels have been assigned for land cover classification of the Earth, but it has been suspected that these labels are not ideal: some classes may be best split into subclasses, whereas others should be merged. This paper formalizes this problem using three ingredients: the existing class labels, the underlying separability in the data, and a special type of input from the domain expert. We require a domain expert to specify an L × L matrix of pairwise probabilistic constraints expressing their beliefs as to whether the L classes should be kept separate, merged, or split. This type of input is intuitive and easy for experts to supply. We then show that the problem can be solved by casting it as an instance of penalized probabilistic clustering (PPC). Our method, Class-Level PPC (CPPC), extends PPC and shows how its time complexity can be reduced from O(N^2) to O(NL) for the problem of class re-definition, where N is the number of data points. We further extend the algorithm by presenting a heuristic to measure adherence to constraints and by providing a criterion for determining the model complexity (number of classes) for constraint-based clustering. We demonstrate and evaluate CPPC on artificial data and on our motivating domain of land cover classification. For the latter, an evaluation by domain experts shows that the algorithm discovers novel class definitions that are better suited to land cover classification than the original set of labels.
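The abstract does not spell out how the L × L constraint matrix is encoded or how CPPC achieves its cost reduction, so the following is only a minimal sketch under stated assumptions. It assumes signed class-level weights `W[a, b]` (positive suggesting "merge", negative suggesting "keep separate"), integer labels `y`, and soft cluster assignments `Q`; all of these names are hypothetical. It illustrates why constraints defined per class pair need not be enumerated over all N^2 instance pairs: the pairwise sum can be rewritten in terms of per-class totals.

```python
import numpy as np

def pairwise_agreement_naive(W, y, Q):
    """Sum over all ordered pairs (i, j), including i == j, of
    W[y_i, y_j] * P(i and j share a cluster). Visits N^2 pairs."""
    n = len(y)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += W[y[i], y[j]] * float(Q[i] @ Q[j])
    return total

def pairwise_agreement_by_class(W, y, Q):
    """Same quantity, computed from per-class sums of the soft assignments.
    Every point is touched once; the rest depends only on L and K."""
    L, K = W.shape[0], Q.shape[1]
    S = np.zeros((L, K))
    np.add.at(S, y, Q)  # S[a] = sum of Q[i] over points with label a
    # sum_k sum_{a,b} S[a, k] * W[a, b] * S[b, k]
    return float(np.sum((S.T @ W) * S.T))

# Illustrative data only (assumption): 3 original classes, 4 clusters.
rng = np.random.default_rng(0)
N, L, K = 60, 3, 4
y = rng.integers(0, L, size=N)
Q = rng.random((N, K))
Q /= Q.sum(axis=1, keepdims=True)      # rows are soft cluster memberships
W = rng.uniform(-1.0, 1.0, size=(L, L))
W = (W + W.T) / 2.0                    # symmetric class-level beliefs

naive = pairwise_agreement_naive(W, y, Q)
fast = pairwise_agreement_by_class(W, y, Q)
print(np.isclose(naive, fast))
```

The aggregated form is an algebraic identity for this particular penalty-style sum; whether CPPC uses this exact rearrangement is not stated in the abstract.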

Supplementary Material

JPG File (kdd2010_preston_rcdu_01.jpg)
MOV File (kdd2010_preston_rcdu_01.mov)



      Published In
      KDD '10: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
      July 2010
      1240 pages
      ISBN:9781450300551
      DOI:10.1145/1835804
      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. class discovery
      2. constraint-based clustering
      3. kdd-process
      4. mining scientific data
      5. remote sensing

      Qualifiers

      • Research-article

      Conference

      KDD '10

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


      Cited By

      • (2024) COR Themes for Readability from Iterative Feedback. Proceedings of the CHI Conference on Human Factors in Computing Systems. 10.1145/3613904.3642108 (1-23). Online publication date: 11-May-2024
      • (2019) Research on Evaluation Method of Wargame Strategy Based on Fuzzy Petri Net. 2019 2nd International Conference on Information Systems and Computer Aided Education (ICISCAE). 10.1109/ICISCAE48440.2019.221710 (626-629). Online publication date: Sep-2019
      • (2019) Research on Deep Reinforcement Learning Exploration Strategy in Wargame Deduction. 2019 2nd International Conference on Information Systems and Computer Aided Education (ICISCAE). 10.1109/ICISCAE48440.2019.221709 (622-625). Online publication date: Sep-2019
      • (2018) Modified CURE algorithm with enhancement to identify number of clusters. International Journal of Artificial Intelligence and Soft Computing 5:3. 10.1504/IJAISC.2016.078517 (226-240). Online publication date: 13-Dec-2018
      • (2017) Non-intrusive monitoring of overlapping home appliances using smart meter measurements. 2017 IEEE Power and Energy Conference at Illinois (PECI). 10.1109/PECI.2017.7935717 (1-5). Online publication date: Feb-2017
      • (2013) Data Mining, A Promising Tool for Large-Area Cropland Mapping. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6:5. 10.1109/JSTARS.2013.2238507 (2132-2138). Online publication date: Oct-2013
      • (2011) Serendipitous learning. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. 10.1145/2020408.2020608 (1343-1351). Online publication date: 21-Aug-2011
