A smarter process for sensing the information space

Authors:

W. S. Spangler,

A. Behal

IBM Journal of Research and Development, Volume 54, Issue 4

Pages 375 - 387

https://doi.org/10.1147/JRD.2010.2050541

Published: 01 July 2010

Abstract

As a result of the growth of the Internet, the amount of available information is exponentially increasing. However, increasing the amount of information does not imply increasing usefulness. Furthermore, as the complexity of business relationships increases, there is a natural tendency toward less structured interaction between entities. This highlights the growing relevance of unstructured information in documenting the interactions of organizations and individuals. Analyzing and making sense of this unstructured information space requires more than text-mining algorithms; it requires a strategic approach. We propose a unified approach that addresses a variety of information space analytics problems. Our method for making sense of unstructured data is described by six steps that are analogous to the algebraic order of operations PEMDAS (parenthesis, exponent, multiplication, division, addition, and subtraction). These basic text-mining operations can be combined in many interesting ways to handle a diverse set of problems, and just as in algebra, it is critical that these operations be performed in the correct order to guarantee a meaningful result. In this paper, we describe how PEMDAS has been implemented within organizations to enable decisions that produced measurable business value.


Published In

IBM Journal of Research and Development Volume 54, Issue 4

July 2010

95 pages

ISSN: 0018-8646

Publisher

IBM Corp.

United States

Publication History

Published: 01 July 2010

Accepted: 21 August 2009

Received: 14 July 2009

Qualifiers

Article
