
A smarter process for sensing the information space

Published: 01 July 2010

Abstract

The growth of the Internet has caused the amount of available information to increase exponentially. However, more information is not necessarily more useful. Furthermore, as business relationships grow more complex, interactions between entities naturally tend to become less structured, which makes unstructured information increasingly relevant for documenting the interactions of organizations and individuals. Analyzing and making sense of this unstructured information space requires more than text-mining algorithms; it requires a strategic approach. We propose a unified approach that addresses a variety of information space analytics problems. Our method for making sense of unstructured data consists of six steps that are analogous to the algebraic order of operations PEMDAS (parentheses, exponents, multiplication, division, addition, and subtraction). These basic text-mining operations can be combined in many ways to handle a diverse set of problems and, just as in algebra, they must be performed in the correct order to guarantee a meaningful result. In this paper, we describe how PEMDAS has been implemented within organizations to enable decisions that produced measurable business value.


Cited By

  • (2011) Conceptualizing smart city with dimensions of technology, people, and institutions. Proceedings of the 12th Annual International Digital Government Research Conference: Digital Government Innovation in Challenging Times, pp. 282-291. doi:10.1145/2037556.2037602. Online publication date: 12-Jun-2011


Published In

IBM Journal of Research and Development  Volume 54, Issue 4
July 2010
95 pages

Publisher

IBM Corp.

United States

Publication History

Published: 01 July 2010
Accepted: 21 August 2009
Received: 14 July 2009
