
Showing 1–50 of 58 results for author: Murtagh, F

  1. arXiv:1902.10655  [pdf, other]

    cs.IR

    Linear Time Visualization and Search in Big Data using Pixellated Factor Space Mapping

    Authors: Fionn Murtagh

    Abstract: It is demonstrated how linear computational time and storage efficient approaches can be adopted when analyzing very large data sets. More importantly, interpretation is aided and furthermore, basic processing is easily supported. Such basic processing can be the use of supplementary, i.e. contextual, elements, or particular associations. Furthermore pixellated grid cell contents can be utilized a…

    Submitted 27 February, 2019; originally announced February 2019.

    Comments: 12 pages, 4 figures. From IFCS 2017 Conference, Tokyo, Japan

    MSC Class: 62-07; 62-09; 60E99 ACM Class: G.3; H.2.8

  2. arXiv:1805.11140  [pdf, other]

    cs.CL

    Core Conflictual Relationship: Text Mining to Discover What and When

    Authors: Fionn Murtagh, Giuseppe Iurato

    Abstract: Following detailed presentation of the Core Conflictual Relationship Theme (CCRT), there is the objective of relevant methods for what has been described as verbalization and visualization of data. Such is also termed data mining and text mining, and knowledge discovery in data. The Correspondence Analysis methodology, also termed Geometric Data Analysis, is shown in a case study to be comprehensi…

    Submitted 28 May, 2018; originally announced May 2018.

    Comments: 25 pages, 10 figures

  3. arXiv:1705.08503  [pdf, other]

    cs.CY

    The Geometry and Topology of Data and Information for Analytics of Processes and Behaviours: Building on Bourdieu and Addressing New Societal Challenges

    Authors: Fionn Murtagh

    Abstract: We begin by summarizing the relevance and importance of inductive analytics based on the geometry and topology of data and information. Contemporary issues are then discussed. These include how sampling data for representativity is increasingly to be questioned. While we can always avail of analytics from a "bag of tools and techniques", in the application of machine learning and predictive analyt…

    Submitted 15 May, 2017; originally announced May 2017.

    Comments: 16 pages, 7 figures

    MSC Class: 62H25; 62P25 ACM Class: G.3; I.5.1

  4. arXiv:1704.01871  [pdf, other]

    stat.ML

    Massive Data Clustering in Moderate Dimensions from the Dual Spaces of Observation and Attribute Data Clouds

    Authors: Fionn Murtagh

    Abstract: Cluster analysis of very high dimensional data can benefit from the properties of such high dimensionality. Informally expressed, in this work, our focus is on the analogous situation when the dimensionality is moderate to small, relative to a massively sized set of observations. Mathematically expressed, these are the dual spaces of observations and attributes. The point cloud of observations is…

    Submitted 6 April, 2017; originally announced April 2017.

    Comments: 17 pages, 2 figures

    MSC Class: 62H30; 91C20 ACM Class: H.3.3; I.5.3

  5. Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

    Authors: Fionn Murtagh

    Abstract: This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data is to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects such as elliptical, spiral, active, and merging galaxies at a wide range of redshifts. Our aim is matching and…

    Submitted 12 December, 2016; originally announced December 2016.

    Comments: 15 pages, 6 figures, 3 tables

    MSC Class: 11Y35; 85-08; 62H30 ACM Class: I.5.3; H.3.3; G.3; J.2

    Journal ref: Astroinformatics, Proceedings of the International Astronomical Union, Vol. 12, Issue S325, pp. 145-155, 2016

  6. Contextualizing Geometric Data Analysis and Related Data Analytics: A Virtual Microscope for Big Data Analytics

    Authors: Fionn Murtagh, Mohsen Farid

    Abstract: The relevance and importance of contextualizing data analytics is described. Qualitative characteristics might form the context of quantitative analysis. Topics that are at issue include: contrast, baselining, secondary data sources, supplementary data sources, dynamic and heterogeneous data. In geometric data analysis, especially with the Correspondence Analysis platform, various case studies are…

    Submitted 15 September, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

    Comments: 19 pages, 8 figures, 2 tables, Journal of Interdisciplinary Methodologies and Issues in Science, vol. 3, 2017. This version contains DOI, ISSN

    MSC Class: 62H30; 68P01; 6207 ACM Class: G.3; H.2.8; I.2.1

    Journal ref: Journal of Interdisciplinary Methodologies and Issues in Sciences (September 19, 2017) jimis:2570

  7. Qualitative Judgement of Research Impact: Domain Taxonomy as a Fundamental Framework for Judgement of the Quality of Research

    Authors: Fionn Murtagh, Michael Orlov, Boris Mirkin

    Abstract: The appeal of metric evaluation of research impact has attracted considerable interest in recent times. Although the public at large and administrative bodies are much interested in the idea, scientists and other researchers are much more cautious, insisting that metrics are but an auxiliary instrument to the qualitative peer-based judgement. The goal of this article is to propose availing of such…

    Submitted 8 April, 2018; v1 submitted 11 July, 2016; originally announced July 2016.

    Comments: 22 pages, 7 figures, Journal of Classification, Online First, March 25, 2018

    MSC Class: 68P01 ACM Class: H.0, I.5.3, G.3

  8. Sparse p-Adic Data Coding for Computationally Efficient and Effective Big Data Analytics

    Authors: Fionn Murtagh

    Abstract: We develop the theory and practical implementation of p-adic sparse coding of data. Rather than the standard, sparsifying criterion that uses the $L_0$ pseudo-norm, we use the p-adic norm. We require that the hierarchy or tree be node-ranked, as is standard practice in agglomerative and other hierarchical clustering, but not necessarily with decision trees. In order to structure the data, all comp…

    Submitted 23 April, 2016; originally announced April 2016.

    Comments: 20 pages, 6 figures

    MSC Class: 94B27; 62H30; 68P01 ACM Class: E.2; E.4; G.2.2; H.3.3

    Journal ref: p-Adic Numbers, Ultrametric Analysis and Applications, 8(3), 2016, pp. 236-247

  9. arXiv:1604.06952  [pdf, ps, other]

    cs.CL stat.ML

    Visualization of Jacques Lacan's Registers of the Psychoanalytic Field, and Discovery of Metaphor and of Metonymy. Analytical Case Study of Edgar Allan Poe's "The Purloined Letter"

    Authors: Fionn Murtagh, Giuseppe Iurato

    Abstract: We start with a description of Lacan's work that we then take into our analytics methodology. In a first investigation, a Lacan-motivated template of the Poe story is fitted to the data. A segmentation of the storyline is used in order to map out the diachrony. Based on this, it will be shown how synchronous aspects, potentially related to Lacanian registers, can be sought. This demonstrates the e…

    Submitted 30 January, 2017; v1 submitted 23 April, 2016; originally announced April 2016.

    Comments: 34 pages, 9 figures

    MSC Class: 62H25; 62H30 ACM Class: I.5.3; I.5.4; I.2; G.2.2; G.3

  10. arXiv:1512.04052  [pdf, other]

    stat.ML cs.LG

    Big Data Scaling through Metric Mapping: Exploiting the Remarkable Simplicity of Very High Dimensional Spaces using Correspondence Analysis

    Authors: Fionn Murtagh

    Abstract: We present new findings in regard to data analysis in very high dimensional spaces. We use dimensionalities up to around one million. A particular benefit of Correspondence Analysis is its suitability for carrying out an orthonormal mapping, or scaling, of power law distributed data. Power law distributed data are found in many domains. Correspondence factor analysis provides a latent semantic or…

    Submitted 13 December, 2015; originally announced December 2015.

    Comments: 13 pages, 3 figures

    MSC Class: 62H25 ACM Class: E.0; G.3; H.3.3; I.5

  11. arXiv:1507.01529  [pdf, other]

    cs.CL

    Correspondence Factor Analysis of Big Data Sets: A Case Study of 30 Million Words; and Contrasting Analytics using Apache Solr and Correspondence Analysis in R

    Authors: Fionn Murtagh

    Abstract: We consider a large number of text data sets. These are cooking recipes. Term distribution and other distributional properties of the data are investigated. Our aim is to look at various analytical approaches which allow for mining of information on both high and low detail scales. Metric space embedding is fundamental to our interest in the semantic properties of this data. We consider the projec…

    Submitted 6 July, 2015; originally announced July 2015.

    Comments: 38 pages, 17 figures

    MSC Class: 62H25; 62.07 ACM Class: G.3; H.2.8

  12. arXiv:1409.1039  [pdf, other]

    cs.SI cs.CY physics.soc-ph

    Visualizing and Quantifying Impact and Effect in Twitter Narrative using Geometric Data Analysis

    Authors: Fionn Murtagh, Monica Pianosi, Richard Bull

    Abstract: We use geometric multivariate data analysis which has been termed a methodology for both the visualization and verbalization of data. The general objectives are data mining and knowledge discovery. In the first case study, we use the narrative surrounding very highly profiled tweets, and thus a Twitter event of significance and importance. In the second case study, we use eight carefully planned T…

    Submitted 14 September, 2014; v1 submitted 3 September, 2014; originally announced September 2014.

    Comments: 34 pages, 11 figures

    MSC Class: 66H25; 62H30; 91F99 ACM Class: I.7; I.5.3; H.3.1; H.2.8; G.3

  13. Pattern Recognition in Narrative: Tracking Emotional Expression in Context

    Authors: Fionn Murtagh, Adam Ganz

    Abstract: Using geometric data analysis, our objective is the analysis of narrative, with narrative of emotion being the focus in this work. The following two principles for analysis of emotion inform our work. Firstly, emotion is revealed not as a quality in its own right but rather through interaction. We study the 2-way relationship of Ilsa and Rick in the movie Casablanca, and the 3-way relationship of…

    Submitted 4 May, 2015; v1 submitted 14 May, 2014; originally announced May 2014.

    Comments: 21 pages, 7 figures

    MSC Class: 62H25; 62H30; 62.07 ACM Class: H.2.8; H.3; I.5; I.7.0; J.5

    Journal ref: Journal of Data Mining & Digital Humanities, 2015 (May 26, 2015) jdmdh:647

  14. arXiv:1309.3611  [pdf, other]

    cs.AI

    Ultrametric Component Analysis with Application to Analysis of Text and of Emotion

    Authors: Fionn Murtagh

    Abstract: We review the theory and practice of determining what parts of a data set are ultrametric. It is assumed that the data set, to begin with, is endowed with a metric, and we include discussion of how this can be brought about if a dissimilarity, only, holds. The basis for part of the metric-endowed data set being ultrametric is to consider triplets of the observables (vectors). We develop a novel co…

    Submitted 13 September, 2013; originally announced September 2013.

    Comments: 49 pages, 15 figures, 52 citations

    MSC Class: 62H30; 68T10 ACM Class: I.2.0; H.3.3; I.5.3

  15. arXiv:1308.3745  [pdf, other]

    cs.HC

    Computational Properties of Fiction Writing and Collaborative Work

    Authors: Joseph Reddington, Fionn Murtagh, Douglas Cowie

    Abstract: From the earliest days of computing, there have been tools to help shape narrative. Spell-checking, word counts, and readability analysis, give today's novelists tools that Dickens, Austen, and Shakespeare could only have dreamt of. However, such tools have focused on the word, or phrase levels. In the last decade, research focus has shifted to support for collaborative editing of documents. This…

    Submitted 16 August, 2013; originally announced August 2013.

    Comments: 13 pages, 6 figures

    MSC Class: 91C20; 62H30; 76M27 ACM Class: J.5; H.1.2; H.3.3; I.5.3; I.2.7

  16. arXiv:1209.0125  [pdf, other]

    cs.DL cs.LG stat.ML

    A History of Cluster Analysis Using the Classification Society's Bibliography Over Four Decades

    Authors: Fionn Murtagh, Michael J. Kurtz

    Abstract: The Classification Literature Automated Search Service, an annual bibliography based on citation of one or more of a set of around 80 book or journal publications, ran from 1972 to 2012. We analyze here the years 1994 to 2011. The Classification Society's Service, as it was termed, has been produced by the Classification Society. In earlier decades it was distributed as a diskette or CD with the J…

    Submitted 16 August, 2013; v1 submitted 1 September, 2012; originally announced September 2012.

    Comments: 23 pages, 9 figures

    MSC Class: 62H30 ACM Class: I.5.3; H.3.3

  17. arXiv:1202.3451  [pdf, ps, other]

    cs.IR stat.ML

    The Future of Search and Discovery in Big Data Analytics: Ultrametric Information Spaces

    Authors: Fionn Murtagh, Pedro Contreras

    Abstract: Consider observation data, comprised of n observation vectors with values on a set of attributes. This gives us n points in attribute space. Having data structured as a tree, implied by having our observations embedded in an ultrametric topology, offers great advantage for proximity searching. If we have preprocessed data through such an embedding, then an observation's nearest neighbor is found i…

    Submitted 15 February, 2012; originally announced February 2012.

    Comments: 10 pages

    MSC Class: 11Z05 ACM Class: I.5.3; H.3.3; E.2

  18. arXiv:1201.2719  [pdf, ps, other]

    cs.AI cs.CL

    Ultrametric Model of Mind, II: Application to Text Content Analysis

    Authors: Fionn Murtagh

    Abstract: In a companion paper, Murtagh (2012), we discussed how Matte Blanco's work linked the unrepressed unconscious (in the human) to symmetric logic and thought processes. We showed how ultrametric topology provides a most useful representational and computational framework for this. Now we look at the extent to which we can find ultrametricity in text. We use coherent and meaningful collections of nea…

    Submitted 16 July, 2012; v1 submitted 12 January, 2012; originally announced January 2012.

    Comments: 21 pages, 6 tables. arXiv admin note: substantial text overlap with arXiv:cs/0701181 (V3: minor corrections)

    MSC Class: 68T01 ACM Class: I.2.0; I.2.3; J.4

    Journal ref: p-Adic Numbers, Ultrametric Analysis and Applications, 4, 207-221, 2012

  19. Ultrametric Model of Mind, I: Review

    Authors: Fionn Murtagh

    Abstract: We mathematically model Ignacio Matte Blanco's principles of symmetric and asymmetric being through use of an ultrametric topology. We use for this the highly regarded 1975 book of this Chilean psychiatrist and psychoanalyst (born 1908, died 1995). Such an ultrametric model corresponds to hierarchical clustering in the empirical data, e.g. text. We show how an ultrametric topology can be used as a…

    Submitted 16 July, 2012; v1 submitted 12 January, 2012; originally announced January 2012.

    Comments: 20 pages, 2 figures, 46 references. arXiv admin note: substantial text overlap with arXiv:0709.0116, arXiv:0805.2744, and arXiv:1105.0121 (V3: 2 typos corrected)

    MSC Class: 68T01 ACM Class: I.2.0; I.2.3; J.4

    Journal ref: p-Adic Numbers, Ultrametric Analysis and Applications, 4, 193-206, 2012

  20. arXiv:1111.6285  [pdf, other]

    stat.ML cs.CV stat.AP

    Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm

    Authors: Fionn Murtagh, Pierre Legendre

    Abstract: The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. However there are different interpretations in the literature and there are different implementations of the Ward agglomerative algorithm in commonly used software systems, including differing expressions of…

    Submitted 11 December, 2011; v1 submitted 27 November, 2011; originally announced November 2011.

    Comments: 20 pages, 21 citations, 4 figures

    MSC Class: 62H30; 91C20 ACM Class: G.3; H.3.3

    Journal ref: Journal of Classification, 31 (3), 274-295, 2014

  21. Fast, Linear Time, m-Adic Hierarchical Clustering for Search and Retrieval using the Baire Metric, with linkages to Generalized Ultrametrics, Hashing, Formal Concept Analysis, and Precision of Data Measurement

    Authors: Fionn Murtagh, Pedro Contreras

    Abstract: We describe many vantage points on the Baire metric and its use in clustering data, or its use in preprocessing and structuring data in order to support search and retrieval operations. In some cases, we proceed directly to clusters and do not directly determine the distances. We show how a hierarchical clustering can be read directly from one pass through the data. We offer insights also on pract…

    Submitted 27 November, 2011; originally announced November 2011.

    Comments: 17 pages, 45 citations, 2 figures

    MSC Class: 11Z05 ACM Class: H.3.3; E.2

    Journal ref: P-Adic Numbers, Ultrametric Analysis, and Applications, 4 (1), 45-56, 2012

  22. arXiv:1106.2229  [pdf, other]

    stat.ML cs.IR stat.AP

    Fast, Linear Time Hierarchical Clustering using the Baire Metric

    Authors: Pedro Contreras, Fionn Murtagh

    Abstract: The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorit…

    Submitted 11 June, 2011; originally announced June 2011.

    Comments: 27 pages, 6 tables, 10 figures

    MSC Class: 11Z05 ACM Class: H.3.3

    Journal ref: Journal of Classification, July 2012, Volume 29, Issue 2, pp 118-143

  23. arXiv:1105.2976  [pdf, other]

    stat.AP

    Current Trends in Evolving Specialization in UK Universities

    Authors: Fionn Murtagh

    Abstract: There are very significant changes taking place in the university sector and in related higher education institutes in many parts of the world. In this work we look at financial data from 2010 and 2011 from the UK higher education sector. Situating ourselves to begin with in the context of teaching versus research in universities, we look at the data in order to explore the new divergence between…

    Submitted 27 November, 2011; v1 submitted 15 May, 2011; originally announced May 2011.

    Comments: 58th World Statistics Congress of the International Statistical Institute, invited plenary presentation, IPS057, Data Mining and Machine Learning in Statistics Organizations

    MSC Class: 62H86; 62H30 ACM Class: H.2.8; G.3

  24. arXiv:1105.0121  [pdf, ps, other]

    cs.IR cs.CV math.ST stat.ML

    Methods of Hierarchical Clustering

    Authors: Fionn Murtagh, Pedro Contreras

    Abstract: We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps, and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally we describe a recently developed very efficient (linear time) hierarchical clustering al…

    Submitted 30 April, 2011; originally announced May 2011.

    Comments: 21 pages, 2 figures, 1 table, 69 references

    MSC Class: 62H30 ACM Class: H.3.3; H.2.8; G.3

  25. arXiv:1104.4063  [pdf, ps, other]

    cs.IR astro-ph.IM stat.ML

    Fast redshift clustering with the Baire (ultra) metric

    Authors: Fionn Murtagh, Pedro Contreras

    Abstract: The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costl…

    Submitted 20 April, 2011; originally announced April 2011.

    Comments: 14 pages, 6 figures

    MSC Class: 62H30; 85-08; 11S82 ACM Class: E.5; H.3; E.2

  26. arXiv:1011.3241  [pdf]

    cs.AI cs.HC stat.AP

    New Methods of Analysis of Narrative and Semantics in Support of Interactivity

    Authors: Fionn Murtagh, Adam Ganz, Joe Reddington

    Abstract: Our work has focused on support for film or television scriptwriting. Since this involves potentially varied story-lines, we note the implicit or latent support for interactivity. Furthermore the film, television, games, publishing and other sectors are converging, so that cross-over and re-use of one form of product in another of these sectors is ever more common. Technically our work has been la…

    Submitted 14 November, 2010; originally announced November 2010.

    Comments: 17 pages, 6 figures

    ACM Class: G.3; I.2.1; H.1.2

    Journal ref: Entertainment Computing, 2, 115-121, 2011

  27. arXiv:1008.3585  [pdf, other]

    cs.LO cs.LG stat.ML

    Ultrametric and Generalized Ultrametric in Computational Logic and in Data Analysis

    Authors: Fionn Murtagh

    Abstract: Following a review of metric, ultrametric and generalized ultrametric, we review their application in data analysis. We show how they allow us to explore both geometry and topology of information, starting with measured data. Some themes are then developed based on the use of metric, ultrametric and generalized ultrametric in logic. In particular we study approximation chains in an ultrametric or…

    Submitted 20 August, 2010; originally announced August 2010.

    Comments: 19 pp., 5 figures, 3 tables

    MSC Class: 91C20; 62-07; 03-XX ACM Class: I.5.3; F.4.0

  28. arXiv:1006.1343  [pdf]

    cs.CL stat.ML

    Segmentation and Nodal Points in Narrative: Study of Multiple Variations of a Ballad

    Authors: Fionn Murtagh, Adam Ganz

    Abstract: The Lady Maisry ballads afford us a framework within which to segment a storyline into its major components. Segments and as a consequence nodal points are discussed for nine different variants of the Lady Maisry story of a (young) woman being burnt to death by her family, on account of her becoming pregnant by a foreign personage. We motivate the importance of nodal points in textual and literary…

    Submitted 7 June, 2010; originally announced June 2010.

    Comments: 27 pp., 13 figures. Submitted

    ACM Class: H.3.1; H.3.2; I.2.7

  29. arXiv:1005.2638  [pdf, other]

    stat.ML cs.CV cs.LG

    Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

    Authors: Fionn Murtagh, Pedro Contreras

    Abstract: Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. "Structure" can be understood as symmetry and a range of symmetries are expressed by hierarchy. Such symmetries directly point to invariants, that pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy…

    Submitted 14 May, 2010; originally announced May 2010.

    Comments: 41 pages, 13 figures, 6 tables. 81 references

    MSC Class: 62H30; 68P01 ACM Class: G.3; H.2.8; H.3.3

  30. arXiv:0912.1262  [pdf, ps, other]

    cs.CY

    Open Access, Intellectual Property, and How Biotechnology Becomes a New Software Science

    Authors: Fionn Murtagh

    Abstract: Innovation is slowing greatly in the pharmaceutical sector. It is considered here how part of the problem is due to overly limiting intellectual property relations in the sector. On the other hand, computing and software in particular are characterized by great richness of intellectual property frameworks. Could the intellectual property ecosystem of computing come to the aid of the biosciences…

    Submitted 7 December, 2009; originally announced December 2009.

    Comments: 7 pages

    ACM Class: K.4; K.5

    Journal ref: CEPIS UPGRADE, vol. XI, no. 4, pp. 50-64, 2010

  31. Scale-Based Gaussian Coverings: Combining Intra and Inter Mixture Models in Image Segmentation

    Authors: Fionn Murtagh, Pedro Contreras, Jean-Luc Starck

    Abstract: By a "covering" we mean a Gaussian mixture model fit to observed data. Approximations of the Bayes factor can be availed of to judge model fit to the data within a given Gaussian mixture model. Between families of Gaussian mixture models, we propose the Rényi quadratic entropy as an excellent and tractable model comparison framework. We exemplify this using the segmentation of an MRI image volum…

    Submitted 2 September, 2009; originally announced September 2009.

    Comments: 20 pages, 5 figures

    ACM Class: I.4.6

    Journal ref: Entropy, 11 (3), 513-528, 2009

  32. Tag Clouds for Displaying Semantics: The Case of Filmscripts

    Authors: F. Murtagh, A. Ganz, S. McKie, J. Mothe, K. Englmeier

    Abstract: We relate tag clouds to other forms of visualization, including planar or reduced dimensionality mapping, and Kohonen self-organizing maps. Using a modified tag cloud visualization, we incorporate other information into it, including text sequence and most pertinent words. Our notion of word pertinence goes beyond just word frequency and instead takes a word in a mathematical sense as located at…

    Submitted 23 May, 2009; originally announced May 2009.

    Comments: 23 pages, 7 figures

    ACM Class: I.5.4; I.2.7; H.3.1

    Journal ref: Information Visualization 9, 253-262, 2010

  33. Ultrametric Wavelet Regression of Multivariate Time Series: Application to Colombian Conflict Analysis

    Authors: Fionn Murtagh, Michael Spagat, Jorge A. Restrepo

    Abstract: We first pursue the study of how hierarchy provides a well-adapted tool for the analysis of change. Then, using a time sequence-constrained hierarchical clustering, we develop the practical aspects of a new approach to wavelet regression. This provides a new way to link hierarchical relationships in a multivariate time series data set with external signals. Violence data from the Colombian confl…

    Submitted 16 February, 2009; originally announced February 2009.

    Comments: 36 pages, 13 figures

    Journal ref: IEEE Transactions on Systems, Man, and Cybernetics - Part A, Systems and Humans, 2011

  34. arXiv:0811.2519  [pdf, ps, other]

    cs.CY cs.DL

    Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering

    Authors: Fionn Murtagh

    Abstract: The history of data analysis that is addressed here is underpinned by two themes, -- those of tabular data analysis, and the analysis of collected heterogeneous data. "Exploratory data analysis" is taken as the heuristic approach that begins with data and information and seeks underlying explanation for what is observed or measured. I also cover some of the evolving context of research and appli…

    Submitted 15 November, 2008; originally announced November 2008.

    Comments: 26 pages

    Journal ref: Electronic Journal for History of Probability and Statistics, Vol. 4, no. 2, Dec. 2008

  35. arXiv:0809.0874  [pdf, other]

    cs.CY cs.GL

    Between the Information Economy and Student Recruitment: Present Conjuncture and Future Prospects

    Authors: Fionn Murtagh

    Abstract: In university programs and curricula, in general we react to the need to meet market needs. We respond to market stimulus, or at least try to do so. Consider now an inverted view. Consider our data and perspectives in university programs as reflecting and indeed presaging economic trends. In this article I pursue this line of thinking. I show how various past events fit very well into this new v…

    Submitted 4 September, 2008; originally announced September 2008.

    Comments: 18 pages, 4 figures

    ACM Class: K.0; K.1; K.3.0; K.4.3; K.7.0

    Journal ref: CEPIS UPGRADE, vol. IX, no. 5, pp. 56-64, Oct. 2008

  36. From Data to the p-Adic or Ultrametric Model

    Authors: Fionn Murtagh

    Abstract: We model anomaly and change in data by embedding the data in an ultrametric space. Taking our initial data as cross-tabulation counts (or other input data formats), Correspondence Analysis allows us to endow the information space with a Euclidean metric. We then model anomaly or change by an induced ultrametric. The induced ultrametric that we are particularly interested in takes a sequential -…

    Submitted 2 September, 2008; originally announced September 2008.

    Comments: 15 pages, 6 figures. To appear in: Proceedings of Third International Conference on p-Adic Mathematical Physics: From Planck Scale Physics to Complex Systems to Biology, Steklov Mathematics Institute, Russian Academy of Sciences

    Journal ref: p-Adic Numbers, Ultrametric Analysis and Applications, 1, 58-68, 2009

  37. Discussion of: Treelets--An adaptive multi-Scale basis for sparse unordered data

    Authors: Fionn Murtagh

    Abstract: Discussion of "Treelets--An adaptive multi-Scale basis for sparse unordered data" [arXiv:0707.0481]

    Submitted 25 July, 2008; originally announced July 2008.

    Comments: Published in at http://dx.doi.org/10.1214/08-AOAS137A the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS137A

    Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 2, 472-473

  38. The Correspondence Analysis Platform for Uncovering Deep Structure in Data and Information

    Authors: Fionn Murtagh

    Abstract: We study two aspects of information semantics: (i) the collection of all relationships, (ii) tracking and spotting anomaly and change. The first is implemented by endowing all relevant information spaces with a Euclidean metric in a common projected space. The second is modelled by an induced ultrametric. A very general way to achieve a Euclidean embedding of different information spaces based o…

    Submitted 2 September, 2008; v1 submitted 6 July, 2008; originally announced July 2008.

    Comments: Sixth Annual Boole Lecture in Informatics, Boole Centre for Research in Informatics, Cork, Ireland, 29 April 2008. 28 pp., 17 figures. To appear, Computer Journal. This version: 3 typos corrected

    ACM Class: I.5.4; H.3.1; I.2.7

    Journal ref: Computer Journal, 53 (3), 304-315, 2010

  39. The Structure of Narrative: the Case of Film Scripts

    Authors: Fionn Murtagh, Adam Ganz, Stewart McKie

    Abstract: We analyze the style and structure of story narrative using the case of film scripts. The practical importance of this is noted, especially the need to have support tools for television movie writing. We use the Casablanca film script, and scripts from six episodes of CSI (Crime Scene Investigation). For analysis of style and structure, we quantify various central perspectives discussed in McKee…

    Submitted 24 May, 2008; originally announced May 2008.

    Comments: 28 pages, 7 figures, 21 references

    ACM Class: I.5.4; I.2.7; H.3.1

    Journal ref: Pattern Recognition, 42 (2), 302-312, 2009

  40. The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering

    Authors: Fionn Murtagh

    Abstract: An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sparsity i…

    Submitted 16 November, 2008; v1 submitted 18 May, 2008; originally announced May 2008.

    Comments: 36 pages, 18 figures, 36 references

    Journal ref: Journal of Classification, 26 (3), 249-277, 2009

  41. Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy

    Authors: Fionn Murtagh

    Abstract: Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. "Structure" has long been understood as symmetry which can take many forms with respect to any transformation, including point, translationa…

    Submitted 1 June, 2009; v1 submitted 18 May, 2008; originally announced May 2008.

    Comments: 35 pages, 3 figures, 84 references

    Journal ref: Proceedings of Steklov Institute of Mathematics, 265, 177-198, 2009

  42. Geometric Data Analysis, From Correspondence Analysis to Structured Data Analysis (book review)

    Authors: Fionn Murtagh

    Abstract: Review of: Brigitte Le Roux and Henry Rouanet, Geometric Data Analysis, From Correspondence Analysis to Structured Data Analysis, Kluwer, Dordrecht, 2004, xi+475 pp.

    Submitted 8 April, 2008; originally announced April 2008.

    Comments: 5 pages, 8 citations. Accepted in Journal of Classification

    ACM Class: I.5; G.3; H.3; I.7; J.4

    Journal ref: Journal of Classification 25, 137-141, 2008

  43. arXiv:0802.3528  [pdf, other]

    cs.CV

    Wavelet and Curvelet Moments for Image Classification: Application to Aggregate Mixture Grading

    Authors: Fionn Murtagh, Jean-Luc Starck

    Abstract: We show the potential for classifying images of mixtures of aggregate, based themselves on varying, albeit well-defined, sizes and shapes, in order to provide a far more effective approach compared to the classification of individual sizes and shapes. While a dominant (additive, stationary) Gaussian noise component in image data will ensure that wavelet coefficients are of Gaussian distribution,…

    Submitted 24 February, 2008; originally announced February 2008.

    Comments: Submitted to Pattern Recognition Letters

    Journal ref: Pattern Recognition Letters, 29, 1557-1564, 2008

  44. On Ultrametric Algorithmic Information

    Authors: Fionn Murtagh

    Abstract: How best to quantify the information of an object, whether natural or artifact, is a problem of wide interest. A related problem is the computability of an object. We present practical examples of a new way to address this problem. By giving an appropriate representation to our objects, based on a hierarchical coding of information, we exemplify how it is remarkably easy to compute complex objec…

    Submitted 29 September, 2007; v1 submitted 2 September, 2007; originally announced September 2007.

    Comments: Forthcoming, Computer Journal. Minor corrections 29 Oct. 2007

    ACM Class: I.2.0

    Journal ref: Computer Journal, 53, 405-416, 2010

  45. arXiv:physics/0702064  [pdf, ps, other]

    physics.data-an

    Hilbert Space Becomes Ultrametric in the High Dimensional Limit: Application to Very High Frequency Data Analysis

    Authors: Fionn Murtagh

    Abstract: An ultrametric topology formalizes the notion of hierarchical structure. An ultrametric embedding, referred to here as ultrametricity, is implied by a natural hierarchical embedding. Such hierarchical structure can be global in the data set, or local. By quantifying extent or degree of ultrametricity in a data set, we show that ultrametricity becomes pervasive as dimensionality and/or spatial sp…

    Submitted 7 February, 2007; originally announced February 2007.

    Comments: 22 pp., 9 figs., 4 tables

  46. arXiv:cs/0702067  [pdf, ps, other]

    cs.IR

    The Haar Wavelet Transform of a Dendrogram: Additional Notes

    Authors: Fionn Murtagh

    Abstract: We consider the wavelet transform of a finite, rooted, node-ranked, $p$-way tree, focusing on the case of binary ($p = 2$) trees. We study a Haar wavelet transform on this tree. Wavelet transforms allow for multiresolution analysis through translation and dilation of a wavelet function. We explore how this works in our tree context.

    Submitted 10 February, 2007; originally announced February 2007.

    Comments: 37 pp, 1 fig. Supplementary material to "The Haar Wavelet Transform of a Dendrogram", http://arxiv.org/abs/cs.IR/0608107

    ACM Class: I.5.3; H.3.1; I.1.m; I.7.m

  47. arXiv:cs/0701181  [pdf, ps, other]

    cs.CL

    A Note on Local Ultrametricity in Text

    Authors: Fionn Murtagh

    Abstract: High dimensional, sparsely populated data spaces have been characterized in terms of ultrametric topology. This implies that there are natural, not necessarily unique, tree or hierarchy structures defined by the ultrametric topology. In this note we study the extent of local ultrametric topology in texts, with the aim of finding unique "fingerprints" for a text or corpus, discriminating betwee…

    Submitted 27 January, 2007; originally announced January 2007.

    Comments: 18 pp

    ACM Class: I.5.3; I.7.2; H.3

  48. arXiv:cs/0701180  [pdf, ps, other]

    cs.IR

    Ontology from Local Hierarchical Structure in Text

    Authors: F. Murtagh, J. Mothe, K. Englmeier

    Abstract: We study the notion of hierarchy in the context of visualizing textual data and navigating text collections. A formal framework for "hierarchy" is given by an ultrametric topology. This provides us with a theoretical foundation for concept hierarchy creation. A major objective is scalable annotation or labeling of concept maps. Serendipitously we pursue other objectives such as deriving…

    Submitted 27 January, 2007; originally announced January 2007.

    Comments: 35 pp., 12 figures

    ACM Class: H.5; I.5.3; H.5.2; I.7.2; H.3

  49. The Haar Wavelet Transform of a Dendrogram

    Authors: Fionn Murtagh

    Abstract: We describe a new wavelet transform, for use on hierarchies or binary rooted trees. The theoretical framework of this approach to data analysis is described. Case studies are used to further exemplify this approach. A first set of application studies deals with data array smoothing, or filtering. A second set of application studies relates to hierarchical tree condensation. Finally, a third stud…

    Submitted 19 February, 2007; v1 submitted 28 August, 2006; originally announced August 2006.

    Comments: 38 pp, 8 figures. Forthcoming in Journal of Classification

    ACM Class: I.5.3; H.3.1; I.1.m

    Journal ref: Journal of Classification, 24, 3-32, 2007

  50. arXiv:math/0605555  [pdf, ps, other]

    math.ST

    Ultrametric embedding: application to data fingerprinting and to fast data clustering

    Authors: Fionn Murtagh

    Abstract: We begin with pervasive ultrametricity due to high dimensionality and/or spatial sparsity. How extent or degree of ultrametricity can be quantified leads us to the discussion of varied practical cases when ultrametricity can be partially or locally present in data. We show how the ultrametricity can be assessed in text or document collections, and in time series signals. An aspect of importance…

    Submitted 28 January, 2007; v1 submitted 19 May, 2006; originally announced May 2006.

    Comments: 14 pages, 1 figure. New content and modified title compared to the 19 May 2006 version

    Report number: P.M. Pardalos and P. Hansen, Eds., Data Mining and Mathematical Programming, CRM Proceedings & Lecture Notes Vol. 45, American Mathematical Society, 199-209, 2008 MSC Class: 62H30; 68P30; 68P20