Abstract
The long short-term memory (LSTM) network is widely applied to multi-dimensional time series modeling to solve many real-world problems, and visual analytics plays a crucial role in improving its interpretability. Understanding the high-dimensional activations in the hidden layers of the model requires dimensionality reduction (DR), but the diversity of available DR techniques makes selecting an appropriate one difficult. In this paper, we examine the applicability of DR techniques to visual analysis of LSTM hidden activity in multi-dimensional time series modeling, comparing four representative techniques: principal component analysis (PCA), multi-dimensional scaling (MDS), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Using both the original continuous modeling data and its symbolically discretized counterpart as the knowledge the model learns, we associate these data with the LSTM hidden layer activity and compare how well each DR technique preserves the high-dimensional information of the hidden layer activations. Guided by the structure of the LSTM and the characteristics of the modeling data, we conducted controlled experiments on five typical tasks: the quality evaluation of DR, the abstract representation of high and low hidden layers, the association analysis between model and output variable, the importance analysis of input features, and the exploration of temporal regularity. From the full experimental process and a detailed analysis of the results, we distill systematic guidance to help analysts select appropriate and effective DR techniques for visual analytics of LSTM.
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Notes
The ETT dataset was acquired at https://paperswithcode.com/dataset/ett
References
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
Gunning, D., Aha, D.: DARPA’s explainable artificial intelligence (XAI) program. AI Mag. 40(2), 44–58 (2019). https://doi.org/10.1609/aimag.v40i2.2850
Lipton, Z.C., Berkowitz, J., Elkan, C.: A critical review of recurrent neural networks for sequence learning. arXiv:1506.00019 (2015)
Chu, Y., Fei, J., Hou, S.: Adaptive global sliding-mode control for dynamic systems using double hidden layer recurrent neural network structure. IEEE Trans. Neural Netw. Learn. Syst. (2020). https://doi.org/10.1109/TNNLS.2019.2919676
Bäuerle, A., Albus, P., Störk, R., Seufert, T., Ropinski, T.: Explornn: teaching recurrent neural networks through visual exploration. Visual Comput. (2023). https://doi.org/10.1007/s00371-022-02593-0
Liu, S., Maljovec, D., Wang, B., Bremer, P.T., Pascucci, V.: Visualizing high-dimensional data: advances in the past decade. IEEE Trans. Visual Comput. Graphics 23(3), 1249–1268 (2017). https://doi.org/10.1109/TVCG.2016.2640960
Ali, M., Jones, M.W., Xie, X., Williams, M.: Timecluster: dimension reduction applied to temporal data for visual analytics. Vis. Comput. 35(6–8), 1013–1026 (2019). https://doi.org/10.1007/s00371-019-01673-y
Ballester-Ripoll, R., Halter, G., Pajarola, R.: High-dimensional scalar function visualization using principal parameterizations. Visual Comput. (2023). https://doi.org/10.1007/s00371-023-02937-4
La Rosa, B., Blasilli, G., Bourqui, R., Auber, D., Santucci, G., Capobianco, R., Bertini, E., Giot, R., Angelini, M.: State of the art of visual analytics for explainable deep learning. In: Pierre, A., Helwig, H. (eds.) Computer graphics forum, vol. 42, pp. 319–355. Wiley, London (2023)
Zhao, Y., Luo, F., Chen, M., Wang, Y., Xia, J., Zhou, F., Wang, Y., Chen, Y., Chen, W.: Evaluating multi-dimensional visualizations for understanding fuzzy clusters. IEEE Trans. Visual Comput. Graphics 25(1), 12–21 (2019). https://doi.org/10.1109/TVCG.2018.2865020
Strobelt, H., Gehrmann, S., Pfister, H., Rush, A.M.: LSTMVis: a tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE Trans. Visual Comput. Graphics 24(1), 667–676 (2018). https://doi.org/10.1109/TVCG.2017.2744158
Hohman, F., Kahng, M., Pienta, R., Chau, D.H.: Visual analytics in deep learning: an interrogative survey for the next frontiers. IEEE Trans. Visual Comput. Graphics 25(8), 2674–2693 (2019). https://doi.org/10.1109/TVCG.2018.2843369
Alicioglu, G., Sun, B.: A survey of visual analytics for explainable artificial intelligence methods. Comput. Graph. 102, 502–520 (2022)
Wold, S., Esbensen, K., Geladi, P.: Principal component analysis. Chemom. Intell. Lab. Syst. 2(1), 37–52 (1987). https://doi.org/10.1016/0169-7439(87)80084-9
Cox, M.A.A., Cox, T.F.: Multidimensional scaling, pp. 315–347. Springer, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-33037-0_14
Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)
McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426 (2018). https://doi.org/10.48550/arXiv.1802.03426
Van der Maaten, L., Postma, E., Herik, H.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10, 66–71 (2007)
Jia, W., Sun, M., Lian, J., Hou, S.: Feature dimensionality reduction: a review. Complex Intell. Syst. 8(3), 2663–2693 (2022). https://doi.org/10.1007/s40747-021-00637-x
De Lorenzo, A., Medvet, E., Tušar, T., Bartoli, A.: An analysis of dimensionality reduction techniques for visualizing evolution. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’19, pp. 1864–1872. Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3319619.3326868
Xia, J., Zhang, Y., Song, J., Chen, Y., Wang, Y., Liu, S.: Revisiting dimensionality reduction techniques for visual cluster analysis: an empirical study. IEEE Trans. Visual Comput. Graphics 28(1), 529–539 (2022). https://doi.org/10.1109/TVCG.2021.3114694
Ayesha, S., Hanif, M.K., Talib, R.: Overview and comparative study of dimensionality reduction techniques for high dimensional data. Inform. Fusion 59, 44–58 (2020). https://doi.org/10.1016/j.inffus.2020.01.005
Armstrong, G., Rahman, G., Martino, C., McDonald, D., Gonzalez, A., Mishne, G., Knight, R.: Applications and comparison of dimensionality reduction methods for microbiome data. Front. Bioinform. (2022). https://doi.org/10.3389/fbinf.2022.821861
Jain, R., Kumar, A., Nayyar, A., Dewan, K., Garg, R., Raman, S., Ganguly, S.: Explaining sentiment analysis results on social media texts through visualization. Multimed. Tools Appl. 82(15), 22613–22629 (2023). https://doi.org/10.1007/s11042-023-14432-y
Holzinger, A.: The next frontier: AI we can really trust. Proc. ECML PKDD 2021, 427–440 (2021). https://doi.org/10.1007/978-3-030-93736-2_33
Holzinger, A., Dehmer, M., Emmert-Streib, F., Cucchiara, R., Augenstein, I., Del Ser, J., Samek, W., Jurisica, I., Díaz-Rodríguez, N.: Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Information Fusion 79, 263–278 (2022). https://doi.org/10.1016/j.inffus.2021.10.007
Choo, J., Liu, S.: Visual analytics for explainable deep learning. IEEE Comput. Graphics Appl. 38(4), 84–92 (2018). https://doi.org/10.1109/MCG.2018.042731661
Ras, G., Xie, N., Van Gerven, M., Doran, D.: Explainable deep learning: a field guide for the uninitiated. J. Artif. Intell. Res. 73, 329–396 (2022)
Zahavy, T., Ben-Zrihem, N., Mannor, S.: Graying the black box: understanding DQNs. In: International Conference on Machine Learning, pp. 1899–1908. PMLR (2016). http://proceedings.mlr.press/v48/zahavy16.html
Gabella, M., Afambo, N., Ebli, S., Spreemann, G.: Topology of learning in artificial neural networks (2019). https://doi.org/10.48550/arXiv.1902.08160
Rauber, P.E., Fadel, S.G., Falcão, A.X., Telea, A.C.: Visualizing the hidden activity of artificial neural networks. IEEE Trans. Visual Comput. Graph. 23(1), 101–110 (2017). https://doi.org/10.1109/TVCG.2016.2598838
Tang, Z., Shi, Y., Wang, D., Feng, Y., Zhang, S.: Memory visualization for gated recurrent neural networks in speech recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2736–2740 (2017). https://doi.org/10.1109/ICASSP.2017.7952654
Shen, Q., Wu, Y., Jiang, Y., Zeng, W., Lau, A.K.H., Vilanova, A., Qu, H.: Visual interpretation of recurrent neural network on multi-dimensional time-series forecast. In: 2020 IEEE Pacific Visualization Symposium (PacificVis), pp. 61–70 (2020). https://doi.org/10.1109/PacificVis48177.2020.2785
Ji, L., Yang, Y., Qiu, S., et al.: Visual analytics of RNN for thermal power control system identification. J. Comput. Aided Design Comput. Graph. 33(12), 1876–1886 (2021)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural Inf. Process. Syst. 30 (2017)
Espadoto, M., Martins, R.M., Kerren, A., Hirata, N.S., Telea, A.C.: Toward a quantitative survey of dimension reduction techniques. IEEE Trans. Visual Comput. Graph. 27(3), 2153–2173 (2019). https://doi.org/10.1109/TVCG.2019.2944182
Martins, R.M., Coimbra, D.B., Minghim, R., Telea, A.: Visual analysis of dimensionality reduction quality for parameterized projections. Comput. Graph. 41, 26–42 (2014). https://doi.org/10.1016/j.cag.2014.01.006
Gracia, A., González, S., Robles, V., Menasalvas, E.: A methodology to compare dimensionality reduction algorithms in terms of loss of quality. Inform. Sci. 270, 1–27 (2014). https://doi.org/10.1016/j.ins.2014.02.068
Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing sax: a novel symbolic representation of time series. Data Min. Knowl. Disc. 15(2), 107–144 (2007). https://doi.org/10.1007/s10618-007-0064-z
Karo, I.M.K., Maulana Adhinugraha, K., Huda, A.F.: A cluster validity for spatial clustering based on Davies–Bouldin index and polygon dissimilarity function. In: 2017 Second International Conference on Informatics and Computing (ICIC), pp. 1–6 (2017). https://doi.org/10.1109/IAC.2017.8280572
Natsukawa, H., Deyle, E.R., Pao, G.M., Koyamada, K., Sugihara, G.: A visual analytics approach for ecosystem dynamics based on empirical dynamic modeling. IEEE Trans. Visual Comput. Graph. 27(2), 506–516 (2021). https://doi.org/10.1109/TVCG.2020.3028956
Kindlmann, G., Scheidegger, C.: An algebraic process for visualization design. IEEE Trans. Visual Comput. Graph. 20(12), 2181–2190 (2014). https://doi.org/10.1109/TVCG.2014.2346325
Paulovich, F.V., Nonato, L.G., Minghim, R., Levkowitz, H.: Least square projection: a fast high-precision multidimensional projection technique and its application to document mapping. IEEE Trans. Visual Comput. Graph. 14(3), 564–575 (2008). https://doi.org/10.1109/TVCG.2007.70443
Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)
Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.: Informer: Beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI conference on artificial intelligence, vol. 35, pp. 11106–11115 (2021)
Acknowledgements
This work was supported by the NSFC under Grant No. 60873093 and the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-05).
Ethics declarations
Conflict of interest
Lianen Ji, Shirong Qiu, Zhi Xu, Yue Liu, and Guang Yang declare that they have no conflict of interest.
Ethical approval
This work is original research that has not been published before and is not under consideration for publication elsewhere.
Human or animal rights
This article does not include any studies of humans or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Model description
In this section, we introduce the working mechanism of the simple recurrent neural network (SRN) [44] and the basic long short-term memory (LSTM).
The SRN is the most basic form of recurrent neural network (RNN) and serves as a fundamental building block for more advanced variants. In forward propagation, the hidden state \(h_t\) at time step \(t\) is computed from the hidden state \(h_{t-1}\) at time step \(t-1\), the input \(x_t\) at time step \(t\), and the learning parameters (U, W, b), as follows:
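(The displayed equation did not survive extraction; the following is the standard SRN forward update consistent with the parameters named above, with \(\tanh\) as a representative activation.)

$$h_t = \tanh\left(U h_{t-1} + W x_t + b\right)$$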
When sequences are too long, the SRN suffers from vanishing or exploding gradients, which makes training it with gradient descent very difficult. Hochreiter and Schmidhuber [1] proposed the LSTM to solve this problem, enabling RNNs to be applied to long time series. Each LSTM hidden layer neuron has three gate controllers: the input gate \(i_t\), the forget gate \(f_t\), and the output gate \(o_t\). The core component of the LSTM is the memory cell, whose internal state, the cell state \(c_t\), stores and transmits information. The input gate controls the inflow of information, the forget gate controls how much historical state information of the hidden units is retained, and the output gate controls the outflow of information. The process can be expressed as
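(The displayed equations did not survive extraction; the following is the standard LSTM formulation [1], consistent with the gates defined above, where \(\sigma\) denotes the logistic sigmoid, \(\odot\) the element-wise product, and the subscripted weights are per-gate parameters.)

$$\begin{aligned} i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\ f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\ o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\ \tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\ c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ h_t &= o_t \odot \tanh\left(c_t\right) \end{aligned}$$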
where W and U are weight matrices and b is a bias vector; \(h_t\) is the activation output of the hidden layer.
Appendix B: Dataset description
The electricity transformer temperature (ETT) is a crucial indicator in long-term electric power deployment. The ETT dataset introduced by Zhou et al. [45] is a popular dataset for time series forecasting (TSF) tasks; it is publicly available at https://paperswithcode.com/dataset/ett. The dataset consists of 2 years of data from two separate counties in China, and the sampling interval is 1 min. Each data point consists of the target value “oil temperature” and 6 power load features. Figure 10 shows the temporal fluctuation curves of the normalized predicted variables in the ETT and HST datasets. A total of 18,000 samples are used as the training set, 1,000 as the validation set, and 1,000 as the test set. We selected this dataset for new multi-dimensional time series modeling, aiming to compare the applicability of dimensionality reduction (DR) techniques across the analysis tasks.
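For illustration, the following is a minimal data-preparation sketch consistent with the split described above; the file name, window length, and normalization scheme are assumptions, not the authors' exact pipeline.

```python
import numpy as np
import pandas as pd

# Load the ETT data (file name "ETTm1.csv" and a "date" column are assumed).
df = pd.read_csv("ETTm1.csv", parse_dates=["date"])
features = df.drop(columns=["date"]).to_numpy(dtype=np.float32)  # 6 loads + OT

# Normalize using statistics of the training portion only.
train_raw = features[:18000]
mean, std = train_raw.mean(axis=0), train_raw.std(axis=0)
features = (features - mean) / std

def windows(data, length=96):
    """Slice a series into overlapping input windows for the LSTM."""
    return np.stack([data[i:i + length] for i in range(len(data) - length)])

train = windows(features[:18000])        # 18,000 training samples
val   = windows(features[18000:19000])   # 1,000 validation samples
test  = windows(features[19000:20000])   # 1,000 test samples
```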
Appendix C: Additional case study with the ETT dataset and the basic LSTM
Here, we use the basic LSTM for ETT prediction and extract the activations of its hidden layers. The selected LSTM architecture consists of two hidden layers, each with 100 neurons. As before, we conducted comparative experiments on the DR techniques across the previously proposed visual analysis tasks. All projections shown were created from the activations of a test-set subset; inspecting a training-set subset provides similar insights.
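For concreteness, the following sketch shows how such activations can be extracted and projected. Only the two-layer, 100-neuron architecture comes from the text; the model class, names, and the use of PyTorch, scikit-learn, and umap-learn are our assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE
import umap  # umap-learn package

class LSTMForecaster(nn.Module):
    """Two hidden layers of 100 neurons, as stated above (names illustrative)."""
    def __init__(self, n_features=7, hidden=100):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)      # h_n: (num_layers=2, batch, 100)
        return self.head(out[:, -1]), h_n   # prediction + per-layer final states

x = torch.randn(256, 96, 7)                 # stand-in test-set subset
_, h_n = LSTMForecaster()(x)
low, high = h_n[0].detach().numpy(), h_n[1].detach().numpy()

# Project the high-layer activations with each of the four DR techniques.
projections = {
    "PCA":   PCA(n_components=2).fit_transform(high),
    "MDS":   MDS(n_components=2).fit_transform(high),
    "t-SNE": TSNE(n_components=2, perplexity=30).fit_transform(high),
    "UMAP":  umap.UMAP(n_components=2).fit_transform(high),
}
```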
The abstract representation of the high and low hidden layers. As shown in Fig. 11, the t-SNE and UMAP projections of the two hidden layers are not conducive to identifying the abstraction ability of the hidden layers: the activation shapes in the two layers' projection views differ little. After PCA processing, however, the projection points of the low hidden layer form a uniformly distributed circular shape, which cannot separate different types of samples well, while the projection points of the high hidden layer show a distribution with pronounced angles and edges, reflecting more complex and nonlinear inter-layer abstract representations. Unlike in the previous experiments, we do not observe clustered projection scatters for PCA and MDS in the low hidden layer, possibly because the time series fluctuations of the predicted variable in the ETT dataset are more intense than those in the HST dataset, as shown in Fig. 10.
The association analysis between model and output variable. As shown in Fig. 11, when the associated object of the activations is continuous, PCA and MDS produce smoother and more continuous scatter distributions than the other DR techniques. Over the course of training, the spatial projection views of all four DR techniques show the sample points becoming progressively better separated. However, only the projection transformation of PCA matches the magnitude of the actual activation changes, which again reflects the advantage of linear DR techniques. Additionally, we use the same method to obtain discrete symbolic datasets and project the samples of each class. The results in Fig. 12 show that t-SNE and UMAP yield superior separation in this setting.
The importance analysis of input features. As shown in Table 6, we calculate the correlation coefficients between the input variables and the predicted variable: \(X_2\) has the strongest correlation with the predicted variable, followed by \(X_4\) and \(X_6\). Table 5 shows the statistics of the feature importance obtained by the DR techniques under different parameter settings. We find that the stronger the correlation between an input variable and the predicted variable, the greater its feature importance, especially for DR techniques with a strong ability to preserve the global neighborhood. Therefore, consistent with the previous experiments, PCA and MDS perform better on this task.
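The correlation screen referenced above can be reproduced along these lines; this is a minimal sketch with synthetic stand-in data and assumed variable names, not the authors' pipeline.

```python
import numpy as np

def pearson_importance(X, y):
    """Rank input features by |Pearson r| with the target variable."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))
    return [(f"X_{j + 1}", r[j]) for j in order]

X = np.random.randn(1000, 6)   # stand-in for the six input features
y = 0.8 * X[:, 1] + 0.3 * X[:, 3] + 0.1 * np.random.randn(1000)
print(pearson_importance(X, y))  # X_2 should rank first in this synthetic case
```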
The exploration of temporal regularity. After observing the temporal projection views of numerous samples, we notice that PCA and MDS exhibit more low-quality inflection points, making their projection trajectories less smooth than those of t-SNE and UMAP. Figure 13 also confirms that the appearance of inflection points is associated with the techniques' ability to preserve local neighborhoods, which highlights the advantage of t-SNE and UMAP in this task.
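One way to quantify such inflection points is to measure the turning angle at each interior point of a projection trajectory; the following sketch assumes this metric, which the text does not specify.

```python
import numpy as np

def turning_angles(traj):
    """traj: (T, 2) array of consecutive time steps in projection space.
    Returns the turning angle (degrees) at each interior trajectory point."""
    v = np.diff(traj, axis=0)
    cos = np.sum(v[:-1] * v[1:], axis=1) / (
        np.linalg.norm(v[:-1], axis=1) * np.linalg.norm(v[1:], axis=1) + 1e-12)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

traj = np.cumsum(np.random.randn(50, 2), axis=0)  # stand-in trajectory
sharp = np.sum(turning_angles(traj) > 120)         # count sharp inflections
print(f"{sharp} sharp inflection points")
```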
Since we selected a new dataset and model structure for this experiment, the activation patterns we observe in the hidden layers have changed to some extent. Nevertheless, the DR techniques show performance similar to the previous experiments across the LSTM visual analysis tasks, which reinforces the generalizability of the guidelines proposed in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ji, L., Qiu, S., Xu, Z. et al. Comparing dimensionality reduction techniques for visual analysis of the LSTM hidden activity on multi-dimensional time series modeling. Vis Comput 40, 8243–8261 (2024). https://doi.org/10.1007/s00371-023-03235-9