Online Handwritten Diagram Recognition with Graph Attention Networks

Yun, Xiao-Long; Zhang, Yan-Ming; Ye, Jun-Yu; Liu, Cheng-Lin

doi:10.1007/978-3-030-34120-6_19

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11901)

Included in the following conference series:

International Conference on Image and Graphics

Abstract

Handwritten text recognition has been researched extensively for decades and has achieved remarkable success in recent years. However, handwritten diagram recognition is still a challenging task because of the complex 2D structure and writing style variation. This paper presents a general framework for online handwritten diagram recognition based on graph attention networks (GAT). We model each diagram as a graph in which nodes represent strokes and edges represent the relationships between strokes. Then, we learn GAT models to classify graph nodes, taking both stroke features and the relationships between strokes into consideration. To better exploit the spatial and temporal relationships, we enhance the original GAT model with a novel attention mechanism. Experiments on two online handwritten flowchart datasets and a finite automata dataset show that our method consistently outperforms previous methods and achieves state-of-the-art performance.

1 Introduction

Handwriting is one of the most natural and efficient ways for humans to record information. With the widespread use of smartphones, tablet computers and electronic whiteboards, recording information on smart devices has become a popular choice because of its convenience. As a result, handwritten text recognition has been intensively studied over the last decades and widely applied in many fields. However, the recognition and analysis of 2D diagrams, such as flowcharts, circuits and music scores, are still challenging because of their complex 2D structures and large variation in writing style.

Existing methods for online handwritten diagram recognition and interpretation can be roughly divided into two categories: bottom-up [4, 5, 6, 15, 27] and top-down [1, 11, 14, 21, 22]. Bottom-up approaches perform a symbol segmentation step followed by a recognition step. However, because errors accumulate across the two steps, these methods often yield low recognition accuracy. Top-down approaches, on the other hand, integrate the two steps in one framework, such as probabilistic graphical models (PGM), and perform segmentation and recognition simultaneously. Top-down methods typically achieve higher accuracy but suffer from high computational cost because of their complicated learning and inference algorithms. We review these methods in more detail in the next section.

In this work, we propose an efficient and high-accuracy method for online handwritten diagram recognition. In particular, we treat diagram stroke classification as a graph node classification problem and solve it with attention-based graph neural networks (GNN) [13, 20]. Compared with PGM, such as conditional random fields (CRF) and Markov random fields (MRF), GNN is more powerful and flexible in learning the stroke representation and exploiting the contextual information. Unlike PGM, the learning and inference algorithms of GNN are very simple and efficient, which makes it very suitable for large-scale applications.

We highlight the main contributions of this work as follows. First, we propose a general online handwritten diagram recognition method based on GNN. Second, to better exploit the relationships between strokes, we enhance the original GAT [20] by introducing a novel attention mechanism. Third, on three popular benchmark datasets, our method consistently outperforms existing methods and achieves state-of-the-art results.

In the rest of this paper, we first provide a general review of existing online handwritten diagram recognition works and a brief review of GNN in Sect. 2. In Sect. 3, we give a detailed introduction to the proposed method. The experimental setting and comparison results are described in Sect. 4. Finally, Sect. 5 draws our concluding remarks.

2 Related Works

2.1 Diagram Recognition

Because it is difficult to separate text from non-text strokes or to segment symbols precisely at an early stage of diagram recognition, some works considered only graphic symbols, and others imposed constraints on users. Qi et al. [16] presented a flowchart recognition system using Bayesian conditional random fields, but their dataset included only very simple graphic symbols and no text. Yuan et al. [27] proposed a hybrid model combining support vector machines (SVM) and hidden Markov models (HMM) for programming teaching. Miyao et al. [15] presented a flowchart recognition and editing system that segmented the symbols based on loop structure and recognized them using SVMs. Although [15, 27] allowed flowcharts to contain both symbols and text, they placed many constraints on users: [27] required users to draw each symbol with a single stroke, and [15] required users to differentiate text from graphic symbols.

Awal et al. [1] proposed two methods for flowchart recognition, one bottom-up and one top-down. In the former, text and graphic symbols were separated based on the entropy of strokes, and then a time delay neural network (TDNN) or an SVM was applied to recognize the graphical symbols. For the latter, they introduced a global recognition architecture based on the TDNN and a dynamic programming (DP) algorithm.

Flowcharts are documents with complex 2D structures, so previous statistical approaches [1, 15, 27] have reported limited performance because they ignore structural information. Lemaitre et al. [14] proposed a method that handles segmentation and recognition simultaneously. Their model integrated structural and syntactic priors of flowcharts using the Enhanced Position Formalism (EPF) language [8], and then used the Description and MOdification of the Segmentation (DMOS) method [8] to segment and recognize the flowchart in one step. Their method made great progress in stroke labeling and symbol recognition compared with [1], but it is too restricted and rigid to adapt to other domains, and it cannot describe every variation of a symbol.

To exploit structural information, Carton et al. [7] presented an approach based on a human-like perceptive mechanism that incorporated both structural and statistical information of a flowchart. As in [14], the work used DMOS to express circular and quadrilateral symbols, and then proposed a deformation measure to quantify how well a shape forms a quadrilateral.

In handwritten diagrams, arrows vary widely in appearance and are harder to recognize than other symbols when the same classifier is used. Bresler et al. [4, 5, 6] proposed a framework in which strokes were first classified as text or non-text, then non-text strokes were clustered and uniform symbols were classified with an SVM, and finally the arrows were detected. For structure analysis, they modeled the whole flowchart excluding text as a max-sum problem and solved it with integer linear programming [3]. This approach achieved state-of-the-art results on three handwritten datasets. However, the recognition system has some severe limitations; for example, each arrow must consist of a shaft and a head, which may lead to recognition failures if one of them is absent.

Wang et al. [21, 22] proposed a general model, the max-margin MRF, which combines MRF and structural SVM to perform stroke segmentation and recognition simultaneously. By exploiting temporal and spatial relationships between strokes, their model greatly improved the stroke labeling accuracy. To lower the complexity of evaluating stroke grouping candidates in a diagram, Julca-Aguilar et al. [11] applied the Faster R-CNN model [17] to the detection of online handwritten graphics by converting the original online data to offline images. Despite the overall high performance of flowchart symbol detection, the arrow detection accuracy is not satisfactory, and the conversion into images loses the temporal information of strokes.

2.2 Graph Neural Networks

In recent years, graph neural networks (GNN) have received extensive attention and have become one of the most active research topics in deep learning. With their capability of capturing dependencies between objects and operating on non-Euclidean domains [28], GNN have achieved great success in many tasks, such as relational reasoning [2] and text classification [24]. Kipf et al. [13] proposed a simple and efficient layer-wise propagation rule for graph convolutional networks (GCN) based on spectral graph convolution, and their model achieved significant improvements on several graph-structured datasets. Veličković et al. [20] put forward a novel GNN architecture, graph attention networks (GAT), which introduced a masked self-attention mechanism to tackle some key challenges of GNN. Recently, Ye et al. [26] proposed a new GAT framework for stroke classification, which demonstrated the great potential of GNN for online handwritten document recognition.

3 Method

We are given N labeled online handwritten diagrams ${D}=\left\{ \left( X^{i}, Y^{i}\right) | i \in [1, N]\right\} $, where each diagram $X^{i}$ is composed of a sequence of strokes $X^{i}=\left\{ X_{s}^{i} | s \in \left[ 1, M_{i}\right] \right\} $ ($M_{i}$ is the number of strokes in $ X^{i}$) and $Y_{s}^{i}$ is the label of $X_{s}^{i}$, which takes a discrete semantic annotation such as process, decision or arrow. Our goal is to learn a model from the training set D that predicts the labels of strokes in test diagrams as accurately as possible.

Roughly speaking, our method models each diagram as a graph in which nodes represent strokes and edges represent the relationships between strokes. Then, we treat diagram stroke classification as a graph node classification problem and solve it with attention-based GNN. The proposed method is composed of three modules: construction of the diagram graph, extraction of node and edge features from raw signals, and graph attention networks. They are introduced in turn below.

3.1 Graph Building

Here we introduce a new approach to abstract the structural information in a diagram: each handwritten diagram is formulated as a space-time relationship graph (STRG). Every stroke $s_{i}$ is represented as a vertex ${v}_{i} \in V$, and the relevance in space and time between strokes $s_{i}$ and $s_{j}$ is denoted as an edge ${e}_{ij} \in E$ in the graph G(V, E), where V is the vertex set and E is the edge set of G.

Specifically, from the temporal perspective, we build the edges $E_T=\{(t,t+1) | t \in [1, n - 1]\}$ between every pair of temporally adjacent strokes in the diagram, where n is the number of strokes.

From the spatial perspective, for a stroke ${s}_{s} $, the edge set ${e}_{s,N(s)} $ is added to E(G), where N(s) is the set of all spatial neighbors of stroke ${s}_{s} $. Two strokes are regarded as neighbors of each other if the minimal Euclidean distance between them is less than the spatial neighbor threshold (SNT). The hyperparameter SNT is carefully tuned on the validation set. We also tried building more complex STRGs for a document, but this had little effect on the experimental results. Figure 1 shows an example of a flowchart rendered from the original data and its corresponding STRG.
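The graph construction described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names `min_stroke_distance` and `build_strg`, the representation of a stroke as a list of (x, y) points, the brute-force point-to-point distance, and the 0-based stroke indices are all choices made for the example.

```python
import math

def min_stroke_distance(a, b):
    """Smallest Euclidean distance between any sampled point of stroke a and of stroke b."""
    return min(math.dist(p, q) for p in a for q in b)

def build_strg(strokes, snt):
    """Return the undirected edge set of a space-time relationship graph.

    strokes: strokes in writing order, each a list of (x, y) points (assumed format).
    snt: spatial neighbor threshold, a hyperparameter tuned on validation data.
    """
    n = len(strokes)
    edges = set()
    # Temporal edges: connect each stroke to the next one in writing order.
    for t in range(n - 1):
        edges.add((t, t + 1))
    # Spatial edges: connect stroke pairs whose minimal distance is below SNT.
    for i in range(n):
        for j in range(i + 1, n):
            if min_stroke_distance(strokes[i], strokes[j]) < snt:
                edges.add((i, j))
    return edges

strokes = [
    [(0.0, 0.0), (1.0, 0.0)],    # stroke 0
    [(10.0, 0.0), (11.0, 0.0)],  # stroke 1: far from stroke 0, but next in time
    [(1.0, 0.5), (1.0, 1.5)],    # stroke 2: spatially close to stroke 0
]
edges = build_strg(strokes, snt=1.0)
print(sorted(edges))
```

In this toy diagram, strokes 0 and 2 are linked only because they are close in space, while the other two edges come from the writing order. The pairwise loop is quadratic in the number of strokes; a spatial index would be a natural replacement for larger diagrams.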

3.2 Feature Extraction

For each stroke in an online document, 10 local features and 13 context features [25] are extracted as node features in the STRG. These features have proven very effective in previous works [19, 25]. In addition, 19 edge features [25] are extracted from stroke pairs to model the relations between strokes. In feature pre-processing, we apply a power transformation with coefficient 0.5 followed by normalization with mean $\mu $ and standard deviation $\sigma $. The original feature h therefore becomes:

$$\begin{aligned}&h^{\prime }=\text {sign}(h) \sqrt{|h|} \end{aligned}$$

(1)

$$\begin{aligned}&h^{\prime \prime }=\left( h^{\prime }-\mu \right) /\sigma \end{aligned}$$

(2)

where sign($\cdot $) is the sign function.
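As a small worked illustration of Eqs. (1) and (2), the sketch below applies the two steps to a single feature observed over four strokes. The helper names are hypothetical. The use of the population standard deviation, and the computation of the statistics over this toy sample, are assumptions, since the text does not say how $\mu $ and $\sigma $ are estimated.

```python
import math

def power_transform(h):
    """Eq. (1): signed square root, i.e. a power transformation with coefficient 0.5."""
    return math.copysign(math.sqrt(abs(h)), h)

def standardize(values):
    """Eq. (2): subtract the mean and divide by the standard deviation.

    Uses the population standard deviation (an assumption of this sketch).
    """
    mu = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

raw = [-9.0, 0.0, 4.0, 16.0]                      # one feature over four strokes
transformed = [power_transform(h) for h in raw]   # the sign is kept, the magnitude is compressed
normalized = standardize(transformed)
print(transformed)
print(normalized)
```

The signed square root compresses large magnitudes while keeping the sign, which reduces the influence of heavy-tailed feature values before standardization.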

3.3 Graph Attention Networks

In this section, we introduce the enhanced GAT model, which is constructed by stacking multiple graph attention layers.

The inputs to each graph attention layer are a set of node features, $\mathbf{H}=\left\{ \overrightarrow{h}_{1}, \overrightarrow{h}_{2}, \ldots , \overrightarrow{h}_{|V|}\right\} , \overrightarrow{h}_{i} \in \mathbb {R}^{C}$, and a set of edge features $\mathbf{F}=\left\{ \overrightarrow{f}_{i j} |(i, j) \in {E}\right\} $, $\overrightarrow{f}_{i j}\in \mathbb {R}^{D}$, where |V| is the number of nodes, and C and D are the dimensionalities of the node features and edge features, respectively. The layer generates a new set of node features, $\mathbf{H}^{\prime }=\left\{ \overrightarrow{h}^{\prime }_{1}, \overrightarrow{h}_{2}^{\prime }, \ldots , \overrightarrow{h}_{|V|}^{\prime }\right\} , \overrightarrow{h}_{i}^{\prime }\in \mathbb {R}^{C^{\prime }}$, where $C^{\prime }$ is the dimension of the output features.

In each layer, the first step applies a shared linear transformation to every node; a shared attention mechanism then computes attention coefficients using self-attention on the nodes:

$$\begin{aligned} c_{i j}=a\left( \mathbf{W}_{h} \overrightarrow{h}_{i}, \mathbf{W}_{h} \overrightarrow{h}_{j}\right) \end{aligned}$$

(3)

where $\mathbf{W}_{h}$ is a shared learnable weight matrix for the node-wise feature transformation. The node attention mechanism $a: \mathbb {R}^{C^{\prime }} \times \mathbb {R}^{C^{\prime }} \rightarrow \mathbb {R}$ used in this work is the additive attention parameterized by a learnable weight $\overrightarrow{a}_{h} \in \mathbb {R}^{C^{\prime }}$ with an activation function $\sigma $, which is formulated as:

$$\begin{aligned} c_{i j}=\sigma \left( \overrightarrow{a}_{h}^{T}\left( \mathbf{W}_{h} \overrightarrow{h}_{i}+\mathbf{W}_{h} \overrightarrow{h}_{j}\right) \right) . \end{aligned}$$

(4)

In addition to computing attention coefficients with the self-attention mechanism, we also incorporate edge features to measure the importance of edges by applying a one-layer feedforward neural network:

$$\begin{aligned} c_{i j}^{\prime }=\sigma \left( \overrightarrow{a}_{f}^{T} \sigma \left( \mathbf{W}_{f} \overrightarrow{f}_{ij}+\overrightarrow{b}_{f}\right) \right) \end{aligned}$$

(5)

where $\mathbf{W}_{f} \in \mathbb {R}^{C^{\prime } \times D}, \overrightarrow{b}_{f} \in \mathbb {R}^{C^{\prime }}, \overrightarrow{a}_{f} \in \mathbb {R}^{C^{\prime }}$ are all learnable parameters. In this work, we use Leaky ReLU as the activation function.
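To make Eqs. (4) and (5) concrete, the following sketch evaluates both unnormalized scores for one edge with tiny hand-picked weights. It is a plain-Python illustration rather than the authors' PyTorch/DGL code; the helper names and all numeric values are made up for the example, and the Leaky ReLU slope of 0.2 is the value reported in Sect. 4.2.

```python
def leaky_relu(x, slope=0.2):
    """Leaky ReLU with negative slope 0.2 (the value reported in Sect. 4.2)."""
    return x if x > 0 else slope * x

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def node_score(W_h, a_h, h_i, h_j):
    """Eq. (4): additive attention on the transformed features of nodes i and j."""
    z = [p + q for p, q in zip(matvec(W_h, h_i), matvec(W_h, h_j))]
    return leaky_relu(dot(a_h, z))

def edge_score(W_f, b_f, a_f, f_ij):
    """Eq. (5): one-layer feedforward network applied to the edge feature."""
    hidden = [leaky_relu(u + b) for u, b in zip(matvec(W_f, f_ij), b_f)]
    return leaky_relu(dot(a_f, hidden))

# Toy sizes (assumed): C = 2 node features, D = 3 edge features, C' = 2 hidden units.
W_h = [[1.0, 0.0], [0.0, 1.0]]
a_h = [1.0, -1.0]
W_f = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_f = [0.0, 0.0]
a_f = [0.5, 0.5]

c = node_score(W_h, a_h, h_i=[1.0, 2.0], h_j=[3.0, 0.0])
c_edge = edge_score(W_f, b_f, a_f, f_ij=[2.0, -1.0, 4.0])
print(c, c_edge)
```

Because Eq. (4) sums the two transformed node vectors before projecting, the node score is symmetric in i and j; any asymmetry between the two directions of an edge has to come from the edge features in Eq. (5).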

Note that the coefficients above are not comparable across different nodes. Consequently, they are normalized across all neighbors using the softmax function:

$$\begin{aligned} \alpha _{i j}=\text {softmax}_{j}\left( c_{i j}+c_{i j}^{\prime }\right) =\frac{\exp \left( c_{i j}+c_{i j}^{\prime }\right) }{\sum _{k \in N(i)} \exp \left( c_{i k}+c^{\prime }_{i k}\right) } \end{aligned}$$

(6)

where N(i) is the neighborhood of node i.
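The normalization in Eq. (6) can be sketched as a softmax restricted to the neighbors of a single node. The function name, the dictionary-based representation and the example scores are assumptions made for illustration; subtracting the maximum is a standard numerical-stability trick that leaves the result unchanged.

```python
import math

def attention_weights(scores):
    """Eq. (6): softmax over the neighbors of one node.

    scores maps each neighbor index j to the combined score c_ij + c'_ij.
    """
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

# Node i has three neighbors (1, 2 and 4); each score is node score + edge score.
alpha = attention_weights({1: 2.0 + 0.9, 2: 0.5 + 0.1, 4: 0.5 + 0.1})
print(alpha)
```

Only edges present in the graph contribute to the denominator, which is what "masked" attention means in this setting: non-neighbors receive no weight at all.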

The final output features for every node are computed by aggregating weighted node features of neighbors with attention coefficients:

$$\begin{aligned} \overrightarrow{h}_{i}^{\prime }=\sigma \left( \sum _{j \in N(i)} \alpha _{i j} \mathbf{W}_{h} \overrightarrow{h}_{j}\right) . \end{aligned}$$

(7)
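A minimal sketch of the aggregation in Eq. (7) follows. The names `aggregate`, `alpha_i` and `H` are hypothetical, the weights are toy values, and using Leaky ReLU with slope 0.2 as the outer activation is an assumption of this example.

```python
def leaky_relu(x, slope=0.2):
    """Leaky ReLU; the slope 0.2 is an assumed value for this sketch."""
    return x if x > 0 else slope * x

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def aggregate(alpha_i, W_h, H):
    """Eq. (7): activation of the attention-weighted sum of transformed neighbor features.

    alpha_i: dict mapping neighbor j to the normalized coefficient alpha_ij.
    H: dict mapping node index to its input feature vector.
    """
    total = [0.0] * len(W_h)
    for j, a in alpha_i.items():
        z = matvec(W_h, H[j])                        # shared linear transformation
        total = [t + a * zk for t, zk in zip(total, z)]
    return [leaky_relu(t) for t in total]

W_h = [[1.0, 0.0], [0.0, 1.0]]        # identity, so the arithmetic is easy to follow
H = {0: [4.0, 0.0], 1: [0.0, -4.0]}
h_new = aggregate({0: 0.25, 1: 0.75}, W_h, H)
print(h_new)
```

With these values the weighted sum is (1, -3), and the activation scales the negative component by the slope.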

Following Veličković et al. [20], we also adopt multi-head attention in our model. Specifically, K independent attention mechanisms execute the transformation of Eq. 7, and then their features are concatenated:

$$\begin{aligned} \overrightarrow{h}_{i}^{\prime }= \Vert _{k=1}^{K}\sigma \left( \sum _{j \in N(i)} \alpha _{i j}^{k} \mathbf{W}_{h}^{k} \overrightarrow{h}_{j}\right) \end{aligned}$$

(8)

in which || denotes concatenation operation, and ${\alpha _{i j}^{k}}$ are normalized attention coefficients calculated by the k-th attention mechanism $ {a}^{k}$.

In the final layer, we average the heads instead of concatenating them, and then apply the softmax function to output the predicted values:

$$\begin{aligned}&\overrightarrow{h}_{i}^{\prime }=\sigma \left( \frac{1}{K} \sum _{k=1}^{K} \sum _{j \in N(i)} \alpha _{i j}^{k} \mathbf{W}_{h}^{k} \overrightarrow{h}_{j}\right) \end{aligned}$$

(9)

$$\begin{aligned}&\overrightarrow{p}_{i}=\text {softmax}\left( \mathbf{W}_{o}\overrightarrow{h}_{i}^{\prime }\right) \end{aligned}$$

(10)

where $\mathbf{W}_{o} \in \mathbb {R}^{L \times C^{\prime }}$ is a learnable weight matrix that transforms features to outputs, and L is the number of stroke classes.
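The difference between concatenating heads in hidden layers (Eq. 8) and averaging them in the final layer (Eqs. 9 and 10) can be sketched as follows. All helper names and numbers are illustrative assumptions. The per-head weighted sums are taken as given, and `W_o` is stored with one row per class so that it can multiply the averaged feature vector.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def concat_heads(head_outputs):
    """Eq. (8): hidden layers concatenate the K per-head output vectors."""
    return [x for head in head_outputs for x in head]

def average_heads(head_sums):
    """Inner part of Eq. (9): the final layer averages the K per-head sums."""
    k = len(head_sums)
    return [sum(column) / k for column in zip(*head_sums)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Attention-weighted sums for one node from K = 2 heads, each of size C' = 2.
head_sums = [[1.0, 3.0], [3.0, 1.0]]

hidden = concat_heads([[leaky_relu(x) for x in head] for head in head_sums])
averaged = [leaky_relu(x) for x in average_heads(head_sums)]

W_o = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # 3 classes x C' (toy values)
p = softmax(matvec(W_o, averaged))              # Eq. (10)
print(hidden, averaged, p)
```

Concatenation grows the feature size to K times C', which suits intermediate layers, whereas averaging keeps the size at C' so that a single output projection can be applied.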

The standard cross-entropy loss on the training set is used to train the GAT model, which is formulated as:

$$\begin{aligned} L(\mathbf W )=-\sum _{i=1}^{N}\sum _{s=1}^{M_i}\text {log }\overrightarrow{p}_{s}^{i}\left( Y_{s}^{i}\right) , \end{aligned}$$

(11)

in which $\overrightarrow{p}_{s}^{i}\left( Y_{s}^{i}\right) $ is the predicted probability of the true label of stroke s in diagram i, and W encompasses all learnable parameters. W is initialized with Glorot initialization [9] and learned using mini-batch gradient descent. In practice, parameter optimization is performed with the Adam optimizer [12].
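The loss in Eq. (11) is a sum of negative log-probabilities over all strokes of all training diagrams. A minimal sketch, with a hypothetical nested-list data layout and made-up probabilities, is:

```python
import math

def cross_entropy(probabilities, labels):
    """Eq. (11): negative log-probability of the true label, summed over all strokes.

    probabilities: one entry per diagram, each a list of per-stroke class distributions.
    labels: same nesting, with the integer class index of every stroke.
    """
    loss = 0.0
    for diagram_probs, diagram_labels in zip(probabilities, labels):
        for p, y in zip(diagram_probs, diagram_labels):
            loss -= math.log(p[y])
    return loss

# Two diagrams: the first has two strokes, the second has one (two classes each).
probabilities = [[[0.5, 0.5], [0.25, 0.75]], [[1.0, 0.0]]]
labels = [[0, 0], [0]]
loss = cross_entropy(probabilities, labels)
print(loss)
```

Here the three strokes contribute log 2, log 4 and 0, so the total equals log 8. A practical implementation would work on logits with a log-softmax to avoid taking the logarithm of zero.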

4 Experiments

4.1 Dataset

In this work, we evaluate our method on three publicly accessible online handwritten diagram databases: FC_A [1], FC_B [5] and FA [6]. FC_A and FC_B are two flowchart databases which include text and six graphical symbols: terminator, connection, decision, data, process and arrow. FA is a finite automata database which encompasses state (circle), final state (pairwise concentric circles), arrow and label. Table 1 shows the details of the three databases.

Table 1. Online handwritten diagram datasets overview.

4.2 Experimental Setup

For all experiments, we adopt a seven-layer GAT model with 32 neurons in each hidden layer and residual connections [10]. Following [20], we use the Leaky ReLU activation function with a negative slope of 0.2 in the attention functions of Eqs. 4 and 5. We also apply dropout [18] with a dropout probability of 0.1 to all layers. In addition, every hidden layer consists of 8 attention heads for the flowchart datasets (FC_A and FC_B) and 6 attention heads for FA. We use 2 output attention heads for all networks.

We train all models for 200 epochs by minimizing the standard cross-entropy loss on the training set with an early stopping strategy. Optimization is performed using the Adam optimizer [12] with an initial learning rate of 0.005 for the flowchart datasets and 0.003 for FA. The decay rate is set to 0.1, and the patience is set to 15 rounds for the flowchart datasets and 17 rounds for FA. To train the networks more efficiently, we use mini-batches of 8 graphs for the flowchart datasets and 6 graphs for FA.
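For reference, the hyperparameters listed above can be collected in a small configuration object. The class and field names are hypothetical. The text does not spell out how the patience counter interacts with the decay rate and early stopping, nor which quantity is monitored, so the helper below only counts epochs since the best validation loss and leaves the resulting action to the reader; monitoring validation loss is itself an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters reported in Sect. 4.2 (field names are illustrative)."""
    num_layers: int = 7
    hidden_units: int = 32
    hidden_heads: int = 8
    output_heads: int = 2
    dropout: float = 0.1
    leaky_relu_slope: float = 0.2
    learning_rate: float = 0.005
    decay_rate: float = 0.1
    patience: int = 15
    batch_size: int = 8
    max_epochs: int = 200

FLOWCHART = TrainConfig()  # FC_A and FC_B
FINITE_AUTOMATA = TrainConfig(hidden_heads=6, learning_rate=0.003,
                              patience=17, batch_size=6)  # FA

def epochs_without_improvement(val_losses):
    """Number of epochs elapsed since the best (lowest) validation loss."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch

history = [0.9, 0.5, 0.6, 0.7]   # made-up validation losses
stale = epochs_without_improvement(history)
print(stale, stale >= FLOWCHART.patience)
```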

The feature extraction module is implemented in C++, and both the training and inference algorithms are implemented with PyTorch and the Deep Graph Library (DGL) (see Note 1). GAT training is conducted on a server with an NVIDIA GeForce GTX 980 GPU, while testing is performed on a PC with four Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz. Unless otherwise specified, we repeat each experiment 10 times using the same configuration and report the average results.

4.3 Results and Discussion

In this work, we use stroke classification accuracy to evaluate our method. Each stroke in a diagram is assigned a predefined symbol-level label by the model, which is then compared against the ground truth. Stroke classification accuracies on FC_A, FC_B and FA are reported in Tables 2, 3 and 4, respectively. Numbers in boldface show the best results. Results of the comparison methods are cited directly from the original papers. GAT denotes the method that uses the original GAT model, while GAT with EFA denotes the method proposed in this work, which enhances GAT by introducing edge feature attention.
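The evaluation metric can be sketched as below: overall accuracy is the fraction of correctly labeled strokes, and per-class accuracy restricts the same ratio to strokes of one ground-truth class. The function name and the toy labels are illustrative assumptions and are not results from the paper.

```python
from collections import Counter

def stroke_accuracy(predicted, truth):
    """Return (overall accuracy, per-class accuracy), both in percent."""
    correct = Counter()
    total = Counter()
    for p, t in zip(predicted, truth):
        total[t] += 1
        if p == t:
            correct[t] += 1
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    per_class = {label: 100.0 * correct[label] / total[label] for label in total}
    return overall, per_class

# Made-up labels for eight strokes of one diagram.
truth     = ["arrow", "arrow", "text", "text", "text", "process", "process", "data"]
predicted = ["arrow", "text",  "text", "text", "text", "process", "data",    "data"]
overall, per_class = stroke_accuracy(predicted, truth)
print(overall, per_class)
```

Reporting per-class values alongside the overall figure matters for imbalanced data, because frequent classes such as text can dominate the overall accuracy.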

As we can see, on all datasets, GAT with EFA outperforms previous methods and achieves the best overall accuracy. One notable phenomenon is that the performance of previous methods varies dramatically across labels, while GAT with EFA consistently delivers accurate predictions for all labels. In addition, GAT with EFA improves on GAT by a large margin in all experiments, which demonstrates that the proposed edge feature attention plays an important role in capturing the complicated temporal and spatial relationships between strokes. Furthermore, our method is very efficient in both training and testing. For example, on FC_A, it takes about 38 min to train our model and 70 ms to classify all strokes of one flowchart under the settings described in Sect. 4.2.

Table 2. Stroke classification accuracy on FC_A (%).

Table 3. Stroke classification accuracy on FC_B (%).

Table 4. Stroke classification accuracy on FA (%).

4.4 Error Analysis

In Fig. 2, we show three examples of recognized flowcharts from FC_A with typical errors. If a stroke is far from the symbol to which it belongs but close to a neighboring one, it is more likely to be misclassified, as in Fig. 2(a) and (b). Isolated text strokes next to an arrow are more likely to be mistakenly predicted as arrow. Some recognition errors could be eliminated through post-processing, such as a process stroke surrounded by text, as shown in Fig. 2(c). The confusion matrix of the stroke classification results on the FC_A test set is presented in Fig. 3. Since some symbols, such as process and data, are ambiguous in appearance, they are likely to be misclassified with high confidence. Another very important factor for misclassification is that the numbers of training samples of the different classes are severely imbalanced, which has a serious side effect on recognition performance: the classifier is more likely to predict strokes of classes with fewer samples as classes with more samples. However, in contrast to previous work, this effect is moderate in our proposed framework with the edge attention mechanism.

5 Conclusions

In this work, we have introduced a novel and general framework based on GAT for online handwritten diagram recognition. We formulate diagram stroke classification as a node classification task in a graph. Experiments on two flowchart benchmark datasets and one finite automata dataset demonstrate that the proposed framework with the edge feature attention mechanism is capable of encoding complex spatial and temporal relationships efficiently for stroke classification. Our method outperforms several recently proposed approaches by a clear margin. Our model is computationally efficient, which makes it suitable for large-scale applications on mobile devices. Moreover, our analysis of the failure cases suggests that the classification performance still has considerable room for improvement. In future work, we will investigate how to extend our framework to perform stroke grouping and symbol recognition of handwritten diagrams, as well as structure analysis of diagrams.

Notes

1. https://github.com/dmlc/dgl

References

1. Awal, A.M., Feng, G., Mouchère, H., Viard-Gaudin, C.: First experiments on a new online handwritten flowchart database. In: Document Recognition and Retrieval XVIII (2011)
2. Battaglia, P., Pascanu, R., Lai, M., Rezende, D.J., et al.: Interaction networks for learning about objects, relations and physics. In: Advances in Neural Information Processing Systems (2016)
3. Bresler, M., Průša, D., Hlaváč, V.: Modeling flowchart structure recognition as a max-sum problem. In: International Conference on Document Analysis and Recognition (2013)
4. Bresler, M., Průša, D., Hlaváč, V.: Detection of arrows in on-line sketched diagrams using relative stroke positioning. In: IEEE Winter Conference on Applications of Computer Vision (2015)
5. Bresler, M., Průša, D., Hlaváč, V.: Online recognition of sketched arrow-connected diagrams. Int. J. Doc. Anal. Recogn. 19(3), 253–267 (2016)
6. Bresler, M., Van Phan, T., Průša, D., Nakagawa, M., Hlaváč, V.: Recognition system for on-line sketched diagrams. In: International Conference on Frontiers in Handwriting Recognition (2014)
7. Carton, C., Lemaitre, A., Coüasnon, B.: Fusion of statistical and structural information for flowchart recognition. In: International Conference on Document Analysis and Recognition (2013)
8. Coüasnon, B.: DMOS, a generic document recognition method: application to table structure analysis in a general and in a specific way. Int. J. Doc. Anal. Recogn. 8(2–3), 111–122 (2006)
9. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (2010)
10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (2016)
11. Julca-Aguilar, F.D., Hirata, N.S.: Symbol detection in online handwritten graphics using Faster R-CNN. In: International Workshop on Document Analysis Systems (2018)
12. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
13. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: International Conference on Learning Representations (2017)
14. Lemaitre, A., Mouchère, H., Camillerapp, J., Coüasnon, B.: Interest of syntactic knowledge for on-line flowchart recognition. In: Kwon, Y.-B., Ogier, J.-M. (eds.) GREC 2011. LNCS, vol. 7423, pp. 89–98. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-36824-0_9
15. Miyao, H., Maruyama, R.: On-line handwritten flowchart recognition, beautification and editing system. In: International Conference on Frontiers in Handwriting Recognition (2012)
16. Qi, Y., Szummer, M., Minka, T.P.: Diagram structure recognition by Bayesian conditional random fields. In: IEEE Conference on Computer Vision and Pattern Recognition (2005)
17. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems (2015)
18. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
19. Van Phan, T., Nakagawa, M.: Combination of global and local contexts for text/non-text classification in heterogeneous online handwritten documents. Pattern Recogn. 51, 112–124 (2016)
20. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. In: International Conference on Learning Representations (2018)
21. Wang, C., Mouchère, H., Lemaitre, A., Viard-Gaudin, C.: Online flowchart understanding by combining max-margin Markov random field with grammatical analysis. Int. J. Doc. Anal. Recogn. 20(2), 123–136 (2017)
22. Wang, C., Mouchère, H., Viard-Gaudin, C., Jin, L.: Combined segmentation and recognition of online handwritten diagrams with high order Markov random field. In: International Conference on Frontiers in Handwriting Recognition (2016)
23. Wu, J., Wang, C., Zhang, L., Rui, Y.: Offline sketch parsing via shapeness estimation. In: International Joint Conference on Artificial Intelligence (2015)
24. Yao, L., Mao, C., Luo, Y.: Graph convolutional networks for text classification. arXiv preprint arXiv:1809.05679 (2018)
25. Ye, J.Y., Zhang, Y.M., Liu, C.L.: Joint training of conditional random fields and neural networks for stroke classification in online handwritten documents. In: International Conference on Pattern Recognition (2016)
26. Ye, J.Y., Zhang, Y.M., Yang, Q., Liu, C.L.: Contextual stroke classification in online handwritten documents with graph attention networks. In: International Conference on Document Analysis and Recognition (2019)
27. Yuan, Z., Pan, H., Zhang, L.: A novel pen-based flowchart recognition system for programming teaching. In: Leung, E.W.C., Wang, F.L., Miao, L., Zhao, J., He, J. (eds.) WBL 2008. LNCS, vol. 5328, pp. 55–64. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89962-4_6
28. Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., Sun, M.: Graph neural networks: a review of methods and applications. arXiv preprint arXiv:1812.08434 (2018)

Acknowledgements

This work is supported by the National Key Research and Development Program Grant 2018YFB1005000, the National Natural Science Foundation of China (NSFC) Grants 61773376, 61721004.

Author information

Authors and Affiliations

National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, People’s Republic of China
Xiao-Long Yun, Yan-Ming Zhang, Jun-Yu Ye & Cheng-Lin Liu
University of Chinese Academy of Sciences, Beijing, People’s Republic of China
Xiao-Long Yun, Jun-Yu Ye & Cheng-Lin Liu

Corresponding author

Correspondence to Cheng-Lin Liu.

Editor information

Editors and Affiliations

Beijing Jiaotong University, Beijing, China
Yao Zhao
The Australian National University, Canberra, Australia
Nick Barnes
Peking University, Beijing, China
Baoquan Chen
The Technical University of Munich, Munich, Bayern, Germany
Rüdiger Westermann
Zhejiang University, Hangzhou, China
Xiangwei Kong
Beijing Jiaotong University, Beijing, China
Chunyu Lin

About this paper

Cite this paper

Yun, XL., Zhang, YM., Ye, JY., Liu, CL. (2019). Online Handwritten Diagram Recognition with Graph Attention Networks. In: Zhao, Y., Barnes, N., Chen, B., Westermann, R., Kong, X., Lin, C. (eds) Image and Graphics. ICIG 2019. Lecture Notes in Computer Science, vol 11901. Springer, Cham. https://doi.org/10.1007/978-3-030-34120-6_19

DOI: https://doi.org/10.1007/978-3-030-34120-6_19
Published: 28 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34119-0
Online ISBN: 978-3-030-34120-6
eBook Packages: Computer Science, Computer Science (R0)
