research-article

Syntactic Pattern Recognition in Computer Vision: A Systematic Review

Published: 17 April 2021
    Abstract

    Techniques derived from syntactic methods for visual pattern recognition are not new; they have been explored extensively in the area known as syntactic or structural pattern recognition. Syntactic methods have been useful because they are intuitively simple to understand and have transparent, interpretable, and elegant representations. Their capacity to represent patterns in a semantic, hierarchical, compositional, spatial, and temporal way has made them very popular in the research community. In this article, we give an overview of how syntactic methods have been employed for computer vision tasks. We conduct a systematic literature review to survey the most relevant studies that use syntactic methods for pattern recognition tasks in images and videos. Our search returned 597 papers, of which 71 were selected for analysis. The results indicate that, in most of the studies surveyed, syntactic methods were used as a high-level structure that models the hierarchical or semantic relationships among objects or actions in order to perform a wide variety of tasks.

    Cited By

    • (2024) Pictorial syntax. Mind & Language. DOI: 10.1111/mila.12497. Online publication date: 2-Jan-2024
    • (2024) Design of Motor Skill Recognition and Hierarchical Evaluation System for Table Tennis Players. IEEE Sensors Journal 24:4 (5303-5315). DOI: 10.1109/JSEN.2023.3346880. Online publication date: 15-Feb-2024
    • (2024) Hierarchical attributed graph-based generative façade parsing for high-rise residential buildings. Automation in Construction 164 (105471). DOI: 10.1016/j.autcon.2024.105471. Online publication date: Aug-2024

      Published In

      ACM Computing Surveys  Volume 54, Issue 3
      April 2022
      836 pages
      ISSN:0360-0300
      EISSN:1557-7341
      DOI:10.1145/3461619
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 April 2021
      Accepted: 01 January 2021
      Revised: 01 November 2020
      Received: 01 April 2020
      Published in CSUR Volume 54, Issue 3


      Author Tags

      1. Computer vision
      2. formal languages
      3. image representation
      4. pattern recognition
      5. syntactic methods

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Foundation for the Support and Development of Education, Science and Technology from the State of Mato Grosso do Sul, FUNDECT
      • Brazilian National Council of Technological and Scientific Development, CNPq
      • Coordination for the Improvement of Higher Education Personnel, CAPES


      Cited By

      • (2024) Pictorial syntax. Mind & Language. DOI:10.1111/mila.12497. Online publication date: 2-Jan-2024
      • (2024) Design of Motor Skill Recognition and Hierarchical Evaluation System for Table Tennis Players. IEEE Sensors Journal 24:4 (5303-5315). DOI:10.1109/JSEN.2023.3346880. Online publication date: 15-Feb-2024
      • (2024) Hierarchical attributed graph-based generative façade parsing for high-rise residential buildings. Automation in Construction 164 (105471). DOI:10.1016/j.autcon.2024.105471. Online publication date: Aug-2024
      • (2024) Research on Pedestrian Intrusion Detection Method in Coal Mine Based on Deep Learning. Multimedia Technology and Enhanced Learning (169-183). DOI:10.1007/978-3-031-50577-5_13. Online publication date: 21-Feb-2024
      • (2023) Pattern Recognition and Deep Learning Technologies, Enablers of Industry 4.0, and Their Role in Engineering Research. Symmetry 15:2 (535). DOI:10.3390/sym15020535. Online publication date: 17-Feb-2023
      • (2023) Urban Carbon Price Forecasting by Fusing Remote Sensing Images and Historical Price Data. Forests 14:10 (1989). DOI:10.3390/f14101989. Online publication date: 3-Oct-2023
      • (2023) Development of Optimal Hyperparameter Tuning-Cycle GAN for Photo-realistic Face Age Progression Model. International Journal on Artificial Intelligence Tools 32:07. DOI:10.1142/S0218213023500689. Online publication date: 28-Nov-2023
      • (2023) A Paradigm Shift towards Computer Vision. 2023 International Conference on Device Intelligence, Computing and Communication Technologies (DICCT) (54-58). DOI:10.1109/DICCT56244.2023.10110300. Online publication date: 17-Mar-2023
      • (2022) A Spatial Lexical Analyzer and 3D Grammars that Recognize Voxel Based Structures Using Linear Positional Grammars in Minecraft. 2022 21st Brazilian Symposium on Computer Games and Digital Entertainment (SBGames) (1-6). DOI:10.1109/SBGAMES56371.2022.9961122. Online publication date: 24-Oct-2022
      • (2022) A Stochastic Grammar Approach to Predict Flight Phases of a Hypersonic Glide Vehicle. 2022 IEEE Aerospace Conference (AERO) (01-15). DOI:10.1109/AERO53065.2022.9843362. Online publication date: 5-Mar-2022
