DOI: 10.1109/ICSE.2019.00049

ActionNet: vision-based workflow action recognition from programming screencasts

Published: 25 May 2019

Abstract

Programming screencasts have two important applications in the software engineering context: studying developer behaviors and information needs, and disseminating software engineering knowledge. Although programming screencasts are easy to produce, they are hard to analyze or index because their content is captured only as images. Existing techniques extract content from screencasts but ignore the workflow actions by which developers accomplish programming tasks, which significantly limits the effective use of programming screencasts in downstream applications. In this paper, we present the first technique for recognizing workflow actions in programming screencasts. Our technique exploits image differencing and a Convolutional Neural Network (CNN) to analyze the correspondence and change between consecutive frames, from which nine classes of frequent developer actions can be recognized. Using programming screencasts from YouTube, we evaluate different configurations of our CNN model and the performance of our technique for developer action recognition across developers, working environments, and programming languages. Using screencasts of developers' real work, we demonstrate the usefulness of our technique in a practical application: action-aware extraction of key-code frames in developers' work.
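
The abstract describes the approach only at a high level. For a concrete picture of the general idea (consecutive-frame differencing fed to a CNN classifier with nine output classes), here is a minimal, illustrative Python sketch. It is not the authors' ActionNet implementation: the frame resolution, sampling step, network architecture, library choices, and the file name "screencast.mp4" are assumptions made purely for this example, and the model shown is untrained.

```python
# Illustrative sketch only: pair consecutive screencast frames, compute an image
# difference, and classify each (prev, curr, diff) stack into one of nine
# hypothetical workflow-action classes with a small CNN.
import cv2                     # frame extraction and differencing (assumed tooling)
import numpy as np
import tensorflow as tf        # assumed deep-learning framework for this sketch

NUM_CLASSES = 9                # nine workflow action classes, per the abstract
FRAME_SIZE = (224, 224)        # assumed input resolution

def frame_pairs(video_path, step=1):
    """Yield (prev, curr, |prev-curr|) grayscale stacks, sampled every `step` frames."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    while ok:
        curr = None
        for _ in range(step):
            ok, curr = cap.read()
            if not ok:
                break
        if not ok or curr is None:
            break
        p = cv2.cvtColor(cv2.resize(prev, FRAME_SIZE), cv2.COLOR_BGR2GRAY)
        c = cv2.cvtColor(cv2.resize(curr, FRAME_SIZE), cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(p, c)      # image differencing between consecutive frames
        yield np.stack([p, c, diff], axis=-1).astype(np.float32) / 255.0
        prev = curr
    cap.release()

def build_action_cnn():
    """A small CNN over the stacked channels; the architecture is purely illustrative."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(*FRAME_SIZE, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

if __name__ == "__main__":
    model = build_action_cnn()
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # Classify one sampled frame pair from a hypothetical screencast file.
    for sample in frame_pairs("screencast.mp4", step=30):
        probs = model.predict(sample[np.newaxis], verbose=0)
        print("predicted action class:", int(np.argmax(probs)))
        break
```

In practice such a model would be trained on labeled frame pairs before its predictions mean anything; the sketch only shows how differencing and a CNN can be combined over consecutive frames.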




Information & Contributors

Information

Published In

ICSE '19: Proceedings of the 41st International Conference on Software Engineering
May 2019
1318 pages

Publisher

IEEE Press

Publication History

Published: 25 May 2019


Author Tags

  1. action recognition
  2. deep learning
  3. programming screencast

Qualifiers

  • Research-article

Conference

ICSE '19

Acceptance Rates

Overall Acceptance Rate: 276 of 1,856 submissions, 15%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 30 Aug 2024


Citations

Cited By

  • (2023) AG3: Automated Game GUI Text Glitch Detection Based on Computer Vision. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1879-1890. DOI: 10.1145/3611643.3613867. Online publication date: 30-Nov-2023.
  • (2023) DeepPatch: Maintaining Deep Learning Model Programs to Retain Standard Accuracy with Substantial Robustness Improvement. ACM Transactions on Software Engineering and Methodology, 32(6), 1-49. DOI: 10.1145/3604609. Online publication date: 14-Jun-2023.
  • (2023) Diversity Awareness in Software Engineering Participant Research. Proceedings of the 45th International Conference on Software Engineering: Software Engineering in Society, 120-131. DOI: 10.1109/ICSE-SEIS58686.2023.00017. Online publication date: 17-May-2023.
  • (2022) GIFdroid. Proceedings of the 44th International Conference on Software Engineering, 1045-1057. DOI: 10.1145/3510003.3510048. Online publication date: 21-May-2022.
  • (2022) XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training. ACM Transactions on Software Engineering and Methodology, 31(3), 1-44. DOI: 10.1145/3506696. Online publication date: 9-Apr-2022.
  • (2022) A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research. ACM Transactions on Software Engineering and Methodology, 31(2), 1-58. DOI: 10.1145/3485275. Online publication date: 4-Mar-2022.
  • (2022) A user survey on the adoption of crowd-based software engineering instructional screencasts by the new generation of software developers. Journal of Systems and Software, 185(C). DOI: 10.1016/j.jss.2021.111144. Online publication date: 1-Mar-2022.
  • (2021) Etna: Harvesting Action Graphs from Websites. The 34th Annual ACM Symposium on User Interface Software and Technology, 312-331. DOI: 10.1145/3472749.3474752. Online publication date: 10-Oct-2021.
  • (2021) GLIB: towards automated test oracle for graphically-rich applications. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1093-1104. DOI: 10.1145/3468264.3468586. Online publication date: 20-Aug-2021.
  • (2021) It Takes Two to Tango. Proceedings of the 43rd International Conference on Software Engineering, 957-969. DOI: 10.1109/ICSE43902.2021.00091. Online publication date: 22-May-2021.
