Automatic Generation of Interactive Nonlinear Video for Online Apparel Shopping Navigation
Abstract
We present an automatic generation pipeline of interactive nonlinear video for online apparel shopping navigation. Our approach is inspired by Google's “Messy Middle” theory, which suggests that shoppers work through two mental tasks, exploration and evaluation, before making an online purchase. Given a set of apparel product presentation videos, our navigation UI organizes them to optimize users' product exploration and automatically generates interactive videos for users' product evaluation. To support these automatic methods, we propose a video clustering similarity (CSIM) and a camera movement similarity (MSIM), as well as a comparative video generation algorithm for product recommendation, presentation, and comparison. To evaluate the pipeline's effectiveness, we conducted several user studies. The results show that our pipeline helps users complete the shopping process more efficiently, making it easier for them to understand and choose products.
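The abstract names CSIM and MSIM, but this excerpt does not define either measure. As a reading aid only, the sketch below shows one plausible, hypothetical shape for a clustering-based video similarity and a camera-movement similarity, assuming precomputed frame-level appearance features and per-frame mean optical-flow vectors (the reference list suggests optical flow could come from RAFT [92]). Every function name and parameter here is illustrative, not the paper's actual method.

```python
# Hypothetical illustration only: the paper's CSIM/MSIM definitions are not
# given in this excerpt. This sketch just shows the general idea of
# (a) comparing clustered frame features and (b) comparing camera motion.
import numpy as np


def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors (epsilon avoids /0)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def clustering_similarity(feats_a: np.ndarray, feats_b: np.ndarray,
                          k: int = 4, seed: int = 0) -> float:
    """CSIM-like toy: cluster each video's frame features (n_frames x dim)
    into k centroids, then average the best centroid-to-centroid matches."""
    rng = np.random.default_rng(seed)

    def centroids(feats: np.ndarray) -> np.ndarray:
        # One naive k-means-style pass: random centers, assign, re-average.
        centers = feats[rng.choice(len(feats), size=k, replace=False)]
        labels = np.argmax(feats @ centers.T, axis=1)
        return np.stack([feats[labels == j].mean(axis=0)
                         if np.any(labels == j) else centers[j]
                         for j in range(k)])

    ca, cb = centroids(feats_a), centroids(feats_b)
    sims = np.array([[_cosine(x, y) for y in cb] for x in ca])
    return float(sims.max(axis=1).mean())  # greedy best match per centroid


def movement_similarity(flow_a: np.ndarray, flow_b: np.ndarray) -> float:
    """MSIM-like toy: resample two per-frame mean-flow tracks (n_frames x 2)
    to a common length and average their per-frame direction agreement."""
    n = min(len(flow_a), len(flow_b))
    ta = flow_a[np.linspace(0, len(flow_a) - 1, n).astype(int)]
    tb = flow_b[np.linspace(0, len(flow_b) - 1, n).astype(int)]
    return float(np.mean([_cosine(u, v) for u, v in zip(ta, tb)]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fa, fb = rng.normal(size=(120, 64)), rng.normal(size=(90, 64))
    ma, mb = rng.normal(size=(120, 2)), rng.normal(size=(90, 2))
    print("CSIM-like score:", clustering_similarity(fa, fb))
    print("MSIM-like score:", movement_similarity(ma, mb))
```

Both toy scores fall in [-1, 1]; a real pipeline would tune the number of clusters and the temporal alignment, which this sketch deliberately keeps naive.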
References
[1]
S. Chevalier, “Fashion E-Commerce in the United States - statistics & facts,” Mar. 23, 2023. Accessed: May 2023. [Online]. Available: https://www.statista.com/topics/3481/fashion-e-commerce-in-the-united-states/#topicHeader__wrapper
[2]
D. R. Anderson and M. C. Davidson, “Receptive versus interactive video screens: A role for the brain's default mode network in learning from media,” Comput. Hum. Behav., vol. 99, pp. 168–180, 2019.
[3]
D. Zhang, L. Zhou, R. O. Briggs, and J. F. Nunamaker Jr., “Instructional video in e-learning: Assessing the impact of interactive video on learning effectiveness,” Inf. Manage., vol. 43, no. 1, pp. 15–27, 2006.
[4]
J. Trout and B. Christie, “Interactive video games in physical education,” J. Phys. Educ., Recreation Dance, vol. 78, no. 5, pp. 29–45, 2007.
[5]
C. Gu et al., “What do users care about? Research on user behavior of mobile interactive video advertising,” Heliyon, vol. 8, no. 10, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2405844022021983
[6]
M. K. Afify, “Effect of interactive video length within e-learning environments on cognitive load, cognitive achievement and retention of learning,” Turkish Online J. Distance Educ., vol. 21, no. 4, pp. 68–89, 2020.
[7]
E. Adams and A. Rollings, Fundamentals of Game Design. Englewood Cliffs, NJ, USA: Prentice Hall, 2006.
[8]
N. Montfort and P. Urbano, “A quarta era da ficção interactiva,” Nada, vol. 6, 2008. Archived at the Internet Archive Wayback Machine, San Francisco, CA, USA.
[9]
A. Rennie, J. Protheroe, C. Charron, and G. Breatnach, “Decoding decisions: Making sense of the messy middle,” Think With Google, Jul. 2020. Accessed: Aug. 2022. [Online]. Available: https://www.thinkwithgoogle.com/consumer-insights/consumerjourney/navigating-purchase-behavior-and-decision-making/
[10]
D. A. Griffith, R. F. Krampf, and J. W. Palmer, “The role of interface in electronic commerce: Consumer involvement with print versus on-line catalogs,” Int. J. Electron. Commerce, vol. 5, no. 4, pp. 135–153, 2001.
[11]
C. Orús, R. Gurrea, and C. Flavián, “Facilitating imaginations through online product presentation videos: Effects on imagery fluency, product attitude and purchase intention,” Electron. Commerce Res., vol. 17, no. 4, pp. 661–700, 2017.
[12]
A. Kumar and Y. Tan, “The demand effects of joint product advertising in online videos,” Manage. Sci., vol. 61, no. 8, pp. 1921–1937, 2015.
[13]
T. Mei, X.-S. Hua, L. Yang, and S. Li, “VideoSense: Towards effective online video advertising,” in Proc. 15th ACM Int. Conf. Multimedia, 2007, pp. 1075–1084.
[14]
T. Mei, X.-S. Hua, and S. Li, “VideoSense: A contextual in-video advertising system,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 12, pp. 1866–1879, Dec. 2009.
[15]
Z.-Q. Cheng, Y. Liu, X. Wu, and X. Hua, “Video eCommerce: Towards online video advertising,” in Proc. 24th ACM Int. Conf. Multimedia, 2016, pp. 1365–1374.
[16]
Z.-Q. Cheng, X. Wu, Y. Liu, and X.-S. Hua, “Video eCommerce++: Toward large scale online video advertising,” IEEE Trans. Multimedia, vol. 19, pp. 1170–1183, 2017.
[17]
Z.-Q. Cheng, X. Wu, and Y. Liu, “Video2Shop: Exact matching clothes in videos to online shopping images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4048–4056.
[18]
B. Meixner, “Hypervideos and interactive multimedia presentations,” ACM Comput. Surv., vol. 50, no. 1, pp. 1–34, 2017.
[19]
P. A. Nguyen, Q. Li, Z.-Q. Cheng, Y.-J. Lu, and C.-W. Ngo, “VIREO @ TRECVID 2017: Video hyperlinking,” in Proc. 17th Annu. TREC Video Retrieval Eval., 2017.
[20]
Z.-Q. Cheng, H. Zhang, X. Wu, and C.-W. Ngo, “On the selection of anchors and targets for video hyperlinking,” in Proc. ACM Int. Conf. Multimedia Retrieval, 2017, pp. 287–293.
[21]
R. A. Deyo et al., “Involving patients in clinical decisions: Impact of an interactive video program on use of back surgery,” Med. Care, vol. 38, pp. 959–969, 2000.
[22]
S. S. Sundar and J. Kim, “Interactivity and persuasion: Influencing attitudes with information and involvement,” J. Interactive Advertising, vol. 5, no. 2, pp. 5–18, 2005.
[23]
S. Saarinen, V. Mäkelä, P. Kallioniemi, J. Hakulinen, and M. Turunen, “Guidelines for designing interactive omnidirectional video applications,” in Proc. 16th IFIP Conf. Hum.-Comput. Interact., 2017, pp. 263–272.
[24]
R. Panda, N. C. Mithun, and A. K. Roy-Chowdhury, “Diversity-aware multi-video summarization,” IEEE Trans. Image Process., vol. 26, no. 10, pp. 4712–4724, Oct. 2017.
[25]
S. Messaoud et al., “DeepQAMVS: Query-aware hierarchical pointer networks for multi-video summarization,” in Proc. 44th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2021, pp. 1389–1399.
[26]
Y.-T. Liu, Y.-J. Li, F.-E. Yang, S.-F. Chen, and Y.-C. F. Wang, “Learning hierarchical self-attention for video summarization,” in Proc. IEEE Int. Conf. Image Process., 2019, pp. 3377–3381.
[27]
D. Sahrawat et al., “Video summarization using global attention with memory network and LSTM,” in Proc. IEEE 5th Int. Conf. Multimedia Big Data, 2019, pp. 231–236.
[28]
K. Zhang, W.-L. Chao, F. Sha, and K. Grauman, “Video summarization with long short-term memory,” in Proc. 14th Eur. Conf. Comput. Vis., 2016, pp. 766–782.
[29]
S. Cai, W. Zuo, L. S. Davis, and L. Zhang, “Weakly-supervised video summarization using variational encoder-decoder and web prior,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 184–200.
[30]
Z. Ji, K. Xiong, Y. Pang, and X. Li, “Video summarization with attention-based encoder–decoder networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 6, pp. 1709–1717, Jun. 2020.
[31]
L. Feng, Z. Li, Z. Kuang, and W. Zhang, “Extractive video summarizer with memory augmented neural networks,” in Proc. 26th ACM Int. Conf. Multimedia, 2018, pp. 976–983.
[32]
H. Wei et al., “Video summarization via semantic attended networks,” in Proc. 32nd AAAI Conf. Artif. Intell., 30th Innov. Appl. Artif. Intell. Conf., 8th AAAI Symp. Educ. Adv. Artif. Intell., 2018, pp. 216–223.
[33]
B. A. Plummer, M. Brown, and S. Lazebnik, “Enhancing video summarization via vision-language embedding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 5781–5789.
[34]
A. B. Vasudevan, M. Gygli, A. Volokitin, and L. Van Gool, “Query-adaptive video summarization via quality-aware relevance estimation,” in Proc. 25th ACM Int. Conf. Multimedia, 2017, pp. 582–590.
[35]
Y. Cong, J. Yuan, and J. Luo, “Towards scalable summarization of consumer videos via sparse dictionary selection,” IEEE Trans. Multimedia, vol. 14, no. 1, pp. 66–75, 2012.
[36]
F. Dornaika and I. K. Aldine, “Decremental sparse modeling representative selection for prototype selection,” Pattern Recognit., vol. 48, no. 11, pp. 3714–3727, 2015.
[37]
E. Elhamifar, G. Sapiro, and R. Vidal, “See all by looking at a few: Sparse modeling for finding representative objects,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2012, pp. 1600–1607.
[38]
B. Zhao and E. P. Xing, “Quasi real-time summarization for consumer videos,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 2513–2520.
[39]
S. E. F. De Avila, A. P. B. Lopes, A. da Luz Jr., and A. de Albuquerque Araújo, “VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method,” Pattern Recognit. Lett., vol. 32, no. 1, pp. 56–68, 2011.
[40]
G. Guan et al., “A top-down approach for video summarization,” ACM Trans. Multimedia Comput., Commun., Appl., vol. 11, no. 1, pp. 1–21, 2014.
[41]
G. Pan et al., “A bottom-up summarization algorithm for videos in the wild,” EURASIP J. Adv. Signal Process., vol. 2019, no. 1, pp. 1–11, 2019.
[42]
Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg, “Webcam synopsis: Peeking around the world,” in Proc. IEEE 11th Int. Conf. Comput. Vis., 2007, pp. 1–8.
[43]
W.-S. Chu, Y. Song, and A. Jaimes, “Video co-summarization: Video summarization by visual co-occurrence,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 3584–3592.
[44]
M. Rochan, L. Ye, and Y. Wang, “Video summarization using fully convolutional sequence networks,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 347–363.
[45]
T.-J. Fu, S.-H. Tai, and H.-T. Chen, “Attentive and adversarial learning for video summarization,” in Proc. IEEE Winter Conf. Appl. Comput. Vis., 2019, pp. 1579–1587.
[46]
B. Mahasseni, M. Lam, and S. Todorovic, “Unsupervised video summarization with adversarial LSTM networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 202–211.
[47]
M. Rochan and Y. Wang, “Video summarization by learning from unpaired data,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 7902–7911.
[48]
Y. Zhang et al., “Dilated temporal relational adversarial network for generic video summarization,” Multimedia Tools Appl., vol. 78, no. 24, pp. 35237–35261, 2019.
[49]
Y. Zhang, M. Kampffmeyer, X. Liang, M. Tan, and E. P. Xing, “Query-conditioned three-player adversarial network for video summarization,” 2018, arXiv:1807.06677.
[50]
S. Lan, R. Panda, Q. Zhu, and A. K. Roy-Chowdhury, “FFNet: Video fast-forwarding via reinforcement learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6771–6780.
[51]
R. Paulus, C. Xiong, and R. Socher, “A deep reinforced model for abstractive summarization,” in Proc. Int. Conf. Learn. Representations, 2018. [Online]. Available: https://openreview.net/forum?id=HkAClQgA-
[52]
Y. Zhang, M. Kampffmeyer, X. Zhao, and M. Tan, “Deep reinforcement learning for query-conditioned video summarization,” Appl. Sci., vol. 9, no. 4, 2019.
[53]
K. Zhou, Y. Qiao, and T. Xiang, “Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward,” in Proc. AAAI Conf. Artif. Intell., 2018.
[54]
Y. Hoshen, G. Ben-Artzi, and S. Peleg, “Wisdom of the crowd in egocentric video curation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2014, pp. 573–579.
[55]
A. Mahapatra, P. K. Sa, B. Majhi, and S. Padhy, “MVS: A multi-view video synopsis framework,” Signal Process.: Image Commun., vol. 42, pp. 31–44, 2016.
[56]
S.-H. Ou, C.-H. Lee, V. S. Somayazulu, Y.-K. Chen, and S.-Y. Chien, “On-line multi-view video summarization for wireless video sensor network,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 1, pp. 165–179, Feb. 2015.
[57]
S.-H. Ou et al., “Communication-efficient multi-view keyframe extraction in distributed video sensors,” in Proc. IEEE Vis. Commun. Image Process. Conf., 2014, pp. 13–16.
[58]
J. Zhu, S. Liao, and S. Z. Li, “Multicamera joint video synopsis,” IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 6, pp. 1058–1069, Jun. 2016.
[59]
I. Arev, H. S. Park, Y. Sheikh, J. Hodgins, and A. Shamir, “Automatic editing of footage from multiple social cameras,” ACM Trans. Graph., vol. 33, no. 4, pp. 1–11, 2014.
[60]
Y. Zhang and R. Zimmermann, “Efficient summarization from multiple georeferenced user-generated videos,” IEEE Trans. Multimedia, vol. 18, pp. 418–431, 2016.
[61]
Z. Ji, Y. Zhang, Y. Pang, X. Li, and J. Pan, “Multi-video summarization with query-dependent weighted archetypal analysis,” Neurocomputing, vol. 332, pp. 406–416, 2019.
[62]
M. Mills, J. Cohen, and Y. Y. Wong, “A magnifier tool for video data,” in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., 1992, pp. 93–98.
[63]
J. P. Collomosse, G. McNeill, and Y. Qian, “Storyboard sketches for content based video retrieval,” in Proc. IEEE 12th Int. Conf. Comput. Vis., 2009, pp. 245–252.
[64]
M. Furini, S. Mirri, and M. Montangero, “TagLecture: The gamification of video lecture indexing through quality-based tags,” in Proc. IEEE Symp. Comput. Commun., 2017, pp. 122–127.
[65]
A. A. Patwardhan, S. Das, S. Varshney, M. S. Desarkar, and D. P. Dogra, “ViTag: Automatic video tagging using segmentation and conceptual inference,” in Proc. IEEE 5th Int. Conf. Multimedia Big Data, 2019, pp. 271–276.
[66]
D. Fernández et al., “ViTS: Video tagging system from massive web multimedia collections,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2017, pp. 337–346.
[67]
A. Truong, P. Chi, D. Salesin, I. Essa, and M. Agrawala, “Automatic generation of two-level hierarchical tutorials from instructional makeup videos,” in Proc. CHI Conf. Hum. Factors Comput. Syst., 2021, pp. 1–16.
[68]
S. Bano and A. Cavallaro, “Discovery and organization of multi-camera user-generated videos of the same event,” Inf. Sci., vol. 302, pp. 108–121, 2015.
[69]
A. D. Sokolova, A. S. Kharchevnikova, and A. V. Savchenko, “Organizing multimedia data in video surveillance systems based on face verification with convolutional neural networks,” in Proc. Int. Conf. Anal. Images, Social Netw. Texts, 2017, pp. 223–230.
[70]
S.-S. Cheung and A. Zakhor, “Efficient video similarity measurement with video signature,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 1, pp. 59–74, Jan. 2003.
[71]
L. Yuan et al., “Central similarity quantization for efficient image and video retrieval,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 3083–3092.
[72]
S. Bekhet and A. Ahmed, “Evaluation of similarity measures for video retrieval,” Multimedia Tools Appl., vol. 79, no. 9, pp. 6265–6278, 2020.
[73]
M. Hamroun, S. Lajmi, H. Nicolas, and I. Amous, “VISEN: A video interactive retrieval engine based on semantic network in large video collections,” in Proc. 23rd Int. Database Appl. Eng. Symp., 2019, pp. 1–10.
[74]
Y. Huang et al., “Real-time video recommendation exploration,” in Proc. Int. Conf. Manage. Data, 2016, pp. 35–46.
[75]
Y. Deldjoo et al., “Content-based video recommendation system based on stylistic visual features,” J. Data Semantics, vol. 5, no. 2, pp. 99–113, 2016.
[76]
H. Yan et al., “Multi-site user behavior modeling and its application in video recommendation,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 1, pp. 180–193, Jan. 2021.
[77]
G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–749, Jun. 2005.
[78]
Y. Chen, S. Mensah, F. Ma, H. Wang, and Z. Jiang, “Collaborative filtering grounded on knowledge graphs,” Pattern Recognit. Lett., vol. 151, pp. 55–61, 2021.
[79]
P. Resnick et al., “GroupLens: An open architecture for collaborative filtering of netnews,” in Proc. ACM Conf. Comput. Supported Cooperative Work, 1994, pp. 175–186.
[80]
R. Salakhutdinov and A. Mnih, “Probabilistic matrix factorization,” in Proc. 20th Int. Conf. Neural Inf. Process. Syst., 2007, pp. 1257–1264.
[81]
B. Sarwar et al., “Item-based collaborative filtering recommendation algorithms,” in Proc. 10th Int. World Wide Web Conf., 2001, pp. 285–295.
[82]
G.-L. Sun, Z.-Q. Cheng, X. Wu, and Q. Peng, “Personalized clothing recommendation combining user social circle and fashion style consistency,” Multimedia Tools Appl., vol. 77, no. 14, pp. 17731–17754, 2018.
[83]
X. Qian, H. Feng, G. Zhao, and T. Mei, “Personalized recommendation combining user interest and social circle,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 7, pp. 1763–1777, Jul. 2014.
[84]
P. Pirolli and S. Card, “Information foraging,” Psychol. Rev., vol. 106, no. 4, pp. 643–675, 1999.
[85]
X. Liu, J. Li, J. Wang, and Z. Liu, “MMFashion: An open-source toolbox for visual fashion analysis,” in Proc. 29th ACM Int. Conf. Multimedia, 2021, pp. 3755–3758.
[86]
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. Med. Image Comput. Comput.-Assist. Interv., 2015, pp. 234–241.
[87]
Y. Wang, E. Ghumare, R. Vandenberghe, and P. Dupont, “Comparison of different generalizations of clustering coefficient and local efficiency for weighted undirected graphs,” Neural Comput., vol. 29, no. 2, pp. 313–331, 2017.
[88]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[89]
S. Lee, S. Oh, C. Jung, and C. Kim, “A global-local embedding module for fashion landmark detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops, 2019, pp. 3153–3156.
[90]
J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018, arXiv:1804.02767.
[91]
H. Fang, S. Xie, Y. W. Tai, and C. Lu, “RMPE: Regional multi-person pose estimation,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2353–2362.
[92]
Z. Teed and J. Deng, “RAFT: Recurrent all-pairs field transforms for optical flow,” in Proc. 16th Eur. Conf. Comput. Vis., 2020, pp. 402–419.
[93]
A. Bangor, P. Kortum, and J. Miller, “Determining what individual SUS scores mean: Adding an adjective rating scale,” J. Usability Stud., vol. 4, no. 3, pp. 114–123, 2009.
Recommendations
Item-level RFID for enhancement of customer shopping experience in apparel retail
We use item-level RFID to enhance the customer shopping experience in apparel retail. We install RFID devices to collect customer shopping behaviour, implement intelligent fuzzy screening algorithms to analyze customer preferences, and use the results for ...
An interactive exploratory search system for on-line apparel shopping
VINCI '15: Proceedings of the 8th International Symposium on Visual Information Communication and Interaction. Many people (especially women) tend to take relatively longer time for shopping. This paper presents a system for product retrieval inspired by psychology of women's shopping activity, and an implementation of the system for apparel products. Our study ...
Published In
1520-9210 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Publisher
IEEE Press
Publication History
Published: 12 April 2023