Automatic Generation of Interactive Nonlinear Video for Online Apparel Shopping Navigation
Abstract
We present an automatic generation pipeline of interactive nonlinear video for online apparel shopping navigation. Our approach is inspired by Google's “Messy Middle” theory, which suggests that shoppers work through two mental tasks, exploration and evaluation, before making an online purchase. Given a set of apparel product presentation videos, our navigation UI organizes them to optimize users' product exploration and automatically generates interactive videos for users' product evaluation. To support these automatic methods, we propose a video clustering similarity (CSIM) and a camera movement similarity (MSIM), as well as a comparative video generation algorithm for product recommendation, presentation, and comparison. To evaluate the pipeline's effectiveness, we conducted several user studies. The results show that our pipeline helps users complete the shopping process more efficiently, making it easier for them to understand and choose products.
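The abstract names CSIM and MSIM, but this excerpt does not define either measure. As a reading aid only, the sketch below shows one plausible, hypothetical shape for a clustering-based video similarity and a camera-movement similarity, assuming precomputed frame-level appearance features and per-frame mean optical-flow vectors (the reference list suggests optical flow could come from RAFT [92]). Every function name and parameter here is illustrative, not the paper's actual method.

```python
# Hypothetical illustration only: the paper's CSIM/MSIM definitions are not
# given in this excerpt. This sketch just shows the general idea of
# (a) comparing clustered frame features and (b) comparing camera motion.
import numpy as np


def _cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors (epsilon avoids /0)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def clustering_similarity(feats_a: np.ndarray, feats_b: np.ndarray,
                          k: int = 4, seed: int = 0) -> float:
    """CSIM-like toy: cluster each video's frame features (n_frames x dim)
    into k centroids, then average the best centroid-to-centroid matches."""
    rng = np.random.default_rng(seed)

    def centroids(feats: np.ndarray) -> np.ndarray:
        # One naive k-means-style pass: random centers, assign, re-average.
        centers = feats[rng.choice(len(feats), size=k, replace=False)]
        labels = np.argmax(feats @ centers.T, axis=1)
        return np.stack([feats[labels == j].mean(axis=0)
                         if np.any(labels == j) else centers[j]
                         for j in range(k)])

    ca, cb = centroids(feats_a), centroids(feats_b)
    sims = np.array([[_cosine(x, y) for y in cb] for x in ca])
    return float(sims.max(axis=1).mean())  # greedy best match per centroid


def movement_similarity(flow_a: np.ndarray, flow_b: np.ndarray) -> float:
    """MSIM-like toy: resample two per-frame mean-flow tracks (n_frames x 2)
    to a common length and average their per-frame direction agreement."""
    n = min(len(flow_a), len(flow_b))
    ta = flow_a[np.linspace(0, len(flow_a) - 1, n).astype(int)]
    tb = flow_b[np.linspace(0, len(flow_b) - 1, n).astype(int)]
    return float(np.mean([_cosine(u, v) for u, v in zip(ta, tb)]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fa, fb = rng.normal(size=(120, 64)), rng.normal(size=(90, 64))
    ma, mb = rng.normal(size=(120, 2)), rng.normal(size=(90, 2))
    print("CSIM-like score:", clustering_similarity(fa, fb))
    print("MSIM-like score:", movement_similarity(ma, mb))
```

Both toy scores fall in [-1, 1]; a real pipeline would tune the number of clusters and the temporal alignment, which this sketch deliberately keeps naive.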
References
[1]
S. Chevalier, “Fashion E-Commerce in the United States - statistics & facts,” Mar. 23, 2023. Accessed: May 2023. [Online]. Available: https://www.statista.com/topics/3481/fashion-e-commerce-in-the-united-states/#topicHeader__wrapper
[2]
D. R. Anderson and M. C. Davidson, “Receptive versus interactive video screens: A role for the brain's default mode network in learning from media,” Comput. Hum. Behav., vol. 99, pp. 168–180, 2019.
[3]
D. Zhang, L. Zhou, R. O. Briggs, and J. F. Nunamaker Jr., “Instructional video in e-learning: Assessing the impact of interactive video on learning effectiveness,” Inf. Manage., vol. 43, no. 1, pp. 15–27, 2006.
[4]
J. Trout and B. Christie, “Interactive video games in physical education,” J. Phys. Educ., Recreation Dance, vol. 78, no. 5, pp. 29–45, 2007.
[5]
C. Gu et al., “What do users care about? Research on user behavior of mobile interactive video advertising,” Heliyon, vol. 8, no. 10, 2022. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2405844022021983
[6]
M. K. Afify, “Effect of interactive video length within e-learning environments on cognitive load, cognitive achievement and retention of learning,” Turkish Online J. Distance Educ., vol. 21, no. 4, pp. 68–89, 2020.
[7]
E. Adams and A. Rollings, Fundamentals of Game Design. Englewood Cliffs, NJ, USA: Prentice Hall, 2006.
[8]
N. Montfort and P. Urbano, “A quarta era da ficção interactiva,” Nada, vol. 6, 2008. Archived at the Internet Archive Wayback Machine, San Francisco, CA, USA.
[9]
A. Rennie, J. Protheroe, C. Charron, and G. Breatnach, “Decoding decisions: Making sense of the messy middle,” Think With Google, Jul. 2020. Accessed: Aug. 2022. [Online]. Available: https://www.thinkwithgoogle.com/consumer-insights/consumerjourney/navigating-purchase-behavior-and-decision-making/
[10]
D. A. Griffith, R. F. Krampf, and J. W. Palmer, “The role of interface in electronic commerce: Consumer involvement with print versus on-line catalogs,” Int. J. Electron. Commerce, vol. 5, no. 4, pp. 135–153, 2001.
[11]
C. Orús, R. Gurrea, and C. Flavián, “Facilitating imaginations through online product presentation videos: Effects on imagery fluency, product attitude and purchase intention,” Electron. Commerce Res., vol. 17, no. 4, pp. 661–700, 2017.
[12]
A. Kumar and Y. Tan, “The demand effects of joint product advertising in online videos,” Manage. Sci., vol. 61, no. 8, pp. 1921–1937, 2015.
[13]
T. Mei, X.-S. Hua, L. Yang, and S. Li, “VideoSense: Towards effective online video advertising,” in Proc. 15th ACM Int. Conf. Multimedia, 2007, pp. 1075–1084.
[14]
T. Mei, X.-S. Hua, and S. Li, “VideoSense: A contextual in-video advertising system,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 12, pp. 1866–1879, Dec. 2009.
[15]
Z.-Q. Cheng, Y. Liu, X. Wu, and X. Hua, “Video eCommerce: Towards online video advertising,” in Proc. 24th ACM Int. Conf. Multimedia, 2016, pp. 1365–1374.
[16]
Z.-Q. Cheng, X. Wu, Y. Liu, and X.-S. Hua, “Video eCommerce++: Toward large scale online video advertising,” IEEE Trans. Multimedia, vol. 19, pp. 1170–1183, 2017.
[17]
Z.-Q. Cheng, X. Wu, and Y. Liu, “Video2Shop: Exact matching clothes in videos to online shopping images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4048–4056.
[18]
B. Meixner, “Hypervideos and interactive multimedia presentations,” ACM Comput. Surv., vol. 50, no. 1, pp. 1–34, 2017.
[19]
P. A. Nguyen, Q. Li, Z.-Q. Cheng, Y.-J. Lu, and C.-W. Ngo, “VIREO @ TRECVID 2017: Video hyperlinking,” in Proc. 17th Annu. TREC Video Retrieval Eval., 2017.
[20]
Z.-Q. Cheng, H. Zhang, X. Wu, and C.-W. Ngo, “On the selection of anchors and targets for video hyperlinking,” in Proc. ACM Int. Conf. Multimedia Retrieval, 2017, pp. 287–293.
[21]
R. A. Deyo et al., “Involving patients in clinical decisions: Impact of an interactive video program on use of back surgery,” Med. Care, vol. 38, pp. 959–969, 2000.
[22]
S. S. Sundar and J. Kim, “Interactivity and persuasion: Influencing attitudes with information and involvement,” J. Interactive Advertising, vol. 5, no. 2, pp. 5–18, 2005.
[23]
S. Saarinen, V. Mäkelä, P. Kallioniemi, J. Hakulinen, and M. Turunen, “Guidelines for designing interactive omnidirectional video applications,” in Proc. 16th IFIP Conf. Hum.-Comput. Interact., 2017, pp. 263–272.
[24]
R. Panda, N. C. Mithun, and A. K. Roy-Chowdhury, “Diversity-aware multi-video summarization,” IEEE Trans. Image Process., vol. 26, no. 10, pp. 4712–4724, Oct. 2017.
[25]
S. Messaoud et al., “DeepQAMVS: Query-aware hierarchical pointer networks for multi-video summarization,” in Proc. 44th Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2021, pp. 1389–1399.
[26]
Y.-T. Liu, Y.-J. Li, F.-E. Yang, S.-F. Chen, and Y.-C. F. Wang, “Learning hierarchical self-attention for video summarization,” in Proc. IEEE Int. Conf. Image Process., 2019, pp. 3377–3381.
[27]
D. Sahrawat et al., “Video summarization using global attention with memory network and LSTM,” in Proc. IEEE 5th Int. Conf. Multimedia Big Data, 2019, pp. 231–236.
[28]
K. Zhang, W.-L. Chao, F. Sha, and K. Grauman, “Video summarization with long short-term memory,” in Proc. 14th Eur. Conf. Comput. Vis., 2016, pp. 766–782.
[29]
S. Cai, W. Zuo, L. S. Davis, and L. Zhang, “Weakly-supervised video summarization using variational encoder-decoder and web prior,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 184–200.
[30]
Z. Ji, K. Xiong, Y. Pang, and X. Li, “Video summarization with attention-based encoder–decoder networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 6, pp. 1709–1717, Jun. 2020.
[31]
L. Feng, Z. Li, Z. Kuang, and W. Zhang, “Extractive video summarizer with memory augmented neural networks,” in Proc. 26th ACM Int. Conf. Multimedia, 2018, pp. 976–983.
[32]
H. Wei et al., “Video summarization via semantic attended networks,” in Proc. 32nd AAAI Conf. Artif. Intell., 30th Innov. Appl. Artif. Intell. Conf., 8th AAAI Symp. Educ. Adv. Artif. Intell., 2018, pp. 216–223.
[33]
B. A. Plummer, M. Brown, and S. Lazebnik, “Enhancing video summarization via vision-language embedding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 5781–5789.
[34]
A. B. Vasudevan, M. Gygli, A. Volokitin, and L. Van Gool, “Query-adaptive video summarization via quality-aware relevance estimation,” in Proc. 25th ACM Int. Conf. Multimedia, 2017, pp. 582–590.
[35]
Y. Cong, J. Yuan, and J. Luo, “Towards scalable summarization of consumer videos via sparse dictionary selection,” IEEE Trans. Multimedia, vol. 14, no. 1, pp. 66–75, 2012.
[36]
F. Dornaika and I. K. Aldine, “Decremental sparse modeling representative selection for prototype selection,” Pattern Recognit., vol. 48, no. 11, pp. 3714–3727, 2015.
[37]
E. Elhamifar, G. Sapiro, and R. Vidal, “See all by looking at a few: Sparse modeling for finding representative objects,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2012, pp. 1600–1607.
[38]
B. Zhao and E. P. Xing, “Quasi real-time summarization for consumer videos,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 2513–2520.
[39]
S. E. F. De Avila, A. P. B. Lopes, A. da Luz Jr., and A. de Albuquerque Araújo, “VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method,” Pattern Recognit. Lett., vol. 32, no. 1, pp. 56–68, 2011.
[40]
G. Guan et al., “A top-down approach for video summarization,” ACM Trans. Multimedia Comput., Commun., Appl., vol. 11, no. 1, pp. 1–21, 2014.
[41]
G. Pan et al., “A bottom-up summarization algorithm for videos in the wild,” EURASIP J. Adv. Signal Process., vol. 2019, no. 1, pp. 1–11, 2019.
[42]
Y. Pritch, A. Rav-Acha, A. Gutman, and S. Peleg, “Webcam synopsis: Peeking around the world,” in Proc. IEEE 11th Int. Conf. Comput. Vis., 2007, pp. 1–8.
[43]
W.-S. Chu, Y. Song, and A. Jaimes, “Video co-summarization: Video summarization by visual co-occurrence,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 3584–3592.
[44]
M. Rochan, L. Ye, and Y. Wang, “Video summarization using fully convolutional sequence networks,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 347–363.
[45]
T.-J. Fu, S.-H. Tai, and H.-T. Chen, “Attentive and adversarial learning for video summarization,” in Proc. IEEE Winter Conf. Appl. Comput. Vis., 2019, pp. 1579–1587.
[46]
B. Mahasseni, M. Lam, and S. Todorovic, “Unsupervised video summarization with adversarial LSTM networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 202–211.
[47]
M. Rochan and Y. Wang, “Video summarization by learning from unpaired data,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 7902–7911.
[48]
Y. Zhang et al., “Dilated temporal relational adversarial network for generic video summarization,” Multimedia Tools Appl., vol. 78, no. 24, pp. 35237–35261, 2019.
[49]
Y. Zhang, M. Kampffmeyer, X. Liang, M. Tan, and E. P. Xing, “Query-conditioned three-player adversarial network for video summarization,” 2018, arXiv:1807.06677.
[50]
S. Lan, R. Panda, Q. Zhu, and A. K. Roy-Chowdhury, “FFNet: Video fast-forwarding via reinforcement learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6771–6780.
[51]
R. Paulus, C. Xiong, and R. Socher, “A deep reinforced model for abstractive summarization,” in Proc. Int. Conf. Learn. Representations, 2018. [Online]. Available: https://openreview.net/forum?id=HkAClQgA-
[52]
Y. Zhang, M. Kampffmeyer, X. Zhao, and M. Tan, “Deep reinforcement learning for query-conditioned video summarization,” Appl. Sci., vol. 9, no. 4, 2019.
[53]
K. Zhou, Y. Qiao, and T. Xiang, “Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward,” in Proc. AAAI Conf. Artif. Intell., 2018.
[54]
Y. Hoshen, G. Ben-Artzi, and S. Peleg, “Wisdom of the crowd in egocentric video curation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2014, pp. 573–579.
[55]
A. Mahapatra, P. K. Sa, B. Majhi, and S. Padhy, “MVS: A multi-view video synopsis framework,” Signal Process.: Image Commun., vol. 42, pp. 31–44, 2016.
[56]
S.-H. Ou, C.-H. Lee, V. S. Somayazulu, Y.-K. Chen, and S.-Y. Chien, “On-line multi-view video summarization for wireless video sensor network,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 1, pp. 165–179, Feb. 2015.
[57]
S.-H. Ou et al., “Communication-efficient multi-view keyframe extraction in distributed video sensors,” in Proc. IEEE Vis. Commun. Image Process. Conf., 2014, pp. 13–16.
[58]
J. Zhu, S. Liao, and S. Z. Li, “Multicamera joint video synopsis,” IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 6, pp. 1058–1069, Jun. 2016.
[59]
I. Arev, H. S. Park, Y. Sheikh, J. Hodgins, and A. Shamir, “Automatic editing of footage from multiple social cameras,” ACM Trans. Graph., vol. 33, no. 4, pp. 1–11, 2014.
[60]
Y. Zhang and R. Zimmermann, “Efficient summarization from multiple georeferenced user-generated videos,” IEEE Trans. Multimedia, vol. 18, pp. 418–431, 2016.
[61]
Z. Ji, Y. Zhang, Y. Pang, X. Li, and J. Pan, “Multi-video summarization with query-dependent weighted archetypal analysis,” Neurocomputing, vol. 332, pp. 406–416, 2019.
[62]
M. Mills, J. Cohen, and Y. Y. Wong, “A magnifier tool for video data,” in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., 1992, pp. 93–98.
[63]
J. P. Collomosse, G. McNeill, and Y. Qian, “Storyboard sketches for content based video retrieval,” in Proc. IEEE 12th Int. Conf. Comput. Vis., 2009, pp. 245–252.
[64]
M. Furini, S. Mirri, and M. Montangero, “TagLecture: The gamification of video lecture indexing through quality-based tags,” in Proc. IEEE Symp. Comput. Commun., 2017, pp. 122–127.
[65]
A. A. Patwardhan, S. Das, S. Varshney, M. S. Desarkar, and D. P. Dogra, “ViTag: Automatic video tagging using segmentation and conceptual inference,” in Proc. IEEE 5th Int. Conf. Multimedia Big Data, 2019, pp. 271–276.
[66]
D. Fernández et al., “ViTS: Video tagging system from massive web multimedia collections,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2017, pp. 337–346.
[67]
A. Truong, P. Chi, D. Salesin, I. Essa, and M. Agrawala, “Automatic generation of two-level hierarchical tutorials from instructional makeup videos,” in Proc. CHI Conf. Hum. Factors Comput. Syst., 2021, pp. 1–16.
[68]
S. Bano and A. Cavallaro, “Discovery and organization of multi-camera user-generated videos of the same event,” Inf. Sci., vol. 302, pp. 108–121, 2015.
[69]
A. D. Sokolova, A. S. Kharchevnikova, and A. V. Savchenko, “Organizing multimedia data in video surveillance systems based on face verification with convolutional neural networks,” in Proc. Int. Conf. Anal. Images, Social Netw. Texts, 2017, pp. 223–230.
[70]
S.-S. Cheung and A. Zakhor, “Efficient video similarity measurement with video signature,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 1, pp. 59–74, Jan. 2003.
[71]
L. Yuan et al., “Central similarity quantization for efficient image and video retrieval,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 3083–3092.
[72]
S. Bekhet and A. Ahmed, “Evaluation of similarity measures for video retrieval,” Multimedia Tools Appl., vol. 79, no. 9, pp. 6265–6278, 2020.
[73]
M. Hamroun, S. Lajmi, H. Nicolas, and I. Amous, “VISEN: A video interactive retrieval engine based on semantic network in large video collections,” in Proc. 23rd Int. Database Appl. Eng. Symp., 2019, pp. 1–10.
[74]
Y. Huang et al., “Real-time video recommendation exploration,” in Proc. Int. Conf. Manage. Data, 2016, pp. 35–46.
[75]
Y. Deldjoo et al., “Content-based video recommendation system based on stylistic visual features,” J. Data Semantics, vol. 5, no. 2, pp. 99–113, 2016.
[76]
H. Yan et al., “Multi-site user behavior modeling and its application in video recommendation,” IEEE Trans. Knowl. Data Eng., vol. 33, no. 1, pp. 180–193, Jan. 2021.
[77]
G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–749, Jun. 2005.
[78]
Y. Chen, S. Mensah, F. Ma, H. Wang, and Z. Jiang, “Collaborative filtering grounded on knowledge graphs,” Pattern Recognit. Lett., vol. 151, pp. 55–61, 2021.
[79]
P. Resnick et al., “GroupLens: An open architecture for collaborative filtering of netnews,” in Proc. ACM Conf. Comput. Supported Cooperative Work, 1994, pp. 175–186.
[80]
R. Salakhutdinov and A. Mnih, “Probabilistic matrix factorization,” in Proc. 20th Int. Conf. Neural Inf. Process. Syst., 2007, pp. 1257–1264.
[81]
B. Sarwar et al., “Item-based collaborative filtering recommendation algorithms,” in Proc. 10th Int. World Wide Web Conf., 2001, pp. 285–295.
[82]
G.-L. Sun, Z.-Q. Cheng, X. Wu, and Q. Peng, “Personalized clothing recommendation combining user social circle and fashion style consistency,” Multimedia Tools Appl., vol. 77, no. 14, pp. 17731–17754, 2018.
[83]
X. Qian, H. Feng, G. Zhao, and T. Mei, “Personalized recommendation combining user interest and social circle,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 7, pp. 1763–1777, Jul. 2014.
[84]
P. Pirolli and S. Card, “Information foraging,” Psychol. Rev., vol. 106, no. 4, pp. 643–675, 1999.
[85]
X. Liu, J. Li, J. Wang, and Z. Liu, “MMFashion: An open-source toolbox for visual fashion analysis,” in Proc. 29th ACM Int. Conf. Multimedia, 2021, pp. 3755–3758.
[86]
O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. Med. Image Comput. Comput.-Assist. Interv., 2015, pp. 234–241.
[87]
Y. Wang, E. Ghumare, R. Vandenberghe, and P. Dupont, “Comparison of different generalizations of clustering coefficient and local efficiency for weighted undirected graphs,” Neural Comput., vol. 29, no. 2, pp. 313–331, 2017.
[88]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[89]
S. Lee, S. Oh, C. Jung, and C. Kim, “A global-local embedding module for fashion landmark detection,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops, 2019, pp. 3153–3156.
[90]
J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” 2018, arXiv:1804.02767.
[91]
H. Fang, S. Xie, Y. W. Tai, and C. Lu, “RMPE: Regional multi-person pose estimation,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2353–2362.
[92]
Z. Teed and J. Deng, “RAFT: Recurrent all-pairs field transforms for optical flow,” in Proc. 16th Eur. Conf. Comput. Vis., 2020, pp. 402–419.
[93]
A. Bangor, P. Kortum, and J. Miller, “Determining what individual SUS scores mean: Adding an adjective rating scale,” J. Usability Stud., vol. 4, no. 3, pp. 114–123, 2009.
Recommendations
Item-level RFID for enhancement of customer shopping experience in apparel retail
We use item-level RFID to enhance the customer shopping experience in apparel retail. We install RFID devices to collect customer shopping behaviour, implement intelligent fuzzy screening algorithms to analyze customer preferences, and use the results for ...
An interactive exploratory search system for on-line apparel shopping
VINCI '15: Proceedings of the 8th International Symposium on Visual Information Communication and Interaction. Many people (especially women) tend to take relatively longer time for shopping. This paper presents a system for product retrieval inspired by psychology of women's shopping activity, and an implementation of the system for apparel products. Our study ...
Published In
1520-9210 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Publisher
IEEE Press
Publication History
Published: 12 April 2023