Existing weakly-supervised temporal sentence grounding methods typically use query reconstruction as a pretext task in place of the absent temporal supervision. However, these approaches suffer from two flaws: insignificant reconstruction and discrepancy in alignment. Insignificant reconstruction means that randomly masked words may not be discriminative enough to distinguish the target event from unrelated events in the video. Discrepancy in alignment refers to the incorrect partial alignment built by the query reconstruction task. These flaws undermine the reliability of current reconstruction-based methods. To this end, we propose a novel Self-improving Query ReconstrucTion (SQRT) framework for weakly-supervised temporal sentence grounding. To deal with insignificant reconstruction, we devise a keyword mining strategy to determine the words that are important for language grounding. To attain better moment-query alignment, we introduce inter-sample contrast to tackle the partial alignment built by query reconstruction. The self-improving framework thus uses query reconstruction for language grounding while alleviating the discrepancy in alignment, which puts reconstruction-based grounding back on the right track. Experiments on two popular datasets show that SQRT achieves state-of-the-art performance on Charades-STA and performance comparable to the state of the art on ActivityNet Captions.