DOI: 10.1145/3557915.3560999

FastAddr: real-time abnormal address detection via contrastive augmentation for location-based services

Published: 22 November 2022

Abstract

An address, a textual description of a physical location, plays an important role in location-based services such as on-demand delivery and e-commerce. However, abnormal addresses (i.e., addresses that lack the detailed information needed to identify a spatial location) have led to significant costs. In real-world settings like e-commerce, abnormal address detection is not trivial because it must be completed in real time to support massive online queries. In this study, we design FastAddr, a fast abnormal address detection framework, which detects abnormal addresses among millions of addresses in a short time. By investigating and modeling the hierarchical structure of address data, we first design a novel contrastive address augmentation approach that generates training data by learning the entity transition probability matrix. We further design a lightweight multi-head attention model that learns a compact address representation by modeling the address characteristics. We conduct a comprehensive three-phase evaluation. (i) We evaluate FastAddr on a real-world dataset, where it yields an average F1 of 85.7% in 0.058 milliseconds, outperforming the state-of-the-art models by 47.4% with similar detection time. (ii) An offline A/B test shows that FastAddr significantly outperforms the previously deployed model. (iii) We also conduct an online A/B test comparing FastAddr with the deployed model, which shows an F1 improvement of more than 20%. Moreover, a real-world case study demonstrates both the efficiency and effectiveness of FastAddr.
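
The abstract mentions generating training data by learning an entity transition probability matrix, but it does not say how that matrix is built or used. The following is only a minimal sketch of one possible reading, assuming a first-order (Markov) model over entity types. The entity labels, the toy sequences, and the `truncate` step that produces an "abnormal-like" variant are all hypothetical illustrations and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict

# Hypothetical entity-type sequences for already-parsed addresses (toy data).
TAGGED_ADDRESSES = [
    ["city", "district", "road", "building", "room"],
    ["city", "district", "road", "building"],
    ["city", "road", "building", "room"],
]

START, END = "<s>", "</s>"  # assumed boundary markers


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities between entity types."""
    counts = defaultdict(Counter)
    for seq in sequences:
        path = [START, *seq, END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    # Normalise each row of counts so that it sums to 1.
    return {
        prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for prev, nxts in counts.items()
    }


def sample_sequence(transitions, rng, max_len=10):
    """Sample an entity-type sequence by walking the transition table."""
    seq, state = [], START
    for _ in range(max_len):
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if state == END:
            break
        seq.append(state)
    return seq


def truncate(seq, keep):
    """Hypothetical augmentation: drop the fine-grained trailing entities."""
    return seq[:keep]


transitions = estimate_transitions(TAGGED_ADDRESSES)
rng = random.Random(0)
normal_like = sample_sequence(transitions, rng)
abnormal_like = truncate(normal_like, keep=2)
print(transitions["city"])
print(normal_like, abnormal_like)
```

In this sketch each row of `transitions` is a probability distribution over the next entity type, so a sampled sequence follows the hierarchy seen in the toy data, and the truncated copy could serve as a contrasting example that lacks detailed location information.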

Cited By

  • (2024) SmallMap: Low-cost Community Road Map Sensing with Uncertain Delivery Behavior. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:2 (1-26). DOI: 10.1145/3659596. Online publication date: 15-May-2024
  • (2024) Behavior-aware Sparse Trajectory Recovery in Last-mile Delivery with Multi-scale Attention Fusion. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (4931-4938). DOI: 10.1145/3627673.3680079. Online publication date: 21-Oct-2024
  • (2024) MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (2879-2889). DOI: 10.1145/3627673.3679605. Online publication date: 21-Oct-2024

Published In

SIGSPATIAL '22: Proceedings of the 30th International Conference on Advances in Geographic Information Systems
November 2022
806 pages
ISBN:9781450395298
DOI:10.1145/3557915

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. contrastive augmentation
  2. geocoding
  3. multi-head attention
  4. real-time abnormal address detection

Qualifiers

  • Research-article

Conference

SIGSPATIAL '22

Acceptance Rates

Overall Acceptance Rate 220 of 1,116 submissions, 20%
