FastAddr: real-time abnormal address detection via contrastive augmentation for location-based services

Authors:

Desheng Zhang

SIGSPATIAL '22: Proceedings of the 30th International Conference on Advances in Geographic Information Systems

Article No.: 64, Pages 1 - 10

https://doi.org/10.1145/3557915.3560999

Published: 22 November 2022

Abstract

An address, a textual description of a physical location, plays an important role in location-based services such as on-demand delivery and e-commerce. However, abnormal addresses (i.e., addresses that lack the detailed information needed to represent a spatial location) have led to significant costs. In real-world settings like e-commerce, abnormal address detection is not trivial because it must be completed in real time to support massive online queries. In this study, we design FastAddr, a fast abnormal address detection framework that detects abnormal addresses among millions of addresses in a short time. By investigating and modeling the hierarchical structure of address data, we first design a novel contrastive address augmentation approach that generates training data by learning the entity transition probability matrix. We further design a lightweight multi-head attention model that learns compact address representations by modeling address characteristics. We conduct a comprehensive three-phase evaluation. (i) We evaluate FastAddr on a real-world dataset, where it achieves an average F1 of 85.7% in 0.058 milliseconds, outperforming state-of-the-art models by 47.4% with similar detection time. (ii) An offline A/B test shows that FastAddr significantly outperforms the previously deployed model. (iii) An online A/B test comparing FastAddr with the deployed model shows an F1 improvement of more than 20%. Moreover, a real-world case study demonstrates both the efficiency and effectiveness of FastAddr.
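The abstract does not spell out how the entity transition probability matrix is learned or used, so the following is only a minimal sketch of the general idea under stated assumptions: addresses are assumed to be already parsed into (entity type, text) pairs, the entity type names (`city`, `district`, `road`, `building`) and the toy addresses are hypothetical, and the matrix is estimated by simple first-order counting. The sampling function shows one plausible use of such a matrix (drawing entity-type sequences), not the paper's actual augmentation procedure.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers for the start and end of an address.
START, END = "<s>", "</s>"

# Toy parsed addresses (assumed format): lists of (entity_type, text) pairs.
ADDRESSES = [
    [("city", "Beijing"), ("district", "Haidian"),
     ("road", "Zhongguancun St"), ("building", "No. 27")],
    [("city", "Beijing"), ("district", "Chaoyang"), ("road", "Jianguo Rd")],
    [("city", "Shanghai"), ("road", "Nanjing Rd"), ("building", "No. 100")],
]


def transition_matrix(tagged_addresses):
    """Estimate P(next entity type | current entity type) by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    for address in tagged_addresses:
        types = [START] + [etype for etype, _ in address] + [END]
        for cur, nxt in zip(types, types[1:]):
            counts[cur][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }


def sample_type_sequence(probs, rng, max_len=10):
    """Draw one entity-type sequence by walking the first-order Markov chain."""
    seq, cur = [], START
    for _ in range(max_len):
        row = probs[cur]
        cur = rng.choices(list(row), weights=list(row.values()))[0]
        if cur == END:
            break
        seq.append(cur)
    return seq


probs = transition_matrix(ADDRESSES)
print(probs["road"])
print(sample_type_sequence(probs, random.Random(0)))
```

In this toy data, a sampled sequence that stops at `road` lacks a building-level entity; whether such a sequence should count as an abnormal training example is a labeling decision the sketch leaves open.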

Cited By

Hong Z, Wang H, Ding Y, Wang G, He T, Zhang D (2024). SmallMap: Low-cost Community Road Map Sensing with Uncertain Delivery Behavior. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:2 (1-26). Online publication date: 15-May-2024.
https://dl.acm.org/doi/10.1145/3659596

Wang H, Wang S, Lin L, Yang Y, Wang S, Wen H, Serra E, Spezzano F (2024). Behavior-aware Sparse Trajectory Recovery in Last-mile Delivery with Multi-scale Attention Fusion. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (4931-4938). Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3680079

Yang Q, Xie Z, Wei H, Zhang D, Yang Y, Serra E, Spezzano F (2024). MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (2879-2889). Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679605

Index Terms

FastAddr: real-time abnormal address detection via contrastive augmentation for location-based services
1. Information systems
  1. Information systems applications
    1. Spatial-temporal systems
      1. Location based services
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. E-commerce infrastructure

Information & Contributors

Information

Published In

SIGSPATIAL '22: Proceedings of the 30th International Conference on Advances in Geographic Information Systems

November 2022

806 pages

ISBN:9781450395298

DOI:10.1145/3557915

General Chairs: Matthias Renz (Kiel University, Germany), Mohamed Sarwat (Wherobots Inc. / Arizona State University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSPATIAL: ACM Special Interest Group on Spatial Information

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 November 2022

Qualifiers

Research-article

Funding Sources

NSF (National Science Foundation)

Conference

SIGSPATIAL '22

Sponsor:

SIGSPATIAL

SIGSPATIAL '22: The 30th International Conference on Advances in Geographic Information Systems

November 1 - 4, 2022

Seattle, Washington

Acceptance Rates

Overall Acceptance Rate 220 of 1,116 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 3
Total Downloads: 236
Downloads (Last 12 months): 128
Downloads (Last 6 weeks): 19

Reflects downloads up to 10 Nov 2024
