DOI: 10.1145/3201064.3201091
research-article

Automated Discovery of Internet Censorship by Web Crawling

Published: 15 May 2018

Abstract

Censorship of the Internet is widespread around the world. As access to the web becomes increasingly ubiquitous, filtering of this resource becomes more pervasive. Transparency about the specific content and information that citizens are denied access to is rare. To counter this, various individuals, organisations and researchers have proposed numerous techniques for maintaining URL filter lists. These aim to improve empirical data on censorship for the benefit of the public and the wider censorship research community, while also increasing the transparency of filtering activity by oppressive regimes. We present a new approach for discovering filtered domains in different target countries. This method is fully automated and requires no human interaction. The system uses web crawling techniques to traverse between filtered sites and implements a robust method for determining whether a domain is filtered. We demonstrate the effectiveness of the approach by running experiments to search for filtered content in four different censorship regimes. Our results show that we perform better than the current state of the art and have built domain filter lists an order of magnitude larger than the most widely available public lists as of April 2018. Further, we build a dataset mapping the interlinking nature of blocked content between domains and exhibit the tightly networked nature of censored web resources.
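
The crawl-and-test idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`discover_filtered`, `domain_of`), the breadth-first order, and the two injected callables are assumptions made for illustration. `get_links` stands in for fetching a domain's pages from an unfiltered vantage point and returning the URLs found there, and `is_filtered` stands in for whatever test decides that a domain is blocked in the target country; the paper's actual detection method is not reproduced here.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lowercased host of an absolute URL.

    Simplification: no public-suffix handling, so subdomains count as
    separate domains.
    """
    return (urlsplit(url).hostname or "").lower()


def discover_filtered(
    seeds: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    is_filtered: Callable[[str], bool],
    max_domains: int = 1000,
) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Breadth-first crawl that only follows links out of filtered domains.

    seeds       -- bare domain names to start from (hypothetical input)
    get_links   -- hypothetical callable: domain -> URLs found on its pages
    is_filtered -- hypothetical callable: domain -> True if judged blocked
    max_domains -- stop after testing this many distinct domains

    Returns the filtered domains found and the (source, target) links
    observed between pairs of filtered domains.
    """
    queue = deque(seeds)
    tested: Set[str] = set()
    filtered: Set[str] = set()
    candidate_edges: Set[Tuple[str, str]] = set()

    while queue and len(tested) < max_domains:
        domain = queue.popleft()
        if domain in tested:
            continue
        tested.add(domain)

        if not is_filtered(domain):
            continue  # unfiltered domains are not expanded
        filtered.add(domain)

        for url in get_links(domain):
            target = domain_of(url)
            if target and target != domain:
                candidate_edges.add((domain, target))
                queue.append(target)

    # Keep only links whose target also turned out to be filtered.
    edges = {(src, dst) for src, dst in candidate_edges if dst in filtered}
    return filtered, edges
```

Because the two callables are injected, the traversal can be exercised on a toy link graph without any network access. The returned edge set is one simple way to represent the interlinking of blocked domains.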

Published In

WebSci '18: Proceedings of the 10th ACM Conference on Web Science
May 2018
399 pages
ISBN:9781450355636
DOI:10.1145/3201064
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. censorship
  2. dns
  3. filtering
  4. monitoring
  5. transparency

Qualifiers

  • Research-article

Conference

WebSci '18: 10th ACM Conference on Web Science
May 27 - 30, 2018
Amsterdam, Netherlands

Acceptance Rates

WebSci '18 Paper Acceptance Rate 30 of 113 submissions, 27%;
Overall Acceptance Rate 245 of 933 submissions, 26%

Cited By

  • (2024) Aligning agent-based testing (ABT) with the experimental research paradigm: a literature review and best practices. Journal of Computational Social Science 7:2 (1625-1644). DOI: 10.1007/s42001-024-00283-6. Online publication date: 16-May-2024
  • (2023) Global, Passive Detection of Connection Tampering. Proceedings of the ACM SIGCOMM 2023 Conference (622-636). DOI: 10.1145/3603269.3604875. Online publication date: 10-Sep-2023
  • (2021) Chinese Wall or Swiss Cheese? Keyword filtering in the Great Firewall of China. Proceedings of the Web Conference 2021 (472-483). DOI: 10.1145/3442381.3450076. Online publication date: 19-Apr-2021
  • (2018) On Identifying Anomalies in Tor Usage with Applications in Detecting Internet Censorship. Proceedings of the 10th ACM Conference on Web Science (87-96). DOI: 10.1145/3201064.3201093. Online publication date: 15-May-2018
