Dr. Anat Ben-David is a senior lecturer in the department of Sociology, Political Science and Communication. She is co-founder of the Open University's Open Media and Information Lab (OMILab). Focusing on New Media, her primary research interests are history and geopolitics of the Web, Digital STS, social and political studies of social media, and digital and computational methods for Web research.
Social media challenge several established concepts of memory research. In particular, the day-to-day mundane discourse of social media blurs the essential distinction between commemorative and non-commemorative memory. We address these challenges by presenting a methodological framework that explores the dynamics of social memory on various social media. Our method combines top-down data mining with a bottom-up analysis tailored to each platform. We demonstrate the application of our approach by studying how the Holocaust is remembered in different corpora, including a dataset of 5.3 million Facebook posts and comments collected between 2015 and 2017 and a dataset of 5 million tweets and retweets collected in 2021. We first identify the mnemonic agents initiating the discussion of the memory of the Holocaust and those responding to it. Second, we compare the macro-rhythms of Holocaust discourse on the two platforms, identifying peaks and mundane discussions that extend beyond commemorative occasions. Third, we identify distinctive language and cultural norms specific to the memorialization of the Holocaust on each platform. We conceptualize these dynamics as ‘Mnemonic Markers’ and discuss them as potential pathways for memory researchers who wish to explore the unique memory dynamics afforded by social media.
This article explores the uses and abuses of traumatic memory within the context of the multifaceted discursive representation of the Holocaust on social media. Combining computational, quantitative, and qualitative methodologies, the article offers a comprehensive mapping of the mnemonic spectrum extending beyond memory work conducted during official commemorative occasions. To do so, we examined a unique case: the Twitter manifestations of one Hebrew expression, “and their collaborators” (ATC), which echoes the Israeli “Law for punishing Nazis and their collaborators.” In contrast to the complete phrase, the truncated collocation appears in a variety of contexts across Hebrew Twitter. Thus, our investigation shows that alongside traditional awe-inspiring commemorative (“good”) uses of ATC, the conjunction between social media affordances and user practices brings to the discursive forefront exploitative political (“bad”) ATC uses and misuses that contribute to political polarization...
Previous research on the platformization of news has mostly been devoted to considering the effects of social media on the news industry. The current study focuses on Taboola and Outbrain, two leading content recommendation platforms. The companies form “partnerships” with news organizations, through which they take over a designated space on news websites and curate news, sponsored content, and advertisements, creating a blend that, the companies claim, maximizes monetization. We argue that the unique business model and distribution mechanism of these companies have a distinct effect on news sites, their audiences, and ultimately the journalism profession. An empirical analysis of 97,499 recommended content items, scraped from nine Israeli news sites, suggests that the spaces created by these partnerships blur the distinction between editorial and monetization logics. In addition, we find the creation of indirect network effects: while large media groups benefit from the circulation of sponsored content across their websites, smaller publishers pay Taboola and Outbrain as advertisers to drive traffic to their websites. Thus, even though these companies discursively position themselves as "gallants of the open web" who free publishers from the grip of walled-garden platforms, they de facto expose the news industry to the influence of the platform economy.
The purpose of this chapter is to conceptually unfold the broader meaning of the term ‘digital natives’ both by a historical contextualisation of the ‘digital’, as well as by a discussion of the geopolitics of the ‘native’. The terminological analysis, grounded in a historical contextualisation of digital activism and the history of digital technologies in the past decade, serves to argue that in its current form, the term ‘digital natives’ may represent a renewed dedication to the native place at a point in time when previous distinctions between ‘physical’ and ‘digital’ places no longer hold.
In 2003 the Palestinian state received official recognition on the Web before it was established on the ground. The delegation of the .ps country code top-level domain (ccTLD) to the Palestinian Authority and its inclusion in the UN list of recognized countries and territories created an official Web-space in which a Palestinian state operated side-by-side with other sovereign states. Yet with the rise of Web 2.0 applications, the official representation of the Palestinian state partially disappeared. This study focuses on the shift in the spatial representation of the Palestinian state on the Web, from an officially acknowledged national Web space, followed by its partial disappearance in Web 2.0 spaces, to its reconstruction as a user-generated space. It examines Palestine’s virtual borders on various Web 2.0 mapping platforms, along with the listing (and non-listing) of Palestine as a country in the registration procedure of popular Web 2.0 applications. It shows that on most mapp...
In light of the exponential growth in digital data characterizing the 21st century, future historians of our time will have to rely on born-digital materials as primary sources for establishing historical facts. Yet born-digital materials challenge historians’ well-established source criticism techniques used for establishing facts based on the authenticity, authorship and authority of documents, for they are ephemeral, immaterial, fragile and easy to manipulate. For example, the content of websites can be easily modified, tweets are frequently deleted, the number of social media comments and likes can be artificially boosted through click farms, and dubious sources spreading misinformation can be disguised as reliable news organizations. With the commercialization of the web, more than ever before, web data is primarily proprietary, and therefore subject to platforms’ policies and constraints.
This study considers the ways that overt hate speech and covert discriminatory practices circulate on Facebook despite its official policy that prohibits hate speech. We argue that hate speech and discriminatory practices are not only explained by users’ motivations and actions, but are also formed by a network of ties between the platform’s policy, its technological affordances, and the communicative acts of its users. Our argument is supported with longitudinal multimodal content and network analyses of data extracted from official Facebook pages of seven extreme-right political parties in Spain between 2009 and 2013. We found that the Spanish extreme-right political parties primarily implicate discrimination, which is then taken up by their followers who use overt hate speech in the comment space.
Following the familiar distinction between software and hardware, this chapter argues that web archives deserve to be treated as a third category, memoryware: specific forms of preservation techniques which involve both software and hardware, but also crawlers, bots, curators, and users. While historically the term memoryware refers to the art of cementing together bits and pieces of sentimental objects to commemorate loved ones, understanding web archives as a complex socio-technical memoryware moves beyond their perception as bits and pieces of the live Web. Instead, understanding web archives as memoryware hints at the premise of the web’s exceptionalism in media and communication history and calls for revisiting some of the concepts and best practices in web archiving and web archive research that have consolidated over the years. The chapter, therefore, presents new challenges for web archive research by turning a critical eye on web archiving itself and on the specific types of histories that are constructed with web archives.
The article proposes archival thinking as an analytical framework for studying Facebook. Following recent debates on data colonialism, it argues that Facebook dialectically assumes the role of a new archon of public records, while being unarchivable by design. It then puts forward counter-archiving, a practice developed to resist the epistemic hegemony of colonial archives, as a method that allows the critical study of the social media platform after it had shut down researchers’ access to public data through its application programming interface. After defining and justifying counter-archiving as a method for studying datafied platforms, two counter-archives are presented as proof of concept. The article concludes by discussing the shifting boundaries between the archivist, the activist and the scholar, as the imperative of research methods after datafication.
The field of web archiving is at a turning point. In the early years of web archiving, the single URL was the dominant unit for preservation and access. Access tools such as the Internet Archive's Wayback Machine reflect this notion, as they allow consultation, or browsing, of one URL at a time. In recent years, however, the single URL approach to accessing web archives is being gradually replaced by search interfaces. This paper addresses the theoretical and methodological implications of the transition to search on web archive research. It introduces ‘search as research’ methods, practices already applied in studies of the live web, which can be repurposed and implemented for critically studying archived web data. Such methods open up a variety of analytical practices that were so far precluded by the single URL entry point to the web archive, such as the re-assemblage of existing collections around a theme or an event, the study of archival artefacts and scaling the uni...
This paper empirically studies the effects of representation choices on neural sentiment analysis for Modern Hebrew, a morphologically rich language (MRL) for which no sentiment analyzer currently exists. We study two dimensions of representational choices: (i) the granularity of the input signal (token-based vs. morpheme-based), and (ii) the level of encoding of vocabulary items (string-based vs. character-based). We hypothesize that for MRLs, languages where multiple meaning-bearing elements may be carried by a single space-delimited token, these choices will have measurable effects on task performance, and that these effects may vary for different architectural designs: fully-connected, convolutional or recurrent. Specifically, we hypothesize that morpheme-based representations will have advantages in terms of their generalization capacity and task accuracy, due to their better OOV coverage. To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances thereof: token-based and morpheme-based. Our experiments show that the effects of representational choices vary with architectural types. While fully-connected and convolutional networks slightly prefer token-based settings, RNNs benefit from a morpheme-based representation, in accord with the hypothesis that explicit morphological information may help generalize. Our endeavor also delivers the first state-of-the-art broad-coverage sentiment analyzer for Hebrew, with over 89% accuracy, alongside an established benchmark to further study the effects of linguistic representation choices on neural networks' task performance.
In Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York NY, 2014.
Many national and international heritage institutes realize the importance of archiving the web for future cultural heritage. Web archiving is currently performed either by harvesting a national domain, or by crawling a pre-defined list of websites selected by the archiving institution. In either method, crawling results in more information being harvested than just the websites intended for preservation, information that could be used to reconstruct impressions of pages that existed on the live web at the crawl date but would otherwise have been lost forever. We present a method to create representations of what we will refer to as a web collection’s aura: the web documents that were not included in the archived collection, but are known to have existed because they are mentioned on pages that were included in the archived web collection. To create representations of these unarchived pages, we exploit the information about the unarchived URLs that can be derived from the crawls by combining crawl date distribution, anchor text and link structure. We illustrate empirically that the size of the aura can be substantial: in 2012, the Dutch Web archive contained 12.3M unique pages, while we uncover references to 11.9M additional (unarchived) pages.
WebSci '13: Proceedings of the 5th Annual ACM Web Science Conference, pages 182-190.
Web archives provide access to snapshots of the Web of the past, and could be valuable for research purposes. However, access to these archives is often limited, both in terms of data availability and in terms of interfaces to this data. This paper explores new methods to overcome these limitations. It presents "sprint-methods" for performing research using an archived collection of the Dutch news aggregator Website Nu.nl, and for developing and adapting a search system and interface to this data. First, the work aims to contribute to research in the humanities and social sciences, in particular New Media research employing digital methods to study the Web of the past. Second, it aims to contribute to Computer Science, through the development of novel access tools for Web archives that facilitate research.
This study traces the emergence of national Web-spaces in unstable territories. In particular, it focuses on the history of the Palestinian Web, which gradually transformed from Websites hosted under generic domains (.org, .net, .edu), via symbolic hosting of official Websites under the .int domain, and finally to the official delegation of the national .ps domain. The creation of the Palestinian digital space, with its defined sovereign borders, stands in contrast to the current unsettled borders of the Palestinian Territory. While prevalent accounts of the ‘nationalization’ of the Palestinian Web-space are ethnographic in nature, this paper traces the history of the Palestinian Web-space and the shaping of its digital borders by turning to the Web itself. It maps the emergence of a national digital space into already existing national and international Web-spaces by using digital methods that reconstruct and visualize archived Web data and their evolution over time. Such digital history-telling aims at revealing the unique characteristics of the Web in shaping digital borders, as well as the resonance of digital borders with physical territories, and their related political and diplomatic processes.
Dr. Anat Ben-David explores ways social media have become tools in and extensions of the Israeli-Palestinian conflict.