1 Introduction

French sociologist Michel Maffesoli coined the term “urban tribes” more than 30 years ago, in 1985 [1]. In recent studies of social computing, the concept of urban tribes has evolved from simple classifications of subcultures, used to inform policy and decision making, to complex metrics for automating the extraction and analysis of urban subcultures [2, 3].

The goal of this work is to describe social subcultures in Saudi Arabia through urban analytics of check-ins and location sharing on social media. A remarkable amount of social media content in Saudi Arabia takes the form of geo-tagged check-ins, images or videos, and is a largely untapped resource for understanding emergent phenomena in social behavior. Insights into mobility, consumer behavior, and communication activity can be acquired by analyzing these vast amounts of information.

The applied context of this research is the classification of urban tribes for decision making and for informing urban planning policy. Situated in a socio-technical context, urban analytics for the purpose of city planning requires a close dialogue between social, engineering and design-oriented fields of research as well as their methods. We are particularly interested in looking closely at the ways in which points of interest in the urban fabric of the city are visited. Different ‘urban tribes’ may occupy space (i.e. places in Riyadh) in different ways, unbeknownst to each other. Indicators of social subcultures include the density of activity in the location, topics discussed around points of interest, and demographics of frequent visitors to points of interest.

Problem Definition.

This work focuses on the analysis of geo-tagged tweets, a common type of activity in social media. We address the problem of how to extract behavior patterns that facilitate meaningful comparison, i.e., a metric that captures tribal characteristics. To this end, we make use of recent advances in social network analytics (SNA), and we show that it is possible to extract social semantic meaning from geo-tagged content. Our urban analytics approach is rooted in synthetic information and data analytics in urban contexts.

Contributions.

Recent advances in social computing have created new opportunities for collecting, integrating, analyzing and accessing information related to coupled urban socio-technical systems. Innovative systems designed for urban analytics that leverage this new capability have recently been recognized as useful. The first contribution is a framework to learn and recognize types of social categories in an urban context from social media. The second contribution is an insight into the spatial distribution of urban tribes, which can in turn inform recommendations for introducing integrated transit systems, allocating resources and developing infrastructure. Although the applied context is a city in Saudi Arabia, we expect the contribution to generalize, to some extent, to similar urban mobility contexts globally.

This paper is structured as follows. Section 2 provides an overview of background research on urban analytics and the emergent research on urban tribes. Section 3 describes the different approaches we used to conduct the analysis. An experiment is presented and discussed in Sect. 4. We conclude in Sect. 5 with key insights and future directions for research.

2 Background

While cities are spatially structured, they are considered intricate socio-economic entities that depend for their existence on their links with the natural environment. Several disciplines are attempting to tackle the problem of understanding the complex systems of cities and the underlying socio-economic ecosystem theories of their constituent elements (people, places, and environment). These disciplines range from economics, physics, and social sciences to applied domains of engineering, computing and urban studies. Notably, the dynamic and complex systems approach to studying urban spatial data (location-based data) has only been possible with computer-based modeling.

Geo-social media data is essentially heterogeneous. It is a mixture of geographical information (location), mobility footprints (check-in data), visual snapshots (images or videos), and social interactions (social conversation around the post or activity). Research on urban analytics for the purpose of identifying subcultures and addressing the intricacies and coupling of social and technical systems has provided insight into the key metrics for urban tribe classification [4].

Observing mobility data brings up important insights regarding people’s behavior and trends. These insights could be used by planners and business owners to study an area and to make informed decisions regarding the location of their businesses and offices. Location data provides useful insights at an aggregate level without being constrained to the individual level, which is consistent with maintaining the privacy of mobile device users. One example that illustrates the use of location data to elicit urban insights is Sense Network, an analytical platform by a company based in New York. The software analyzes location data and presents recommendations to shop owners and business stakeholders on several topics, for example the location of new branches of a shop [5].

Our project is part of a collaborative effort to understand how open data, social media and locality information can be used to provide a fine-grained and more holistic understanding of urban groups. In the following sections, we provide an overview of projects that have provided methods for an augmented analysis of urban tribes by using algorithmic approaches and social media data. Our methodology differs not only in the data computation, but also in the clustering through a spatial network of spaces.

2.1 Hoodsquare

The Hoodsquare project developed an algorithm for extracting neighborhood boundaries in cities, including New York City, based on social media data [6]. The algorithm uses data on Foursquare venue types, the spatial distribution of locals and tourists in the city, and the timestamps of check-ins in venues. Hoodsquare built a tool that can be used to “recommend geographic areas that are small in size and that maintain a balanced trade-off between prediction accuracy and geographic precision” [6]. The ranking attempts to go beyond neighborhoods as administratively or politically defined units in order to unearth neighborhood geographies that are much more in accord with the actual behavior of people in the streets, and with the way they occupy the city. Their exercise provides a predefined spatial clustering within neighborhoods [6].

2.2 CitySense

CitySense is a discovery tool for temporal and spatial hot-spots of activity in the city that has been implemented as a mobile application [7]. It allows people and businesses to detect how the city is inhabited in real time, and to decide whether to go out, where to go, and where to locate franchises. Moreover, CitySense also allows for the exploration of how people move in the city, by evaluating where they come from and where they go. The CitySense algorithm is focused on discovering the circadian rhythms of the city [7].

2.3 Livehoods

The Livehoods project is based on the development of an algorithm that uses Foursquare data to produce neighborhood classifications based on the spatial and social proximity of venues in New York City [3]. Livehoods addresses the need for a more automated and data-based approach to the characterization and understanding of neighborhoods, to aid both urban computing and city planning projects. The project has developed a compelling method to unearth a classification of New York City neighborhoods by taking advantage of massive data sets and unsupervised learning approaches. A significant result of this approach is the Livehoods definition of a neighborhood as “an urban area is defined not just by the type of places found there, but also by the people that choose to make that area part of their daily life” [3].

3 Methods

This section describes the collection and setup of the dataset used in our study. We focused on analyzing Twitter and Point of Interest (POI) datasets. At the time of conducting the fieldwork, there was no public dataset to provide insights into social subcultures in Saudi Arabia. Therefore, we created a dataset by scraping data from Twitter search APIs. The data collected was geo-tagged activity in Riyadh city for the months of October, November and December 2015, with a total of 125 thousand geo-tagged tweets. The POI dataset was provided by the Arriyadh Development Authority, which maintains a comprehensive list of amenities in the city of Riyadh.

Social media applications facilitate spatially marking the activities of Riyadh residents, creating rich databases that hold digital imprints of their interactions. Although these datasets only represent the portion of the population who are active on social media, insights obtained from trends in interactions are often reflections of behavior patterns in the urban community. In our analysis, we observe the density of activity, the variation across space, and the cultural cues with regard to the interactions, the perceived narrative, and the place.

We follow a pipeline of three steps. In the first step, we detect the type of activity based on the main topic discussed in each tweet. In the second step, we break down the city into clusters based on Traffic Analysis Zones (TAZ), which capture the mobility dynamics based on the origin and destination of trips throughout the day. In the third step, we look at patterns of these clusters with respect to their spatial distribution and correlation with surrounding POIs.

3.1 Detecting Activity Type

We start by creating a list of eight categories of tweets derived from previous studies and inspired by anticipated popular categories in the Saudi context [8]. For each category, we define a set of four to ten keywords that are relevant to the topic of the category and are expected to occur frequently on Twitter. The keywords were selected based on our knowledge of the Arabic language and the local context in social media. We chose keywords that map directly to a single category. Examples of the categories and keywords used are shown in Table 1.

Table 1. Sample list of Arabic keywords used with their translation in English

We then assign multiple scores to each tweet in the dataset, where each score represents the number of keywords from a certain category that appear in that tweet. The category of the tweet is the one with the highest score. The tweets are then aggregated at the urban level to generate clusters of human activity in the city. Figure 1 shows the architecture of the framework guiding our analysis in this approach.

Fig. 1. Pipeline of steps to detect categories, showing the complete list of categories
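
The keyword-scoring step described in this section can be illustrated with a minimal sketch. The category names and English keywords below are hypothetical placeholders, not the Arabic keyword lists used in the study; whitespace tokenization and the tie-breaking rule (first category in dictionary order wins) are also assumptions made for illustration.

```python
import string
from collections import Counter

# Hypothetical keyword lists for illustration only; the study used
# four to ten Arabic keywords per category (see Table 1).
CATEGORY_KEYWORDS = {
    "food": ["restaurant", "coffee", "dinner", "burger"],
    "sport": ["match", "football", "stadium", "goal"],
    "religion": ["prayer", "mosque"],
}


def score_tweet(text, category_keywords=CATEGORY_KEYWORDS):
    """Return {category: number of keyword occurrences in the tweet}."""
    # Assumption: simple whitespace tokenization with punctuation stripped.
    tokens = [t.strip(string.punctuation) for t in text.lower().split()]
    counts = Counter(tokens)
    return {
        category: sum(counts[kw] for kw in keywords)
        for category, keywords in category_keywords.items()
    }


def categorize_tweet(text, category_keywords=CATEGORY_KEYWORDS):
    """Return the highest-scoring category, or None if no keyword matched."""
    scores = score_tweet(text, category_keywords)
    best = max(scores, key=scores.get)  # ties: first category in dict order
    return best if scores[best] > 0 else None
```

A tweet that matches no keyword is left uncategorized in this sketch; how such tweets were handled in the study is not stated.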

3.2 Clustering

To visualize urban tribes from tweets and correlate them with Point of Interest (POI) types, we used as input the Arriyadh Development Authority (ADA) POI data and the Riyadh Traffic Analysis Zones (TAZ) data derived from the ADA origin-destination dataset. A TAZ is a geographical unit that is used in transportation modeling. The ADA data covers all landmarks and amenities, which amount to around 12,000 points across the city. The POIs are categorized into six types: restaurants, hotels and apartments, shops and services, community services, health and education, and tourism. Figure 2 shows the Riyadh TAZ map and POI distribution across the city.

Fig. 2. (Left) Riyadh TAZ areas, (Right) Riyadh POI points.

By joining TAZ data and POI data, we identified the different types of amenities that reside within each TAZ. We then split the TAZs into six clusters, each of which contains the TAZ areas that include a specific amenity type. It should be noted that a particular TAZ area can belong to more than one cluster, depending on which POI types exist within it.
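
As a rough sketch of the join described above, the following assigns each POI to the TAZ polygon that contains it and then groups TAZs by POI type. The data layout (polygons as vertex lists, POIs as coordinate and type tuples) and the function names are assumptions for illustration; a GIS library would normally perform this spatial join, and the plain ray-casting test here stands in for it.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def cluster_taz_by_poi_type(taz_polygons, pois):
    """Group TAZ ids by the POI types found inside them.

    taz_polygons: {taz_id: [(x, y), ...]}  (hypothetical layout)
    pois: [(x, y, poi_type), ...]          (hypothetical layout)
    Returns {poi_type: set of taz_ids with at least one POI of that type}.
    """
    clusters = defaultdict(set)
    for x, y, poi_type in pois:
        for taz_id, polygon in taz_polygons.items():
            if point_in_polygon(x, y, polygon):
                clusters[poi_type].add(taz_id)
                break  # assume TAZs do not overlap
    return dict(clusters)
```

Because membership is a set per POI type, the same TAZ id can appear in several clusters, which matches the note above.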

3.3 Analysis

In our analysis, we examined patterns that emerge when correlating the most dominant tweet category in each TAZ with two main dimensions. First, we look at the spatial distribution of the different categories in the city with regard to the location and size of each TAZ. Second, we look at the correlation between the category of tweets in each TAZ and the types of POIs in that TAZ. The overall goal of this analysis is to study the influence of urban features on the social dynamics of people in the city [9].
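
A minimal sketch of this analysis step follows, assuming each tweet has already been assigned a TAZ id and a category. The function names and input layout are hypothetical, and the co-occurrence count is only one simple way to relate dominant categories to POI types; it is not necessarily the measure used in the study.

```python
from collections import Counter, defaultdict


def dominant_category_per_taz(tweets):
    """tweets: iterable of (taz_id, category) pairs.

    Returns {taz_id: most frequent category}; uncategorized tweets
    (category None) are ignored.
    """
    counts = defaultdict(Counter)
    for taz_id, category in tweets:
        if category is not None:
            counts[taz_id][category] += 1
    return {taz: c.most_common(1)[0][0] for taz, c in counts.items()}


def cooccurrence(dominant, taz_poi_types):
    """Count TAZs for each (dominant tweet category, POI type) pair.

    dominant: {taz_id: category}
    taz_poi_types: {taz_id: set of POI types present in that TAZ}
    """
    table = Counter()
    for taz, category in dominant.items():
        for poi_type in taz_poi_types.get(taz, ()):
            table[(category, poi_type)] += 1
    return table
```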

4 Exploratory Study

We applied the tweet categorization method explained earlier to the geo-tagged tweets collected in Riyadh to sense spaces, as done in earlier studies [10–14]. Figure 3 shows the different subcultures within the city of Riyadh when overlaying tweet categories, extracted spatially over the TAZ areas, where clusters are color-coded by the category of tweets. The complete list of tweet-categories was described in the framework depicted in Fig. 1.

Fig. 3. TAZs in Riyadh colored by most dominant tweet category (Color figure online)

We look closely at Food, one of the tweet-categories, to examine relationships with related venues in the city (i.e. restaurants). Figure 4 shows the distribution of Food tweets around the city in contrast with TAZs that have restaurants. Shades of pink represent tweets about food at varying density (i.e. number of tweets), and grey represents all other tweets. We can see that dense food tweets (i.e. dark red) are found in locations that have restaurants, which are circled in red in Fig. 4.

Fig. 4. (Left) Food Tweets (Pink-scale) and all other tweets (Grey-scale) in the city. (Right) Food Tweets (Pink-scale) and all other tweets (Grey-scale) in Restaurant areas (Color figure online).

Overlaying tweet-categories, extracted spatially over the identified clusters, and visualizing them on a map helps in understanding the correlation between POI types and tweet-categories, as we saw earlier. We also look at the number of tweets in each category within each of the six types of clusters. Table 2 summarizes these counts. It should be noted that each count is normalized by the total number of tweets in that category, to avoid favoring categories that have more tweets.

Table 2. Percentage of tweet categories in each TAZ cluster normalized by total number of tweets in each category.
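
The normalization behind Table 2 can be sketched as follows: each raw count for a (category, cluster) pair is divided by the total number of tweets in that category and expressed as a percentage. The function name and input layout are assumptions for illustration.

```python
def normalize_by_category(counts, category_totals):
    """Convert raw counts to percentages of each category's total.

    counts: {(category, cluster): number of tweets}
    category_totals: {category: total tweets in that category}
    Categories with a missing or zero total are skipped.
    """
    return {
        (category, cluster): 100.0 * n / category_totals[category]
        for (category, cluster), n in counts.items()
        if category_totals.get(category)
    }
```

Since a TAZ can belong to more than one cluster, the percentages for a single category may sum to more than 100 across clusters under this scheme.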

The findings show that categories of tweets are distributed among the TAZ clusters of the six POI types with no clear pattern. Notably, the numbers are influenced by the selection of keywords, the corresponding categories defined, and the types of POIs. The analysis was also based on the assumption that the list of categories defined in the framework is aligned with the list of POI types in the dataset. Further investigation of the mapping in the POI dataset is planned. A larger dataset, along with a more comprehensive list of keywords, is needed to address the sensitivity and specificity of the tweet categorization conducted in this round of analysis.

5 Conclusion

In this paper we examined the question of what can be determined about the social categorization of people from their social media activity. The model we propose captures social dynamics within mobility patterns from activities extracted from social media. A number of limitations were noted in the study. First, a relatively small share of Saudis share their location when they post on Twitter, compared with global trends in posting on Twitter. This was evident in our dataset, where the most popular hashtags were not in Arabic, an indicator that they were posted by expatriates rather than natives of the urban context of analysis, which limits the activity that can be collected about the city’s local inhabitants. Second, the keywords and categories that were selected affect the effectiveness of the categorization and clustering of Twitter activity. Nevertheless, the potential of this methodology for gaining insights into the relationships between activity in social media and urban features in the city is evident. Tweet-categories with direct relationships to venues in the city, such as food tweets with restaurants and sport tweets with stadiums and other sport venues, were used to validate the findings. The detection of less obvious relationships, such as the places that attract social, political or religious conversations, remains a challenge. Identifying these relationships can be used to evaluate the influence of newly introduced venues on the social conversation in a given location.

Tweet categorization will be further improved in future work by redefining the list of keywords and categories selected. Also, other spatial clustering techniques will be examined to find unrevealed patterns in the data with respect to spatial elements in the city. In addition, a temporal dimension will be added to see differences in activity throughout the day and during different times of the week.