The echoes of social media friends’ travels: social influence and venue selection in a hyperconnected world


To address our research questions, we utilize a real-world, large-scale behavioral dataset to investigate how social connections impact users’ travel decisions. The availability of objective digital traces supports a quantitative observational research approach, which involves analyzing naturally occurring behavioral data and applying statistical analysis (Lazer et al., 2009; Creswell & Creswell, 2017). This method enables the examination of user behaviors based on real-world activities and allows for the analysis of such activities across large populations, leading to more robust generalizations than findings derived from small-scale surveys or interviews. We analyzed patterns of check-in overlap between users and their social media friends to show the influence of friends on individuals in terms of travel choices. To support such analysis, we developed an analytics framework (illustrated in Fig. 1) to guide a structured analysis, ensuring that the analysis aligns closely with our research questions and allows the identification of user patterns in a scalable and replicable manner. Specifically, this framework investigates how friendship type, distance from home, and venue categories affect travel behavior.

We used a well-established, publicly available, and anonymized global-scale dataset of Foursquare check-ins spanning from April 2012 to January 2014, collected by Yang et al. (2021). The dataset’s rich and global nature has been validated in prior studies for location-based analyses (e.g., Amaro et al., 2016), which supports its reliability and suitability for our research. By identifying Foursquare-tagged tweets from Twitter Public Streams, this data includes more than 112,000 users from 254 countries, with 22 million check-ins at 3.9 million venues, alongside user social networks on Twitter. The full description of the data collection process is detailed in Yang et al. (2021). In the original data collection, the authors focused on active users, defined as those with at least one check-in per month. We did not apply additional filters to the dataset.

Fig. 1

Since the dataset is publicly available and anonymized, accessible at https://data.4tu.nl/articles/_/15112308/1a, it protects user privacy while supporting open access. According to our institution’s Research Ethics Board (REB) guidelines, no further approvals were required for secondary analysis of this type of data. These measures reflect our commitment to ethical standards and responsible data-handling practices.

Next, we identify the home location of each user in order to analyze how distance from home affects travel choices under the influence of friends. Since home locations are not always available in the dataset, we infer them using geohashing, a hierarchical spatial data structure that divides the earth into grid cells. Each cell covers a geographic area and is identified by a short string of characters. This technique has been validated in previous research (Cho et al., 2011): home locations were inferred by discretizing the world into grid cells and defining the home location as the cell with the highest number of user check-ins (Scellato et al., 2011), which achieved an accuracy of ~85% under manual validation. That is, for the manually checked subset of users, the inferred home location matched the expected location in 85% of cases. This high accuracy supports the validity of our home-location inference approach. In this work, we adopt a similar approach: we take the grid cell with the highest number of a user’s check-ins as their home cell and compute the home location as the average latitude and longitude of all check-ins within that cell. We use a grid size of 39.1 km × 19.5 km, which balances accuracy and practicality. This size is fine enough to pinpoint a user’s general area of activity (e.g., a city or district) while minimizing overlap between distinct regions. It provides sufficient granularity to assign users to a specific “home” area based on check-in density without being overly precise, which would risk misclassifying movements within a small area as shifts between cells. A larger grid (the next larger geohash cell size is 156 km × 156 km) might lack the granularity needed to distinguish nearby locations, while a smaller grid (the next smaller geohash cell size is 4.89 km × 4.89 km) could fragment a user’s check-ins across many cells.
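The home-location step described above can be illustrated with a minimal sketch. It assumes the standard geohash encoding, in which 4-character hashes correspond to cells of at most roughly 39.1 km × 19.5 km; the function names `geohash_encode` and `infer_home` and the input layout (a list of latitude/longitude pairs per user) are hypothetical and not taken from the original study.

```python
from collections import Counter

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=4):
    """Encode a point as a geohash by alternately bisecting longitude and latitude."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # the first bit of a geohash refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits form one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def infer_home(checkins, precision=4):
    """Return (home_cell, (mean_lat, mean_lon)) for a list of (lat, lon) check-ins.

    The home cell is the geohash cell with the most check-ins; the home
    location is the mean coordinate of the check-ins inside that cell.
    Ties are broken by first occurrence (an assumption of this sketch).
    """
    cells = [geohash_encode(lat, lon, precision) for lat, lon in checkins]
    home_cell, _ = Counter(cells).most_common(1)[0]
    inside = [pt for pt, cell in zip(checkins, cells) if cell == home_cell]
    mean_lat = sum(p[0] for p in inside) / len(inside)
    mean_lon = sum(p[1] for p in inside) / len(inside)
    return home_cell, (mean_lat, mean_lon)
```

For example, `infer_home([(57.64911, 10.40744), (57.65, 10.41), (40.0, -74.0)])` assigns the user to the cell containing the first two check-ins and averages only those two points.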

The dataset also includes social connections among users, derived from Twitter data, which we use to represent social media friendships. In Yang et al. (2021), two users are defined as social media friends if they mutually follow each other; if only one user follows the other and not vice versa, the pair does not count as a friendship.
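As a minimal sketch of this reciprocity rule, assuming the follow relation is available as directed (follower, followee) pairs (a hypothetical input layout, not necessarily the format of the published dataset):

```python
def mutual_friends(follows):
    """Return the set of unordered user pairs that follow each other.

    follows: iterable of (follower, followee) tuples.
    Self-follows are ignored; one-directional follows are dropped.
    """
    edges = set(follows)
    return {
        frozenset((a, b))
        for a, b in edges
        if a != b and (b, a) in edges
    }
```

With `follows = [("a", "b"), ("b", "a"), ("a", "c")]`, only the pair a and b qualifies, because c does not follow a back.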

This definition of social media friends is commonly used in social network analysis and reflects a strong, mutual connection, which is often considered a reliable indicator of real-world friendships in digital contexts. Previous research (e.g., Scellato et al., 2011; Cho et al., 2011) has shown that reciprocal ties are robust predictors of stronger interpersonal relationships. Thus, this definition of friendship is consistent with established methodologies in the field and supports the reliability of the data for our analysis. While we acknowledge that reciprocal following on Twitter does not necessarily equate to close personal friendships, it serves as a reliable proxy for identifying users who are more likely to engage with each other.

To test whether friends influence each other’s travel choices, we create a set of simulated friends for each user and compare the results with those for social media friends. By generating three types of simulated friendships (fully random, geohash random, and home country random), we aim to isolate the unique impact of social media friendships while accounting for geographical and cultural factors, distinguishing social influence through mutual connections from general exposure to travel content.

To generate simulated friends, we follow three strategies:

  1. Fully random: In this strategy, for each user u who has a friend v, we randomly select another user from the entire dataset (other than v) as u’s simulated friend. These random friends can come from any location or social group within the Twitter network.

  2. Geohash random: In this strategy, for each user u who has a friend v, we randomly select another user who is from the same geographic grid area (Geohash grid) as real friend v to be u’s simulated friend. This allows us to assess the importance of social media friendships compared to the influence of being geographically close.

  3. Home country random: In this strategy, for each user u who has a friend v, we randomly select another user who is from the same home country as real friend v to be u’s simulated friend. Similar to Geohash random, this helps us explore whether social connections within a shared national context influence travel choices more than random connections.
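The three strategies above can be sketched as a single sampling function. The function name `simulated_friend`, the `users` dictionary layout, and the strategy labels are hypothetical; excluding u itself from the candidate pool and not excluding u's other real friends are assumptions of this sketch that the text does not specify.

```python
import random


def simulated_friend(u, v, users, strategy, rng):
    """Pick a simulated friend for user u, given a real friend v.

    users: dict mapping user id -> {"cell": home geohash, "country": home country}.
    strategy: "fully_random", "geohash_random", or "home_country_random".
    rng: a random.Random instance, passed in for reproducibility.
    Returns a user id, or None if no candidate exists.
    """
    candidates = [w for w in users if w != u and w != v]
    if strategy == "fully_random":
        pool = candidates
    elif strategy == "geohash_random":
        # Same home grid cell as the real friend v
        pool = [w for w in candidates if users[w]["cell"] == users[v]["cell"]]
    elif strategy == "home_country_random":
        # Same home country as the real friend v
        pool = [w for w in candidates if users[w]["country"] == users[v]["country"]]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return rng.choice(pool) if pool else None
```

Passing a seeded `random.Random` keeps the simulated friendships reproducible across runs, which matters when the resulting overlap values are later compared statistically.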

After creating simulated friends, we measured how friendship connections influence travel choices by examining the overlap in travel visits between users and their social media friends. To qualify as a travel visit, a check-in must occur at least 50 km away from the user’s home location. Check-ins closer than this were excluded so that the analysis focuses on meaningful travel behavior rather than local activity, since check-ins near a user’s home typically represent routine movements. For all users, we compared the venues they visited with those visited by their friends and calculated the average percentage of overlap. We conducted this analysis separately for social media friends and for each type of simulated friend (fully random, geohash random, and home country random). By comparing the overlap percentages for real and simulated friends, we measure social influence as the extent to which a traveler’s venue choices align with those of their connections. The analysis focused on key variables: the type of friendship (real vs. simulated), the distance of venues from the user’s inferred home location, and the categories of venues visited (e.g., nightlife, shops, events). This structured approach captures the role of geographic proximity, contextual factors, and the nature of connections in shaping travel behavior, directly addressing the study’s research questions. Specifically, (1) we assessed how the proximity of venues to the traveler’s inferred home location influences the overlap with their friends’ check-ins. This dimension addresses RQ1 by examining whether social influence is stronger for venues closer to home or farther away, reflecting patterns of localized versus extended social impact. (2) We investigated whether certain types of venues (e.g., nightlife, shops, events) exhibit stronger overlap patterns, addressing RQ2. This analysis identifies which types of activities or locations are more likely to be influenced by social connections.
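The travel-visit filter and overlap measure can be sketched as follows. The text does not give the exact overlap formula, so this sketch assumes it is the share of a user's travel venues that the friend also visited; the great-circle (haversine) distance and all function names are likewise assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def travel_venues(checkins, home, min_km=50.0):
    """Venue ids of check-ins at least min_km from home.

    checkins: list of (venue_id, lat, lon); home: (lat, lon).
    """
    return {
        vid
        for vid, lat, lon in checkins
        if haversine_km(home[0], home[1], lat, lon) >= min_km
    }


def overlap_pct(user_venues, friend_venues):
    """Percentage of the user's venues also visited by the friend (assumed metric)."""
    if not user_venues:
        return None  # undefined when the user has no travel venues
    return 100.0 * len(user_venues & friend_venues) / len(user_venues)


def mean_overlap(pairs):
    """Average overlap over (user_venues, friend_venues) pairs, skipping undefined ones."""
    values = [overlap_pct(u, f) for u, f in pairs]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None
```

Running `mean_overlap` once on real friend pairs and once per simulated-friend strategy yields the quantities that are compared in the analysis.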

We use the Mann-Whitney U test to determine whether the influence of social media friendship (i.e., venue overlap between a traveler and their social media friends) differs significantly from the influence of simulated friendship (i.e., venue overlap between the traveler and their simulated friends).
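For illustration, a two-sided Mann-Whitney U test can be sketched in pure Python using the normal approximation with a tie correction and no continuity correction. This is a simplified stand-in, not necessarily the procedure used in the study; in practice a library routine such as `scipy.stats.mannwhitneyu` would typically be used. The function name and return convention are assumptions.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value) via the normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(val, 0) for val in x] + [(val, 1) for val in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1  # size of this tie group
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p
```

Here `x` would hold per-user overlap percentages for social media friends and `y` those for one type of simulated friend; the normal approximation is only reasonable for samples that are not very small.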

The framework demonstrated in Fig. 1 is highly replicable due to its use of publicly available data, standardized methodologies, and adaptable techniques. Geohashing for home location inference and simulated friend strategies (fully random, geohash random, and home country random) are flexible for application to various datasets, such as Instagram or Google Maps data. The study ensures reliability through consistent use of the venue overlap metric and validated methods, while its validity is reinforced by isolating the influence of social media friendships using simulated controls. The global scale and diversity of the dataset enhance external validity, making the findings generalizable across different social and geographic contexts.


