  • Original article
  • Open access

Spatial analysis of users-generated ratings of yelp venues

Abstract

Background

With the popularity of location-based services on smartphones, users readily leave comments on the business venues (e.g., restaurants, shops, bars) that they have visited. User reviews of Yelp venues indicate, to some extent, how satisfied customers are with the services of those venues, and can therefore be used to reflect the service quality of business venues. Because venues are geo-located, they can tell researchers both where a business venue is and how good it is.

Methods

Through a spatial analysis of venues’ ratings, this paper explores geographic patterns of ratings of Yelp business venues in a city-wide region. Specifically, we identified clusters of high and low ratings and explored the spatial patterns of clusters of high ratings for different venue categories (i.e., restaurants, fast foods and bars).

Results

In this study, we undertook an analysis of Yelp ratings in Phoenix, USA. The empirical results indicate that spatial clusters of high ratings tend to be differently distributed between different categories of Yelp venues. More specifically, bars within or near the city centre are likely to get high ratings. Moreover, although hot spots and cold spots of restaurants and fast foods both tend to be randomly distributed over space, spatial distribution of restaurants’ ratings tends to be more similar to that of bars’ ratings.

Conclusion

Mapping Yelp’s business venues with ratings provides a new way to understand spatial patterns of service quality of business or public venues at a large spatial scale.

Background

The growing popularity of smartphones and apps promotes the development of location-based social networks (LBSNs). LBSNs combine location-based services (LBS) and social media. On the one hand, users can geo-reference and time-stamp all their information automatically or manually, including texts, photos, ‘check-ins’ and ‘likes’, using GPS-enabled devices such as smartphones and tablets. On the other hand, users can share their information with others or see others’ information, including geo-referenced information, through online social networks. Much research has demonstrated how to use LBSN data in different fields: mobility, urban planning, place recommendation and so forth (e.g., [2, 11, 17, 18, 23, 26–28]).

In recent years, Yelp has been attracting researchers who are interested in the service quality of business venues. On the one hand, compared to other LBSNs like Foursquare, Google Latitude and Facebook Places, Yelp focuses on crowdsourced reviews of business venues (restaurants, shops, bars, etc.). Reviews by Yelp users reflect the satisfaction of customers with the services in those venues. Although other LBSNs also allow users to comment on business venues, they offer users a simpler mechanism than Yelp: normally, users can only select ‘like’ or ‘dislike’ to express their feedback after visiting a business venue. Yelp, in contrast, offers a star-based rating system with five levels, from 1-Star (the lowest ranking) to 5-Star (the highest ranking). Yelp users rank a business venue by giving it a star score from 1 to 5. This mechanism is similar to the one widely used in ranking hotels. A high star score indicates that the user is very satisfied with the venue, whilst a low star score indicates that the user is not. Normally, a number of users give different ratings to the same business venue, so the average star rating can be used to rank the venue. Therefore, Yelp offers a potential approach to measuring the service quality of business venues at a large geographical scale.

On the other hand, Yelp makes it possible to collect data reflecting the service quality of business venues efficiently and at low cost. Traditionally, survey methods are widely used to collect data for measuring the service quality of business venues. With a properly designed questionnaire, researchers can obtain less biased observations and more detailed information reflecting different aspects of service quality, e.g., food, facilities, tidiness and so forth. However, conducting a survey is time-consuming, and such a survey tends to cover a relatively small area rather than an entire city. Moreover, some travel recommendation websites, e.g., TripAdvisor, also offer a platform where users can rank business venues by a level-based rating similar to the star-based rating. However, TripAdvisor focuses more on tourism-related business venues, such as hotels and restaurants, while Yelp covers more categories of business venues, including ones related to public services, e.g., hospital care (see [24]).

In summary, Yelp offers a good opportunity to researchers who are interested in measuring and analyzing the service quality of business venues at a high spatial resolution. Geo-located venues can tell researchers where a business venue (e.g., restaurant, bar, retail shop) is and how good it is. In this regard, some researchers have used Yelp reviews to analyze ratings of business venues from different perspectives, including the influence of consumer reviews on purchase decisions (e.g., [14, 20]), fraud and credibility of reviews (e.g., [12, 15]), prediction of venue ratings (see [6, 7, 9, 10, 16, 29]), and venue recommendation [4, 13, 22, 30]. Although more and more researchers are interested in Yelp data, most of the aforementioned studies do not exploit its geo-spatial information. In this paper, by exploiting the geo-spatial information in Yelp data, we focus on spatial patterns of business venues with different levels of ratings.

We chose a spatially constrained clustering method named ‘AMOEBA’ to identify clusters of high and low ratings, for two reasons. First, compared to ordinary clustering methods (e.g., k-means and DBSCAN), spatially constrained clustering methods are less sensitive to spatial distance and are more suitable for spatial clustering applications. Second, the AMOEBA algorithm supports identifying irregularly shaped spatial clusters, while most existing cluster identification techniques implicitly assume that clusters are circular and compact regions [3]. Assuming that clusters are circular may lead to incorrect cluster sizes and false positive determinations [8]. Moreover, we compare spatial patterns of venues’ ratings by venue category in terms of similarity in the spatial distribution of the average rating of venues. Additionally, we compare the results based on Yelp restaurants with the results based on Foursquare restaurants to discuss the reliability of Yelp data.

The remainder of this paper is organized as follows. The Methods section introduces the methods used in this study, the Results and discussions section introduces the data and presents an empirical analysis, and the Conclusions and future works section presents the conclusions and offers suggestions for future work.

Methods

This section presents the methods used for the spatial analysis of venues’ ratings. First, to understand geographical patterns of venues’ ratings for different venue categories, we used a “regionalization” method to identify clusters of high or low star ratings and explored the spatial patterns of clusters of high ratings. “Regionalization” is a classification method dedicated to grouping areas based on attributes and spatial contiguity. The sub sections Neighboring and spatial contiguity and Cluster identification and regionalization method introduce the “regionalization” method used in this paper and explain how we resolve the contiguity issue in representing neighboring relationships of venues (points). Second, we compare spatial patterns of venues’ ratings by venue type using correlation analysis. The sub section Similarity in spatial patterns of rating levels introduces how we investigate similarity in spatial patterns of venues’ ratings.

Neighboring and spatial contiguity

First of all, to run AMOEBA as a “regionalization” method, the spatial units should be spatially contiguous. However, in this study the spatial units are venues, which are represented by points. To bridge this gap, we generated Voronoi diagrams (polygons) for venues (points). Afterwards, we created a ‘spatial contiguity’ matrix for the Voronoi polygons, which reflects neighboring relationships of venues (points). Specifically, each venue has a unique Voronoi polygon. If two Voronoi polygons are adjacent to each other, their corresponding venues (points) are considered to be ‘spatially contiguous’ and therefore become ‘neighbors’.
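As an illustration of this neighbor definition, the following minimal sketch derives Voronoi contiguity for a handful of points. It relies on the standard fact that, for points in general position, two Voronoi polygons share an edge exactly when their points are joined by a Delaunay edge, and it finds Delaunay triangles by brute force (empty circumcircle test). The function names `circumcircle` and `voronoi_neighbors` are hypothetical, the approach is only practical for small examples, and it is not the procedure used in this paper.

```python
from itertools import combinations


def circumcircle(a, b, c, eps=1e-12):
    """Return (center_x, center_y, radius_squared), or None if a, b, c are collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < eps:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def voronoi_neighbors(points, eps=1e-9):
    """Map each point index to the set of indices whose Voronoi polygons are adjacent.

    Brute force: a triple of points forms a Delaunay triangle when no other
    point lies strictly inside its circumcircle; triangle edges are neighbors.
    """
    neighbors = {i: set() for i in range(len(points))}
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue  # collinear triple, no circumcircle
        ux, uy, r2 = circle
        is_empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - eps
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if is_empty:
            for p, q in ((i, j), (j, k), (i, k)):
                neighbors[p].add(q)
                neighbors[q].add(p)
    return neighbors


# Hypothetical venues: one in the middle, four around it.
venues = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
contiguity = voronoi_neighbors(venues)
```

Here the central venue is contiguous with all four surrounding venues, while venues on opposite sides of it are not neighbors of each other.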

Cluster identification and regionalization method

In this paper, the improved AMOEBA algorithm developed by Duque et al. [3] was used to identify clusters of high ratings or low ratings. This algorithm is applicable to the classification of a large number of areas and the identification of irregularly shaped clusters. The original AMOEBA, A Multidirectional Optimum Ecotope-Based Algorithm, was devised by Aldstadt and Getis [1]. AMOEBA embeds a local spatial autocorrelation statistic in an iterative procedure in order to identify spatial clusters (ecotopes) of related spatial units. In brief, this algorithm starts with an initial area to which neighboring areas are iteratively attached until the addition of any neighboring area fails to increase the magnitude of the local \( {G}_i^{\ast } \) of Getis and Ord [5] and Ord and Getis [19]. The resulting region is considered an ecotope. This procedure is executed for all areas, and final ecotopes are defined after resolving overlaps and asserting nonrandomness [3].

Duque et al. [3] developed an alternative formulation that significantly reduces computational time without losing optimality. The main characteristic of their approach is that they take advantage of some properties of both the empirical distribution of the variable and the formulation of the \( {G}_i^{\ast } \) statistic to guide the algorithm toward an optimal solution, avoiding the need for combinatorial evaluations of the solution space, which are exceedingly costly from a computational perspective [3].

Here we briefly introduce the improved AMOEBA algorithm based on Duque et al. [3]. Essentially, a region or ecotope is a geographically linked group of areas. A region can thus be defined as a spatially contiguous set of areas. The value of the \( {G}_i^{\ast } \) statistic is used to measure the level of clustering of an attribute x around an area. Suppose we run AMOEBA on a study region with N areas and an attribute x with elements \( {x}_i \), indicating the value of x at area i. Let us denote this set of areas as M, and \( \overline{x} \) and S as the mean and the standard deviation of the attribute x. Moreover, let R be a sub region of M with n areas. Duque et al. [3] rewrite the formulation of \( {G}_i^{\ast } \) as follows:

$$ {G}_R^{\ast }=\frac{\sum_{i\in R}{x}_i-n\overline{x}}{S\sqrt{\frac{Nn-{n}^2}{N-1}}} $$
(1)

Basically, \( {G}_R^{\ast } \) depends on the areas that are in the region R and the parameters N, \( \overline{x} \) and S that are obtained from the areas in M.
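To make Eq. (1) concrete, the following minimal sketch evaluates \( {G}_R^{\ast } \) for a list of attribute values and a set of area indices. The function name `g_star_region` is hypothetical, and S is assumed here to be the population standard deviation (dividing by N), which the text does not state explicitly.

```python
import math


def g_star_region(values, region):
    """Evaluate Eq. (1) for the areas in `region` (indices into `values`)."""
    N = len(values)
    n = len(region)
    if not 0 < n < N:
        # For n == N the denominator of Eq. (1) is zero.
        raise ValueError("region must be a non-empty proper subset of the areas")
    mean = sum(values) / N
    # Assumption: S is the population standard deviation of x over all N areas.
    S = math.sqrt(sum(v * v for v in values) / N - mean ** 2)
    if S == 0:
        raise ValueError("attribute has zero variance")
    numerator = sum(values[i] for i in region) - n * mean
    denominator = S * math.sqrt((N * n - n * n) / (N - 1))
    return numerator / denominator


ratings = [1.0, 2.0, 3.0, 4.0, 5.0]      # illustrative values, not study data
g_single = g_star_region(ratings, [4])   # region made of one area
```

For a region with a single area (n = 1) the square-root term equals 1, so the statistic reduces to the standard score of that area, \( ({x}_i-\overline{x})/S \).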

A positive (negative) and statistically significant value of \( {G}_i^{\ast } \) statistic indicates the presence of a cluster of high (low) values of attribute x around area i. Thus, AMOEBA identifies high-valued, or low-valued, ecotopes by looking for subsets of geographically connected areas with a high absolute value of the \( {G}_i^{\ast } \) statistic.

The algorithm starts by taking an area i and computing its \( {G}_i^{\ast } \) value. When performed for a single unit, this amounts to calculating the standard score of the value for unit i. A positive (negative) value of the statistic indicates that the value of the attribute at area i is greater (lower) than the mean. Next, Duque et al. [3] take a constructive approach where the areas are sorted such that those that contribute most to the growth in absolute value of the statistic come first; then, they are added one by one until no further improvement is made upon the statistic. This iterative process of identifying sets of neighboring areas that maximize the value of \( {G}_i^{\ast } \) is repeated until it is not possible to increase the absolute value of the \( {G}_i^{\ast } \) statistic by addition of a set of contiguous units.
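The growth step can be illustrated with a deliberately simplified greedy sketch: starting from a seed area, it repeatedly attaches the single neighboring area that most improves the statistic and stops when no neighbor improves it. This is not the sorted, set-based procedure of Duque et al. [3]; the names `g_star` and `grow_ecotope` are hypothetical, S is assumed to be the population standard deviation, and the region is assumed to grow only in the direction of the seed’s sign.

```python
import math


def g_star(values, region):
    """Eq. (1) for a non-empty proper subset `region` of area indices."""
    N, n = len(values), len(region)
    mean = sum(values) / N
    S = math.sqrt(sum(v * v for v in values) / N - mean ** 2)  # population SD (assumed)
    numerator = sum(values[i] for i in region) - n * mean
    return numerator / (S * math.sqrt((N * n - n * n) / (N - 1)))


def grow_ecotope(values, neighbors, seed):
    """Greedily grow a contiguous region around `seed`; return (region, statistic)."""
    N = len(values)
    region = {seed}
    best = g_star(values, region)
    sign = 1.0 if best >= 0 else -1.0  # high-valued or low-valued ecotope
    while len(region) < N - 1:         # keep the region a proper subset
        frontier = {j for i in region for j in neighbors[i]} - region
        if not frontier:
            break
        candidate = max(frontier, key=lambda j: sign * g_star(values, region | {j}))
        score = g_star(values, region | {candidate})
        if sign * score <= sign * best + 1e-12:
            break                      # no neighbor improves the statistic
        region.add(candidate)
        best = score
    return region, best


# Six areas in a chain: three high ratings followed by three low ratings.
ratings = [5.0, 5.0, 5.0, 1.0, 1.0, 1.0]
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
hot_region, hot_score = grow_ecotope(ratings, chain, seed=0)
cold_region, cold_score = grow_ecotope(ratings, chain, seed=5)
```

In this toy chain the region seeded at a high-rated area stops growing as soon as the next neighbor is a low-rated area, because attaching it would lower the statistic.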

Moreover, a Monte Carlo-type permutation test is performed to calculate the statistical significance of each ecotope. This test performs a large number of random spatial permutations of the attribute x and records the number of times that the sum of the permuted attribute values in the ecotope is larger than the sum of the values in the original ecotope. The p-value for the ecotope is then calculated as the ratio between this number plus one and the total number of permutations plus one. Ecotopes with p-values below a predesignated level of significance are considered true clusters.
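A minimal sketch of this permutation test for a high-valued ecotope follows. The function name `ecotope_p_value`, the default of 999 permutations and the fixed random seed are illustrative assumptions rather than settings reported in this paper.

```python
import random


def ecotope_p_value(values, region, permutations=999, seed=0):
    """Permutation p-value for a high-valued ecotope.

    Shuffles the attribute values across all areas and counts how often the
    sum inside the ecotope exceeds the observed sum.
    """
    rng = random.Random(seed)
    observed = sum(values[i] for i in region)
    shuffled = list(values)
    larger = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if sum(shuffled[i] for i in region) > observed:
            larger += 1
    return (larger + 1) / (permutations + 1)


ratings = [5.0, 5.0, 5.0, 1.0, 1.0, 1.0]  # illustrative values, not study data
p_value = ecotope_p_value(ratings, [0, 1, 2])
```

For a low-valued ecotope one would presumably count permuted sums that are smaller than the observed sum instead; that variant is an assumption and is not shown here.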

Similarity in spatial patterns of rating levels

Moreover, we compare spatial patterns of venues’ ratings by venue category using correlation analysis. We aggregate venues to grids (areas) and calculate the average rating of the venues in each grid. For a grid (area) i, the average ratings for restaurants, fast foods and bars in i are computed as

$$ Ave\_ res\_ star(i)=\frac{1}{\left|{N}_i^{Res}\right|}\sum_{j\in {N}_i^{Res}}\ star(j) $$
(2)
$$ Ave\_ fas\_ star(i)=\frac{1}{\left|{N}_i^{Fas}\right|}\sum_{j\in {N}_i^{Fas}}\ star(j) $$
(3)
$$ Ave\_ bar\_ star(i)=\frac{1}{\left|{N}_i^{Bar}\right|}\sum_{j\in {N}_i^{Bar}}\ star(j) $$
(4)

where star(j) is the rating of venue j; \( {N}_i^{Res} \), \( {N}_i^{Fas} \), and \( {N}_i^{Bar} \) represent the sets of restaurants, fast foods, and bars that are located within grid i; and \( \left|\cdot \right| \) denotes the number of venues in a set.
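The grid-level average can be sketched as follows. The sketch assumes that venue coordinates are already projected to kilometres and uses square cells; the function name `average_rating_by_grid` and the tuple layout `(x_km, y_km, stars)` are hypothetical. It also returns the number of venues per grid so that sparsely populated grids can be removed afterwards.

```python
import math
from collections import defaultdict


def average_rating_by_grid(venues, cell_size_km=5.0):
    """Return {cell: (average rating, venue count)} for square grid cells.

    `venues` is an iterable of (x_km, y_km, stars) tuples in projected
    coordinates (an assumption of this sketch).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for x_km, y_km, stars in venues:
        cell = (math.floor(x_km / cell_size_km), math.floor(y_km / cell_size_km))
        totals[cell][0] += stars
        totals[cell][1] += 1
    return {cell: (total / count, count) for cell, (total, count) in totals.items()}


# Illustrative venues, not study data.
restaurants = [(1.0, 1.0, 4.0), (2.0, 3.0, 3.0), (6.0, 1.0, 5.0)]
grid_average = average_rating_by_grid(restaurants)
```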

After that, we look at correlations of average ratings between different venue categories at the grid level. As some grids might have few venues, we removed such grids before the correlation analysis to reduce the bias caused by small samples.
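A minimal sketch of this step is given below. It assumes that each category has been summarized as a dictionary mapping a grid to an (average rating, venue count) pair, and it uses the Pearson coefficient, which is an assumption because the type of correlation is not specified here. The names `pearson` and `grid_correlation` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def grid_correlation(grid_a, grid_b, min_count_a, min_count_b):
    """Correlate grid averages, keeping only grids with enough venues in both categories."""
    cells = [
        cell for cell in grid_a
        if cell in grid_b
        and grid_a[cell][1] >= min_count_a
        and grid_b[cell][1] >= min_count_b
    ]
    if len(cells) < 2:
        raise ValueError("need at least two grids after filtering")
    return pearson([grid_a[c][0] for c in cells], [grid_b[c][0] for c in cells])


# Illustrative grid summaries: cell -> (average rating, venue count).
restaurants = {(0, 0): (3.0, 12), (0, 1): (4.0, 15), (1, 1): (5.0, 11), (2, 2): (1.0, 2)}
bars = {(0, 0): (3.5, 7), (0, 1): (4.0, 8), (1, 1): (4.5, 9), (2, 2): (5.0, 9)}
r = grid_correlation(restaurants, bars, min_count_a=10, min_count_b=6)
```

In this toy input the grid (2, 2) is dropped because it contains too few restaurants, which is what prevents a single sparsely populated grid from dominating the coefficient.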

Results and discussions

This section presents and discusses the empirical results of the spatial analysis of Yelp venues’ ratings in the study region.

Data

Yelp shares a dataset with researchers. The dataset covers 10 cities in Europe and North America and is composed of 6 files: business, review, user, check-in, tip and photo. The business file contains Yelp’s business venues (points-of-interest, POIs) with attributes including ‘type’, ‘business_id’, ‘name’, ‘category’, ‘address’, ‘longitude’, ‘latitude’ and ‘stars’ (average star rating). Here ‘stars’ is the average star rating of the venue based on crowdsourcing. For instance, more than 100 users might have rated the same venue, and different users tend to give different star ratings to the same venue. The average star rating is therefore used to represent the crowdsourced rating and to reduce the influence of biased reviews. In crowdsourcing, normally, the more users are involved in rating a venue, the more reliable the average star rating of the venue tends to be.

This study chooses Phoenix, USA as the study city since Yelp venues are densest in Phoenix among these 10 cities. There are 32,616 business venues within and around Phoenix. In this paper, we selected three popular business venue categories (i.e., restaurant, fast food and bar) as the study case.

First of all, we look at the distributions of venues’ review counts for restaurants, fast foods and bars. Figure 1 shows the complementary cumulative distribution functions (CCDFs) of venues’ review counts for the three categories. Intuitively, all three distributions seem to approximately follow an exponential law (see Fig. 1). Typically, over 90% of venues (regardless of venue category) have a relatively small number of reviews (e.g., fewer than 300). Bars tend to have more reviews whilst fast foods tend to have fewer reviews. We further looked at the distributions of venues’ ratings for restaurants, fast foods and bars. Figure 2 shows the histograms of venues’ ratings for the three categories. The histograms for restaurants and bars look similar to each other whilst the histogram for fast foods looks different (see Fig. 2). Specifically, the most common ratings for restaurants and bars are 3.5 and 4 whilst the most common ratings for fast foods are 3 and 3.5.
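For reference, an empirical CCDF of review counts can be computed as in the minimal sketch below, where the CCDF at a value is taken as the share of venues with at least that many reviews. The function name `empirical_ccdf` and the sample counts are hypothetical.

```python
def empirical_ccdf(counts):
    """Return [(value, share of observations >= value)] for each distinct value."""
    ordered = sorted(counts)
    n = len(ordered)
    ccdf = []
    for index, value in enumerate(ordered):
        # The first occurrence of a value has exactly `index` smaller observations.
        if index == 0 or value != ordered[index - 1]:
            ccdf.append((value, (n - index) / n))
    return ccdf


review_counts = [1, 1, 2, 5]  # illustrative counts, not study data
ccdf = empirical_ccdf(review_counts)
```

Plotting the share on a logarithmic axis against the review count on a linear axis gives a roughly straight line when the distribution is close to exponential.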

Fig. 1 Complementary cumulative distribution functions (CCDFs) of venue’s review count for restaurants, fast foods and bars

Fig. 2 Histogram of venue’s rating for restaurants, fast foods and bars

Fraudulent reviews exist on online review websites (e.g., [12, 15]). According to a recent study focusing on restaurant reviews in the metropolitan area of Boston, US, roughly 16% of restaurant reviews on Yelp are suspicious [15]. Specifically, a restaurant is more likely to commit review fraud when it has few reviews [15]. In this study, to reduce the effects of suspicious reviews, we excluded restaurants, fast foods and bars with fewer than 10 reviews. As a result, we selected 2578 restaurants, 981 fast foods and 797 bars as the empirical data. In addition, 518 Foursquare restaurants collected via the Foursquare API (https://developer.foursquare.com/) were used for comparison with the analysis based on Yelp venues. Note that only Foursquare restaurants are used in this study, as the numbers of Foursquare fast foods and bars are relatively small in the study area.
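The selection rule can be sketched as a simple filter over business records. The record layout below, including the keys `category` (a list of labels) and `review_count`, is an assumption made for illustration and may not match the field names in the Yelp dataset; `select_venues` is a hypothetical helper.

```python
def select_venues(businesses, category, min_reviews=10):
    """Keep venues of the given category that have at least `min_reviews` reviews."""
    return [
        business for business in businesses
        if category in business["category"]
        and business["review_count"] >= min_reviews
    ]


# Illustrative records, not taken from the dataset.
businesses = [
    {"name": "A", "category": ["Restaurants"], "review_count": 25, "stars": 4.0},
    {"name": "B", "category": ["Restaurants"], "review_count": 3, "stars": 5.0},
    {"name": "C", "category": ["Bars"], "review_count": 40, "stars": 3.5},
]
selected = select_venues(businesses, "Restaurants")
```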

Voronoi polygons and neighbouring

Before running the spatially constrained clustering algorithm AMOEBA, we generated Voronoi polygons for restaurants, fast foods and bars respectively. The creation of Voronoi polygons in this paper was conducted using the tool Create Thiessen Polygons in ESRI ArcMap 10. Figure 3 shows the Voronoi polygons generated from restaurants in the study region. In Fig. 3, each polygon has a unique point representing a restaurant inside. Contiguous polygons mean that their associated points (venues) are neighbors.

Fig. 3 Voronoi polygons and associated restaurants in the study region

Cluster detection and regionalization

This section presents the empirical analysis of user review ratings in Phoenix. The observations input to the AMOEBA algorithm are the average star ratings of venues (the attribute x in Eq. (1)). Specifically, we ran the AMOEBA algorithm for the selected restaurants, fast foods and bars respectively, using ClusterPy. ClusterPy is a Python library of algorithms for spatially constrained clustering and offers users some of the most widely used algorithms for spatial aggregation (see www.rise-group.org/risem/clusterpy). Apart from the significance level threshold, no other parameters are required to run AMOEBA. The significance level threshold was set to 0.01, meaning that only clusters with a p-value less than 0.01 are considered statistically significant.

In the output of AMOEBA, ‘solution values’ represent clusters of high values or low values. Specifically, areas with positive ‘solution values’ belong to high value clusters; areas with negative ‘solution values’ belong to low value clusters; and areas with ‘solution values’ of zero are those outside the clusters. Table 1 shows how we group areas by their ‘solution values’ into three cluster types: cluster of high value (hot spot), cluster of low value (cold spot) and outside of cluster. In this empirical study, the value is the average star rating of a venue.
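The grouping rule described above amounts to a sign check, as in this minimal sketch. The function name `cluster_type` and the sample values are hypothetical, and the sketch does not depend on ClusterPy.

```python
def cluster_type(solution_value):
    """Map an AMOEBA solution value to one of the three cluster types."""
    if solution_value > 0:
        return "hot spot"           # cluster of high value
    if solution_value < 0:
        return "cold spot"          # cluster of low value
    return "outside of cluster"


solution_values = [2, 0, -1, 0, 3]  # illustrative values only
labels = [cluster_type(value) for value in solution_values]
```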

Table 1 Cluster types and associated solution values

As a result, we mapped clusters of high and low ratings for restaurants, fast foods and bars respectively (see Fig. 4). The first three maps reveal that hot spots of restaurants and hot spots of fast foods both tend to be randomly distributed over space whilst hot spots of bars are more likely to be around the city centre. This reveals that bars within or near the city centre are likely to have high ratings in Yelp.

Fig. 4 Clusters of high and low ratings (hot spots and cold spots) of restaurants, fast foods and bars

We also mapped clusters of high and low ratings for restaurants based on Foursquare data (see Fig. 4). The last map reveals that hot spots of Foursquare restaurants are more likely to be around the city centre, whilst hot spots of Yelp restaurants tend to be randomly distributed over space. In other words, Foursquare restaurants around the city centre are more likely to have high ratings than Yelp restaurants there. This also indicates that a gap exists between Yelp restaurants’ ratings and Foursquare restaurants’ ratings.

Similarity in spatial patterns of rating levels

Moreover, we compare spatial patterns of venues’ ratings by venue category using correlation analysis. First we aggregated venues to grids (areas). The study region was divided into 5 km × 5 km grids. We consider this a moderate size since 1) the study region is covered by 150 grids; and 2) 70% of the grids have 5 or more restaurants, and 60% of the grids have 10 or more restaurants. Then we calculated the average rating for restaurants, fast foods, and bars in each grid by Eqs. (2)-(4). After that, we looked at correlations of average ratings between different venue categories at the grid level. To remove grids with few venues, we selected only grids that have more than 10 restaurants, and more than 6 fast foods or bars. Since the total number of restaurants is larger than those of fast foods or bars, the threshold for restaurants is larger than those for fast foods or bars. Finally, we calculated correlations of average ratings between restaurants, fast foods and bars (see Table 2). The correlation coefficient for restaurants and bars (0.61) is much higher than that for restaurants and fast foods (0.22). This indicates that although hot spots and cold spots of restaurants and fast foods both tend to be randomly distributed over space, the spatial distribution of restaurants’ ratings tends to be more similar to that of bars’ ratings.

Additionally, we calculated the correlation of average ratings between Yelp restaurants and Foursquare restaurants (see Table 3). Similarly, we selected grids that have more than 10 Yelp restaurants and more than 5 Foursquare restaurants. Since the total number of Yelp restaurants is larger than that of Foursquare restaurants, the threshold for Yelp restaurants is larger than that for Foursquare restaurants. The correlation coefficient is not high, further indicating that a gap exists between Yelp restaurants’ ratings and Foursquare restaurants’ ratings.

Table 2 Correlations of average ratings between restaurants, fast foods and bars
Table 3 Correlation of average ratings between Yelp restaurants and Foursquare restaurants

Sensitivity analysis

The empirical analysis is based on venues with 10 or more reviews. To understand the impact of this threshold (review count of a venue) on the results, we carried out a sensitivity analysis by repeating the analysis for venues with 1 or more reviews and venues with 5 or more reviews. Figure 5 maps clusters of high and low ratings of restaurants, fast foods and bars with different thresholds of review count: 1, 5 and 10. Regardless of the threshold, hot spots of restaurants and hot spots of fast foods both tend to be randomly distributed over space. With the threshold of 10, hot spots of bars are likely to be around the city centre, whilst with the other two thresholds, hot spots of bars tend to be around the city centre and the southwestern part of the study region. Compared with the clustering results for restaurants and fast foods, the clustering result for bars is relatively sensitive to the threshold.

Fig. 5 Clusters of high and low ratings of restaurants, fast foods and bars with different thresholds of review count: 1, 5 and 10

Discussions

A recent study based on a survey reveals that reviews on Yelp tend to be more reliable than reviews on other online review websites, including Foursquare, TripAdvisor and Amazon [25]. However, many more studies are needed to further investigate the reliability of Yelp reviews, which might vary over space, time and venue category. For example, a recent study reveals that the prevalence of suspicious reviews has grown significantly over time, and that restaurants are more likely to engage in positive review fraud earlier in their life cycles [15]. Different categories of venues might be associated with different levels of review fraud. For instance, chain restaurants are less likely to leave fake reviews than independent restaurants [15].

Because fake reviews exist on Yelp, models dedicated to filtering suspicious reviews have been developed by both Yelp and other researchers (e.g., [15]), assuming that Yelp users who have contributed more reviews are less likely to have their reviews filtered. However, a more recent study reveals that the most popular users do not always provide trustworthy ratings and suggests reducing the heavy reliance on popular users’ ratings when filtering suspicious reviews [21]. More advanced models are needed to better filter suspicious reviews and improve the reliability of Yelp reviews.

Conclusions and future works

Mapping Yelp’s business venues with ratings provides a new way to understand spatial patterns of service quality of business or public venues at a large spatial scale. Using a spatially constrained clustering algorithm, we can identify clusters of high ratings and further explore their spatial patterns. In this paper, we conducted an empirical study of Phoenix, USA. The empirical results indicate that spatial clusters of high ratings tend to be distributed differently for different categories of Yelp venues. More specifically, bars within or near the city centre are likely to have high ratings. Additionally, Foursquare restaurants around the city centre are more likely to have high ratings than Yelp restaurants there. Moreover, although hot spots and cold spots of restaurants and fast foods both tend to be randomly distributed over space, the spatial distribution of restaurants’ ratings tends to be more similar to that of bars’ ratings.

This paper has some limitations. First, although we filtered venues to reduce the effect of fraudulent reviews, some fraudulent reviews might still remain. Second, this study only used Yelp data for one city-wide region. More empirical studies with different cities are needed to explore the reliability of Yelp data. Third, participants contributing to Yelp reviews are not as independent as participants in surveys. Before rating a business venue, a new visitor can see previous ratings made by other users, including his or her friends, and these previous reviews might affect the new visitor’s decisions. In contrast, a survey participant usually cannot see others’ ratings when making a new rating. In this respect, ratings in surveys are more independent and objective than those on Yelp.

In the future, some further aspects should be taken into account when undertaking an analysis of Yelp reviews. First, the empirical analysis in this paper covers only one city and should be extended to more cities, particularly mega-cities. Second, to increase the number of reviews, data from different sources, such as Yelp, Foursquare, Google Latitude and Facebook Places, could be fused. Third, as Yelp allows users to write comments on venues in addition to giving star-based ratings, it is also possible to derive another type of rating through sentiment analysis of Yelp’s textual comments. Future research could combine star-based ratings with text-based ratings obtained by sentiment analysis. How to weight star-based ratings against text-based ratings will be a challenge.

Abbreviations

AMOEBA:

A Multidirectional Optimum Ecotope-Based Algorithm

DBSCAN:

Density-based spatial clustering of applications with noise

LBSNs:

Location-based social networks

POIs:

Points-of-interest

References

  1. Aldstadt J, Getis A. Using AMOEBA to create a spatial weights matrix and identify spatial clusters. Geogr Anal. 2006;38(4):327–43.

  2. Cho E, Myers SA, Leskovec J. Friendship and mobility: User movement in location-based social networks, Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. San Diego; 2011.

  3. Duque JC, Aldstadt J, Velasquez E, Franco JL, Betancourt A. A computationally efficient method for delineating irregularly shaped spatial clusters. J Geogr Syst. 2011;13(4):355–72.

  4. Feng H, Qian X. Recommendation via user's personality and social contextual, Proceedings of the 22nd ACM international conference on Conference on information & knowledge management. San Francisco; 2013.

  5. Getis A, Ord JK. The Analysis of Spatial Association by Distance Statistics. Geogr Anal. 1992;24(3):189–206.

  6. Ganu G, Elhadad N, Marian A. Beyond the stars: Improving rating predictions using review text content, Proceedings of the 12th International Workshop on the Web and Databases. Providence; 2009.

  7. Hu L, Sun A, Liu Y. Your neighbors affect your ratings: on geographical neighborhood influence to rating prediction, Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. Gold Coast; 2014.

  8. Jacquez G. Cluster morphology analysis. Spat Spatiotemporal Epidemiol. 2009;1(1):19–29.

  9. Lei X, Qian X. Rating Prediction via Exploring Service Reputation, Proceedings of the 17th IEEE International Workshop on Multimedia Signal Processing. Xiamen; 2015.

  10. Li H, Wu D, Tang W, Mamoulis N. Overlapping Community Regularization for Rating Prediction in Social Recommender Systems, Proceedings of the 9th ACM Conference on Recommender Systems. Vienna; 2015.

  11. Li M, Sagl G, Mburu L, Fan H. A contextualized and personalized model to predict user interest using location-based social networks. Comput Environ Urban Syst. 2016;58:97–106.

  12. Lim YS, Van Der Heide B. Evaluating the wisdom of strangers: The perceived credibility of online consumer reviews on Yelp. J Comput-Mediat Commun. 2015;20:67–82.

  13. Lu K, Zhang Y, Zhang L, Wang S. Exploiting User and Business Attributes for Personalized Business Recommendation, Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. Santiago; 2015.

  14. Luca M. Reviews, reputation, and revenue: The case of Yelp.com, Harvard Business School NOM Unit Working Paper, No. 12-016. 2011.

  15. Luca M, Zervas G. Fake It Till You Make It: Reputation, Competition, and Yelp Review Fraud. Manag Sci. 2016;62(12):3412–27.

  16. McAuley J, Leskovec J. Hidden factors and hidden topics: understanding rating dimensions with review text, Proceedings of the 7th ACM conference on Recommender systems. Hong Kong; 2013.

  17. Noulas A, Scellato S, Lambiotte R, Pontil M, Mascolo C. A Tale of Many Cities: Universal Patterns in Human Urban Mobility. PLoS One. 2012;7(9):10.

  18. Noulas A, Scellato S, Mascolo C, Pontil M. An Empirical Study of Geographic User Activity Patterns in Foursquare, Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. Barcelona; 2011. p. 570–3.

  19. Ord JK, Getis A. Local Spatial Autocorrelation Statistics: Distributional Issues and an Application. Geogr Anal. 1995;27(4):286–306.

  20. Pentina I, Bailey AA, Zhang L. Exploring effects of source similarity, message valence, and receiver regulatory focus on yelp review persuasiveness and purchase intentions. J Mark Commun. 2015:1–21.

  21. Pranata I, Susilo W. Are the most popular users always trustworthy? The case of Yelp. Electron Commer Res Appl. 2016;20:30–41.

  22. Qian X, Feng H, Zhao G, Mei T. Personalized recommendation combining user interest and social circle. IEEE Trans Knowl Data Eng. 2014;26(7):1487–502.

  23. Quercia D, Saez D. Mining Urban Deprivation from Foursquare: Implicit Crowdsourcing of City Land Use. IEEE Pervasive Comput. 2014;13(2):30–6.

  24. Ranard B, Werner R, Antanavicius T, Schwartz A, Smith R, Meisel Z, Asch D, Ungar L, Merchant R. Yelp reviews of hospital care can supplement and inform traditional surveys of the patient experience of care. Health Aff. 2016;35(4):697–705.

  25. Salshutz E. Everyone’s a Critic: An Exploration of Yelp.com and Food Media (bachelor’s thesis). 2014. Retrieved from http://digitalwindow.vassar.edu/senior_capstone/361. Accessed Dec 2014.

  26. Silva TH, VazdeMelo PO, Almeida JM, Salles J, Loureiro AA. A comparison of Foursquare and Instagram to the study of city dynamics and urban social behavior, Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing. Chicago; 2013.

  27. Sklar M, Shaw B, Hogue A. Recommending interesting events in real-time with foursquare check-ins, Proceedings of the sixth ACM conference on Recommender systems. Dublin; 2012.

  28. Sun Y, Fan H, Li M, Zipf A. Identifying the city center using human travel flows generated from location-based social networking data. Environ Plann B Plann Des. 2016;43(3):480–98.

  29. Tang D, Qin B, Liu T, Yang Y. User Modeling with Neural Network for Review Rating Prediction, Proceedings of the 24th International Joint Conference on Artificial Intelligence. Buenos Aires; 2015.

  30. Zhang Y. Incorporating Phrase-level Sentiment Analysis on Textual Reviews for Personalized Recommendation, Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. Shanghai; 2015.

Availability of data and materials

Yelp dataset: https://www.yelp.co.uk/dataset_challenge.

Authors’ contributions

YS implemented the experiments and wrote the paper. YS and JP revised the paper and responded to the comments from the referees. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Yeran Sun.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Sun, Y., Paule, J.D.G. Spatial analysis of users-generated ratings of yelp venues. Open geospatial data, softw. stand. 2, 5 (2017). https://doi.org/10.1186/s40965-017-0020-9

  • DOI: https://doi.org/10.1186/s40965-017-0020-9

Keywords