Evaluating two freely available geocoding tools for geographical inconsistencies and geocoding errors
© The Author(s). 2017
Received: 29 December 2016
Accepted: 23 April 2017
Published: 1 May 2017
Geocoding is highly prone to error for various reasons. This paper examines the geographical inconsistencies associated with geocoding errors seen when using two freely available geocoding tools, Google Sheets and ggmap.
Two hundred restaurants, all recipients of California’s Center of Excellence award, were selected for the analysis. The geocoded addresses were plotted on maps using QGIS, Google Maps, OpenStreetMap (OSM), and Google Earth for visualization, comparison, and validation. A stepwise method of analyzing the geographical inconsistencies is provided that can be adapted for any locational analytics.
Results and discussion
Both Google Sheets and ggmap successfully geocoded all 200 addresses, but ggmap incorrectly geocoded eight addresses as being more than 2,000 miles from their actual location. Addresses containing the ampersand character, “&”, caused ggmap to geocode their location incorrectly. After the ampersand was replaced with the word “and”, ggmap geocoded those addresses correctly. The corrected locations plotted on Google Maps and OSM were similar, and they exactly matched the actual locations when plotted on Google Earth.
Both Google Sheets and ggmap are equally capable of geocoding physical locations, but R users are advised that addresses for geocoding must be free of the ampersand character if correct results are to be obtained. In addition, geocoded outputs should be plotted on a map using QGIS, ArcGIS, Google Maps, OSM, R, or any other such mapping tools for visualization and validation. This will ensure a high-quality geospatial analysis of places or events when locational information is vital for decision-making.
Keywords: Geocoding, Google Sheets, RStudio, ggmap, geosphere, QGIS, ArcGIS, Google Maps, OpenStreetMap, Google Earth, Location analytics
Geographic location plays a vital role in a variety of socioeconomic and environmental decisions, such as in selecting sites for new businesses or providing location-based services [1, 2]; detecting, valuing, and defining protected marine areas; locating prospective areas for grid-connected offshore wind power development; identifying disease-prone areas; responding to crime and natural hazards; and locating customer-friendly shopping malls. Geocoding, the process of assigning coordinates (latitude and longitude) to a physical location, has helped various industries improve performance through spatial analysis [6, 7]. However, the accuracy and reliability of geocoded results have always been a matter of concern among the geospatial analytics community [4, 7–10]. Senaratne et al. (2017) provide a detailed review of the various methods applied in assessing the quality of locational analytics. The authors report that accuracy measurement is the most frequent and reliable technique currently in practice. They define accuracy as the degree of closeness between measured and actual values, noting that it may vary with the geocoding tool used. Geocoding is highly prone to error for various reasons, including lack of coverage (local vs. global); lack of complete, correct, consistent, and updated reference databases; and the making of inappropriate assumptions [7, 9, 10, 12, 13]. All these may affect match rate and positional accuracy. Additionally, incorrect geocoding may bias the results of spatial analysis, resulting in misclassification of actual physical locations that may adversely affect research outcomes or location-based business decisions [14, 15]. Accordingly, understanding and addressing these geocoding challenges is vital. Yet geocoding processes and error handling have been largely ignored in some studies [14, 15].
A non-exhaustive list of geocoding services (Adapted from )
[Table body not recoverable: the service names are missing. The surviving cells give per-service request limits (from 1 at a time up to 10,000 per day) and prices (for example, $99/month for 15,000 addresses; $0.004/geocode up to 4000 addresses, falling to $0.0005/geocode for >10,000,001 locations; 1 million/$1195 and unlimited/$5975).]
Several studies have offered a comparative analysis of various free or subscription-based geocoding services. For example, Karimi et al. evaluated the matching rate of geocoded addresses using web-based geocoding services, including Virtual Earth, Google Maps, Geocoder.us, MapQuest, and Yahoo Maps. In contrast, Swift and his team members assessed seven commercial geocoding services (Centrus, Geolytics, ESRI Address Locator, Geocoder.us, Google Earth, Google Maps API, and the Yahoo API) and one open-source geocoding service (USC Geocoding Platforms) for their ability to geocode addresses accurately. The authors selected 50 addresses for this purpose and found that only 42% of samples matched their reference data, 54% of addresses matched parcel centroids, and only 4% of addresses matched USPS ZIP code centroids. All the geocoding tools tested produced varying results, indicating that analysts should take care both when selecting a geocoding tool and when geocoding physical locations, especially for purposes of location-based analysis.
This study compares the use of two commonly used free geocoding tools for research and business purposes: Google Sheets, which is a Google offering, and ggmap, which is an R package. No comparison of these tools has yet appeared in mainstream journals.
ggmap is one of the most widely used geospatial R packages in a variety of domains. For example, it was used to geocode helminth (nematodes popularly known as roundworms) host–parasite interactions that helped establish the London Natural History Museum’s Host–Parasite Database; was used in a big-data environment to geocode customer movements from homes to shopping centers; and was used to site locations for implementation of a U.S. federal program offering families and children healthful foods during the summer months, administered by the U.S. Department of Agriculture. Google Sheets, or Google Spreadsheets, by contrast, has gained little attention among geospatial analytics communities, even though it has been applied to an array of domains, such as in the geocoding of socioeconomic historical data for visualization of urban geographies and in the public health domain [23, 24]. Google Sheets provides advantages not seen in ggmap, because it does not require coding and is a web-based application. By contrast, ggmap runs through the RStudio software and requires a sequence of queries; even so, it is widely used and accepted by researchers and professionals the world over.
For testing purposes, a publicly available list of Center of Excellence–awarded restaurants in California was downloaded from the website of the Public Health Care Agency of Orange County. In 2016, 3631 restaurants were recognized as a Center of Excellence for their performance in 2015. The list contains each restaurant’s name, address, city, and ZIP code. Because use of both Google Sheets and R is subject to a maximum geocoding limit per day, only 200 of these 3631 restaurants were selected for geocoding in this study. Moreover, because this study seeks to compare the accuracy of geocoded results produced by two popular geocoding tools while providing a stepwise method of resolving geocoding challenges, and because visual verification of individual addresses is a tedious task, a small sample, though larger than the 50 addresses used by Swift et al. (2008), was selected. Because the selection was purely for research purposes, no priority was given to any specific restaurant chain. The 200 addresses were stored as address.csv for further analysis.
Geocoding using Google Sheets and the RStudio ggmap package
Google Sheets is a free web-based application, developed by Google for real-time online document editing while collaborating with other users. Several blog articles and tutorial videos are available that instruct users in the steps used for geocoding physical locations through Google Sheets, such as one available through GitHub, which was adapted for this study.
R is one of the most widely used statistical and visualization open-source tools [28, 29], with more than 6000 packages contributed by thousands of authors across the world. The ggmap package, a bundle of 34 functions, is a spatial data modeling and visualization R package. This package uses Google and Stamen Maps as reference sources for geocoding and mapping. The code used in this study is adapted from Shane Lynn. Most of it is kept intact for reproducibility, and the code used is available in geocode_2016.R and geocode_2016.txt, accessible through this article.
Distance calculation in RStudio using the geosphere package
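The text here does not state which geosphere function was used, so the following is only a minimal sketch, written in Python rather than R, of one common way to compute the distance between two geocoded points: the haversine great-circle formula on a spherical Earth. The function name `haversine_miles`, the Earth radius of 3958.8 miles, and the sample coordinates are assumptions for illustration, not values taken from the study.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius; not taken from the study


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical coordinates: the same address geocoded identically by two tools.
sheets_point = (33.7175, -117.8311)
ggmap_point = (33.7175, -117.8311)
print(haversine_miles(*sheets_point, *ggmap_point))  # identical points give 0.0
```

Applied to each pair of Google Sheets and ggmap coordinates, a function of this kind yields one distance per address, which can then be summarized or screened for implausibly large values.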
Descriptive statistics in RStudio using the pastecs package
Descriptive statistical analysis of the distance calculated between the geocoded locations produced by Google Sheets and those produced by ggmap was performed using the pastecs R package. The stat.desc function of pastecs reports various descriptive statistics, including the number of values, null values, and NAs; the minimum, maximum, range, sum, median, and mean; the standard error and confidence interval of the mean; and the variance, standard deviation, and coefficient of variation. In the results, only a few required outputs are presented and discussed.
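As a rough illustration of the kind of summary described above, the sketch below computes a subset of these statistics in Python with the standard library. It is not the pastecs implementation: the function name `describe`, the dictionary keys, and the sample distances are hypothetical, and the confidence interval of the mean is omitted because it requires a t-distribution quantile.

```python
import math
import statistics


def describe(values):
    """Summarize a list of numbers; None entries are treated as missing values."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample standard deviation (n - 1 denominator)
    return {
        "n_values": len(present),
        "n_zero": sum(1 for v in present if v == 0),
        "n_missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        "sum": sum(present),
        "median": statistics.median(present),
        "mean": mean,
        "se_mean": sd / math.sqrt(len(present)),
        "variance": statistics.variance(present),
        "std_dev": sd,
        # Coefficient of variation is undefined when the mean is zero.
        "coef_var": sd / mean if mean != 0 else float("nan"),
    }


# Hypothetical distances in miles between paired geocodes; None marks a failed geocode.
summary = describe([0.0, 0.0, 0.1, 2.5, None])
for name, value in summary.items():
    print(name, value)
```

A large gap between the median and the maximum in such a summary is one quick signal that a few addresses were geocoded far from where the other tool placed them.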
The Arc Geographical Information System (ArcGIS) has seen much use in spatial analytics and modeling from different perspectives and is one of the most advanced and reliable geospatial analytical tools available [35–37]. However, QGIS, an open-source GIS tool, has become very popular in the field of geospatial analytics. In this study, QGIS version 2.18.2 for the Windows environment is used to plot geocoded locations on a map. Within the QGIS environment, the OpenLayers plugin provides options for selecting Google Maps and OSM as base maps on which to plot geocoded locations. These locations were plotted on Google Maps and OSM for visualization, comparison, and validation. Google Earth, a freeware virtual globe, map, and geographical information program that offers various mapping facilities and that is one of the most reliable geocoding tools available, was also used to investigate locational accuracy. Google Earth has a street view option, which provides a 360° horizontal and 290° vertical panoramic view at the street level from a height of about 2.5 m. These views help users verify actual locations by zooming to the street level.
Results and discussion
Geocoded outputs from Google Sheets and ggmap
The accuracy level of geocoded addresses produced by ggmap
The geocoded addresses produced by Google Sheets and ggmap were compared by calculating the differences between their latitudes and between their longitudes. The geocoded addresses matched in only 53.5% (107 of 200) of cases. The minimum difference between the results produced by Google Sheets and those produced by ggmap was −0.00011 degrees of latitude and −35.65284 degrees of longitude. The maximum difference was 4.29319 degrees of latitude and 0.00016 degrees of longitude, with a standard deviation of 0.76 degrees of latitude and 6.60 degrees of longitude. Furthermore, the geocoded addresses produced by ggmap exactly matched Google Sheets outputs in only 56% (92 of 165) of instances involving street address–level accuracy, 45% (10 of 22) of instances involving sub-premise-level accuracy, and 100% of instances involving either campground-level (1 of 1) or bus station–level (7 of 7) accuracy.
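The comparison described above can be sketched as follows, in Python rather than R. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the direction of subtraction (Google Sheets minus ggmap) is assumed because the text does not specify it, and a "match" is taken to mean that both differences are zero after rounding to five decimal places, the precision at which the differences are reported.

```python
def coordinate_differences(sheets, ggmap):
    """Per-address (lat, lon) differences, assumed here to be Sheets minus ggmap."""
    return [
        (round(s_lat - g_lat, 5), round(s_lon - g_lon, 5))
        for (s_lat, s_lon), (g_lat, g_lon) in zip(sheets, ggmap)
    ]


def match_rate(diffs):
    """Share of addresses whose latitude and longitude differences are both zero."""
    matches = sum(1 for d_lat, d_lon in diffs if d_lat == 0 and d_lon == 0)
    return matches / len(diffs)


# Hypothetical outputs for two addresses, listed in the same order for both tools.
sheets_out = [(33.0, -117.0), (34.0, -118.0)]
ggmap_out = [(33.0, -117.0), (34.5, -118.0)]
diffs = coordinate_differences(sheets_out, ggmap_out)
print(diffs)
print(match_rate(diffs))
```

With the counts reported above, 107 exact matches out of 200 addresses correspond to a match rate of 0.535.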
Summary descriptive statistics of the distance between the coordinates
The geocoded addresses after replacing “&” with “and” (columns: geocoded with “&”, geocoded with “and”, distance in miles)
After correction of the addresses, all re-geocoded results matched Google Sheets outputs. Evidently, the use of even a single problematic character, the ampersand, can cause ggmap to produce incorrect outputs, assigning coordinates as far as 2000 miles from their real location.
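The correction reported above amounts to a small preprocessing step applied to each address before geocoding. The sketch below shows one way to do it in Python; the function name `replace_ampersand`, the whitespace normalization, and the sample address are assumptions for illustration and are not taken from the study's R code or its address list.

```python
import re


def replace_ampersand(address):
    """Replace each '&' with the word 'and' and collapse repeated whitespace."""
    cleaned = re.sub(r"\s*&\s*", " and ", address)
    return re.sub(r"\s+", " ", cleaned).strip()


# Hypothetical address, not one of the study's 200 restaurants.
print(replace_ampersand("Main St & 1st Ave, Irvine, CA 92602"))
# prints: Main St and 1st Ave, Irvine, CA 92602
```

Addresses without an ampersand pass through unchanged, so the step can be applied to the whole address column before batch geocoding.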
Although the geocoding tools Google Sheets and ggmap use a common map reference, they produce varying results. In addition, specific formatting, free of problematic characters such as the ampersand, is required for correct geocoding by ggmap. Google Sheets, by contrast, features a user-friendly environment that does more to aid production of reliable geocoding results. Regardless, users of geocoding tools should not wholly rely on whichever tool they use but rather should always verify their results by the methods outlined in this study or by any other established approach. Visualizing geocoded results on a map using QGIS, ArcGIS, Google Earth, OSM, or R can help in identifying and resolving potential challenges to accuracy. Certainly other factors not covered in this study could also produce erroneous geocoded results, so analysts should carefully evaluate their results and report them in detail, taking particular care when geocoding physical locations in bulk. This study seeks merely to compare the respective geocoding potentials of two freely available geocoding tools for research purposes, not to promote or undermine either of them. Reporting positional accuracy challenges and methods of resolving them can help users of geospatial analytics conduct efficient and accurate spatial analysis.
This work was conducted purely for research purposes and does not necessarily represent the official views of the organization with which the author is associated. RStudio and QGIS are open-source tools, freely available for research work.
The author did not receive any funding for this study.
The author declares no competing interests.
SKS conceptualized the study, analyzed the data, and wrote the manuscript.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Schmenner RW. Look beyond the obvious in plant location. Harv Bus Rev. 1979;57(1):126–32.
- Singh SK. Geospatial analysis of census data for targeting new businesses using Geoeconomics. Journal of Intelligence Studies in Business. 2016;6(12):5–12.
- Waewsak J, Landry M, Gagnon Y. Offshore wind power potential of the Gulf of Thailand. Renew Energy. 2015;81:609–26.
- Rushton G, et al. Geocoding in cancer research: a review. Am J Prev Med. 2006;30(2):S16–24.
- Mohamad MY, Al Katheeri F, Salam A. A GIS application for location selection and Customers' preferences for shopping malls in al Ain City; UAE. American Journal of Geographic Information System. 2015;4(2):76–86.
- Goldberg DW, et al. An evaluation framework for comparing geocoding systems. Int J Health Geogr. 2013;12(1):1.
- Zandbergen PA. Geocoding quality and implications for spatial analysis. Geography Compass. 2009;3(2):647–80.
- Goldberg DW, Wilson JP, Knoblock CA. From text to geographic coordinates: the current state of geocoding. URISA Journal. 2007;19(1):33.
- Karimi HA, Durcik M, Rasdorf W. Evaluation of uncertainties associated with geocoding techniques. Computer-Aided Civil and Infrastructure Engineering. 2004;19(3):170–85.
- Zhang J, Goodchild MF. Uncertainty in geographical information. CRC Press; 2002.
- Senaratne H, et al. A review of volunteered geographic information quality assessment methods. Int J Geogr Inf Sci. 2017;31(1):139–67.
- Roongpiboonsopit D, Karimi HA. Comparative evaluation and analysis of online geocoding services. Int J Geogr Inf Sci. 2010;24(7):1081–100.
- Karimi HA, Sharker MH, Roongpiboonsopit D. Geocoding recommender: an algorithm to recommend optimal online geocoding services for applications. Trans GIS. 2011;15(6):869–86.
- McLafferty S, et al. Spatial error in geocoding physician location data from the AMA physician Masterfile: implications for spatial accessibility analysis. Spatial and spatio-temporal epidemiology. 2012;3(1):31–8.
- Hay G, et al. Potential biases due to geocoding error in spatial analyses of official data. Health & place. 2009;15(2):562–7.
- Goldberg DW, Wilson JP, Cockburn MG. Toward quantitative geocode accuracy metrics. In: Ninth international symposium on spatial accuracy assessment in natural resources and environmental sciences. 2010.
- TAMG. Available Geocoding software. 2016. Available from: https://geoservices.tamu.edu/Services/Geocode/OtherGeocoders/. [cited 2016 October 05].
- Swift J, Goldberg D, Wilson J. Geocoding best practices: review of eight commonly used geocoding systems. Los Angeles: University of Southern California GIS Research Laboratory; 2008.
- Dallas T. helminthR: an R interface to the London Natural History Museum's host–parasite database. Ecography. 2016;39(4):391–3.
- Lovelace R, et al. From big noise to big data: toward the verification of large data sets for understanding regional retail flows. Geogr Anal. 2016;48(1):59–81.
- Wilkerson RL, Khalfe D, Krey K. Associations between neighborhoods and summer meals sites: measuring access to Federal Summer Meals Programs. Journal of Applied Research on Children: Informing Policy for Children at Risk. 2016;6(2):9.
- Rodger R, Fleet C, Nicol S. Visualising urban geographies. e-Perimetron. 2010;5(3):118–31.
- Cinnamon J, Schuurman N. GeoWeb and web 2.0: new tools for public health. PositionIT; 2010. p. 47–51.
- Cinnamon J, Schuurman N. Web technologies for public health surveillance in low and middle-income countries. In: Sixth international conference on geographic information science. Zurich: GIScience; 2010.
- HCA. Food facility award of excellence food inspection program. 2016. Available from: http://ocfoodinfo.com/retail/award.
- Wikipedia. Google docs, sheets and slides. 2016. Available from: https://en.wikipedia.org/wiki/Google_Docs,_Sheets_and_Slides.
- Nuket. Google-sheets-geocoding-macro. 2010. Available from: https://github.com/nuket/google-sheets-geocoding-macro.
- Rossiter D. Introduction to the R project for statistical computing for use at ITC. International Institute for geo-Information Science & earth observation (ITC), Enschede (NL), vol. 3; 2012. p. 3–6.
- Ihaka R, Gentleman R. R: a language for data analysis and graphics. J Comput Graph Stat. 1996;5(3):299–314.
- de Vries A. How many packages are there really on CRAN? 2015. Available from: http://blog.revolutionanalytics.com/2015/06/how-many-packages-are-there-really-on-cran.html. [cited 2016 October 6].
- Kahle D, Wickham H. ggmap: spatial visualization with ggplot2. R Journal. 2013;5(1):144–61.
- Lynn S. Batch Geocoding with R and Google maps. 2013. Available from: https://www.r-bloggers.com/batch-geocoding-with-r-and-google-maps/. [cited 2016 September 20].
- Hijmans RJ, et al. Package ‘geosphere’. Vienna: R Foundation; 2015. Available from: https://cran.rproject.org/web/packages/geosphere/geosphere.pdf. [cited 02-01-2016].
- Grosjean P, Ibanez F. pastecs: package for analysis of space-time ecological series. R package version 1.3–18. 2014. http://CRAN.R-project.org/package=pastecs
- Singh SK. Assessing and mapping vulnerability and risk perceptions to groundwater arsenic contamination: towards developing sustainable arsenic mitigation models (order no. 3701365), Available from ProQuest Dissertations & Theses Full Text. (1681668682). In earth and environmental studies. USA: Montclair State University; 2015. p. 392.
- Singh SK, Brachfeld SA, Taylor RW. Evaluating hydrogeological and topographic controls on groundwater arsenic contamination in the mid-Gangetic plain in India: towards developing sustainable arsenic mitigation models. In: Emerging issues in groundwater resources, advances in water security, A. Fares, editor. Switzerland: Springer International Publishing; 2016.
- Singh SK, Vedwan N. Mapping composite vulnerability to groundwater arsenic contamination: an analytical framework and a case study in India. Nat Hazards. 2015;75(2):1883–908.
- QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial Foundation Project. 2015.
- Wikipedia. Google earth. 2016. Available from: https://en.wikipedia.org/wiki/Google_Earth. [cited 2016 October 9].