
Insert Latitude/Longitude Columns into a Pandas DataFrame - Machine Learning Tutorial

Header image: Scenery from Switzerland, source: https://www.travelandleisure.com/trip-ideas/nature-travel/most-naturally-beautiful-countries-in-the-world

This is a brief tutorial on how to insert latitude and longitude values into a Pandas DataFrame. The coordinates are derived from a column containing US state names as strings, as shown in this screenshot:

Figure 1: Image of 'State' column in Pandas DataFrame

Instructions:

You can find the Kaggle Notebooks link for this tutorial by clicking here

I recommend running the code in this tutorial against the dataset used in the above Kaggle Notebook. The easiest way to do that is to click the 'Copy & Edit' button at the top-right of the Kaggle Notebook page, which gives you your own copy to run on Kaggle. Now let's start the tutorial!

Development:

First we need to import the Nominatim geocoder class from the geopy.geocoders module by running the following Python line:

from geopy.geocoders import Nominatim

Next we need to create a Nominatim instance, passing a user_agent string that identifies our application, using the following line:

geolocator = Nominatim(user_agent='us_opiod_deaths')

Now we can iterate over all the values in the state column of the df DataFrame, convert each state name into latitude/longitude coordinates, and append those values to two lists: one for the latitudes and one for the longitudes. Here is the code to do so:

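lat_arr = []
long_arr = []
for state in df['state']:
    # geocode() returns None when Nominatim finds no match for the name
    location = geolocator.geocode(state)
    # latitude/longitude are floats; store None for unmatched names
    lat_arr.append(location.latitude if location else None)
    long_arr.append(location.longitude if location else None)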
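
The loop above sends one request per row, so a state that appears in many rows is geocoded many times. Here is a minimal sketch of one way to avoid that by looking up each distinct name only once. The helper geocode_unique and the stand-in fake_geocode are hypothetical names used for illustration, and the coordinates in the stand-in are placeholders rather than real locations. In the tutorial you would pass geolocator.geocode instead of the stand-in.

```python
from types import SimpleNamespace


def geocode_unique(names, geocode_fn):
    """Geocode each distinct name once; return {name: (lat, lon) or None}."""
    coords = {}
    for name in names:
        if name in coords:
            continue  # already looked up, skip the repeat request
        location = geocode_fn(name)  # may be None when nothing matches
        coords[name] = (
            (location.latitude, location.longitude) if location else None
        )
    return coords


# Hypothetical stand-in for geolocator.geocode so the sketch runs offline.
# The coordinates are placeholders, not real locations.
calls = []


def fake_geocode(name):
    calls.append(name)
    table = {"Ohio": (1.0, 2.0), "Texas": (3.0, 4.0)}
    if name not in table:
        return None
    lat, lon = table[name]
    return SimpleNamespace(latitude=lat, longitude=lon)


states = ["Ohio", "Texas", "Ohio", "Atlantis"]
coords = geocode_unique(states, fake_geocode)

# Rebuild the per-row lists used in the tutorial from the cached lookups.
lat_arr = [coords[s][0] if coords[s] else None for s in states]
long_arr = [coords[s][1] if coords[s] else None for s in states]
```

If you are calling the public Nominatim service, geopy also ships a RateLimiter helper in geopy.extra.rate_limiter that can wrap geolocator.geocode to space out requests.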

After that, we insert the values from the two lists into the original DataFrame with the pandas.DataFrame.insert() method. Both calls use loc=1, so inserting 'long' first and 'lat' second leaves 'lat' at position 1 with 'long' right after it:

df.insert(loc=1, column='long', value=long_arr)
df.insert(loc=1, column='lat', value=lat_arr)
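
To make the effect of calling insert() twice with the same loc concrete, here is a small sketch on a toy DataFrame. The 'deaths' column and every value in it are made up for illustration and are not taken from the Kaggle dataset.

```python
import pandas as pd

# Toy data: the 'deaths' column and all values are placeholders.
df = pd.DataFrame({"state": ["A", "B"], "deaths": [10, 20]})
lat_arr = [1.0, 3.0]
long_arr = [2.0, 4.0]

# 'long' goes to position 1 first...
df.insert(loc=1, column="long", value=long_arr)
# ...then 'lat' takes position 1 and pushes 'long' to position 2.
df.insert(loc=1, column="lat", value=lat_arr)

column_order = list(df.columns)
print(column_order)  # ['state', 'lat', 'long', 'deaths']
```

Note that insert() raises a ValueError if the column already exists, so re-running the same notebook cell without reloading df will fail.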

Finally, we can display the newly created lat and long columns using these lines:

for index, row in df.iterrows():
    print(f"lat: {row['lat']}, long: {row['long']}")
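
If you only want to look at the two columns, selecting them directly is a possible alternative to looping over rows. This is a sketch on a toy DataFrame with placeholder values. The name coords_view is introduced here for illustration.

```python
import pandas as pd

# Toy data with placeholder values standing in for the tutorial's df.
df = pd.DataFrame({"state": ["A", "B"], "lat": [1.0, 3.0], "long": [2.0, 4.0]})

# Select just the two coordinate columns as a new DataFrame.
coords_view = df[["lat", "long"]]

# to_string(index=False) prints the table without the row index.
print(coords_view.to_string(index=False))
```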

Here is a screenshot of the output:

Figure 2: Image of 'lat' and 'long' column in Pandas DataFrame

Well, that's it for this blog post! Hope it was helpful!

Conclusion

Thanks for reading this blog post!

If you have any questions or concerns, feel free to leave a comment on this post and I will get back to you if I find the time.

If you found this article helpful, feel free to share it, and make sure to follow me on Twitter and GitHub, connect with me on LinkedIn, and subscribe to my YouTube channel.