Latent Factor Models of Human Travel

Michael Guerzhoy and Aaron Hertzmann

Basic Idea

We decompose the likelihood of traveling from A to B into three factors: the desirability of B as a destination, the affinity between source A and destination B, and the individual-varying propensity to travel the distance between A and B. By analyzing the models that we learn from two large datasets, geotagged Flickr photos and tracks of Shanghai taxis, we estimate the desirabilities of destinations on the map and affinities between locations, as well as discover clusters of individuals with varying propensities to travel large distances. We analyze the parameters of our models to try to gain insight into travel patterns.
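
As a rough illustration of the three-factor decomposition, the sketch below scores each candidate destination as the product of a desirability term, an affinity term, and a distance term, then normalizes over destinations. The multiplicative form, the exponential distance decay, and every name used here (`transition_probs`, `decay`, the toy arrays) are assumptions made for illustration only; the actual parameterization is described in the papers and the Appendix.

```python
import numpy as np

def transition_probs(src, desirability, affinity, dist, decay):
    """Return P(dest | src) under an ASSUMED multiplicative three-factor model.

    desirability: (n,) positive weights, one per location
    affinity:     (n, n) positive weights, affinity[a, b] for travel a -> b
    dist:         (n, n) distances between locations
    decay:        hypothetical per-individual distance-decay rate
    """
    # Assumed distance factor: exponential decay with distance from the source.
    distance_factor = np.exp(-decay * dist[src])
    scores = desirability * affinity[src] * distance_factor
    return scores / scores.sum()

# Toy data (hypothetical): three locations, the third is desirable but far away.
desirability = np.array([1.0, 2.0, 4.0])
affinity = np.ones((3, 3))
dist = np.array([[0.0, 1.0, 10.0],
                 [1.0, 0.0, 9.0],
                 [10.0, 9.0, 0.0]])

p_home = transition_probs(0, desirability, affinity, dist, decay=1.0)   # reluctant traveller
p_far = transition_probs(0, desirability, affinity, dist, decay=0.01)   # long-distance traveller
```

With a small `decay`, the far but desirable third location receives most of the probability mass; with a large `decay`, it receives almost none. This mirrors the idea that individuals differ in their propensity to travel long distances.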


Papers

Michael Guerzhoy and Aaron Hertzmann, Learning Latent Factor Models of Travel Data for Travel Prediction and Analysis. In Proc. of the Canadian Conference on Artificial Intelligence (AI 2014), May 2014, Montreal, Quebec. Best Paper Award.

Michael Guerzhoy and Aaron Hertzmann, Learning Latent Factor Models of Human Travel. At NIPS Workshop on Social Network and Social Media Analysis: Methods, Models and Applications (Social 2012), Dec. 2012, Lake Tahoe, Nevada.

See the Appendix for details and derivations.

Popularity vs. Desirability of Destinations for Flickr Users

The popularity of a destination can be estimated by counting the number of users who visit the destination. Our model allows us to also estimate the desirability of a destination: how popular the destination might have been if not for factors such as distance. For example, we see that Melbourne and Sydney are more desirable than they are popular: they are in the top 10 most-desirable destinations, but not in the top 10 most-popular destinations, due to their being far away from most population centres.
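
The popularity estimate described above amounts to counting distinct visitors per destination. A minimal sketch, assuming visits are available as `(user_id, destination)` pairs (the function name and the toy records are hypothetical):

```python
from collections import defaultdict

def popularity(visits):
    """Count the number of distinct users who visit each destination.

    visits: iterable of (user_id, destination) pairs; repeat visits by the
    same user to the same destination are counted once.
    """
    users_by_destination = defaultdict(set)
    for user, destination in visits:
        users_by_destination[destination].add(user)
    return {dest: len(users) for dest, users in users_by_destination.items()}

# Toy records (hypothetical), including one repeat visit.
visits = [("u1", "London"), ("u1", "London"), ("u2", "London"), ("u2", "Sydney")]
counts = popularity(visits)
```

Desirability, by contrast, cannot be read off the counts directly: it is a parameter learned jointly with the affinity and distance factors.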

[Figures: Popularity; Desirability]

Top 16 most-popular destinations

  1. London, GB
  2. New York, US
  3. San Francisco/San Jose, US
  4. Paris, FR
  5. Milan, IT
  6. Washington DC/Baltimore, US
  7. Vancouver, CA
  8. Chicago, US
  9. Los Angeles, US
  10. Brussels, BE
  11. Berlin, DE
  12. Tokyo, JP
  13. Rome, IT
  14. Glasgow, GB
  15. Frankfurt, DE
  16. Barcelona, ES

Top 16 most-desirable destinations

  1. London, GB
  2. New York, US
  3. Brussels, BE
  4. San Francisco/San Jose, US
  5. Paris, FR
  6. Frankfurt, DE
  7. Sydney, AU
  8. Melbourne, AU
  9. Tokyo, JP
  10. Dublin, IE
  11. Shanghai, CN
  12. Washington DC/Baltimore, US
  13. Berlin, DE
  14. Toronto, CA
  15. Hilo, US
  16. Marseille, FR

Affinities between Locations

One of the factors determining if an individual is likely to go from location A to location B is the affinity between A and B: the strength of the connection between A and B that is not accounted for by either the desirability of B or the proximity of A and B. Looking at the affinities allows us to look for interesting connections between locations. By looking at the affinities and the desirabilities, we can also see what travel might have looked like if proximity hadn't been playing a role in it.
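
To make the "no proximity" counterfactual concrete, the sketch below drops the distance term and keeps only desirability and affinity, and also ranks destinations by affinity from a given source. It assumes the same multiplicative form as a simplification; the function names and toy values are hypothetical and not taken from the papers.

```python
import numpy as np

def proximity_free_probs(src, desirability, affinity):
    """P(dest | src) if distance played no role (assumed multiplicative form)."""
    scores = desirability * affinity[src]
    return scores / scores.sum()

def top_affinities(src, affinity, names, k=5):
    """Names of the k destinations with the highest affinity from src."""
    order = np.argsort(affinity[src])[::-1][:k]
    return [names[i] for i in order]

# Toy data (hypothetical).
names = ["A", "B", "C"]
desirability = np.array([1.0, 1.0, 2.0])
affinity = np.array([[1.0, 3.0, 2.0],
                     [1.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0]])

probs = proximity_free_probs(0, desirability, affinity)
ranked = top_affinities(0, affinity, names, k=2)
```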

For travel outgoing from the map quad that contains Montreal, the locations with the highest affinity to Montreal are:

Top affinities for transitions from Montreal

  1. Libreville, Gabon
  2. Rimouski, Quebec, Canada
  3. Caraquet, New Brunswick, Canada
  4. Baie-du-Poste, Quebec, Canada
  5. Singapore, Singapore

Rimouski and Baie-du-Poste are locations in Northern Quebec, and Caraquet is a Francophone town in the province of New Brunswick, Canada. In general, Europe, the west coast, and Florida have high affinities with Montreal.

Affinity information can be viewed for any location on the map by its GPS coordinates, including the London, Milan, Moscow, and Paris areas.

Affinities are zeroed out where no transitions are observed in the dataset. Top transitions to major cities are necessarily noisy because they occur from the entire map.

Affinities within National and Linguistic Borders

There is more travel within national and linguistic borders than across them, but that might be due to the fact that nearby locations are often located within the same borders. We can control for proximity (and popularity of destinations) by looking only at the affinity factors. In the model that we learn from our Flickr data, the affinity factors are larger than would be expected for pairs of locations within the same national borders, and for pairs of Russian-speaking or Spanish-speaking locations that lie in different countries. This might seem obvious, but it is not true in general: the pattern holds for Russian and Spanish, but not for other global languages such as English and French.

In other words, even accounting for distance travelled and popularity of destination, we find that travellers tend to stay within national borders, and within the linguistic borders of Spanish and Russian.
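
One simple way to look for this kind of pattern is to compare the average log affinity of location pairs that share a label (a country or a language) with that of pairs that do not. The sketch below is only an assumed stand-in for the analysis in the papers: the function name, the toy labels, and the choice of a plain difference of means are all hypothetical. Pairs with zero affinity are skipped, since affinities are zeroed out where no transitions are observed.

```python
import numpy as np

def mean_log_affinity_by_group(affinity, group):
    """Mean log affinity for same-label pairs vs. different-label pairs.

    affinity: (n, n) non-negative affinities; zeros mean "no observed transitions"
    group:    length-n labels (e.g. country or language of each location)
    """
    affinity = np.asarray(affinity, dtype=float)
    group = np.asarray(group)
    same = group[:, None] == group[None, :]
    off_diagonal = ~np.eye(len(group), dtype=bool)
    observed = affinity > 0
    # Take logs only where affinities are observed; other entries stay 0 and are masked out.
    log_aff = np.log(affinity, where=observed, out=np.zeros_like(affinity))
    within = log_aff[same & off_diagonal & observed].mean()
    across = log_aff[~same & observed].mean()
    return within, across

# Toy data (hypothetical): two locations sharing a label, one that does not.
group = ["ES", "ES", "FR"]
affinity = [[1.0, 4.0, 1.0],
            [4.0, 1.0, 1.0],
            [1.0, 1.0, 1.0]]
within, across = mean_log_affinity_by_group(affinity, group)
```

A larger `within` than `across` would be consistent with travellers staying inside the border in question, after desirability and distance have been accounted for by the other factors.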

In our analyses, it is impossible to detect the patterns for travel within and across linguistic borders without controlling for proximity and popularity using our model.

Differences in Flickr Users' Propensity to Travel Long Distances

We find evidence in the data that different users travel differently. We group Flickr users into 5 clusters with different travel patterns. Here, we show how far from New York City (NYC) the users in each cluster are likely to end up 24 hours after taking a picture in NYC.

[Figure: log probability of transitioning distance d from NYC]
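
For intuition about what such a curve measures, the sketch below computes an empirical log-probability of distance from NYC for one group of users, using great-circle (haversine) distance and a few distance bins. This is a simple histogram stand-in, not the learned per-cluster distance factor; the NYC coordinates, bin edges, function names, and toy endpoints are assumptions chosen for illustration.

```python
import math
from collections import Counter

NYC = (40.7128, -74.0060)  # assumed (latitude, longitude) for New York City

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def log_prob_by_distance_bin(endpoints, edges_km):
    """Empirical log P(distance bin) for locations reached from NYC.

    endpoints: (lat, lon) positions of users 24 hours after a photo in NYC
    edges_km:  increasing bin edges; bin i holds distances with exactly i edges <= d
    """
    counts = Counter()
    for point in endpoints:
        d = haversine_km(NYC, point)
        counts[sum(d >= edge for edge in edges_km)] += 1
    total = sum(counts.values())
    return {b: math.log(c / total) for b, c in counts.items()}

# Toy endpoints (hypothetical): still in NYC, near Boston, near London.
endpoints = [(40.73, -73.99), (42.36, -71.06), (51.5, -0.12)]
log_probs = log_prob_by_distance_bin(endpoints, edges_km=[10, 1000])
```

Running this once per cluster and plotting the results against distance would give curves of the same kind as the figure, with clusters of frequent long-distance travellers placing more mass in the far bins.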

Conclusions

Our model of human travel is interpretable and predicts travel better than both simple parametric models and large histogram-based models. Human travel modeling has been explored by the scientific community in recent years, and there are several applications of human travel modeling proposed in the literature.



Last modified: Feb. 23, 2014 by Michael Guerzhoy (guerzhoy at cs.toronto.edu)
