workshop

Nieves Response

DATA 150

Nieves et al. use the random forest machine learning method to predict what value globally? Describe in detail how random forest works. What is a dasymetric population allocation? Which geospatial covariates proved to be the most important when predicting global values of where humans reside?

In this article, the authors use the random forest machine learning method to predict how human population is distributed across space globally. Random forest is an ensemble method that works by combining many individually weaker decision trees into a single "stronger learner". Each tree is grown on a bootstrap sample of the training data (a random sample drawn with replacement), and the forest's prediction is the average of the predictions of its trees. The general steps are to first select the covariates for the model and then fit the model on the available data. The accuracy of the model is tested by calculating the mean squared error (MSE) on data that were not used to train a given tree; these are known as out-of-bag (OOB) data. The next step is to perform a dasymetric redistribution of population counts from census-based administrative units to grid cells. Dasymetric population allocation essentially means that the population total of each administrative unit is redistributed among the smaller grid cells inside it according to a weighting layer, rather than being spread evenly, which gives a more accurate picture of where people actually live. The five groups of covariates that proved to be most important when predicting global values of where humans reside were urban/suburban extents, built environment and urban/suburban proxies, climatic/environmental variables, populated place covariates, and transportation networks.
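The dasymetric redistribution step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `dasymetric_allocate`, the unit labels, and the toy numbers are all hypothetical, and the cell weights simply stand in for whatever weighting layer is used (for example, a density surface predicted by a random forest). The idea shown is only that each administrative unit's census count is split among its grid cells in proportion to their weights, so that the cells in a unit still sum to the unit's total.

```python
from collections import defaultdict


def dasymetric_allocate(unit_totals, cell_units, cell_weights):
    """Split each unit's census count among its grid cells by weight.

    unit_totals  : dict mapping unit id -> census population count
    cell_units   : list giving the unit id that each grid cell belongs to
    cell_weights : list giving a non-negative weight for each grid cell
                   (assumed here to come from a predicted density layer)
    """
    # Sum the weights of all cells that fall inside each unit.
    weight_sums = defaultdict(float)
    for unit, weight in zip(cell_units, cell_weights):
        weight_sums[unit] += weight

    # Give each cell its proportional share of the unit's total.
    allocated = []
    for unit, weight in zip(cell_units, cell_weights):
        total_weight = weight_sums[unit]
        if total_weight > 0:
            allocated.append(unit_totals[unit] * weight / total_weight)
        else:
            # No information to weight by, so nothing is assigned here.
            allocated.append(0.0)
    return allocated


# Hypothetical toy data: two administrative units and five grid cells.
unit_totals = {"A": 1000, "B": 300}
cell_units = ["A", "A", "A", "B", "B"]
cell_weights = [6.0, 3.0, 1.0, 1.0, 2.0]

cell_population = dasymetric_allocate(unit_totals, cell_units, cell_weights)
print(cell_population)
```

In this toy case the cell with the largest weight in unit A receives the largest share of A's 1000 people, while the per-unit totals are preserved, which is the property that distinguishes dasymetric allocation from spreading the count evenly over the unit.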