Stevens Response
Data 150
In the readings for Monday and today (Stevens et al.), the authors use a technique to produce a high-resolution description of the distribution of human populations across the globe. What is the name of the technique, and how does it work in general and basic terms?
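In the Stevens et al. article that we read this week, the authors use a technique known as random forests to estimate the distribution of human populations. A random forest builds many different decision trees, each trained on a random sample of the data and a random subset of the predictors, and then combines the trees' outputs (averaging them, for a numeric target such as population density) into a single prediction. Combining many trees in this way tends to give more accurate and more stable predictions than any single tree. In this case, the authors used the model to predict population density in different areas, which they then used to disaggregate census counts into fine-scale grid cells.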
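The idea of combining many decision trees can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the two covariates (a nighttime-lights score and a distance to the nearest road), the synthetic density values, and all function names are assumptions made up for the example.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean; lower means a more uniform group."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, n_features, rng, depth=0, max_depth=3, min_size=3):
    """Grow one regression tree; every split considers a random subset of features."""
    if depth >= max_depth or len(rows) < 2 * min_size:
        return mean(targets)  # leaf: average target of the rows that reach it
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            score = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return mean(targets)  # no valid split found, so stop here

    _, f, threshold, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          n_features, rng, depth + 1, max_depth, min_size)

    return (f, threshold, grow(left), grow(right))

def predict_tree(node, row):
    """Follow the splits down to a leaf and return its value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], n_features, rng))
    return forest

def predict(forest, row):
    """The forest's prediction is the average of its trees' predictions."""
    return mean(predict_tree(tree, row) for tree in forest)

# Synthetic, made-up data: (nighttime-lights score, km to nearest road) -> density.
data_rng = random.Random(42)
rows = [(data_rng.uniform(0, 1), data_rng.uniform(0, 20)) for _ in range(120)]
targets = [500 * lights + 10 * (20 - dist) for lights, dist in rows]

forest = fit_forest(rows, targets)
bright = predict(forest, (0.9, 1.0))   # bright cell close to a road
dark = predict(forest, (0.1, 19.0))    # dark cell far from a road
print(f"bright/near: {bright:.0f}   dark/remote: {dark:.0f}")
```

Each tree sees a slightly different version of the data, so the individual trees disagree, and averaging them smooths out their individual errors. In practice one would use an established library rather than hand-written trees; the sketch is only meant to make the mechanics visible.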
The random forest method used by the authors is a machine learning algorithm (ensemble method). In general terms, what is a machine learning algorithm? Within the context of this study, what distinguishes a data science, machine learning method (such as random forest) from previous classical statistical approaches to describing and analyzing phenomena and events?
A machine learning algorithm uses data to build a model that can then make predictions about new cases. Rather than following rules written out by hand, the model is fit to the input data using a technique such as the random forest used here. In the context of this study, machine learning methods differ from traditional statistical approaches primarily because the analyst does not have to specify the form of the relationship between the predictors and the outcome in advance. The model can learn nonlinear relationships and interactions among many covariates directly from the data, and it is judged mainly on how accurately it predicts.
In the reading, the authors use a number of geospatial covariates as predictors in their machine learning method. What were these geospatial covariates and approximately how big of a data set did they represent (in general terms)? What is the significance of big data in the estimation of machine learning methods for inferring the correlates and drivers of human population distributions?
Geospatial covariates are spatially referenced variables that can help the model make predictions. In this case there were many covariates describing the areas being studied, such as nighttime lights, temperature, and other measures of the geography such as land cover and elevation. Big data is very significant in the estimation of machine learning methods for inferring the correlates and drivers of human population distribution because a large amount of data gives the model more examples to learn from, which makes its predictions more accurate.
The authors’ results present a remarkable improvement over previous geospatial descriptions, at very high resolution, of the distribution of the human population. Within the context of human development in LMICs, what is the significance of having a highly accurate description of where each person is located across planet Earth?
Having a highly accurate description of human population distribution across the planet is useful in many different ways. This information allows us to measure the impact of population growth, understand our effects on the environment, determine how monetary resources ought to be distributed, and decide which policies need to be created and enacted.
Within the context of human development in LMICs, what is the relevance to your area of investigation of having a highly accurate description of where each household and person is located across planet Earth?
This information is particularly relevant to my area of investigation. I am looking at maternal and neonatal care in sub-Saharan Africa, and it is important to understand the human population distribution in this area in order to determine how far people currently live from functioning health facilities. With this knowledge, we can better determine where new facilities need to be built and where resources ought to be allocated in order to help the most people.
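As a rough illustration of how a gridded population map could feed this kind of analysis, the sketch below computes the straight-line (great-circle) distance from each populated grid cell to its nearest health facility and counts the people living beyond a chosen cutoff. The coordinates, population counts, and 5 km cutoff are made-up values for the example, and straight-line distance is a simplifying assumption; real travel distance depends on roads and terrain.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_facility_km(lat, lon, facilities):
    """Distance from one grid cell to the closest facility."""
    return min(haversine_km(lat, lon, f_lat, f_lon) for f_lat, f_lon in facilities)

# Hypothetical grid cells: (latitude, longitude, estimated people in the cell).
cells = [(-1.00, 36.00, 120), (-1.30, 36.80, 900), (-0.50, 37.20, 45)]
# Hypothetical facility locations: (latitude, longitude).
facilities = [(-1.29, 36.82), (-0.45, 37.10)]

CUTOFF_KM = 5.0  # assumed threshold for "too far", chosen only for illustration
underserved = sum(people for lat, lon, people in cells
                  if nearest_facility_km(lat, lon, facilities) > CUTOFF_KM)
print(f"People living more than {CUTOFF_KM:.0f} km from a facility: {underserved}")
```

The more accurate the population estimate in each cell, the more trustworthy a count like this becomes, which is why a high-resolution population map matters for deciding where new facilities would help the most people.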