2020 Virtual Undergraduate Research Symposium

Genetic Algorithm Optimization Study for Atmospheric Carbon Monoxide Models


PROJECT NUMBER: 1

AUTHOR: Meera Duggal, Applied Mathematics and Statistics | MENTOR: Dorit Hammerling, Applied Mathematics and Statistics

 

ABSTRACT

The primary source of atmospheric carbon monoxide (CO) in the Southern Hemisphere is large burn events, making CO a useful proxy for fires. Predictive CO models can therefore help districts or countries prepare for unusually large fire seasons. Fires are related to the climate through wind and sea surface temperatures, because they are more likely to occur when vegetation is dry. Climate indices are metrics that summarize variability in the climate. We created a multiple linear regression model that uses these climate indices to model atmospheric CO, along with the R package southernHemisphereCO, which performs model selection applied specifically to atmospheric CO. The package offers three model selection techniques: stepwise regression, a genetic algorithm, and an exhaustive search. The exhaustive search always finds the best possible model but is computationally expensive. Stepwise regression runs quickly and is scalable but often fails to find the best model. We implemented a genetic algorithm as a compromise between computational expense and model accuracy. As a stochastic model selection technique, the genetic algorithm has many parameters that affect rates and stopping conditions. Here we present a parameter optimization study for the genetic algorithm, seeking to balance computational expense and model quality.
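The trade-off between the exhaustive search and the genetic algorithm can be illustrated with a small sketch. It is not the southernHemisphereCO implementation: the data are synthetic, the scoring criterion (BIC) is an assumption, and every function name and parameter (`pop_size`, `generations`, `mutation_rate`) is hypothetical. Each candidate model is a bitmask saying which predictors are included.

```python
import itertools
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def bic(mask, X, y):
    """BIC of a least-squares fit with an intercept plus the columns flagged in mask.
    BIC is an assumed criterion; lower is better."""
    cols = [j for j, keep in enumerate(mask) if keep]
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    k, n = len(cols) + 1, len(y)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return n * math.log(rss / n) + k * math.log(n)

def exhaustive(X, y, p):
    """Score all 2**p subsets; always finds the best model but scales poorly."""
    return min(itertools.product([0, 1], repeat=p), key=lambda m: bic(m, X, y))

def genetic(X, y, p, rng, pop_size=20, generations=30, mutation_rate=0.1):
    """Hypothetical genetic algorithm with a fixed generation count as its stopping rule."""
    pop = [tuple(rng.randint(0, 1) for _ in range(p)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: bic(m, X, y))
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, p - 1)         # single-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit independently with probability mutation_rate
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda m: bic(m, X, y))

# Synthetic stand-in for climate indices: only columns 0, 2 and 4 drive the response.
rng = random.Random(0)
n, p = 60, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * r[0] - 1.5 * r[2] + 1.0 * r[4] + rng.gauss(0, 0.5) for r in X]

best_exh = exhaustive(X, y, p)
best_ga = genetic(X, y, p, rng)
print("exhaustive:", best_exh, round(bic(best_exh, X, y), 2))
print("genetic:   ", best_ga, round(bic(best_ga, X, y), 2))
```

With six predictors the exhaustive search is cheap, but its cost doubles with every added predictor, whereas the genetic algorithm's cost is set by its population size and stopping rule. Those are the kind of parameters a tuning study like this one would vary.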

 

VISUAL PRESENTATION

 

AUTHOR BIOGRAPHY

Meera is a junior studying Applied Mathematics. Her research was also conducted in the mathematics department. She started her research in January of this year and plans to continue into next semester. Her research thus far has focused on optimizing the parameters of the genetic algorithm for atmospheric carbon monoxide models.

 


4 Comments

  1. Great job with the poster and the voice over. Good luck!

    • Thank you!

  2. Hello Meera, nice looking poster and lots of good looking work here.

    The main point of confusion for me was that I did not understand the transition from tracking fires to the genetic algorithm. How did the variables, mutation, reproduction, etc. relate to fires and how does that circle back to making conclusions from your simulations?

    Also unsure why the optimization is needed and why it is much better than the other two algorithms.

    It seems like the goal of this work was using the genetic algorithm to find optimum variable values, but I am unsure what the significance is of finding these values. Do they need to be accurate with little error? How do these values compare to the other algorithms?

    Great work done here! I think that it could be presented more clearly to deliver a useful message.

    • Hello Ben,

      I am sorry that the delivery was a little unclear. In essence, the multiple linear regression model using climate indices is what is “tracking fires,” as you say. The genetic algorithm relates to this as the variable selection technique that helps produce these models. Since these models are going to be used for prediction by different countries, a big focus was making the data usable for the general public. As a result, we made the R package, which offers these three variable selection techniques. These techniques are used to find the best model for a specific region. The genetic algorithm was used instead of the stepwise or exhaustive method because it produces very good models with low runtime, meaning that the predictive power will hopefully be very high and the algorithm doesn’t take the user a long time to run on their own. With these optimal parameter values, the genetic algorithm runs quicker and still produces very good predictive models. The goal is that using these parameters doesn’t decrease model accuracy significantly. For example, we found a 0.4% difference in models, which is more than acceptable in our case. It would be problematic, though, if the models differed by a large percentage.

      Hopefully, that makes things a little more clear.

      Thank you for taking the time to listen to my presentation!
