how the world works - a data scientist's perspective: Koalas vs Possums - Modelling Biological Growth using Recursive Bayesian Estimation

In this week's post I will be discussing how to track biological population growth using a simple numerical model coupled with incomplete, noisy measurements. First I will provide an overview of the numerical population model, then I will illustrate how to incorporate the measurements so that the numerical model is more representative of the real world.

The example I am using to illustrate this is the predator-prey problem. In the classic predator-prey problem, there is one type of predator and one type of prey, for example foxes and rabbits. Here, however, I will be using two predators and one prey to achieve the required level of complexity. The selected elements of the model have a somewhat Australian flavour. The two types of predators are koalas and possums. You may not think that these two animals are the most vicious of predators; however, you might think otherwise if you were the chosen prey ... gum leaves! The numerical model calculates the populations of the koalas, possums and gum leaves, based on certain assumptions. We need to provide the rate at which an average koala and an average possum eat gum leaves. For both the koalas and possums, we also need to provide a threshold amount of gum leaves below which they die and above which they breed. We can estimate these rates and thresholds from the data itself, but for the moment we'll assume that we know what they are. Details on the numerical model can be found in reference [1], listed at the end of this post.
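
To make the ingredients concrete, here is a minimal sketch of a two-predator, one-prey model in Python. It is not the model from reference [1]. The functional form (logistic leaf growth, consumption proportional to both animal and leaf numbers), the parameter names and all the numbers are my own assumptions, chosen only to show where the feeding rates and the breed/die thresholds enter.

```python
import numpy as np

def rhs(state, p):
    """Time derivatives of (leaves, koalas, possums) for the assumed model."""
    leaves, koalas, possums = state
    # Leaves eaten per unit time: each animal eats in proportion to the leaves available.
    eaten = (p["koala_eat"] * koalas + p["possum_eat"] * possums) * leaves
    # Leaves regrow logistically (measured as a fraction of the maximum) minus what is eaten.
    d_leaves = p["leaf_growth"] * leaves * (1.0 - leaves) - eaten
    # Animals breed when leaves exceed their threshold and die off below it.
    d_koalas = p["koala_rate"] * koalas * (leaves - p["koala_threshold"])
    d_possums = p["possum_rate"] * possums * (leaves - p["possum_threshold"])
    return np.array([d_leaves, d_koalas, d_possums])

def rk4_step(state, p, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = rhs(state, p)
    k2 = rhs(state + 0.5 * dt * k1, p)
    k3 = rhs(state + 0.5 * dt * k2, p)
    k4 = rhs(state + dt * k3, p)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def simulate(state0, p, dt=0.01, n_steps=5000):
    """Return an array of shape (n_steps + 1, 3) holding the state at every step."""
    trajectory = np.empty((n_steps + 1, 3))
    trajectory[0] = state0
    for i in range(n_steps):
        trajectory[i + 1] = rk4_step(trajectory[i], p, dt)
    return trajectory

# Hypothetical rates and thresholds, for illustration only.
params = {
    "leaf_growth": 1.0,
    "koala_eat": 1.0, "possum_eat": 1.2,
    "koala_rate": 0.5, "possum_rate": 0.5,
    "koala_threshold": 0.30, "possum_threshold": 0.35,
}
trajectory = simulate(np.array([0.5, 0.2, 0.2]), params)
```

Changing `params["possum_eat"]` slightly and calling `simulate` again gives a second trajectory to compare against the first. I would not expect this simplified form to reproduce the chaotic regimes reported in [1]; it is only meant to show the structure of such a model.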

For a particular selection of rates and thresholds we calculate the evolution of the koala, possum and gum leaf populations versus time, as illustrated in the plot below. The specific numbers do not mean much; the plot is just intended to be an illustration of the approach. Here you can see a very regular and periodic pattern. The number of gum leaves (green line) initially increases, which is then followed by an increase in the population of both the koalas and possums as there is now more food available. Once there are so many koalas and possums that they are eating the gum leaves faster than the gum leaves can grow, the number of gum leaves decreases. When the gum leaves get below a certain threshold value the animals start to die because there is not enough food available. With there now being fewer koalas and possums, the gum leaves begin to grow back. When there is enough food the animals begin to breed again, and so the cycle continues.

However, if we slightly change the rate at which the possums eat the gum leaves, then we get a very different evolution of all three populations, as you can see from the plot below. It is clear that the system is now more complicated and less periodic / regular. This is one illustration of the butterfly effect, where a very small change to the system can produce very different results. The overall pattern of the population growth, however, is the same: growth in the animal population follows growth in the number of gum leaves, and decay in the animal population follows decay in the number of gum leaves. The predator-prey model has no random components in it at all, yet it produces changes in the population which on the surface appear to be random. They are in fact not random, but chaotic. For a continuous-time system of this kind, chaotic behaviour can only be produced with at least three degrees of freedom (koalas, possums and gum leaves in this case). This comparison illustrates how important it is to get the input parameters of the model right. A small change to the rate at which possums eat gum leaves produced qualitatively similar behaviour, but quantitatively different results.

We will now augment the simple model used above with measurements of the true system using Recursive Bayesian Estimation. Recursive Bayesian Estimation is the general term given to a variety of methods used to incorporate measurements into a numerical model as soon as these measurements become available. In this application, we'll be using the Ensemble Kalman Filter. Interested readers can find the mathematical details on Wikipedia. It essentially involves running an ensemble of many numerical models at the same time, each starting with slightly different conditions (e.g. different populations), and in some cases also slightly different model parameters (e.g. possum feeding rates). A comparison of the instances of the numerical model gives an indication of the natural variability (or error) in the system. When a measurement becomes available it is compared to what the numerical model suggests the measurement should be, and the model is then pushed in the direction of the measurement. If the measurement has no noise in it at all, then the model is set to match the measurement exactly. If there is some error in the measurement, then the numerical model is pushed only part of the way towards the measurement, with the extent determined by a comparison of the natural variability and the measurement error.
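
The correction step described above can be sketched in a few lines of Python. This is one common variant, the "perturbed observation" Ensemble Kalman Filter, and not necessarily the exact variant used to produce the plots in this post. The function name `enkf_update`, the assumption of independent Gaussian measurement errors, and all the numbers in the usage example are mine.

```python
import numpy as np

def enkf_update(ensemble, measurement, H, meas_var, rng):
    """Perturbed-observation Ensemble Kalman Filter correction step.

    ensemble:    (n_members, n_state) model states before the correction
    measurement: (n_obs,) measured values
    H:           (n_obs, n_state) matrix mapping a state to what would be measured
    meas_var:    (n_obs,) measurement error variances (errors assumed independent)
    """
    n_members = ensemble.shape[0]
    predicted = ensemble @ H.T                        # what each member says we should measure
    state_anom = ensemble - ensemble.mean(axis=0)     # spread of the ensemble = natural variability
    pred_anom = predicted - predicted.mean(axis=0)
    cov_xy = state_anom.T @ pred_anom / (n_members - 1)   # state vs predicted measurement
    cov_yy = pred_anom.T @ pred_anom / (n_members - 1)    # predicted measurement variability
    # The gain weighs natural variability against measurement error.
    gain = cov_xy @ np.linalg.inv(cov_yy + np.diag(meas_var))
    # Each member is compared with its own noisy copy of the measurement.
    noise = rng.normal(0.0, np.sqrt(meas_var), size=(n_members, len(meas_var)))
    perturbed = measurement + noise
    return ensemble + (perturbed - predicted) @ gain.T

# Usage with made-up numbers: states are (leaves, koalas, possums), animals are measured.
rng = np.random.default_rng(0)
forecast = rng.normal([0.4, 0.6, 0.3], 0.1, size=(100, 3))
H = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
measurement = np.array([0.7, 0.25])
exact = enkf_update(forecast, measurement, H, np.array([1e-12, 1e-12]), rng)
vague = enkf_update(forecast, measurement, H, np.array([1e6, 1e6]), rng)
```

With an almost noise-free measurement (`exact`) every member's koala and possum values are moved onto the measurement, while with a very noisy measurement (`vague`) the ensemble is left essentially untouched. Anything in between gives a partial push.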

Here the Ensemble Kalman Filter is used to estimate the population growth properties from a partial, noisy measurement of the "true" system. The "true" system is the second case discussed above, but here we are just focussing on the first 100 time units of it. The measurement is "partial" in that we have only measured the koala and possum populations, and do not have a measurement of the number of leaves. It is "noisy" in that I have added a random number to the measurement of the "true" system.
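
As a small sketch of what "partial" and "noisy" mean in practice, the snippet below builds a measurement from a true state. The state ordering (leaves, koalas, possums), the Gaussian noise and the noise level are assumptions on my part; the post only says that a random number was added.

```python
import numpy as np

# Observation matrix: picks out koalas and possums, ignores the leaves ("partial").
H = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

def measure(true_state, noise_std, rng):
    """Return a measurement of the animal populations with added random noise ("noisy")."""
    return H @ true_state + rng.normal(0.0, noise_std, size=H.shape[0])

rng = np.random.default_rng(42)
true_state = np.array([0.45, 0.62, 0.31])     # hypothetical (leaves, koalas, possums)
observation = measure(true_state, 0.05, rng)  # two numbers: noisy koalas, noisy possums
```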

In the plot below the "true" koala population is represented by the red line, and the noisy measurements of it are represented by the small black boxes. You can see that the black boxes in some cases are shifted away from the red line, due to the noise in the measurement. In between each measurement, the numerical model calculates the population until a new measurement comes along to correct it. The numerical model estimate in each of the following plots is given by the black line.

Likewise, as can be seen from the plot below, the numerical model estimates the possum population, which is corrected when the measurements become available.

The Ensemble Kalman Filter uses the measurements of the koala and possum populations not only to correct the koala and possum population estimates from the numerical model, but also to correct the gum leaf population. This illustrates a key feature of the approach. You can estimate something that you cannot measure (the gum leaf population) through its relationship (the numerical model) to something that you can measure (the koala and possum populations).
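
For readers who like to see why this works, the standard Kalman gain can be split into an unmeasured part and a measured part. The notation below is my own: subscript u is the unmeasured gum leaf population, subscript o is the measured koala and possum populations, P is the covariance estimated from the spread of the ensemble, R is the measurement error covariance, and the superscripts f and a denote the estimate before and after the correction.

```latex
x = \begin{pmatrix} x_u \\ x_o \end{pmatrix}, \qquad
H = \begin{pmatrix} 0 & I \end{pmatrix}, \qquad
P = \begin{pmatrix} P_{uu} & P_{uo} \\ P_{ou} & P_{oo} \end{pmatrix}

K = P H^{\top} \left( H P H^{\top} + R \right)^{-1}
  = \begin{pmatrix} P_{uo} \\ P_{oo} \end{pmatrix} \left( P_{oo} + R \right)^{-1}

x_u^{\mathrm{a}} = x_u^{\mathrm{f}}
  + P_{uo} \left( P_{oo} + R \right)^{-1} \left( y - x_o^{\mathrm{f}} \right)
```

The gum leaf estimate is corrected whenever the cross-covariance P_uo is non-zero, that is, whenever the ensemble members show that leaf numbers and animal numbers vary together. That co-variation comes entirely from the numerical model.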

In addition to estimating the population of the gum leaves from the measurements of the animal population, you can also estimate the model parameters from the data itself. For example, we can estimate the rate at which the possums eat the gum leaves, which we identified earlier as significantly affecting the evolution of the populations. In the plot below you can see that the estimate of the possum feeding rate approaches the true value as more measurements of the animal population become available. Each of the stair-step changes in the estimate coincides with a new measurement. The more parameters you wish to estimate from the data, the more numerical simulations need to be run simultaneously in the ensemble.
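
A common way to do this (I am sketching the general idea here, with notation of my own, not documenting the exact setup behind the plot) is to append the unknown parameter θ to the state and treat it as one more unmeasured quantity. Here M is the numerical model that advances the populations x by one step, H selects the measured populations, P_θy is the ensemble covariance between the parameter and the predicted measurement, and P_yy is the ensemble covariance of the predicted measurement.

```latex
z = \begin{pmatrix} x \\ \theta \end{pmatrix}, \qquad
x_{k+1} = M(x_k, \theta_k), \qquad
\theta_{k+1} = \theta_k, \qquad
H_z = \begin{pmatrix} H & 0 \end{pmatrix}

\theta^{\mathrm{a}} = \theta^{\mathrm{f}}
  + P_{\theta y} \left( P_{yy} + R \right)^{-1} \left( y - H x^{\mathrm{f}} \right)
```

Under this setup the parameter is simply carried forward unchanged between measurements and only moves when a measurement arrives, which would produce exactly the kind of stair-step pattern described above.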

This fun example of Recursive Bayesian Estimation has illustrated a powerful approach for fusing together numerical models and real world data. It has implications for any application in which a numerical model is required to simulate reality, particularly when the numerical model parameters are not well known. This is especially the case in socio-economic and complex biological systems.

References:

[1] Timo Eirola, Alexandr V. Osipov and Gunnar Söderbacka, Chaotic Regimes in a Dynamical System of the Type Many Predators One Prey, Helsinki University of Technology, Institute of Mathematics, Research Reports A368 (1996).


Tuesday, 22 April 2014
