Wednesday 8 June 2016

Fitting models using scipy and unit testing

Hi everyone, this is my first blog after the start of the coding period and the past two weeks have been quite busy and eventful. We now have a working code to get model parameters which has been tested with simulated data. This is my first time with software testing and it has been quite a learning experience.

So, let me describe the work so far. Although we are fitting the IVIM (Intravoxel incoherent motion) model to dMRI signals here, the techniques and code developed so far can be used for any kind of model fitting.

The equation we want to fit : S = S0 [f e^{-b*D_star} + (1 - f) e^{-b*D}] but first let us try some basic fitting using Scipy's leastsq.

Fitting a model using scipy.optimize.leastsq

I am halfway through my Google Summer of Code project with Dipy under the Python Software Foundation and I have published a few short posts about the project before but in this post I am going to walk through the entire project from the start. This can also be a good tutorial if you want to learn about curve fitting in python, unit testing, constructing Jacobians for faster fitting and getting started with Dipy, a diffusion imaging library in Python.
The first part of this post will cover general curve fitting with leastsq, writing a Jacobian and writing unit tests. The second part will focus on Dipy and the specific model I am working on - The Intravoxel Incoherent motion model.

Curve fitting with leastsq

Mathematical models proposed to fit observed data can be charaterized by several parameters. For example, the simplest model is a straight line which is characterised by the slope and intercept. If we consider "x" as the independent variable and "y" as the dependent variable, the relationship between them is controlled by the model parameters m and C when we propose the following model : y = mx + C
Given a set of data points, we can determine the parameters m and C by using least squares regression which is simply minimization of the error between the actual data points and those predicted by the model function. To demonstrate how to use leastsq for fitting let us generate some data points by importing the library numpy, defining a linear function and making a scatter plot.
In [47]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq
# This is a magic function that renders the plots in the Ipython notebook itself
% matplotlib inline 
Let us define a model function which is an exponential decay curve and depends on two parameters.

y = A exp(-x * B)

Documentation is important and we will be specifying the input parameters and the output of out function.
In [19]:
# Define a linear model function
def func(params, x):
    """
    Model function. The function is defined by the following equation.
                    y = A exp(-x * B)
    Parameters
    ----------
    params : array (2,)
        An array with two elements, A and B
        
    x      : array (N,)
        The independent variable
    
    Returns
    -------
    y      : array (N,)
        The values of the dependent variable
    """
    A, B = params[0], params[1]
    return A*np.exp(-x*B)
Use numpy's arange to generate an array of "x" and apply the function to get a set of data points. We can add some noise by the use of the random number generator in numpy.
In [68]:
x = np.arange(0,400,10)
N = x.shape[0]
params = 1., 0.01
Y = func(params, x)
noise = np.random.rand(N)

y = Y + noise/5.
Let us make a scatter plot of our data
In [69]:
plt.scatter(x, y, color='blue')
plt.title("Plot of x vs y")
plt.xlabel("x")
plt.ylabel("y")
plt.show()
Now that we have a set of data points, we can go ahead with the fitting using leastsq. Leastsq is a wrapped around the FORTRAN package MINPACK’s lmdif and lmder algorithms. It uses the Levenberg Marquardt algorithm for finding the best fit parameters.
Leastsq requires an error function which gives the difference between the observed target data (y) and a (non-linear) function of the parameters f(x, params). You can also specify initial guesses for the parameters using the variable x0. For more reference check out the documentation athttp://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html
In [70]:
def error_fun(params, x, y):
    """
    Error function for model function.
    
    Parameters
    ----------
    params : array (2,)
        Parameters for the function
        
    x      : array (N,)
        The independent variable
    
    y      : array (N,)
        The observed variable
        
    Returns
    -------
    residual : array (N,)
        The difference between observed data and predicted data from the model function

    """
    residual = func(params, x) - y
    return residual
Now, we are ready to use leastsq for getting the fit. The call to leastsq returns an array whose first element is the set of parameters obtained and the other elements give more information about the fitting.
In [71]:
fit = leastsq(error_fun, x0=[0.5, 0.05], args=(x,y))
fitted_params = fit[0]
print ("Parameters after leastsq fit are :", fitted_params)
Parameters after leastsq fit are : [ 0.98855093  0.00679147]
Let us compare our fitting and plot the results.
In [74]:
y_predicted = func(fitted_params, x)
plt.scatter(y_predicted, x, color="red", label="Predicted signal")
plt.scatter(y, x, color="blue", label="Actual data")
plt.title("Plot results")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.plot()
Out[74]:
[]
Congratualations ! You have done your first model fitting with Python. You can now look at other fitting routines such as scipy.optimize.minimize, use different types of model functions and try to write code for more complicated functions and data. But at the heart of it, its as simple as this basic example.

The IVIM model function is a a bi-exponential curve with the independent variable "b" and parameters : f, D_star, D and S0. The parameters are perfusion fraction (f), pseudo diffusion constant (D_star), tissue diffusion constant (D) and the non gradient signal value (S0) respectively. We intend to extract these parameters given some data and the bvalues (b) associated with the data. We follow test-driven development and hence the first task was to simulate some data, write a fitting function and see if we are getting the correct results. An Ipython notebook tests a basic implementation of the fitting routing by generating a signal and plotting the results.

We have the following plots for two different signals generated with Dipy's multi_tensor function and the fitting by our code.

https://github.com/sahmed95/dipy/blob/ipythonNB/dipy/reconst/ivim_dev.ipynb

Once, a basic fitting routine was in place, we moved on to incorporate unit tests for the model and fitting functions. Writing a test is pretty simple. Create a file as test_ivim.py and define the test functions as "test_ivim():". Inside the test, generate a signal by taking a set of bvalues and passing the ivim parameters to the multi_tensor function. Then, initiate an IvimModel and call its fit method to get model_parameters. The parameters obtained from the fit should nearly be the same as the parameters used to generate the model. This can be achieved by numpy's testing functions
assert_array_equal(), assert_array_almost_equal().

We use nosetests for testing which makes running a test as simple as "nosetests test_ivim.py". It is necessary to build the package by running the setup.py file with the arguments build_ext --inplace.

The next step is to implement a two-stage fitting where we will consider a bvalue threshold and carry out a fitting of D while assuming that the perfusion fraction is negligible, This will simplify the model to S = S0 e^{-b*D}). The D value thus obtained will be used as a guess for the complete fit.

You can find the code so far here : https://github.com/nipy/dipy/pull/1058

A little note about the two fitting routines we explored : scipy's leastsq and minimize. "optimize" is a more flexible method, allowing bounds to be passed while fitting and allows the user to specify particular fitting methods while leastsq doesn't have that flexibility and uses MINPACK's lmdiff written in FORTRAN. However, leastsq performed better on our tests than minimize. A possible reason might be that leastsq calculates the Jacobian for fitting and may be better suited to fit bi exponential functions. It might also turn out that implementing Jacobian improves the performance of optimize for fitting. All these questions will be explored later and for now we have included both fitting routines in our code. However, as Ariel pointed out, minimize is not available in older versions of scipy and to have backward compatibility we might go ahead with leastsq.

Next up, we will implement a two stage fitting as discussed in the original paper on IVIM in Le Bihan's 1988 paper.[1].

Meanwhile feel free to fork the development version of the code and play around with the IVIM data provide by Eric Peterson here : https://figshare.com/s/4733b6fc92d4977c2ee1

1. Le Bihan, Denis, et al. "Separation of diffusion and perfusion in intravoxel incoherent motion MR imaging." Radiology 168.2 (1988): 497-505.

No comments:

Post a Comment