
[ Python gamma.fit returning values that don't seem to give the correct distribution in Excel ]

I have a series of experimental data values X and Y that are used to produce a scatter graph. The graph looks very similar to a gamma distribution, and I have read papers saying that this experimental data can be represented/modeled using a gamma distribution.

So I have written the following bit of Python code to find the gamma distribution's constants:

import csv
from collections import defaultdict

import scipy.stats as ss

# Read the CSV into lists keyed by column index.
columns = defaultdict(list)
with open('case_1_RTD.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row (works in Python 2.6+ and 3)
    for row in reader:
        for i, v in enumerate(row):
            columns[i].append(v)
X = columns[0]
Y = columns[1]

data = [float(i) for i in Y]

# Fit a gamma distribution to the Y values with the location fixed at 0.
alpha, loc, beta = ss.gamma.fit(data, floc=0)
print(alpha, loc, beta)

I then use the outputs from this to generate a gamma distribution in Excel and compare the new gamma distribution data with the original X, Y data. The two sets of values are not alike at all.

In Excel I use the function

=Gamma.Dist(X,alpha,beta,False)  #I have tried switching alpha and beta around but no luck

The fact that I do not use the X data set in the Python code is a bit disconcerting, but from what I have read in the SciPy documentation I cannot see where to use it. Does this have something to do with the loc variable in Python? (From what I have read, it does not.)

The X, Y data sets contain 3718 values, with the smallest Y value being 1.11E-297. Could this be causing an issue?

Thanks in advance for any help or guidance

Answer 1


You seem to be looking to model $Y$ as a non-linear function of $X$, $Y=f(X)$, rather than trying to estimate the distribution of $Y$. From theoretical considerations, $f$ is apparently a non-negative function with an area under the curve of 1 and an exponentially decaying tail (see the Wikipedia article on residence time distribution), so you want to use a probability density function, specifically the gamma distribution pdf.
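To make the distinction concrete, here is a minimal sketch on synthetic data. The shape and scale values are made up for illustration and are not taken from the question's CSV. `gamma.fit` expects samples drawn from the distribution, so it should recover the parameters from random draws but not from the heights of the density curve, which is what the Y column appears to contain.

```python
import numpy as np
from scipy import stats

true_shape, true_scale = 3.0, 2.0  # hypothetical values, for illustration only

# Case 1: samples drawn from the distribution, which is what gamma.fit expects.
samples = stats.gamma.rvs(true_shape, scale=true_scale, size=5000, random_state=0)
shape_from_samples, _, scale_from_samples = stats.gamma.fit(samples, floc=0)

# Case 2: heights of the density curve on a grid, analogous to the Y column.
x = np.linspace(0.1, 30.0, 500)
y = stats.gamma.pdf(x, true_shape, scale=true_scale)
shape_from_heights, _, scale_from_heights = stats.gamma.fit(y, floc=0)

print("fit on samples:", shape_from_samples, scale_from_samples)
print("fit on heights:", shape_from_heights, scale_from_heights)
```

The first fit should land close to the true parameters, while the second describes how the curve heights themselves are distributed, which is a different question.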

This is not a distribution fitting problem, but rather a non-linear regression problem. I have no idea how to do it in Python, but a quick search for these keywords brought up a promising link.
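For reference, one way such a regression might look in Python is a least-squares fit of the gamma pdf with `scipy.optimize.curve_fit`. This is only a sketch under stated assumptions: the data below is synthetic (standing in for the two CSV columns), the helper `gamma_pdf` is a hypothetical name, the location is fixed at 0 as in the question, and Y is assumed to be normalized so the area under the curve is 1.

```python
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit


def gamma_pdf(x, shape, scale):
    """Gamma density with the location fixed at 0 (hypothetical helper)."""
    return stats.gamma.pdf(x, shape, loc=0, scale=scale)


# Synthetic stand-in for the X and Y columns (made-up parameters plus small noise).
rng = np.random.default_rng(0)
x = np.linspace(0.1, 30.0, 300)
y = gamma_pdf(x, 3.0, 2.0) + rng.normal(0.0, 1e-4, x.size)

# Starting guess from the curve's own mean and variance (method of moments).
w = np.clip(y, 0.0, None)
mean = np.average(x, weights=w)
var = np.average((x - mean) ** 2, weights=w)
p0 = (mean ** 2 / var, var / mean)

# Least-squares fit of y = f(x); the bounds keep both parameters positive.
(shape_hat, scale_hat), _ = curve_fit(gamma_pdf, x, y, p0=p0, bounds=(1e-6, np.inf))
print(shape_hat, scale_hat)
```

Here `shape_hat` and `scale_hat` correspond to the alpha and beta arguments of Excel's GAMMA.DIST, whose beta is a scale parameter. If the measured Y values are not normalized to unit area, an extra amplitude parameter multiplying the pdf would probably be needed.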