[ How do I index n sets of 4 columns to plot multiple plots using matplotlib? ]

I want to know how I should index / access some data programmatically in Python. I have columnar data (depth, temperature, gradient, gamma) for a set of boreholes. There are n boreholes. I have a header row, which lists the borehole name and numeric ID. Example:

Bore_name,Bore_ID,,,Bore_name,Bore_ID,,,, ... 
<a row of headers>
depth,temp,gradient,gamma,depth,temp,gradient,gamma ...

I don't know how to index the data, apart from crude iteration:

import numpy
import matplotlib.pyplot as pl

with open(filename, 'r') as f:
    bores = f.readline().rstrip().split(',')
    headers = f.readline().rstrip().split(',')

# load from CSV file, missing values are empty 'cells'
tdata = numpy.genfromtxt(filename, skip_header=2, delimiter=',', missing_values='', filling_values=numpy.nan)

for column in range(0, numpy.shape(tdata)[1], 4):
    # plots temperature on x, depth on y
    pl.plot(tdata[:,column+1], tdata[:,column], label=bores[column])
    # get index at max depth (nanargmin assumes depth is stored as negative values;
    # use nanargmax if depth is positive downward)
    depth = numpy.nanargmin(tdata[:,column])
    # plot text label at max depth (y) and temp at that depth (x)
    pl.text(tdata[depth,column+1], tdata[depth,column], bores[column])

It seems easy enough this way, but I've been using R recently and have got a bit used to its way of referencing data objects via classes and subclasses interpreted from headers.

Answer 1

You could put your data into a dict for each borehole, keyed by the borehole id, and values as dicts with headers as keys. Roughly like this:

data = {boreid1:{"temp":temparray, ...}, boreid2:{"temp":temparray}}

Reading from files will probably be a little more cumbersome with this approach, but for plotting you could do something like

pl.plot(data[boreid]["temp"], data[boreid]["depth"])
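One way the nested dict could be filled from the file layout in the question is sketched below. It assumes each borehole occupies a block of four columns, that the bore ID sits in the cell right after the bore name, and that the header names are unique within a block. The `SAMPLE` text and the `load_bores` helper are made up for illustration.

```python
import numpy as np

# Hypothetical sample in the layout described in the question:
# a bore name/ID row, a header row, then four data columns per borehole.
SAMPLE = """\
BH1,101,,,BH2,102,,
depth,temp,gradient,gamma,depth,temp,gradient,gamma
0,10.0,0.0,5,0,11.0,0.0,6
10,10.5,0.05,7,10,11.4,0.04,8
20,11.1,0.06,9,,,,
"""

def load_bores(text, width=4):
    """Return {bore_id: {header: 1-D array}} for blocks of `width` columns."""
    lines = text.splitlines()
    bores = lines[0].split(",")
    headers = lines[1].split(",")
    # genfromtxt accepts a list of lines; empty cells become NaN
    table = np.genfromtxt(lines[2:], delimiter=",", filling_values=np.nan)
    data = {}
    for start in range(0, table.shape[1], width):
        bore_id = bores[start + 1]  # assumption: ID follows the name
        block = table[:, start:start + width]
        names = headers[start:start + width]
        data[bore_id] = {name: block[:, i] for i, name in enumerate(names)}
    return data

data = load_bores(SAMPLE)
```

With this in place, `data["101"]["temp"]` and `data["101"]["depth"]` are the arrays to hand to `pl.plot`. For a real file, the text could come from `open(filename).read()`.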

Answer 2

Well, if you like R's data.table, there have been at least a few attempts to re-create that functionality in NumPy, both through additional classes in NumPy core and through external Python libraries. The effort I find most promising is the datarray library by Fernando Perez. Here's how it works.

>>> # create a NumPy array for use as our data set
>>> import numpy as NP
>>> D = NP.random.randint(0, 10, 40).reshape(8, 5)

>>> # create some generic row and column names to pass to the constructor
>>> row_ids = [ "row{0}".format(c) for c in range(D.shape[0]) ]
>>> rows = 'rows', row_ids

>>> variables = [ "col{0}".format(c) for c in range(D.shape[1]) ]
>>> cols = 'cols', variables

Instantiate the DataArray by calling the constructor and passing in an ordinary NumPy array and a list of tuples, one tuple for each axis. Since ndim = 2 here, there are two tuples in the list. Each tuple consists of an axis name (str) and a sequence of labels for that axis (list).

>>> from datarray.datarray import DataArray as DA
>>> D1 = DA(D, [rows, cols])

>>> D1.axes
      (Axis(name='rows', index=0, labels=['row0', 'row1', 'row2', 'row3', 
           'row4', 'row5', 'row6', 'row7']), Axis(name='cols', index=1, 
           labels=['col0', 'col1', 'col2', 'col3', 'col4']))

>>> # now you can use R-like syntax to reference a NumPy data array by column:
>>> D1[:,'col1']
      DataArray([8, 5, 0, 7, 8, 9, 9, 4])

Answer 3

Here are idioms for naming rows and columns:

row0, row1 = np.ones((2,5))

for col in range(0, tdata.shape[1], 4):
    depth, temp, gradient, gamma = tdata[:, col:col+4].T
    pl.plot(temp, depth)

See also namedtuple:

from collections import namedtuple
Rec = namedtuple( "Rec", "depth temp gradient gamma" )
r = Rec( *tdata[:, col:col+4].T )
print(r.temp, r.depth)
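To show the namedtuple idiom end to end, here is a small self-contained sketch that keeps one `Rec` per borehole in a dict. The synthetic `tdata` array and the `bore_names` list are stand-ins for the real file contents, and it assumes depth is positive downward, so `nanargmax` finds the deepest sample.

```python
from collections import namedtuple
import numpy as np

Rec = namedtuple("Rec", "depth temp gradient gamma")

# Synthetic stand-in for tdata: 3 rows, 2 boreholes x 4 columns each.
tdata = np.arange(24, dtype=float).reshape(3, 8)
bore_names = ["BH1", "BH2"]  # hypothetical names, one per 4-column block

records = {}
for name, col in zip(bore_names, range(0, tdata.shape[1], 4)):
    # .T turns the 4 columns into 4 rows, which unpack into the fields
    records[name] = Rec(*tdata[:, col:col + 4].T)

# Index of the deepest sample for one borehole (assumes positive depths)
deepest = np.nanargmax(records["BH2"].depth)
```

Plotting then reads as `pl.plot(records["BH1"].temp, records["BH1"].depth)`, with no column arithmetic at the call site.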

datarray (thanks Doug) is certainly more general.