
[ How to get the mean of a subset of rows after using groupby? ]

I want to get the average of a particular subset of rows in one particular column in my dataframe.

I can use

df['C'].iloc[2:9].mean()

to get the mean of just the rows I want from my original DataFrame, but my problem is that I want to perform this operation after a groupby.

I am building on

df.groupby(["A", "B"])['C'].mean()

where each group of columns A and B contains 11 values in 'C', and I get the average of all 11. I actually only want the average of the 3rd through 9th values, so ideally I would do

df.groupby(["A", "B"])['C'].iloc[2:9].mean()

The idea is to take the 11 values of column C in every (A, B) group and find the mean of the 3rd through 9th, but I know I can't call iloc on a groupby object. The error suggests using the apply method, but I can't figure out how.

Any help would be appreciated.

Answer 1


Try this variant:

for key, grp in df.groupby(["A", "B"]):
    print(grp['C'].iloc[2:9].mean())
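
If the per-group means are needed as a result rather than printed, the same loop can collect them. This is a minimal sketch, not part of the original answer; the demo data and the names `means` and `result` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical demo data: two (A, B) groups with 11 rows each.
df = pd.DataFrame({
    "A": ["a"] * 22,
    "B": ["b1"] * 11 + ["b2"] * 11,
    "C": list(range(11)) * 2,
})

# Map each (A, B) key to the mean of the 3rd through 9th rows of its group.
means = {
    key: grp["C"].iloc[2:9].mean()
    for key, grp in df.groupby(["A", "B"])
}

# Tuple keys become a two-level index on the resulting Series.
result = pd.Series(means, name="C")
print(result)
```

Iterating over the groups is simple to read, though it is usually slower than an aggregation on frames with many groups.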

Answer 2


You can use the agg function after the groupby, then slice within each group and take the mean:

import pandas as pd

# A dummy data frame to demonstrate
df = pd.DataFrame({'A': ['a']*22, 'B': ['b1']*11 + ['b2']*11, 'C': list(range(11))*2})

df.groupby(['A', 'B'])['C'].agg(lambda g: g.iloc[2:9].mean())

# A   B 
# a  b1    5
#    b2    5
# Name: C, dtype: int64
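
A possible alternative that avoids calling a Python lambda for each group is to number the rows inside each group with cumcount, filter on that position, and then take an ordinary grouped mean. This is only a sketch under the same dummy data; the names `pos`, `subset`, and `result` are assumptions made for illustration.

```python
import pandas as pd

# Same dummy data frame as above.
df = pd.DataFrame({
    "A": ["a"] * 22,
    "B": ["b1"] * 11 + ["b2"] * 11,
    "C": list(range(11)) * 2,
})

# 0-based position of every row within its (A, B) group.
pos = df.groupby(["A", "B"]).cumcount()

# Keep positions 2 through 8, i.e. the 3rd through 9th rows of each group.
subset = df[(pos >= 2) & (pos < 9)]

# Plain grouped mean over the rows that remain.
result = subset.groupby(["A", "B"])["C"].mean()
print(result)
```

One difference to keep in mind: a group with fewer than three rows drops out of this result entirely, whereas the agg version keeps the group and returns NaN for it.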