[ How to get the mean of a subset of rows after using groupby? ]
I want to get the average of a particular subset of rows in one particular column in my dataframe.
I can use
df['C'].iloc[2:9].mean()
to get the mean of just the rows I want from my original DataFrame, but my problem is that I want to perform this operation after a groupby.
I am building on
df.groupby(["A", "B"])['C'].mean()
whereby each group of A and B contains 11 values in 'C' and I get the average of those 11 values. I actually only want the average of the 3rd through 9th values in each group, though, so ideally what I would want to do is
df.groupby(["A", "B"])['C'].iloc[2:9].mean()
This would return those 11 values from column C for every group of A,B and then would find the mean of the 3rd through 9th values but I know I can't do this. The error suggests using the apply method but I can't seem to figure it out.
Any help would be appreciated.
Answer 1
Try this variant:
for key, grp in df.groupby(["A", "B"]):
    # key is the (A, B) pair; grp is the sub-DataFrame for that group
    print(key, grp['C'].iloc[2:9].mean())
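If you want to keep the results rather than print them, the same loop can fill a dict that is then turned into a Series. This is only a sketch: the dummy frame and the names `means` and `result` are made up for illustration and are not from the question.

```python
import pandas as pd

# Dummy data (an assumption for illustration): two groups of 11 rows each.
df = pd.DataFrame({
    'A': ['a'] * 22,
    'B': ['b1'] * 11 + ['b2'] * 11,
    'C': list(range(11)) * 2,
})

# Collect the mean of the 3rd through 9th rows (positions 2..8) of each group.
means = {}
for key, grp in df.groupby(['A', 'B']):
    means[key] = grp['C'].iloc[2:9].mean()

# The tuple keys become a two-level index on the resulting Series.
result = pd.Series(means, name='C')
print(result)
```

The loop is easy to read and debug, but for many groups a single groupby call is usually the more idiomatic choice.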
Answer 2
You can use the agg function after the groupby, then subset within each group and take the mean:
import pandas as pd
# A dummy data frame to demonstrate
df = pd.DataFrame({'A': ['a']*22, 'B': ['b1']*11 + ['b2']*11, 'C': list(range(11))*2})
df.groupby(['A', 'B'])['C'].agg(lambda g: g.iloc[2:9].mean())
# A  B
# a  b1    5
#    b2    5
# Name: C, dtype: int64
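Since the error message in the question points at apply, here is a rough sketch of that route, plus a variant that filters rows by their position within the group before averaging. The dummy frame and the names `via_apply`, `pos` and `via_mask` are assumptions for illustration only.

```python
import pandas as pd

# Dummy data (an assumption for illustration): two groups of 11 rows each.
df = pd.DataFrame({
    'A': ['a'] * 22,
    'B': ['b1'] * 11 + ['b2'] * 11,
    'C': list(range(11)) * 2,
})

# Variant 1: apply receives each group's 'C' values as a Series,
# so positional slicing with iloc works inside the lambda.
via_apply = df.groupby(['A', 'B'])['C'].apply(lambda s: s.iloc[2:9].mean())

# Variant 2: number the rows within each group (0, 1, 2, ...),
# keep positions 2..8, then take an ordinary grouped mean.
pos = df.groupby(['A', 'B']).cumcount()
via_mask = df[(pos >= 2) & (pos < 9)].groupby(['A', 'B'])['C'].mean()

print(via_apply)
print(via_mask)
```

One difference to be aware of: a group with fewer than three rows gives NaN in the apply variant, whereas in the cumcount variant that group simply does not appear in the result.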