TAGS :Viewed: 11 - Published at: a few seconds ago

[ Finding intersection of values in a column associated with unique values in another column Pandas ]

If I have a DataFrame like this (very minimal example)

  col1  col2
0    a     1
1    a     2
2    b     1
3    b     2
4    b     4
5    c     1
6    c     2
7    c     3

and I want the intersection of all col2 values when they are related to their unique col1 values (so in this case, the intersection would be [1,2]), how can I do so with Pandas? Another way to word this would be the values in col2 that exist for every unique value in col1.

My (bad) solution was to get the unique col1 elements with unique, and then build dictionaries from each unique element in col1 and then take the set intersection of those dictionary values. I feel like there is a mechanism I should be using to relate the columns together however that could make this much easier.

Answer 1

One way is to use pivot_table:

In [11]: cross = df.pivot_table(index="col1", columns="col2", aggfunc='size') == 1

In [12]: cross
col2     1     2      3      4
a     True  True  False  False
b     True  True  False   True
c     True  True   True  False

In [13]: cross.all()
1     True
2     True
3    False
4    False
dtype: bool

In [14]: cross.columns[cross.all()]
Out[14]: Int64Index([1, 2], dtype='int64', name='col2')

Answer 2

Another solution:

print df.pivot_table(index="col1", columns="col2", aggfunc=len)
col2    1    2    3    4
a     1.0  1.0  NaN  NaN
b     1.0  1.0  NaN  1.0
c     1.0  1.0  1.0  NaN

ser = (df.pivot_table(index="col1", columns="col2", aggfunc=len) == 1).all()
print  ser.index[ser]
Int64Index([1, 2], dtype='int64', name=u'col2')