
[ I have a dictionary stored as numpy array a typical key is in the format of ]

I have a dictionary of data stored as a numpy array. A typical key in the dictionary is in the format of:

('Typical Key', {'a': 100, 'b': 'NaN', 'c': 'NaN', 'e': 360300, 'f': 8308552, 'g': 'NaN', 'h': 3576206, 'i': True, 'j': 'NaN', 'k': 'NaN', 'l': 'NaN', 'm': 'blah.blah@blah.com', 'x': 'NaN'})

I am trying to find which key in the dictionary contains the element with the maximum value, in order to identify an outlier in my dataset that I can see on a graph. I know what the key of the data point SHOULD be from working through a tutorial (I know the answer).

I have tried a few ways of doing this, but I keep getting an unexpected result. I have been basing my code on the max() function. For instance, see the examples below:

inverse = [(value, key) for key, value in data_dict.items()]
print max(inverse)[1]

xx = max(data_dict, key=lambda i: data_dict[i])

print xx

import operator 
result = max(data_dict.iteritems(), key=operator.itemgetter(1))[0]
print result

I have a feeling that I'm not looking at the elements and that's the problem. Any help is appreciated!
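
A minimal sketch of that suspicion, written for Python 3 (the attempts above are Python 2). The two-record `data_dict` here is a hypothetical stand-in, not the real dataset. Each attempt hands `max()` the whole inner dictionary as the thing to compare, never the numbers inside it. Python 3 refuses to order two dicts and raises `TypeError`; Python 2 ordered them by a consistent but otherwise undefined rule, which would explain a result that looks arbitrary.

```python
# Hypothetical two-record stand-in for data_dict (names and values are made up).
data_dict = {
    "rec_1": {"a": 100, "b": "NaN"},
    "rec_2": {"a": 5, "b": 9000},
}

try:
    # max() compares data_dict["rec_1"] with data_dict["rec_2"]: two whole dicts,
    # not the integers stored inside them.
    winner = max(data_dict, key=lambda k: data_dict[k])
except TypeError as exc:
    winner = None
    print("cannot order whole dicts:", exc)
```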

Answer 1

OK, sorted it by tweaking the code, possibly because I had not articulated what I wanted to do properly. I had to tweak the code slightly, but this did work. I still need to work out why my other code was not returning the expected value.

import sys
Max = -sys.maxint
best_key = None
for k, v in data_dict.iteritems():
    # k refers to each 'typical key'
    inner_dict = v
    for key, value in inner_dict.iteritems():
        # only integer elements are candidates for the maximum
        if isinstance(value, int) and Max < value:
            Max = value
            best_key = k
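
For readers on Python 3, where `sys.maxint` and `iteritems()` no longer exist, the same nested loop could look roughly like this. It is a sketch, not the poster's code: the function name `key_with_largest_int` and the sample records are made up for illustration, and `None` replaces `-sys.maxint` as the "nothing seen yet" marker.

```python
def key_with_largest_int(data_dict):
    """Return (outer_key, value) for the largest int found in any inner dict."""
    best_key, best_value = None, None
    for outer_key, inner_dict in data_dict.items():
        for value in inner_dict.values():
            # Skip strings such as 'NaN'; only ints take part in the comparison.
            if isinstance(value, int) and (best_value is None or value > best_value):
                best_key, best_value = outer_key, value
    return best_key, best_value


# Hypothetical records, shaped like the 'Typical Key' entry in the question.
sample = {
    "rec_1": {"a": 100, "b": "NaN", "e": 360300},
    "rec_2": {"a": 5, "f": 8308552, "m": "blah.blah@blah.com"},
}
print(key_with_largest_int(sample))
```

Returning the value alongside the key makes it easy to check the result against the outlier seen on the graph.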

Answer 2

import sys
Max = -sys.maxint
best_key = None
for k, v in data_dict.items():
    # k refers to each 'typical key'
    inner_dict = v
    for key, value in inner_dict.items():
        if isinstance(value, int) and Max < value:
            Max = value
            best_key = k  # keep the outer 'typical key', not the inner field name
print best_key

Answer 3

With your sample dictionary:

In [684]: dd
{'a': 100,
 'b': 'NaN',
 'c': 'NaN',
 'e': 360300,
 'f': 8308552,
 'g': 'NaN',
 'h': 3576206,
 'i': True,
 'j': 'NaN',
 'k': 'NaN',
 'l': 'NaN',
 'm': 'blah.blah@blah.com',
 'x': 'NaN'}

I can easily pull out a list of the values, but with those strings mixed in, I can't take a meaningful max.

In [685]: list(dd.values())

So, as you discovered, I first have to keep only the ints:

In [687]: max([i for i in dd.values() if isinstance(i,int)])
Out[687]: 8308552
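
One caveat with this filter: `bool` is a subclass of `int`, so `isinstance(True, int)` is true and the `'i': True` entry passes the test (it counts as 1). That does not change the max here, but a stricter check could be sketched like this. The helper name `is_plain_int` is an assumption for illustration, and `dd` is a shortened copy of the sample dictionary.

```python
# Shortened copy of the sample dictionary from the question.
dd = {"a": 100, "b": "NaN", "f": 8308552, "i": True}


def is_plain_int(value):
    """True for ints, but not for the bool values True and False."""
    return isinstance(value, int) and not isinstance(value, bool)


loose = [v for v in dd.values() if isinstance(v, int)]  # True slips through
strict = [v for v in dd.values() if is_plain_int(v)]    # bools are dropped

print(loose)
print(strict)
print(max(strict))
```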

Or a list of tuples of candidates for max:

In [692]: [(v,k) for k,v in dd.items() if isinstance(v,int)]
Out[692]: [(3576206, 'h'), (8308552, 'f'), (100, 'a'), (360300, 'e'), (True, 'i')]

and taking the max with a lambda key:

In [693]: max([(k,v) for k,v in dd.items() if isinstance(v,int)], key=lambda x:x[1])
Out[693]: ('f', 8308552)
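
The same idea can be lifted one level up to the outer dictionary, which is what the question asks for. The following is only a sketch under assumptions: `largest_int` is a hypothetical helper, the records are made up, and records with no integers at all are scored as negative infinity so they can never win.

```python
def largest_int(inner_dict):
    """Largest int in one record, or -inf if the record has no ints."""
    ints = [v for v in inner_dict.values() if isinstance(v, int)]
    return max(ints) if ints else float("-inf")


# Hypothetical records standing in for the real data_dict.
data_dict = {
    "rec_1": {"a": 100, "b": "NaN", "e": 360300},
    "rec_2": {"a": 5, "f": 8308552, "x": "NaN"},
    "rec_3": {"m": "blah.blah@blah.com"},
}

# max() walks the outer keys; the key function scores each one by its inner max.
best_key = max(data_dict, key=lambda k: largest_int(data_dict[k]))
print(best_key, largest_int(data_dict[best_key]))
```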


From your comment (reformatted for clarity):

import pickle 
import sys 
import matplotlib.pyplot 
import numpy as np 
from feature_format import featureFormat, targetFeatureSplit
### read in data dictionary, convert to numpy array 
data_dict = pickle.load(open("../final_project/final_project_dataset.pkl", "r") ) 
features = ["salary", "bonus"] 
data = featureFormat(data_dict, features) 
print type(data)

You may import numpy, but the sample data is not an array: you gave us a tuple that contains a dictionary. And your code is all Python list and dictionary work; nothing in it uses numpy.