I have a dictionary of data stored as a numpy array. A typical key in the dictionary is in the format of:
('Typical Key', {'a': 100, 'b': 'NaN', 'c': 'NaN', 'e': 360300, 'f': 8308552, 'g': 'NaN', 'h': 3576206, 'i': True, 'j': 'NaN', 'k': 'NaN', 'l': 'NaN', 'm': 'blah.blah@blah.com', 'x': 'NaN'})
I am trying to find which key in the dictionary holds the element with the maximum value, in order to identify an outlier in my dataset that I can see on a graph. I know what the key of that data point SHOULD be, because I am working through a tutorial (I know the answer).
I have tried a few ways of doing this, but I consistently get an unexpected result. I have based my code around the max() function; for instance, see the examples below:
inverse = [(value, key) for key, value in data_dict.items()]
print max(inverse)[1]

xx = max(data_dict, key=lambda i: data_dict[i])
print xx

import operator
result = max(data_dict.iteritems(), key=operator.itemgetter(1))[0]
print result
I have a feeling that I'm not looking at the elements and that's the problem. Any help is appreciated!
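A small sketch of why the attempts above misbehave, using a hypothetical two-record `data_dict` in the same shape as the typical key. In all three calls, `max()` ends up ranking records by their whole inner dict, not by any number inside it. Python 3 refuses to order dicts and raises `TypeError`; Python 2 orders them by an internal rule unrelated to their largest value, so the result only looks arbitrary. The helper `largest_int` is hypothetical, added for illustration.

```python
# Hypothetical sample data in the same shape as the question's records.
data_dict = {
    "person_1": {"a": 100, "b": "NaN", "f": 8308552, "m": "blah.blah@blah.com"},
    "person_2": {"a": 200, "b": "NaN", "f": 500, "m": "blah.blah@blah.com"},
}

# Attempt 2 from the question: the key function returns the whole inner dict,
# so max() has to compare one dict with another.
raised = False
try:
    max(data_dict, key=lambda k: data_dict[k])
except TypeError:
    # Python 3 has no ordering for dicts (Python 2 would not raise here).
    raised = True


def largest_int(record):
    """Hypothetical helper: the largest int stored in one inner dict."""
    return max(v for v in record.values() if isinstance(v, int))


# Ranking each record by the largest number inside it compares numbers instead.
best = max(data_dict, key=lambda k: largest_int(data_dict[k]))
print(raised, best)  # True person_1
```

The only change from the second attempt is the key function: it now returns a number per record instead of the record itself.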
Answer 1
OK, sorted it, possibly because I had not articulated what I wanted to do properly before. I had to tweak the code slightly, but this did work. I still need to work out why my other code was not returning the expected value.
import sys
Max = -sys.maxint
best_key = None
for k, v in data_dict.iteritems():
    # k refers to each 'typical key'
    inner_dict = v
    for key, value in inner_dict.iteritems():
        # only ints can be compared; skip 'NaN' strings and e-mail addresses
        if isinstance(value, int) and Max < value:
            Max = value
            best_key = k
print best_key
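The loop above is Python 2 (`iteritems`, `sys.maxint`). A minimal Python 3 sketch of the same nested loop, assuming the same `data_dict` shape; the function name `find_max_key` and the sample records are hypothetical. It starts from `float('-inf')` because `sys.maxint` was removed in Python 3.

```python
def find_max_key(data_dict):
    """Hypothetical helper: (outer_key, value) for the largest int in any record."""
    max_value = float("-inf")  # replaces -sys.maxint, which Python 3 removed
    best_key = None
    for k, inner_dict in data_dict.items():  # k is each 'typical key'
        for value in inner_dict.values():
            # Skip 'NaN' strings, e-mail addresses and other non-integers.
            if isinstance(value, int) and value > max_value:
                max_value = value
                best_key = k
    return best_key, max_value


data_dict = {  # hypothetical sample records
    "person_1": {"a": 100, "b": "NaN", "f": 8308552},
    "person_2": {"a": 200, "b": "NaN", "f": 500},
}
print(find_max_key(data_dict))  # ('person_1', 8308552)
```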
Answer 2
import sys
Max = -sys.maxint
best_key = None
for k, v in data_dict.items():  # iterating the dict alone yields only the keys
    # k refers to each 'typical key'
    inner_dict = v
    for key, value in inner_dict.items():
        if isinstance(value, int) and Max < value:
            Max = value
            best_key = k  # the outer key; `key` would only name the inner field
print best_key
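If it also matters which field held the maximum, one option is to report the record, the field, and the value together. A sketch in Python 3, built on `max()` with a key function like the question's own attempts; `find_max_entry` and the sample records are hypothetical, and the `default=` argument of `max()` needs Python 3.4 or later.

```python
def find_max_entry(data_dict):
    """Hypothetical helper: (outer_key, inner_key, value) for the largest int."""
    candidates = (
        (outer_key, inner_key, value)
        for outer_key, record in data_dict.items()
        for inner_key, value in record.items()
        if isinstance(value, int)
    )
    # Compare only the numeric value, which sits at position 2 of each tuple.
    return max(candidates, key=lambda entry: entry[2], default=None)


data_dict = {  # hypothetical sample records
    "person_1": {"a": 100, "b": "NaN", "f": 8308552},
    "person_2": {"a": 200, "b": "NaN", "f": 500},
}
print(find_max_entry(data_dict))  # ('person_1', 'f', 8308552)
```

Because the key function returns a plain number, `max()` never has to compare dicts or strings.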
Answer 3
With your sample dictionary:
In [684]: dd
Out[684]:
{'a': 100,
'b': 'NaN',
'c': 'NaN',
'e': 360300,
'f': 8308552,
'g': 'NaN',
'h': 3576206,
'i': True,
'j': 'NaN',
'k': 'NaN',
'l': 'NaN',
'm': 'blah.blah@blah.com',
'x': 'NaN'}
I can easily pull out a list of the values, but with those strings mixed in, I can't take a max:
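To make that concrete with a shortened, hypothetical copy of the list: Python 3 raises `TypeError` as soon as `max()` compares a string with an int. CPython 2 would not raise; it would return one of the strings, because it orders every number before every string.

```python
# A shortened, hypothetical copy of the value list below.
values = ["NaN", "blah.blah@blah.com", 3576206, "NaN", 8308552, 100, True]

try:
    result = max(values)
except TypeError:
    # Python 3 cannot order a str against an int, so max() gives up.
    result = None

print(result)  # None on Python 3
```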
In [685]: list(dd.values())
Out[685]:
['NaN',
'blah.blah@blah.com',
3576206,
'NaN',
8308552,
'NaN',
100,
'NaN',
'NaN',
360300,
'NaN',
True,
'NaN']
So, as you discovered, I first have to pick out the ints:
In [687]: max([i for i in dd.values() if isinstance(i,int)])
Out[687]: 8308552
Or a list of tuples of candidates for max:
In [692]: [(v,k) for k,v in dd.items() if isinstance(v,int)]
Out[692]: [(3576206, 'h'), (8308552, 'f'), (100, 'a'), (360300, 'e'), (True, 'i')]
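`(True, 'i')` shows up in that list because `bool` is a subclass of `int` in Python. It cannot win here, since `True` equals 1, but it could in a record whose other integers were all 0 or negative. A sketch that skips booleans explicitly, on a trimmed copy of the sample dictionary; `is_plain_int` is a hypothetical helper name.

```python
# Trimmed copy of the sample dictionary (the extra 'NaN' entries add nothing here).
dd = {"a": 100, "b": "NaN", "e": 360300, "f": 8308552, "h": 3576206,
      "i": True, "m": "blah.blah@blah.com"}


def is_plain_int(value):
    """Accept ints but reject bools, which Python treats as a subclass of int."""
    return isinstance(value, int) and not isinstance(value, bool)


candidates = [(v, k) for k, v in dd.items() if is_plain_int(v)]
print(isinstance(True, int))  # True
print(max(candidates))        # (8308552, 'f')
```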
and take the max with a lambda key:
In [693]: max([(k,v) for k,v in dd.items() if isinstance(v,int)], key=lambda x:x[1])
Out[693]: ('f', 8308552)
=============
From your comment (reformatted for clarity):
import pickle
import sys
import matplotlib.pyplot
import numpy as np
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit
#added
### read in data dictionary, convert to numpy array
data_dict = pickle.load(open("../final_project/final_project_dataset.pkl", "r") )
features = ["salary", "bonus"]
data = featureFormat(data_dict, features)
print type(data)
You may import numpy, but the sample data is not an array. You gave us a tuple that contains a dictionary, and your code is all Python list and dictionary work, nothing using numpy.
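Since the plot is built from `features = ["salary", "bonus"]`, the outlier is probably the record with the largest value for one of those fields, rather than the largest value anywhere in the record. A sketch, assuming each inner dict stores those fields under the same names and uses the string `'NaN'` for missing values, as in the sample; `key_with_max_feature` and the records below are hypothetical.

```python
def key_with_max_feature(data_dict, feature):
    """Hypothetical helper: outer key whose record has the largest int `feature`."""
    valid = [
        k for k, record in data_dict.items()
        # Skip 'NaN' strings, missing fields and bools.
        if isinstance(record.get(feature), int)
        and not isinstance(record.get(feature), bool)
    ]
    if not valid:
        return None
    return max(valid, key=lambda k: data_dict[k][feature])


data_dict = {  # hypothetical records
    "person_1": {"salary": 250000, "bonus": "NaN"},
    "person_2": {"salary": 9000000, "bonus": 50000000},
    "person_3": {"salary": "NaN", "bonus": 1000000},
}
print(key_with_max_feature(data_dict, "salary"))  # person_2
print(key_with_max_feature(data_dict, "bonus"))   # person_2
```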