
[ add items to dictionary of list ]

I have files in which every row has an index (a, b) followed by a list of numbers associated with it.


What I want is to get this output:

12    a
123   a
8340  a
4985  a
3856  b
276   b

Note that I only want to output each gene once, with the value from its first occurrence if the same number appears more than once in the rows.

I went about it this way: I tried to add the numbers to a dictionary with the letters as keys and the numbers as values, and then output only the set() of the numbers together with the corresponding letter.

from collections import defaultdict

uniqueval = set()
d = defaultdict(list)

for line in file:
   fields = line.strip().split("\t")
   Idx = fields[0]
   Values = fields[1].split("|")
   for Val in Values:
       d[Idx] += Val
       for u in uniqueval:
           print u,"\t", [key for key in d.keys() if u in d.values()]

The script runs, but when I look into the dictionary, each Val has been split into single characters, like this:

 {'a': ['1','2','1'....], 'b': ['3', '8',....]}

I don't understand why the values get split. Since Val comes from a for loop, I thought each Val would be added to the dict as a single new value. Could you help me understand this issue?

Thank you.

Answer 1

You are extending your lists with Val:

d[Idx] += Val

This adds each character in Val as a separate element.
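
To see why, note that for a list, += behaves like extend(): it iterates over the right-hand side and adds each item it yields, and a string yields its characters one at a time. A short illustration with a made-up value ("123" is only an example, not taken from your file):

```python
# "123" is an illustrative value, not data from the question's file.
nums = []
nums += "123"          # iterates over the string, one character at a time
print(nums)            # ['1', '2', '3']

also = []
also.extend("123")     # extend() does exactly the same thing
print(also)            # ['1', '2', '3']
```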

Use append() instead:
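
A minimal sketch of the loop with append(), written for Python 3. The two input lines are hypothetical, since the real file contents are not shown in the question, and the first_seen dictionary is my own addition as one possible way to keep only the first letter for each number:

```python
from collections import defaultdict

# Hypothetical input: a tab-separated index, then "|"-separated numbers.
lines = ["a\t12|123|8340|4985", "b\t3856|276|12"]

d = defaultdict(list)
first_seen = {}  # number -> index of the row where it first appeared

for line in lines:
    idx, raw = line.strip().split("\t")
    for val in raw.split("|"):
        d[idx].append(val)               # adds the whole string as one element
        first_seen.setdefault(val, idx)  # keeps the first index, ignores repeats

# Relies on dicts preserving insertion order (Python 3.7+).
for val, idx in first_seen.items():
    print(val + "\t" + idx)
```

With append(), d["a"] holds whole numbers such as "123" rather than single digits. The repeated "12" in the second row stays attributed to "a" because setdefault() only stores a key the first time it is seen.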