[ How to aggregate data into a dictionary (or some other database) in Python? ]

I am wondering whether the following is a good way in Python to aggregate data that another function needs to query by multiple keys, or whether I would get better performance by reading and writing the data with SQLite.

For example, here is a sketch of the function that aggregates:

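import sys

def aggregatesources(path1, path2, path3):
    # Read each source file; 'with' closes the file when the block ends.
    with open(path1, 'r') as source1:  # source1.txt
        source1data = source1.read()
    with open(path2, 'r') as source2:  # source2.txt
        source2data = source2.read()
    with open(path3, 'r') as source3:  # source3.txt
        source3data = source3.read()

    # Concatenate the file contents, not the file objects.
    aggregated_data = source1data + source2data + source3data  # + etc...
    return aggregated_data

aggregated = aggregatesources(sys.argv[1], sys.argv[2], sys.argv[3])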

This function needs to aggregate the sources. My question is, when I supply each source as:

type1, 32
type2, 9
type3, 12
type4, 21
etc...

is there a way to take the aggregated data and store it in a larger dictionary, so that each type maps to its value from every source, like this:

type1, [source1, 32], [source2,etc...], [etc...]

I want to use Python's dictionary lookup speed to make queries effectively instantaneous, but if alternative solutions can do the same thing, please elaborate on those.

Answer 1


This should do what you're looking for:

import csv

def add_source_to_dict(mydict, sourcefilename):
  # newline='' is what the csv module expects for text-mode files (Python 3)
  with open(sourcefilename, newline='') as csvfile:
    # skipinitialspace drops the blank after each comma, so ' 32' is read as '32'
    my_reader = csv.reader(csvfile, skipinitialspace=True)
    for atype, value in my_reader:
      if atype not in mydict:
        mydict[atype] = {}
      # Key each value by the file it came from
      mydict[atype][sourcefilename] = value
  return mydict

data = {}

data = add_source_to_dict(data, "source1.txt")

Interactively:

>>> data = {}
>>> data = add_source_to_dict(data, "source1.txt")
>>> data = add_source_to_dict(data, "source2.txt")
>>> data
{
  'type1': {
    'source1.txt': '32',
    'source2.txt': '44'
  },
  'type2': {
    'source1.txt': '9',
    'source2.txt': '45'
  },
  'type3': {
    'source1.txt': '12',
    'source2.txt': '46'
  },
  'type4': {
    'source1.txt': '21',
    'source2.txt': '47'
  }
}
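
Once the nested dictionary is built, querying by type and then by source is just two dictionary accesses. Here is a minimal sketch, assuming the values should be stored as integers so they can be summed or compared. `build_index` and the in-memory `sources` mapping are hypothetical stand-ins for reading the files and are not part of the answer above:

```python
from collections import defaultdict

def build_index(rows_by_source):
    """Map each type to {source_name: int_value}.

    rows_by_source is a hypothetical input shaped like
    {source_name: [(type, value), ...]}.
    """
    index = defaultdict(dict)  # a missing type gets an empty dict automatically
    for source_name, rows in rows_by_source.items():
        for atype, value in rows:
            index[atype][source_name] = int(value)  # store numbers, not strings
    return index

# Sample values copied from the interactive session above.
sources = {
    "source1.txt": [("type1", "32"), ("type2", "9"), ("type3", "12"), ("type4", "21")],
    "source2.txt": [("type1", "44"), ("type2", "45"), ("type3", "46"), ("type4", "47")],
}
index = build_index(sources)

by_type = index["type1"]                   # every source's value for one type
one_value = index["type1"]["source2.txt"]  # one type from one source
total = sum(index["type1"].values())       # combine one type across all sources
missing = index.get("type9", {})           # .get does not create an empty entry
```

Each lookup is an average constant-time hash access, so for data that fits in memory a dictionary is likely to be hard to beat. The standard-library `sqlite3` module becomes more attractive when the data has to persist between runs or outgrows memory.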