
[ Modify object via Python multiprocessing.Pool: strange behavior ]

I have an object with two attributes: a dict and an int. When I modify the object in a forked process via multiprocessing.Pool, the object I get back has the modified int attribute, but the dict isn't modified. Why is that?

from multiprocessing import Pool

def fork():
    someObject = SomeClass()
    for i in range(10):
        someObject.method(i)    
    print("in fork, someObject has dct=%s and nbr=%i" % (someObject.dct, someObject.nbr))
    return someObject

def test():
    pool = Pool(processes=1)             
    result = pool.apply(func=fork)
    print("in main, someObject has dct=%s and nbr=%i" % (result.dct, result.nbr))

class SomeClass(object):
    dct = {}
    nbr = 0     
    def method(self, nbr):
        self.dct[nbr]=nbr
        self.nbr+=nbr

if __name__=='__main__':
    test()

Output:

in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45

in main, someObject has dct={} and nbr=45

Answer 1


The parent process has its own copy of SomeClass.dct and SomeClass.nbr, separate from the copy in each child process.

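The reason nbr is updated but dct is not is that self.nbr+=nbr reads the class attribute SomeClass.nbr and then assigns the sum to self.nbr, which creates an instance attribute. Instance attributes live in the object's __dict__, and that is what gets pickled and sent back to the parent process. self.dct[nbr]=nbr, on the other hand, never assigns to self.dct: it mutates the dict that self.dct resolves to, which is the class attribute SomeClass.dct. Class attributes are not pickled along with an instance (the class itself is pickled only by reference), so the parent keeps seeing its own, still-empty SomeClass.dct.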

You can see this by defining a __getstate__() on SomeClass:

class SomeClass(object):
    dct = {}
    nbr = 0
    def method(self, nbr):
        self.dct[nbr]=nbr
        self.nbr+=nbr

    def __getstate__(self):
        res = self.__dict__
        print("pickled", res)
        return res

This prints:

in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45
('pickled', {'nbr': 45})
in main, someObject has dct={} and nbr=45
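The same effect can be reproduced in a single process without Pool. Pool sends fork()'s return value to the parent by pickling it, so in this minimal sketch pickle.dumps/pickle.loads stand in for that step. Clearing SomeClass.dct before unpickling is an assumption made only to imitate the parent process, whose class attribute was never modified.

```python
import pickle


class SomeClass(object):
    dct = {}  # class attribute: one dict shared by every instance
    nbr = 0   # class attribute: an immutable int

    def method(self, nbr):
        self.dct[nbr] = nbr  # mutates SomeClass.dct in place; no instance attribute is created
        self.nbr += nbr      # reads SomeClass.nbr, then binds an instance attribute


obj = SomeClass()
for i in range(10):
    obj.method(i)

print(vars(obj))                 # {'nbr': 45}: only the instance attribute
print(SomeClass.dct is obj.dct)  # True: obj.dct resolves to the class-level dict

# Pickle stores the instance __dict__ plus a reference to the class by name;
# the class attributes themselves are not serialized.
payload = pickle.dumps(obj)

# Assumption for the demo: imitate the parent, whose SomeClass.dct is still empty.
SomeClass.dct.clear()
restored = pickle.loads(payload)
print(restored.dct, restored.nbr)  # {} 45
```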

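You can force dct to be pickled by assigning it back to the instance, which creates an instance attribute that refers to the same dict (the __getstate__() from above is kept so the pickled state is printed):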

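class SomeClass(object):
    dct = {}
    nbr = 0
    def method(self, nbr):
        self.dct[nbr]=nbr
        self.dct = self.dct   # binds an instance attribute pointing at the same dict
        self.nbr+=nbr

    def __getstate__(self):
        res = self.__dict__
        print("pickled", res)
        return res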

This prints:

in fork, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45
('pickled', {'nbr': 45, 'dct': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}})
in main, someObject has dct={0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9} and nbr=45
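A more conventional alternative (not part of the original answer) is to create both attributes per instance in __init__, so they always live in the instance's __dict__ and are pickled without the self-assignment trick. It also stops every instance from sharing one dict. The sketch below again uses a pickle round trip in place of Pool; with this class, the question's fork() and test() functions work unchanged.

```python
import pickle


class SomeClass(object):
    def __init__(self):
        # Per-instance attributes are stored in self.__dict__, so pickle includes them.
        self.dct = {}
        self.nbr = 0

    def method(self, nbr):
        self.dct[nbr] = nbr
        self.nbr += nbr


obj = SomeClass()
for i in range(10):
    obj.method(i)

print(sorted(vars(obj)))  # ['dct', 'nbr']: both are instance attributes now

# Same serialization step Pool applies to a task's return value.
restored = pickle.loads(pickle.dumps(obj))
print(restored.dct == obj.dct, restored.nbr)  # True 45
```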

Answer 2


I found an alternative solution. Instead of a plain dict(), I used a dict created with multiprocessing.Manager().dict(), and it worked as expected.
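A minimal sketch of that approach, assuming the goal is simply to collect key/value pairs written by worker processes; the fill() helper and the two-way split of the range are hypothetical. Manager() starts a server process that holds the real dict, and the proxy returned by manager.dict() forwards every write to it, so changes made in a worker are visible to the parent.

```python
from multiprocessing import Manager, Pool


def fill(shared, start, stop):
    # Hypothetical worker: each write goes through the proxy to the manager process.
    for i in range(start, stop):
        shared[i] = i


if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()  # proxy to a dict living in the manager's server process
        with Pool(processes=2) as pool:
            pool.starmap(fill, [(shared, 0, 5), (shared, 5, 10)])
        result = dict(shared)  # copy out before the manager shuts down
    print(result)  # all ten keys, in whatever order the workers wrote them
```

Every access through the proxy is a round trip to the manager process, so this is slower than returning the data as a task's result; it pays off mainly when several processes need to update the same dict.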