How to Use Python Multiprocessing in Windows 10

1. How Unix and Windows Multiprocessing Differ

On Unix-like systems, Python's multiprocessing library is convenient to use and rarely causes trouble. On Windows it is harder to use, because Windows supports only the 'spawn' start method, whereas Unix has traditionally defaulted to the 'fork' method.

What are spawn and fork?

1. spawn

The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows.

2. fork

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.

Quoted from https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
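
To see which start method your own interpreter uses, you can ask the multiprocessing module directly. This is only a minimal sketch: describe_start_methods is a made-up helper name, and the values it reports depend on your operating system and Python version.

```python
import multiprocessing as mp
import sys


def describe_start_methods():
    """Return the platform, the default start method, and all available ones."""
    return {
        "platform": sys.platform,
        "default": mp.get_start_method(),
        "available": mp.get_all_start_methods(),
    }


if __name__ == "__main__":
    # On Windows, "available" should contain only 'spawn'.
    print(describe_start_methods())
```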

2. Typical Usage of Python Multiprocessing Package

You can use the standard multiprocessing package as in the pseudo-code below.

from multiprocessing import Pool
def fn(x):
  ......
  ......
  return ...
array = [...]
with Pool(n_cores) as pool:
  res = pool.map(fn, array)

# fn: function that will be applied to each element of array
# n_cores: number of worker processes (typically the number of CPU cores to use)
# array: iterable whose elements will each be transformed by fn
# res: resulting list [fn(array[0]), ..., fn(array[-1])]

3. Difficulties in Windows

1. if __name__ == '__main__'

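On Windows, to use the multiprocessing package you should define the worker function outside the if __name__ == '__main__' block and start the pool inside it, as shown below. With 'spawn', each child process re-imports the main module, so pool-creation code that is not guarded would run again in every child.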

def fn(x):
  ......
  ......
  return ...

if __name__ == '__main__':
  from multiprocessing import Pool
  array = [...]
  with Pool(n_cores) as pool:
    res = pool.map(fn, array)

This makes multiprocessing less convenient on Windows, especially in a Jupyter notebook, where functions defined in a cell live in the interactive __main__ module that a spawned child cannot re-import.
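
The guarded layout above can be made concrete with a small script. This is a sketch only: square and parallel_squares are illustrative names, and the script is assumed to be saved as a .py file and run from the command line rather than pasted into a notebook cell.

```python
from multiprocessing import Pool


def square(x):
    # Defined at module level so that spawned workers can import it.
    return x * x


def parallel_squares(values, n_cores=2):
    # The pool is created only when this function is called,
    # never as a side effect of importing the module.
    with Pool(n_cores) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Only the parent process runs this block; spawned children skip it.
    print(parallel_squares([1, 2, 3, 4]))
```

Run as a script, this prints [1, 4, 9, 16]. If you remove the guard on Windows, each child re-executes the call while it is still starting up and Python raises a RuntimeError.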

2. Memory Issue

Also, if the array you are passing to pool.map is big, you can easily run into memory issues. Windows worker processes do not share the parent's memory, so the data has to be copied to the workers, and in the worst case you end up holding roughly n_cores copies of it.

For example, if your computer has 16 GB of RAM and you run 8 worker processes, an array larger than 2 GB can push the program past the available memory and make it terminate.
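
The arithmetic behind that example can be written out as a quick check before starting a pool. This is a rough sketch under the worst-case assumption stated above (one full copy per worker); the function names are hypothetical, and real usage also includes the parent's copy and interpreter overhead.

```python
def worst_case_memory_gb(array_gb, n_cores):
    """Memory needed if every worker ends up holding a full copy of the data."""
    return array_gb * n_cores


def fits_in_ram(array_gb, n_cores, ram_gb):
    # The parent's own copy is ignored, so this check is optimistic.
    return worst_case_memory_gb(array_gb, n_cores) < ram_gb


print(worst_case_memory_gb(2, 8))    # 16
print(fits_in_ram(2, 8, ram_gb=16))  # False
print(fits_in_ram(1, 8, ram_gb=16))  # True
```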

4. How to Solve This Problem

We can at least solve the first problem with the multiprocess package. Be careful: this is a different package from the standard multiprocessing package.

You can install it with pip install multiprocess. Its GitHub page is https://github.com/uqfoundation/multiprocess .

With this package, you do not need the if __name__ == '__main__' workaround in a notebook, because it serializes functions with dill instead of pickle. You can write code like the pseudo-code below.

from multiprocess import Pool
def fn(x):
  ......
  ......
  return ...
array = [...]
with Pool(n_cores) as pool:
  res = pool.map(fn, array)
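
A concrete version of this pseudo-code might look like the sketch below. The names cube and parallel_cubes are made up for illustration, and the fallback to the standard library is an addition so that the snippet still imports where multiprocess is not installed; only the multiprocess branch gives the notebook-friendly behavior described above.

```python
try:
    # Third-party package: pip install multiprocess
    from multiprocess import Pool
except ImportError:
    # Fallback added for this sketch only; it does not remove the
    # Windows restrictions described in the previous section.
    from multiprocessing import Pool


def cube(x):
    return x ** 3


def parallel_cubes(values, n_cores=2):
    # Same Pool API as the standard library version.
    with Pool(n_cores) as pool:
        return pool.map(cube, values)
```

In a notebook cell you would then call parallel_cubes([1, 2, 3]), which should give [1, 8, 27].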

5. Extra Tip: Using Keras fit_generator Multiprocessing in a Windows Jupyter Notebook

When training convolutional neural nets, you will want data augmentation (on the CPU) to run fast enough that it does not bottleneck training (on the GPU). If you are using Keras, you can use the pseudo-code below.

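model.fit_generator(train_generator, use_multiprocessing=True, workers=num_cores)
# train_generator: placeholder for your data generator or keras.utils.Sequence (fit_generator takes a generator, not x_train, y_train)
# num_cores: number of worker processes to use
# if you have an 8-core, 16-thread CPU, I would recommend num_cores=7 or 8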

However, the above method only works on Unix, since Keras uses the standard multiprocessing package.

So if you are a Windows user and want to use multiple CPU processes when augmenting and feeding the data, you need to modify your Keras installation slightly.

Find the installed Keras source files, open utils/data_utils.py, and change the following two lines.

  1. https://github.com/keras-team/keras/blob/master/keras/utils/data_utils.py#L7 : Change import multiprocessing as mp to import multiprocess as mp
  2. https://github.com/keras-team/keras/blob/master/keras/utils/data_utils.py#L19 : Change from multiprocessing.pool import ThreadPool to from multiprocess.pool import ThreadPool
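
If you are not sure where the installed file lives, the standard library can locate it for you. This is a small sketch: find_module_file is a hypothetical helper, and whether keras.utils.data_utils exists at all depends on the Keras version you have installed.

```python
import importlib.util


def find_module_file(module_name):
    """Return the source file path of an importable module, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no spec.
        return None
    return spec.origin if spec is not None else None
```

Calling find_module_file("keras.utils.data_utils") returns the path of the file to edit, or None when that module cannot be found in your environment. Note that the lookup imports the parent keras package as a side effect.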
