Python Multiprocessing: Pool vs Process – Comparative Analysis

Apr 3, 2021 | Blog

About Priyanka Mane

Introduction To Python Multiprocessing 

Multiprocessing is a great way to improve performance. We came across Python multiprocessing when we had to evaluate millions of Excel expressions using Python code. In such a scenario, evaluating the expressions serially is impractical and time-consuming.

So, we decided to use Python Multiprocessing.

Generally, you execute parallel tasks using either processes or threads. We first planned to use threads. During our research, however, we learned that the Global Interpreter Lock (GIL) in CPython allows only one thread to execute Python bytecode at a time, so threads do not speed up CPU-bound work. On further digging, we learned that Python's multiprocessing module provides two classes for running work in separate processes: Process and Pool. In the following sections, I give a brief overview of our experience with the Pool and Process classes, followed by a performance comparison that should help you choose the appropriate method for your multiprocessing task.


Python Multiprocessing: The Pool and Process class

Though Pool and Process both execute tasks in parallel, they do so in different ways.

The Pool distributes tasks to a fixed set of worker processes from a FIFO task queue. It works like a map-reduce architecture: it maps the input across the workers and collects the output from all of them. It waits for all the tasks to finish and then returns the output as a list. Only the worker processes exist at any time; tasks that are not yet executing wait in the queue instead of each occupying a process.
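A minimal sketch of this map-and-collect behaviour, assuming a hypothetical `evaluate` function that stands in for our expression evaluator (the names and the worker count are illustrative, not our production code):

```python
import multiprocessing


def evaluate(expression_args):
    """Hypothetical stand-in for evaluating one expression: sums its arguments."""
    return sum(expression_args)


def evaluate_all(tasks, workers=4):
    # Pool.map hands the inputs to the worker processes, blocks until every
    # task has finished, and returns the results as a list in input order.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(evaluate, tasks)


if __name__ == "__main__":
    tasks = [(1, 2), (3, 4), (5, 6)]
    print(evaluate_all(tasks))  # [3, 7, 11]
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module, such as Windows.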

With the Process class, every task gets its own process, and all of those processes are created in memory at once. The operating system's scheduler decides which of them run; when a process is suspended, for example while waiting on IO, the scheduler runs another one.
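To make the "one process per task" point concrete, here is a small sketch in which each task reports the PID of the process that ran it. The function names `report_pid` and `pids_for` are hypothetical and only serve the illustration:

```python
import multiprocessing
import os


def report_pid(results):
    """Hypothetical task: report which OS process executed it."""
    results.put(os.getpid())


def pids_for(n_tasks):
    results = multiprocessing.Queue()
    # One Process object per task; each one becomes a separate OS process.
    procs = [
        multiprocessing.Process(target=report_pid, args=(results,))
        for _ in range(n_tasks)
    ]
    for p in procs:
        p.start()
    # Read the results before joining so no child blocks on the queue.
    pids = [results.get() for _ in procs]
    for p in procs:
        p.join()
    return pids


if __name__ == "__main__":
    print(pids_for(3))  # three different process IDs
```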

When to use Pool and Process

I think choosing an appropriate approach depends on the task at hand. The Pool allows you to run multiple jobs per process, which may make it easier to parallelize your program. If you have a million tasks to execute in parallel, you can create a Pool with as many processes as CPU cores and then pass the list of the million tasks to the pool. The pool distributes those tasks to the worker processes (typically the same in number as the available cores), collects the return values in the form of a list, and passes it to the parent process. Launching a million separate processes would be much less practical (it would probably break your OS).
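The "multiple jobs per process" idea can be sketched as follows. The task function `tag_with_pid` is hypothetical, and the `chunksize` value is an arbitrary choice for the example; the point is only that a large task list ends up being served by a handful of worker PIDs:

```python
import multiprocessing
import os


def tag_with_pid(task):
    """Hypothetical task: return the input together with the worker's PID."""
    return task, os.getpid()


def worker_pids(n_tasks, workers=None):
    # With processes=None, Pool picks a default based on the CPU count.
    with multiprocessing.Pool(processes=workers) as pool:
        # chunksize batches tasks so that workers fetch many at a time.
        results = pool.map(tag_with_pid, range(n_tasks), chunksize=1000)
    return {pid for _, pid in results}


if __name__ == "__main__":
    pids = worker_pids(100_000)
    print(f"100000 tasks were handled by {len(pids)} worker process(es)")
```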


On the other hand, if you have a small number of tasks to execute in parallel, and you only need each task done once, it may be perfectly reasonable to use a separate multiprocessing.Process for each task, rather than setting up a Pool.
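For a handful of one-off tasks, a sketch along these lines may be enough. The `work` function and the use of a `multiprocessing.Queue` to carry results back are assumptions made for the example; Process itself does not return values:

```python
import multiprocessing


def work(task_id, results):
    """Hypothetical task: record the square of the task id."""
    results.put((task_id, task_id * task_id))


def run_with_processes(task_ids):
    results = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=work, args=(task_id, results))
        for task_id in task_ids
    ]
    for p in procs:
        p.start()
    # Collect one result per process, then wait for all of them to exit.
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(run_with_processes([1, 2, 3]))
```

Results arrive in completion order, which is why the sketch keys them by task id instead of relying on their position.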

We used both the Pool and Process classes to evaluate Excel expressions. Following are our observations about each:

  1. Task number

As we have seen, the Pool keeps only its worker processes in memory, while the Process class creates one process per task and keeps them all in memory. So when the number of tasks is small, we can use the Process class, and when the number of tasks is large, we can use the Pool. With a large number of tasks, using Process may cause memory problems and make the system unstable. The Pool, on the other hand, has a setup overhead, so with a small number of tasks performance suffers when the Pool is used.

  2. IO operations

The Pool hands tasks to its worker processes in FIFO order, and each worker executes its tasks serially. So if a task performs a long IO operation, the worker waits until the IO completes and does not pick up another task in the meantime. This increases the execution time. With the Process class, each task has its own process, so while one process waits on IO the operating system schedules another. So, for long IO operations, it is advisable to use the Process class.
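The effect can be sketched with `time.sleep` standing in for a slow IO call. Everything here (`wait_for_io`, the delays, the worker count) is a hypothetical setup, and the printed timings will vary by machine:

```python
import multiprocessing
import time


def wait_for_io(seconds):
    """Hypothetical IO-bound task; time.sleep stands in for a slow read."""
    time.sleep(seconds)
    return seconds


def run_in_pool(delays, workers):
    # Each worker runs its tasks one after another, so with fewer workers
    # than tasks some of the waits cannot overlap.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=workers) as pool:
        pool.map(wait_for_io, delays)
    return time.perf_counter() - start


def run_in_processes(delays):
    # One process per task, so all of the waits can overlap.
    start = time.perf_counter()
    procs = [
        multiprocessing.Process(target=wait_for_io, args=(d,)) for d in delays
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.5] * 4
    print(f"pool of 2 workers: {run_in_pool(delays, workers=2):.2f}s")
    print(f"4 processes:       {run_in_processes(delays):.2f}s")
```

Note that the gap comes from having fewer workers than waiting tasks; a Pool with at least as many workers as tasks would overlap the waits too.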

Python Multiprocessing: Performance Comparison

In our case, the performance using the Pool class was as follows:

1) Using pool- 6 secs

2) Without using the pool- 10 secs

Process() works by launching an independent system process for every parallel task you want to run. When we used the Process class, the machine became unstable because 1 million processes were created and loaded in memory.

To test further, we reduced the number of arguments in each expression and ran the code for 100 expressions.

The performance using the Pool class is as follows:

1) Using pool- 4 secs

2) Without using the pool- 3 secs

Then, we increased the arguments to 250 and executed those expressions.

The performance using the Pool class is as follows:

1) Using pool- 0.6 secs

2) Without using the pool- 3 secs
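Comparisons of this kind can be made with a small timing harness such as the sketch below. The `evaluate` function is a hypothetical stand-in for the real expression evaluator, and the workload in the demo is made up, so the numbers it prints will not match the ones above:

```python
import multiprocessing
import time


def evaluate(args):
    """Hypothetical stand-in for evaluating one expression."""
    return sum(a * a for a in args)


def time_serial(expressions):
    start = time.perf_counter()
    results = [evaluate(e) for e in expressions]
    return results, time.perf_counter() - start


def time_pool(expressions, workers=None):
    # Pool creation is inside the timed region so its overhead is counted.
    start = time.perf_counter()
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(evaluate, expressions)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    expressions = [tuple(range(250))] * 100
    serial_results, serial_secs = time_serial(expressions)
    pool_results, pool_secs = time_pool(expressions)
    assert serial_results == pool_results
    print(f"without pool: {serial_secs:.3f}s  with pool: {pool_secs:.3f}s")
```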

To summarize, the Pool class works better when there are many tasks and IO waits are short. The Process class works better when tasks are few and IO operations are long. What was your experience with Python Multiprocessing? I would be more than happy to have a conversation around this. Get in touch with me here: [email protected]
