Jupyter Notebook is a platform widely used by data scientists and developers to write code interactively and to analyze and visualize data. However, it inherits a limitation of CPython: the Global Interpreter Lock (GIL), which prevents more than one native thread from executing Python bytecode at a time within a single process. As a result, multithreaded code running in a Jupyter Notebook cannot use multiple CPU cores for CPU-bound Python work.
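You can observe the effect directly. The following sketch (the cpu_bound function and the input sizes are ours, purely for illustration) runs a CPU-bound task on four threads; on standard CPython it takes roughly as long as running the four calls one after another, because the GIL serializes them:

import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n):
    # Pure-Python loop: the GIL serializes its execution across threads
    return sum(i * i for i in range(n))

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cpu_bound, [5_000_000] * 4))
print(f"4 threads took {time.perf_counter() - start:.2f}s")  # roughly sequential time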
Offloading computation-intensive work to external Python scripts overcomes this limitation. By writing the computing-intensive part in a separate .py file that runs outside the Jupyter environment, you can fork worker processes with Python's multiprocessing module. Each process runs in its own Python interpreter with its own GIL, which means the work can genuinely be spread across multiple CPUs.
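To see that multiprocessing really does start separate interpreters, here is a minimal sketch, meant to be saved and run as a standalone script (the report_pid helper is a name of our choosing); each worker reports the process ID of the interpreter it runs in:

from multiprocessing import Pool
import os

def report_pid(_):
    # Each worker returns the PID of the interpreter executing it
    return os.getpid()

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        pids = set(pool.map(report_pid, range(8)))
    print(pids)  # typically several distinct PIDs: separate interpreters, separate GILs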
Below is a step-by-step guide to implementing this:
- Create an External Python Script: Write your CPU-intensive code in a separate .py file. For example, create a file named compute.py with the following content:
from multiprocessing import Pool

def heavy_computation(x):
    # Your CPU-intensive computation here; a sum of squares serves as a placeholder
    return sum(i * i for i in range(x))

if __name__ == '__main__':
    inputs = [10_000_000] * 8  # List of inputs; replace with your own data
    with Pool(processes=4) as pool:  # Adjust the number of processes as needed
        results = pool.map(heavy_computation, inputs)
    print(results)
- Execute the Script Outside Jupyter Notebook: Run the script directly from the command line or a terminal:
python compute.py
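If you prefer to trigger the run from the notebook itself, you can launch the script as a child process from a cell; the worker processes it spawns still run outside the notebook's kernel. Here is a minimal sketch using the standard library's subprocess module, assuming compute.py sits in the notebook's working directory:

import subprocess
import sys

# Launch the external script with the same interpreter that runs the notebook;
# the multiprocessing workers it forks are not bound by the notebook's GIL.
result = subprocess.run(
    [sys.executable, "compute.py"],
    capture_output=True,  # collect stdout/stderr instead of discarding them
    text=True,
    check=True,           # raise CalledProcessError on a non-zero exit status
)
print(result.stdout)

In a Jupyter cell, the shell escape !python compute.py achieves the same effect with less ceremony.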
You can download the example compute.py file from our GitHub repository.
This way, you can run parallel computations that use all available CPU cores and work around the multithreading limitation in Jupyter Notebook.