Richardson Lab Experimental Log
Message ID: 555     Entry time: Thu Apr 17 12:12:29 2025
Author: Liu 
Type: HowTo 
Category: Interferometer Simulations 
Subject: Preventing Thread Contention in Multiprocessing with Finesse and Cython 

Avoiding Thread Contention When Using Multiprocessing with Finesse and Cython

When running Monte Carlo simulations or other computational workloads, it's common to use Python's concurrent.futures.ProcessPoolExecutor to parallelize many independent tasks. This approach works well until it interacts with low-level libraries that themselves use multi-threading under the hood.

The Problem: Thread Over-Subscription

In a recent project, I ran into a significant performance issue while executing a large number of Monte Carlo trials using a process pool with 30 worker processes on Megatron (48 cores). Each trial ran a function that internally used cython.parallel.prange for fast, element-wise operations; Finesse uses prange under the hood for many of its internal numerical calculations. prange is backed by OpenMP, which by default uses all available CPU threads in each process.

This resulted in severe thread over-subscription. With 30 parallel processes each attempting to use all 48 threads, the system could launch up to 30 × 48 = 1,440 threads concurrently, 30 times the number of cores. The CPU quickly became saturated, and the tasks stalled. In some cases, the system became unresponsive, and the jobs had to be canceled repeatedly.

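This happens because each worker process loads its own copy of the OpenMP runtime, which knows nothing about the other workers. When the function calls into these libraries from within a Python multiprocessing context, every subprocess attempts to use the full number of threads available on the machine.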
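
The thread count quoted above is simple arithmetic. The sketch below reproduces it for the pool size and core count given in this entry; the function name total_threads is only an illustrative helper, not part of Finesse or Cython.

```python
def total_threads(n_workers: int, threads_per_worker: int) -> int:
    """Threads requested across the whole process pool."""
    return n_workers * threads_per_worker


n_cores = 48     # cores on Megatron, from this entry
n_workers = 30   # size of the process pool, from this entry

# Default behaviour: every worker sizes its OpenMP team to the whole machine.
unlimited = total_threads(n_workers, n_cores)

# With one OpenMP thread per worker.
limited = total_threads(n_workers, 1)

print(unlimited, unlimited / n_cores)  # 1440 threads, 30.0x the core count
print(limited, limited / n_cores)      # 30 threads, 0.625x the core count
```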

The Solution: Limit Threads per Process

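The solution is simple: explicitly limit the number of threads each subprocess is allowed to use. This can be done by setting the OMP_NUM_THREADS environment variable at the top of your script, before importing any thread-hungry libraries like Finesse. Worker processes inherit the environment of the parent, so setting it once in the main script covers the whole pool.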

      
import os

# Must run before Finesse (or any other OpenMP-backed library) is imported.
os.environ["OMP_NUM_THREADS"] = "1"

    

By setting OMP_NUM_THREADS to "1", we ensure that each multiprocessing worker uses only one thread internally. The 30 workers then run about 30 compute threads in total, which fits within the 48 cores, so the pool no longer overloads the system and the tasks run more efficiently.
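
Putting the pieces together, a minimal sketch of the overall pattern might look like the following. The names run_trial and run_all are hypothetical placeholders, and the trial body is a stand-in: a real trial would build and run the Finesse model there. The worker simply reports the value of OMP_NUM_THREADS it sees, which is a quick way to confirm that the setting reached the subprocesses.

```python
import os

# Set before importing Finesse or any other OpenMP-backed library;
# the OpenMP runtime typically reads this once, when it initialises.
os.environ["OMP_NUM_THREADS"] = "1"

from concurrent.futures import ProcessPoolExecutor


def run_trial(seed: int) -> tuple[int, str]:
    """Hypothetical stand-in for one Monte Carlo trial."""
    # A real trial would construct and run the simulation here.
    return seed, os.environ.get("OMP_NUM_THREADS", "unset")


def run_all(n_trials: int, n_workers: int) -> list[tuple[int, str]]:
    """Run the trials in a process pool and collect the results in order."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_trial, range(n_trials)))


if __name__ == "__main__":
    # Small numbers for a quick check; this entry used 30 workers.
    for seed, omp in run_all(n_trials=8, n_workers=4):
        print(seed, omp)
```

If any worker reports a value other than "1", the variable was most likely set too late or overridden elsewhere in the script.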
