Class ParallelConfig

java.lang.Object
com.morphiqlabs.wavelet.extensions.parallel.ParallelConfig

public class ParallelConfig extends Object
Configuration for parallel execution in VectorWave.

This class provides fine-grained control over parallel execution strategies, leveraging Java 24 features including virtual threads and structured concurrency.

Features:

  • Automatic parallelism level detection based on system capabilities
  • Adaptive threshold calculation for efficient parallelization
  • Virtual thread support for I/O-bound operations
  • GPU acceleration configuration (when available)
  • Cost-based execution mode selection

Thread Pool Sizing Strategy

VectorWave uses a sophisticated thread pool sizing strategy that adapts to different workload types and system capabilities:

1. CPU-Bound Operations (ForkJoinPool)

  • Default Size: ForkJoinPool.commonPool() with parallelism = availableProcessors()
  • Rationale: CPU-bound tasks benefit from having one thread per CPU core
  • Work Stealing: ForkJoin's work-stealing algorithm balances load automatically
  • Shared vs Dedicated:
    • Common pool used when metrics disabled (zero overhead)
    • Dedicated pool created when metrics enabled (allows shutdown control)

2. I/O-Bound Operations (Virtual Threads)

  • Default: Executors.newVirtualThreadPerTaskExecutor()
  • Rationale: Virtual threads excel at I/O-bound tasks with blocking operations
  • Scaling: Can create millions of virtual threads with minimal memory overhead
  • Use Cases: File I/O, network operations, database queries during analysis
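The two executor choices above can be sketched with plain JDK APIs (Java 21 or later). This is a minimal illustration of the strategy, not VectorWave code: the common ForkJoinPool for CPU-bound work, and a virtual-thread-per-task executor for I/O-bound work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

public class ExecutorSizingSketch {
    public static void main(String[] args) {
        // CPU-bound path: the shared common pool, sized from the core count.
        ForkJoinPool cpuPool = ForkJoinPool.commonPool();
        System.out.println("CPU parallelism: " + cpuPool.getParallelism());

        // I/O-bound path: one cheap virtual thread per submitted task.
        try (ExecutorService ioPool = Executors.newVirtualThreadPerTaskExecutor()) {
            ioPool.submit(() -> System.out.println("running on: " + Thread.currentThread()));
        } // close() waits for submitted tasks, then shuts the executor down
    }
}
```

Note that `close()` on an `ExecutorService` (an `AutoCloseable` since Java 19) blocks until submitted tasks finish, which is why the try-with-resources form is safe here.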

3. Parallelism Level Calculation


 Auto mode:
     parallelismLevel = Runtime.getRuntime().availableProcessors()

 Custom mode:
     parallelismLevel = user-specified value (1 to N)

 Chunking:
     chunks = min(parallelismLevel, dataSize / minChunkSize)
     where minChunkSize = max(512, parallelThreshold / 4)
 

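The chunking formula above can be written out directly. This is a standalone sketch of the documented math, not the actual `ParallelConfig` implementation; the floor of one chunk for small inputs is an assumption added so the function is total.

```java
public class ChunkingSketch {
    // Documented rule: chunks = min(parallelismLevel, dataSize / minChunkSize),
    // where minChunkSize = max(512, parallelThreshold / 4).
    static int calculateChunks(int dataSize, int parallelismLevel, int parallelThreshold) {
        int minChunkSize = Math.max(512, parallelThreshold / 4);
        // Assumption: never return fewer than one chunk for small inputs.
        return Math.max(1, Math.min(parallelismLevel, dataSize / minChunkSize));
    }

    public static void main(String[] args) {
        // 8-way parallelism, threshold 512 -> minChunkSize = 512.
        System.out.println(calculateChunks(8192, 8, 512)); // prints 8
        System.out.println(calculateChunks(1024, 8, 512)); // prints 2 (data-limited)
    }
}
```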
4. Threshold Determination

  • Base Calculation: threshold = OPERATIONS_PER_CORE_MS / cores
  • Minimum: Never less than 512 elements (overhead dominates below this)
  • Adaptive: Can be adjusted based on complexity factor
  • Environment Override: System properties and env vars for fine-tuning
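The base calculation and the 512-element floor combine as sketched below. `OPERATIONS_PER_CORE_MS` here is a hypothetical stand-in constant chosen for illustration; the library's internal value is not documented on this page.

```java
public class ThresholdSketch {
    // Assumed value for illustration only; not the library's actual constant.
    static final int OPERATIONS_PER_CORE_MS = 4096;

    static int parallelThreshold(int cores) {
        // Documented rule: base threshold divided across cores,
        // never below the 512-element floor where overhead dominates.
        return Math.max(512, OPERATIONS_PER_CORE_MS / cores);
    }

    public static void main(String[] args) {
        System.out.println(parallelThreshold(4));  // prints 1024
        System.out.println(parallelThreshold(8));  // prints 512
        System.out.println(parallelThreshold(32)); // prints 512 (floor applies)
    }
}
```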

5. Memory Considerations

  • Chunk Size: Optimized for L1 cache (512 doubles = 4KB)
  • NUMA Awareness: Work-stealing helps with NUMA architectures
  • False Sharing: Chunk boundaries aligned to cache lines

6. Performance Characteristics

Expected speedup by CPU core count with the default threshold and chunk size configuration:

  Cores | Default Threshold | Chunk Size | Expected Speedup
  ------|-------------------|------------|-----------------
  4     | 1024              | 512        | 2.5-3.5x
  8     | 512               | 512        | 4-6x
  16    | 256               | 512        | 8-12x
  32    | 256               | 512        | 12-20x
Method Details

    • auto

      public static ParallelConfig auto()
      Creates an auto-configured instance based on system capabilities.
      Returns:
      optimally configured ParallelConfig
    • cpuIntensive

      public static ParallelConfig cpuIntensive()
      Creates a configuration optimized for CPU-intensive operations.
      Returns:
      CPU-optimized configuration
    • ioIntensive

      public static ParallelConfig ioIntensive()
      Creates a configuration optimized for I/O-bound operations.
      Returns:
      I/O-optimized configuration
    • shouldParallelize

      public boolean shouldParallelize(int inputSize, double complexity, AdaptiveThresholdTuner.OperationType operationType)
      Determines if parallel execution should be used for given input size with specified operation type (for adaptive tuning).
      Parameters:
      inputSize - the size of the input data
      complexity - computational complexity factor (1.0 = normal)
      operationType - the type of operation for adaptive tuning
      Returns:
      true if parallel execution is recommended
    • shouldParallelize

      public boolean shouldParallelize(int inputSize, double complexity)
      Determines if parallel execution should be used for given input size.
      Parameters:
      inputSize - the size of the input data
      complexity - computational complexity factor (1.0 = normal)
      Returns:
      true if parallel execution is recommended
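The exact decision logic is internal to the library, but a plausible sketch of this overload, assuming the complexity factor simply scales the effective input size against the configured threshold, looks like this:

```java
public class ParallelDecisionSketch {
    // Assumption: complexity scales the effective work; parallelize once
    // effective work reaches the configured threshold.
    static boolean shouldParallelize(int inputSize, double complexity, int parallelThreshold) {
        return inputSize * complexity >= parallelThreshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldParallelize(1024, 1.0, 512)); // true
        System.out.println(shouldParallelize(256, 1.0, 512));  // false
        System.out.println(shouldParallelize(256, 4.0, 512));  // true: 4x complexity
    }
}
```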
    • calculateChunks

      public int calculateChunks(int dataSize)
      Calculates optimal number of chunks for parallel processing.
      Parameters:
      dataSize - total size of data to process
      Returns:
      optimal number of chunks
    • getCPUExecutor

      public ExecutorService getCPUExecutor()
      Gets the executor service for CPU-bound operations.
      Returns:
      CPU executor service
    • getIOExecutor

      public ExecutorService getIOExecutor()
      Gets the executor service for I/O-bound operations.
      Returns:
      virtual thread executor or CPU executor if virtual threads disabled
    • recordExecution

      public void recordExecution(boolean wasParallel)
      Records execution metrics for performance tracking.
      Parameters:
      wasParallel - whether parallel execution was used
    • getStats

      public ParallelConfig.ExecutionStats getStats()
      Gets execution statistics.
      Returns:
      execution statistics
    • getParallelismLevel

      public int getParallelismLevel()
      Returns the configured CPU parallelism level.
      Returns:
      configured CPU parallelism level
    • getParallelThreshold

      public int getParallelThreshold()
      Returns the size threshold to enable the parallel path.
      Returns:
      size threshold to enable parallel path
    • isUseVirtualThreads

      public boolean isUseVirtualThreads()
      Returns whether virtual threads are used.
      Returns:
      true if virtual threads are used
    • isEnableGPU

      public boolean isEnableGPU()
      Returns whether GPU acceleration is enabled.
      Returns:
      true if GPU acceleration is enabled
    • getMode

      public ParallelConfig.ExecutionMode getMode()
      Returns the current execution mode.
      Returns:
      current execution mode
    • getChunkSize

      public int getChunkSize()
      Returns the chunk size used to partition work.
      Returns:
      chunk size used to partition work
    • isEnableStructuredConcurrency

      public boolean isEnableStructuredConcurrency()
      Returns whether structured concurrency is enabled.
      Returns:
      true if structured concurrency is enabled
    • isAdaptiveThreshold

      public boolean isAdaptiveThreshold()
      Returns whether adaptive threshold is enabled.
      Returns:
      true if adaptive threshold is enabled
    • getOverheadFactor

      public double getOverheadFactor()
      Returns the empirical overhead multiplier.
      Returns:
      empirical overhead multiplier
    • isEnableParallelThresholding

      public boolean isEnableParallelThresholding()
      Returns whether parallel thresholding is enabled.
      Returns:
      true if parallel thresholding is enabled
    • isEnableMetrics

      public boolean isEnableMetrics()
      Returns whether metrics collection is enabled.
      Returns:
      true if metrics collection is enabled
    • isEnableAdaptiveTuning

      public boolean isEnableAdaptiveTuning()
      Returns whether adaptive tuning feedback loop is enabled.
      Returns:
      true if adaptive tuning feedback loop is enabled
    • getAdaptiveTuner

      public AdaptiveThresholdTuner getAdaptiveTuner()
      Returns the adaptive threshold tuner instance (may be null).
      Returns:
      adaptive threshold tuner instance (may be null)
    • recordAdaptiveFeedback

      public void recordAdaptiveFeedback(AdaptiveThresholdTuner.OperationType operationType, int inputSize, int threshold, long parallelTime, long sequentialTime)
      Records adaptive tuning feedback.
      Parameters:
      operationType - the type of operation
      inputSize - the input size
      threshold - the threshold that was used
      parallelTime - the parallel execution time in nanoseconds
      sequentialTime - the sequential execution time in nanoseconds
    • shutdown

      public void shutdown()
      Releases dedicated resources created by this configuration.

      This method shuts down only executors that are owned by this config (for example, a virtual-thread executor created when useVirtualThreads is enabled). It never shuts down the shared ForkJoinPool common pool. When a transform or denoiser owns its ParallelConfig, it may invoke this during close(); otherwise, callers should invoke this explicitly to release resources deterministically.
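The ownership rule described above can be illustrated with a plain JDK executor standing in for ParallelConfig: whichever component creates a dedicated resource is responsible for releasing it, and shared pools are left alone.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ShutdownOwnershipSketch {
    // Dedicated executor created (and therefore owned) by this object.
    final ExecutorService owned = Executors.newVirtualThreadPerTaskExecutor();

    void close() {
        // Shut down only what we created; never the shared common pool.
        owned.shutdown();
    }

    public static void main(String[] args) {
        ShutdownOwnershipSketch s = new ShutdownOwnershipSketch();
        s.close();
        System.out.println(s.owned.isShutdown()); // prints true
    }
}
```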