Class ParallelConfig

java.lang.Object
com.morphiqlabs.wavelet.extensions.parallel.ParallelConfig

public class ParallelConfig extends Object
Configuration for parallel execution in VectorWave.

This class provides fine-grained control over parallel execution strategies, leveraging Java 24 features including virtual threads and structured concurrency.

Features:

  • Automatic parallelism level detection based on system capabilities
  • Adaptive threshold calculation for efficient parallelization
  • Virtual thread support for I/O-bound operations
  • GPU acceleration configuration (when available)
  • Cost-based execution mode selection

Thread Pool Sizing Strategy

VectorWave uses a sophisticated thread pool sizing strategy that adapts to different workload types and system capabilities:

1. CPU-Bound Operations (ForkJoinPool)

  • Default Size: ForkJoinPool.commonPool() with parallelism = availableProcessors()
  • Rationale: CPU-bound tasks benefit from having one thread per CPU core
  • Work Stealing: ForkJoin's work-stealing algorithm balances load automatically
  • Shared vs Dedicated:
    • Common pool used when metrics disabled (zero overhead)
    • Dedicated pool created when metrics enabled (allows shutdown control)

2. I/O-Bound Operations (Virtual Threads)

  • Default: Executors.newVirtualThreadPerTaskExecutor()
  • Rationale: Virtual threads excel at I/O-bound tasks with blocking operations
  • Scaling: Can create millions of virtual threads with minimal memory overhead
  • Use Cases: File I/O, network operations, database queries during analysis
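The two executor choices above can be sketched with plain JDK APIs (Java 21 or later). This is a minimal illustration of the strategy, not VectorWave code: the common ForkJoinPool for CPU-bound work, and a virtual-thread-per-task executor for I/O-bound work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

public class ExecutorSizingSketch {
    public static void main(String[] args) {
        // CPU-bound path: the shared common pool, sized from the core count.
        ForkJoinPool cpuPool = ForkJoinPool.commonPool();
        System.out.println("CPU parallelism: " + cpuPool.getParallelism());

        // I/O-bound path: one cheap virtual thread per submitted task.
        try (ExecutorService ioPool = Executors.newVirtualThreadPerTaskExecutor()) {
            ioPool.submit(() -> System.out.println("running on: " + Thread.currentThread()));
        } // close() waits for submitted tasks, then shuts the executor down
    }
}
```

Note that `close()` on an `ExecutorService` (an `AutoCloseable` since Java 19) blocks until submitted tasks finish, which is why the try-with-resources form is safe here.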

3. Parallelism Level Calculation


 Auto mode:
     parallelismLevel = Runtime.getRuntime().availableProcessors()

 Custom mode:
     parallelismLevel = user-specified value (1 to N)

 Chunking:
     chunks = min(parallelismLevel, dataSize / minChunkSize)
     where minChunkSize = max(512, parallelThreshold / 4)
 

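The chunking formula above can be written out directly. This is a standalone sketch of the documented math, not the actual `ParallelConfig` implementation; the floor of one chunk for small inputs is an assumption added so the function is total.

```java
public class ChunkingSketch {
    // Documented rule: chunks = min(parallelismLevel, dataSize / minChunkSize),
    // where minChunkSize = max(512, parallelThreshold / 4).
    static int calculateChunks(int dataSize, int parallelismLevel, int parallelThreshold) {
        int minChunkSize = Math.max(512, parallelThreshold / 4);
        // Assumption: never return fewer than one chunk for small inputs.
        return Math.max(1, Math.min(parallelismLevel, dataSize / minChunkSize));
    }

    public static void main(String[] args) {
        // 8-way parallelism, threshold 512 -> minChunkSize = 512.
        System.out.println(calculateChunks(8192, 8, 512)); // prints 8
        System.out.println(calculateChunks(1024, 8, 512)); // prints 2 (data-limited)
    }
}
```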
4. Threshold Determination

  • Base Calculation: threshold = OPERATIONS_PER_CORE_MS / cores
  • Minimum: Never less than 512 elements (overhead dominates below this)
  • Adaptive: Can be adjusted based on complexity factor
  • Environment Override: System properties and env vars for fine-tuning
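The base calculation and the 512-element floor combine as sketched below. `OPERATIONS_PER_CORE_MS` here is a hypothetical stand-in constant chosen for illustration; the library's internal value is not documented on this page.

```java
public class ThresholdSketch {
    // Assumed value for illustration only; not the library's actual constant.
    static final int OPERATIONS_PER_CORE_MS = 4096;

    static int parallelThreshold(int cores) {
        // Documented rule: base threshold divided across cores,
        // never below the 512-element floor where overhead dominates.
        return Math.max(512, OPERATIONS_PER_CORE_MS / cores);
    }

    public static void main(String[] args) {
        System.out.println(parallelThreshold(4));  // prints 1024
        System.out.println(parallelThreshold(8));  // prints 512
        System.out.println(parallelThreshold(32)); // prints 512 (floor applies)
    }
}
```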

5. Memory Considerations

  • Chunk Size: Optimized for L1 cache (512 doubles = 4KB)
  • NUMA Awareness: Work-stealing helps with NUMA architectures
  • False Sharing: Chunk boundaries aligned to cache lines

6. Performance Characteristics

Expected speedup by CPU core count with the default threshold and chunk size configuration:

  Cores | Default Threshold | Chunk Size | Expected Speedup
  ------|-------------------|------------|-----------------
  4     | 1024              | 512        | 2.5-3.5x
  8     | 512               | 512        | 4-6x
  16    | 256               | 512        | 8-12x
  32    | 256               | 512        | 12-20x
Method Details

    • auto

      public static ParallelConfig auto()
      Creates an auto-configured instance based on system capabilities.
      Returns:
      optimally configured ParallelConfig
    • cpuIntensive

      public static ParallelConfig cpuIntensive()
      Creates a configuration optimized for CPU-intensive operations.
      Returns:
      CPU-optimized configuration
    • ioIntensive

      public static ParallelConfig ioIntensive()
      Creates a configuration optimized for I/O-bound operations.
      Returns:
      I/O-optimized configuration
    • shouldParallelize

      public boolean shouldParallelize(int inputSize, double complexity, AdaptiveThresholdTuner.OperationType operationType)
      Determines if parallel execution should be used for given input size with specified operation type (for adaptive tuning).
      Parameters:
      inputSize - the size of the input data
      complexity - computational complexity factor (1.0 = normal)
      operationType - the type of operation for adaptive tuning
      Returns:
      true if parallel execution is recommended
    • shouldParallelize

      public boolean shouldParallelize(int inputSize, double complexity)
      Determines if parallel execution should be used for given input size.
      Parameters:
      inputSize - the size of the input data
      complexity - computational complexity factor (1.0 = normal)
      Returns:
      true if parallel execution is recommended
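The exact decision logic is internal to the library, but a plausible sketch of this overload, assuming the complexity factor simply scales the effective input size against the configured threshold, looks like this:

```java
public class ParallelDecisionSketch {
    // Assumption: complexity scales the effective work; parallelize once
    // effective work reaches the configured threshold.
    static boolean shouldParallelize(int inputSize, double complexity, int parallelThreshold) {
        return inputSize * complexity >= parallelThreshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldParallelize(1024, 1.0, 512)); // true
        System.out.println(shouldParallelize(256, 1.0, 512));  // false
        System.out.println(shouldParallelize(256, 4.0, 512));  // true: 4x complexity
    }
}
```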
    • calculateChunks

      public int calculateChunks(int dataSize)
      Calculates optimal number of chunks for parallel processing.
      Parameters:
      dataSize - total size of data to process
      Returns:
      optimal number of chunks
    • getCPUExecutor

      public ExecutorService getCPUExecutor()
      Gets the executor service for CPU-bound operations.
      Returns:
      CPU executor service
    • getIOExecutor

      public ExecutorService getIOExecutor()
      Gets the executor service for I/O-bound operations.
      Returns:
      virtual thread executor or CPU executor if virtual threads disabled
    • recordExecution

      public void recordExecution(boolean wasParallel)
      Records execution metrics for performance tracking.
      Parameters:
      wasParallel - whether parallel execution was used
    • getStats

      public ParallelConfig.ExecutionStats getStats()
      Gets execution statistics.
      Returns:
      execution statistics
    • getParallelismLevel

      public int getParallelismLevel()
      Returns the configured CPU parallelism level.
      Returns:
      configured CPU parallelism level
    • getParallelThreshold

      public int getParallelThreshold()
      Returns the size threshold to enable the parallel path.
      Returns:
      size threshold to enable parallel path
    • isUseVirtualThreads

      public boolean isUseVirtualThreads()
      Returns whether virtual threads are used.
      Returns:
      true if virtual threads are used
    • isEnableGPU

      public boolean isEnableGPU()
      Returns whether GPU acceleration is enabled.
      Returns:
      true if GPU acceleration is enabled
    • getMode

      public ParallelConfig.ExecutionMode getMode()
      Returns the current execution mode.
      Returns:
      current execution mode
    • getChunkSize

      public int getChunkSize()
      Returns the chunk size used to partition work.
      Returns:
      chunk size used to partition work
    • isEnableStructuredConcurrency

      public boolean isEnableStructuredConcurrency()
      Returns whether structured concurrency is enabled.
      Returns:
      true if structured concurrency is enabled
    • isAdaptiveThreshold

      public boolean isAdaptiveThreshold()
      Returns whether adaptive threshold is enabled.
      Returns:
      true if adaptive threshold is enabled
    • getOverheadFactor

      public double getOverheadFactor()
      Returns the empirical overhead multiplier.
      Returns:
      empirical overhead multiplier
    • isEnableParallelThresholding

      public boolean isEnableParallelThresholding()
      Returns whether parallel thresholding is enabled.
      Returns:
      true if parallel thresholding is enabled
    • isEnableMetrics

      public boolean isEnableMetrics()
      Returns whether metrics collection is enabled.
      Returns:
      true if metrics collection is enabled
    • isEnableAdaptiveTuning

      public boolean isEnableAdaptiveTuning()
      Returns whether adaptive tuning feedback loop is enabled.
      Returns:
      true if adaptive tuning feedback loop is enabled
    • getAdaptiveTuner

      public AdaptiveThresholdTuner getAdaptiveTuner()
      Returns the adaptive threshold tuner instance (may be null).
      Returns:
      adaptive threshold tuner instance (may be null)
    • recordAdaptiveFeedback

      public void recordAdaptiveFeedback(AdaptiveThresholdTuner.OperationType operationType, int inputSize, int threshold, long parallelTime, long sequentialTime)
      Records adaptive tuning feedback.
      Parameters:
      operationType - the type of operation
      inputSize - the input size
      threshold - the threshold that was used
      parallelTime - the parallel execution time in nanoseconds
      sequentialTime - the sequential execution time in nanoseconds
    • shutdown

      public void shutdown()
      Releases dedicated resources created by this configuration.

      This method shuts down only executors that are owned by this config (for example, a virtual-thread executor created when useVirtualThreads is enabled). It never shuts down the shared ForkJoinPool common pool. When a transform or denoiser owns its ParallelConfig, it may invoke this during close(); otherwise, callers should invoke this explicitly to release resources deterministically.
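The ownership rule described above can be illustrated with a plain JDK executor standing in for ParallelConfig: whichever component creates a dedicated resource is responsible for releasing it, and shared pools are left alone.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ShutdownOwnershipSketch {
    // Dedicated executor created (and therefore owned) by this object.
    final ExecutorService owned = Executors.newVirtualThreadPerTaskExecutor();

    void close() {
        // Shut down only what we created; never the shared common pool.
        owned.shutdown();
    }

    public static void main(String[] args) {
        ShutdownOwnershipSketch s = new ShutdownOwnershipSketch();
        s.close();
        System.out.println(s.owned.isShutdown()); // prints true
    }
}
```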