Class ParallelConfig
java.lang.Object
com.morphiqlabs.wavelet.extensions.parallel.ParallelConfig
Configuration for parallel execution in VectorWave.
This class provides fine-grained control over parallel execution strategies, leveraging Java 24 features including virtual threads and structured concurrency.
Features:
- Automatic parallelism level detection based on system capabilities
- Adaptive threshold calculation for efficient parallelization
- Virtual thread support for I/O-bound operations
- GPU acceleration configuration (when available)
- Cost-based execution mode selection
Thread Pool Sizing Strategy
VectorWave uses a sophisticated thread pool sizing strategy that adapts to different workload types and system capabilities:
1. CPU-Bound Operations (ForkJoinPool)
- Default Size: ForkJoinPool.commonPool() with parallelism = availableProcessors()
- Rationale: CPU-bound tasks benefit from having one thread per CPU core
- Work Stealing: ForkJoin's work-stealing algorithm balances load automatically
- Shared vs Dedicated:
- Common pool used when metrics disabled (zero overhead)
- Dedicated pool created when metrics enabled (allows shutdown control)
2. I/O-Bound Operations (Virtual Threads)
- Default: Executors.newVirtualThreadPerTaskExecutor()
- Rationale: Virtual threads excel at I/O-bound tasks with blocking operations
- Scaling: Can create millions of virtual threads with minimal memory overhead
- Use Cases: File I/O, network operations, database queries during analysis
3. Parallelism Level Calculation
Auto mode:
parallelismLevel = Runtime.getRuntime().availableProcessors()
Custom mode:
parallelismLevel = user-specified value (1 to N)
Chunking:
chunks = min(parallelismLevel, dataSize / minChunkSize)
where minChunkSize = max(512, parallelThreshold / 4)
4. Threshold Determination
- Base Calculation: threshold = OPERATIONS_PER_CORE_MS / cores
- Minimum: Never less than 512 elements (overhead dominates below this)
- Adaptive: Can be adjusted based on complexity factor
- Environment Override: System properties and env vars for fine-tuning
5. Memory Considerations
- Chunk Size: Optimized for L1 cache (512 doubles = 4KB)
- NUMA Awareness: Work-stealing helps with NUMA architectures
- False Sharing: Chunk boundaries aligned to cache lines
6. Performance Characteristics
Cores | Default Threshold | Chunk Size | Expected Speedup |
---|---|---|---|
4 | 1024 | 512 | 2.5-3.5x |
8 | 512 | 512 | 4-6x |
16 | 256 | 512 | 8-12x |
32 | 256 | 512 | 12-20x |
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic class
Builder for ParallelConfig.static enum
Execution modes for parallel operations.static final record
Execution statistics for recent operations. -
Method Summary
Modifier and TypeMethodDescriptionstatic ParallelConfig
auto()
Creates an auto-configured instance based on system capabilities.int
calculateChunks
(int dataSize) Calculates optimal number of chunks for parallel processing.static ParallelConfig
Creates a configuration optimized for CPU-intensive operations.Returns the adaptive threshold tuner instance (may be null).int
Returns the chunk size used to partition work.Gets the executor service for CPU-bound operations.Gets the executor service for I/O-bound operations.getMode()
Returns the current execution mode.double
Returns the empirical overhead multiplier.int
Returns the configured CPU parallelism level.int
Returns the size threshold to enable the parallel path.getStats()
Gets execution statistics.static ParallelConfig
Creates a configuration optimized for I/O-bound operations.boolean
Returns whether adaptive threshold is enabled.boolean
Returns whether adaptive tuning feedback loop is enabled.boolean
Returns whether GPU acceleration is enabled.boolean
Returns whether metrics collection is enabled.boolean
Returns whether parallel thresholding is enabled.boolean
Returns whether structured concurrency is enabled.boolean
Returns whether virtual threads are used.void
recordAdaptiveFeedback
(AdaptiveThresholdTuner.OperationType operationType, int inputSize, int threshold, long parallelTime, long sequentialTime) Records adaptive tuning feedback.void
recordExecution
(boolean wasParallel) Records execution metrics for performance tracking.boolean
shouldParallelize
(int inputSize, double complexity) Determines if parallel execution should be used for given input size.boolean
shouldParallelize
(int inputSize, double complexity, AdaptiveThresholdTuner.OperationType operationType) Determines if parallel execution should be used for given input size with specified operation type (for adaptive tuning).void
shutdown()
Releases dedicated resources created by this configuration.
-
Method Details
-
auto
Creates an auto-configured instance based on system capabilities.- Returns:
- optimally configured ParallelConfig
-
cpuIntensive
Creates a configuration optimized for CPU-intensive operations.- Returns:
- CPU-optimized configuration
-
ioIntensive
Creates a configuration optimized for I/O-bound operations.- Returns:
- I/O-optimized configuration
-
shouldParallelize
public boolean shouldParallelize(int inputSize, double complexity, AdaptiveThresholdTuner.OperationType operationType) Determines if parallel execution should be used for given input size with specified operation type (for adaptive tuning).- Parameters:
inputSize
- the size of the input datacomplexity
- computational complexity factor (1.0 = normal)operationType
- the type of operation for adaptive tuning- Returns:
- true if parallel execution is recommended
-
shouldParallelize
public boolean shouldParallelize(int inputSize, double complexity) Determines if parallel execution should be used for given input size.- Parameters:
inputSize
- the size of the input datacomplexity
- computational complexity factor (1.0 = normal)- Returns:
- true if parallel execution is recommended
-
calculateChunks
public int calculateChunks(int dataSize) Calculates optimal number of chunks for parallel processing.- Parameters:
dataSize
- total size of data to process- Returns:
- optimal number of chunks
-
getCPUExecutor
Gets the executor service for CPU-bound operations.- Returns:
- CPU executor service
-
getIOExecutor
Gets the executor service for I/O-bound operations.- Returns:
- virtual thread executor or CPU executor if virtual threads disabled
-
recordExecution
public void recordExecution(boolean wasParallel) Records execution metrics for performance tracking.- Parameters:
wasParallel
- whether parallel execution was used
-
getStats
Gets execution statistics.- Returns:
- execution statistics
-
getParallelismLevel
public int getParallelismLevel()Returns the configured CPU parallelism level.- Returns:
- configured CPU parallelism level
-
getParallelThreshold
public int getParallelThreshold()Returns the size threshold to enable the parallel path.- Returns:
- size threshold to enable parallel path
-
isUseVirtualThreads
public boolean isUseVirtualThreads()Returns whether virtual threads are used.- Returns:
- true if virtual threads are used
-
isEnableGPU
public boolean isEnableGPU()Returns whether GPU acceleration is enabled.- Returns:
- true if GPU acceleration is enabled
-
getMode
Returns the current execution mode.- Returns:
- current execution mode
-
getChunkSize
public int getChunkSize()Returns the chunk size used to partition work.- Returns:
- chunk size used to partition work
-
isEnableStructuredConcurrency
public boolean isEnableStructuredConcurrency()Returns whether structured concurrency is enabled.- Returns:
- true if structured concurrency is enabled
-
isAdaptiveThreshold
public boolean isAdaptiveThreshold()Returns whether adaptive threshold is enabled.- Returns:
- true if adaptive threshold is enabled
-
getOverheadFactor
public double getOverheadFactor()Returns the empirical overhead multiplier.- Returns:
- empirical overhead multiplier
-
isEnableParallelThresholding
public boolean isEnableParallelThresholding()Returns whether parallel thresholding is enabled.- Returns:
- true if parallel thresholding is enabled
-
isEnableMetrics
public boolean isEnableMetrics()Returns whether metrics collection is enabled.- Returns:
- true if metrics collection is enabled
-
isEnableAdaptiveTuning
public boolean isEnableAdaptiveTuning()Returns whether adaptive tuning feedback loop is enabled.- Returns:
- true if adaptive tuning feedback loop is enabled
-
getAdaptiveTuner
Returns the adaptive threshold tuner instance (may be null).- Returns:
- adaptive threshold tuner instance (may be null)
-
recordAdaptiveFeedback
public void recordAdaptiveFeedback(AdaptiveThresholdTuner.OperationType operationType, int inputSize, int threshold, long parallelTime, long sequentialTime) Records adaptive tuning feedback.- Parameters:
operationType
- the type of operationinputSize
- the input sizethreshold
- the threshold that was usedparallelTime
- the parallel execution time in nanosecondssequentialTime
- the sequential execution time in nanoseconds
-
shutdown
public void shutdown()Releases dedicated resources created by this configuration.This method shuts down only executors that are owned by this config (for example, a virtual-thread executor created when
useVirtualThreads
is enabled). It never shuts down the sharedForkJoinPool
common pool. When a transform or denoiser owns itsParallelConfig
, it may invoke this duringclose()
; otherwise, callers should invoke this explicitly to release resources deterministically.
-