AutoscaledPool
Manages a pool of asynchronous, resource-intensive tasks that are executed in parallel. The pool only starts new tasks if there is enough free CPU and memory available and the JavaScript event loop is not blocked.

Information about CPU and memory usage is obtained by the Snapshotter class, which takes regular snapshots of system resources, either local or from the Apify cloud infrastructure when the process is running on the Apify platform. Meaningful data gathered from these snapshots is provided to AutoscaledPool by the SystemStatus class.
Before running the pool, you need to implement the following three functions: AutoscaledPoolOptions.runTaskFunction(), AutoscaledPoolOptions.isTaskReadyFunction() and AutoscaledPoolOptions.isFinishedFunction().
The auto-scaled pool is started by calling the AutoscaledPool.run() function. The pool periodically queries the AutoscaledPoolOptions.isTaskReadyFunction() function for more tasks, managing optimal concurrency, until the function resolves to false. The pool then queries AutoscaledPoolOptions.isFinishedFunction(). If it resolves to true, the run finishes after all running tasks complete. If it resolves to false, the pool assumes more tasks will become available later and keeps periodically querying for tasks. If any task throws, the AutoscaledPool.run() function rejects its promise with that error.

The pool evaluates whether it should start a new task every time one of the tasks finishes, and also at the interval set by the options.maybeRunIntervalSecs parameter.
Example usage:
```javascript
const pool = new Apify.AutoscaledPool({
    maxConcurrency: 50,
    runTaskFunction: async () => {
        // Run some resource-intensive asynchronous operation here.
    },
    isTaskReadyFunction: async () => {
        // Tell the pool whether more tasks are ready to be processed.
        // Return true or false.
    },
    isFinishedFunction: async () => {
        // Tell the pool whether it should finish
        // or wait for more tasks to become available.
        // Return true or false.
    },
});

await pool.run();
```
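As a more concrete sketch, the three functions can cooperate to drain a simple in-memory queue. The queue, URLs and result array below are made-up placeholders for illustration, not part of the SDK; a real task would do actual asynchronous work:

```javascript
// Hypothetical work queue: the pool drains it until empty.
const urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c'];
const results = [];

const poolOptions = {
    maxConcurrency: 10,
    runTaskFunction: async () => {
        // Take one item off the queue and process it.
        const url = urls.shift();
        if (!url) return;
        results.push(url.toUpperCase()); // placeholder for real async work
    },
    // More tasks are ready as long as the queue is non-empty.
    isTaskReadyFunction: async () => urls.length > 0,
    // Once the queue is empty, no more tasks will ever arrive, so finish.
    isFinishedFunction: async () => urls.length === 0,
};

// With the Apify SDK available, the pool would be run as:
// const pool = new Apify.AutoscaledPool(poolOptions);
// await pool.run();
```

Because isFinishedFunction() returns true only when the queue is empty, the run ends exactly when all queued items have been processed.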
new AutoscaledPool(options)

Parameters:

options: AutoscaledPoolOptions - All AutoscaledPool configuration options.
autoscaledPool.log
autoscaledPool.minConcurrency

Gets the minimum number of tasks running in parallel.

Returns: number

autoscaledPool.minConcurrency

Sets the minimum number of tasks running in parallel.

WARNING: If you set this value too high with respect to the available system memory and CPU, your code might run extremely slowly or crash. If you're not sure, just keep the default value and the concurrency will scale up automatically.

Parameters:

value: number
autoscaledPool.maxConcurrency

Gets the maximum number of tasks running in parallel.

Returns: number

autoscaledPool.maxConcurrency

Sets the maximum number of tasks running in parallel.

Parameters:

value: number
autoscaledPool.desiredConcurrency

Gets the desired concurrency for the pool, which is an estimated number of parallel tasks that the system can currently support.

Returns: number

autoscaledPool.desiredConcurrency

Sets the desired concurrency for the pool, i.e. the number of tasks that should be running in parallel if there's a large enough supply of tasks.

Parameters:

value: number
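The three concurrency properties are related: the pool scales desiredConcurrency up and down based on system load, but always keeps it within the [minConcurrency, maxConcurrency] bounds. A minimal sketch of that clamping invariant (the helper function below is illustrative, not part of the SDK):

```javascript
// Illustrative helper: whatever value the autoscaling logic proposes,
// the effective desired concurrency stays within [min, max].
function clampDesiredConcurrency(desired, min, max) {
    return Math.min(Math.max(desired, min), max);
}
```

For example, with minConcurrency of 1 and maxConcurrency of 50, a proposed value of 100 would be capped at 50, and a proposed value of 0 would be raised to 1.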