Long eval clean up times - Weights & Biases Documentation

This page describes two methods, best used together, for reducing long clean-up times when you run W&B Weave evaluations on large datasets. It is intended for users who notice a long delay after their evaluation code finishes but before the program exits. The following sections describe how to flush pending background work and how to increase client parallelism.

Flush pending background work

When you run evaluations on large datasets, your program may pause for a long time before it exits while the dataset uploads on background threads. This happens when the main thread finishes its work before background clean-up completes. Calling client.flush() forces all pending background tasks to be processed in the main thread. That work then happens while your program is still running, rather than being left until after your code finishes. This helps most when your code would otherwise finish before its data has uploaded to the server. The following example flushes pending background work after an evaluation:

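import asyncio

import weave

client = weave.init("fast-upload")

# ... evaluation setup: define `evaluation` (a weave.Evaluation) and `model`
result = asyncio.run(evaluation.evaluate(model))

# Wait for pending background work, such as dataset uploads, to finish
client.flush()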
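To illustrate what flushing does, the following sketch models a client that uploads data on background threads. It is a hypothetical, stdlib-only toy, not Weave's implementation. The class ToyBackgroundClient and its log() and flush() methods are assumptions made for illustration. log() returns immediately after queuing an upload on a thread pool, and flush() blocks until every queued upload has finished. Without the flush() call, Python would still wait for the pool's worker threads at interpreter shutdown. That is the kind of delay after your code finishes that this page describes.

```python
from __future__ import annotations

import concurrent.futures
import threading
import time


class ToyBackgroundClient:
    """Hypothetical stand-in for a client that uploads on background threads.

    This is NOT the Weave API; it only illustrates the flush pattern.
    """

    def __init__(self, parallelism: int = 4) -> None:
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=parallelism)
        self._pending: list[concurrent.futures.Future] = []
        self._lock = threading.Lock()
        self.uploaded: list[str] = []

    def _upload(self, row: str) -> None:
        time.sleep(0.01)  # simulate network latency for one upload
        with self._lock:
            self.uploaded.append(row)

    def log(self, row: str) -> None:
        # Returns immediately; the upload runs on a worker thread.
        self._pending.append(self._executor.submit(self._upload, row))

    def flush(self) -> None:
        # Block until every pending upload finishes, re-raising any upload error.
        pending, self._pending = self._pending, []
        for future in concurrent.futures.as_completed(pending):
            future.result()


if __name__ == "__main__":
    client = ToyBackgroundClient(parallelism=8)
    for i in range(50):
        client.log(f"row-{i}")  # queues work and returns right away
    client.flush()  # waits here, while the program is still running
    print(len(client.uploaded))  # 50
```

Calling flush() at a point you choose moves that wait into your program's normal execution. It also re-raises any upload error through Future.result(), which the toy would otherwise never report.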

Increase client parallelism

Increasing client parallelism gives Weave more threads for background work such as dataset uploads. Combined with flushing, this can further reduce clean-up time. Weave determines client parallelism automatically based on the environment, but you can set it manually with the following environment variable:

WEAVE_CLIENT_PARALLELISM: The number of threads available for parallel processing. Increasing this value can improve the performance of background tasks such as dataset uploads.

You can also set this programmatically using the settings argument to weave.init():

import weave
client = weave.init("fast-upload", settings={"client_parallelism": 100})
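The effect of more threads on I/O-bound work such as uploads can be sketched with a small stdlib simulation. This is a hypothetical model, not how Weave schedules its work. simulate_uploads is an illustrative helper, and each upload is modeled as a fixed time.sleep() standing in for network latency. Its parallelism argument plays the role that WEAVE_CLIENT_PARALLELISM or client_parallelism plays for the real client.

```python
import concurrent.futures
import time


def simulate_uploads(num_items: int, parallelism: int, latency_s: float = 0.05) -> float:
    """Return wall-clock seconds to 'upload' num_items using `parallelism` threads.

    Hypothetical simulation: each upload is an I/O wait (time.sleep), which
    releases the GIL, so worker threads can overlap their waits.
    """

    def fake_upload(_: int) -> None:
        time.sleep(latency_s)  # stand-in for network latency

    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=parallelism) as pool:
        # list() consumes the iterator so every task finishes before timing stops
        list(pool.map(fake_upload, range(num_items)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for workers in (1, 4, 20):
        elapsed = simulate_uploads(num_items=20, parallelism=workers)
        print(f"parallelism={workers:>2}: {elapsed:.2f}s")
```

Because each simulated upload spends its time waiting rather than computing, adding threads mostly overlaps those waits. In practice the gain typically levels off once the network or server becomes the bottleneck, so a very large value is not automatically better.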
