CPU inference in libtorch causes OOM with repeated calls to forward

If you're encountering out-of-memory (OOM) errors when making repeated calls to forward in libtorch for CPU inference, the likely cause is memory that accumulates across calls, either through fragmentation from repeatedly allocating and deallocating tensors or through tensors that are unintentionally kept alive between calls. Here are a few suggestions to mitigate the issue:

  1. Use a Single Inference Buffer: Instead of creating and destroying tensors on each forward call, preallocate one input tensor and reuse it for every inference, copying the new input data into it each time. Reusing the same memory buffer reduces allocation churn and fragmentation (see the first sketch after this list).

  2. Reuse the Model: Load the model once and reuse the same module object across all inference calls. Loading the model is an expensive operation, and reloading it for every inference wastes time and repeatedly allocates and frees large blocks of memory (the first sketch after this list combines this with point 1).

  3. Use Memory Pools: Manage buffers through a memory pool instead of allocating and deallocating tensors directly. Obtaining and returning blocks from a pool keeps allocations stable across calls, which reduces fragmentation and improves memory-usage efficiency (see the pool sketch after this list).

  4. Batch Inference: If your use case allows it, batch multiple inputs together and run inference on the whole batch. Fewer forward calls means fewer per-call allocations and often better memory efficiency (see the batching sketch after this list).

  5. Check Memory Consumption: Monitor your application's memory use during inference and make sure you are not inadvertently holding onto tensors or data you no longer need, for example outputs accumulated in a container or intermediates kept alive in an outer scope. Releasing memory promptly prevents steady growth across calls (see the last sketch after this list).

  6. Increase System Memory: If the above steps are not sufficient and you still consistently hit OOM errors, consider increasing the available system memory, either by adding more RAM to the machine or by moving to a system with higher memory capacity.
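
Below is a minimal sketch combining points 1 and 2: the TorchScript module is loaded once, a single input buffer is preallocated and refilled for each call, and gradient tracking is disabled so forward does not retain autograd state. The file name model.pt, the 1x3x224x224 input shape, and the fill_ placeholder for real input data are assumptions for illustration.

```cpp
#include <torch/script.h>

#include <vector>

int main() {
  // Point 2: load the TorchScript module once and reuse it for every call.
  // "model.pt" and the 1x3x224x224 input shape are placeholder assumptions.
  torch::jit::script::Module module = torch::jit::load("model.pt");
  module.eval();

  // Point 1: preallocate a single input buffer and reuse it for each call.
  torch::Tensor input = torch::empty({1, 3, 224, 224}, torch::kFloat32);

  // Make sure forward() does not retain autograd state between calls.
  torch::NoGradGuard no_grad;

  // The IValue shares storage with `input`, so in-place writes are visible.
  std::vector<torch::jit::IValue> inputs{input};

  for (int i = 0; i < 1000; ++i) {
    // Write new data into the existing buffer instead of allocating a
    // fresh tensor; fill_() stands in for copying in your real input.
    input.fill_(static_cast<float>(i));

    // The output is scoped to the loop body, so its storage is released
    // at the end of every iteration.
    torch::Tensor output = module.forward(inputs).toTensor();
  }
  return 0;
}
```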
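
For point 3, libtorch does not expose a user-facing CPU tensor pool, so the sketch below hand-rolls a very small one: a fixed set of buffers is allocated up front and handed out and returned instead of being created and destroyed per request. The pool size and buffer shape are arbitrary placeholders.

```cpp
#include <torch/torch.h>

#include <cstddef>
#include <utility>
#include <vector>

// A minimal fixed-size tensor pool: buffers are allocated once and reused.
class TensorPool {
 public:
  TensorPool(std::size_t count, at::IntArrayRef shape) {
    for (std::size_t i = 0; i < count; ++i) {
      free_.push_back(torch::empty(shape, torch::kFloat32));
    }
  }

  // Returns an undefined tensor if the pool is exhausted.
  torch::Tensor acquire() {
    if (free_.empty()) return torch::Tensor();
    torch::Tensor t = free_.back();
    free_.pop_back();
    return t;
  }

  void release(torch::Tensor t) { free_.push_back(std::move(t)); }

 private:
  std::vector<torch::Tensor> free_;
};

int main() {
  TensorPool pool(4, {1, 3, 224, 224});
  torch::Tensor buf = pool.acquire();  // reuse an existing allocation
  buf.zero_();                         // write input data into the buffer
  // ... run inference with buf ...
  pool.release(std::move(buf));        // hand the buffer back for the next call
  return 0;
}
```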
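
For point 4, a sketch of batched inference, assuming the model accepts a batched NCHW tensor; the model file name and per-sample shape are again placeholders.

```cpp
#include <torch/script.h>

#include <iostream>
#include <vector>

int main() {
  torch::jit::script::Module module = torch::jit::load("model.pt");
  module.eval();
  torch::NoGradGuard no_grad;

  // Collect individual samples (each 3x224x224)...
  std::vector<torch::Tensor> samples;
  for (int i = 0; i < 8; ++i) {
    samples.push_back(torch::rand({3, 224, 224}));
  }

  // ...stack them into a single 8x3x224x224 batch and infer once.
  torch::Tensor batch = torch::stack(samples);
  torch::Tensor output =
      module.forward({batch}).toTensor();  // one call instead of eight

  // Each row of the output corresponds to one input sample.
  std::cout << output.sizes() << std::endl;
  return 0;
}
```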
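
For point 5, one simple way to watch for growth is to log the process's resident set size around blocks of forward calls. The helper below reads VmRSS from /proc/self/status (Linux-only), and the scoped tensor stands in for whatever intermediates your code keeps alive.

```cpp
#include <torch/torch.h>

#include <fstream>
#include <iostream>
#include <string>

// Linux-only helper: read the process's resident set size (VmRSS, in kB)
// from /proc/self/status. Useful for spotting growth across forward() calls.
long rss_kb() {
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line)) {
    if (line.rfind("VmRSS:", 0) == 0) {
      return std::stol(line.substr(6));
    }
  }
  return -1;
}

int main() {
  std::cout << "RSS before: " << rss_kb() << " kB" << std::endl;
  {
    // Keep intermediate tensors in a narrow scope so their storage is
    // released as soon as they are no longer needed.
    torch::Tensor scratch = torch::rand({1024, 1024});
    std::cout << "RSS with tensor: " << rss_kb() << " kB" << std::endl;
  }  // `scratch` destroyed here; its memory is returned to the allocator
     // (RSS may not drop immediately, since the C library can retain freed pages)
  std::cout << "RSS after: " << rss_kb() << " kB" << std::endl;
  return 0;
}
```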

By applying these techniques, you can mitigate OOM errors caused by repeated forward calls in libtorch for CPU inference. Profile and monitor your memory usage to tune memory management for your specific use case.