LeftoverLocals May Leak LLM Responses on Apple, Qualcomm, and AMD GPUs

MMS • Sergio De Simone

Article originally posted on InfoQ.

Security firm Trail of Bits disclosed a vulnerability allowing malicious actors to recover data from GPU local memory on Apple, Qualcomm, AMD, and Imagination GPUs. Dubbed LeftoverLocals, the vulnerability affects any application using the GPU, including Large Language Models (LLMs) and machine learning (ML) models.

Trail of Bits researchers built a proof of concept showing how an attacker can recover data from GPU local memory, an optimized GPU memory region acting as a cache, across process or container boundaries. The accompanying demo video shows an attacker listening in on an interactive LLM chat session and obtaining the LLM response almost immediately.

LeftoverLocals can leak ~5.5 MB per GPU invocation on an AMD Radeon RX 7900 XT which, when running a 7B model on llama.cpp, adds up to ~181 MB for each LLM query. This is enough information to reconstruct the LLM response with high precision.
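As a rough sanity check on those two figures, the short Python snippet below divides the per-query total by the per-invocation leak. It assumes the per-query total is simply the per-invocation leak accumulated over successive GPU invocations; the variable names are illustrative and the result is only an implied, approximate count.

```python
# Approximate figures quoted in the article.
LEAK_PER_INVOCATION_MB = 5.5   # leaked per GPU invocation (AMD Radeon RX 7900 XT)
LEAK_PER_QUERY_MB = 181.0      # leaked per LLM query (7B model on llama.cpp)

# Assumption: the per-query total is the per-invocation leak summed over
# all GPU invocations observed during one query.
invocations_per_query = LEAK_PER_QUERY_MB / LEAK_PER_INVOCATION_MB

print(f"~{invocations_per_query:.0f} GPU invocations per query")
```

Under that assumption, the numbers imply roughly 33 observed GPU invocations for a single query.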

To exploit the vulnerability, an attacker needs to be able to run a GPU compute program on the same machine as the target LLM. This requires some kind of access to the target machine, possibly by exploiting a distinct vulnerability or inducing the user to install a malicious app on their system, which greatly reduces the vulnerability’s severity.

The researchers write: “These attack programs, as our code demonstrates, can be less than 10 lines of code. Implementing these attacks is thus not difficult and is accessible to amateur programmers (at least in obtaining stolen data).”

For example, using a framework like OpenCL, Vulkan, or Metal, an attacker can access data left in the GPU local memory by writing a GPU kernel that dumps uninitialized local memory. Browser GPU frameworks like WebGPU cannot be used in this way since they insert dynamic memory checks into GPU kernels.
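The idea can be illustrated with a toy simulation in plain Python. This is not GPU code and not the researchers' proof of concept: `ToyGPU`, `victim_kernel`, and `listener_kernel` are hypothetical names, and the "local memory" is just a list that the model deliberately never clears between kernel runs, which is the behavior the vulnerability describes.

```python
class ToyGPU:
    """Toy model of a compute unit whose local memory persists between kernels."""

    def __init__(self, local_size=16):
        self.local = [0] * local_size

    def run(self, kernel):
        # Modelled flaw: the same local buffer is handed to every kernel
        # without being cleared in between.
        return kernel(self.local)


def victim_kernel(local):
    # The victim stages intermediate data in local memory.
    for i, byte in enumerate(b"LLM output"):
        local[i] = byte


def listener_kernel(local):
    # The attacker never writes to local memory; it only copies out
    # whatever the previous kernel left behind.
    return list(local)


gpu = ToyGPU()
gpu.run(victim_kernel)
dump = gpu.run(listener_kernel)

# Drop the untouched (zero) cells and decode what was left over.
recovered = bytes(b for b in dump if b).decode()
print(recovered)
```

On an affected device, the listener would be a real GPU kernel written with one of the frameworks mentioned above; the simulation only shows why reading memory that the attacker never initialized can reveal another program's data.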

Trail of Bits researchers remark that it is difficult for a user to determine whether a GPU app uses local memory, since this requires inspecting the source code, including external dependencies, for GPU code. Likewise, the only user-side mitigation consists of modifying the source code of every GPU kernel that uses local memory so that it clears that memory by overwriting it with zeros. This is further complicated by the possibility that the compiler optimizes those clearing instructions away.
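The clearing mitigation can be sketched with a small Python model. It is a simulation under stated assumptions, not real kernel code: the function names are hypothetical, the shared list stands in for local memory that persists between kernels, and Python has no optimizer that would remove the zeroing loop, which is exactly the risk a real GPU compiler introduces.

```python
def victim_kernel(local, clear_on_exit):
    # The victim stages intermediate data in local memory.
    for i, byte in enumerate(b"LLM output"):
        local[i] = byte
    if clear_on_exit:
        # Mitigation: overwrite local memory with zeros before the kernel ends.
        for i in range(len(local)):
            local[i] = 0


def listener_kernel(local):
    # The attacker copies out whatever is left in local memory.
    return bytes(local)


def leaked_bytes(clear_on_exit, local_size=16):
    local = [0] * local_size  # stands in for persistent local memory
    victim_kernel(local, clear_on_exit)
    return listener_kernel(local).rstrip(b"\x00")


unpatched = leaked_bytes(clear_on_exit=False)
patched = leaked_bytes(clear_on_exit=True)
print(unpatched, patched)
```

In this model the unmodified kernel leaves its data readable, while the kernel that zeroes its local memory on exit leaves nothing for the listener to recover.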

Of all impacted vendors, Qualcomm and Imagination have released patched firmware addressing LeftoverLocals on some of their devices. Similarly, some Apple devices, e.g., the Apple iPad Air 3rd generation (A12) or the iPhone 15, seem to have been patched, while others, e.g., the Apple MacBook Air (M2), still seem to be vulnerable.

The researchers add: “Apple has confirmed that the A17 and M3 series processors contain fixes, but we have not been notified of the specific patches deployed across their devices.”

AMD has confirmed to Trail of Bits that “they continue to investigate potential mitigation plans”.

On the other hand, NVIDIA and ARM confirmed that their GPUs are not currently impacted by the vulnerability.

If you are interested in the low-level details of how LeftoverLocals works, as well as in a broader discussion of how it impacts LLM security and how GPU vendors should address the overall security of GPU compute devices, do not miss the original article.
