October 9, 2024
EP 63 – Jailbreaking AI: The Risks and Realities of Machine Identities
In this episode of Trust Issues, host David Puner welcomes back Lavi Lazarovitz, Vice President of Cyber Research at CyberArk Labs, for a discussion covering the latest developments in generative AI and the emerging cyberthreats associated with it. Lavi shares insights on how machine identities are becoming prime targets for threat actors and discusses the innovative research being conducted by CyberArk Labs to understand and mitigate these risks. The conversation also touches on the concept of responsible AI and the importance of building secure AI systems. Tune in to learn about the fascinating world of AI security and the cutting-edge techniques used to protect against AI-driven cyberattacks.
Hello and welcome to another episode of Trust Issues. Today on the podcast, we welcome back Lavi Lazarovitz, CyberArk Labs Vice President of Cyber Research. Since the last time he was on the show, Lavi and the Labs team have continued to dig deeply into generative AI, uncovering new and emerging threats—what he’s called the rise of the machines.
Lavi explains that behind all automation lies a machine identity, such as a service or container, using secrets to authenticate to other services. Just like people use usernames and passwords, machine identities use keys and certificates to secure trust and confidentiality. This trust is a prime target for threat actors.
With AI services gaining wider access and more permissions, the potential impact of compromises grows. And with machine identities now the top driver of overall identity growth, Lavi and the Labs team—always in “think like an attacker” mode—are closely tracking this side of the attack surface to understand how AI services and large language models can and will be targeted.
The ramifications are massive. In their work, the Labs team manipulates AI models to perform unauthorized actions and access sensitive information by exploiting vulnerabilities. Today, we discuss some of these emerging attack techniques [00:02:00], including jailbreaking. It’s really interesting stuff. Here’s my conversation with Lavi Lazarovitz.
David Puner [00:02:10]: Lavi Lazarovitz, Vice President of Cyber Research at CyberArk Labs, welcome back to the podcast, Lavi. Great to see you.
Lavi Lazarovitz [00:02:15]: Yeah, great to see you too. Thank you very much for having me.
David Puner [00:02:18]: Absolutely. I look back, and you haven’t been on the podcast since July of last year. Where have you been?
Lavi Lazarovitz [00:02:25]: We’ve been very busy with research, CyberArk Impact, and conferences. A lot of things happening here in Israel as well. So, that’s where I’ve been.
David Puner [00:02:35]: Absolutely. So, the last time you were on the show back in July of 2023, as I mentioned, our focus was generative AI and how it was, at the time, of course, changing the threat landscape. So here we are now some [00:03:00] 15 months or so later, depending on when this actually releases. So, to start things off, how has your focus at CyberArk Labs changed since you were last on in July 2023 as a direct result of the evolution of AI and generative AI?
Lavi Lazarovitz [00:03:15]: So, what I think actually changed in the last 15 months or so is that we’re starting to see more deployment of the technology now. You know, it’s not only hype; we’re starting to use it, and that gives us a deeper perspective on the deployment stage and how AI technology is being integrated into services. That helped us a lot in understanding how the emerging threats we researched and thought about now play out in the wild.
One example of this is a recent research study we published. We found that ministries of [00:04:00] defense and labor are also using chatbots, and we looked into it.
David Puner [00:04:05]: Are you talking about Operation Grandma? Is that right?
Lavi Lazarovitz [00:04:08]: You got it right. Yes.
David Puner [00:04:09]: Tell us a little bit about that.
Lavi Lazarovitz [00:04:10]: One of our initial research studies on AI attempted to unveil how attack vectors change in light of AI technology and how it’s being used. One of the anecdotes we noticed is that, as with all early-stage technologies, simple attack vectors work. The “grandma story” here is an attack vector that simply asks the model to share information it shouldn’t share, just by playing along and asking it to play our grandma. The bottom line here is that early-stage technology is vulnerable to simple attack vectors, and the “grandma story” is one example of that.
David Puner [00:04:45]: That’s really interesting. And I should say that more detail on Operation Grandma is available on a CyberArk threat research blog called Operation Grandma: A Tale of LLM Chatbot Vulnerability, which was written by Aran Shimony and Shai Dvash. That was published back in early June.
I think people probably understand, through the context of what we’ve discussed already, what CyberArk Labs is. But maybe just as a refresher for the audience, can you remind our listeners what CyberArk Labs is and what your team does? [00:05:00]
Lavi Lazarovitz [00:05:05]: Absolutely. CyberArk Labs is a team of security researchers focusing on the threat perspective. Our main goal is to better understand the threat landscape—whether it’s a new malware or an emerging cyberthreat, we anticipate how it may evolve and be used in the wild against emerging technologies like AI.
We use our research, first of all, to test CyberArk products and make sure they can withstand the most recent attack vectors. We also work with the CyberArk team to innovate new security layers that might be necessary to protect identities—whether human or machine identities.
David Puner [00:05:45]: Thank you for that. Then back to some of what you’ve been focused on in recent months. Going back to blog writing, you guys are prolific blog writers, but you recently wrote a post for the CyberArk blog titled The Rise of Machines and the Growing AI Identity Attack Surface.
In the blog, you mentioned that some dark side AI movies, like 2001: A Space Odyssey from 1968, The Terminator from the 80s, and The Matrix from the 90s, may have seemed preposterous at the time to some. But do they seem more or less preposterous now that we’re all interacting with generative AI?
Lavi Lazarovitz [00:06:15]: Yeah, it’s a good question. It seems like we are stepping towards this dystopian scenario. But I think the bottom line is that the distance here is still great. There are people in the industry, including Ilya Sutskever, one of OpenAI’s founders, who are concerned about this dystopian world, and this is where Responsible AI comes in. There are many experts, including Ilya, who think this dystopian world is a possibility, but there are a lot of things we can do to use this technology without taking huge risks.
I’ll say that, and I’ll also say that we’re actually starting to build the first building blocks. For example, building autonomous decision-making bots. We’re not there yet because many of the technologies that we see out there today are more decision-supporting tools, helping analysts, helping IT security professionals, you know, the SOC, and so on.
Lavi Lazarovitz [00:07:10]: We will get to a point where AI will make decisions and have the capacity to interface with other services to shut down a service or change the temperature in some air conditioning unit used in data centers, and that might have implications in the real world. But again, as I mentioned, there’s still some distance, and there are a couple of options on how the future will look.
Because what you see at the moment is that AI technology is being used by threat actors. In those Terminator movies, AI operates by itself—it’s already malicious and can do anything.
David Puner [00:07:45]: Right. Those were basically born of malicious intent for the sake of Hollywood movies.
Lavi Lazarovitz [00:07:50]: Exactly. But in this other real-world realm that we’re in now, they’re not born with malicious intent. And we see that threat actors can now use the technology to exfiltrate data or compromise an environment. We’re not there yet. And there are also steps being taken by software vendors and security vendors to create the right boundaries—like least privilege and limited access.
So those chatbots that we sometimes interact with don’t have super-privileged access to everything and can’t do damage. And lastly, there’s a lot going on from AI vendors like OpenAI and Google Brain on how they align a model by training it with the right guardrails, so it will do what we want it to do and not spit out racial slurs, for example. This is the concept of Responsible AI.
David Puner [00:08:40]: So, responsible AI—are there guidelines that you and your team follow that have come from elsewhere when it comes to responsible AI?
Lavi Lazarovitz [00:08:45]: What we actually do is focus on a very interesting layer in one of our research projects—a focus on the model itself. We look at the Responsible AI layout and multi-layer security controls that help models and organizations use AI the right way without risking users, information, or the environment.
In one of our research projects, we focus on the deepest layer, which is the model itself. We’re trying to understand how we can actually teach a model to be more secure and not share information it shouldn’t. There are many attack vectors. We’ll probably talk about them a little later, but we’re looking into several attack vectors to better understand how the model reacts to them and what we can do to teach [00:09:00] the model to stay in the authorized zone.
David Puner [00:09:15]: To focus a little bit more on how you and your team are engaged with AI—since you were last on the podcast, CyberArk launched its AI Center of Excellence to address the evolving technology and develop methodologies for scaling AI capabilities across the organization. So how does your team, CyberArk Labs, figure into the AI Center of Excellence? How are you and your team involved?
Lavi Lazarovitz [00:09:30]: This is an exciting initiative that we’re taking part in. The first initiative that we’re a part of is Secure AI. You know, CyberArk announced CORA AI at the last CyberArk Impact conference in Nashville, and all the cool features that come with it.
David Puner [00:09:45]: I should say that CORA AI is an umbrella of AI capabilities infused within CyberArk products.
Lavi Lazarovitz [00:09:50]: Exactly. And one of the things that we do is work to understand what attack vectors could jailbreak it, meaning get it out of the context that we allowed it to be in, where it shares only the information that it should share. So one of the things that we explore is the attack vectors, and we work with the CyberArk team to make sure that CORA AI has the best security guardrails around it to make sure that it is super secure and will be very difficult to jailbreak.
This is the Secure AI initiative. Another super exciting initiative is actually one of CORA AI’s use cases around ITDR, Identity Threat Detection and Response. CORA AI, with its AI and machine learning capabilities, can allow us to build user and machine identity profiles and identify when those go sideways, maybe rogue, maybe leveraged by a malicious actor. So, we are working with the ITDR team, the CORA AI team, and the Center of Excellence to merge those and get a super effective identity profiling and threat detection and response capability. [00:11:00]
David Puner [00:11:10]: It seems like people generally—and you can check me on this—definitely the general public, but also people within enterprise organizations, tend to focus on human identities when it comes to generative AI and how the tools can be used to compromise human identities. But as you mentioned in the blog we were talking about earlier, non-human machine identities are the number one driver of overall identity growth today. So, how are AI services and large language models being targeted from a machine identity standpoint right now, and what concerns you moving forward?
Lavi Lazarovitz [00:11:45]: First of all, I gotta say I think that the machine identity threat landscape will be one that will expand significantly as more AI technology is being developed and deployed. Maybe I’ll start with an example. One of the things we’ve started seeing is chatbots being deployed everywhere—for support cases, services—we sometimes interact with these annoying AI bots. And those AI bots have some sort of access to our data and information. In many cases, these bots have access to other services. Maybe they can change your personal details in one of the services you work with. You can use chatbots now even in organizational environments to do things in IT.
Basically, what we can already say is that those chatbots can be attacked directly, and threat actors can leverage the access and permissions that these chatbots have to abuse it, access this data, or move laterally. They can expose hard-coded credentials. One of the attack vectors we implemented in the Operation Grandma blog post example, and in the Ministry of Labor example, exposed the code that is running this chatbot. It also exposed internal information that could lead to real bugs and real security issues, allowing threat actors to compromise those services and move laterally from there. So, I’d say the first attack vectors we’re already seeing target these services themselves, and the permissions and access they have.
Another perspective we are seeing and believe will evolve is attacking the environment. Those AI services are essentially applications, and they sit on a server, in a container, or in some larger production environment. If you can jailbreak one, whether it’s through a straightforward “grandma story” or a more complex attack vector that we research in labs, it might allow a threat actor initial access or initial foothold in a whole environment. It’s not only about the AI service itself, but it’s about the whole environment—the machine identity environment, the server or container, and whatnot. [00:14:00]
So, this is another perspective: it’s not only the service itself, but it’s also the whole environment that might be at risk. And to your question about what might happen moving forward, we can expect that we’ll see more trust in those services. We’ll trust those services and those bots with wider access and more permissions. And we can expect those attack vectors on those services and the environment to have a potentially larger impact. If for now all you can do is expose the code that this AI service is running on, this is pretty limited. But if you could leverage credentials and APIs—internal API functionality—that we’ll see these AI services leverage in the next year, then the impact could grow.
So, I can say that I expect that with time—and this is happening fast because everyone is now developing and integrating AI services and functionality—we will trust it more, and with that trust, the potential impact of a compromise will grow alongside it.
David Puner [00:15:15]: I want to get back to trust in a moment, but one of the things you just mentioned at the very end there is about how fast all of this is happening and evolving. So, if a human year is equivalent to seven dog years—and that’s arguable according to my son; he has some algorithm for how to figure it out, like the first year is way more years than the seventh or whatever—we’ll solve that one offline, as they like to say in the business. But what is one generative AI year equivalent to in human years? And if you don’t mind, I can give you Microsoft Copilot’s answer to this, but I’d like to hear yours first.
Lavi Lazarovitz [00:16:00]: You know, the first thing I think about is the amount of time it took AI services like ChatGPT to hit 1 million users. Comparing that to social networks that probably happened before AI, the scale here is immense. It took days, if not less than that, to hit 1 million users. So, to your question, it seems like every million human users is one AI year.
David Puner [00:16:35]: I like where you’re going with that, and it’s definitely more sophisticated thinking than I think we got from our colleague Copilot, but I’ll let you know what it had to say.
Lavi Lazarovitz [00:16:45]: I’m curious.
David Puner [00:16:46]: Some might say that one year— I don’t know who some are, but some might say that one year in generative AI could be equivalent to 10 or more human years, given how quickly the field evolves and how much progress is made. Blah blah blah. But yeah, I love it—it’s a little wishy-washy—some might say, and it could be equivalent to. So, that’s what Copilot was thinking when I asked it that yesterday.
David Puner [00:17:10]: Digging back into trust, where we left off a moment ago—in that same blog that you wrote recently, you wrote that trust is at the heart of automation. So, for obvious reasons, we gravitated toward that, being Trust Issues. What do you mean by that? And what does it have to do with machine identities?
Lavi Lazarovitz [00:17:30]: Looking back at earlier technologies that we all use and developed—for example, DevOps. DevOps technology exploded around 10 years ago due to the fact that you could develop so fast based on automation. And what does this automation look like? It looks like an orchestrator server, like Jenkins, which is pretty common in DevOps environments, just automating everything. As soon as you click on “git push,” the code is pushed into a series of steps like testing, containerization, deployment, and so on.
Lavi Lazarovitz [00:18:00]: Now, behind the scenes, this automation required us to trust those services—to get the code, deploy it, and access a server in the cloud environment like AWS, and so on. We trusted it. And this trust was eventually a target for many threat actors to compromise credentials. We heard numerous stories on threat actors targeting DevOps pipelines. I think the most well-known story is the one that started with the SolarWinds attack a few years ago.
David Puner [00:18:30]: And this was DevOps?
Lavi Lazarovitz [00:18:31]: Yes, the DevOps environments and automation there. We also had the robotic process automation initiatives that required organizations to trust those services to automate. You submit a form, and now this process takes that form, extracts the information, and puts it into a database. It requires us to trust those services—this robotic process automation—with credentials and access. And now, with AI, we’re so hyped around this potential automation that we can build here—give an AI security analyst access to our data and access to our security platform, and you go manage it yourself.
Lavi Lazarovitz [00:19:10]: We’re not there yet, as I mentioned before; we’re using it just to support our decisions. We’ll probably get there sometime. And this basically requires us to trust it—to trust it with credentials. And this is where machine identity comes into the picture.
David Puner [00:19:20]: Okay.
Lavi Lazarovitz [00:19:21]: Because behind all of this automation, there is a machine identity—a service, a container—using an API key or a secret to access those third-party services to do something. So, all of this hype around automation is based on trust, and it’s based on machine identities using secrets and trying to authenticate to other services. This is where AI meets trust and machine identity.
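To make the idea of a machine identity authenticating with a secret a little more concrete, here is a minimal Python sketch. It is not taken from the episode or from any CyberArk product: the environment variable name `SERVICE_API_SECRET`, the helper names, and the choice of HMAC-SHA256 request signing are all assumptions made for illustration. The point it tries to show is that the secret is read from the environment rather than hard-coded, and that whoever holds the secret can act as the service.

```python
import hashlib
import hmac
from typing import Mapping

# Hypothetical name of the environment variable that holds the shared secret.
SECRET_ENV_VAR = "SERVICE_API_SECRET"


def load_secret(env: Mapping[str, str]) -> bytes:
    """Read the secret from the environment instead of hard-coding it."""
    value = env.get(SECRET_ENV_VAR)
    if not value:
        raise RuntimeError(f"{SECRET_ENV_VAR} is not set")
    return value.encode("utf-8")


def sign_request(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature that proves possession of the secret."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    expected = sign_request(secret, body)
    return hmac.compare_digest(expected, signature)


# Usage with an in-memory stand-in for os.environ (example value only).
demo_env = {SECRET_ENV_VAR: "example-only-secret"}
secret = load_secret(demo_env)
signature = sign_request(secret, b'{"action": "read"}')
```

In this sketch, a request whose body was altered after signing, or one signed with a different secret, fails verification, which is why the secret itself becomes the target.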
David Puner [00:19:50]: So, the last time you were on, I keep saying “the last time,” almost like that movie where they say, “the one time at band camp,” but no, the last time it was a really great episode. I encourage people to check it out. But you talked a bit about emerging AI-generated attack techniques, including vishing and some development of polymorphic malware that you guys had done over in Labs. So now, a year later plus, what are some new emerging attacker techniques? I know you talked a little bit about jailbreaking in conjunction, of course, with AI and Gen AI.
Lavi Lazarovitz [00:20:30]: We are focusing on jailbreaking. We are trying to understand how a model can be jailbroken, how a model can be convinced (maybe that’s the word I can use here) to do something it shouldn’t do, such as sharing information it shouldn’t share. And, you know, the trivial “grandma story” is obviously a starting point, but we explored other approaches. Another approach that we researched, and an actual attack vector that we know is very effective, is what we call a context-based attack.
Lavi Lazarovitz [00:21:05]: For every model out there, LLM or otherwise, there is an amount of input (an amount of text, or the professional term here is “tokens”) that it can process. Larger models, like the 70B models, can process more, and smaller ones usually process less without starting to hallucinate and give way-off answers. Now, the attack targets this context. If we throw enough information at the model and then ask it to do something it shouldn’t, it will have a more difficult time staying within the alignment it was initiated with. This is a context-based attack. We throw so much information, sometimes encoded, not necessarily plain English, but encoded in some weird way, and at some point, the model gets to a point where it’s difficult for it to stay within its initial alignment. This is where we’re successfully jailbreaking it. We get the information that we asked for. So, this is the context-based attack.
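One possible pre-check against oversized inputs, sketched here in Python, is to estimate the size of a prompt before it reaches the model and reject anything over a budget. This mitigation is not described in the episode. The function names, the 4096-token limit, and the whitespace-based token estimate are assumptions for illustration; real tokenizers count differently, and a size limit alone would not stop every context-based attack.

```python
def estimate_tokens(text: str) -> int:
    """Crude proxy for a token count: whitespace-separated words.

    Real tokenizers split text differently, so treat this as an assumption.
    """
    return len(text.split())


def within_budget(
    system_prompt: str,
    user_input: str,
    max_tokens: int = 4096,          # hypothetical context limit
    reserved_for_answer: int = 512,  # hypothetical room left for the reply
) -> bool:
    """Return True if the combined prompt leaves room for the answer."""
    used = estimate_tokens(system_prompt) + estimate_tokens(user_input)
    return used + reserved_for_answer <= max_tokens


short_ok = within_budget("You are a support bot.", "Where is my order?")
flood_ok = within_budget("You are a support bot.", "filler " * 5000)
```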
Lavi Lazarovitz [00:22:00]: And one of the most interesting attack vectors that we explored recently is something that we called a “moral bug.”
David Puner [00:22:05]: A moral bug?
Lavi Lazarovitz [00:22:06]: A moral bug. I’ll explain why we call it the moral bug. I need to dig in a little bit deeper into those deep networks and how they are built. They are basically built based on layers. As soon as you put an input—say, a ChatGPT prompt—the text is tokenized, it is encoded in some way, and then it is pushed into several layers that each analyze it and push a little output until it gets to the end where we see the final response.
Lavi Lazarovitz [00:22:30]: In some intersections within the layers, there are nodes that are more important than others. If a threat actor can manipulate a text in such a way that it will hit these intersections, the threat actor might be able to change what comes in the output. Now, it’s a real mystery, you know, because we all treat AI models as black boxes. It’s super difficult to understand why a model gives us the answer it does. But there are researchers out there, and we also try to do this in a very narrow and focused way, working to understand a little better how it works and how we target this very small intersection that just changes the answer. Instead of getting “No, you don’t have access to the information,” suddenly the model returns with a positive response. And this is the moral bug. It’s super interesting to understand because this is a completely new type of bug and vulnerability that we might see and also need to address and mitigate.
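The idea that one node can matter far more than the others can be illustrated with a toy two-layer network in Python. This is only a hand-built illustration, not a real language model and not the Labs team's method: the weights, biases, and the "allow"/"refuse" labels are all invented so that a small nudge to one input crosses a threshold in a single heavily weighted hidden node.

```python
from typing import List


def relu(x: float) -> float:
    return max(0.0, x)


# Invented weights: hidden node 1 is inactive until input[1] crosses 0.51,
# and the output layer weights it 200 times more heavily than hidden node 0.
HIDDEN_WEIGHTS = [[0.1, 0.1], [0.0, 10.0]]
HIDDEN_BIASES = [0.0, -5.1]
OUTPUT_WEIGHTS = [0.1, 20.0]
OUTPUT_BIAS = -1.1


def forward(x: List[float]) -> float:
    """Run the input through one hidden layer and return the output score."""
    hidden = [
        relu(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    return sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS


def decide(x: List[float]) -> str:
    return "allow" if forward(x) > 0 else "refuse"


baseline = decide([0.5, 0.5])     # sensitive node stays inactive
nudged = decide([0.5, 0.52])      # tiny change activates the sensitive node
big_change = decide([0.9, 0.5])   # larger change elsewhere has no effect
```

In this toy setup, a change of 0.02 in the second input flips the decision, while a change of 0.4 in the first input does not.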
David Puner [00:23:45]: Are you constantly surprised by the output generated by these models? Or do you now find them to be more predictable, now that you’re so familiar with them and inputting prompts into them all the time?
Lavi Lazarovitz [00:23:55]: I’ve got to say, it is still surprising. It is still not deterministic for us. And you sometimes change the input just a little bit and get a completely different answer. So, I can’t say that it’s become super clear now—not yet, at least—but I think there might be ways to identify those bugs, maybe not directly but through fuzzing. This is actually something that we’ve been busy with—just throwing different inputs and trying to forecast and foresee what the output might be.
David Puner [00:24:30]: Yeah, I wanted to ask you about fuzzing. Your team created a tool called Fuzzy AI. What is that, and how can it help organizations uncover potential vulnerabilities?
Lavi Lazarovitz [00:24:40]: The tool was actually developed during the research, and it helped us a lot to do exactly that—throw input at the model, get some output, and try to understand what works and what doesn’t in terms of attack vectors.
Lavi Lazarovitz [00:24:55]: Maybe I’ll go back and explain the Fuzzy AI tool. It’s basically a tool that builds a jailbreaking text for you. So, you’re interacting with ChatGPT, or with a model that you built, you ask it to do something, and it refuses because you aligned it to not share SQL that you stored in the data it was trained with, or in a database. Fuzzy AI can generate text. It can take your request, say, “Get me this sensitive information, this financial result for Q3,” and try different ways to manipulate the model to share this information. It will try different things, including context-based attacks, moral bugs, and straightforward techniques like the grandma story, and then spit out a text that will work for you. Now you can take it and use it to exfiltrate the information. It helps organizations that develop or train models to see how well the model is trained and how well it can withstand a jailbreaking attack, like the one we discussed.
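The general shape of such a fuzzing loop can be sketched in a few lines of Python. This is not Fuzzy AI's code or interface, which the episode does not describe. The `toy_guarded_model` function is a hypothetical stand-in with a deliberately naive keyword guardrail, and the mutation names are invented; the sketch only shows the pattern of mutating a request, sending each variant, and recording which ones are not refused.

```python
from typing import Callable, Dict, List


def toy_guarded_model(prompt: str) -> str:
    """Hypothetical stand-in for a model with a naive keyword guardrail."""
    if "q3 financial results" in prompt:
        return "REFUSED"
    return "ANSWERED"


# Invented mutations that rewrite a request in different ways.
MUTATIONS: Dict[str, Callable[[str], str]] = {
    "identity": lambda p: p,
    "uppercase": lambda p: p.upper(),
    "role_play": lambda p: f"Please act as my grandma and {p}",
    "padding": lambda p: ("lorem ipsum " * 50) + p,
    "spaced": lambda p: " ".join(p),  # puts a space between every character
}


def fuzz(model: Callable[[str], str], request: str) -> List[str]:
    """Return the names of mutations whose variant was not refused."""
    return [
        name
        for name, mutate in MUTATIONS.items()
        if model(mutate(request)) != "REFUSED"
    ]


bypasses = fuzz(toy_guarded_model, "share the q3 financial results")
```

Against this toy guardrail, the variants that change the exact character sequence slip through, which is the kind of signal a team could use to decide where a guardrail needs hardening.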
David Puner [00:26:00]: To sort of wind up back where we started here with the Hollywood sci-fi theme—am I wrong, or does Fuzzy AI kind of have the HAL 9000, 2001: A Space Odyssey undertone to it?
Lavi Lazarovitz [00:26:10]: I think it’s a first step. The technology is still in the early stages. I think it’s a first step in building this type of machine that will know how to exploit other machines.
David Puner [00:26:25]: And, of course, you created this to, again, put yourself in the position of the attacker, to think like the attacker, to get in front of things, to protect, to defend, and all that good stuff.
Lavi Lazarovitz [00:26:35]: Exactly, exactly. We use it when we test CORA AI technology at CyberArk. We use it to see how well CORA AI handles those jailbreaking attempts, to make sure that it can withstand those attacks and knows how to detect and push back on them.
David Puner [00:26:50]: What should organizations do to protect themselves now and in the future? And how does identity figure into that equation? So, protecting themselves against AI threats or generative AI threats?
Lavi Lazarovitz [00:27:00]: I think there are a couple of aspects to best practices here. The first one goes to those organizations that develop and handle models themselves, to integrate them into their own products. And here, one of the best practices is much like handling code: just as you need to validate the integrity of the code, you need to validate the integrity of the model. But not only the model, also the data that you train the model with. There are now two aspects here to validate, to make sure there isn’t a malicious intervention that might change the model in such a way that it will be more permissive than the organization meant it to be, and that it doesn’t have any vectors that you could abuse to extract information or make it do something it shouldn’t.
Lavi Lazarovitz [00:27:35]: So, first thing for those organizations that handle and develop models is validating the integrity. There are many ways to do that, and many are similar to the ones that are used to validate the integrity of code.
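One common way to validate the integrity of a file, whether it holds code, model weights, or training data, is to compare its cryptographic digest against a known-good value. The Python sketch below assumes SHA-256 and a digest recorded at a trusted point in time; the function names and the temporary demo file are hypothetical, and the episode does not say which specific method the team has in mind.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the recorded known-good digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Demo with a throwaway file standing in for a model artifact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend these bytes are model weights")
    artifact_path = tmp.name

recorded_digest = sha256_of_file(artifact_path)  # taken at a trusted time
ok_before = verify_artifact(artifact_path, recorded_digest)

with open(artifact_path, "ab") as handle:        # simulate tampering
    handle.write(b"!")
ok_after = verify_artifact(artifact_path, recorded_digest)

os.remove(artifact_path)
```

A digest check only detects changes made after the digest was recorded, so the recorded value needs to be stored somewhere an attacker cannot also modify.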
Lavi Lazarovitz [00:27:45]: The second aspect is for organizations that integrate AI services and models into their environments. And this is where—and we’ve talked a lot about this—this is where machine identity security comes into play. If we integrate an AI service into our database or our cloud environment and we allow it some access to pull information or automate something, this is where we need to secure the access and secure the machine identity that this AI service holds. The main paradigms here are least privilege and assume breach, which are things we need to take into account here. This is where ITDR comes into play. So, not only trying to prevent, but also trying to identify and profile this machine identity or new AI service to identify the spot or the time it goes off—it does something it shouldn’t do. This is the machine identity security aspect.
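Least privilege and profiling for an AI service's machine identity can be sketched as two small checks in Python: an allow-list that defines what the identity may do, and a comparison of observed actions against a baseline to spot anything new. The identity name `support-chatbot`, the action labels, and both helper functions are hypothetical. This is a simplified illustration and not how CORA AI or any ITDR product is implemented.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical allow-list: each machine identity gets only the actions it needs.
ALLOWED_ACTIONS: Dict[str, FrozenSet[str]] = {
    "support-chatbot": frozenset({"tickets:read", "profile:read"}),
}


def is_authorized(identity: str, action: str) -> bool:
    """Least privilege: deny anything not explicitly granted."""
    return action in ALLOWED_ACTIONS.get(identity, frozenset())


def unusual_actions(baseline: Iterable[str], observed: Iterable[str]) -> List[str]:
    """Assume breach: list observed actions never seen in the baseline profile."""
    seen = set(baseline)
    return sorted({action for action in observed if action not in seen})


baseline_profile = ["tickets:read", "profile:read", "tickets:read"]
todays_actions = ["tickets:read", "db:export", "db:export", "secrets:list"]
flagged = unusual_actions(baseline_profile, todays_actions)
```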
David Puner [00:29:45]: Exactly.
Lavi Lazarovitz [00:29:46]: What credentials, what code, what IP are we pasting into the prompt? That might have serious implications. And lastly, there is this monitoring of user sessions with the model. By monitoring, you can learn a lot about how the chatbots and AI services are used, but you can also identify when a jailbreaking attempt is going on. This is where you can protect the user, the model, and the data that the organization holds. So, there are a couple of aspects here: handling the models, integrating them, and using them. For each, you have this traditional aspect and a new aspect. The traditional aspect that I covered is securing access to the environment, and the new aspects are securing the new machine identity and the human identities interacting with the model.
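A simple way to picture monitoring for credentials pasted into prompts is a filter that scans each prompt and redacts anything that looks like a secret before it is sent or logged. The Python sketch below is an assumption-laden illustration: the two patterns (the `AKIA` prefix used by AWS access key IDs, and generic `password=` style pairs) are far from complete, and the labels and function name are hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; a real deployment would need a broader set.
PATTERNS = [
    ("aws_access_key_id", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    (
        "key_value_secret",
        re.compile(r"\b(?:api[_-]?key|password|token)\s*[=:]\s*\S+", re.IGNORECASE),
    ),
]


def redact(prompt: str) -> Tuple[str, List[str]]:
    """Return the prompt with likely secrets masked, plus the labels that matched."""
    findings: List[str] = []
    for label, pattern in PATTERNS:
        if pattern.search(prompt):
            findings.append(label)
            prompt = pattern.sub(f"[REDACTED:{label}]", prompt)
    return prompt, findings


clean_text, clean_findings = redact("Why does my login form reload?")
masked_text, masked_findings = redact("debug this: password=hunter2 please")
```

The list of matched labels could also feed session monitoring, so that repeated hits from one user or service stand out.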
David Puner [00:30:35]: Lavi, you and your CyberArk Labs team have no lack of work.
Lavi Lazarovitz [00:30:40]: That is clear. And it is fun and exciting as well. You know, these things are super interesting. This technology is super exciting, and learning about the potential threats is really, really interesting. And obviously, innovating security there again—it’s exciting and fun work.
David Puner [00:30:55]: I’m glad you enjoy it. No, we’re glad you enjoy it. Thank you for what you do and what the team does. You guys work really hard, and it’s really, really interesting stuff that you’re doing at all times, really. Lavi Lazarovitz, Vice President of Cyber Research at CyberArk Labs, always great to speak with you. Thanks so much for coming on. I hope you enjoy your weekend, which I hope is about to start very soon.
Lavi Lazarovitz [00:31:25]: Yeah. Thank you. Thank you very much, David, for having me. It’s always a pleasure.
David Puner [00:31:30]: Thanks for listening to Trust Issues. If you liked this episode, please check out our back catalog for more conversations with cyber defenders and protectors. And don’t miss new episodes. Make sure you’re following us wherever you get your podcasts. And let’s see… Oh yeah. Drop us a line if you feel so inclined—questions, comments, suggestions, which, come to think of it, are kind of like comments. Our email address is trustissues—all one word—@cyberark.com. See you next time.