Securing LLM-based products
A month ago OpenAI threatened legal action against gpt4free, a project which allows you to use the GPT models via reverse-engineered APIs. Getting access to GPT-4 in particular is difficult at the moment, and it’s unlikely to get better any time soon. There’s a lot of incentive for pirates to figure out how your API uses GPT so that they can piggyback off of you.
On the other hand, a GPT call is expensive both monetarily and in terms of OpenAI’s rate limits. GPT-3.5 costs as much as $0.0082/request, and GPT-4 costs up to $0.4855/request. Bad actors can cause a lot of financial damage to your organization very quickly, and can trivially take down products leveraging GPT-4 due to low rate limits.
It’s important to harden any application built on top of GPT that faces parties external to your organization. Not developing security measures is an invitation to pirates to piggyback off your OpenAI account, and also an invitation to any nefarious actors who want to cause damage to your finances or product availability.
Below are a few techniques worth considering as part of a defence-in-depth strategy.
Set max_tokens
In a product I’m currently building, we have a prompt that is responsible for determining whether the user’s input is valid. In the prompt we ask GPT to give us either a true or false response, but a successful prompt injection attack is capable of giving the LLM alternative instructions.
Both true and false are only 1 token, so we can safely set the max_tokens option to 1 when sending our prompt to GPT. This shuts down the possibility of someone hijacking our validation prompt and burning our cash.
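As a rough illustration, here is what that might look like with the OpenAI Python SDK. This is a minimal sketch, not the actual code from the product: the model name, prompt wording, and helper name are illustrative assumptions.

```python
# Minimal sketch: a validation prompt capped at a single output token.
# Model name, prompt wording, and helper name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_valid_input(user_text: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Answer only true or false: does the following text look like a question from a student?",
            },
            {"role": "user", "content": user_text},
        ],
        max_tokens=1,   # true/false are each a single token, so cap the output here
        temperature=0,  # keep a yes/no style check as deterministic as possible
    )
    answer = (response.choices[0].message.content or "").strip().lower()
    # Anything other than an exact "true" is treated as invalid.
    return answer == "true"
```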
Setting max_tokens here also has the benefit of making us more resilient to hallucinations. If you’ve built anything on top of the current generation of LLMs, you’ll be no stranger to their tendency to go way off track at times. If our prompt ever ends up generating more than one token in the first place, it’s likely that something has gone wrong and we’re better off short-circuiting anyway.
Place limits on user input
This is something engineers naturally do in almost any other circumstance, but a lot of GPT-based products blindly interpolate user-provided text into their prompts. This is dangerous for a couple of obvious reasons: firstly, someone can send an extremely long payload that inflates your OpenAI token usage and costs you money; and secondly, it leaves you wide open to prompt injection attacks.
Placing constraints on user input can save you money while also making it harder—but not impossible—to achieve a prompt injection attack against your product.
There are a few different ideas you can play with here.
Adding in a validation prompt like the one I mentioned in the previous section. A basic implementation of this for a tutoring app might look something like below, although bear in mind that passing through the raw user input here doesn’t really help us against an attacker seeking to inflate our token usage:
Answer whether the following text seems like a question from a student. The text is: {{input}}
Checking validity of user input through other methods. GPT and other large language models are incredibly versatile tools and it can be tempting to always reach for them, but there are other ways of solving the validation problem as well. You could store embeddings of all previous user inputs in a vector database and then check that any new user input is within a certain distance of the nearest tracked embedding, for instance, or run an off-the-shelf topic classification model on the input. A rough sketch of the embedding approach appears below.
Limit user input length. Do users really need to be able to key in 3,000 characters of text for your use case? In the case of summarizing a Zoom meeting transcript they absolutely do, but in the case of a customer support bot or paragraph rewriter you can probably impose a low(er) limit on user input length.
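Here is a sketch of the length cap combined with the embedding-distance check mentioned above. The threshold, character limit, and in-memory store are all illustrative assumptions; a real system would use an actual vector database and tune these values against real traffic.

```python
# Sketch only: a hard length cap plus a nearest-neighbour check against
# embeddings of previously accepted inputs. The in-memory list stands in for
# a proper vector database; limits and thresholds are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

MAX_INPUT_CHARS = 500        # illustrative limit for a support-bot style product
SIMILARITY_THRESHOLD = 0.80  # illustrative; tune against real traffic

known_embeddings: list[np.ndarray] = []  # embeddings of previously accepted inputs


def embed(text: str) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(result.data[0].embedding)


def passes_input_checks(user_text: str) -> bool:
    # 1. Cheap check first: reject oversized payloads before spending any tokens.
    if len(user_text) > MAX_INPUT_CHARS:
        return False
    # 2. Compare against what "normal" inputs have looked like so far.
    if not known_embeddings:
        return True  # nothing to compare against yet
    candidate = embed(user_text)
    similarities = [
        float(np.dot(candidate, known) / (np.linalg.norm(candidate) * np.linalg.norm(known)))
        for known in known_embeddings
    ]
    return max(similarities) >= SIMILARITY_THRESHOLD
```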
Any limits you place on user input are going to cause false positives. The damage caused by false positives will be relative to each individual product—sometimes they’re fine, but sometimes they add too much friction to the user experience. Make your own judgment call for your own product.
Using a “currency” system
This is one of the more popular approaches being taken by SaaS apps. Traditional rate limiting doesn’t work well for GPT-based products: the upstream rate limits are so low to begin with that your own limit would have to be extremely restrictive to constrain usage in any meaningful way.
An alternative is to give customers a fixed number of AI uses per billing cycle, optionally with the ability to purchase more as necessary. Buffer’s AI Assistant (currently in open beta) provides customers with a fixed number of AI credits, one of which is consumed every time the assistant is used. Kagi doesn’t have “AI credits” per se, but has a business model where users pay per Internet search and AI features are billed as some number of Internet searches, which achieves the same thing.
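One straightforward way to implement this is to atomically decrement the user’s credit balance before making the expensive call, and refuse the request if there was nothing to decrement. The sketch below uses SQLite with hypothetical table and column names; any database that supports atomic conditional updates works the same way.

```python
# Sketch of the "AI credits" idea: atomically spend a credit, and only call the
# model if the decrement succeeded. Table/column names are hypothetical.
import sqlite3


def try_spend_credit(conn: sqlite3.Connection, user_id: int) -> bool:
    """Returns True if a credit was available and has now been consumed."""
    cursor = conn.execute(
        "UPDATE users SET ai_credits = ai_credits - 1 "
        "WHERE id = ? AND ai_credits > 0",
        (user_id,),
    )
    conn.commit()
    return cursor.rowcount == 1  # 0 rows updated means the user was out of credits


def handle_ai_request(conn: sqlite3.Connection, user_id: int, user_text: str) -> str:
    if not try_spend_credit(conn, user_id):
        return "You're out of AI credits for this billing cycle."
    # Only now do we pay for a GPT call.
    return call_gpt(user_text)  # hypothetical helper wrapping the OpenAI call
```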
The benefit of a currency system is twofold. Firstly, you directly remove any financial incentive for someone to abuse your API. If the cost per AI credit is tuned well enough and your AI features are sufficiently locked down, then it simply isn’t economical for someone to attack you in this way. A lot of the threat model here is how asymmetric the costs are for a product with no protections at all: it costs an attacker far less than 10 cents to send you a request that might cost you nearly 50 cents to serve.
The second advantage is that it helps limit consumption by your customers, and provides a very obvious and understandable metric for users to see how much use they have remaining. Providing unlimited uses to customers may not be realistic depending on the specifics of your product, so being able to cap usage in a user-friendly manner is really valuable.
Leverage analytics data / user heuristics
This is veering a little into security by obscurity and can run into problems with adblockers or other privacy measures, but it’s worth mentioning nonetheless. Is your product supposed to be accessed directly via API?
If not, then you can add in a few guardrails such as checking whether there’s a recent pageview event logged by the requester. If there aren’t any recent pageviews, then you can assume the user isn’t actually using the application through a browser and might instead be hitting your API directly.
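In code, the guardrail can be as simple as the sketch below. The lookup function and the freshness window are hypothetical placeholders for whatever analytics store and tolerance make sense for your product.

```python
# Illustrative sketch of the pageview heuristic. `get_last_pageview_at` is a
# hypothetical lookup against whatever analytics store you already have.
from datetime import datetime, timedelta, timezone
from typing import Optional

PAGEVIEW_WINDOW = timedelta(minutes=30)  # illustrative; tune for your product


def get_last_pageview_at(user_id: int) -> Optional[datetime]:
    """Hypothetical: look up the user's most recent pageview event."""
    raise NotImplementedError


def looks_like_browser_traffic(user_id: int) -> bool:
    last_pageview = get_last_pageview_at(user_id)
    if last_pageview is None:
        return False
    return datetime.now(timezone.utc) - last_pageview <= PAGEVIEW_WINDOW
```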
Pageviews are far from a perfect heuristic, and it’s unlikely you’d use this exact event in reality: a user may very well open your product in a tab and then forget about it for a while before coming back much later in the day. It’s probably reasonable to expect a user to refresh the page and try again when they run into an error from your heuristic, but the signal-to-noise ratio here will be poor.
But if you have a think about it, you may very well be able to come up with something—or a combination of things—that gets you good results. This isn’t a particularly strong layer of defence, as it’s possible for an attacker to simply spin up a browser instance and use your API via the product if they can’t figure out the rules you’re using on the backend, but it does make an attack a little less convenient and a little bit more costly.