OpenAI is offering rewards of up to $20,000 in a bug bounty program to those who discover security flaws in its artificial intelligence systems, including the large language model ChatGPT.
The AI company said in an April 11 blog post announcing the program that rewards will be based on the severity and impact of reported issues, ranging from $200 for "low-severity findings" to $20,000 for "exceptional discoveries."
Bugcrowd, a bug bounty platform, has partnered with the company to handle the submission and reward process.
The announcement comes amid growing security concerns over ChatGPT, the company's widely used chatbot.
Last month, the company temporarily took ChatGPT offline after users reported a bug that allowed them to see the titles of other users' conversations. OpenAI attributed the exposure to a bug in an open-source library, the redis-py client. While the bug has since been patched, the company admitted that some users' payment information, including the last four digits of their credit card numbers and card expiration dates, may have been exposed.
Three days later, a Twitter user known as rez0 said he had found over 80 secret plugins for the ChatGPT API while hacking the system. In response to the finding, Gal Nagli, an active researcher on Bugcrowd's platform, said on Twitter that he would help the company "catch these edge-cases" in the future if it offered a paid bug bounty program.
The program awarded bounties for 14 vulnerabilities on its first day, with an average payout of $1,287.50. Approximately 75% of submissions were accepted or rejected within three hours, Bugcrowd data showed.
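Taken together, that works out to $18,025 in total payouts on day one (14 awards at an average of $1,287.50 each).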
OpenAI bug bounty program excludes rewards for model issues
In its detailed rules for participation, the company stressed that issues related to the content of model prompts and responses are "strictly out of scope" and will not be rewarded. Out-of-scope issues include jailbreaks and getting the models to say or do bad things.
Jailbreaking is the process of manipulating a system to bypass its restrictions, which can lead ChatGPT to produce unfiltered content. Earlier this year, jailbreakers got GPT-3.5 to produce slurs and hateful language by assigning it the role of a different AI model through a prompt known as Do Anything Now, or DAN, which typically instructed the chatbot to role-play as an unrestricted AI free to ignore OpenAI's content rules.
"While we work hard to prevent risks, we can't predict every day people will use or misuse our technology in the real world," the page read, suggesting users complete a separate feedback form to report those concerns.
In March, Greg Brockman, co-founder and president of OpenAI, hinted on Twitter at plans to start a bug bounty program or a network of red teamers, in response to a post written by Alex Albert, a 22-year-old jailbreak prompt enthusiast.
"Democratized red teaming is one reason we deploy these models," Brockman wrote.