
Summary

  • With over 180 million users by March 2024, ChatGPT is a leading generative AI tool known for its usability and accuracy.
  • Yet, ChatGPT poses significant data security risks due to data leakage and vulnerabilities.
  • To safeguard against these risks, organizations must:
    • Enforce multi-factor authentication
    • Establish clear usage policies
    • Implement active learning solutions
    • Invest in data exposure prevention technology for ChatGPT

In the year and a half since its launch, ChatGPT has become one of the most popular generative AI applications in the world, amassing over 180 million users as of March 2024.

What sets ChatGPT apart is its usability and accuracy. Through a simple prompt interface, the tool can address a diverse range of queries across a multitude of domains, rendering it a frontrunner in the chatbot landscape.

With Gartner forecasting that 80% of enterprises will have adopted similar applications by 2026, generative AI’s potential to enhance business productivity and efficiency cannot be overstated.

However, when it comes to ChatGPT data security, there is still much to be concerned about. In the short time since its release, there have already been several high-profile data leaks, and numerous vulnerabilities have been discovered.

To help organizations accelerate ChatGPT adoption while avoiding its security pitfalls, we’ll dive into the platform’s must-know risks and the recommended mitigation strategies.

How does ChatGPT work?

ChatGPT, an innovation from OpenAI, harnesses advanced natural language processing techniques to generate human-like text responses based on given input. 

Simply put, the tool crafts responses that read like human-written sentences. It uses artificial intelligence and supervised learning to analyze vast amounts of internet data and produce suitable outputs.

Through continuous feedback, ChatGPT refines its responses to align with user needs. For instance, if it consistently receives unfavorable feedback on certain responses, the AI autonomously adjusts its approach to better address similar queries in the future, enhancing its question-answering capabilities over time.

Furthermore, ChatGPT boasts a distinctive feature compared to other chatbots: its ability to retain conversation context. 

This enables it to draw insights from previous interactions and integrate them into ongoing discussions, fostering a natural conversational flow. Users can also request revisions or clarifications, which helps to create an authentic conversational experience.

Common security risks associated with ChatGPT 

As a nascent technology, ChatGPT introduces several security risks that organizations must contend with. The major ones are as follows.

Accidental data leakage 

Generative AI tools, such as ChatGPT, operate by analyzing user queries. While queries containing public information pose minimal risk, those entailing sensitive data, such as personally identifiable information (PII) or proprietary source code, raise concerns regarding potential data leakage.

This is because ChatGPT works by continuously learning from the data users feed it. Consequently, any confidential information entered into the system becomes ingrained within its neural network and could be regurgitated to other users at any time, triggering a data breach.

Compliance issues

Aside from the immediate consequences of a data breach, inputting sensitive data into generative AI introduces several compliance challenges that could lead to GDPR and CCPA fines. 

This is because generative AI models like ChatGPT are inherently difficult to monitor. Once data is ingested, it becomes extremely challenging for organizations to track its whereabouts and recipients, hindering efforts to minimize data, anonymize it, or erase it as needed.

Data theft

ChatGPT is accessible with just a username and password, and as we know, many people reuse passwords across accounts.

Should ChatGPT credentials get into the wrong hands, malicious actors could copy whatever information is contained in users’ queries. 

Vulnerable code 

Generative AI platforms are vulnerable to replicating flawed code commonly found in open-source repositories. 

One notable concern is “package hallucinations,” where platforms such as ChatGPT or GitHub’s Copilot suggest packages that don’t exist, or generate source code whose dependencies are either fictitious or pose security threats.
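As a minimal illustration of one defense against hallucinated dependencies, the sketch below checks whether a package name suggested by an AI assistant is actually published on PyPI before it is installed. It calls PyPI’s public JSON API; the package name shown is only an example, and a package’s mere existence doesn’t prove it is safe, since typosquatted lookalikes exist too.

```python
import sys

import requests


def package_exists_on_pypi(name: str) -> bool:
    """Return True if the given package name is published on PyPI."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200


if __name__ == "__main__":
    # Example: python check_package.py some-suggested-package
    suggested = sys.argv[1] if len(sys.argv) > 1 else "requests"
    if package_exists_on_pypi(suggested):
        print(f"'{suggested}' exists on PyPI. Still review its maintainers and release history before installing.")
    else:
        print(f"'{suggested}' was not found on PyPI and may be a hallucinated dependency.")
```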

Supply chain issues

Supplier contracts have become more rigorous over the past few years, with numerous organizations requiring certifications and evidence of security measures prior to establishing partnerships.

In this sense, tools like ChatGPT and others pose challenges to supply chain risk management for both suppliers and customers. 

For example, if an employee inadvertently shares sensitive client data with these tools, it could breach existing contracts and result in legal consequences.

Application vulnerabilities

Like all software, ChatGPT is vulnerable to flaws that could leave user data exposed. Far from being a hypothetical issue, there have been several instances where security researchers have discovered ChatGPT vulnerabilities, which we will explore in more detail in the next section. 

Unique vulnerabilities identified by researchers 

The emergence of generative AI models introduces a fresh landscape of potential security risks, characterized by unique attack avenues such as data poisoning, model evasion, and model extraction.

Traditionally, malicious actors needed proficiency in programming languages like Go, JavaScript, or Python to execute successful attacks. However, with large language models (LLMs), adversaries can exploit vulnerabilities through adept instruction and prompting alone.

A recent security assessment by IBM researchers underscored the vulnerability of LLMs. 

Through extensive testing, they demonstrated the susceptibility of these models to manipulation, resulting in the disclosure of sensitive user data, compromised code generation, creation of malicious scripts, and issuance of subpar security advice.

Moreover, security experts recently uncovered two additional vulnerabilities affecting ChatGPT. Firstly, during the installation of new plug-ins, ChatGPT redirects users to plug-in websites for code approval. 

This vulnerability could be exploited by attackers to deceive users into approving malicious code, leading to unauthorized plug-in installation and potential account compromise.

Secondly, PluginLab, a framework for plug-in development, lacks proper user authentication. This flaw allows attackers to impersonate users and carry out account takeovers, as demonstrated with the “AskTheCode” plug-in, which connects ChatGPT with GitHub.

Case studies: Real-world impacts of ChatGPT vulnerabilities 

While the above vulnerabilities were discovered by well-intentioned security researchers, there have also been several troubling instances where malicious actors and unwitting employees exposed sensitive information via ChatGPT. 

Here is an overview of the most prominent incidents. 

  • In 2023, Samsung’s semiconductor division allowed engineers to use ChatGPT to check source code. However, this led to three separate instances in which employees accidentally shared sensitive information with the platform, resulting in a data leak and widespread media coverage.
  • In 2023, it was reported that over 100,000 ChatGPT users had their data stolen through credential compromise attacks targeting the application.
  • In 2023, OpenAI confirmed ChatGPT suffered a data breach after a bug allowed the personal data of ChatGPT Plus subscribers, and their queries, to be seen by other users.

Shadow AI: Why you shouldn’t ban ChatGPT

In light of the data leakage risks posed by ChatGPT, some organizations have outright banned the tool. However, this leads to a new issue: Shadow AI. 

Much like its predecessor, Shadow IT, Shadow AI involves employees engaging with ChatGPT and similar generative AI applications without the approval of the IT department.

When employees engage with these tools without proper authorization and training, they often fail to understand how their data is used. In the case of ChatGPT, for example, input prompts and responses are essential for refining and training the platform’s AI models.

Consequently, any data inputted into these systems holds the potential to reappear as output when another user submits a prompt. 

This becomes particularly concerning considering a study conducted at the beginning of 2023, which found that employees within a single organization shared confidential business information with ChatGPT over 200 times in a week.

In short, this means banning generative AI won’t improve data security. In fact, it could exacerbate the problem. 

Harnessing ChatGPT securely: Strategic recommendations 

As McKinsey research shows, generative AI’s potential to boost workforce productivity and efficiency is astounding. Organizations that resist or prohibit the technology risk getting left behind and becoming obsolete.

However, the issues of data security and compliance cannot be ignored. With the average data breach costing over $4 million in 2023, companies that fail to adopt generative AI securely could also find themselves in a very sticky position.

Thankfully, there is a way to harness ChatGPT’s potential, while also being mindful of security.

Here are the steps to take. 

Enforce multi-factor authentication

As we’ve discovered, many of today’s malicious actors breach enterprise ChatGPT accounts through simple account hijacking attacks, using stolen passwords to log in as employees and steal data.

A foundational defense against this sort of incident is enforcing multi-factor authentication, which requires users to present at least two forms of authentication, such as a password plus a code sent to their phone or email, to log in to their account.
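In practice, organizations usually enforce MFA through their identity provider or single sign-on settings rather than custom code. Purely to illustrate how a second factor works, the sketch below uses the open-source pyotp library (an assumed choice of tooling, not something specific to ChatGPT) to provision and verify a time-based one-time code that would be required alongside a password.

```python
import pyotp

# Each user receives a unique secret when enrolling in MFA; the server stores it.
secret = pyotp.random_base32()
totp = pyotp.TOTP(secret)

# This URI is typically rendered as a QR code for the user's authenticator app.
print(totp.provisioning_uri(name="alice@example.com", issuer_name="ExampleCorp"))

# At login, after the password check, the user types the 6-digit code from their app.
submitted_code = totp.now()  # stand-in for the code the user would actually enter
print("Second factor accepted:", totp.verify(submitted_code))
```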

Create a generative AI usage policy 

Help your employees understand the risks of sharing sensitive data with ChatGPT and other generative AI apps by establishing a clear generative AI usage policy. 

This should explain which types of data sharing are prohibited and which are allowed, along with the reasoning, in layman’s terms.

  • Prohibited activities: It’s imperative to strictly prohibit certain actions, such as employing ChatGPT to analyze confidential company or client documents or to evaluate sensitive company code.
  • Authorized activities requiring approval: Some applications may be permissible under specific circumstances with the authorization of a designated authority. For example, the generation of code using ChatGPT might be sanctioned, but only if an expert evaluates and approves it before implementation.
  • Permitted activities: Certain tasks may be allowed without prior authorization, like utilizing ChatGPT to generate administrative internal content such as brainstorming ideas for icebreakers for new hires.

Roll out active learning

Human error accounts for over 90% of data breaches. Even with well-intentioned policies in place, your people will likely still accidentally share sensitive data with ChatGPT. 

To reduce data leakage, support your employees further through real-time active learning. New-age, cloud-based security training tools can embed directly into apps like ChatGPT and Slack. 

Using real-time analytics, these active learning tools will offer ‘nudges’ to users when they violate your data protection or compliance policies. 

For example, if an employee tries to share intellectual property with ChatGPT, the training widget will pop up and block the action, explaining to the user why this act of data sharing is deemed risky. 

Our research has found that these training nudges can reduce repeat offenses of sensitive data sharing by over 40% in just one week.  
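Production tools like the ones described above rely on NLP and real-time analytics, but as a rough sketch of the nudge concept, the snippet below screens a prompt against a few illustrative regular expressions before it is sent and tells the user why the action was blocked. The patterns, policy wording, and function names are assumptions for demonstration only.

```python
import re

# Illustrative patterns only; real solutions use contextual NLP rather than bare regex.
SENSITIVE_PATTERNS = {
    "US Social Security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def nudge_before_send(prompt: str) -> bool:
    """Return True if the prompt may be sent; otherwise block it and explain why."""
    findings = [label for label, pattern in SENSITIVE_PATTERNS.items() if pattern.search(prompt)]
    if findings:
        print("Blocked: this prompt appears to contain " + ", ".join(findings)
              + ". Sharing this data with ChatGPT violates the company's usage policy.")
        return False
    return True


if __name__ == "__main__":
    nudge_before_send("Summarize the contract for john.doe@example.com, SSN 123-45-6789.")
```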

Scan, detect and protect 

Ideally, your active learning solution will be paired with a ChatGPT-specific data exposure prevention solution. These AI-enhanced security solutions use an agentless architecture to integrate easily and swiftly with ChatGPT in your organization.

Once connected, these tools use natural language processing (NLP) and automation to autonomously scan your ChatGPT environment bi-directionally, redacting and blocking sensitive data sharing in real-time. 

Moreover, thanks to NLP-driven contextual awareness, these tools are much more accurate than traditional data loss prevention (DLP) solutions. 

While archaic DLP might mistake a reference number for a credit card string, data exposure prevention tools automatically analyze the context of every message to ensure precision and minimal disruption to employee productivity. 
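To make the contrast concrete, here is a minimal sketch (not the NLP engine described above, just an assumed illustration) that flags a digit sequence as a likely payment card number only when it also passes the Luhn checksum, so a similarly long order or reference number is far less likely to produce a false positive.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: genuine card numbers pass it; most arbitrary reference numbers do not."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def likely_card_numbers(text: str) -> list:
    """Find 13-19 digit sequences that also satisfy the Luhn check."""
    hits = []
    for candidate in re.findall(r"\b(?:\d[ -]?){13,19}\b", text):
        digits = re.sub(r"\D", "", candidate)
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(candidate.strip())
    return hits


if __name__ == "__main__":
    print(likely_card_numbers("Order reference 4000123412341234121"))  # fails Luhn, so not flagged
    print(likely_card_numbers("Card on file: 4111 1111 1111 1111"))    # Luhn-valid test number, flagged
```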

The role of developers and AI specialists

ChatGPT operates under the cloud shared responsibility model. As a customer, it’s your responsibility to protect user roles and data, which you can do by following the above steps.

When it comes to underlying platform vulnerabilities, however, the responsibility lies firmly in OpenAI’s hands. While several vulnerabilities have been discovered in ChatGPT so far, this is to be expected of any software to some extent.

What’s more important is how regularly a platform provider releases updates and patches for these vulnerabilities, which OpenAI excels at. 

Plus, with so much attention on this innovative platform, the security research community is near-constantly testing it and its associated applications for bugs, which helps to boost security.

Future outlook: Evolving security in the age of AI chatbots 

With ChatGPT still in its infancy, many organizations are still grappling with the security and compliance risks associated with generative AI.

One thing is for sure: we have not heard the last of ChatGPT-related data breaches. As with all new tools, ChatGPT involves a steep security learning curve. While some organizations have been quick to fortify their defenses, others are still using ChatGPT without even an acceptable usage policy.

Over time, awareness of the data security issues surrounding generative AI will grow. In tandem, more security providers will release AI-based security controls, like Polymer’s Active Learning and Data Exposure tool for AI, which directly combat the security risks associated with ChatGPT. 

Conclusion

Ultimately, ChatGPT represents something of a paradox for the modern enterprise. On the one hand, using generative AI is vital to competitiveness and productivity. On the other, ChatGPT could lead to financially harrowing data leaks and breaches.

The answer isn’t to avoid using ChatGPT altogether, but to invest in security processes and controls that enable employees to unleash the tool’s potential, securely. 

By writing an acceptable usage policy and investing in an AI-infused active learning and data protection tool, organizations can harness the power of ChatGPT while boosting security and compliance.

FAQ

  • Are there any risks to using ChatGPT? Like all software, ChatGPT is vulnerable to bugs that could undermine data security. A specialist data security solution for generative AI tools can remediate these risks. 
  • Is ChatGPT safe from hackers? ChatGPT is secure if you make it so. Use multi-factor authentication to limit the likelihood of credentials compromise. 
  • How secure is ChatGPT? ChatGPT is safe to use for simple queries, but you should not share sensitive information with the platform. 
  • Will ChatGPT leak my data? ChatGPT’s large language models ingest and can regurgitate the data you feed them, so don’t share sensitive information with these tools.

Polymer is a human-centric data loss prevention (DLP) platform that holistically reduces the risk of data exposure in your SaaS apps and AI tools. In addition to automatically detecting and remediating violations, Polymer coaches your employees to become better data stewards. Try Polymer for free.
