

  • It’s become commonplace for developers to leave data like API keys, tokens, and passwords in GitHub repositories.
  • There’s a problem with this. Misconfigurations are rife. Sensitive data is being left exposed to the public.
  • Secure GitHub by disabling forking, enabling MFA and using Polymer DLP to discover and secure sensitive data across your repositories. 

What software developer doesn’t love GitHub? This amazing—and free—platform is a fantastic way to share code files, connect with fellow developers and collaborate on projects seamlessly. 

It’s no wonder that the platform has over 32 million monthly users. But, unfortunately, many people aren’t using the platform securely. Just recently, security researchers scanned GitHub for sensitive information like API keys, private keys, certificates, usernames and passwords.

They found over two million sensitive records publicly exposed.

We’re definitely not pointing the finger at GitHub here. The platform itself is secure. Under the shared responsibility model of the cloud, providers like GitHub have a responsibility to secure the underlying infrastructure of their applications. It’s then up to customers to make sure they are using these platforms securely: configuring everything in the right way, controlling user privileges, implementing authentication mechanisms and so forth.

It’s here that things have gone a bit awry. 

Why is GitHub a security risk?

As with all cloud platforms, customers access GitHub via the internet, meaning they only need an internet connection, a browser and their login credentials to use the application. While this is great for productivity and flexibility, it also creates security risks. 

If developers don’t securely configure GitHub so that sensitive data is private, anyone on the internet could potentially find it. While GitHub has the security features to prevent this from happening, putting them into place can be difficult because: 

  • It takes technical know-how to securely configure repositories
  • Manually configuring these repositories is time-consuming and error-prone
  • The sheer volume of repositories means some are likely to be missed
  • Repositories are constantly being added and edited

Just last month, in fact, Toyota, the Japanese automotive brand, revealed that credentials it had unintentionally published on GitHub were left exposed for almost five years, enabling anyone who found them to potentially access sensitive customer information.

GitHub security best practices

No company wants to be the next Toyota. To overcome the risk of a GitHub data breach or leak, you need a strategy. With that in mind, here are security best practices to follow, specifically for GitHub.

No more forking 

Forking is a practice in GitHub that enables users to copy a repository and make edits to it, without altering the original project. While forking has its uses, it makes security much more complex, making it more difficult for your security team to keep track of sensitive data and validate its integrity. 

Moreover, forked repositories are known to be a huge trigger for data leakage. As an example, just imagine that you’ve got some IP data in one of your repositories. Should someone make a copy of it, that data begins to proliferate. It could be copied again and again and again, putting you at huge risk of exposure. This is especially true if one of the forked repos is set to public—anyone could access your company secrets! 
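To get a sense of how far a repository has spread, your security team could periodically review repository metadata. The sketch below is a minimal illustration, assuming metadata shaped like the repository objects returned by GitHub's REST API (with full_name, fork and private fields). The function name and sample data are hypothetical.

```python
from typing import Iterable


def find_public_forks(repos: Iterable[dict]) -> list[str]:
    """Return the names of repositories that are forks and publicly visible."""
    flagged = []
    for repo in repos:
        # A missing "private" field is treated as public so that
        # incomplete records still get reviewed by a human.
        if repo.get("fork", False) and not repo.get("private", False):
            flagged.append(repo["full_name"])
    return sorted(flagged)


# Hypothetical sample data, for illustration only.
sample_repos = [
    {"full_name": "acme/billing", "fork": False, "private": True},
    {"full_name": "jdoe/billing", "fork": True, "private": False},
    {"full_name": "acme/docs", "fork": False, "private": False},
]
print(find_public_forks(sample_repos))
```

A list like this is only a starting point for review. It does not tell you whether a given fork actually contains sensitive data.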

Enable multi-factor authentication 

With the rise of data breaches and surge in leaked credentials up for sale on the dark web, using a password alone to access corporate resources just isn’t enough anymore. So, make sure to enable multi-factor authentication for all users with GitHub accounts in your organization. 

It’s best to combine MFA with the principle of least privilege, whereby you ensure that your teams can only access the repositories they need to complete their roles, rather than having blanket access to the whole application. You can assign roles for each of your repositories with varying levels of control. 
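As a rough sketch of how you might check MFA coverage, the example below builds a request to GitHub's REST API for organization members who have not enabled two-factor authentication. It assumes the token belongs to an organization owner, since as far as we know the 2fa_disabled filter is only available to owners. The organization name and token are placeholders, and only the first page of results is fetched.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_2fa_disabled_request(org: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request listing members without 2FA."""
    url = f"{API_ROOT}/orgs/{org}/members?filter=2fa_disabled&per_page=100"
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


def members_without_2fa(org: str, token: str) -> list[str]:
    """Send the request and return the logins from the first page of results."""
    request = build_2fa_disabled_request(org, token)
    with urllib.request.urlopen(request) as response:
        return [member["login"] for member in json.load(response)]
```

Calling members_without_2fa("your-org", token) makes a live network request, so run it with a token that has only the access it needs.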

Don’t store credentials and PII in GitHub

It’s all too common for developers to store their credentials and other PII in GitHub for easy access. But leaving data like API keys, tokens and passwords in a repository is a huge risk. If a repository is shared with a third party, as repositories often are, or shared publicly, your data is at huge risk of compromise. In some cases, storing this data in a public GitHub repository is also a compliance violation, especially if the data is considered PII or PHI.

We most commonly see this error occur when developers insert their credentials into CI/CD assets to streamline their workflows. While this is undoubtedly convenient, it is troublesome from a security perspective. You see, even if a developer deletes the credentials later down the line, they’ll remain embedded in the GitHub repository history. Should a hacker manage to find this code, they could do all sorts of nefarious things.
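To illustrate why deleting a credential isn't enough, here is a minimal sketch that searches a repository's full commit history, including lines removed in later commits, for a few well-known secret formats. The patterns are illustrative assumptions rather than a complete rule set, the function names are hypothetical, and this is not how any particular DLP product works.

```python
import re
import subprocess

# Illustrative patterns only; a real scanner needs a far broader rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a pattern."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, rule))
    return findings


def scan_git_history(repo_path: str) -> list[tuple[int, str]]:
    """Scan the patches of all commits on all refs in a local clone.

    Line numbers refer to lines of the `git log` output, not to source files.
    """
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    ).stdout
    return scan_text(log)
```

Because git log -p prints removed lines as well as added ones, a key that was committed once and deleted later still shows up in the results. If that happens, treat the credential as compromised and rotate it.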

Putting in place a policy that mandates keeping sensitive data out of GitHub is the first step, but it’s not the finish line. Human error and user convenience mean that developers will likely, at some point or other, place sensitive information in GitHub. 

When this happens, you need a tool that can automatically discover, store and secure this information. Solutions like Polymer data loss prevention (DLP) find and secure sensitive data like PII, passwords, secrets, and Kubernetes keys in GitHub so you can prevent data exposure before code is deployed.  

Check third-party access and GitHub applications

GitHub is highly customizable; you can add tons of features to your repositories in order to boost functionality and make work easier. However, there need to be some checks and balances in place. Adding third-party tools without first verifying them increases the risk of compromise.

With that in mind, make sure to carefully review any add-ons before installing them, assessing factors like past reviews, news pieces and so on to gain an idea of the application’s credibility. As well as this, make sure you audit the application’s permissions and limit them to the bare minimum.
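One way to make a permissions audit repeatable is to compare what an application has been granted against an allowlist your team has agreed on. The sketch below assumes permissions are expressed as a mapping of scope name to access level ("read", "write" or "admin"), similar to how GitHub Apps describe their permissions. The function name, scopes and allowlist are hypothetical.

```python
# Access levels ordered from least to most powerful.
ACCESS_RANK = {"read": 1, "write": 2, "admin": 3}
UNKNOWN_RANK = max(ACCESS_RANK.values()) + 1


def excessive_permissions(granted: dict[str, str], allowed: dict[str, str]) -> dict[str, str]:
    """Return the granted permissions that exceed the agreed allowlist."""
    excess = {}
    for scope, level in granted.items():
        # Unrecognized levels are ranked highest so they are always flagged.
        granted_rank = ACCESS_RANK.get(level, UNKNOWN_RANK)
        # Scopes missing from the allowlist are allowed no access at all.
        allowed_rank = ACCESS_RANK.get(allowed.get(scope, ""), 0)
        if granted_rank > allowed_rank:
            excess[scope] = level
    return excess


# Hypothetical example: the app asks for more than the team agreed to.
granted = {"contents": "write", "metadata": "read", "issues": "read"}
allowed = {"contents": "read", "metadata": "read"}
print(excessive_permissions(granted, allowed))
```

Anything the function returns is a prompt for a conversation with the application owner, not an automatic verdict.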

Scan and review your repositories in real-time 

In the fast-moving GitHub landscape, getting a handle on misconfigurations can be tricky. So, rather than trying to correctly configure thousands of constantly changing repositories, we recommend focusing your attention on securing the data in GitHub itself.

You can do this through tools like Polymer DLP, which automatically scan your GitHub repositories for sensitive data in real-time. Using ready-to-go policy templates for regulations like HIPAA, PCI, GDPR and CCPA, Polymer quickly discovers vulnerable sensitive information in GitHub and monitors it.

If an unauthorized or suspicious user attempts to interact with this data, the tool will block all actions and send an alert to your IT team for further inspection. At the same time, it creates in-depth audit logs for seamless compliance reporting. 

Find out more about Polymer DLP for GitHub.

Polymer is a human-centric data loss prevention (DLP) platform that holistically reduces the risk of data exposure in your SaaS apps and AI tools. In addition to automatically detecting and remediating violations, Polymer coaches your employees to become better data stewards. Try Polymer for free.

