‘Frontier data’ from SaaS: An enterprise’s competitive advantage in leveraging AI

Introduction

Today’s enterprises sit on a goldmine of data, much of which is embedded within the SaaS platforms they rely on: Slack, Google Workspace, Office365, Salesforce, and Zendesk. These organizations are taking a page from Alexandr Wang, founder of Scale AI, and his pronouncement that “human data is key to AI and frontier data can transform businesses”.

Data within collaborative SaaS apps is a good proxy for what really matters to an organization and for how employees exchange this knowledge among themselves and across their customer and partner base. This data, which drives day-to-day operations, represents the next wave of “frontier data.” But are enterprises harnessing its full potential? For CTOs, CIOs, and CISOs, tapping into this frontier offers both a competitive advantage and new challenges in security, governance, and scalability.

What is frontier data for a business?

The term “frontier data” can be understood as the untapped, often unstructured data embedded within various SaaS platforms where critical workflows happen. Unlike traditional structured datasets stored in enterprise data lakes, this frontier data is scattered across different applications and includes everything from customer interactions in Salesforce to internal communications on Slack.

Why it matters:
- This data is rich in insights, but is often overlooked in enterprise data strategies.
- Frontier data represents an untapped asset that could provide deep organizational insights, optimize operations, and drive innovation.

Benefits of using frontier data in LLMs

Frontier data opens up numerous use cases that go beyond simple chatbots:

Contextual business insights:
- Enhanced decision-making: LLMs trained on frontier data can process unstructured information from internal communications, customer interactions, and workflows to provide more contextually relevant insights. This allows leadership teams to make informed decisions quickly.
- Cross-functional intelligence: By aggregating data from different platforms, LLMs can break down silos, offering a holistic view of organizational performance and customer interactions.
Operational automation:
- Automating routine tasks: Frontier data allows LLMs to automate repetitive tasks, such as analyzing support tickets or categorizing internal communications, leading to increased productivity and reduced operational overhead.
- Streamlining workflows: Integrating data from multiple SaaS tools enables seamless automation of complex workflows, improving efficiency across departments like customer service, HR, and IT.
Proactive insights and recommendations:
- Personalized customer engagement: LLMs can analyze frontier data to deliver real-time recommendations for customer service, product suggestions, and marketing strategies based on historical behavior and patterns.
- Predictive maintenance and forecasting: By leveraging unstructured frontier data from operational systems, LLMs can forecast trends, identify potential issues, and suggest preventive actions, reducing downtime and optimizing resource allocation.
Data-driven innovation:
- New product development: LLMs can mine frontier data to identify unmet customer needs, trends, or feedback that inform product innovation, helping organizations stay ahead of the competition.
- Tailored employee solutions: Internal communications from Slack or email can be analyzed to identify opportunities for improving employee experiences, such as streamlining processes or identifying training needs.
Compliance and risk management:
- Regulatory compliance: LLMs can track and classify sensitive data across SaaS platforms, helping organizations remain compliant with regulations like GDPR or HIPAA.
- Risk mitigation: By continuously analyzing frontier data, LLMs can identify security threats.

Challenges of extracting frontier data from SaaS applications

While the potential of frontier data is immense, CTOs and CIOs need to confront specific challenges associated with extracting and managing it:

Data silos:
- These datasets usually sit isolated from each other. A ticketing system might be in sync with GitHub yet lack the latest communication from a customer.
- These datasets are rarely ‘cracked’ into a database and are likely to be unclassified, making any analysis difficult.
Unstructured nature:
- Frontier data is often unstructured (emails, chat logs, tickets), making it difficult to analyze using traditional methods.
- Leveraging AI and machine learning to extract meaningful insights is key.
Scalability:
- As enterprises scale, the volume of frontier data grows rapidly. Handling this growth requires flexible, scalable infrastructure, from storage solutions to real-time analytics tools.

Security and governance: The CISO perspective

With the rise of frontier data, security and governance challenges intensify:

Data security:
- SaaS applications often house sensitive information. For CISOs, ensuring this data is protected both in transit and at rest is crucial.
- This involves safeguarding against insider threats, enforcing strict access controls, and operationalizing a data security program.
Compliance and governance:
- As frontier data is spread across various applications, maintaining governance and ensuring compliance with regulations like GDPR or CCPA can be complex.
- Automated tools that track and manage compliance across SaaS ecosystems have become essential.
Data privacy:
- Balancing the extraction of valuable data insights with user privacy is key. Enterprises need strategies that anonymize or tokenize sensitive information without losing analytic value.
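
One way to tokenize sensitive values while keeping their analytic value is keyed hashing: the same input always maps to the same token, so counts and groupings still work, but the original value is not readable from the token. The snippet below is a minimal sketch under stated assumptions. It covers email addresses only, and the key, the regular expression, and the function names are illustrative rather than taken from any particular product.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration; a real deployment would load this
# from a secrets manager and rotate it under a defined policy.
SECRET_KEY = b"example-key-not-for-production"

# Deliberately simple email pattern; it will not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize_email(match: re.Match) -> str:
    """Replace one matched email address with a stable keyed-hash token."""
    normalized = match.group(0).lower().encode("utf-8")
    digest = hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()
    return f"<EMAIL_{digest[:12]}>"


def tokenize_text(text: str) -> str:
    """Tokenize every email address found in the text."""
    return EMAIL_RE.sub(tokenize_email, text)


if __name__ == "__main__":
    print(tokenize_text("Escalated by jane@example.com, cc JANE@example.com"))
```

Because the address is lowercased before hashing, both spellings in the usage line map to the same token, so an analyst can still see that one person appears twice without seeing who it is.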

Building the frontier data foundry

Just as traditional data foundries were created to harness structured data, enterprises now need to build frontier data foundries to extract value from their SaaS environments. Key steps in this process include:

Data integration platforms:
- Deploying solutions that aggregate data from SaaS platforms into a central repository, enabling cross-application insights.
AI and ML for unstructured data:
- Utilizing advanced AI tools capable of processing and analyzing unstructured data in real time.
Real-time analytics:
- Implementing analytics tools that provide real-time insights from SaaS-based data to enable faster decision-making.
Data classification:
- A traditional approach requires scanning and tagging documents and other data sets before even embarking on this AI journey. This approach works but can slow down the AI journey, as witnessed in the MDM and cloud migration journeys of the last decade.
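
The integration and classification steps above can be sketched as follows: records from different SaaS sources are mapped into one common shape, then tagged by simple rules. This is a rough illustration, not a production classifier. The input field names (`user`, `text`, `requester`, `description`) are hypothetical and do not reflect the actual payloads of any SaaS API, and the two patterns are deliberately narrow.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """Common shape for content pulled from any SaaS source."""
    source: str
    author: str
    text: str
    tags: set = field(default_factory=set)


# Illustrative patterns only; real classification needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def from_chat(msg: dict) -> Record:
    """Map a hypothetical chat message payload to a Record."""
    return Record(source="chat", author=msg["user"], text=msg["text"])


def from_ticket(ticket: dict) -> Record:
    """Map a hypothetical support ticket payload to a Record."""
    return Record(
        source="ticket",
        author=ticket["requester"],
        text=ticket["description"],
    )


def classify(record: Record) -> Record:
    """Add a tag for every pattern that appears in the record text."""
    for label, pattern in PATTERNS.items():
        if pattern.search(record.text):
            record.tags.add(label)
    return record


if __name__ == "__main__":
    rec = classify(from_chat({"user": "u1", "text": "my SSN is 123-45-6789"}))
    print(rec.source, sorted(rec.tags))
```

Tagging at ingest time, rather than in a separate scan of the whole corpus first, is one way to avoid the up-front delay described above.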

GenAI for frontier data via Polymer’s secure RAG

Polymer’s secure RAG is designed as a secure gateway for connecting enterprise frontier data with Google Gemini and other LLMs. It is a runtime data security approach that filters embeddings based on the user’s group, ensuring that no sensitive data is leaked to unauthorized individuals.

This framework is built on Polymer’s battle-tested data loss prevention platform that is securing millions of data assets in real time across some very large organizations.

Enhanced data retrieval:
- Polymer’s RAG is designed to intelligently pull data from across SaaS platforms, ensuring that the data used to train or power Gemini models is relevant, up-to-date, and securely retrieved.
- The system integrates seamlessly with Slack, Office365, Salesforce, and other SaaS ecosystems, extracting the necessary data for insights while minimizing exposure.
Data security by design:
- Unlike traditional models, Polymer’s RAG embeds security and privacy controls into every step of the data retrieval and processing pipeline. Data classification is part of the initial LLM-ingest process and is applied not only at the data asset level but also to the embeddings.
- Polymer’s secure RAG restricts results along the following dimensions:
  1. Business context and data classification at the document level
  2. Embeddings’ data classification and its sensitivity
  3. User (prompter) group and data access rights
Scalability and compliance:
- As the volume of frontier data grows, Polymer’s RAG offers a scalable infrastructure that can grow alongside it. More importantly, it ensures that data handling remains compliant with industry regulations like GDPR and HIPAA.
- Whether tokenizing sensitive information or setting up audit trails, Polymer’s secure RAG empowers enterprises to confidently deploy Gemini and other LLMs while staying compliant with data privacy laws.
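
To make the three restriction dimensions listed under “Data security by design” concrete, the sketch below filters retrieved chunks before they reach the prompt. It is not Polymer’s implementation. The `Chunk` and `User` types, the three-level sensitivity scale, and the rule that a user must clear both the document-level and the chunk-level classification are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical sensitivity scale: a higher number means more restricted.
SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2}


@dataclass(frozen=True)
class Chunk:
    text: str
    doc_classification: str    # dimension 1: classification of the parent document
    chunk_sensitivity: str     # dimension 2: classification of this chunk/embedding
    allowed_groups: frozenset  # groups with access rights to the source document


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset          # dimension 3: the prompter's group memberships
    clearance: str


def is_visible(chunk: Chunk, user: User) -> bool:
    """A chunk is visible only if all three checks pass."""
    max_level = SENSITIVITY[user.clearance]
    return (
        SENSITIVITY[chunk.doc_classification] <= max_level
        and SENSITIVITY[chunk.chunk_sensitivity] <= max_level
        and bool(chunk.allowed_groups & user.groups)
    )


def filter_chunks(chunks: list, user: User) -> list:
    """Drop retrieved chunks the user may not see, preserving order."""
    return [c for c in chunks if is_visible(c, user)]


if __name__ == "__main__":
    alice = User("alice", frozenset({"support"}), "internal")
    retrieved = [
        Chunk("Public FAQ", "public", "public", frozenset({"support", "sales"})),
        Chunk("Card number in ticket", "internal", "confidential", frozenset({"support"})),
    ]
    print([c.text for c in filter_chunks(retrieved, alice)])
```

Checking the chunk separately from its parent document matters because a single sensitive passage can sit inside an otherwise ordinary document, as in the second chunk of the usage example.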