



Summary

Serverless architectures are application designs that incorporate third-party “Backend as a Service” (BaaS) services, and/or that include custom code run in managed, ephemeral containers on a “Functions as a Service” (FaaS) platform. Essentially, your data heavy-lifting is done via virtual services that can start up as needed and then shut down.
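As a sketch of the FaaS half of this definition: a function is just a small handler that the platform spins up on demand and tears down afterward. The AWS Lambda-style handler below is illustrative only; the event shape and field names are assumptions, not a real API contract:

```python
import json

def handler(event, context):
    """Lambda-style entry point: invoked on demand in an ephemeral
    container, then frozen or torn down. No server to manage."""
    # Assume the event carries a list of raw records to process.
    records = event.get("records", [])
    total = sum(r.get("amount", 0) for r in records)
    return {
        "statusCode": 200,
        "body": json.dumps({"processed": len(records), "total": total}),
    }
```

Locally, invoking `handler({"records": [{"amount": 5}, {"amount": 7}]}, None)` returns a response reporting two records processed; on a FaaS platform, the same code runs only when an event arrives.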

Data environment on the cloud

A typical Data Analytics/Data Lake/Data Warehouse environment roughly has the following architecture.

Functional view of a cloud data lake

This provides fairly scalable infrastructure at a low Total Cost of Ownership (TCO), especially for organizations that have migrated from on-premise 'old-school' enterprise architecture to the cloud.

However, in working with clients we have consistently seen that EC2 costs constitute a large percentage of organizations' overall cloud spend. High-performance computing (such as EC2) is never cheap, but organizations generally have the following options to reduce cost:

A look at EC2 computing options

Reserved and Spot instances are fairly good options for non-critical jobs that can wait. For production environments, these options typically do not work.
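To see why the pricing model matters, the back-of-the-envelope calculation below compares a month of always-on compute under the three models. The hourly rates are purely illustrative assumptions, not quoted AWS prices:

```python
HOURS_PER_MONTH = 730  # average hours in a month

# Illustrative hourly rates (assumptions, not real AWS pricing).
on_demand_rate = 0.10
reserved_rate = 0.06   # discounted in exchange for a 1-3 year commitment
spot_rate = 0.03       # deep discount, but instances can be reclaimed

def monthly_cost(rate, hours=HOURS_PER_MONTH):
    """Cost of keeping one instance running for the given hours."""
    return rate * hours

print(f"On-demand: ${monthly_cost(on_demand_rate):.2f}")
print(f"Reserved:  ${monthly_cost(reserved_rate):.2f}")
print(f"Spot:      ${monthly_cost(spot_rate):.2f}")
```

The catch, as noted above: the cheaper models trade away flexibility (Reserved) or reliability (Spot), which is why they rarely fit always-on production workloads.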

Why serverless?

Most analytics-related data transformation and computing is done via scheduled or pre-defined jobs. This typically involves ingesting raw data and running it through code written in Python, Ruby, R, and the like, or through pre-compiled algorithms written in C++ or Java. The EC2 uptime needed for these data transformations is never 24/7. This is the scenario where serverless computing and databases (via Glue, DynamoDB, Lambda, etc.) should be the preferred choice.
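A typical scheduled job of this kind is a short transformation that runs for minutes, not hours. The sketch below (the CSV field names and aggregation logic are illustrative assumptions) shows the kind of ingest-and-aggregate step that fits naturally in a Lambda or Glue job rather than on an always-on EC2 instance:

```python
import csv
import io
from collections import defaultdict

def aggregate_raw_data(raw_csv: str) -> dict:
    """Ingest raw CSV rows and total the 'amount' column per 'region'.
    In production this would read from object storage (e.g. S3) and
    write results to the warehouse; here it works on an in-memory string."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        totals[row["region"]] += float(row["amount"])
    return dict(totals)

raw = "region,amount\nus-east,10.5\neu-west,3.0\nus-east,4.5\n"
print(aggregate_raw_data(raw))  # {'us-east': 15.0, 'eu-west': 3.0}
```

Because a job like this finishes in minutes, billing per invocation instead of per always-on hour is where the serverless savings come from.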

Setup

Moving to serverless requires some rethinking of your scripting setup. Cataloguing can be an annoying step, but it pays dividends in the long run by keeping a handle on your data transfers. Upkeep and maintenance are not much more onerous than with typical always-on computing solutions, and the cost savings are immense: depending on the size of the data being processed, we have seen EC2 costs decrease 30-60%.

Illustration of serverless ETL setup

Serverless computing obviously does not work in all scenarios, and it should be one aspect of your overall infrastructure rather than all of it. Some of our clients use EC2 On-Demand, Reserved, and Spot instances alongside DynamoDB, MongoDB, and Glue at the same time, depending on their business needs.

Polymer is a human-centric data loss prevention (DLP) platform that holistically reduces the risk of data exposure in your SaaS apps and AI tools. In addition to automatically detecting and remediating violations, Polymer coaches your employees to become better data stewards. Try Polymer for free.

