Complete guide to Data Ingestion: What it is & how it works

Data ingestion is a lifeline for anyone out there swimming in a sea of data from countless sources. Whether it's from databases, SaaS platforms, mobile devices, or even IoT gadgets, your data is only going to grow, but you’ll need to get on top of it if you want to gain actionable business insights and maintain a competitive edge.

Imagine having all your essential data neatly organized and ready for analysis (no matter where it comes from). That's not a sci-fi fantasy—that’s the power of data ingestion. It’s the first step to making sense of the chaos, pulling data from various sources, and bringing it all together in a centralized location.

New to data ingestion? We’ve got you covered. This complete guide to data ingestion will walk you through all the ins and outs, covering different models, the step-by-step process, common use cases, and typical challenges. 

What Is Data Ingestion?

Data ingestion is the process of transporting data from various sources to a target destination for storage and analysis. This initial step in the data pipeline prepares data for further processing and makes it readily accessible for analysis and insights.

While data ingestion provides plenty of benefits, its primary purpose is to pull data from different sources into a centralized location. This consolidation facilitates better data analysis, business intelligence, and decision-making. It creates a single source of truth that empowers your decision-makers with data and valuable insights.

Data Ingestion vs. Data Extraction

While data ingestion and data extraction are closely related, they serve distinct purposes in the data pipeline. Data extraction is the process of retrieving data from various sources, whereas data ingestion involves transporting that data to a target destination for storage and analysis.

In simpler terms, extraction is about pulling the data out, while ingestion is about moving it to where it needs to go.

Common Data Ingestion Models

You can implement data ingestion through various models. Here are a few of the most common options:

Batch Data Ingestion

Batch data ingestion collects data in large jobs or batches at periodic intervals. It's ideal for scenarios where real-time analysis is not necessary.

Example: Daily aggregation of sales data for trend analysis.
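To make this concrete, here is a minimal sketch of a batch aggregation job in Python (the CSV layout, field names, and once-a-day trigger are illustrative assumptions, not a prescribed format):

```python
import csv
import io
from collections import defaultdict

def aggregate_daily_sales(csv_text):
    """Sum one day's sales per product from a batch export."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["product"]] += float(row["amount"])
    return dict(totals)

# A scheduler (e.g. cron) would run this once per day over the prior day's export.
batch = "product,amount\nwidget,20.00\ngadget,5.00\nwidget,10.00\n"
print(aggregate_daily_sales(batch))  # {'widget': 30.0, 'gadget': 5.0}
```

The defining trait is that the whole batch is processed in one pass, on a schedule, rather than record by record.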

Streaming Data Ingestion

Streaming data ingestion collects and processes data in real time as it arrives. It’s necessary for use cases requiring immediate insights and actions.

Example: Real-time monitoring of social media feeds for sentiment analysis.
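A streaming ingester, by contrast, handles each record the moment it arrives. The sketch below fakes the feed with a list of JSON strings; a real deployment would consume from a streaming platform such as Kafka or Kinesis, and the event shape here is hypothetical:

```python
import json

def ingest_stream(events, sink):
    """Handle each event the moment it arrives instead of waiting for a batch."""
    for raw in events:  # in production this loop would wrap a Kafka/Kinesis consumer
        event = json.loads(raw)
        if event.get("sentiment") == "negative":  # react per record, immediately
            sink.append(event)

alerts = []
feed = ['{"user": "a", "sentiment": "negative"}',
        '{"user": "b", "sentiment": "positive"}']
ingest_stream(iter(feed), alerts)
print(alerts)  # [{'user': 'a', 'sentiment': 'negative'}]
```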

Micro-Batch Data Ingestion

Micro-batch data ingestion combines aspects of both batch and streaming ingestion. It collects data in small batches at very short intervals (typically less than a minute) and provides near-real-time data availability without the high costs of full streaming.

Example: Near-real-time updating of customer interaction logs for dynamic personalization.
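One way to sketch a micro-batcher is a buffer that flushes whenever a size or time threshold is hit (the thresholds and record shape below are illustrative assumptions):

```python
import time

class MicroBatcher:
    """Buffer incoming records and flush them as a small batch on a threshold."""
    def __init__(self, flush, interval_s=30, max_size=500):
        self.flush, self.interval_s, self.max_size = flush, interval_s, max_size
        self.buffer, self.last_flush = [], time.monotonic()

    def add(self, record):
        self.buffer.append(record)
        due = time.monotonic() - self.last_flush >= self.interval_s
        if due or len(self.buffer) >= self.max_size:
            self.flush(self.buffer)  # ship the small batch to the destination
            self.buffer, self.last_flush = [], time.monotonic()

batches = []
batcher = MicroBatcher(batches.append, interval_s=30, max_size=2)
for i in range(5):
    batcher.add({"click": i})
# With max_size=2, records reach the destination in small batches of two;
# the fifth record waits in the buffer for the next flush.
```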

How the Data Ingestion Process Works

Data ingestion is all about connecting various sources with the desired end destination. To ingest data, a simple pipeline extracts data from where it was created or stored and loads it into a selected location (or set of locations). 
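In its simplest form, such a pipeline is just an extract step feeding a load step. The sketch below is illustrative only, using an in-memory list as the source and SQLite as a stand-in for a warehouse destination:

```python
import sqlite3

def extract(rows):
    """Pull records from a source (here: a list standing in for an API or database)."""
    yield from rows

def load(conn, records):
    """Write records to the destination (here: SQLite standing in for a warehouse)."""
    conn.executemany("INSERT INTO events (name, value) VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, value REAL)")
load(conn, extract([("page_view", 1.0), ("purchase", 49.95)]))
```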

When the pipeline also transforms the data—through steps such as aggregation, cleansing, or deduplication—it is an Extract, Transform, Load (ETL) procedure, or Extract, Load, Transform (ELT) if the transformation happens after loading. The two core components of a data ingestion pipeline are:

  • Sources
  • Destinations

Sources

The data ingestion process can extend beyond your company’s enterprise data center. In addition to internal systems and databases, sources for ingestion can include IoT applications, third-party platforms, and information gathered from the Internet.

  • Databases: Traditional relational databases and NoSQL databases are common sources of structured data.
    • Example: Customer relationship management (CRM) systems and transactional databases.
  • Legacy Systems: Older systems and applications that still hold valuable data.
    • Example: Mainframe systems used in banking and insurance industries.
  • SaaS Platforms: SaaS platforms generate vast amounts of data from daily operations.
    • Example: Salesforce, Google Analytics, and other cloud-based services.
  • Web Pages and Social Media: Unstructured data from web scraping and social media platforms.
    • Example: Twitter feeds, Facebook posts, and blog articles.
  • IoT Devices: IoT devices produce continuous streams of data.
    • Example: Smart sensors, wearable devices, and industrial IoT systems.
  • Transactional Systems: Systems that record business transactions, such as point-of-sale (POS) systems.
    • Example: Retail transaction records and online payment gateways.
  • APIs: Interfaces that allow different software applications to communicate and exchange data.
    • Example: RESTful APIs for accessing weather data, payment gateways, or social media platforms.

Destinations

Data ingestion processes often use data lakes, data warehouses, and document stores for target locations. An ingestion pipeline may also simply send data to an app or messaging system.

  • Data Lakes: Data lakes store raw data in its native format. This flexible storage solution supports various data types and formats. Ideal for big data analytics and machine learning applications.
  • Data Warehouses: Data warehouses store structured data that has been cleaned and organized. They are optimized for querying and reporting. Best for business intelligence and historical data analysis.
  • Document Stores: Document stores manage and store document-oriented data, such as JSON or XML files. Useful for applications requiring flexible, schema-less storage.
  • Data Lakehouses: Data lakehouses combine the features of data lakes and data warehouses, providing robust data management by storing raw and structured data in a unified architecture. This hybrid solution supports both large-scale data processing and efficient querying, making it ideal for advanced analytics and real-time data applications.

Data Ingestion Examples and Use Cases

Businesses of all sizes (in all industries) can use data ingestion to consolidate and analyze data from various sources to derive actionable insights. Here are a few data ingestion examples and use cases to show you what’s possible:

1. Enterprise-Wide Reporting and Analytics

A large enterprise needs to consolidate data from multiple departments (sales, marketing, finance, operations) into a single reporting system.

Example: Data is ingested from CRM systems, ERP systems, and financial databases into a central data warehouse. This enables the creation of comprehensive reports and dashboards that provide a holistic view of the business.

2. Real-Time Customer Insights

An e-commerce company wants to track customer behavior in real time to provide personalized recommendations.

Example: Streaming data ingestion captures real-time data from website interactions, mobile app usage, and purchase history. This data is ingested into a recommendation engine that updates product suggestions dynamically.

3. IoT Data Analysis

A manufacturing company uses IoT sensors to monitor equipment performance and predict maintenance needs.

Example: Data from IoT sensors is ingested into a data lake, where it is analyzed using machine learning algorithms to predict potential equipment failures and schedule maintenance before issues arise.

4. Marketing Campaign Optimization

A marketing team needs to aggregate data from various sources to analyze the effectiveness of their campaigns.

Example: Data is ingested from social media platforms, email marketing tools, and web analytics services. This consolidated data is used to measure campaign performance, understand customer engagement, and optimize future marketing efforts.

5. Financial Data Consolidation

A financial services company needs to integrate data from different branches and subsidiaries for consolidated financial reporting.

Example: Data from different financial systems, including transaction records, investment portfolios, and customer accounts, is ingested into a central financial database. This allows for accurate and timely financial reporting and analysis.

6. Supply Chain Management

A retail company wants to optimize its supply chain by analyzing data from suppliers, warehouses, and stores.

Example: Data is ingested from various supply chain systems, including inventory management, shipping logistics, and point-of-sale systems. This data is used to forecast demand, manage inventory levels, and streamline logistics operations.

Typical Challenges of Data Ingestion

Data ingestion (while essential for modern data-driven operations) comes with its own set of challenges. These aren’t dealbreakers, but you’ll need to navigate these obstacles to maintain efficient, accurate, and secure data integration.

  • Data Quality and Consistency: Maintaining accurate, complete, and consistent data from various sources can be difficult. Poor data quality can lead to incorrect insights and decision-making, undermining the reliability of analytics.
  • Data Volume and Velocity: Handling large volumes of data (especially in real-time) requires robust infrastructure and efficient processing capabilities. Without proper scalability, systems can become overwhelmed, leading to delays and performance issues.
  • Diverse Data Formats: Integrating data from various sources often involves dealing with different formats, including structured, semi-structured, and unstructured data. Guaranteeing compatibility and seamless integration can be complex and time-consuming.
  • Latency and Real-Time Processing: Achieving low-latency data ingestion for real-time analytics demands advanced technology and optimized processes. High latency can hinder real-time decision-making and responsiveness.
  • Data Governance and Security: Maintaining data security and compliance with regulatory requirements is non-negotiable, especially when dealing with sensitive information. Security breaches and non-compliance can result in significant financial and reputational damage.
  • Handling Incremental Changes: Tracking and ingesting only the changes in data (incremental updates) rather than the entire dataset isn’t always straightforward. Inefficient handling of incremental changes can lead to redundant data processing and increased system load.
  • Source System Dependency: Ingestion pipelines depend on external source systems that may have different update frequencies, access restrictions, and data formats. Discrepancies in source system availability and reliability can disrupt data ingestion workflows.
  • Data Transformation Complexity: Performing necessary transformations during ingestion (such as data cleansing, normalization, and enrichment) adds complexity. This can slow down the ingestion process and increase the risk of errors.

7 Data Ingestion Best Practices

Implementing data ingestion effectively requires careful planning and execution. Here are a handful of best practices to help you streamline the process and maximize the value of your data:

1. Prioritize Data Quality

Implement stringent validation rules to maintain data accuracy, consistency, and completeness. This helps filter out erroneous or duplicate data before ingestion. Use automated tools to clean and normalize data, removing inconsistencies and guaranteeing uniformity across datasets.
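As a sketch, a pre-ingestion validation pass might drop incomplete and duplicate rows like this (the required fields and record shape are hypothetical):

```python
def validate(records, required=("id", "email")):
    """Reject rows with missing required fields or duplicate ids before ingestion."""
    seen, clean = set(), []
    for rec in records:
        if any(not rec.get(f) for f in required):
            continue                 # incomplete row: reject
        if rec["id"] in seen:
            continue                 # duplicate id: reject
        seen.add(rec["id"])
        clean.append(rec)
    return clean

raw = [{"id": 1, "email": "a@x.com"},
       {"id": 1, "email": "a@x.com"},  # duplicate
       {"id": 2, "email": ""}]         # incomplete
print(validate(raw))                   # only the first row survives
```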

2. Optimize for Scalability

Design your data ingestion framework to handle increasing data volumes and real-time processing needs. Use scalable architectures and cloud-based solutions to accommodate growth. Partition large datasets to improve performance and manageability, allowing parallel processing and quicker access.
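For instance, partitioning incoming records by date gives each partition an independent unit of work that can be loaded in parallel (the timestamp format and field names below are assumptions for illustration):

```python
from collections import defaultdict

def partition_by_day(records):
    """Group records by calendar date so each partition can load independently."""
    parts = defaultdict(list)
    for rec in records:
        parts[rec["ts"][:10]].append(rec)  # date prefix, e.g. "2024-05-01"
    return parts

rows = [{"ts": "2024-05-01T09:00", "v": 1},
        {"ts": "2024-05-01T17:30", "v": 2},
        {"ts": "2024-05-02T08:15", "v": 3}]
parts = partition_by_day(rows)
# Two partitions: 2024-05-01 holds two rows, 2024-05-02 holds one.
```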

3. Use Incremental Data Ingestion

Use Change Data Capture (CDC) techniques to identify and ingest only the changes in data since the last update, reducing system load and improving efficiency. Set up regular intervals for incremental data ingestion to keep your data up-to-date without overwhelming your system.
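A minimal high-watermark form of CDC can be sketched as follows (timestamps are simplified to integers here; production systems often read the database's change log instead of polling a column):

```python
def incremental_pull(rows, last_seen_ts):
    """Fetch only rows changed since the last run (simple high-watermark CDC)."""
    new = [r for r in rows if r["updated_at"] > last_seen_ts]
    next_watermark = max((r["updated_at"] for r in new), default=last_seen_ts)
    return new, next_watermark

source = [{"id": 1, "updated_at": 100}, {"id": 2, "updated_at": 250}]
changed, watermark = incremental_pull(source, last_seen_ts=150)
# Only row 2 is ingested; the watermark advances to 250 for the next run.
```

Persisting the returned watermark between runs is what keeps each pull incremental rather than a full re-ingest.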

4. Maintain Data Security and Compliance

Use encryption protocols to protect sensitive data during transfer and storage—this keeps you compliant with data protection regulations. Define and enforce access controls to restrict data access to authorized users, preventing unauthorized access and potential breaches.

5. Choose the Right Tools and Technologies

Choose data ingestion tools that offer scalability, flexibility, and ease of integration with your existing systems. Evaluate open-source tools for cost-effectiveness and cloud-based solutions for their scalability and ease of use.

6. Monitor and Optimize Performance

Set up monitoring tools to track the performance of your data ingestion pipelines and receive alerts for any issues or anomalies. Regularly evaluate and optimize your data ingestion pipelines to identify bottlenecks and improve efficiency.

7. Document and Standardize Processes

Document your data ingestion processes, including data sources, validation rules, transformation steps, and scheduling details. Develop standard operating procedures (SOPs) for consistency and repeatability in data ingestion tasks, making it easier for teams to follow best practices.

Get Fast, Security-Rich Data Ingestion with Matillion

Data ingestion gives your business a competitive advantage, but it needs to be quick, efficient, secure, and compliant—and that’s where we can help. Matillion provides a suite of tools to make data ingestion straightforward and effortless:

  • Speed: Matillion's cloud-native platform guarantees rapid data ingestion from multiple sources, enabling real-time data access and reducing latency.
  • Security: Matillion safeguards your data throughout the ingestion process with comprehensive security features, including encryption, access controls, and compliance with industry standards.
  • Scalability: Matillion scales seamlessly with your growing data needs to maintain consistent performance as your business expands.
  • User-Friendly Interface: Matillion's intuitive, drag-and-drop interface makes it easy for users of all technical levels to set up and manage data ingestion pipelines.

Whether you need to ingest data from SaaS platforms, databases, or real-time streams, Matillion has you covered. Sign up for free to try our easy-to-use data loader, or book a demo with our team to see how our advanced ETL capabilities can transform your data workflows.

Get started today

Matillion's comprehensive data pipeline platform offers more than point solutions.