This article will help you understand the concepts behind IP addresses. It will describe the concept of subnets, and will explain when to use public and private IP addresses. This information will be useful for Matillion developers when connecting to data sources and targets. It will also help cloud architects who are planning how to configure Matillion itself.
You should also find this article useful as advance reading before taking the Matillion Security course from the Matillion Academy.
The prerequisites for working with public and private IP addresses are:
- Access to Matillion ETL or Matillion Data Loader
- Access to the network management area in your cloud provider’s console
What are IP addresses?
The entire internet, including most cloud networking, is based on the Internet Protocol Suite known as TCP/IP. It contains two closely related standards:
- IP (Internet Protocol) – concerned with addressing and routing
- TCP (Transmission Control Protocol) – concerned with making data transmission reliable
Internet Protocol (IP) addresses are made up of four numbers, and are also known as IPv4 addresses. The numbers are usually written separated by dots, for example like this:
The above is actually the IP address of one of Google’s servers. Whenever you perform a Google search, data flows between your computer’s IP address and this one.
In TCP/IP, both the sender and the receiver are uniquely identified by their IP address.
Just like a postal address, an IP address that anyone can reach has to be globally unique. Such addresses are known as public IP addresses.
This is how the internet works. Any IP address in the world can, in principle, communicate with any other. The diagram below shows a simplified TCP/IP network with five addresses.
Each of the four numbers in an IPv4 address can go from 0 to 255. So the total number of possible addresses is 256 x 256 x 256 x 256: about 4.3 billion. That’s a large number, but in fact, it’s nowhere near enough to give every single device in the world its own globally unique public IP address.
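The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch using the standard library `ipaddress` module. The address 192.0.2.1 is an assumption chosen for illustration: it comes from a range reserved for documentation and does not appear elsewhere in this article.

```python
import ipaddress

# Each of the four numbers (octets) has 256 possible values: 0 to 255.
total_ipv4_addresses = 256 ** 4
print(f"{total_ipv4_addresses:,}")  # 4,294,967,296

# An IPv4 address is really one 32-bit number, written as four octets.
# 192.0.2.1 is a reserved documentation address, used here only as an example.
example = ipaddress.IPv4Address("192.0.2.1")
octets = [int(part) for part in str(example).split(".")]
as_integer = int(example)

print(octets)      # [192, 0, 2, 1]
print(as_integer)  # 3221225985
```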
This issue is known as IPv4 address exhaustion. It would have been a big problem for the internet without another invention: subnets.
What is a subnet?
A subnet is a set of closely related IP addresses that share their own, private address scheme. As far as the internet is concerned, a subnet just looks like one ordinary public IP address. But zooming in reveals a whole new private network inside:
One single public IP address may act as a gateway to thousands of private IP addresses inside the subnet. The gateway has two IP addresses: a public-facing one that the internet can see, and a private-facing one just for within the subnet. Nowadays, almost all devices get allocated private IP addresses inside a subnet. IPv4 address exhaustion has been sidestepped.
In a typical cloud deployment, the private IP addresses within a subnet mostly communicate with each other. It is very fast, efficient, and secure to route this data entirely within the subnet.
Public vs Private subnets
Some cloud providers allow fine-grained control over IP address allocation within their subnets, and differentiate between “private” and “public” subnets.
In a “private” subnet there is no direct communication between the internet and the private IP addresses. Data going to or from the internet travels via the gateway through a process called Network Address Translation (NAT).
Most home networks are set up like this. It is very secure, although you cannot host an internet-facing server inside the private subnet.
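To make the idea of NAT more concrete, here is a toy model of a gateway that lets many private addresses share one public address by giving each outbound connection its own public port. It is a simplified sketch and not how any particular cloud provider implements NAT. The class name `ToyNatGateway`, the starting port 40000 and the public address 203.0.113.10 (a documentation-only address) are all assumptions made for illustration.

```python
import itertools


class ToyNatGateway:
    """Toy model of NAT: many private addresses share one public address."""

    def __init__(self, public_ip, first_public_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_public_port)
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip, private_port):
        """Return the (public_ip, public_port) the internet sees for this sender."""
        key = (private_ip, private_port)
        if key not in self._outbound:
            public_port = next(self._ports)
            self._outbound[key] = public_port
            self._inbound[public_port] = key
        return self.public_ip, self._outbound[key]

    def translate_inbound(self, public_port):
        """Return the private (ip, port) for a reply, or None if nothing matches."""
        return self._inbound.get(public_port)


gateway = ToyNatGateway("203.0.113.10")  # hypothetical public address
print(gateway.translate_outbound("172.21.27.37", 51000))
print(gateway.translate_inbound(40000))
```

Note that `translate_inbound` returns `None` for a port with no existing mapping. That mirrors why an unsolicited connection from the internet cannot reach a device in a private subnet.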
In contrast, all the members of a “public” subnet are able to obtain both a public and a private IP address. All internet communication still goes via the gateway, but it is possible to access individual members of the subnet from the internet.
A subnet set up this way is often used to protect further, private subnets that are hidden from the internet. A public subnet used for this purpose is known as a DMZ.
When to use a Public vs a Private IP address
As far as an individual device or server is concerned, it makes little difference whether it has a public IP address, a private one, or both. Routing data to the correct target is handled automatically by the Internet Protocol (IP) part of TCP/IP.
From inside Matillion ETL, you can find its IP address by running an ifconfig command inside a Bash Script component, like this.
The IP address is shown in the Task Output, and is 172.21.27.37 in this example. But is that a public IP address or a private one? What if it has both?
The way to tell is to reference a standard used by TCP/IP:
- An IPv4 address beginning with any of the following is a private address:
  - 10.
  - 172.16., 172.17., 172.18., and so on up to 172.30. and 172.31.
  - 192.168.
- Otherwise it’s a public IPv4 address (with a few minor exceptions)
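The same test can be written in Python. The sketch below checks an address against the three private IPv4 ranges reserved by RFC 1918 (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16) using the standard library `ipaddress` module. The function name `is_rfc1918_private` is an assumption made for this example, and the sketch deliberately ignores the minor exceptions such as loopback and link-local addresses.

```python
import ipaddress

# The three private IPv4 ranges reserved by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.x.x up to 172.31.x.x
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918_private(address: str) -> bool:
    """Return True if the IPv4 address falls inside an RFC 1918 private range."""
    ip = ipaddress.IPv4Address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


print(is_rfc1918_private("172.21.27.37"))  # True: the address from the example above
print(is_rfc1918_private("172.32.0.1"))    # False: just outside the 172.16 to 172.31 range
```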
In practice you will find that the ifconfig command almost always returns the private IP address.
To find the public IP address, you can use a REST API provided by Matillion. It also works inside a Bash Script component, like this:
In most cases, Matillion ETL will have both a private IP address and a public one. So which is better to use? The answer depends on where you are trying to communicate from and to.
If data needs to go between two devices in the same subnet, you should use private IP addresses. This is faster, cheaper and more secure. Some common examples are:
- Using the migrate utility, provided the source and target are in the same subnet
- Loading data from a source database in the same subnet
- Connecting Matillion to Snowflake using Private Connectivity
In contrast, whenever data needs to pass through a gateway between subnets, you should use public IP addresses. Some common examples are:
- Logging into Matillion ETL from your home or office
- Loading data from a source database on the internet, or in a different cloud provider
- Connecting Matillion to Snowflake over the internet
Use private IP addresses if you are communicating entirely within the same subnet. Otherwise use the public IP addresses.
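That rule of thumb can be sketched as a small helper. It assumes you already know your own subnet's address range in CIDR notation, and both addresses of the target. The function name `choose_address`, the subnet 172.21.0.0/16 and the target addresses are hypothetical values chosen for illustration. The check only compares address ranges, so it cannot tell apart two separate networks that happen to reuse the same private range.

```python
import ipaddress


def choose_address(own_subnet_cidr: str, target_private_ip: str, target_public_ip: str) -> str:
    """Prefer the target's private address when it lies inside our own subnet."""
    subnet = ipaddress.ip_network(own_subnet_cidr)
    if ipaddress.ip_address(target_private_ip) in subnet:
        return target_private_ip
    return target_public_ip


# Hypothetical subnet and target addresses, for illustration only.
print(choose_address("172.21.0.0/16", "172.21.40.5", "203.0.113.20"))  # 172.21.40.5
print(choose_address("172.21.0.0/16", "10.0.3.9", "203.0.113.20"))     # 203.0.113.20
```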
If you are still uncertain which IP address is going to work, Matillion ETL has a shared job for checking network access.
How to Check Network Access
Matillion ETL’s Check Network Access shared job is a convenient way to check in advance if there is a network path to a particular destination address.
You can check either a public or private IP address. If there is a DNS name for this destination you can use the name instead of the dotted IP address.
TCP requires a port number for every connection. In the example below it is set to 3306, which is the standard port for MySQL. The component has run successfully, which confirms that Matillion ETL does have network access to this particular MySQL database.
The example above is checking a private IP address. From this information, you can infer that the source MySQL database is in the same subnet as Matillion.
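For readers curious about what a check like this involves, the sketch below attempts a TCP connection to a host and port and reports whether it succeeded. It is not the implementation of the Check Network Access shared job, only a minimal illustration of the same idea using Python's standard `socket` module. The function name `can_connect` and the five second default timeout are assumptions.

```python
import socket


def can_connect(host: str, port: int, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        # host may be a dotted IP address or a DNS name.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

For example, `can_connect("172.21.40.5", 3306)` would test a hypothetical MySQL database on a private address. A `False` result does not say why the path is blocked: it could be a firewall rule, a routing problem, or simply nothing listening on that port.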
To help with DataOps, consider using a Check Network Access component immediately before every data loading component. In conjunction with the standard Matillion error handling, this can help manage a variety of transient network problems that would otherwise manifest as timeouts.
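The pattern of checking the network before loading, and retrying when the check fails, can be sketched in Python as follows. This is an illustration of the general idea only and does not represent Matillion's error handling. The function name `load_with_network_check`, its parameters and the retry defaults are all hypothetical.

```python
import time


def load_with_network_check(check, load, attempts=3, delay_seconds=2.0):
    """Run load() only after check() succeeds, retrying the check a few times.

    check: a callable returning True when the network path is available.
    load:  a callable that performs the data load and returns its result.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return load()
        if attempt < attempts:
            time.sleep(delay_seconds)  # wait for a transient problem to clear
    raise ConnectionError(f"no network path after {attempts} attempts")
```

Failing fast with a clear `ConnectionError` is usually easier to diagnose than a load step that hangs until it times out.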
Understanding your own network is a vital part of configuring security. In particular, this applies to your network firewalls, and to data encryption in transit. For more information:
- Take the Matillion Security course on Matillion Academy.
- Learn about implementing Role Based Access Control (RBAC) with Matillion.
- Read about Data Fabric Integration using the Matillion ETL REST API.
I have made some simplifications in this article around the relationship between IP addresses and servers. In reality, address allocation and routing are configured as separate virtualized services in different ways among the cloud providers. Follow these links for more information on AWS networking, Azure networking and GCP networking.
Two other related subjects that I did not discuss are:
- The Domain Name System (DNS) – which can convert a name into an IP address (either private or public)
- IPv6 – which is conceptually similar to IPv4 but has a much larger address space
Finally, for Matillion ETL users: if you do not find the Check Network Access shared job installed already, you can download it from the Matillion Exchange.