The Importance of Python and its Growing Influence on Data Productivity: A Matillion Perspective

Python is everywhere. It is woven into data engineering, data science, Artificial Intelligence (AI), and Machine Learning (ML), and its growth is closely tied to the expansion of these domains, which makes it a cornerstone for data professionals. This blog post covers Python's significance, its history at Matillion, how Matillion incorporates Python successfully, and what the future holds.

History of Python at Matillion

Matillion recognized the need for a versatile language within its low-code tool, Matillion ETL, and integrated a Python component around version 1.1. Choosing Python as a fallback language was a deliberate move, based on its user-friendly nature, accessibility, and widespread use in the data space. The initial implementation used Jython, a Java-based implementation of Python that runs on the JVM. While functional, it offered limited support for external libraries and remained tied to Python 2, which eventually made it obsolete.

In response, Matillion enhanced its Python component in 2017/2018 by adding support for Python 3. This update allowed users to work with modern Python versions and import external libraries. Today, Matillion’s Python component ranks among the top 5 most used components. It is a pivotal tool for data engineers, helping them orchestrate and transform data within Matillion ETL jobs. Its uses extend to manipulating variables, printing debug information, interfacing with databases, and handling files on the file system.
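As a rough illustration of two of the tasks listed above (handling files and printing debug information), the sketch below uses only the Python standard library. It does not show Matillion's own API: the `summarize_files` helper, the file names, and the `file_count` variable are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path


def summarize_files(directory: Path, pattern: str = "*.csv") -> dict:
    """Return the name and size in bytes of each file matching the pattern."""
    return {p.name: p.stat().st_size for p in sorted(directory.glob(pattern))}


# Demo against a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp)
    (staging / "orders.csv").write_text("id,amount\n1,9.99\n")
    (staging / "notes.txt").write_text("ignored")

    summary = summarize_files(staging)
    # Hypothetical value that a script might hand back to a pipeline variable.
    file_count = len(summary)
    print(f"DEBUG: found {file_count} CSV file(s): {summary}")
```

Keeping the file logic in a small function, separate from the printing, is one way to make a script like this easy to check in isolation.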

Importance of Python

Python’s popularity among data professionals arises from its simplicity and effectiveness. Unlike more verbose languages such as Java, Python caters to those who prioritize functionality over intricacy. The expansive ecosystem of Python libraries, support, and connectors makes it the go-to language for data-related tasks.

Another driving force behind Python's popularity is the emergence of notebooks. A prominent example is the widely used Jupyter Notebook, which offers a flexible space for experimenting interactively with data. In this environment, users can combine short Python snippets with SQL queries and observe and visualize the results immediately. However, notebooks bring challenges in the data space. Code written interactively is often less resilient and lacks crucial qualities such as tests, auditability, traceability, and scalability. While notebooks are an appealing space for initial data exploration, producing uniform, scalable data from them remains a challenge. Matillion addresses this by combining the exploration process with the development of sturdy, scalable code on its platform.

Comparison of Jupyter Notebook and Push Down Python

How Matillion Succeeds with Python

Matillion addresses the limitations of traditional Python usage in the data space by providing a balanced approach. Matillion’s Python component is designed for small, testable Python snippets within orchestrated pipelines. This approach supports testability, auditability, traceability, and scalability, and mitigates the risks of relying on pure Python or pure SQL alone.
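To make the idea of a small, testable snippet concrete, here is a minimal sketch in plain Python. It is not taken from Matillion's product: the `load_window` function, its parameters, and the incremental-load scenario are assumptions chosen for illustration. The point is that the logic is a pure function, so it can be tested without a pipeline or a database.

```python
from datetime import date, timedelta
from typing import Tuple


def load_window(run_date: str, lookback_days: int = 1) -> Tuple[str, str]:
    """Return (start, end) ISO dates for an incremental load ending on run_date."""
    if lookback_days < 1:
        raise ValueError("lookback_days must be at least 1")
    end = date.fromisoformat(run_date)
    start = end - timedelta(days=lookback_days)
    return start.isoformat(), end.isoformat()


def test_load_window() -> None:
    # 2024 is a leap year, so the day before 1 March is 29 February.
    assert load_window("2024-03-01") == ("2024-02-29", "2024-03-01")
    # A seven-day lookback can cross a year boundary.
    assert load_window("2024-01-07", lookback_days=7) == ("2023-12-31", "2024-01-07")


test_load_window()
print(load_window("2024-03-01"))
```

A pipeline step could then pass the two dates into a downstream query, while the date arithmetic itself stays covered by the test.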

By strategically incorporating Python into a low-code tool, Matillion empowers data engineers to avoid the pitfalls of uncontrolled Python codebases. Within a managed and secure platform, the integration handles variable manipulation, database interactions, API calls, and file system operations.

Incorporating Python into a low-code pipeline

What’s the future? 

The future of Python in the data productivity landscape is intricately linked with advancements in cloud infrastructure, containerization, and platform-specific optimizations. With technologies like Snowflake, AWS, and Databricks embracing Python execution engines, the horizon looks promising for data engineers. 

Matillion's future steps involve leveraging these advancements to push down Python execution onto various data platforms, allowing customers to choose the most cost-effective environment for their Python logic. Additionally, improved library management tools like Conda make it easier to integrate external Python libraries into these powerful data platforms.

Matillion’s Next Steps

Matillion stands at the forefront of empowering data engineers with a new Python pushdown component. Seamlessly integrating into Matillion’s fully SaaS platform, this feature offers auditability, traceability, and security. The pushdown component enables data engineers to utilize Python within the context of their jobs, maintaining access to variables and session information without additional setup. 

Moreover, Matillion recognizes the importance of user customization, allowing users to bring their own Python packages. This approach ensures a streamlined development process, where everything, including Python code and project configurations, is managed, versioned, and deployed together. 

Conclusion 

Python’s journey at Matillion exemplifies its evolution from a fallback language to a crucial component in data engineering. The synergy of low-code functionality with Python’s versatility empowers data professionals to orchestrate complex transformations seamlessly. As the Python landscape evolves, Matillion remains committed to providing innovative solutions that elevate the data engineering experience. Try Matillion's new Python Pushdown component for Snowflake now!

Ed Thompson

CTO and co-founder

Ed Thompson is CTO and co-founder of Matillion. Along with CEO Matthew Scullion, he launched Matillion in 2011 and built a cracking team of data integration experts and software engineers. He and his team launched Matillion’s flagship ETL product in 2014, driving the company’s growth ever since. Ed’s strength is his ability to bring together best-in-class technologies from across the software ecosystem and apply them to solve the deep and complex requirements of modern businesses in new and disruptive ways.