Data Predictions and Trends of 2024

Another year, another predictions webinar. Unlike others who make bold assumptions about the future, Matillioners Mark Balkenende, Matillion's VP of Product Marketing, and Ciaran Dynes, Chief Product Officer, look back to evaluate what was true and what was fake news. This year, they are joined by Kate Strachnyi, founder of DATAcated, to help evaluate the past and predict the future. Read on for a review of their 2023 predictions, which the live audience helped grade for accuracy. After that reflection, each presenter shares individual forecasts for what they believe will be the focal points of the data landscape in 2024.

A Look Back on Last Year’s Predictions 

How well did we predict trends in the data integration landscape, and did these themes stand the test of time? Read on to discover how Kate, Ciaran, Mark, and our live audience rated 2023’s predictions.

Prediction 1: The Year of Productivity

Given tight budgets and challenging macroeconomic conditions, we predicted that 2023 would be all about doing more with less and getting the most out of existing data resources.

Presenter Grades:

  • Kate - B The talent pool doesn’t seem to be growing at the same pace as the complexity of the work that needs to be done, so I think, like it or not, companies have to do more with less.
  • Ciaran - B+/A- I think there was a healthy conversation around how to be more productive and…there’s at least a pent-up hope that generative AI will lead to greater productivity once we learn how to use it for data and analytics.
  • Mark - B Agreed with both Kate's and Ciaran's points, acknowledging that economically, this year presents challenges. As they both pointed out, the talent pool is limited, and the companies Matillion collaborates with aren't making substantial hires, particularly in data teams.

Audience Grade: B (55% of the audience selected)

Prediction 2: Cloud Native

Anticipating a surge in cloud service integration, this prediction emphasized increased cloud processing and scalability across various cloud services. 

Presenter Grades: 

  • Mark - B Emphasizing its significance for enterprise-driven cloud adoption 
  • Kate - B Highlighting a cautious approach in resource allocation among companies 
  • Ciaran - B Mentioned there is definitely a move to pure SaaS, cloud-native solutions.

Audience Grade: B (68% of the audience selected)

Prediction 3: Streaming is the new batch

This prediction highlighted the expected surge in Change Data Capture (CDC) and streaming adoption within enterprise operations to expedite decision-making. The discussion noted increased demand for features and integration, particularly on platforms like Databricks, Snowflake, and Redshift. 

Presenter Grades:

  • Ciaran - B Noted the surge in features and integration, especially on platforms like Databricks, Snowflake, and Redshift.
  • Kate - B Emphasizing the necessity of resource allocation for real-time beta access.
  • Mark - C While Mark agreed with the points mentioned by Ciaran and Kate, he voted it a C. 

Audience Grade: C (54% of the audience selected)

Prediction 4: Generative AI

This prediction foresaw users identifying valuable applications for Generative AI in 2023. Discussions acknowledged the widespread interest in Generative AI but raised concerns about trust and privacy issues in workplace contexts. While potential in certain roles was recognized, skepticism lingered regarding its widespread adoption within data analytics teams. 

Presenter Grades:

  • Kate - A Recognizing pervasive discussions around Generative AI and emphasizing adoption enthusiasm despite concerns around trust and privacy 
  • Ciaran - B Acknowledged potential in certain roles but expressed skepticism about widespread adoption within data analytics teams, questioning the extent of opportunities for its usage. 
  • Mark - B As it mainly came out in the market last year, the talks centered more broadly on its usage in enterprises and companies. 

Audience Grade: A (50% favored A and 27% selected B)

Looking Ahead: 2024 Predictions 

Here’s what our presenters think the future holds in the data world. We’ll find out next year how these pan out.

First Up: Mark Balkenende

Prediction #1: Unstructured data will become a normal part of analytics and AI projects. 

Unstructured data, such as Zoom call transcripts, documents, PDFs, audio, and video files, will be widely integrated into AI projects, specifically as a way to enrich existing datasets.

What did the other panelists think?

Kate: This prediction is likely very true, but should we be doing so just because we can collect and ingest unstructured data? Especially considering what it takes to store and secure unstructured data.

Ciaran: The accuracy of this prediction likely comes down to the evolution of tools that can ingest and use unstructured data. Many potential solutions have emerged recently, mostly as generative AI tools. In particular, there are great strides being made in a longstanding analytics challenge: making sense of unstructured text. The only question is: are the tools we have now the ones to make the leap in dealing with unstructured data, or are the real solutions still to come?

Prediction #2: Vector Embedding will become the core for data engineering.

Vector embedding will emerge as a fundamental aspect of data engineering in 2024, and working with vector databases will become a crucial skill for all data engineers. Embedding will be critical in supplying large language models with relevant context through the RAG (Retrieval-Augmented Generation) process.
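To make the idea concrete, here is a minimal sketch of the retrieval step in a RAG flow. It is an illustration only, not something shown in the webinar: a toy bag-of-words count vector stands in for a learned embedding model, an in-memory list stands in for a vector database, and every name (`embed`, `cosine`, `retrieve`, `build_prompt`, `DOCS`) is hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real pipeline would call a learned model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Place the retrieved text in front of the question before it goes to an LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical sample documents, invented for this example.
DOCS = [
    "Invoices are loaded into the warehouse every night",
    "Customer call transcripts are stored as text files",
    "Dashboards refresh after the nightly load completes",
]

if __name__ == "__main__":
    print(build_prompt("where are call transcripts stored", DOCS))
```

In practice the similarity search would run inside a vector database rather than a Python loop, but the shape of the work is the same: embed the data, embed the question, and hand the closest matches to the model as context.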

What does Kate have to say about this one?

Kate: If companies want to train their Large Language Models, their data engineers and the whole talent pool will need the required training to be capable of doing all that. Companies should make sure their talent development programs keep pace with developments that are happening so fast.

Next Up: Ciaran Dynes 

Prediction #1: Prompt Engineering and more experimentation will change how we do data engineering 

A shift in data engineering through prompt engineering and increased experimentation allows engineers to bypass traditional upfront model descriptions and instead focus on swift integration into the vector database. Consequently, a more experimental mindset is fostered among data scientists, who engage in unconventional practices like screen scraping for rapid iteration and formulating business questions.

How does this resonate with Kate?

Kate: Concerns about potential complacency induced by over-reliance on AI call for a balance between trust and skepticism. Prompt outcomes need to be more reproducible to improve reliability and consistency, something that is currently lacking.

Prediction #2: Reviving Data Lineage for RAG Implementation

Ciaran anticipates a heightened focus on data lineage within enterprises, driven by the RAG process used with LLMs. It involves comprehensive tracking of the data that feeds the model, with an emphasis on the vector database. A comprehensive lineage is crucial for deploying a RAG use case. Transparent communication with teams about which specific datasets are used is pivotal for efficiently tracing responses back to their sources.
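One way to picture lineage in a RAG setting is to keep source metadata attached to every chunk that goes into the vector store, so each response can be traced back to the datasets behind it. The sketch below is an assumption-laden illustration, not a method described by the presenters: the `Chunk` fields, the keyword-overlap scoring, and the sample dataset names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    source: str         # hypothetical dataset or file the text came from
    pipeline_step: str  # hypothetical job that produced the chunk

def retrieve_with_lineage(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Return up to k chunks sharing words with the query, lineage intact."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c.text.lower().split())), c) for c in chunks]
    matches = [(score, c) for score, c in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in matches[:k]]

def lineage_report(chunks: list[Chunk]) -> list[str]:
    """List the distinct sources that contributed to a response."""
    return sorted({f"{c.source} (via {c.pipeline_step})" for c in chunks})

# Hypothetical sample data, invented for this example.
CHUNKS = [
    Chunk("refund policy allows returns within 30 days", "policies.pdf", "pdf_extract"),
    Chunk("support call mentions late refund", "zoom_calls_2023", "transcribe"),
    Chunk("quarterly revenue grew", "finance_db.revenue", "sql_export"),
]

if __name__ == "__main__":
    used = retrieve_with_lineage("refund policy", CHUNKS)
    for line in lineage_report(used):
        print(line)
```

Carrying the metadata alongside the text is what lets a team answer "which datasets informed this response?" without reverse-engineering the pipeline afterwards.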

What does Kate have to say about this one?

Kate: Will it be difficult to convey to stakeholders how important this is?

Ciaran's response: It is too early to tell. For now, the focus is on exploring the concept and on the importance of explainability.

Last but not least, Kate Strachnyi

Prediction #1: Anticipating a surge in AI regulations. 

Anticipating a notable upswing in AI regulations, there's a vision for comprehensive oversight covering various data aspects, including analytics, visualization, processing, and modeling. The emphasis extends to data management, where increased structuring and standardization are expected to enhance transparency and explainability. In sectors like healthcare, meticulous data labeling and comprehensive data lineage analysis will be crucial for unraveling the intricacies of AI decision-making processes.

What does Ciaran think?

Ciaran: How do we make this happen ethically? 

Kate's response: Hire the right team to effectively navigate and implement regulatory requirements with a good ethical background and moral high ground.

Prediction #2: Heightened Focus on Privacy and Security 

Underscoring the growing significance of privacy and security regulations, there's a particular emphasis on the risks associated with chatbot interactions. The need for robust prompt engineering to avoid soliciting sensitive details from users is highlighted. Advocacy for stringent measures, including encryption, secure deletion, and rigorous access controls, is made to prevent unauthorized access or misuse of sensitive data.

Prediction #3: Tighter regulations on unstructured data 

As Mark highlighted, the collection of unstructured data, including emails, voice recordings, and customer Zoom calls, and then analyzing and structuring it, is understandable and great. However, I predict that more stringent regulations will govern the use and collection of such data. Questions will arise, such as: Can the data be legitimately used? Where will it be stored? While many customer calls start with the declaration that the call will be recorded for training purposes, the crucial inquiry remains: Is it solely for training purposes, or is there an intention to use it for other purposes?

Mark has some thoughts about this one!

Mark: Totally agree. Examples from healthcare, such as training Large Language Models (LLMs) on individuals' healthcare information, underscore the immense responsibility of ensuring data security. While introducing technology is crucial, it must be accompanied by robust regulations.

In Short: 

Matillion's annual data predictions webinar offered valuable insights into 2024 data trends. The session began with a review of their 2023 predictions, engaging the audience in assessing their accuracy. Key themes included productivity, cloud integration, streaming technologies, and Generative AI adoption. Looking forward, predictions centered on unstructured data integration, the role of vector embedding in data engineering, and the impact of AI regulations and privacy concerns.

Missed the webinar? Watch it on-demand here!