More time for data discovery and innovation

About

Elsevier, a leading medical publishing company, is part of RELX Group. Their Education Technology department and Analytics and Recommender team are based in Philadelphia, Pennsylvania. In their own words, “Elsevier provides information and analytics that help institutions and professionals progress science, advance healthcare and improve performance.”

Challenge

Elsevier wanted to grow and needed a new technology stack to make this happen. Daniel Klein, Software Engineer, explained that “Amazon Redshift integrates perfectly into our Amazon-centric data back-end, given that it scales with us as we grow … and it plays nicely with the rest of the Amazon services that we use.” Amazon Redshift is a leading cloud-based data warehouse that is scalable and cost-efficient, running for as little as $1,000 per terabyte per year. Furthermore, when you select one Amazon Web Services offering, a number of other compatible native and third-party solutions become available, giving users a large network of options to enhance their data analysis capabilities. In addition to Redshift, Elsevier also uses Kinesis and Lambda for data streaming and is implementing a data lake strategy with Amazon Simple Storage Service (S3).

With the right data warehouse in place, Elsevier needed to find a new ETL solution that would fit in with their new infrastructure. The Extract, Transform, Load (ETL) solution that Elsevier had in place “was deteriorating by the day: jobs would suddenly halt, data would get backed up,” and the preprocessing workarounds slowed down their pipeline. Due to these shortcomings, Elsevier couldn’t keep up with the demands of their growing user base. They needed an ETL solution that could handle new products and projects, accommodate growing data volumes, and overcome the limitations of the previous solution.

Solution

Three years ago, before Sherpath, Elsevier underwent “a careful evaluation of a few options” and “Redshift came up as the clear winner”, recalls Klein. Amazon Redshift was easy to scale in line with growth and fit in with the other AWS offerings they already had in place. Since then, Redshift has proved its scalability and provides a storage solution for historical user data. Furthermore, Elsevier has been able to integrate easily with ancillary services including, but not limited to, Lambda, Kinesis, S3 and CloudWatch.

A member of their project team introduced Elsevier to Matillion ETL for Redshift. Klein recalls that getting started with Matillion ETL for Redshift “couldn’t be easier”; “once we got it going it was like a whole new ball game.” How did Elsevier come to select Matillion ETL? First of all, it was compatible with Amazon Redshift. In fact, Matillion ETL for Redshift was built specifically for Amazon Redshift, which makes setup and continued use with Redshift simple and seamless. Secondly, Matillion ETL addresses the struggles they were experiencing with associated processes by completely streamlining the data pipeline. Lastly, it is easy to learn, use and explain. Matillion ETL has a simple graphical user interface that is available via a web browser. This makes it digestible and accessible to individuals across the business, regardless of role or background.

To get started, Elsevier went to the AWS Marketplace, accessed Matillion through a retail-like transaction and spun up an instance within minutes. From there they conducted a proof of concept with the 14-day free trial. “After seeing how well the proof of concept for our data pipeline overhaul went over, with Matillion powering our job scheduling and general ETL of our incoming user data, putting it into our production environment was a no-brainer.”

Results

With Amazon Redshift and Matillion ETL under their belt, Elsevier has been able to do “some really cool stuff”, upping their data game. By nature, the two solutions offer stability, scalability and full control over data and data pipelines. This reduces the risks associated with legacy data management solutions. With this granular level of control, developers benefit from “being able to debug a transformation job from component to component”, reducing the amount of time needed to fix a job “from days to hours, and from hours to minutes.” Furthermore, those not directly involved in the projects using Matillion can easily understand “the project, jump in and quickly contribute”. The graphical interface with self-annotated jobs allows project teams to articulate their work, gaining wider company buy-in through greater understanding.

Most importantly, however, Matillion ETL and Amazon Redshift are helping Elsevier better serve their users. “Now in our second year of use with even more functionality and adopters, Sherpath is helping students learn and study for their medical courses more effectively than ever.” Resolving previous technical glitches and blockers inherently provides a better service to users while also freeing up developer and analytics resources to invest back into discovery and innovation, giving Elsevier a competitive edge.

Benefits

  • Built specifically for AWS and Amazon Redshift
  • Intuitive browser-based user experience – easy on-boarding and powerful
  • Push-down ELT architecture – simplified infrastructure, fast performance
  • Powerful feature set
  • Retail-like acquisition through AWS Marketplace
  • Affordable pricing for everyone, from small startups to Fortune 500 companies
  • Wide range of data source connectors, all included
  • A fully integrated data integration tool that requires no additional development or maintenance staff

“Getting started with Matillion couldn’t be easier [...] I couldn’t recommend it enough.”
Daniel Klein, Software Engineer | Elsevier