Building the data-driven future
[Microsoft Azure] [Data Factory] [Data Lake] [Databricks] [C#] [Docker] [Python]
If there’s a need for the best quality data available for your AI and machine learning algorithms, who you gonna call? Data engineers!
Artificial intelligence (AI) is on everybody’s lips these days, for good reason, but certain things need to be in place in order to do any of it. And at the very heart of everything is a four-letter word: data.
In order for AI to succeed, it needs a lot of data to learn and improve: the more, the better. It’s up to the data engineers to make sure it’s all available and usable.
"We’re producing lots of data these days and the way AI is changing our lives wouldn’t be possible without data engineering. Basically, without data engineering you can’t run any machine learning or data science algorithms."
Tahir is part of the “Omniators”, the very first data engineering team formed in Equinor. They were assembled in February 2018 to build the Reservoir Experience Platform (REP).
But it’s not all data engineering: the frontend part of the web dashboard is created by software developers. They’ve also helped create the powerful APIs that enable the frontend to talk to the database and give the end users the information they need.
Thanks to these two teams and disciplines, the MVP (minimum viable product) has evolved into a fully functional product and is in full use across our Norwegian continental shelf assets.
Learning on the job
REP extracts data from 9 different sources, which come in a plethora of shapes, sizes and even speeds. Combining all of this data is a challenge, Tahir explains.
“Data engineering is an incredibly fast-evolving field with new tech popping up every day. You always have to be up to date, and that can make it quite difficult to decide how to do things. But that is also what makes it fun.”
Data engineering also involves quite a lot of development: gathering and processing data comes to life through code. Additionally, a separate team of software developers is building the frontend part of REP. Together, these teams have made REP a success, and the tool is now in use across our Norwegian continental shelf assets, with more to come.
Back in February 2018 the team was fresh to data engineering at the scale they’re working on now. They all had computer science backgrounds, but came from different fields and most were new to the oil and gas industry.
“Naturally, the start was all about learning data engineering and figuring out how to do this, learning about our Omnia data platform and the subsurface world. Simultaneously, we had to do some work on the project,” Kjetil Tonstad says.
Kjetil took over as team lead after the team had delivered a successful MVP in as little as 6 months. And the people are one of the key ingredients in the recipe for success. Jay Ma explains that while they were learning everything from scratch in the beginning, their success with the MVP relied on their dedication.
“Everyone is really engaged in learning and helping each other out. We also had an excellent product owner who gave us a great introduction to the field,” Jay says.
Making the data work together is one thing but understanding what the data is actually about is just as important.
“We have to work with large amounts of data from various sources. Rather than spending a lot of time knowing all the details of the data, it can be much more efficient to know who’s already an expert in it and utilize their knowledge,” Jay says.
The flow of data
If you break it down into four steps, the data flow of data engineering looks a little like this:
(1) Uploading raw data to the cloud.
(2) Transforming and enriching data by combining it with other sources.
(3) Changing data into appropriate formats.
(4) Putting the data into a database and allowing users to connect with APIs.
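The four steps above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the team's actual setup: the sample data, the column names and every function name are made up for the example, an in-memory SQLite database stands in for the real database, and a plain function stands in for the API.

```python
import csv
import io
import sqlite3

# Hypothetical raw inputs. The real sources and schemas are not described here.
RAW_PRODUCTION_CSV = """well,date,oil_volume
A-1,2018-02-01,120.5
A-2,2018-02-01,98.0
"""
RAW_WELL_INFO_CSV = """well,field
A-1,North
A-2,South
"""


def ingest(raw_text):
    """Step 1: stand-in for uploading raw data; parse it as-is into dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def enrich(production_rows, well_rows):
    """Step 2: combine production rows with a second source (field per well)."""
    field_by_well = {row["well"]: row["field"] for row in well_rows}
    return [dict(row, field=field_by_well.get(row["well"])) for row in production_rows]


def convert(rows):
    """Step 3: change values into types that suit the target table."""
    return [(r["well"], r["field"], r["date"], float(r["oil_volume"])) for r in rows]


def load(records):
    """Step 4a: put the data into a database (in-memory SQLite in this sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE production (well TEXT, field TEXT, date TEXT, oil_volume REAL)"
    )
    conn.executemany("INSERT INTO production VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def get_production(conn, field):
    """Step 4b: stand-in for an API endpoint that serves the data to users."""
    cursor = conn.execute(
        "SELECT well, date, oil_volume FROM production WHERE field = ? ORDER BY well",
        (field,),
    )
    return cursor.fetchall()


def run_pipeline():
    """Run steps 1 to 4 in order and return the database connection."""
    production = ingest(RAW_PRODUCTION_CSV)
    wells = ingest(RAW_WELL_INFO_CSV)
    return load(convert(enrich(production, wells)))


if __name__ == "__main__":
    connection = run_pipeline()
    print(get_production(connection, "North"))
```

Keeping each step as its own small function makes it easier to test the steps separately and to swap one of them out, for example replacing the in-memory database with a real one.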
A truly data-driven future
Their agile way of working and of making data available has made the Reservoir Experience Platform a great success. Since they started as the very first team of their kind, the data engineering community has grown. Today, there are probably 15-20 different teams working with data engineering in one shape or form.
“There’s a lot of talk about digitalization everywhere but we’re the ones who are actually working on making it happen. It’s becoming more and more recognized all the way to the top and prioritized more, which is great to see.”
Even though the number of data engineers has grown since the Omniators were assembled in 2018, the kind of people we’re looking for hasn’t changed much since then, Merete Svidal Llewelyn, Leader IT, says.
“We’re looking for IT professionals who understood both the value of data and that we had to make it accessible in order to make digitalization a reality.”
Like most IT teams in Equinor, the Omniators are an agile bunch, which helped them deliver their MVP so quickly, Shahila Retnadhas tells us.
“We also get together and present our first thoughts for a solution at the beginning of the sprint. This lets us get initial feedback and ideas from others so we know if we’re going in a direction that’s been proven to not work.”
In this video, Shahila gives us an introduction to the first steps of data engineering.
Taking data to the cleaners
With data spanning several years, maybe even decades, small inconsistencies can become quite a challenge. And as a data engineer you work with raw data that can contain both human- and machine-made errors. You might even get excited about seeing certain databases heading your way!
“I always get excited when I get a task related to the WellDB G-drive because I know it’s going to be a challenge. It triggers my analytical skills, since I have to find a pattern of eliminating these inconsistencies. And it’s a lot of fun to tackle.”
Data engineering can sound simple but it’s quite a varied and challenging field. The data they’re working with is not always structured according to a defined standard and in this age of data, there’s more and more unstructured data coming from different sources like IoT devices.
All of this data has to be cleaned, enriched and transformed before it can be valuable and meaningful for the end users, Shahila explains:
“For example, one challenge can be that naming standards for wellbores aren’t followed completely. Instead, there might be a dash where there should be a slash, and we have to standardize it in order to make it usable across the business.”
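A clean-up like the one Shahila describes could look roughly like this in Python. The naming pattern used here, "<quadrant>/<block>-<rest>", is an assumption made for the sake of the example, and the function name is hypothetical; the actual standard and the team's rules are not spelled out here.

```python
import re

# Assumed convention, for illustration only: "<quadrant>/<block>-<rest>",
# for example "15/9-F-12". The first separator may be wrongly typed as a dash.
_WELLBORE_PATTERN = re.compile(r"^\s*(\d+)\s*[-/]\s*(\d+)\s*-\s*(\S.*?)\s*$")


def standardize_wellbore(name):
    """Return the name in the assumed standard form, or None if it does not fit."""
    match = _WELLBORE_PATTERN.match(name)
    if match is None:
        # Names that do not fit the pattern are left for manual review.
        return None
    quadrant, block, rest = match.groups()
    return f"{quadrant}/{block}-{rest}"


if __name__ == "__main__":
    for raw_name in ["15-9-F-12", "15/9-F-12", " 15 / 9 - F-12 ", "unknown"]:
        print(repr(raw_name), "->", standardize_wellbore(raw_name))
```

Returning None for names that do not match, rather than guessing, keeps the odd cases visible so that someone who knows the data can look at them.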
Tackling a challenge
Challenges aren’t found in only one database, however; they’re all over. Being adaptable and open to new things will also help you a great deal as a data engineer. Few, if any, days are the same, Sindre Osnes explains.
“You have to understand different data formats and sizes, but you also need to grasp coding and infrastructure. These ever-changing challenges and problems you have to solve are what make data engineering fun.”
Issues might not even appear in the data until a late stage of the team’s work. Then, they have to start digging into the data to find where the root cause is.
“Sometimes it’s an easy fix while others are more demanding. But we have to make sure we’re showing/uploading the right data. Displaying the wrong data can cause more damage than showing no data at all,” Sindre says.
Making use of the data available is the most obvious perk of data engineering, but far from the only one, Omniators team lead Frode Jansen Lande tells us.
“I think a lot of people don’t realize that when you put different data sets together you also uncover problems or errors in the data quality. And when you know what the problems are, you can start working on fixing them.”
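To make that point concrete, here is a small Python sketch of how putting two data sets side by side can surface quality problems. The two sources, their contents and the function name are invented for the example; it simply reports keys that exist in only one source and keys where the two sources disagree.

```python
def compare_sources(source_a, source_b):
    """Compare two {key: value} data sets and report where they disagree."""
    keys_a, keys_b = set(source_a), set(source_b)
    shared = keys_a & keys_b
    return {
        "missing_in_b": sorted(keys_a - keys_b),
        "missing_in_a": sorted(keys_b - keys_a),
        "conflicts": sorted(k for k in shared if source_a[k] != source_b[k]),
    }


if __name__ == "__main__":
    # Hypothetical example: total depth per wellbore according to two systems.
    system_one = {"A-1": 3200, "A-2": 2950, "A-3": 4100}
    system_two = {"A-1": 3200, "A-2": 2590, "A-4": 3800}
    print(compare_sources(system_one, system_two))
```

Neither source looks wrong on its own here; the mismatches only show up once the two are combined, which is when the work of fixing them can start.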
Frode Jansen Lande
He’s the current team leader for the Omniators, having taken over the reins after Kjetil left to lead another data engineering team. While the Omniators team is one of many teams, the goal is to spread this way of working organically to other teams like it.
“I think it’s important to have a mix of different people like we do, because it leads to more and different questions,” Frode says.
Data: the new oil?
Some say less is more, but that’s not the case with data. Here, more is more. Tahir believes that if any industry is capable of tapping into the true potential of data engineering, it’s the energy industry.
“You can always run your prediction using few samples, but more data means more accurate decisions. Equinor has so much data available that’s untouched - there are literally petabytes of unexplored data and much, much more still to come. That makes being a data engineer here incredibly interesting,” Tahir says.
His Omniators colleague, Andrey Kuznetsov, explains that while Equinor alone has tons of data, other companies have the same - or more. This can lead to an interesting future.
“We can use this for research and science, but also to make better decisions as a company. It’s exciting to be a data engineer and help people make use of all this data, instead of spending their time engineering. I think all this data really is the new gold.”
It will be exciting to see what our data engineers and others can do in the future!
Frode Jansen Lande
Eskil Høyen Solvang