From Love To Hate And Back Again. Who Data Engineers Are And How To Become One

First steps and typical problems you are likely to face in your journey to becoming a data engineer

 

First of all, you need to understand the role of a data engineer within the related hierarchy of professions. 

The data engineer is just one of the many professions directly related to data. 

 

The hierarchy outlined in the diagram above shows which role a data engineer plays in the process starting from data collection and concluding with the creation of models/business insights/reports.

 

At the bottom of the hierarchy, there are the backend engineers and architects, then there are the experienced data engineers who also work with the infrastructure, and, finally, we have the “ordinary” data engineers. 

 

As we can see from the image, data transformation and validation can also be part of data engineering. In short, a data engineer is a professional who collects, delivers, processes, and stores data. They are also responsible for keeping the data up-to-date, making sure that it is available and correct.

 

How can I become a data engineer? 

 

The field of data engineering is relatively new. For example, five years ago, it wasn’t so relevant and few people knew about it. A data engineer’s tasks could be performed by developers, data scientists, and analysts. 

 

However, the market is growing rapidly, data flows are increasing, many companies are now switching to data-driven solutions, and the work that other specialists were doing before is no longer sufficient. 

 

This is how the need for data engineering emerged. The field is multifaceted and, depending on the company and its needs, the tasks and responsibilities of data engineers may vary greatly. That’s why there is no conventional training that will allow you to master everything from scratch and land a job as a data engineer. In this field, learning comes from experience, by trial and error. 

In most cases, there are two main ways of entering the data engineering field.

 

1. Switching to data engineering from the backend

 

Most of the principles of these professions are similar, so the transition is quite smooth. The transition may be justified by the requirements of a company or the individual’s desire to work in a new field. It is not about loving or hating data because to a developer, data is simply a resource to be utilised. This resource can create difficulties, of course, but the professionals must always take up the challenge. However, the real challenges start when the developer becomes a data engineer and is bombarded with data-related questions, tasks, and bugs. 

 

2. Transitioning from the field of data analytics

 

Take, for example, someone who has always enjoyed working with data, who likes to study it carefully and can see not just a series of symbols but the information it provides. Now this person wants to move into engineering. This is not the only option open to a data analyst, but it is the one we will discuss here.

 

What are the main reasons for entering this particular field? It can simply be because a person wants to professionally develop in a new and relevant area of work.

 

More often, however, there is a different reason. Love of data can quickly turn into hate when you work with it on a daily basis: the constant lack of urgently required pieces of information, the lack of documentation, the inability to check sources and fix data problems yourself, the need to wait for other specialists to fix these issues… the list goes on.

 

Certain situations, such as urgent reports, can make the analyst hate the very data they adored and cause irritation at even the slightest of problems. What’s the solution? If you can’t get others to do it faster and better, just do it yourself. This is how many analysts get into data engineering. It’s not a piece of cake; you need to go through a lot of training first because, after all, an analyst is not a technical expert. At the end of the day, though, the move opens up numerous opportunities that make life easier and allow you to fall in love with data again and enjoy working with it.

 

 What are the basic concepts a professional needs to master in order to make the transition? 

 –        First and foremost, of course, a professional must understand SQL. There’s a great quote: “Our whole life is data, and in order to extract this data from the database, you need to ‘speak’ the same language as it.” You will have to process information from the database and use SQL when working with data. The better you know how to use it, the quicker you are able to complete a task and the higher your value as an expert will be. 

 

–        Strengthen your programming skills: remind yourself about OOP, the functional approach, multithreading, and more. The most popular languages for data engineering currently are Python and Scala.

 

–        Constructing data pipelines. This is probably the main task of an engineer: to build the architecture for the data delivery process. There are many concepts and tools for this. Apache Airflow is among the most popular and widely used tools at the moment. Fortunately for beginners, it is easy to get started with. It is advisable to learn how to work with this tool if you want to become a data engineer.

 

–        You also need to understand the basics of databases – design, structures, application, and troubleshooting, as well as understand the difference between SQL and NoSQL. Any engineer who lacks this knowledge is working blindly.

 

–        Working with the cloud. Many companies keep at least some, if not all, of their data in the cloud, so you will often see this requirement in job descriptions. What’s essential here? Knowing how to upload data to the cloud and download it again, how to use the platform’s tools, and how to transfer data from one place in the cloud to another, e.g. from containers to a database.

 

–        Diving into the topic of distributed systems. This is quite an old, extensive and complex area, but when working with big data, sooner or later, you will have to deal with clusters. You need at least a basic understanding of how their nodes interact, what problems may occur, and how to fix them. If you cannot prevent a problem, you should at least be able to find a quick solution.

 

–        Processing tools. The most commonly used ones are Spark and Kafka. Spark is a distributed processing engine that runs complex computations in parallel across a cluster to speed up work on very large datasets. Kafka is a distributed event streaming platform that lets you centralise the collection, transfer, and processing of large volumes of data as continuous streams, and store that data durably, with replication reducing the risk of data loss.
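To make the list above more concrete, here is a minimal sketch of what "collect, process, validate, store, and query with SQL" can look like in practice. It is only an illustration under stated assumptions: the `RAW_ORDERS` records, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database from Python's standard library stands in for a real warehouse. A production pipeline would normally be scheduled by an orchestrator such as Airflow and would run against far larger data.

```python
import sqlite3

# Hypothetical raw records, standing in for data collected from a source system.
RAW_ORDERS = [
    {"order_id": "1", "customer": " alice ", "amount": "19.99"},
    {"order_id": "2", "customer": "Bob", "amount": "5.00"},
    {"order_id": "3", "customer": "", "amount": "12.50"},      # missing customer
    {"order_id": "4", "customer": "Carol", "amount": "oops"},  # malformed amount
]


def transform(record):
    """Normalise types and whitespace; return None if the record is invalid."""
    customer = record["customer"].strip().title()
    if not customer:
        return None
    try:
        amount = float(record["amount"])
    except ValueError:
        return None
    return int(record["order_id"]), customer, amount


def run_pipeline(raw_records, connection):
    """Extract -> transform/validate -> load. Returns (loaded, rejected) counts."""
    clean, rejected = [], 0
    for record in raw_records:
        row = transform(record)
        if row is None:
            rejected += 1
        else:
            clean.append(row)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    # INSERT OR REPLACE keeps the load idempotent if the pipeline is re-run.
    connection.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", clean)
    connection.commit()
    return len(clean), rejected


def total_per_customer(connection):
    """Answer a simple business question with SQL."""
    cursor = connection.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(RAW_ORDERS, conn))
    print(total_per_customer(conn))
```

Even in a toy like this, rejected records are counted rather than silently dropped, which reflects the point that a data engineer is responsible for the data being correct as well as available.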

 

Many people are reluctant to enter the field of data engineering from the ground up because they do not have the necessary knowledge. There are not many Junior Data Engineer jobs in the market. Often the job description and role requirements for the position look like middle+ level skills, especially in the eyes of a newcomer who is seeing most of these technologies for the first time. 

 

Our team has always been supportive of professional development initiatives, so I can highlight three basic skills that a professional must master to be considered for a junior specialist role.

 

 The first two can be seen at the base of this pyramid (Programming and SQL) and the third, which I consider equally important, is an analytical mindset. 

 

This means the ability to read and understand data, grasp its structure, spot problems, and draw insights. It is an essential skill for a data engineer because business representatives often provide very vague terms of reference (TOR). A data engineer should be able to figure out how to perform a given task, how to collect the requested data, and where to find it.

 

 It is important to emphasise that this is the bare minimum skill set needed to get into the data engineering field. What next? Constantly develop your skills, solve more complex problems and learn new technologies.

 

 Diana Dmytrashko, Team Lead for Data Engineering at AUTODOC