A common language is half the battle. How a Data Scientist connects technology to business
Today's world of banking and finance is increasingly aware of the need for the knowledge and skills of Data Science specialists. Organizations hiring these specialists realize that they need a variety of competencies. However, employers usually focus mainly on technical skills, such as programming, applying algorithms, and working with or analyzing data.
In doing so, they forget about the second important set of skills that Data Science specialists should possess: soft skills. After all, it is thanks to these skills that specialists can prepare better products for their clients. Let's take a closer look at them and see what they really mean and why they deserve special attention.
Understanding business processes
A Data Scientist does not create a model or algorithm just for the sake of creating one. They build a solution for the business, for someone who will use the results of their work. That is why it is so important to understand which business processes the solution will touch. With that understanding, the expert can select variables, data, and algorithms more precisely and, above all, weave the solution into the whole process more efficiently. We then avoid a situation in which we could say, with a wink: "The operation was successful, but the patient died."
Ability to present results
Every Data Scientist knows the moment when the data has been prepared and analyzed, a model has been built, many different algorithms and variables have been checked, and the result is ready and just waiting to go into production. Only then comes the time to present the results to the business, before the model is handed over for implementation. And this is where the problem very often arises. It turns out that Data Science specialists are so focused on technical and algorithmic issues that they cannot communicate their results to business users clearly and, above all, simply. Sometimes the problem is not the content itself but the way it is presented. They prepare PowerPoint presentations but forget to arrange the information in a logical sequence, highlight the most important points, and make sure the charts are relevant to what is being said. Or they prepare a document describing the approach used, and it turns out to be simply unreadable: it lacks paragraphs, emphasis on the most important information, text formatting, subsection titles, and so on. All of this makes it harder for the recipients to find the information they need and to absorb the material presented to them.
Ability to explain the algorithms used
Sometimes during a recruitment interview, a candidate is asked to explain some algorithm. It is apparent that the candidate has experience with the algorithm, knows what it means and how to apply it, yet often cannot explain in simple terms what it is all about. Business clients usually do not need to go into detail; a general idea of what the algorithm does is enough for them. Still, it is often necessary to discuss the solution with them, and then we have to explain how the algorithm captures certain dependencies, or why it does not. Or, in a Senior position, we may need to explain to a team member what the solution is all about. Keep in mind that not everyone has advanced mathematical knowledge, so we need to be able to convey the most important information in a way our audience can understand.
Conducting discussions on the variables used and the approach to the problem
Often, in order to build the right solution, the Data Scientist should talk to experts in the area, that is, acquire so-called domain knowledge. This means finding out what their experience is, presenting what the data says, and discussing which approach is best to implement. It may be that someone on the other side cares a lot about including certain assumptions or variables. A good Data Scientist is able to capture this, analyze these needs, and weigh them against the possibilities offered by the data, the business process, and the technical environment.
Critical approach to results and data
It is often said that data cleaning and preparation take 80% of the time in a machine learning project. This suggests that data can deceive us more than once, and that we may fail to notice errors in it that will affect our model. It is also a warning: we need to keep in mind that data does not always reflect the business process we are dealing with. Therefore, when analyzing the results of a model or an analysis, it is important to look at them critically at all times. It may turn out that poor performance is not caused by the chosen variables but by changes in the business and in reality. Conversely, a very good result may be caused by badly prepared data or by some leakage in it. There is a reason for the common joke that if the first result is over 90%, that is not a good sign.
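One simple way to act on this suspicion is to check whether any single variable mirrors the target almost perfectly, which is often a sign of leakage. The sketch below is only an illustration under stated assumptions: the column names, the toy values, and the 0.95 threshold are all made up, and a high correlation is a reason to investigate, not proof of leakage.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def suspicious_features(columns, target, threshold=0.95):
    """Return names of features whose |correlation| with the target exceeds threshold."""
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, target)) > threshold
    ]


# Toy, made-up data: 'account_closed_flag' is a hypothetical column that is
# recorded only after the churn event, so it simply repeats the target.
target = [0, 0, 1, 1, 0, 1, 0, 1]
columns = {
    "days_since_campaign": [12, 40, 35, 3, 20, 50, 8, 27],
    "account_closed_flag": [0, 0, 1, 1, 0, 1, 0, 1],
}

flagged = suspicious_features(columns, target)
print(flagged)
```

A variable flagged this way is worth a conversation with the business: if it only becomes known after the event we are trying to predict, it should not be in the model.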
Feature Engineering, or creating new variables, is an art. It is difficult to tell someone who is just starting out in Data Science how to create such variables. This is where creativity and domain knowledge are needed. The process of selecting data is not simple and is not subject to strict rules; you need to consider what might be helpful and what might not. Will the number of days since the customer's last campaign help detect potential churn, or would a variable about the customer's account manager work better? Or perhaps it is not the number of days since the last campaign that matters, but the number of weeks, or the number of working days? In addition, the data that Data Scientists use is usually not perfect, which can negatively affect the model. You then need to solve this problem either from the data side or from the algorithm side. Creativity also comes in handy when you need to get around non-standard problems, when the usual XGBoost is no longer enough. Then you need to use other methods, sometimes combining several into one algorithm, and at other times testing multiple solutions to find the right one. There are usually no beaten paths in Data Science, so you have to find a new best-fit solution every time.
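The question of days, weeks, or working days since the last campaign can be made concrete with a small sketch. It derives all three candidate variables from the same pair of dates so they can be compared in a model. The function names and the example dates are hypothetical, and working days are assumed here to be Monday to Friday, with public holidays ignored.

```python
from datetime import date, timedelta


def working_days_between(start: date, end: date) -> int:
    """Count Monday-Friday days in the half-open interval [start, end)."""
    count = 0
    current = start
    while current < end:
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            count += 1
        current += timedelta(days=1)
    return count


def recency_features(last_campaign: date, reference: date) -> dict:
    """Build three candidate encodings of the same recency signal."""
    days = (reference - last_campaign).days
    return {
        "days_since_campaign": days,
        "weeks_since_campaign": days // 7,  # full weeks only
        "working_days_since_campaign": working_days_between(
            last_campaign, reference
        ),
    }


# Hypothetical customer: last campaign on Friday 1 March 2024,
# features computed as of Friday 15 March 2024.
features = recency_features(date(2024, 3, 1), date(2024, 3, 15))
print(features)
```

Which of the three encodings helps the model is an empirical question; the point is that the same raw date can yield several variables, and domain knowledge suggests which ones are worth trying.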
Soft skills come into play at different points in a Data Scientist's career. Developing them is a continuous process; a course on one of the popular platforms like Udemy or Coursera is not enough. Technical and soft skills are like communicating vessels: each needs to be filled in turn so that development stays balanced. Soft skills come with growing experience, and it is important to pay attention to them, polish them, and train them. It is no less important that companies begin to pay attention to the various competencies of their employees and support them in improving those competencies. It is no use having a good model if no one can explain how it works, whether it meets the needs of the business, and what its results really mean, especially when the finance and banking market is subject to many regulations. It is time for the Data Scientist to be someone you can talk to in more than just the names of algorithms.