6 October 2023, by Sezen Ipek and Stefan Mönk
Making the right choice: agile approaches that can be used in data science projects
Agile software development has proved to be an effective method for increasing productivity, quality and customer satisfaction in a host of industries. But what about data science (DS)? The key takeaway from the interviews we held with colleagues from data science projects in the lead-up to writing this post is this: the question is not whether agile methods are useful and necessary in DS projects, because they are. The only real questions are how to choose the right strategy and how to combine the strengths of the different agile approaches.
In this blog post, we will explore the requirements specific to DS projects, explain how to choose the right agile approach and take a closer look at the current state of research.
Requirements for DS projects
Data science projects often involve complex data models, in-depth analyses and constant improvements. We will be looking at the specific requirements of these types of projects, including how to deal with large amounts of data, the need for an iterative approach and the integration of feedback loops. By understanding these requirements, we can better assess which agile approaches will be needed in the DS project. Although established data science process models exist, 82 per cent of data scientists stated in response to an earlier survey conducted by Saltz (2018) that they did not follow an explicit process, and 85 per cent felt that project results could be improved if a systematic process methodology were adopted.
As a proven process model, CRISP-DM provides a set of guidelines on how to carry out data science projects. While these are helpful, they need to be incorporated into an agile framework that stipulates incremental iterations.
CRISP-DM does not describe processes like team coordination, communication or prioritisation. It defines what needs to be done, but not how. For example, CRISP-DM allows feedback loops and iterations back to an earlier stage, but it defines no process for how and when development teams should iterate. It is therefore necessary to employ suitable project management methods that verify whether the iterative process is viable. Because implementation is a major undertaking and one of the main challenges, it is important to take a considered and systematic approach to the use of project management methods and procedures. Projects often fail not on technical grounds but for reasons related to process and project management.
According to Kleist and Pier (2021), systematic communication and effective management of expectations are important factors for the success of data science projects. It is also important to adopt an iterative approach that seeks to deliver continuous improvement.
Having data scientists with the requisite skill sets is another key requirement for data science projects in an agile context. Data science is an interdisciplinary field that requires specialists with expertise in statistics, mathematics and computer science. These data scientists need to constantly learn new methods and add new skills in order to keep their qualifications up to date.
Choosing the right approach
There are a number of agile approaches to choose from, including Scrum, Kanban and Lean. But not every approach is the right choice for every data science project. We will examine the pros and cons of the different frameworks and methods and show you how to choose the right approach for a specific project. Factors such as the complexity of the project, team size, availability of domain experts and flexibility in terms of the requirements play an important role.
Agile approaches are suitable for complex tasks involving unpredictable outcomes. While conventional approaches make sense if the requirements are consistent and do not change, an agile strategy is better suited to projects where the requirements are not settled and are subject to change. Another rule of thumb is that the more complex the project is, the more it makes sense to use agile methods and frameworks. One way to determine the complexity of a project is to use the Stacey matrix (see Figure 4), which compares the technical solution and the requirements. However, this is more of a rough guide than a definitive method to determine complexity.
According to the Stacey matrix, projects can be assigned to one of four categories in relation to their complexity: simple, complicated, complex and chaotic. The x-axis indicates the clarity of the solution, while the y-axis represents the clarity of the project requirements. The question of whether or not to use agile methods comes down to the level of clarity available, which varies from project to project.
The Kanban agile methodology is suitable for complicated projects, whereas, according to the Stacey matrix, Scrum is the right framework for complex projects. Design thinking processes are ideally suited for chaotic projects with no clear requirements or solutions.
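The mapping from Stacey category to approach described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the clarity scores between 0 and 1, the averaging rule and the thresholds 0.75, 0.5 and 0.25 are hypothetical and are not part of the Stacey matrix itself, which remains a rough guide. The "conventional" entry for simple projects is inferred from the remark that conventional approaches make sense when requirements are consistent.

```python
from enum import Enum


class Category(Enum):
    SIMPLE = "simple"
    COMPLICATED = "complicated"
    COMPLEX = "complex"
    CHAOTIC = "chaotic"


# Approaches per category as named in the text; the entry for SIMPLE
# is inferred from the remark on conventional approaches.
SUGGESTED_APPROACH = {
    Category.SIMPLE: "conventional",
    Category.COMPLICATED: "Kanban",
    Category.COMPLEX: "Scrum",
    Category.CHAOTIC: "design thinking",
}


def classify(requirements_clarity: float, solution_clarity: float) -> Category:
    """Assign a project to a Stacey category.

    Both arguments are scores from 0.0 (completely unclear) to 1.0
    (completely clear). The scoring scale, the averaging rule and the
    thresholds are assumptions made for this sketch.
    """
    for score in (requirements_clarity, solution_clarity):
        if not 0.0 <= score <= 1.0:
            raise ValueError("clarity scores must lie between 0.0 and 1.0")

    # Hypothetical rule: average the two axes into one clarity value.
    clarity = (requirements_clarity + solution_clarity) / 2

    if clarity >= 0.75:
        return Category.SIMPLE
    if clarity >= 0.5:
        return Category.COMPLICATED
    if clarity >= 0.25:
        return Category.COMPLEX
    return Category.CHAOTIC


if __name__ == "__main__":
    category = classify(requirements_clarity=0.3, solution_clarity=0.4)
    # Prints: complex -> Scrum
    print(category.value, "->", SUGGESTED_APPROACH[category])
```

In practice, the two scores would come from a team discussion rather than a measurement, so the result is best read as a conversation starter and not as a decision.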
Agile approaches make sense for data science projects since they are classified as complex and unpredictable. This leads, however, to the question of which agile approaches to use and how to evaluate them. For example, what are the opportunities and challenges associated with the use of agile frameworks in data science projects?
We will share our insights into these questions with you in the next blog post.
The current state of research
There are a number of papers on the topic of agility that cover a variety of agile project management methods, highlighting the importance of agile thinking and action for companies in a dynamic environment.
The Scrum framework and the Kanban method are discussed in many of them, whereas there is hardly any literature on Scrumban. On the topics of data science, machine learning and AI, there are many papers that provide a conceptual and technical overview of the fields. There are a few publications on CRISP-DM, though there is still much need for further research, especially in the agile context. This is particularly true for the CRISP-ML(Q) approach, on which precious little research is available. Because of the complexity and uncertainty surrounding data science projects, it is essential that you choose the right method or framework. A number of different approaches like Scrum, Kanban and design thinking can be considered here. However, there are only a few scientific studies on agile data science projects. The topics of agility and data science have only been explored together in the project management context by Kleist and Pier (2021) and Saltz and Suthrland (2019).
In the paper by Kleist and Pier (2021), Scrum is deployed in a data science project in the automotive industry. Saltz and Suthrland (2019) conducted a study on an agile framework and compared it with Scrum and Kanban on a conceptual level. Data thinking, which combines design thinking and data science, is a concept that also appears in the study. That being said, there is no holistic solution for agile data science projects, nor is there in-depth research available on project criteria, the compatibility of agile methods and frameworks with data science or the use of a combination of agile approaches in data science projects.
As always, it ultimately comes down to the specific requirements of the project, the team make-up and other factors when choosing the right agile software development methods and frameworks to use in your project. The question as to which agile approach is the right choice is difficult to answer. Along with the project criteria, the project phases are also critical when choosing a strategy that fits your needs. For example, Scrum and Scrumban are better suited to the later phases of a project, whereas Kanban and design thinking have been demonstrated to deliver greater benefits during the early project phases. However, we will not be providing specific recommendations for action at this time. We will save this for our next blog post.
The focus here is on being agile and on supporting the teams as they work together to find the right strategy. An experimental, iterative approach can be a good choice if you wish to test and optimise best practices step by step.
In this blog post, we looked into the requirements for DS projects, described how to choose the right agile approach and discussed the current state of research.
Here is a brief recap of the requirements:
- In response to an earlier survey, 82 per cent of data scientists indicated that they did not follow an explicit process and 85 per cent felt that project results could be improved if a systematic process methodology were adopted.
- As a process model, CRISP-DM provides guidelines for data science projects that need to be incorporated into an agile framework that stipulates incremental iterations.
- Processes like team coordination, communication or prioritisation are not described in CRISP-DM.
- Project management methods should examine the iterative process and adapt it to the needs of the specific project in question.
- Systematic communication and effective management of expectations are crucial for the success of data science projects.
- Data science requires a wide range of skills in statistics, mathematics and computer science.
- Data science projects place great demands on the project management team and the people involved.
- Selecting the right agile approach is critical to the success of a project. The decision must be well-founded and carefully considered.