"Riding the Chaos": How 1000+ engineers increased backlog burn rate to 95%.

Delays, a growing number of defects, inconsistent design, rework and team burnout are a reality for all large digital projects. I would like to share my experience of building the production process we used to create the Alfa Business Internet Bank. It worked: we increased backlog convergence from 45% to 95% and synchronized the work of 100 teams, which is about 1,000 engineers.

In this article I'll talk about the planning practices we borrowed from the SAFe methodology, but first I'll introduce myself. I've been building digital services for 19 years, since 2004. I started with my own design studio, was a Product Owner at Yandex and Mail.ru (now VK), was responsible for creating Sberbank's B2B ecosystem, from ID and API to launching and monetizing non-banking services (Value Added Services), and today I develop digital channels (web, mobile, API) for legal entities at Alfa-Bank. It seems I have tried everything: intuitive development, canonical Waterfall by PMBOK and, of course, Agile.

What we have going on

100 teams and 1,000 specialists work on the development of digital channels for legal entities. They develop 70 banking products and 20 platform services. We implement all the functionality in three channels: web, mobile app and API, and personalize it for our business segment. Roughly, that's 7,000 releases a year.

The breakneck speed and DevOps pipeline available to every team created a lot of problems. The overall consistency of online banking suffered, the interface varied from product to product. The time to implement large functional modules was constantly increasing. The number of bugs in the production environment was growing. We spent a lot of time in synchronization meetings and often redesigned functionality at the business acceptance stage, because there was no agreed vision and because of the desire to get results faster.

Problems were accumulating in the teams, stakeholders were dissatisfied, managers were putting out fires. It became obvious that something needed to change. The Agile approach and its manifesto helped us in many aspects, but did not solve the key problems. We started building the production process to our requirements 2.5 years ago. In this article, I will focus on our approaches to planning and synchronization and talk about the framework with which we create products and services on a daily basis.

Let's start with the planning

Most of the companies I know that talk about Agile actually work with long planning horizons and don't change plans every two weeks. Yes, the world is very changeable, and horizons are getting shorter: Toyota in the early 2000s presented a 10-year strategy, and in 2012 Apple showed a 5-year strategy. Twitter in 2020 and AMD in 2021 presented 3-year strategies. By the way, at Alfa we now also plan for 3 years.

Three-year plans are organized into annual roadmaps, which are only slightly adjusted. And this is normal: if each division changes its roadmaps, they will diverge from the overall strategy of the company, and we will not meet the expectations of the market and shareholders.

Here we will focus on planning within the year. Our task is to implement the roadmap well, not merely to close tasks formally.

Our planning process: bi-weekly sprints, intra-quarter and sprint PBR, monthly review and quarterly PI Planning

At the end of the year, we break down annual roadmaps into quarterly epics without deep detailing. Our task is to evenly distribute the load on the teams and take into account the dependencies of related teams.

The quarter, on the other hand, we plan in great detail. This process helps us synchronize highly complex initiatives and involves all 100 teams.

For quarterly planning, we chose the PI Planning approach - Program Increment Planning. It is part of the SAFe (Scaled Agile Framework) methodology for large Agile organizations. The event takes two days. All teams get together and discuss plans, risks and dependencies. Each team leaves with a detailed plan that is agreed upon by all participants and backed by a shared understanding of the goal - this is extremely important in large and distributed teams.

For example, we are now implementing the "Holding" mode in Internet Banking - a mode that allows you to view and work with documents from different companies within one functional area. The initiative involves 50 teams. We reviewed the information and system architecture, planned refinements to the bank's core products and synchronized the readiness of components for the new aggregated dashboard screens for parent companies.

A Product Owner can't come to quarterly planning empty-handed. We hold two PBR (Product Backlog Refinement) sessions per sprint. The first one focuses on the current sprint's tasks and the second one focuses on the next quarter's tasks. Each session takes 1-2 hours. Roman Pichler describes PBR well, and the idea can be simplified into a thesis: "The importance of regular and efficient planning and refinement of the product queue cannot be overemphasized. You need it to maintain adaptability and respond quickly to change." So take it on board.

Regular PBR sessions help us even out the workload of teams, because we estimate the next quarter's tasks throughout the current one. Without PBR, we simply wouldn't be able to keep up with the increasing workload in the final days of any quarter.

Preparing for PI Planning is a process of its own. First we used the GetCourse platform with training material and task tracking, then we moved to a Telegram bot. Eventually we just made a PDF file with the PBR schedule, instructions for task decomposition and estimation, and task description templates. It's a pity that simple solutions are so hard to find.

Scrum masters lead our planning processes. They help teams fit planning prep into their sprint schedules, facilitate backlog decomposition meetings, teach task estimation techniques, help manage dependencies, and prepare POs to defend quarterly goals. Without this role, none of the planning would happen.

So far, so good: a solid quarterly plan is ready. Now we need to get the production beat in place. We've chosen canonical bi-weekly sprints. I have a simple goal: every two weeks, the Product Owner must show a useful result. Missing a deadline is unacceptable. This creates a rhythm similar to conveyor belt production. It sounds rigid, but with a lot of teams, rhythm helps a lot to keep the pace high. Interconnected teams know when to expect the increments they depend on.
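The point about dependent increments can be illustrated with a small check. This is only a sketch under assumptions, not our actual tooling: the `Increment` record, its field names and the rule that a dependency must land in a strictly earlier sprint are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Increment:
    """A hypothetical planned deliverable; all field names are assumptions."""
    name: str
    team: str
    sprint: int                       # 1-based sprint number within the quarter
    depends_on: tuple[str, ...] = ()  # names of increments needed first


def find_dependency_conflicts(plan: list[Increment]) -> list[tuple[str, str]]:
    """Return (increment, dependency) pairs where the dependency is missing
    from the plan or is not scheduled in a strictly earlier sprint."""
    by_name = {inc.name: inc for inc in plan}
    conflicts = []
    for inc in plan:
        for dep_name in inc.depends_on:
            dep = by_name.get(dep_name)
            if dep is None or dep.sprint >= inc.sprint:
                conflicts.append((inc.name, dep_name))
    return conflicts


# Illustrative plan with made-up names: the dashboard is planned for the
# same sprint as the API it needs, so the pair is reported as a conflict.
plan = [
    Increment("holding-api", team="core", sprint=2),
    Increment("holding-dashboard", team="web", sprint=2,
              depends_on=("holding-api",)),
]
print(find_dependency_conflicts(plan))
```

With a fixed two-week beat, a conflict like this is resolved by moving one of the two increments by a whole sprint, which is easy to discuss at a planning event.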

Public visibility is important for continually shipping useful increments to production. Every two weeks, team representatives show each other what they've launched at a common event. The desire not to be the one who shows nothing encourages better planning and a sensible breakdown of large initiatives into small deliverables. I like this element of self-organization the best because it doesn't need to be enforced.

So what?

100 teams, each with their individual goals, ambitions, experience and vision, are working together to create a unified online bank. Since March 2022, we have conducted six quarterly planning meetings. We have improved the quality of forecasts and transparency. Questions like "Why have you taken so little?" and "What will be the outcome?" have disappeared from the discussion.

We don't reinvent the wheel and simply apply proven market practices like PI Planning and PBR. The foundation of our pipeline is a single production beat with elements of public self-organization. In the last couple of years, we've raised our backlog convergence from 45% to 95%. That's pretty cool.
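The article does not give a formula for backlog convergence, so here is a minimal sketch under one assumption: it is the share of items committed in the quarterly plan that were actually delivered by the end of the quarter. The `BacklogItem` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    """A hypothetical backlog item; field names are assumptions."""
    key: str
    committed: bool  # was part of the quarterly plan agreed at PI Planning
    delivered: bool  # shipped by the end of the quarter


def backlog_convergence(items: list[BacklogItem]) -> float:
    """Percentage of committed items that were delivered (assumed definition).

    Items delivered outside the plan are ignored, so unplanned work
    cannot inflate the figure.
    """
    committed = [item for item in items if item.committed]
    if not committed:
        raise ValueError("no committed items to measure")
    delivered = sum(1 for item in committed if item.delivered)
    return 100.0 * delivered / len(committed)


# Illustrative data only: 20 committed items, 19 of them delivered.
quarter = [BacklogItem(f"TASK-{i}", committed=True, delivered=i < 19)
           for i in range(20)]
print(backlog_convergence(quarter))  # 95.0
```

Under this definition, moving from 45% to 95% means going from delivering fewer than half of the committed items to missing only one in twenty.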

That's all about planning. In the next article I will tell you what product development framework we use and how.

Subscribe to Sergey Parshikov