Here are some highlights and my notes from reading Kanban: Successful Evolutionary Change for Your Technology Business by David Anderson.
The Agile Manifesto doesn’t say anything about quality, although the Principles Behind the Manifesto[1] do talk about craftsmanship, and there is an implied focus on quality. So if quality doesn’t appear in the Manifesto, why is it first in my recipe for success? Put simply, excessive defects are the biggest waste in software development. The numbers on this are staggering and show several orders of magnitude variation.
There seems to be a psychological advantage in asking developers to write the tests first. So-called Test Driven Development (TDD) does seem to provide the advantage that test coverage is more complete.
In trying to organize stuff, you end up with much better results by asking “Where will I look for this when I want to find it?” instead of “Where should I put this?” The same thing is true of development. When you start by asking “How do I test this?” you generally end up with much better code.
The big bang for the buck comes from using professional testers, writing tests first, automating regression testing, and code inspections.
Professional testers understand edge cases. Getting them involved before the code is written can dramatically reduce the number of defects that get committed to the code base. Leveraging their skillset early in the process is one of the best ways to increase the quality of the code.
Longer lead times seem to be associated with significantly poorer quality. In fact, an approximately six-and-a-half times increase in average lead time resulted in a greater than 30-fold increase in initial defects.
Reducing work-in-progress, or shortening the length of an iteration, will have a significant impact on initial quality.
I found this to be insightful. The longer you drag out working on a feature, the more errors you’ll get. Things that shorten the amount of time it takes to finish a unit of work will reduce the number of errors that slip in.
Reducing WIP shortens lead time. Shorter lead times mean that it is possible to release working code more often. More frequent releases build trust with external teams, particularly the marketing team or business sponsors. Trust is a hard thing to define. Sociologists call it social capital. What they’ve learned is that trust is event driven and that small, frequent gestures or events enhance trust more than larger gestures made only occasionally.
It is easy to overlook how the signals we send to others are perceived. Thinking of trust as event driven, not magnitude driven, encourages a very different development approach.
Complexity in knowledge-work problems grows exponentially with the quantity of work-in-progress. Meanwhile, our human brains struggle to cope with all this complexity. So much of knowledge transfer and information discovery in software development is tacit in nature and is created during collaborative working sessions, face-to-face.
You need slack to enable continuous improvement. You need to balance demand against throughput and limit the quantity of work-in-progress to enable slack.
Managers are often trained to remove slack. The idea that slack is necessary to give people time to improve things is likely to be seen as either incredibly insightful or heretical.
There is little point in paying attention to prioritization when there is no predictability in delivery. Why waste effort trying to order the input when there is no dependability in the order of delivery?
Improving prioritization requires the product owner, business sponsor, or marketing department to change their behavior.
This is how I think a team should mature: First, learn to build high-quality code. Then reduce the work-in-progress, shorten lead times, and release often. Next, balance demand against throughput, limit work-in-progress, and create slack to free up bandwidth, which will enable improvements. Then, with a smoothly functioning and optimizing software development capability, improve prioritization to optimize value delivery.
Variability results in more work-in-progress and longer lead times.
Several of his predecessors in this position were still colleagues working on other projects within the same business unit, and they worried that were he to improve the performance, it would make them look bad in comparison.
Unfortunately this is the political reality of many businesses. People are willing to expend a great deal of effort in order to keep from exposing flaws in their existing processes.
We quickly calculated that the estimation effort alone was consuming around 33 percent of capacity, and on a bad month it could be as much as 40 percent.
In physics there is something called the Observer Effect, which basically says that some things can’t be measured without somehow changing the thing that is being measured. In software, you can’t accurately estimate how long a feature will take without consuming time that could have been spent completing the feature. Trying to calculate how much time is being spent/wasted on estimation is a good way to bring this to light.
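As a rough illustration of that exercise, here is a minimal sketch of the arithmetic. Every number is a hypothetical placeholder, not a figure from the book; plug in your own.

```python
# Back-of-the-envelope check of how much capacity estimation consumes.
# All numbers here are hypothetical placeholders.
team_size = 8                     # people on the team
hours_per_person_per_month = 140  # working hours after meetings, leave, etc.
estimates_per_month = 60          # items estimated (many are never built)
hours_per_estimate = 6            # analysis, discussion, and sizing per item

capacity = team_size * hours_per_person_per_month
estimation_hours = estimates_per_month * hours_per_estimate

print(f"Capacity:            {capacity} hours/month")
print(f"Spent on estimation: {estimation_hours} hours/month "
      f"({estimation_hours / capacity:.0%})")
```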
The process was running so smoothly that Dragos had the Product Studio tool modified so it would send him an email when a slot became available in the input queue. He would then alert the product owners via email, who would decide among themselves who should pick next.
He reduced the test team from three to two and added another developer (Figure 4.6). This resulted in a near-linear increase in productivity, with the throughput for that quarter rising from 45 to 56.
I’m curious whether the addition of a developer was done along with a move toward test-driven development, which might have lessened the amount of time the testers needed to spend by pushing some of the work earlier in the process and onto the developers.
When leaders of business units start collaborating, so, it seems, do the people within their organizations. They follow the lead from their leader. Collaborative behavior coupled with visibility and transparency breeds more collaborative behavior.
The opposite seems to be true as well. When leaders are basically fighting, organizations find it difficult to collaborate at all levels.
In some political situations, there may be an official process that is not being followed. When you attempt to map the value stream, your team will insist that you re-document the official process, not the actual process being used. You must resist this and insist that the team map the process they actually use. Without this, it will be impossible to use a card wall as a process-visualization tool because team members can use the card wall only if it reflects what they actually do.
And this is one of the big values of mapping the value stream. It shows what is actually happening, not what is “supposed” to be happening.
Some teams have adopted a convention of showing buffers and queue columns by using a 45-degree rotated card. This provides a strong visual indicator of how much work is flowing rather than queuing at any given instant in time. This allows the team and other stakeholders to “see,” literally, the amount of economic cost.
Electronic tracking is necessary for teams that aspire to higher levels of organizational maturity. If you anticipate the need for quantitative management, organizational process performance (comparing the performance across kanban systems, teams or projects), and/or causal analysis and resolution (root-cause analysis based on statistically sound data), you will want to use an electronic tool from the beginning.
What is interesting about these two quotes from the book is that the ability to organically adopt something like rotating cards 45 degrees is much harder to come by in an electronic system. There is probably some room for an electronic kanban tool that offers almost free-form arrangement of cards while still tracking metrics.
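One way an electronic tool could surface the same signal as the rotated cards is to track how long each item spends queued versus actively being worked on, and report the ratio (often called flow efficiency). A minimal sketch, with a data shape and numbers I made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class WorkItem:
    name: str
    active_days: float   # time someone was actually working on it
    queued_days: float   # time spent waiting in buffers/queues

def flow_efficiency(items):
    """Share of total elapsed time that was active work rather than waiting."""
    active = sum(i.active_days for i in items)
    total = sum(i.active_days + i.queued_days for i in items)
    return active / total if total else 0.0

board = [
    WorkItem("Login audit", active_days=2, queued_days=6),
    WorkItem("CSV export",  active_days=1, queued_days=9),
    WorkItem("Search fix",  active_days=3, queued_days=2),
]
print(f"Flow efficiency: {flow_efficiency(board):.0%}")  # ~26%; most time is queuing
```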
Some teams like to use additional, smaller sticky notes with names, or small avatars stuck to the work item to signify who is working on it. This allows everyone on the team to see who is working on what.
Queue replenishment meetings serve the purpose of prioritization in Kanban. This prioritization is said to be deferred or postponed until the last reasonable moment due to the nature of the queue-replenishment mechanism and the cadence of the meetings. Queue replenishment meetings are held with a group of business representatives or product owners (to use popular Agile development vernacular). It’s recommended that these meetings happen at regular intervals. Providing a steady cadence for queue replenishment reduces the coordination cost of holding the meeting and provides certainty and reliability over the relationship between the business and the software development team.
Ideally, a prioritization meeting will involve several product owners or business people from potentially competing groups within the company. The tension created by this actually becomes a positive influence on good decision making and stimulates a healthy, collaborative environment with the software development team.
In other parts of the book, he talks about how the collaborative bargaining process of getting the groups to work together to vote on what fills the empty slots can start to influence and improve the way people work together across the entire organization.
The cadence of prioritization meetings will affect the queue sizing in the kanban system and hence the overall lead time through the system. To maximize the agility of the team, it is recommended that the meetings be as frequent as is reasonably possible; weekly is a commonly recommended interval.
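The connection between cadence, queue size, and lead time can be sketched with Little’s Law (average lead time = WIP / throughput). The numbers below are hypothetical, just to show the shape of the trade-off:

```python
# How replenishment cadence drives queue size, and queue size drives lead time
# (via Little's Law). All numbers are hypothetical.

throughput_per_week = 5           # items the team finishes in an average week
replenishment_interval_weeks = 1  # how often the input queue is refilled

# The input queue needs roughly enough work to last until the next meeting.
input_queue_size = throughput_per_week * replenishment_interval_weeks

in_progress_limit = 10            # WIP limit across the development columns
total_wip = input_queue_size + in_progress_limit

# Little's Law: average lead time = WIP / throughput
lead_time_weeks = total_wip / throughput_per_week
print(f"Weekly replenishment:   queue {input_queue_size}, lead time ~{lead_time_weeks:.1f} weeks")

# Meeting every two weeks instead doubles the queue and stretches the lead time.
biweekly_queue = throughput_per_week * 2
biweekly_lead_time = (biweekly_queue + in_progress_limit) / throughput_per_week
print(f"Biweekly replenishment: queue {biweekly_queue}, lead time ~{biweekly_lead_time:.1f} weeks")
```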
With Kanban it still makes sense to triage defects. However, the most useful application of triage is to the backlog of items waiting to enter the system.
The purpose of a backlog triage is to go through each item on the backlog and decide whether it should remain in the backlog or be deleted. It is not to stack rank or provide any prioritization beyond the simple keep-or-delete choice.
Some teams have avoided the need for triage through automation and policy. The Microsoft XIT team from the chapter 4 case study would delete any item older than six months at a regular monthly interval. The reasoning was that if an item had not been selected for the input queue in six months, it was unlikely to be of significant value and therefore unlikely ever to be selected. If this changed, it was also likely that it would be requested again and hence nothing was lost by deleting it from the backlog.
This is a great way to prune the backlog, because it is easy to gradually build up a huge number of stories that seemed like a good idea at the time but, as more knowledge is acquired and more of the product is built, are no longer relevant or could be expressed in a much better way.
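A policy like the XIT team’s is simple enough to automate. Here is a minimal sketch of an age-based pruning pass, assuming a backlog of items with creation dates; the item names and data shape are mine, not from any particular tool:

```python
from datetime import datetime, timedelta

# Hypothetical backlog of (title, date the item was added).
backlog = [
    ("Dark mode",      datetime(2010, 1, 15)),
    ("Export to PDF",  datetime(2010, 8, 2)),
    ("Rewrite search", datetime(2010, 9, 20)),
]

def prune(backlog, today, max_age_days=183):
    """Keep only items younger than roughly six months; run once a month."""
    cutoff = today - timedelta(days=max_age_days)
    return [(title, added) for title, added in backlog if added >= cutoff]

backlog = prune(backlog, today=datetime(2010, 10, 1))
print([title for title, _ in backlog])  # the stale, never-selected item is gone
```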
Kanban dispenses with the time-boxed iteration and instead decouples the activities of prioritization, development, and delivery. The cadence of each is allowed to adjust to its own natural level. Kanban does not dispense with the notion of a regular cadence, though. Kanban teams still deliver software regularly, preferring a short timescale.
Kanban still delivers on the Principles Behind the Agile Manifesto. However, Kanban avoids any dysfunction introduced by artificially forcing things into time-boxes.
I’ve met many people who seem surprised that you can claim to be Agile without using Scrum. Agile is a set of principles. There are many ways to follow those principles. It can be done with Scrum. It can be done with Kanban. It can be done with other methods.
The efficiency of delivery can be assessed in two ways. The simpler way is to look at the labor and costs involved. The more complex method is to consider the value delivered.
Kanban decouples delivery cadence from both development lead time and prioritization cadence.
This is a key differentiator between Kanban and Scrum.
By choosing to eliminate estimation for most classes of service, both the transaction costs and the coordination costs of prioritization are reduced. This reduction facilitates much more frequent prioritization meetings because the meetings remain efficient. This has enabled kanban teams to move to ad hoc or on-demand prioritization.
Estimating how long it will take to make a feature is significant work because it usually means solving the hard parts of the problem. Estimating the cost of a building makes sense because once you decide how you are going to build, you still have to do the work of building it. In software, the things that take a long time require thinking. For the estimate to be accurate on a non-trivial feature, you have to do the hard part of the work: thinking and solving the difficult problems.
You can eliminate estimation and just categorize features into a few bucket sizes, basically asking, “Is this feature smaller, about the same, or massively bigger than our normal feature size?” Typically this type of “size categorization” can be done in a few minutes or less.
Eliminating estimation, or at least detailed estimation, is a big deal. If you have always done estimation it seems like a bad idea. However, if you have internal resources and they are going to be working on something anyway, it is more important to make sure that they are working on the most important thing (prioritization) than to know exactly how long it will take (estimation).
The place where estimation becomes important is when you are trying to figure out whether a feature or improvement will have a positive ROI. The logic is that if it costs $10,000 to implement a feature, then it needs to pay for itself in three years (or five, or some other amount of time). The problem with doing detailed estimates is that it can encourage you to do work with a low ROI. If you are working on a feature that would lose money at double or even five times an estimate that is little more than a guess, then you aren’t working on high-value work.
This only works if your features/stories are small. Implementing a feature that you think will take 1 day but that ends up taking 10 is much different from a monolithic approach where you think it will take 1 year but instead it takes 10.
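To make that concrete, here is a rough sketch of the “would this still pay off if the estimate is way off?” check. The costs and values are hypothetical numbers I picked for illustration:

```python
# Sketch of the "would this still pay off if the estimate is way off?" check.
# All figures are hypothetical.

daily_cost = 800  # fully loaded cost of a developer-day

def still_profitable(expected_value, estimated_days, overrun_factor):
    """True if the feature keeps a positive ROI even when the estimate misses."""
    return expected_value > estimated_days * overrun_factor * daily_cost

# A small feature with a healthy margin stays profitable even at a 5x miss,
# so the accuracy of the estimate barely matters -- just build it.
print(still_profitable(expected_value=20_000, estimated_days=2, overrun_factor=5))   # True

# A feature that only pays off if the guess is exactly right is low-value work,
# and no amount of estimation effort fixes that.
print(still_profitable(expected_value=20_000, estimated_days=20, overrun_factor=2))  # False
```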
Prioritization of new work requests such as user stories involves coordination of many people from various functions. All of this coordination has a measurable cost.
Estimation to facilitate prioritization decisions represents the transaction costs in both time and money associated with prioritization. These costs can be determined and tracked.
Planning games used in Agile methods do not scale easily and can represent a significant coordination cost for larger teams with broader focus than a single product line.
Frequent prioritization meetings build trust.
Trust is important. It is worth making sure your team recognizes that building trust with stakeholders is a valuable thing to pursue.
There has been some research and empirical observation to suggest that two items in progress per knowledge worker is optimal. This result is often quoted to justify multi-tasking. However, I believe that this research tends to reflect the working reality in the organizations observed. There are a lot of impediments and reasons for work to become delayed. The research does not report the organizational maturity of the organizations studied, nor does it correlate the data with any of the external issues (assignable-cause variations, discussed in chapter 19) occurring. Hence, the result may be a consequence of the environments studied and not indeed an ideal number. Nevertheless, you may encounter resistance to the notion that one item per person, pair, or small team is the correct choice. The argument may be made that such a policy is too restricting. In that case, setting a WIP limit of two items per person, pair, or team is reasonable. There may even be cases where a limit of three per person, pair, or team is acceptable.
Even if two items per person is less optimal than one, it is still a huge improvement over having each person working on an unlimited number of items at the same time.
I’ve become convinced that the tension created by imposing a WIP limit across the value-stream is positive tension. This positive tension forces discussion about the organization’s issues and dysfunctions.
The idea here is that if you enforce a small WIP limit, it will help bring to light other parts of the organization that are having problems. So if everyone needs to have more than one item in process at a time, it is probably an indication that there are process bottlenecks or over-allocated resources that are delaying everyone so they can’t complete a single work item from start to finish.
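The mechanics of a WIP limit are trivial to express in code; the value is in what the refusal to pull more work surfaces. A minimal sketch, with column names and limits I made up:

```python
# Minimal sketch of per-column WIP limits on a kanban board.
# Column names and limits are illustrative, not from the book.

class Board:
    def __init__(self, limits):
        self.limits = limits                        # e.g. {"dev": 2, "test": 1}
        self.columns = {name: [] for name in limits}

    def pull(self, item, into):
        """Pull an item into a column only if its WIP limit allows it."""
        if len(self.columns[into]) >= self.limits[into]:
            # The refusal is the interesting part: it forces a conversation about
            # why work is piling up instead of letting everyone just start more.
            raise RuntimeError(f"WIP limit reached in '{into}'; finish something first")
        self.columns[into].append(item)

board = Board({"dev": 2, "test": 1})
board.pull("story-101", into="dev")
board.pull("story-102", into="dev")
try:
    board.pull("story-103", into="dev")
except RuntimeError as err:
    print(err)  # limit reached: go help finish existing work instead
```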
I would prefer a definition of a minimum marketable release (MMR) that describes a set of features that are coherent as a set to the customer and useful enough to justify the costs of delivery.
Obviously the amount of work necessary to justify the cost of delivery goes down as the cost of delivery goes down. There is significant value in trying to keep the cost of delivery low so value can be delivered as frequently as possible. If the cost of delivery is zero, then even very simple changes that will have a small positive ROI can start to deliver value quickly.
It does not make sense to treat an MMR as a single item to flow through our kanban system. An MMR is made up of many work items. MMR makes sense from a release transaction-cost perspective, not from a flow perspective.
On the other hand, as many have found out, “the first MMF is always large” because the first release of a new system must include all of the essential capabilities to enter a market and all of the infrastructure to support them. There can be two or three orders of magnitude difference in the size of MMFs (or MMRs). A work item type with instances that vary in size up to a thousand-fold will be problematic.
Some of this can be fixed by continuously delivering the application into production or a production-like environment, even if the first version of the app is just a single blank web page with the project’s name on it.
However, using a Minimal Marketable Release (MMR) to trigger a delivery, in conjunction with smaller, fine-grained work item types, is likely to minimize costs and maximize satisfaction with what is released. Teams can adapt to this challenge by focusing on analysis techniques that produce a lower level of requirements, such as user stories or a functional specification. These will generally be fine-grained, small, and have a relatively small variation in size. An ideal size would be something in the range of a half-day to four days or so of engineering effort.
If your delivery cost is low enough then pretty much anything can be a minimal release.
Shared resources should develop their own kanban systems. A network of kanban systems for shared resources across a portfolio of projects can be thought of as a service-oriented architecture for software.
Interface points between teams using Kanban and teams using other approaches can cause a lot of friction.
We are doing Kanban because we believe it provides a better way to introduce change. Kanban seeks initially to change as little as possible. So change with minimal resistance must be our first goal.
The ability to use Kanban to visualize your existing processes is powerful because it means you can start using it without changing anything. It is a lot easier to get people to buy into “let’s visualize our work and see if there are ways to improve it” than “let’s change everything.” The book points out that you have to be careful to visualize what is actually happening, not what is supposed to be the process.
Goal 1 Optimize Existing Processes
Existing processes will be optimized through introduction of visualization and limiting work-in-progress to catalyze changes. As existing roles and responsibilities do not change, resistance from employees should be minimal.
Goal 2 Deliver with High Quality
If, for example, we set a strict policy that user stories cannot be pulled into acceptance test until all other tests are passing and bugs resolved, we are effectively “stopping the line” until the story is in good enough condition to continue. With a team new to Kanban, we may not implement such a strict rule, but there should be some policies relating to quality that focus the team on developing working code with low defect numbers.
Goal 3 Improve Lead Time Predictability
Goal 4 Improve Employee Satisfaction
Work/life balance isn’t only about balancing the number of hours someone spends at work with the number of hours they have available for their family, friends, hobbies, passions, and pursuits. It is also about providing reliability. Say, for example, that a team member with a passion for art wants to take a painting class at the local middle school. It starts at 6:30 p.m. and runs every Wednesday for ten weeks. Can your team provide certainty to that individual that he or she will be free to leave the office on time each week in order to attend the class?
Goal 5 Provide Slack to Enable Improvement
The throughput delivered downstream is limited to the throughput of the bottleneck, regardless of how far upstream it might be. Hence, when you balance the input demand against the throughput, you create idle time at every point in your value chain except at the bottleneck resource.
Without slack, team members cannot take time to reflect upon how they do their work and how they might do it better. Without slack they cannot take time to learn new techniques, or to improve their tooling, or their skills and capabilities. Without slack there is no liquidity in the system to respond to urgent requests or late changes. Without slack there is no tactical agility in the business.
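A quick sketch of that arithmetic, with made-up stage names and rates: the system’s throughput is the minimum across stages, and balancing input demand to that rate deliberately leaves slack everywhere except at the bottleneck.

```python
# The system's throughput is capped by its slowest stage, so matching demand
# to that rate deliberately leaves slack everywhere else. Numbers are made up.

capacity_per_week = {"analysis": 12, "development": 8, "test": 5, "deploy": 20}

bottleneck_rate = min(capacity_per_week.values())  # 5 items/week (test)
demand_per_week = bottleneck_rate                  # balance input against throughput

for stage, capacity in capacity_per_week.items():
    slack = capacity - demand_per_week
    print(f"{stage:12s} capacity {capacity:2d}/wk, slack {slack:2d}/wk")
# Only "test" shows zero slack; the idle time elsewhere is what funds improvement.
```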
Goal 6 Simplify Prioritization
Goal 7 Provide Transparency on the System Design and Operation
While transparency onto work requests and performance is all very well, transparency into the process and how it works has a magical effect. It lets everyone involved see the effects of their actions or inactions. As a result, people are more reasonable. They will change their behavior to improve the performance of the whole system.
Goal 8 Design a Process to Enable Emergence of a “High-Maturity” Organization
The traditional approach to forming a commitment around scope, schedule, and budget is indicative of a one-off transaction. It implies that there is no ongoing relationship; it implies a low level of trust. The Kanban approach is based on the notion that the team will stay together and engage in a supplier relationship over a long period of time. The Kanban approach implies lots of repeat business. It implies a commitment to a relationship, not merely to a piece of work. Kanban implies that a high level of trust is desired between the software team and its value-stream partners. It implies that everyone believes that they are forming a long-term partnership and that they want that partnership to be highly effective.
The outcome you desire is the most frequent release cadence that makes sense. So start by asking, “If we give you high-quality code with minimal defects, and it comes with adequate notice, transparency into its complexity, and reliability of delivery, how often could you reasonably deploy it to production?” This will provoke some discussion around the definitions you’ve used, and some reassurance will be required. However, you should push for a result that maximizes business agility without over-stressing any one part of the system.
The Theory of Constraints was developed by Eli Goldratt and first published in his business novel, The Goal, in 1984. Over the last 25 years, The Goal has gone through several revisions and the theoretical framework known as the “Five Focusing Steps” has become more obvious in more recent editions.
A long-term fix may have been to invest heavily in test automation. The key word in the last sentence is “invest.” If you find yourself saying “invest,” you are generally talking about an elevation action. Adding resources is not the only way to elevate capacity. Automation is a good and natural strategy for elevation.
The key thing about the word “invest” is that it isn’t a last-minute, knee-jerk reaction to a problem.
Non-instant availability resources are not, strictly speaking, bottlenecks. However, by and large they look and feel like bottlenecks, and the actions we might take to compensate for them are similar in nature to those for a bottleneck.
One example of non-instant availability that I observed occurred with a build engineer. The company had a policy that only configuration-management team personnel were allowed to build code and push it into the test environment. This policy was a specific risk-management strategy based on historic experience that developers were often careless and would build code that would break the test environment.
Many processes are put in place to slow down actions that have produced problems in the past. If something gets ordered that shouldn’t have been, management may respond by putting in another layer of approval for purchasing. Adding layers of approvals is easy to do, but it isn’t the only way to make a process reliable.
When you ask Scrum advocates who vehemently argue that daily standup meetings are value-added, whether they would hold the standup twice daily or whether they would lengthen it from 15 minutes to 30 minutes, they will surely reply, “No!” “Well, if standup meetings are truly value-added,” I reply, “then surely doing more of them would be a good thing!” This is really the acid test that demonstrates the difference between a truly value-added activity and a transaction or coordination cost. Developing more customer requirements is clearly value-adding. You would do more of it if you could and the customer would gladly pay for it. Planning is clearly not value-added. The customer would not pay for more planning if he could avoid doing so.
This is a good way to categorize work. On one hand we have “things we want to spend more time doing” and on the other “things we don’t want to spend more time doing.” This doesn’t mean that non-value-added work isn’t important, but it exists to help us be more efficient or to enable the value-added work.
Within a couple of years, a template for writing user stories had emerged from the community in London: “As a <role>, I want a <feature>, in order to <benefit>.” The use of this template greatly standardized the writing of user stories. One of the creators of this approach, Tim McKinnon, reported to me in 2008 that he now had data to show that the average user story was 1.2 days of effort and the spread of variation was a half-day to about four days.
When a requirement made it the whole way through to acceptance testing before it was rejected as not what the business really needed, the team reacted by creating a waste bin on the board and placing the ticket in it.
The combination of transparency, reporting, and building awareness of the impact and cost of poor requirements resulted in the business voluntarily changing its behavior. The waste report that showed the effect of poor requirements initially showed five to ten items per month. By the fifth month it was empty. The business had come to appreciate that by taking more care, they could avoid wasting capacity. They voluntarily collaborated to make the outcome of the system better.
This is a key benefit of Kanban. It gives you an easy way to visualize what you are doing. Once you get things out where you can see what is happening, it is much easier to have a conversation about how things could be done differently. Getting people to change their behavior based on seeing a better way is much more effective than putting rules in place to try to fix problems. Kanban improves processes by allowing people to respond intelligently with flexibility instead of trying to lock down the processes and tie people’s hands.
In this approach, you leave the WIP limits, buffer sizes, and working policies fairly tight, and you cause the work to stop when things become blocked. Idle time for those people assigned to blocked work raises awareness of the blocking issue. It may cause a swarming behavior to try and fix the issue, which has been seen to encourage those idle team members to think about root causes and possible process changes that will reduce or eliminate the possibility of recurrence. Keeping WIP limits tight and pursuing issue management and resolution as a capability has been seen to create a culture of continuous improvement.
Stopping the process when something is blocked makes people more aware of whether the end result is going to be delivered, instead of just focusing on whether or not they are busy doing their part. This is a simple way to enable systems thinking. If something is broken in the chain, it doesn’t matter how productive you are individually.
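As a small illustration of how a tight limit turns a block into everyone’s problem, here is a sketch where blocked items still count against the WIP limit; the names and the limit are made up:

```python
# With a tight WIP limit, a blocked item keeps its slot, so the team can't
# "stay busy" by starting new work -- the block has to be dealt with.
# The names and the limit are illustrative.

wip_limit = 3
in_progress = [
    {"item": "story-201", "blocked": False},
    {"item": "story-202", "blocked": True},   # waiting on another team
    {"item": "story-203", "blocked": False},
]

def can_start_new_work(in_progress, wip_limit):
    # Blocked items still count against the limit; that is the point.
    return len(in_progress) < wip_limit

if not can_start_new_work(in_progress, wip_limit):
    blocked = [card["item"] for card in in_progress if card["blocked"]]
    print(f"Board is full; swarm on the blockers instead: {blocked}")
```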