The 20% Is The Hardest

The 80/20 rule is an oft-used metaphor. I’m embarrassed to admit that I did not realize it has a formal name: the “Pareto Principle”.

The Pareto Principle: For many events, roughly 80% of the effects come from 20% of the causes.

There are many contexts where the Pareto Principle holds true. Vilfredo Pareto, for whom the principle is named, observed in the 1890s that 80% of the land in Italy was owned by 20% of the population. In software, Jim Highsmith has said that 20% of the features in most software systems provide 80% of the value.

I believe that the Pareto Principle holds sway in designing and implementing complete software systems. Most effective teams can get a system 80% complete fairly quickly. The system will have all the capabilities listed on the box. It will likely demo well, look pretty, and “wow” potential customers. But in most cases it’s not done, and it may not even be production ready. Getting it fully “done” can take a lot of work, and that work can be HARD. The (flawed) explanation is that it’s hard because the implementation was poor, or too complex, or used the wrong technology. I’d propose that it’s hard because it’s HARD. The knee-jerk reaction is, “Let’s rewrite it.” In many cases that’s wrong. What should be done is to knuckle down and solve these difficult problems.

In most cases, a system that is only 80% complete may already be functionally complete. One hundred percent completeness, however, almost always comes down to fully satisfying the non-functional requirements (NFRs).

Non-Functionals

These requirements specify qualities of the software, and in some cases, the “how” of things: maintainability, scalability, security, performance, and so on. It’s been my experience that most teams know these things are important, but don’t know how to define them properly, measure them properly, or stay focused on them.

An example might be a performance requirement that specifies “the system must be fast”. That’s clearly not something that can be tested. “The system must load a web page in 1 second”. Better, but does this refer to all pages? Under what user load? In what geographical regions? At what time of day? Scalability requirements tend to be even harder to pin down, because they’re more challenging to test. “The system must scale such that 1 gazillion users can fiddle the widget at lunch time”. Even this is a weak requirement, because it says nothing about cost, which tends to be an integral aspect of scalability.
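
To make “testable” concrete, here’s a minimal sketch of what that page-load requirement could look like as an automated check. It’s only a sketch: the staging URL, sample size, and one-second p95 budget are hypothetical placeholders, not recommendations.

    # A sketch of an executable performance requirement. The endpoint,
    # sample size, and latency budget are hypothetical placeholders.
    import time

    import requests  # third-party HTTP client: pip install requests

    BASE_URL = "https://staging.example.com"  # hypothetical test environment
    SAMPLES = 50
    P95_BUDGET_SECONDS = 1.0  # "loads in 1 second" made explicit as a p95


    def measure_page_load(path: str) -> list[float]:
        """Time repeated GETs of one page; a stand-in for a real load test."""
        timings = []
        for _ in range(SAMPLES):
            start = time.perf_counter()
            response = requests.get(f"{BASE_URL}{path}", timeout=5)
            response.raise_for_status()
            timings.append(time.perf_counter() - start)
        return timings


    def test_dashboard_p95_load_time():
        timings = sorted(measure_page_load("/dashboard"))
        p95 = timings[int(len(timings) * 0.95)]  # 95th percentile sample
        assert p95 <= P95_BUDGET_SECONDS, f"p95 page load was {p95:.2f}s"

A requirement phrased this way also forces the unanswered questions to the surface: the page, the environment, and the load profile all have to be written down before the test can run.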

Once all the non-functionals have been defined, they must be implemented. Sometimes that must happen immediately, as part of the initial release. But in many cases, implementing all of the non-functionals is not a requirement for the minimum viable product (MVP). Scalability is a good example: your initial solution may only need to handle your anticipated first year of growth, which gives you time to improve things before year two’s growth arrives.

Done Is Hard

However, there comes a point when the rubber meets the road. Your non-functionals are there for a reason, and they must be met. It’s at this point that things get really difficult. If the product is doing well, your users may be happy but asking for more features. The user base may be growing. Tolerance for outages is quite low. Perhaps in the year (or more) since you launched the product, you haven’t spent enough time reducing your technical debt. Maintenance is harder than it should be, because your code coverage is poor, the design has spaghettified, and you’ve lost some of the original members of the development team.

This is where it gets a bit scary. I’ve seen this in a number of places. The instinct of the team is, “I don’t understand this. The design is broken, it’s too complicated, and we’re never going to be able to scale this thing. Let’s rewrite it.”

I think there may be times where following this instinct makes a lot of sense, but not enough thought is given to the alternative. Here’s the worst case when a rewrite is under consideration:

  1. Initial system gets built. Doesn’t meet all NFRs.
  2. More functions get added to system.
  3. Time to meet the NFRs.
  4. Attempt made to meet NFRs.
  5. Before getting very far: It’s too hard. Rewrite the system.
  6. V2 gets built. Doesn’t meet all NFRs, but is functionally equivalent to V1.
  7. Time to meet the NFRs.
  8. Attempt made to meet NFRs.
  9. Before getting very far: It’s too hard. Rewrite the system.
  10. Repeat until IPO, everyone gets fired, or you die.

The path of least resistance is a rewrite. Starting over with a new design metaphor, language, framework, etc. is all EASY. It’s easy to reason about how to get the system rebuilt. It is difficult, however, to fully learn the ins and outs of a design, a language, or a framework, and to understand it well enough to meet all of your NFRs. Many organizations undervalue this knowledge and the skills that go with it.

What To Do?

The pat answer is indeed one of the correct ones: build your NFRs in from the ground up. Set up a continuous delivery pipeline that not only functionally tests your system, but also has tests to ensure it meets its NFRs. Make these test stages gating: when they fail, you don’t push your product into production. You no longer allow commits, and the team must rally to fix the issues. Unfortunately, the practice of testing NFRs is fairly immature. Like much else in our business, it’s hard.
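
As a rough sketch of what “gating” could look like, here is a hypothetical pipeline stage that runs each NFR check and exits non-zero, which is what blocks promotion in most CI systems. The checks and their numbers are invented placeholders; in practice each would wrap a real load test or scan.

    # Hypothetical NFR gate for a delivery pipeline. Each check would wrap
    # a real measurement; the numbers here are placeholders. A non-zero
    # exit code is what actually stops the deploy in most CI systems.
    import sys


    def check_p95_latency() -> bool:
        # In a real pipeline this would run a load test (like the sketch
        # above) and compare the measured p95 against the agreed budget.
        measured_p95, budget = 0.84, 1.0  # placeholder numbers
        return measured_p95 <= budget


    def check_error_rate() -> bool:
        measured, budget = 0.002, 0.01  # placeholder numbers
        return measured <= budget


    GATES = {"p95 latency": check_p95_latency, "error rate": check_error_rate}


    def main() -> int:
        failures = [name for name, check in GATES.items() if not check()]
        for name in failures:
            print(f"NFR gate failed: {name}")
        return 1 if failures else 0  # non-zero blocks the release


    if __name__ == "__main__":
        sys.exit(main())

Keeping each gate as a plain pass/fail check makes it cheap to add new NFRs to the pipeline as they get defined.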

I specifically said “one of the correct” answers in the last paragraph. This is because I am a pragmatist. Many (most?) organizations do not operate the “right” way, for many valid reasons. It’s for these organizations that I’ve written this article. Accept the maturity level you have, and make a hard choice about the investment you have already made. Yes, there are some cases where there is no feasible way to fix what’s broken. But where a fix is feasible, it’s my belief that the simplest way forward is to take the asset you have and improve it. Refactor it, performance test it, add tests: all of these make it better. The remaining work is the 20%, so the visible changes will be small and hard-won. But little by little, you will reach the goal: a system that meets 100% of your requirements.
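
One low-risk way to start adding tests to code you’re afraid to touch is a characterization test: record what the system does today, then refactor against that safety net. Below is a minimal sketch, where compute_invoice_total stands in for a hypothetical tangled legacy function.

    # A characterization test: pin down today's behavior before refactoring.
    # compute_invoice_total and its inputs are hypothetical placeholders.
    from decimal import Decimal


    def compute_invoice_total(items, tax_rate):
        # Imagine this is the tangled legacy function you dare not rewrite.
        total = sum(Decimal(str(price)) * qty for price, qty in items)
        return total * (1 + Decimal(str(tax_rate)))


    def test_invoice_total_matches_recorded_behavior():
        # The inputs and expected value were recorded from the running
        # system; any refactoring that changes behavior fails this test.
        items = [(19.99, 2), (5.00, 1)]
        assert compute_invoice_total(items, 0.08) == Decimal("48.5784")

Once tests like this are green, you can restructure the internals aggressively, because any change in observable behavior fails the suite immediately.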

Along the way, you and your team will have learned an enormous amount. One of the lessons might be how to avoid this situation entirely, and build for your NFRs from day one. More importantly, your team will have the skills and wherewithal necessary to meet the business needs. Scaling an architecture, tuning performance, operating securely, and maintaining software are skills that must be learned through experience. I’d posit, though I have no hard data to support it, that the organization will save money in the process, both in the raw expenditure of people’s time and in the acceleration you get on future investments.
