Lightsail, Bitnami and SSH

This blog runs in Lightsail on WordPress with a Certified by Bitnami instance. Lately, when I try to SSH in from the Lightsail console, the connection never completes.

It just hangs forever. A couple of times, a reboot of the instance seemed to fix it, but this morning nothing changed the behavior. So I decided to try SSH from the terminal and, lo and behold, that worked. You have to download your private key from your Lightsail account; after that, following these instructions worked fine from the terminal. This will probably be my default going forward.
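For reference, here is a minimal Python sketch of that terminal workflow. It assumes the Bitnami image's default `bitnami` login user, a key file downloaded from the Lightsail account page (the `LightsailDefaultKey-us-east-1.pem` name and the `203.0.113.10` address are only examples), and an OpenSSH `ssh` client on your PATH. The helper names are hypothetical, not part of any Lightsail tooling.

```python
import os
import stat
import subprocess
import sys


def build_ssh_command(key_path: str, host: str, user: str = "bitnami") -> list[str]:
    """Build the ssh argv for a Bitnami-based Lightsail instance (assumes the default 'bitnami' user)."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


def secure_key(key_path: str) -> None:
    """Make the private key readable only by its owner (mode 0400).

    OpenSSH refuses to use a private key that other users can read.
    """
    os.chmod(key_path, stat.S_IRUSR)


def connect(key_path: str, host: str) -> int:
    """Tighten key permissions, then hand the terminal to ssh; returns ssh's exit code."""
    secure_key(key_path)
    return subprocess.call(build_ssh_command(key_path, host))


# Usage: python lightsail_ssh.py LightsailDefaultKey-us-east-1.pem 203.0.113.10
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(connect(sys.argv[1], sys.argv[2]))
```

This does nothing more than typing `ssh -i <key> bitnami@<ip>` by hand after a `chmod 400` on the key; wrapping it just saves retyping the key path and user each time.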

On Zeno’s Migration and Tech Leadership

Zeno’s Migration

Jean Yang, founder of Akita Software, has been talking on Twitter lately about the chasm between the technical “guidance” from large tech firms like Netflix and Amazon and what your average, run-of-the-mill tech firm does in the wild. I’ve started calling this latter group, a much larger cohort than the former, “small tech”, loosely defined as firms with 10-50ish engineering types. This idea of Zeno’s Migration, a riff on Zeno’s Paradoxes for those who slept through their classics courses, is full of promise. Take the Dichotomy paradox, which says that because a journey is a set of infinite steps, it can’t even begin, much less end. That is especially apropos of many small tech companies’ cloud migration plans.

The idea here is that in order to complete a migration, you have to travel halfway an infinite number of times. The constant refrain from technical leadership is that next quarter we’re going to be “X” farther along the journey on the migration du jour. This migration often involves shiny, jangly technology choices like “rewrite the front end in React” or “migrate everything to the cloud”. Unfortunately, this constant refrain ignores hard strategic realities on the ground, where a small tech firm faces serious limits on what it can accomplish. In an ideal world, free of harsh realities, these constraints would be manageable. Alas, there are no ideal worlds.

I think the problem arises because the pervasive definitional concept in our industry is “The Project”. When everything is a project, it sets up incentive conflicts in determining priority. In an organization typical of the industry, where “the business” and “technology” are considered different entities, the migration is always a project defined and defended by “technology”, which puts it at odds with other projects defined by “the business”. Even at organizations that nominally follow a more product-based structure, the key concept is projects involving kickoffs and planning and story writing and infinite other steps. As with most systems, the focus at this level is too granular and misses the forest for the trees.

A more holistic solution is to step back at least one level in the system to “The Team” and assess the situation there. The product team becomes the atomic unit of measure, not the project. This requires a shift in thinking and language, but when done correctly, Zeno’s Migration melts away in the same way Zeno’s Paradoxes do, because we shift from a discrete function with a start and an end to an infinite series (the outcomes and measurements of the team on a continual improvement cycle, à la DevOps philosophy). In this model, a cloud migration just becomes part of the continual improvement process.

This still raises the question of when to do such things. With limited resources, even a well-organized product team needs guardrails on what to focus on. This is where organizational technical leadership can provide value with written visions and strategies. In tandem with these strategies, it’s critical to have metrics, measured in outcomes, that are regularly reviewed and publicized to the broader organization. These strategies should be cyclical in nature; that is, they do not have an end but instead define measurements like “we will migrate X services per quarter or half”. Without these strategies defined and monitored for outcomes, even well-organized product teams in the typical technology business will fail to do this migration work because it’s too easy for it to fall on the floor.
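To make the contrast concrete, here is a small Python sketch comparing the two framings with made-up numbers: a plan that moves half of whatever remains each quarter (the Dichotomy version, counting whole services and rounding down) versus a cyclical strategy that commits to a fixed number of services per quarter. The function names and the 40-service, 5-per-quarter example are hypothetical.

```python
def halving_trace(total: int, quarters: int) -> list[int]:
    """Services left after each quarter when each quarter migrates half of what remains (rounded down)."""
    remaining, trace = total, []
    for _ in range(quarters):
        remaining -= remaining // 2  # once one service is left, 1 // 2 == 0 and progress stops
        trace.append(remaining)
    return trace


def fixed_rate_trace(total: int, per_quarter: int, quarters: int) -> list[int]:
    """Services left after each quarter under a steady cadence of per_quarter services."""
    remaining, trace = total, []
    for _ in range(quarters):
        remaining = max(0, remaining - per_quarter)
        trace.append(remaining)
    return trace


print(halving_trace(40, 8))        # [20, 10, 5, 3, 2, 1, 1, 1]
print(fixed_rate_trace(40, 5, 8))  # [35, 30, 25, 20, 15, 10, 5, 0]
```

The halving plan looks better for the first few quarters, which is part of its appeal, but it never reaches zero; the cadence finishes because it measures a rate rather than a distance to an end.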

A corollary to Zeno’s Migration that I’ve seen and heard a lot is the lift and shift strategy of cloud migration. An organization, hearing that “the cloud is the future” or whatever, moves (or more often pays some consulting firm to move) its entire operation to the cloud with the idea and promise that this will allow the architecture to move to a more cloud native form over time without also having to migrate where the services live.

Of course, this achieves none of the benefits of a cloud native architecture and moves all of the existing maladaptive processes along with it. But returning to “The Project” as the atomic unit, this is easy to reason about, define and measure. I think there is also often an unstated assumption held by technology leadership about cost and incentives. Over time, as subsidies and cost breaks expire, running an org in the cloud just like you did on prem is likely to be much more expensive. The idea is that at that point, it will be easy to explain to the business how a rewrite or rearchitecture will save money, which is always the easiest metric to measure.

There are two things going on here, one part sociological and one part technical (which is convenient since these are all sociotechnical systems). The sociological part involves humans’ tendency to look at achievement as An End. When we look at greatness, we think of a finished thing, not something that just happens to be quite far along the continuum of that particular thing. The second issue is that in order to deal with the continuum of technical growth and improvement, we need a way to move successfully from vision to tactics, which is what Richard Rumelt calls Strategy. We must define the problem, the constraints and the actions required to solve the problem. This helps identify tradeoffs and the things we cannot do in order to achieve the vision. This is where good technical leadership shines brightest, and it is often missing from existing tech growth plans.

In the end, I think technical growth is contextual. A team of 30 engineers can’t read a paper from Netflix and apply it to their situation. Leadership must analyze the existing context for opportunity and then define strategies for that opportunity that define the problem to be solved and the actions that will be taken (or not taken, which is a critical part of strategy) to succeed. Then the strategy must be publicized and monitored for progress along its continuum. As growth occurs, the strategy and context should be revisited and modified as necessary. This is all hard management work that, in my experience, is difficult to develop and nurture. There are systems involved here with first-order incentives, easier both to talk about and to create, that work against the harder second-order incentives to dig into the history and context of a given situation and create solutions specifically for it. Until technical leadership as an industry gets better at thinking about these systems, we’ll continue to read shiny new ideas from large tech organizations and try to pound that round peg into the square hole we are dealing with.

On Change

Cross-posted from the OT Engineering Blog and added here for when someone deletes that blog

Nothing is so painful to the human mind as a great and sudden change.

Mary Wollstonecraft

With Halloween approaching, a horror story seems to be in order. It is often said that change is scary, a cliche that, like all cliches, originates from some kernel of truth but then either evolves into tautology or gets applied to situations more nuanced than a cliche can capture. Situations of change are often like that. It is one of the great paradoxes, written about extensively by many of our modern and postmodern philosophers from Marx to Nietzsche, Lefebvre to Pearls Before Swine, that while we seem to crave stability, we also crave growth and development. Growth and development require change. While these things can seem scary at first glance, human beings often seek them out when they aren’t getting enough of them in their lives. They find new jobs or new relationships or go on Amazon shopping sprees, all of which are often the easy way out for satisfying the need for change and growth. So if not all change is scary, where does the cliche come from? You know what is actually scary, deep down, bone-shaking, world-altering kind of scary? Cancer.

Narrator: Well that took a dark turn.

Editor: Let’s wait and see where he’s going with this.

Why is cancer scary? Leaving aside the very visceral reaction one feels upon hearing about cancer and the assumptions of impending death, at a more philosophical level cancer is scary because it represents chaotic change. It is uncontrolled growth, irrational at its core, with cells turning on each other in a race to destroy the organism even though they may not know they are doing this. Irrational, chaotic change is scary. When faced with this kind of change, humans, ill-equipped to deal with it rationally, turn to narratives, storytelling and rumor. We want change, but we want to understand it. We almost always want the change agent to understand why things are the way they are before embarking on a radical new path. We need to assess the reality in which we operate before changing it.

So how do we (specifically those of us operating in technological fields) navigate times of change in a way that doesn’t involve narrative tales of impending doom and instead fulfills the very human and sociological need for growth and development? The very first thing, the prerequisite before all else, is to come to terms with the fact that there are no purely technical systems anymore. All systems upon which and within which we operate are sociotechnical in nature, meaning they have both sociological and technical components. If we are to achieve any success in our desire for change, it is critical to understand how systems work, how they were designed and are operated by the humans who built them, and for what reasons they exist.

Then, assuming we have the necessary understanding of systems, we can walk through the steps for successful change. The first step, as Chesterton so eloquently put it 100 years ago, is to understand why things are the way they are. This is rather hard work and, when it comes right down to it, almost always skipped. It involves understanding and discussion and research; it is messy and painful and very much on the “socio” side of the sociotechnical continuum. Often, agents of change come into a situation with their own historical context and try to make their history the current history. This only leads to resentment and antagonism because in almost all situations, the existing system was built for important and meaningful reasons by people who were just trying to do their best to get along within the system they operated in. To cast all of that aside with a brush of the hand and say “I know a better way” is to guarantee failure without knowing it.

Then, with a proper historical context, one must assess the reality on the ground. What is this system doing right now? What are its goals and incentives? Where are its constraints? What are its areas of leverage? What are its feedback cycles? Again, this is all quite hard. It involves many of the same steps as the discovery of historical context while closely watching the system in operation, tracking its changes and its outcomes.

Next, look to the networks, the flows of communication and how the system gets work done. Think through how changes will fit within the organizational structure. Conway’s Law is a harsh mistress and not easily subverted. Again, this third step is sociological in nature, not technical. So many agents of change believe technical solutions alone will solve everything when in fact the technical parts are typically quite straightforward and provide no benefit without the accompanying sociological work, which is very different and often harder. Without looking at the networks of communication flow and organizational design, the best case is that the change takes much, much longer as it works through the gates and friction built into the system. At worst, the change becomes something else as the organizational design necessarily dictates the outcome, not the architecture we design (did I mention Conway’s Law is a harsh mistress?).

Finally, after all that work is done, the change can be introduced in the form of small, perhaps tiny, experiments meant to lightly pull the levers within the system and analyze the results. In systems with high leverage, large changes will almost always result in the wrong behavior as feedback cycles you aren’t aware of take over and spiral out of control. In our business mythology, we have giants who tell stories of radical overhaul that resulted in incredible outcomes. Leaving aside the fact that history is written by the winners, it is more likely that the radical overhaul was either excruciatingly painful for a great many people or was actually implemented in a decidedly non-radical way. By coming up with and executing small, continual experiments that are analyzed to make sure their outcomes drive towards the system we want, we can, over time, improve things with minimal pain as we make feedback cycles shorter and areas of leverage less dependent on others.

Perhaps a better cliche is that successful change is hard and requires a deep understanding of systems thinking, sociological interactions and communication flow. Change is possible and almost always good. But it is a continuum and contextually specific and requires a deft hand to navigate successfully. Each situation is different and there are no easy solutions. To build something different than what we currently have, rather hard work is required. But if we put in the work, successful change is possible and deeply rewarding. It just always takes longer than anyone realizes.

Note: This essay is really just a summation of Esther Derby’s excellent book 7 Rules For Positive, Productive Change. I’ve said nothing new here and if you are interested in change, you can do worse than starting with Derby’s work.

On Stewardship

Texas Parks and Wildlife has a program called the Lone Star Land Steward Awards. It is designed to recognize landowners who institute a program of restoration and caretaking for the land they own. Stewardship in this sense is defined as “a deeply held conviction that motivates landowners to care for and to sustain the land entrusted to their care…for their own personal benefit, for the benefit to future generations and for the benefits to society” (emphasis mine).

Historically, owning land has been a very present-moment sort of activity. During the land rushes of the 1800s, people were granted sections of land in return for settling and working the land to produce food via agriculture. The focus was survival and immediate returns. Because land was so cheap and seemingly so plentiful, settlers would often extract all the resources from a piece of property and then move on after several years, leaving behind a wasteland. This practice culminated in the Dust Bowl of the 1930s, when a vast landscape had been used for its immediate resources and then left stripped of the components necessary to sustain it. In his book Goodbye to a River, John Graves talks about this mentality in the Brazos River valley, the impacts it had and its cultural significance for the river.

This attitude towards land is derived from a need for survival. Eventually though, attitudes can change from the immediacy of today towards a more balanced approach. In fact, this is required if the land is to continue to provide for its inhabitants, human, plant and animal. An eye towards the future must take into consideration the actions of today or else catastrophe might occur leaving no choice but to move on and start over. But it takes an entirely different skill set as well as a different point of view to successfully make this happen. Steps must be taken to remove fewer resources in the present. A management strategy must be put in place to dictate how much activity goes into production for a given year versus doing work to ensure the future production is successful. Invasive species must be managed. Riparian areas must be protected.

This skill set is one that can be developed but won’t typically appear organically. Someone good at producing value for the present has a different vision of what land is for. It’s important to realize that neither of these skill sets is inherently good or bad. Carried to extremes, both are bad. But it’s important to realize that both the values and the skills needed to move from a viewpoint of present resource extraction to a more holistic approach must be taught and incentivized. Hence the program mentioned before. By giving landowners prestigious awards for land stewardship, viewpoints about land ownership can change.

Stewardship isn’t just for landowners, though. It is a concept that extends to ownership or management in general. You can be a steward of your home or your car or your business or, if you happen to be a software engineer, your codebase. Chelsea Troy has written extensively on Technical Debt and talks about the idea of stewardship for code, which provided the spark for this essay. In a codebase, you see the same evolution as you do with land ownership. We even use the same analogies (greenfield and brownfield), the same concepts and the same viewpoints. It takes different skill sets to do feature development versus stewardship. Debt is the pulling of resources forward from the future into the present, and eventually all debts come due. You must have a plan for reducing that debt and then follow through on it.

So often in software (and lately in politics), the impulse is to start over, to declare technical bankruptcy. The vocabulary is designed to favor that approach (greenfield sounds so much better than brownfield). Because there is no coherent education around debt reduction and because feature development is typically the skill set selected for in most businesses, stewardship in code ownership is almost universally viewed as a negative. High-priced consultants never come in and give you a plan that reduces your technical debt through careful maintenance. No resumes ever cross your desk highlighting how an engineer took a set of misguided microservices and combined them into a single service because the operational overhead was destroying the organization. Because our industry is essentially in the 1800s of land ownership, all we know how to do is burn it all down and start over.

This is no more a bad thing now than it was in the 1800s, at least at a surface level. But there are lessons to be had in the environmental responses like the Dust Bowl for those of us staring at codebases with mountains of the red dust of technical debt. If we do not begin to cultivate the skills and views necessary to steward code through its evolution, we cannot hope to grow as an industry.

Stewardship in code is made harder because, unlike with land, starting over appears to cost nothing, as if there were no chance we might run out of bits and bytes to reorganize into something different. However, a slightly deeper look shows that we do not operate in an open system; the business is in fact a closed system with limited resources that it must use carefully, with consideration not only for today’s profits but for its future survival. Technologists must recognize this constraint and employ techniques to make future changes not only possible but easier. This is made difficult by the fact that engineers can and do regularly move to other sections of land, so to speak, by finding a new job. Good technical management at the business level is critical for mitigating this effect. This in turn is difficult because most technical managers were once engineers themselves and remember what it was like to work in a codebase struggling with debt.

Until we find ways in our industry to incentivize stewardship, this behavior will continue. Perhaps fancy awards for code stewardship are in order. Perhaps businesses can find ways to combine the very real need to grow and move forward with the very real need to pause, to reflect, to improve and to rest. Constant growth is the same as cancer, and there must be times when forward movement is put on hold in order to improve what exists. To do all this, technical leadership must step up to understand the stewardship mindset, the skills required and the benefits of doing it. We have to find ways to reward those who are stewards of our industry in the same way we reward and deify those who have caused great change through growth or creation. Until that can happen, there is little hope of our industry moving beyond Dust Bowl farming. This is bad for everyone because the technical people can always easily move to another “farm” but the businesses upon which the constant growth was built cannot.

It’s important to recognize the societal part of the stewardship definition and not just the first two parts about productivity now and for the next generation. Stewardship has societal benefits that, while difficult to define and measure, may far outweigh the costs. Many people now understand it is far better to protect land from overuse to prevent catastrophe, but almost no one understands that about technology. Perhaps as our industry matures (and it is still barely in its early childhood), we will develop ways to more accurately understand the costs of constant growth. We must begin to think about and socialize the benefits of technological stewardship so that we don’t continue to leave miniature Dust Bowls in our wake.

On Individual and Departmental Goals

It’s that time of year again when, three months into the year, organizations everywhere begin the exciting task of examining the tabula rasa of a new year filled with the potential for achieving amazing things. As part of that task, many organizations set goals for individuals and departments in hopes that they will lead said individuals and departments to greater success for the company. There is no shortage of information on this topic, freely available after a short Google search.

One common theme in goal setting is that goals must be SMART, which is a fun HR-derived acronym for Specific, Measurable, Aggressive, Realistic and Time-bound. Substantial evidence shows that goals set in this manner are in fact correlated with happier, more motivated employees and greater success in achieving the goals. However, I would argue that the resulting success and happiness are caused not by the fact that the goals are SMART but by the underlying process through which the goals are worked towards. It is entirely possible for an organization to set SMART goals and still have highly unhappy employees who achieve few if any of the goals nine months later. This happens because goals alone are useless regardless of how SMART they are. Goal achievement relies on the underlying system and strategy that support the goals.

Scott Adams has written about the difference between systems and goals. We all have regular experience with goals gone awry. Many of us are sure this will be the time that we lose 20 lbs or exercise more or read more books or whatever. It does not matter if the goal is SMART or not. I can say “I will read 24 books in 2021” which is Specific (read more books), Measurable (24), Aggressive (I read 7 last year), Realistic (outside any context of course which is critical) and Time-Bound (one year). But if there is no underlying system and strategy for accomplishing this goal, all the smartness in the world will result in the same failure.

A system is a set of things—people, cells, molecules, or whatever—interconnected in such a way that they produce their own pattern of behavior over time. The system may be buffeted, constricted, triggered, or driven by outside forces. But the system’s response to these forces is characteristic of itself, and that response is seldom simple in the real world.

Donella Meadows, Thinking In Systems

System has a specific definition and meaning in this setting. It is the interconnected nature of the pieces of the system that produces the results the system outputs. For an organization, a department or an individual to achieve goals, there must be an underlying system of management designed, at least in part, to facilitate the behavior necessary for accomplishing them. If this system is designed, either intentionally or more often haphazardly, to produce behavior other than goal accomplishment, no amount of SMARTness will ever overcome it.

Take a trivial example: my desire to read more books. On its face, the goal seems SMART. However, that is only so if the underlying system I use to make choices about my free time has taken into consideration the constraints on that time. Unless I have examined the amount of free time I had in 2020, discovered a great deal more of it and then dedicated my future expenditures of that time to reading via a disciplined schedule, the goal of 24 books is much more likely to be DUMB (Definitely Underestimating My Behavior) than SMART, even though on the surface the goal seems to conform to the definition.
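The Realistic check in that example is really just arithmetic over a time budget. Here is a minimal Python sketch; the hours per book, the weekly free time and the share of it set aside for reading are all made-up inputs, not numbers from this post, and the function names are hypothetical.

```python
def reading_hours_needed(target_books: int, hours_per_book: float) -> float:
    """Total reading time the goal demands."""
    return target_books * hours_per_book


def reading_hours_available(free_hours_per_week: float, share_for_reading: float, weeks: int = 52) -> float:
    """Time the current system actually sets aside for reading over the year."""
    return free_hours_per_week * share_for_reading * weeks


def is_realistic(target_books: int, hours_per_book: float,
                 free_hours_per_week: float, share_for_reading: float) -> bool:
    """A goal is realistic only if the system budgets at least as much time as the goal needs."""
    available = reading_hours_available(free_hours_per_week, share_for_reading)
    return available >= reading_hours_needed(target_books, hours_per_book)


# Hypothetical inputs: 8 hours per book, 5 free hours a week, half of it spent reading.
print(reading_hours_needed(24, 8))      # 192
print(reading_hours_available(5, 0.5))  # 130.0
print(is_realistic(24, 8, 5, 0.5))      # False
```

Under those assumptions, last year's 7 books (56 hours) fit easily, while 24 books do not. The goal only becomes realistic when the system changes, for example by protecting 10 free hours a week instead of 5 (`is_realistic(24, 8, 10, 0.5)` is `True`).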

I would actually argue that one of the core principles of good management is the diagnosis and analysis of the Realistic in SMART. This is the area where management skills make or break the underlying system that allows successful goal achievement. To determine whether a goal is realistic, a manager must understand and be able to answer the following questions:

  • What is the underlying system of how work gets done by the organization?
  • What is the underlying system of how work gets done by the department?
  • What is the underlying motivational type of the individual (if setting individual goals)?
  • Are all of these cohesive with each other?
  • Are they congruent with each other? (see Esther Derby’s work on Change and Congruence)
  • What are the constraints on work within the system?
  • How much work can the org, department or individual realistically do given the system within which it operates?
  • And so on and so on.

The key to successful goal setting and achievement is to have an underlying system of behavior that clearly defines what is realistic. It is not realistic to lose 10 lbs if you do not throw away all the Cheetos in the pantry and you continue to eat donuts every Saturday morning because it is a family tradition. The system that includes Cheetos and donuts will overcome any amount of SMART goal setting because it is the system that produces the behavior that leads to outcomes. It is not realistic to have 8 priority-one departmental goals if the underlying system of the organization is such that at any moment the entire department may be reallocated to focus on unrelated organizational goals.

A good manager understands the constraints on what is and is not realistic for an organization, department and individual. Here lie the dragons of management. Most goal setting exercises I have been a part of have applied a great deal of wishful thinking and magical handwaving around the capacity of the organizational structure. Most of these exercises identify several things that seem highly desirable and then ask “can we do all this?” Because humans are naturally inclined to be good, your experience with social media notwithstanding, the result is often a half-hearted “yes” if only so that we can get on about the business of actually doing work. But in order to have not only successful goal setting but also goal achievement, we must have a more rigorous system around what is realistic. At the very least, a manager must understand the constraints of the system within which she operates and have a strategy for dealing with those constraints.

The strategy work of Richard Rumelt is very helpful here. Specifically, we must realize that good strategy is about policy choice and commitment to action. We have to write strategies for our goals that lay out policies to guide action towards the system behavior we want. We must lay out consequences for violating the strategy and be prepared to defend them. Circling back to my personal goal, when I discover I have 30 minutes of free time and am thinking about practicing my guitar, I must realize that this undermines the Realistic part of my reading goal and that there are consequences to that choice. The same goes for reducing the technical debt of an engineering organization. When presented with some free time, if we fill it with yet another story delivering business value instead of dedicating it to removing NHibernate (don’t ask), we have violated the Realistic nature of our goal. To prevent this from happening, we need hard policies that say things like “when presented with available resource time, we will always choose to apply that time to the reduction of technical debt”. This guides teams’ actions but does not dictate them. Teams can still choose what actions to take within the guardrails of the policy. We must also ensure that free time both exists and is encouraged. If it does not exist or cannot be created due to organizational constraints, no amount of SMART goal setting will ever result in different behavior.

By building a system that produces the behavior we want, goals become mostly secondary in nature. If there are clear policies and feedback loops built into the system to confirm, analyze and affirm behavior, goals will just happen. We must understand the constraints of the system and operate within them as well. We must know the inputs and the outputs of the system and how those interact to balance or reinforce behavior. We must work to ensure that goals are written in such a way that they do not violate the boundaries of the system because that guarantees failure. Successful goal setting is really about system design and is just as hard as more concrete problems like making an API faster or migrating to the cloud.

Change Requires System Change

“Getting started begins with the simple, self-evident premise that every system is perfectly designed to deliver the results it produces.”

Paul Batalden, MD

At first glance, this seems like a cliche. A traffic system is perfectly designed to produce the results of safe, smooth traffic flow through an intersection. A plumbing system is perfectly designed to bring water in and take gray and black water out. A release and deploy system is perfectly designed to walk through 7 stages, take 2 days and involve 40 people, even if one of the company’s stated goals is to get better at web delivery. Oops.

I came across the leading quote in Esther Derby’s excellent book, 7 Rules for Positive, Productive Change: Micro Shifts, Macro Results. At its heart, it is an idea about systems thinking: change must either work within the system to shift the system’s boundaries or work from outside the system to alter the system itself. If you want to change something, you must change the system that produces the behavior you are interested in. Any effort to implement change without first taking into account the sociotechnical system within which you are operating will produce muddled results at best and failure at worst.

Where I work, we currently have an annual goal to improve Web Delivery Excellence. At this level, we can think of it more as a vision made up in some part by the following: improve the performance of our funnel and improve the engineering discipline and performance of the overall system. We’re now in March, almost into the second quarter of the year and we’ve struggled to make much headway on this vision. I’ve spent quite a bit of time thinking about this and I think it’s directly attributable to Batalden’s quote. We have a sociotechnical system that is perfectly designed to produce mediocre web delivery. This is because the system wasn’t built to produce the result of Web Delivery Excellence. It was designed to Prevent Web Delivery Crappiness. Hoping to improve web delivery without explicitly addressing the underlying system will almost certainly result in failure.

What does the underlying system look like? As is fairly common in the industry, we have a two-week sprint cycle, and we deploy everything at the end of the sprint. Even though we call it agile, our SDLC is effectively waterfall: teams report up through silos with competing biases, incentives and directives. Teams are not empowered in any meaningful sense. Two-week feedback cycles are far too long to make immediate changes. Each silo (engineering, QA, ops) has built processes that serve the silo’s needs quite well while serving the organization’s new needs quite poorly. The system we have built is a direct artifact of Conway’s Law, which is probably a corollary of Batalden’s quote.

Much of this system, perhaps all of it, was designed to protect against Web Delivery Crappiness. To wit: QA tests three times because QA often finds broken things in different types of environments, and only Ops can deploy because the production environment is treated as sacrosanct to shield it from failures. There are many more examples, none of them particularly unusual in the industry as it once existed. We’re actually quite good at this process. But today, after all the research that has come out of DORA and other organizations, we know this type of system is not the way to achieve excellence.

In order to implement change in a system like this, the system itself must be the target. You cannot will “Web Delivery Excellence” into a system that was specifically designed to protect against Web Delivery Crappiness. Protecting against Crappiness is, unfortunately, not the flip side of the coin to Implementing Excellence. To change the outcomes of your software delivery process, you must change the software delivery system that produces those outcomes. The good news is that the process for that is straightforward. Some people even wrote a book about it.

So often in my career, I have seen agents of change fail completely because they try to operate within the same system boundaries that produce the very situation they so desperately want to change. Many times, they intuitively understand this and try to create an entirely new system. Unfortunately, this also often ends in failure, because systems evolve over time to resist change unless they are specifically designed to facilitate it. This is one reason rewrites go wrong so often: the sociotechnical system the software runs in has evolved to produce results designed specifically around the existing software. The only way to successfully rewrite a system like this is to also rewrite or rewire the surrounding sociotechnical system to accommodate it.

Change is hard. But it is also inevitable. For it to be orderly, change agents must alter the system that produces the outcomes. To do that, we must examine the sociotechnical system that produced the current outcomes we wish to change and then take steps to alter that meta-system in a way that is likely to result in different outcomes. To achieve excellence as opposed to preventing crappiness, we must recognize that fear of failure is a major part of the current system, and we must ameliorate that fear by creating a system that welcomes failure as a path to learning. To achieve excellence as opposed to preventing crappiness, we must also reduce process to a bare minimum so that we optimize for flow and feedback instead of gates and checkpoints. Until we begin to analyze the systems within which we operate, which are perfectly designed to produce the results we no longer appreciate, we’ll continue to fail to change the outcomes we receive.

Managing Inertia

In Simon Wardley’s business strategy methodology, Wardley Maps, there is a class of behaviors you can adopt in all contexts to improve your ability to act strategically and improve your chances of success. These are called doctrine: universal rules that one should apply across contexts. By analogy, in war, your doctrine is to train your soldiers to shoot before you go into battle; in chess, you should learn how the pieces move before playing your grandfather. Doctrine isn’t a silver bullet, but it does provide guiding principles broadly applicable to many situations.

Doctrine is grouped into broad categories of activities you might take: Communication, Development, Operation, Learning, Leading and Structure. One doctrine in Operation that intrigues me is Manage Inertia. Operation covers running the business, doing the day-to-day things that allow progress to be made on a variety of fronts. A formal definition of inertia is the resistance of any physical object to a change in its velocity. In business, the object takes on other forms: a team, a process, a piece of software, or any number of other constructs. The fact that inertia exists in your business is A Good Thing. It’s a sign of success, because without past success, continuing to do the same thing would be pointless. It is the fact that inertia arises out of past success that creates a paradox and the need to deal with it. Much like your brokerage telling you that past success is no guarantee of future performance, Tesla notwithstanding, past business success must often be disregarded so that change can happen and the business can evolve.

Inertia in business manifests in particular ways. Organizational units that have achieved success in the past will be unwilling to try new things or learn new processes. Successful software ossifies over time as more features are added to it in its current structure, cementing design decisions into place. Processes like Scrum or Six Sigma fix initial problems and are then never re-examined. The paradox here is that the business landscape is always changing. Businesses must adapt in order to keep succeeding, yet they tend to stay the same. This is how the brash startup can disrupt an established player in a vertical. The startup lacks the inertia that acts on every part of the established business to keep it from changing, even when change would be for its own good.

Often, certain elements within an organization will, consciously or not, chafe against the inertia and push for radical change. In the software landscape, this is often expressed as rewriting some piece of the business’ software after a period of time because technology has moved on and improved. In the best case, this push comes from development teams who see improvements in technology and methodology and want to do things better. In the worst case, it reflects a lack of good business strategy that allows a rogue element to begin a project without guidance or a clear road map. Typically, it’s a little of both.

These rewrites can in fact be successful. With enough engineering firepower and good leadership that focuses on business value and quick wins, rewrites can lead to a rapid evolution of the existing software. More often than not, however, none of that is true. The business focuses engineering talent elsewhere, leaving the rewrite understaffed. Management, somewhat out of touch with both the landscape and the day-to-day activities, prefers to just sort of hope things will turn out OK. Hard decisions are avoided. The organizational inertia toward past success weighs heavily. This inertia is far more powerful than the average engineer or engineering manager understands.

Other organizational units that interact with the working system have developed rules and processes for that interaction. Over time, through the success of the software, they themselves have become successful, which creates inertia from a different quarter. The marketing team has learned to use the software to its benefit. The business insights group has learned how to get data out and into the hands of stakeholders reliably. The business executives understand the vocabulary and the predictability of working software. All these sources of inertia work against a rewrite and must be managed thoughtfully and strategically, or else the project is doomed.

So we have a paradox. The business must change in order to adapt to an evolving landscape. But the business must not change, because what it is doing is successful. Navigating this landscape takes planning and must be done constantly, as maintenance on existing systems, just as the oil must be changed in a car. Managing this inertia, while a general doctrine, requires critical thought applied to a particular context. There is no silver bullet. Perhaps you can reclaim your software. Perhaps a rewrite is the best plan. But if so, all the sources of inertia that act on the organization must be taken into account and mitigated. You cannot suddenly change a significant chunk of a successful business. The business reached a stable state through time and evolution, and a sudden rupture in that stability will rarely succeed.

This management of inertia involves consensus and partnership across organizational units. The irony, of course, is that the drive for change often arises because one organizational unit has grown tired of the existing inertia and seeks to overcome it by moving alone. However, in a successful business, there is no alone. All units are connected, however tenuously, and the smaller the business, the stronger the bonds between units. So these moments of punctuated equilibrium, where one unit thinks it can rapidly change something the entire business relies on, are largely doomed to failure. Conway’s Law cannot be ignored.

How, then, can we ensure inertia won’t kill us over time? Good strategy goes a long way. By analyzing the issue and developing policies that guide teams’ actions, inertia can be turned against itself, as small successes help teams develop confidence in their ability to manage change in the organization. A policy that says “we will always be on a framework version within one major version of the current accepted version” will guide teams’ actions in their planning process and ensure that improvements in technology work their way through your system. A policy of “As a team, we will spend one week a year exploring the landscape of our current technology stack” will help sharpen skills and invite broad participation. Policies are critical to guide behaviors and actions. Without them, there can be no consensus on how to move forward.
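A policy like the framework-version rule is easiest to keep when a build checks it instead of a person. Here is a minimal shell sketch, assuming version strings of the form MAJOR.MINOR.PATCH; the function name within_one_major and the version numbers in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical check for the policy "stay within one major version of the
# current accepted version". Assumes MAJOR.MINOR.PATCH version strings.
within_one_major() {
  in_use_major="${1%%.*}"     # "6.2.1" -> "6"
  accepted_major="${2%%.*}"   # "7.0.0" -> "7"
  # Succeeds (exit status 0) when the team is at most one major version behind.
  [ $((accepted_major - in_use_major)) -le 1 ]
}

# Example: a CI step that fails the build when a team falls further behind.
if within_one_major "6.2.1" "7.0.0"; then
  echo "ok: within policy"
else
  echo "policy violation: more than one major version behind" >&2
  exit 1
fi
```

Where the “current accepted version” comes from (a team-maintained file, a package registry, a wiki page) is deliberately left open here.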

Overall, the management of inertia is a function of good management. This seems trite but is critical. By defining strategies and policies that guide actions across an organization and then enforcing these policies over time, inertia can be prevented from becoming ossification. Without this management over time, software will grow in size and complexity to a point where changing it becomes perilous and rumblings of replacement will grow louder. It is unlikely at this point that excellent management will suddenly leap out of the fire to guide a difficult project to success. As in health, it is always more advisable to take small steps over time rather than have heart surgery as a strategy. Managing inertia must be consistent and well-guided to allow the business to evolve as circumstances warrant.

More Postgres

In my long-suffering, ongoing battle with Postgres installed via Homebrew and running locally on the Mac, I fired it up a couple of weeks ago to start on a new project, only to find that it again didn’t run, even though Homebrew said the service was running (as it always does). I’m not even sure what I did to fix it this time (upgraded, restarted, created a db, created a user, removed the process PID file, who knows), but what I do know is that in the future, FIND THE LOG FILES FIRST.

That was the key to the solution, as it always is. I don’t know why I always look for logs when I’m at work but then don’t think of it when I’m working on my own tools.

Anyway, on my current system, I can find the log here:

atom /usr/local/var/log/postgres.log

And in it you’ll (and by you, I mean future Brett) find useful information that might help debug the problem.
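To make “logs first” a habit rather than an afterthought, here is a minimal shell sketch. It assumes the older Intel-Mac Homebrew layout: the log path is the one above, while the data directory /usr/local/var/postgres and the helper name pg_triage are assumptions, so adjust both for your install. It prints the end of the log and flags a leftover postmaster.pid in the data directory, one possible reason a server that Homebrew reports as running never actually starts.

```shell
#!/bin/sh
# pg_triage is a hypothetical helper, not a Homebrew or Postgres command.
# Default paths assume an older Intel-Mac Homebrew layout; pass your own if they differ.
pg_triage() {
  log_file="${1:-/usr/local/var/log/postgres.log}"
  data_dir="${2:-/usr/local/var/postgres}"   # assumed data directory

  # Logs first: the last lines usually say why the server stopped or never started.
  if [ -f "$log_file" ]; then
    tail -n 20 "$log_file"
  else
    echo "no log file at $log_file"
  fi

  # Postgres keeps postmaster.pid in its data directory; one left behind by a
  # crash can block startup. Confirm no postgres process is running before removing it.
  if [ -f "$data_dir/postmaster.pid" ]; then
    echo "found $data_dir/postmaster.pid"
  fi
}

pg_triage
```

Running pg_triage with no arguments uses the defaults; pg_triage /path/to/postgres.log /path/to/data checks a different install.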

The Right Tool For The Job Isn’t What You Think It Is

This tweet recently took me down a rabbit hole of ideas about software and the epiphenomena we produce when we write it. As is often the case when I start thinking about something, other seemingly random events or articles bubble to the top of my consciousness or Twitter feed or whatever. Choose Boring Technology had recently popped up, linked from another article on architectural working groups and the idea of talking about technology choices. Outside of all that, I’ve recently been waking up at 1 in the morning thinking about some looming changes to our technology stack at work. It’s weird how the universe knows when you are ready for an idea and suddenly, you can tie multiple streams of thought into a coherent whole. Well, you can at least try. This post is an attempt to do that.

An epiphenomenon is a secondary effect of an action that occurs in parallel to its primary effect. The medical world is rife with examples of epiphenomena. I assert the software world is too, but that its epiphenomena are poorly documented or catalogued because they are primarily negative. I believe epiphenomena are what Michael Feathers is talking about in the lede. If you only see the effects of your software choices, you don’t really understand what you have built. Only when you see the effects of the effect, the epiphenomena, do you really understand. I contend these are rarely technological in nature; they are instead cultural and have wide-ranging effects, many of them negative.

How is this related to choosing boring technology? Epiphenomena are much better known and much less widespread with boring, well-understood technologies. When you choose exciting technologies, the effects of the effects of your choices are deeper and broader because you understand fewer of the implications of the choice. These are the unknown unknowns that Dan talks about. We see this over and over in the tech space, where people think that choices are made in a total vacuum with no organizational effects beyond the primary technological ones.

Amazon is famous for its service-oriented architecture. It sounds so dreamy. We’ll have services that allow us to iterate independently and deploy pieces independently, and we’ll all be so independent. The problem is that independence requires incredible discipline, discipline that is paradoxically very dependent on everyone being on the same page about what a service looks like, what it has access to, and how it goes about getting the data it needs to function. Without that very hard discipline, which rarely seems to exist outside the Amazons of the world, what you have is not your dreamy Service Oriented Architecture but a distributed monolith that is actually a hundred times worse than the monolith you replaced.

I saw several people disagreeing with that tweet and wondered why it was so controversial. It dawned on me that the people disagreeing with it were developers, people deep down in the corporate food chain who hold the idea of using the right tool for the job in all instances, which is great if you are a carpenter but fucking insane if you are a software shop. When a carpenter uses a miter saw instead of a hammer, it’s because you can’t cut a 2×4 with a hammer unless you are very, very dedicated and also the shittiest carpenter in the world. However, when an engineer says “This is the job for Super Document Database (which, by the way, we’ve never once run in production)!” in his best Superman voice, he’s saying it in a total vacuum, a vacuum that doesn’t exist for the carpenter (and actually doesn’t exist for the engineer either; he just doesn’t know it). Now you have your data in two places. Now you need different engineering rules for how it’s accessed, what its SLAs are, how it’s monitored, and how it gets to your analytics team, who just got blindsided for the fourth time this year by a technology they had no input into adopting, etc., etc., etc., until everyone in the company wants to go on a homicidal rampage.

Logical conclusion time: Imagine a team of 5 developers with 100 microservices. Imagine the cognitive overload required to know where something happens in the system. Imagine the operational overload of trying to track down a distributed-system bug across 100 microservices when you have 5 developers and 1 very sad operations person. Ciaran isn’t saying it’s technologically a bad idea to have more services than developers. He’s saying it’s a cultural and organizational bad idea. He didn’t say so in the tweet or the thread, either because he didn’t have #280Characters or because he just doesn’t know how to express it. But that’s what he’s saying. It introduces a myriad of problems that a monolith or a very small set of team- or developer-owned services does not.

Our industry has spread this “right tool for the job” meme and, to our benefit, it’s stuck. It’s to our benefit because we developers get to play with shiny, jangly things and then move on to some other job. People who don’t have such fluid career options are then stuck supporting, or trying to get information out of, a piece of technology that isn’t the right tool for THEIR particular job. “The Job” is so much broader than the technological merits and characteristics of a particular decision. As Dan points out in his post, it’s amazing what you can do with boring technology like PHP, Postgres and Python. If you want something else, you had better have a really damn good reason that you can defend to a committee of highly skeptical people. If you can’t do that, you use the same old boring technology.

Our industry, and by extension our careers, live in this paradox. On the one hand, a developer can’t write VB.NET his entire career, because he’ll watch his peers get promoted, his salary fail to keep up with inflation, and his wife leave him for the sexy Kotlin developer who just came to town. On the other hand, taking a multimillion-dollar company that happens to use VB.NET and using that as an excuse to scorch the earth, technologically speaking, is in my mind very nearly a crime. There is a middle ground, of course, but it’s a difficult one, fraught with large falling rocks, slippery corners with no guard rails and a methed-out semi driver careening down the mountain in the opposite direction.

Changing technologies has impacts on different arms of the organization, and I’ve found it useful to frame these as compile-time versus runtime impacts. Developers and development teams get to discover things at compile time. When you choose a new language, you learn it slowly over the course of a project or four. But if you operate in a classic company where you throw software over the wall to operations, they get to find out about the new tech stack at runtime, i.e., at 3 AM when something is segfaulting in production. The pain of choosing a new technology is felt differently by different groups in the organization. Development teams tend to optimize locally for pain, i.e., push it off into the distant future, because they are under a deadline and trying to get something, anything, to work, and so they make decisions that defer a great deal of pain.

Technological change requires understanding the effects of the effects of your decisions. Put more succinctly, it requires empathy. It’s a good thing most developers I’ve known are such empathetic creatures. Sigh. Perhaps it’s time we start enforcing empathy more broadly. The only way I know to do that is, oddly, a technological solution. If you want to roll out some new piece of technology (language, platform, database, source control, build tool, deployment model or, in the case of where I currently work, all of the above), you have to support it from the moment it’s a cute little wonderful baby in your hands all the way up to when it’s a creaky old geezer shitting its pants and mumbling about war bonds. Put more directly, any time someone has a question or a problem with your choice, you have to answer it. You don’t get to put them off or say it’s someone else’s job or hire a consultancy to tell you what to do. If it’s broken at 3 AM, you get the call. If analytics doesn’t know how to get data out of the database, you get to teach them. If you fucked up a Kubernetes script and deployed 500 instances of your 200-line microservice, you get to explain to the CFO why the AWS bill is the same amount he’s paying to send his daughter to Yale. Suddenly, that boring technology you totally understand sounds fantastic, because you’d like to go back to sleeping or drinking Dewar’s straight from the bottle or whatever.

We cannot keep existing as an industry by pushing the pain we create off onto other people. On the flip side, the people we have been pushing pain to need to make it easier for us to run small experiments, and not say no to everything just because “it’s production”. There has to be a discussion. That’s where things seem to completely fall apart because, frankly, almost no developer or operations person I’ve known has, when faced with a technological question, said “I know, I’ll go talk to this other team I don’t really ever interface with and see what they think of the idea.”

Software is just as much cultural as it is technological. Nothing exists in a vacuum. The earlier we understand that, and the more dedicated we are to acting on the impact and effects of that understanding, the happier we’ll be as teams of people trying to deliver value to the business. Because in the end, as Dan puts it, the actual job we’re doing is keeping the business in business. All decisions about tooling have to be made within that framework. Any tool that doesn’t serve that job and that end is most decidedly NOT the right tool for the job.