Problems in Large Software Dev Teams

Hopefully by now most developers and project managers are well aware of the mythical man-month and Brooks’ Law:

Adding manpower to a late software project makes it later

The idea is that communication overhead grows quickly as you add more people to a project, so, counter-intuitively, it is often not worthwhile to keep adding people to try to catch up. Some implications of a larger team or project may not be immediately obvious. Some problems get noticeably worse as team/project size grows:

  1. Lower productivity due to increased overhead as mentioned above.
    • Meetings will tend to involve more people and take longer
    • There will be a lot more emails
    • Project management effort scales up quickly too
    • More people need to be allocated to maintaining builds and servers
    • More time needs to be spent on task prioritization, bug triage, etc
    • More people asking WTF happened to their code (LOL)
    • Any decision making that requires consensus building takes longer
    • It becomes more difficult to find the right person to ask things
  2. Simply due to the number of people, there are more things that could go wrong
    • The build gets broken more often
    • Sick days happen more often across the team
    • Server performance becomes much more important since any delay or downtime affects more people
    • Schedule delays or other unexpected problems will be more likely
  3. Maintainability becomes more important
    • Technical debt becomes more burdensome and poor code is more likely to come back and bite you in the ass in the future
    • The need for good coding and development standards increases
    • Higher likelihood of code duplication (“I didn’t know that Developer R already wrote a function that does X!”)
    • More important for code to be well-decoupled, to reduce the likelihood of one developer breaking a lot of things
  4. Source control gets harder to use, with so many people making so many changes.
    • The team needs to develop standards for commit messages and for linking commits to bug reports, to make it easier to track and monitor changes
    • Source control commit comments need to be a lot more helpful or descriptive.
    • The more commits happen in the same amount of time, the more often you need to update from the repository.
    • Merging is more likely to become difficult and complicated (may be made easier by modern source control systems)
    • More important to use more, smaller files instead of fewer large files (less likely to produce conflicts)
    • Better coding/programming standards are needed. Otherwise changes and commits become difficult to track; for example, if one developer uses a different autoformatting standard, his commits will be full of small reformatting changes.
  5. Having consistent rules for naming, UI, and other things becomes more important
    • The more developers you have, the more likely that they will have different ways of thinking. Conflicts are far more likely among a team of 8-10 developers than among 2-3 developers.
    • It becomes more important to have a standard or plan for where different kinds of files should be placed. Otherwise you run into problems like different developers using different folders for their CSS or different package naming conventions, etc.
    • Consistency and standards are more difficult to enforce (since there are more devs)
    • Need to keep things consistent on all levels: databases, code, UI, and so on.
  6. Documentation becomes more important
    • Tribal knowledge is often spread out among multiple developers
    • Undocumented things are less likely to be passed on to new developers
    • Developers unaware of undocumented things are more likely to have difficulties or to break things
    • It becomes a lot more difficult to absorb new developers into the team when things are urgent
    • Documentation more likely to quickly become out of date due to rapid pace of changes
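To put rough numbers on the first two points: Brooks observed that if every part of the work has to be coordinated with every other part, a team of n people has n(n-1)/2 pairwise communication channels. Similarly, if each person independently has a small chance of being out on a given day, the chance that at least one person is out rises quickly with team size. A small sketch; the 2% daily absence rate is purely an assumed figure for illustration:

```python
def communication_channels(team_size):
    """Number of distinct person-to-person links in a team: n(n-1)/2."""
    return team_size * (team_size - 1) // 2


def chance_someone_is_out(team_size, p_absent):
    """Probability that at least one person is absent on a given day,
    assuming each person is independently absent with probability p_absent."""
    return 1 - (1 - p_absent) ** team_size


ASSUMED_DAILY_ABSENCE = 0.02  # illustrative assumption, not measured data

for n in (3, 10, 30):
    print(n, communication_channels(n), round(chance_someone_is_out(n, ASSUMED_DAILY_ABSENCE), 2))
```

Going from 3 to 30 people multiplies the number of channels from 3 to 435, a 145-fold increase for a 10-fold larger team.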

Anything you want to add?

Qualities to Look for in a Software Developer

Just a list I’ve been maintaining for a while:

(Disclaimer: This list in no way implies that developers who don’t exhibit all of these attributes are terrible human beings who don’t deserve to live. But working with developers who exhibit many of these traits will probably result in a better experience over the course of your developer career.)

  1. Laziness, Impatience and Hubris – from the well-known (notorious?) Larry Wall quote
  2. Communicates well; is able to explain and communicate his ideas clearly, especially to nontechnical people; able to write good documentation
  3. Understands the concerns with scheduling and project management and communicates clearly with the team to avoid problems. This means: willing to speak up as soon as any problem is encountered that introduces any kind of risk; not bloating estimates or pretending that tasks take longer than they really do; not cutting estimates to make managers happy.
  4. Cares about writing elegant code; understands the risks involved with code that is complicated or difficult to maintain; understands the importance of data structures, algorithms and design patterns
  5. Committed to learning and self-improvement; owns up to his own faults; ready and willing (and often excited) to pick up new domains or technologies (and advises you of the appropriate schedule risk); can easily pick up and learn new technologies and programming languages; recognizes and understands programming principles and is able to carry them across to different domains or technologies; able to study or learn new topics with minimal guidance.
  6. Understands engineering tradeoffs; able to tell you the differences in performance, storage, etc. among different options.
  7. Works well with others: willing to help with other people’s work when possible/needed; has an open mind and is willing to consider other people’s suggestions; doesn’t take criticism personally; chill AF
  8. Able to think logically and sequentially; able to break down a problem into a discrete set of solvable tasks; able to investigate and find the cause of problems with minimal info; able to think outside the box when necessary; able to point out problems or logical inconsistencies in program requirements.
  9. Able to read, understand and maintain other people’s code; can update code with the minimal possible changes to avoid breaking things.

Any other suggestions?

The Simplest Code That Can Do The Job

So the other day I was reworking a Python script that I had been using for years on my home PC to manage and categorize some downloaded files for me. This time I wanted to add some smarter behavior so it could figure out when to group files into folders without constantly needing manual intervention from me. To do this, I needed to persist some data between runs – so that the script remembers how it categorized previous files and is able to group similar files together.

Now since I’ve spent most of my software development career as an enterprise-y kind of developer, my first thought was to just use a database to store the data. I already had a MySQL installation on my machine, so that was fine; I just needed Python to interface with it. After looking up how to do it, I balked at having to install a new Python library just to connect to MySQL and reconsidered.

As programmers, we have a tendency sometimes to over-engineer solutions because that’s what we’re used to doing. Did I really need a database for this? The data won’t be very big, and I won’t need to do any sort of maintenance on it, so maybe a simpler solution was in order.

I ended up just using pickle, which is built into Python’s standard library:

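import os
import pickle

DATABASE_FILE = 'series.pickle'  # hypothetical path for the saved data

def load_db():
	# Nothing saved yet (e.g. the first run): start with an empty map
	if not os.path.exists(DATABASE_FILE):
		return {}
	with open(DATABASE_FILE, 'rb') as handle:
		return pickle.load(handle)

def save_db(all_series):
	# Overwrite the file with the current map, using the newest pickle protocol
	with open(DATABASE_FILE, 'wb') as handle:
		pickle.dump(all_series, handle, protocol=pickle.HIGHEST_PROTOCOL)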

(Above code probably gives you an idea what kind of files I’m sorting…)

As an added benefit, I didn’t need to design any database schemas or tables or whatnot, pickle just lets me serialize the map as-is and reload it later from disk without any hassle.
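To illustrate the "no schema needed" point, here is a round trip of a nested dict through pickle. The dict contents and file name are made up for the example, and a temporary directory keeps it from touching any real data:

```python
import os
import pickle
import tempfile

# Hypothetical shape of the persisted data: series name -> how it was filed
all_series = {"Some Show": {"folder": "Some Show", "episodes": [1, 2, 3]}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "series.pickle")
    with open(path, "wb") as handle:
        pickle.dump(all_series, handle, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, "rb") as handle:
        restored = pickle.load(handle)  # a new dict with the same contents

print(restored == all_series)
```

One caveat: the Python documentation warns that unpickling data from an untrusted source can execute arbitrary code. That is fine for a file your own script wrote, but it rules pickle out for data you receive from others.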

I guess my lesson here was: don’t over-complicate things when something simple will work fine. Write the simplest code that can do the job.

Client and Server Validation in Web Applications

Because of the nature of the web and the fact that you should never trust user input, all the validation in a web application should be done on the server side. You can additionally provide validation on the client side (via JavaScript), but this is only a concession towards a better user experience and should not be used as a substitute for server-side validation.

One would think that anyone with a basic understanding of how HTTP works would understand the above easily and any failure to practice it should be considered amateur hour. But in shops where most of the testing is done manually, developers can easily fall into the habit of adding the client-side validations (since failing to do so would earn them a bug report) and forgetting the server-side validations altogether.

The main problems are that (a) HTTP requests can be spoofed, since they do not need to have come from a form submitted via a browser; and (b) even for forms submitted via a browser, the JavaScript validations may have been tampered with on the client side.

For explicit validations where you wrote out some logic (for example: an email address must be in a certain format), it is obvious that you need them on the server side. But some classes of validation are easy to forget, especially those that do not explicitly generate errors on the web page.

First example: when the contents of a drop-down list are dependent on some other value on the form. On the client side, you probably already restrict the choices so that the user cannot select an invalid combination, so it doesn’t look like a check is needed. But on the server side, you still have to check that the choice submitted for the drop-down field is a valid value given the other values submitted in the request.
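A minimal server-side sketch of that check, assuming a hypothetical form where the "city" options depend on the selected "country" and the submitted fields arrive as a plain dict (framework-specific request parsing is left out):

```python
# Hypothetical dependency: which cities the UI offers for each country
ALLOWED_CITIES = {
    "MY": {"Kuala Lumpur", "Penang", "Johor Bahru"},
    "PH": {"Manila", "Cebu", "Davao"},
}


def validate_location(form):
    """Return a list of error messages for the submitted form (empty if valid)."""
    errors = []
    country = form.get("country")
    city = form.get("city")
    if country not in ALLOWED_CITIES:
        errors.append("Unknown country.")
    elif city not in ALLOWED_CITIES[country]:
        # The browser never offers this combination, but a tampered or
        # hand-crafted request can still send it.
        errors.append("City is not valid for the selected country.")
    return errors


print(validate_location({"country": "MY", "city": "Penang"}))
print(validate_location({"country": "MY", "city": "Cebu"}))
```

The allowed combinations live in one server-side table, so the same data could also be used to build the client-side drop-down and keep the two in sync.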

Second example: when you hide or disable certain fields in the web page depending on some other value on the form. As above, no specific client-side check is needed, since the UI already prevents the user from filling in those fields. But on the server side, you have to make sure not to save or process any values from those hidden/disabled fields if the other values on the form indicate they shouldn’t be processed.
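A similar sketch for hidden fields, assuming a hypothetical checkout form where checking "same_as_billing" hides the shipping address fields. The server decides which fields to use based on the submitted checkbox, and ignores whatever shipping values the request may still carry:

```python
def shipping_address(form):
    """Pick the shipping address to save from a submitted form (a plain dict)."""
    # Hypothetical form: when "same_as_billing" is checked, the browser hides the
    # shipping fields. A checked box without a value attribute submits "on";
    # an unchecked box is not submitted at all.
    if form.get("same_as_billing") == "on":
        # Ignore any shipping_* values a tampered request might still include
        return {"street": form.get("billing_street", ""), "city": form.get("billing_city", "")}
    return {"street": form.get("shipping_street", ""), "city": form.get("shipping_city", "")}


tampered = {
    "same_as_billing": "on",
    "billing_street": "1 Main St",
    "billing_city": "Springfield",
    "shipping_street": "99 Elsewhere Rd",
    "shipping_city": "Shelbyville",
}
print(shipping_address(tampered))  # {'street': '1 Main St', 'city': 'Springfield'}
```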

Weak validations on the server side are dangerous because at the very least they will create bad data in your system and at the very worst may expose you to security vulnerabilities.

Cleaning up your Code

In one of my most recent projects, a large system that had gone through a relatively long and unstable period of many, many changes due to sales demonstrations, different clients and whatnot, one of the “fun buffer tasks” I always kept around for devs was code cleanup. Because of the unstable nature of the project, there was always a lot of duplication, unused/unnecessary/obsolete classes/functions/files and so on. Unnecessarily large CSS files where most of the selectors were no longer really needed or JS libraries that weren’t actually used. That kind of thing.

It’s one of those things that you’ll never get official approval from management to do, so you have to somehow sneak it in during your daily tasks. But it’s important for a couple of reasons:

  • Having too much cluttered code makes your system a lot harder to grok. That means new developers will face a much steeper learning curve, and existing developers will struggle when assigned tasks in modules and functions they’re not familiar with. Lower understanding means more bugs, lower quality and so on.
  • Having a lot of unused files, classes, functions, etc. bloats the build process (making build times longer, extending development cycles) and makes build artifacts bigger (extending deployment times).

A lot of developers prefer not to throw away old code, for fear that “we might need it later”. They would rather comment it out in large blocks (making the code a lot less readable) or just leave unused functions/classes dangling. That reasoning is hogwash, since you should be using source control, and source control means never being afraid to delete old code. (Of course, you should make sure what you’re deleting is really no longer in use!)
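Making sure something is really unused starts with a list of candidates. Here is a rough sketch using Python’s standard ast module (the helper name is hypothetical): it reports top-level functions whose names never appear anywhere else in the same source. It cannot see calls from other files, getattr-style dynamic lookups, or references from templates, so treat its output as a starting point for checking rather than proof that the code is dead.

```python
import ast


def unreferenced_functions(source):
    """Return names of top-level functions never referenced by name in the same source."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            used.add(node.id)  # plain references such as used()
        elif isinstance(node, ast.Attribute):
            used.add(node.attr)  # attribute references such as module.used
    return sorted(defined - used)


SAMPLE = '''
def used():
    return 1

def obsolete():
    return 2

print(used())
'''

print(unreferenced_functions(SAMPLE))  # ['obsolete']
```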

Learning a New Programming Language

While many people working as programmers/software developers are happy enough specializing in a single programming language or platform, I generally consider it a better idea to have a wider toolset and the ability to easily pick up new programming languages as needed. The benefits should be obvious: when you have a wide variety of tools under your belt and are able to quickly learn to use a new tool, the number of work options you have increases greatly.

Happily, programming languages share a lot of similar constructs. Only your first programming language (when you first learn programming) should provide you with any difficulty – once you’ve cleared that hurdle, learning additional programming languages shouldn’t be too much of a concern.

You typically start with syntax, variable declarations, function declarations and program flow (loops, conditionals and so on). Some languages may have a strange syntax that doesn’t have much in common with other programming languages, but that’s not much of a concern as long as you have access to a modern-day compiler that will tell you when you stray from the expected syntax.

I find that learning a programming language is best done the same way you actually program – iteratively. Learn something new, try it out, modify it a bit, try it out, and so on. I once had to help someone prepare for a programming interview where he would be expected to know C++. He had learned it in college but hadn’t used it for a few years, and for some reason he stuck to reading up on it instead of taking my suggestion to install a compiler and actually try out the examples. (He ended up passing the interview, but luckily that was because they didn’t ask much about syntax.)

It also helps a lot to learn the theory and terminology. For object-oriented programming, for example, it helps to know and understand the concepts behind polymorphism, inheritance, abstract classes and so on. When studying different programming languages, you can then easily compare how they are done in one language vs another, making it easier for you to carry over design concepts to the new language you are learning. Being familiar with the terms and vocabulary also helps you communicate better with mentors, teachers, and fellow learners studying the same language.
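As an example of that kind of comparison, here is how abstract classes, inheritance and polymorphism look in Python using the standard abc module (the Shape classes are purely illustrative):

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self):
        """Subclasses must provide their own area calculation."""


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle):  # inheritance: a Square is a Rectangle with equal sides
    def __init__(self, side):
        super().__init__(side, side)


shapes = [Rectangle(2, 3), Square(4)]
print([s.area() for s in shapes])  # polymorphism: same call, per-class behavior -> [6, 16]
```

In Java or C# you would mark the class and method with the abstract keyword instead; Python enforces the rule when you try to instantiate the class, so Shape() raises a TypeError.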

Building small toy applications with the new programming language is a great way to learn too. Or maybe if you had a certain project in mind that you wanted to do, you could use the new language with it. Slightly related: When I was consulting with a startup on developing a new product, I asked the CTO whether he preferred to use a technology stack that the developers were more familiar with (to reduce learning costs) or whether he wanted to try this different language that neither of us had tried before, and he told me that one of the great things about being in a startup was that he could choose to do new projects with new technologies to expand their horizons and not have to listen to higher-ups shut it down for fear of increased costs.

For the longest time, I tried to learn at least one new programming language/platform a year. For 2016, it was Unity/C#, although I’ve also started studying Node.js in the past month or so. I hope it’s something I’m able to keep up, even as I’m trying to explore new skills other than programming.

Generalists and Specialists in Dev Teams

In any reasonably large software project, the system will be so large that no one developer will have a good grasp of the details of every function in the codebase.

The tendency is for developers to specialize – that is, developers tend to focus only on certain parts of the codebase and become more familiar with those parts, while not having much knowledge about the rest. This tendency is self-reinforcing – once it becomes known that the developer is an “expert” in the given module, he tends to be assigned the most difficult and urgent tasks or fixes related to that module, further cementing his expertise. Thus, the developer becomes a sort of specialist within the system.

In contrast to a specialist, you will also sometimes have developers who prefer to be generalists. That is, they are comfortable working with any part of the system, although their familiarity and knowledge are probably not as deep as the specialist for any given module.

Both generalists and specialists are valuable in different situations. If you need a complicated change done quickly on a particular module with minimum impact, it’s best to have a specialist who is very familiar with how everything works. On the other hand, generalists are very useful from a resource management perspective, since they can jump in to help at any time in any part of the codebase. Say, if your specialist is sick or out of town and you urgently need to do a small change, the generalist can probably take it on no problem.

Ideally, you train more than one specialist per module of interest in your system, through some sort of mentoring or maybe pair programming, but not all dev teams have that luxury (mostly due to schedule or resource constraints). It’s best for software dev teams to find the right mix of generalists and specialists for their particular development process.

Power Distance in Software Development

I was in a meeting once with my boss (literally the CEO, a Malaysian) and some representatives of another company (Americans) where we were discussing the technical details of a possible future partnership. At one point, one of the Americans told my boss that he was pleasantly surprised that I was openly speaking up independently of my boss and willing to correct him on some points when he didn’t quite get the technical details right. It seems they were used to working with some Indian outsourcing firms, where due to cultural differences, the tendency was for the Indian guys to accept everything the Americans asked for without question and deliver it exactly as requested, even if there were obvious problems.

The concept is called Power Distance: people in cultures with a higher power distance are more likely to accept the authority of “higher-ups” without question, while people in cultures with a lower power distance feel less of a gap with people of “higher” status and are thus more willing to speak up openly.

I believe that I live and work in a country with a high power distance. It is typical of workers here to have an exchange like:

“Why are we doing this, isn’t it kind of dumb?”

“Because the boss says so.”

“Oh, ok”

This applies not just to people in “higher” positions, but especially to foreigners. I saw this firsthand when colleagues first had to work with our project managers, who were based in another country; many were hesitant to raise their concerns directly with their foreign counterparts.

In an industry where users and clients and management often do not really understand the finer technical details of what exactly they want to happen, being able and willing to raise concerns regardless of differences in position or status is not only a distinct advantage, it may very well be an important aspect of the job. All the best developers I’ve worked with are the ones who are willing to call out problems, and it’s a trait I personally encourage in anyone I work with.

JavaScript: References to out-of-scope variables

In JavaScript, referencing variables that are declared outside of a function’s scope can be tricky. If you have code like this:

<script>
  var btn = document.getElementById("BTN");
  var test = 1;
  btn.onclick = function() {
      alert(test);
  }
  test = 2;
</script>

The click handler above does not capture the value of test at the time it is created; it keeps a reference to the test variable itself. When you actually click the button, the alert shows the variable’s current value, which by then is the last value assigned when the script block finished execution (2), instead of the value at the time the function was created (1).

I thought about this because another developer raised a similar problem to me a few days ago. He had a loop that was initializing click handlers for an array of elements. Of course I can’t replicate his example here, but let’s say we wanted to add click handlers to an array of buttons that would show the result of multiplying an input value by different integers.

  <script>
    var btns = [];
    btns.push(document.getElementById("BTN1"));
    btns.push(document.getElementById("BTN2"));
    btns.push(document.getElementById("BTN3"));
    btns.push(document.getElementById("BTN4"));
    var input = document.getElementById("IN");
 
    for (var i=0; i<4; i++) {
      btns[i].onclick = function() {
        alert(input.value*(i+2));
      }
    }
  </script>

This example is analogous to his problem. The expected behavior is that the first button outputs the input value times 2, the second button outputs the input value times 3, and so on. But because every click handler keeps a reference to the same loop counter i, what they all see when they run is the value of i after the loop exits, that is, i==4. All the buttons will show the same output: the input value times 6.

There are several ways to correct the behavior. One way would be to build the click handlers using a utility function, like so:

   <script>
    var btns = [];
    btns.push(document.getElementById("BTN1"));
    btns.push(document.getElementById("BTN2"));
    btns.push(document.getElementById("BTN3"));
    btns.push(document.getElementById("BTN4"));
    var input = document.getElementById("IN");
 
    function getClickHandler(counter) {
      return function() {
        alert(input.value*(counter+2));
      }
    }
 
    for (var i=0; i<4; i++) {
      btns[i].onclick = getClickHandler(i);
    }
  </script>

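This works because each call to getClickHandler creates a new local variable “counter” holding that call’s value of i, and each returned function keeps a reference to its own “counter”. Since nothing changes “counter” after getClickHandler returns, the buttons behave as expected. (In environments that support ES2015, declaring the loop counter with let instead of var also fixes the problem, since let gives each loop iteration its own binding of i.)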
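The same trap exists in Python, the other language used in these posts: a closure looks up a variable’s value when it runs, not when it is created. A minimal sketch of both the buggy loop and a factory-function fix mirroring getClickHandler; the function names are hypothetical, and a plain number stands in for the input value:

```python
def make_multipliers_buggy():
    handlers = []
    for i in range(4):
        # Each lambda looks up i when it is called; after the loop, i is 3
        handlers.append(lambda value: value * (i + 2))
    return handlers


def get_handler(counter):
    # Each call creates a new local "counter" that the returned function closes over
    return lambda value: value * (counter + 2)


def make_multipliers_fixed():
    return [get_handler(i) for i in range(4)]


print([h(10) for h in make_multipliers_buggy()])  # [50, 50, 50, 50]
print([h(10) for h in make_multipliers_fixed()])  # [20, 30, 40, 50]
```

Note that Python’s range(4) leaves i at 3 rather than 4, so the buggy version multiplies by 5 instead of 6, but the cause is the same.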

Your Product Should Be Easy to Install

This is a story of something I consider to be one of my worst mistakes in software product development.

Some years ago I was asked whether it was feasible to write software that would be integrated with Software X that allowed us to export that software’s output into a format that was compatible with Standard Y. I took a look and after a while came back with “Well sure. We could use Programming Language M that has an API that lets us integrate into Software X so we can export the output data. Then we’ll have to use Library N which lets us generate files in the format compatible with Standard Y. What project is this for by the way?”

“Oh, it’s not a bespoke project. It’s a product we’re going to develop with a partner company.”

“Oh.” That set off some alarm bells, so I pointed out that Programming Language M and Library N required users to install two different runtimes on their machines. I suggested we consider writing our own conversion library so that we wouldn’t need two different runtimes, but the cost estimates turned out to be prohibitive, so of course we went with the more complicated stack with two runtimes.

It was a disaster. It turned out to be almost impossible to convince users to install and try out our software when the installation process included a step where the user needed to download and install a programming runtime from an external website (the licensing terms of the runtime did not allow us to package it together with our installer). In hindsight, it was probably a newbie mistake, since this was the first time we were working on software product development. If this had been one of our usual bespoke software projects, where users had IT staff to install software on their enterprise systems, it wouldn’t have been a problem.

I learned a lot of technical stuff from that project (there was a lot of math involved in the data export too), but the most important thing I learned was that the best software product in the world is going to fail if you make it difficult for your users to install it.