Back in 2004, I signed up for the Google Code Jam for the first time. Unfortunately I didn’t make it past the qualifying round.
I was a bit luckier in 2008 and 2010, making it to round 2 both times. In fact in 2008 as I recall I was one of only two participants from the Philippines who made it to round 2, which allowed me to jokingly brag about being the #2 programmer in the country.
I registered again a few times after that, and tried again last year (2016), where I only made it to round 1.
I’m no stranger to programming competitions, although the GCJ was the only one I’ve ever really participated in. Back in college, one of my friends usually joined such competitions, and I had a good time going over some of the problems with him.
Programming competitions don’t usually reflect the kind of programming done in most real-world jobs (although it might be interesting to use competitive programming problems in whiteboard interviews and see what happens). They focus on pushing the boundaries of the programmers talents in problem solving. You also need to be fairly familiar with algorithmic approaches to searching, sorting, and so on. Having a decent background in math also helps a lot.
The Google Code Jam in particular focuses on scaling up your code to handle very large amounts of data – well past the integer limit sometimes. Each question usually has a small and a large input. For the large inputs, you only have a short amount of time (I think it was 5 minutes?) to submit the output. Presumably you wrote and tested your code before starting the timer, so the short time is really only for your code’s execution. You can afford to be inefficient or maybe even brute force your way through the small input, but if you want to solve the large inputs on time you have to be able to optimize your approach.
When I first signed up, I dreamed of winning the entire thing (and of course being able to brag about being the best programmer in the world). But realistically, a lot of the people joining these competitions are “pros” more or less, so the chance of a “casual” like me winning are not that high. They are “pros” not in the sense that they do it for a living, but more of they do a lot of these competitions and regularly practice for them. Websites like TopCoder provide such competitions regularly for those who are interested. Also, I feel like I am very old and rusty, having worked too long in web development stuff where my focus is usually on a different kind of problem.
Still, I think it’s a good idea for programmers of all skill levels to give it a try, just to see where you are at. Who knows, you might even make it all the way! The next Google Code Jam starts soon – the qualification round is on Apr 7. I signed up, although I’m unsure if I will have the time this year even if I advance, due to scheduling problems. But a man can still dream, right? Who knows, maybe I’ll see some of you there too in the final round!
This may surprise some people I’ve worked with, but I didn’t have formal computer science training in school. I’m not actually a computer science major. Yet I’ve worked as a software developer for more than a decade now. Literally zero times have I needed to write a sorting function or balance a BST.
I have a rudimentary understanding of some sorting algorithms (mostly just bubble sort and selection sort), and I have some idea of how to balance a BST. And if I needed to implement any of those I could probably whip something up after a few tries. (Something buggy and/or inefficient probably.) If I had to interview at a company that used one of these whiteboard interviews, there’s a good chance I wouldn’t pass.
For maybe 90% (or even as high as 99%) of modern-day programming work, it simply isn’t needed to know these things by heart. Those skills are more necessary if you’re doing low level things like writing an OS or a graphics driver or such. For everyday web development or Java or Rails or whatever, it’s an indulgence. And even for those instances that you need them, well, the internet is right there.
I’m not a fan of the idea that not knowing data structures and algorithms means you aren’t a real software developer. I have a couple of developer friends who jokingly refer to themselves as “fake devs”. It’s supposed to mean someone who doesn’t have any formal computer science background and has at best a passing familiarity with data structures, algorithms, and other computer science concepts. As one of them stated “I wouldn’t know how to make a map without using a map.” (I thought about writing a post about that, but it’s surprisingly easy – just look it up.). One of them even asked me to give them a quick crash course in data structures and algorithms because he was applying in a large and well-known software company.
For me, this term is silly. If you’re doing software development work, then you’re a software developer.
I’m not saying knowledge of these things isn’t useful. If you want to do more complicated work it is certainly useful to have these tools in your toolbox. Knowing them also makes you more flexible and expands your thinking. And certainly one should be the type of developer who is willing to seek out that knowledge if it turns out you need to use it.
But the problems with the whiteboard interviewing process isn’t that they require knowledge of these things; it’s that they require it under pressure, off the top of your head, without access to the internet. This doesn’t reflect how programming work happens in the real world, where you often have a team, an IDE, and even the entire internet to help you.
Unfortunately, it’s often far too costly for companies to evaluate you under real-world conditions, so many resort to this common interviewing process. After all, it is difficult to evaluate the performance of even developers you have been working with for months, what more someone who just whipped up a short function on a whiteboard in twenty minutes? We tried some variations on it at our last company, but for me it is a problem yet unsolved. Maybe someday…
I’ve been hesitant to try Python 3.x because it’s not backward compatible with Python 2.x which I’ve been using for scripting since forever. But recently I found out that since Python 3.3, they’ve included a launcher in the Windows version that supports having both versions installed.
You can use the launcher to specify the Python version to use at the command line (it defaults to whichever version was installed first):
Even better, the launcher will recognize a header in your .py files that can specify which version of python to use:
If the launcher sees this header, it will automatically launch the appropriate python version. Handy!
I had been meaning to try writing a Twitter bot for a while now. I figured a trivia bot would be pretty easy to implement, so I spent some time a couple of weekends to rig one together.
It’s (mostly) working now, the bot is active as triviastorm on Twitter, with a supporting webapp deployed on http://trivia.roytang.net/. The bot tweets out a trivia question once every hour. It will then award points to the first five people who gave the correct answer. The bot will only recognize answers given as a direct reply to the tweet with the question, and only those submitted within the one hour period.
Some technical details:
My scripting language of choice for the past few years has been Python 2.7. I’m using Tweepy to interact with the Twitter API, PyMySQL to connect to the database, and Flask to run the webapp. I haven’t used Flask in some time, but it’s still very straightforward. I actually had a harder time configuring the webapp with mod_wsgi on my host.
The main problem with a trivia system is that you need a large and high-quality set of questions. Right now the bot is using a small trivia set –around a thousand questions I got from a variety of sources. If I want to leave this bot running for a while, I’m going to need a much larger trivia set. However, reviewing and collating the questions is a nontrivial task. Hopefully I can add new questions every so often.
Feel free to follow the bot and help test it out. I’d be grateful!
There are a few things that one should consider when using and integrating an open source library into your application:
- What are the licensing terms for the library? There are some liberal licenses that mostly let you do anything you want. The MIT license is an example of a very permissive license. Other licenses may provide a number of restrictions. Can you integrate with closed-source software? Can you distribute binaries without the source? Do you need to put some kind of attribution somewhere in your software? Another thing to look our for are the so-called viral licenses. Viral licenses specify that if you integrate their code into your system, then the terms of that license apply to your own software as well. These can be very dangerous from the standpoint of a company developing a commercial product. The most well-known example of such copyleft licenses is the GPL. Integrating GPL code into your system will often mean your software needs to be open source as well. While more open source is good, it may not be in the best interest of your company, so tread lightly.
- Which version of the library will you use? Many open source libraries will release updates on a regular basis. There may be some compatibility issues depending on the version you use. For example the popular open source CMS WordPress releases regular updates. If you are integrating with WordPress, you need to decide which version you are supporting. You will also need a plan for upgrading to future versions and ensuring compatibility. (See the next item.)
- Is there an active help and support system for the software? The demands of enterprise software development often need timely support. There was a time we had to use a certain open source document management system. We later discovered that it was very hard to get help on any issues encountered, either from official channels or the community. This led to many difficulties and issues in that project. You might also want to check popular programming forums such as StackOverflow. Check how many questions about that library usually remain unanswered. If the number is low, that’s a good indicator that you might have trouble looking for help later on.
- What dependencies does the library have? You might want to make sure the dependencies don’t have any problems with the previous items above.
The great thing about modern software development is that we have a large number of open source libraries available to use and draw inspiration from, but that doesn’t mean you can just willy-nilly pick up just any library. The above questions provide a good starting point for evaluating open source libraries for your use.
Back when I was starting out as a software developer, webapps weren’t really a thing. Not as much as they are now anyway. My company provided training to new hires, but I didn’t get any web development training at the time, even though they already had a few web development projects in play at the time.
Instead my initial training involved mostly development of so-called client-server software. This was software that was installed and run on the client machine but they would connect to a remote database server. Up until the early 2000s most enterprisey-type systems used these kinds of program.
I was trained in using mostly two tools: Borland Delphi and Oracle Forms and Reports. These were the sort of tools that would have been billed as reducing the need for specialized programmers or developers. They featured drag and drop user interfaces to design forms or report layouts. The Oracle tools featured robust database integration that let you drag and drop to associate form fields to database tables and fields. Supposedly even people with minimal training would be able to do the work of application programmers.
In practice of course, the only people patient enough to work with these tools in-depth were the programmers themselves. And client requirements always eclipsed the capabilities of these tools, such that expert programmers were still needed to push the tools to their limits and beyond.
Tools like Oracle Forms and Reports had a lot of problems with modern software development practices. For one thing, the “source” files weren’t actually text files. They were mostly binary format files with some PL/SQL interspersed, and could only be opened in the proprietary IDE that only Oracle provided.
Binary formats meant that while they could reap some of the benefits of source control (namely version history and such), actually figuring out the differences between revisions involved opening up both versions in the IDE and comparing each item/script/setting in the two files. Doing things like global find and replace were also a pain in the ass, especially in systems that had hundreds of forms. It was so bad I remember spending some time trying to reverse engineer the binary format so that I could attempt to make some of this work easier. No dice.
Another issue was that Oracle’s IDEs were notoriously buggy. Their Reports IDE in particular often had me incensed. If you dragged the wrong thing to the other wrong thing, you would crash the IDE entirely. In fact, we had a list of things to avoid doing that would crash the entire IDE entirely. The built-in text editors were so bad, I often had a entirely separate text editor open in a separate window so I could copy-paste PL/SQL code there for any complicated edits. (Back then we were for some reason fond of Crimson Editor although it wasn’t as full-featured as UltraEdit or as reliable with updates as Notepad++) This was actually also sometimes a problem as some versions of the Reports IDE also had a crash on paste.
Most of my first two years of work involved maintenance of Oracle Forms-based systems (with some Delphi work thrown in occasionally). I didn’t get an introduction to web development until late into the second year of my career. I was so impressed with web development I made a mock-up of one of the really complicated screens from one of our Oracle Forms projects and it seemed so pretty. I secretly hoped we could convince our clients to port them over to a web application. (1) No dice again; and (2) I probably would have regretted such a thing.
I know I would have regretted it because some number of years later, we got a project to migrate an existing client-server system to a web application. The original application was written in Powerbuilder. I figured it was a fairly straightforward reverse-engineer and implement as a webapp, but nooooo. One of the higher-ups decided that to save on costs, we should look into attempting to automate the porting process somehow. We were to write translation software that would take the Powerbuilder source and convert it into the appropriate web application screens.
This was ridiculous for a good number of reasons, but as a tech lead, I had to look into it and had to prove it was in fact not viable (Spoiler alert: it was not). The first concern was that Powerbuilder also had a binary format. This was solved when I found that someone had already written a tool to export Powerbuilder binaries into some kind of text format. It was a bit crude, but it was a way forward.
The second concern is that client-server form behaviors do not map well to web applications. My favorite example of this is editable grids. These are some kind of excel-like grids that were commonplace in client-side systems. And when users upgrade their legacy systems to web apps, they inevitably expect that their editable grids should work the exact same way on the web as they did on the client-side forms.
<jaeger> you say “editable grid” i hear “problems”
When I raised that the idea was not viable, I was asked to give a more detailed explanation. This involved taking one of the more complex Powerbuilder source files, putting it into an Excel, and line-by-line explaining how and why that line could or could not be translated into a web application. As an alternative option to an automated porting program they wanted a guide or process so simple that even junior developers with minimal experience could follow the guide and port the programs quickly. I think they eventually went with this approach. (Which was closer to our normal reverse-engineer-and-reimplement methodology.)
These days web applications are the norm and client-server programs are a thing of the past. They were relics of a different age. The browser is our universal client now, on the desktop at least. On mobile, “apps” are effectively client-server programs, though they have a different set of advantages and limitations. Maybe in the future mobile apps will converge as well, into a unified browser client – although probably mobile browsers need to be more feature-rich before this convergence can take place. My historical disdain for client-server programs carries over to mobile apps though – I don’t like to use too many of them, although that is a story for another blog post.
So after so many months of development you deployed your webapp to production and it’s up and running and everything is fine and you celebrate and your work is done right?
Two days later you get an urgent support call in the middle of the night. (Your clients are halfway across the world.) They’re asking why the website is inaccessible. You check via your browser and sure enough there’s an error 500. You have to ssh into the server. Half-asleep, you have to quickly comb through the log files to figure out if there’s something you can do about it right now or you have to sadly tell the client that the team has to handle it in the morning.
Nobody expects software errors. And nobody can predict how they will happen. (If we could do that, there wouldn’t be any errors!) But in general you should stay ahead of future headaches during development time by having robust error handling and reporting principles in place for your development team to follow.
There are two classes of errors:
- Validation errors – checks on user inputs. These errors are expected, and the most important thing here is to present clear error messages that tell the user what was wrong and how they can correct it.
- Everything else is an unexpected error, which we can breakdown further according to how the development team would react:
- “That shouldn’t happen!” – the broadest category, usually caused by human errors in coding/configuration or hardware problems. Examples include: database failure, typos in configuration files, different data formats used by different developers, unexpected system crashes and so on.
- “Hmm, oh yeah, we didn’t think of that.” – i.e., design flaws or unhandled cases. These are errors caused by program flow or usage that was not anticipated in the program’s design but are considered valid use.
- “That can’t possibly happen!”, also popularly known as “We have no clue how that happened.” Most unexpected errors start out under this category then after some detective work (some developers use the term “debugging”), it transfers to one of the above categories.
Unexpected errors may also be either recoverable (the user can either attempt to repeat the operation to continue or use other functions in the system) or catastrophic (the entire system or subsystem is completely unusable.)
Ideally, what happens when an unexpected error occurs? There’s a number of things that probably need to be considered:
- The user needs to be aware that it’s not an error on his part and that the problem needs to be reported to support or the system administrator.
- The user needs to be presented with enough information to give to the support team to allow them to determine the cause and fix the error.
- The program needs to capture what information it can to help the support team find and fix the error.
For server-side errors, it’s a bit easier than client-side errors. The first two items are typically handled by having a catch-all error page in the application that gives an error code and message for unexpected errors.
For the last item, it generally means logging, either to a file or a database-backed audit trail (the file is more reliable). Generally we log the error/exception encountered, including the stack trace, and some log messages on each request/page access. A majority of the time, having the stack trace is sufficient. The access logs allow us to back-trace previous steps which might have contributed to the problem. Any special conditions encountered in the code should also be logged – this helps to identify errors that can only happen under some combination of special conditions.
You don’t want to log too much information (what to log might need to be a separate post all on its own), but just enough to help the support team determine the cause of the problem.
Sometimes, the error is specific to the data being used by the users, or happens only in the environment they are using. And the support team investigating the problem will often not have access to that data or that environment, so your logging and the error reports have to be good enough to allow them to deduce the conditions that caused the error.
Error handling for deployed systems is already hard – so make sure not to make it harder on the future support team by having standard error handling and logging mechanisms.
Hopefully by now most developers and project managers are well aware of the mythical man-month and Brooks’ Law:
Adding manpower to a late software project makes it later
The idea is that communications overhead scales up quickly as you add more people to a project. Oftentimes it is counter-intuitively not worthwhile to keep adding more people to try to catch up. Some implications of larger team/project size may not be immediately obvious. Some problems scale up faster as team/project size grows:
- Lower productivity due to increased overhead as mentioned above.
- Meetings will tend to involve more people and take longer
- There will be a lot more emails
- Project management effort scales up quickly too
- More people need to be allocated to maintaining builds and servers
- More time needs to be spent on task prioritization, bug triage, etc
- More people asking WTF happened to their code (LOL)
- Any decision making that requires consensus building takes longer
- It becomes more difficult to find the right person to ask things
- Simply due to the number of people, there are more things that could go wrong
- Developers breaking the build happens more often
- People going on sick days will happen more often
- Server performance becomes much more important since any delay or downtime affects more people
- Schedule delays or others unexpected problems will be more likely
- Maintainability becomes more important
- Technical debt becomes more burdensome and poor code is more likely to come back and bite you in the ass in the future
- The need for good coding and development standards increases
- Higher likelihood of code duplication (“I didn’t know that Developer R already wrote a function that does X!”)
- More important for code to be well-decoupled, to reduce the likelihood of one developer breaking a lot of things
- Source control gets harder to use, with so many people making so many changes.
- The team needs to develop standards for commit messages and linking commits to bug reports. to make it easier to track and monitor changes
- Source control commit comments need to be a lot more helpful or descriptive.
- More commits happening in the same amount of time, the more you need to be constantly updating from the repository.
- Merging is more likely to become difficult and complicated (may be made easier by modern source control systems)
- More important to use more, smaller files instead of fewer large files (less likely to produce conflicts)
- Need better coding/programming standards. Otherwise you have the problem of changes/commits being difficult to track for example if one developer uses different autoformatting standard (his commits will have many small reformats)
- Having consistent rules for naming, UI, and other things becomes more important
- The more developers you have, the more likely that they will have different ways of thinking. There are far more likely conflicts among a team of 8-10 developers than between 2-3 developers.
- It becomes more important to have a standard or plan for where different kinds of files should be placed. Otherwise you run into problems like different developers using different folders for their css or different package naming conventions, etc.
- Consistency and standards more difficult to enforce (since there are more devs)
- Need to keep things consistent on all levels: databases, code, UI, and so on.
- Documentation becomes more important
- Tribal knowledge is often spread out among multiple developers
- Undocumented things are less likely to be passed on to new developers
- Developers unaware of undocumented things are more likely to have difficulties or to break things
- Becomes a lot more difficult to absorb new developers into the team in times of urgentness
- Documentation more likely to quickly become out of date due to rapid pace of changes
Anything you want to add?
Just a list I’ve been maintaining for a while:
(Disclaimer: This list in no way implies that developers who don’t exhibit all of these attributes are terrible human beings who don’t deserve to live. But working with developers who exhibit many of these traits will probably result in a better experience over the course of your developer career.)
- Laziness, Impatience and Hubris – from the well-known (notorious?) Larry Wall quote
- Communicates well; is able to explain and communicate his ideas clearly, especially to nontechnical people; able to write good documentation
- Understands the concerns with scheduling and project management and communicates clearly with the team to avoid problems. This means: willing to speak up as soon as any problem is encountered that introduces any kind of risk; not bloating estimates or pretending that tasks take longer than they really do; not cutting estimates to make managers happy;
- Cares about writing elegant code; understands the risks involved with code that is complicated or difficult to maintain; Understands the importance of data structures, algorithms and design patterns
- An attitude towards learning and self-improvement; owns up to his own faults; ready and willing (and often excited) to pick up new domains or technologies (and advises you of the appropriate schedule risk); Can easily pick up and learn new technologies and programming languages; recognizes and understands programming principles and able to carry them across to different domains or technologies; Able to study or learn new topics with minimal guidance;
- Understands engineering tradeoffs; able to tell you the differences in performance, storage, etc among different options.
- Works well with others: Willing to help with other people’s work when possible/needed; Has an open mind and willing to consider other people’s suggestions; Doesn’t take criticism personally; Chill AF
- Able to think logically and sequentially; able to break down a problem into a discrete set of solvable tasks; Able to investigate and find the cause of problems with minimal info; Able to think outside the box when necessary; Able to point out problems or logical inconsistencies with program requirements;
- Able to read and understand and maintain other people’s code; Can update code with the minimal possible changes to avoid breaking things;
Any other suggestions?