At OmniTI we're known for solving hard problems. Where others might find limitations, we look for new ways to resolve problems, and we're not shy about using brute force when we need to. When brute force costs too much, we help people find more elegant solutions. Sometimes we hear terms like "guru" or "wizard" tossed about, describing our people or the work we do. While it's always flattering to receive such compliments, I think it's important to consider what exactly people mean by terms like "guru", and what type of technical expert we want to be.
Sometimes when I have a few minutes of downtime, I'll go hang out on IRC; you can usually find me in the #postgres channel on freenode. This is an old habit that stems from back when I changed my career path from roles where I was primarily doing web development into roles where I was managing databases full time. I spent a lot of time asking questions, and studying the questions that others asked. I learned countless things there, both about errors within Postgres and the types of problems people ran into; it was an invaluable experience. If it really does take 10,000 hours to become an expert, I suspect many of my hours were logged right there.
Building Better Gurus
A few months ago, one of our DBAs asked me to take a quiz. It had arisen from an interaction he had had with a new prospect, who was looking for a Microsoft SQL Server "Guru". Now, I wouldn't claim to be a guru in MSSQL, but it isn't totally foreign to me either; I've personally developed applications that hit against MSSQL, and done trigger development as well. I've also debugged replication systems that ran against it and had to debug extremely complicated locking behaviors. In short, I have helped a fair number of companies find success with MSSQL problems, and even to this day we still work with a few companies who make use of it, so I figured I might as well give it a go. The test sounded like a typical database problem: a query was not making use of an index when the prospect thought it should. I started down the normal paths. Have the statistics been updated? Yes. And we know there isn't corruption? Yes. The cardinality of the data within the column is such that the index should be used? Yes. They had even tried adding a hint, but it was ignored. At this point, I felt I needed to see the query and the table structure, and maybe even an explain plan.
BOOM! I failed. I was not a guru.
The opinion of the prospect was that a guru should just know the answer, so since I couldn't solve the question straight out, clearly I was not a guru. Turns out I'm OK with not being an MSSQL guru, but I do think the idea that this is how you measure gurus is wrong. Or maybe that definition is fine, but it doesn't really tell you anything about whether the person you are talking to can help solve your problem, which is the thing you really want.
What's the difference?
The problem here is that this person was conflating two distinctly different skill sets, and attributing (I believe incorrectly) the expectations of one skill to the other. What the person in the quiz wanted was someone who was good at pattern recognition, not someone who could troubleshoot. Don't get me wrong, pattern recognition is certainly useful, and can be extremely valuable, but it also has limited applicability. If you get a "failed to re-find parent key" error in Postgres, I'm probably (well, hopefully) one of the few people who have seen that error in production, and I have the burn marks to remind me about it. If you are asking on IRC at the same time I am hanging out there, you're going to get a much quicker path to a solution than otherwise. This is great, and maybe some people would say I am a guru because of that, but I sometimes find these pattern recognition skills to be a liability. If you have a connection error to your system, there are fewer than half a dozen problems it will likely be, and if I'm trying to deduce it, I know I sometimes just fall back to applying the patterns and seeing which one fits. More often than not this is actually going to result in a fix faster than if I tried to troubleshoot the problem. But what happens when you come across something that doesn't match the patterns?
Troubleshooting is really the ability to follow a (repeatable) methodology, step by step, to find a problem point. For something like a connection error, you should be able to start on one end of the connection and walk the path that connection should take, verifying that things work at each point along the way. You don't check the client credentials and then jump to permissions on the server; you first verify that the connection is initiated by the client, then you verify that the request makes it to the server. Not just to the host machine, but do you see a connection being received by the server itself? Yes, this takes longer. And if you know that 40% of all problems are incorrect credentials and another 40% are permissions settings on the server, stopping to check things like whether a firewall is getting in the way is probably going to slow you down, but if you really want to be a "guru", you need to be able to handle that other 20%.
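To make that walk a little more concrete, here is a rough sketch in Python of checking a connection path in order and stopping at the first point that fails. It is only an illustration under some assumptions: the service is reachable over plain TCP, and the names first_failure, resolve, tcp_connect and connection_checks are made up for this example rather than taken from any real tool. The later steps (credentials, server-side permissions) depend on the particular server, so they are left as a comment.

```python
import socket
from typing import Callable, List, Optional, Tuple

# A check is a step name plus a callable that raises if the step fails.
Check = Tuple[str, Callable[[], None]]


def first_failure(checks: List[Check]) -> Optional[Tuple[str, str]]:
    """Run checks in path order and stop at the first one that fails.

    Returns (step name, error message) for the failing step,
    or None if every step passed.
    """
    for name, check in checks:
        try:
            check()
        except Exception as exc:
            return name, str(exc)
    return None


def resolve(host: str) -> None:
    """Raise if the client cannot resolve the host name."""
    socket.getaddrinfo(host, None)


def tcp_connect(host: str, port: int, timeout: float = 3.0) -> None:
    """Raise if no listener accepts a TCP connection on host:port."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def connection_checks(host: str, port: int) -> List[Check]:
    """Checks ordered from the client end of the path toward the server."""
    return [
        ("client resolves the host name", lambda: resolve(host)),
        ("request reaches a listener on the server", lambda: tcp_connect(host, port)),
        # Credential and server-side permission checks would follow here,
        # in that order; they are specific to the server being debugged.
    ]
```

Calling first_failure(connection_checks("db.example.com", 5432)) (a placeholder host and port) would report the earliest step on the path that breaks, which is the point of the exercise: a firewall problem shows up at the second step instead of being mistaken for a credentials problem further along.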
Have you ever had the opportunity to watch someone debug website slowness with curl and tcpdump? That is someone who is working to become a true guru; not just understanding patterns of operation, but understanding how things really work. This is one reason why we encourage lower level systems knowledge and a polyglot approach to each part of the stack. When we started building Node.js websites, we were able to take fundamental skills we had learned building scalable sites in Perl and PHP and apply them to this new technology. And that's where you want your guru to be. Even when completely unfamiliar with a technology, with no pattern matching built in, you want them to be able to understand and solve problems in a new domain by applying the fundamental tools they know along with troubleshooting skills. That is a "guru".