Everyone would like to spend less money on their server infrastructure, but it can be difficult to figure out where money can be saved, and whether spending less on infrastructure will cause a drop in revenue through outages and reduced service quality. By looking at your costs, revenue, system metrics, and task and time management system together, it's possible to avoid these pitfalls and reduce the amount you spend on infrastructure while also improving the quality of the services you provide to your customers.
One of the first steps to reducing your infrastructure cost is to make certain that you have a monitoring solution in place that can help you identify underutilized parts of your infrastructure, as well as which parts of your infrastructure are costing you money.
Most shops already have metrics in their monitoring system that can be used to identify underutilized equipment. If yours does not, CPU usage, disk IO, network IO, and memory usage are all good places to start, and they are all easy to monitor. Once you find underutilized equipment, the money-saving solution is usually obvious. If the system is older and is a good candidate for virtualization, it should probably be virtualized. If an expensive network link is underutilized, or overutilized, consider whether cheaper (and better) options are available from another vendor. Reducing costs by virtualizing, moving off old hardware, and choosing network connections that suit your business should be second nature by now, but it can still be difficult to pinpoint exactly where to do it. Simple systems monitoring can help.
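As a rough illustration of the kind of check a monitoring system can run, the Python sketch below flags hosts whose average utilization over a review period is unusually low or high. The `HostMetrics` record, its percentage fields, and the 20% and 85% watermarks are hypothetical assumptions; in practice the numbers would come from whatever monitoring tool you already use. A host counts as underutilized only if every resource is below the low watermark, and as overutilized if any single resource is above the high one.

```python
from dataclasses import dataclass


@dataclass
class HostMetrics:
    """Average utilization for one host over a review period (hypothetical schema)."""
    name: str
    cpu_pct: float      # mean CPU utilization, 0-100
    mem_pct: float      # mean memory utilization, 0-100
    disk_io_pct: float  # mean disk IO as a percentage of measured capacity
    net_io_pct: float   # mean network IO as a percentage of link capacity


# Illustrative thresholds (assumptions); tune them to your own environment.
LOW_WATERMARK = 20.0
HIGH_WATERMARK = 85.0


def classify(host: HostMetrics) -> str:
    """Label a host 'underutilized', 'overutilized', or 'ok'."""
    busiest = max(host.cpu_pct, host.mem_pct, host.disk_io_pct, host.net_io_pct)
    if busiest < LOW_WATERMARK:
        return "underutilized"  # every resource is mostly idle
    if busiest > HIGH_WATERMARK:
        return "overutilized"   # at least one resource is close to saturation
    return "ok"


def review(hosts):
    """Return (name, label) pairs for every host that is not 'ok'."""
    flagged = []
    for host in hosts:
        label = classify(host)
        if label != "ok":
            flagged.append((host.name, label))
    return flagged


if __name__ == "__main__":
    fleet = [
        HostMetrics("db-old-01", 4.0, 12.0, 3.0, 1.5),
        HostMetrics("web-01", 45.0, 60.0, 20.0, 30.0),
        HostMetrics("cache-01", 92.0, 70.0, 10.0, 40.0),
    ]
    for name, label in review(fleet):
        print(f"{name}: {label}")
    # db-old-01: underutilized
    # cache-01: overutilized
```

Hosts flagged as underutilized are the natural candidates for virtualization or consolidation; overutilized ones may need an upgrade before they cause an outage.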
Figuring out which parts of your infrastructure are costing you money can be more difficult. Most businesses do not monitor how many customers they lose as a result of unplanned downtime, or how much support on old hardware costs over time. By integrating business information (such as the number of customer sign-ups over a period of time, the amount of money refunded, or the number of support cases opened) into your technical monitoring, error detection, or trending system, you can immediately see the results of an outage or change. You'll also know how much money it is worth spending to fix a problem, or to build a more fault-tolerant infrastructure.
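Once business data sits alongside technical data, a first-pass estimate of what an outage cost can be simple arithmetic. The sketch below assumes a hypothetical list of hourly sign-up counts and a hypothetical average value per sign-up. It uses the mean of the hours just before the outage as the expected rate, which ignores daily and weekly patterns, so treat the result as a rough estimate rather than an accounting figure.

```python
def outage_impact(hourly_signups, outage_start, outage_end,
                  baseline_hours=24, value_per_signup=0.0):
    """Estimate sign-ups (and their value) lost during an outage.

    hourly_signups: sign-up counts, one per hour (list index = hour number).
    outage_start, outage_end: half-open hour range [start, end) of the outage.
    value_per_signup: assumed average revenue per sign-up (hypothetical input).
    Returns (estimated lost sign-ups, estimated lost value).
    """
    # Baseline: the hours immediately preceding the outage.
    window = hourly_signups[max(0, outage_start - baseline_hours):outage_start]
    if not window:
        raise ValueError("need at least one hour of data before the outage")
    expected_per_hour = sum(window) / len(window)

    observed = sum(hourly_signups[outage_start:outage_end])
    expected = expected_per_hour * (outage_end - outage_start)
    lost = max(0.0, expected - observed)  # never report a negative loss
    return lost, lost * value_per_signup


if __name__ == "__main__":
    # 24 normal hours, a 3-hour outage, then recovery.
    signups = [10] * 24 + [2, 1, 3] + [10] * 5
    lost, value = outage_impact(signups, 24, 27, value_per_signup=15.0)
    print(lost, value)  # 24.0 360.0
```

An estimate like this gives a concrete upper bound on what a fix for that failure mode is worth in the short term; comparing against the same hours in previous weeks would give a better baseline for businesses with strong daily cycles.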
Some things are difficult to monitor automatically but should still be reviewed on a regular basis. Support contracts, rack space/colocation bills, bandwidth overages (or underutilized bandwidth contracts), and power bills all fall into this category. As equipment and environments age, fixed costs come to be taken for granted. When this happens, it's easy to forget that you are paying for rack space that is no longer in use, bandwidth that is no longer necessary, and expensive support contracts on systems that could be virtualized onto an underutilized newer system. Often a newer system is already in place, but the legacy system is left running for months, if not years, in case the new system fails. When you replace something and keep the old system as a backup, set a reminder to revisit that decision after a month or two. If you don't, the legacy equipment can stay in service for years before anyone remembers it.
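A calendar reminder works fine for this, but if you already keep an inventory, a small script can surface overdue decisions automatically. The sketch below assumes a hypothetical inventory record for each legacy system kept as a backup, with a default review period of 60 days standing in for "a month or two".

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Replacement:
    """A legacy system kept running as a backup after being replaced (hypothetical record)."""
    legacy_system: str
    replaced_on: date
    review_after_days: int = 60  # roughly "a month or two"

    @property
    def review_due(self) -> date:
        """Date by which someone should decide whether to keep or retire the system."""
        return self.replaced_on + timedelta(days=self.review_after_days)


def overdue_reviews(replacements, today):
    """Return names of legacy systems whose keep-or-retire review is due, oldest first."""
    due = [r for r in replacements if r.review_due <= today]
    return [r.legacy_system for r in sorted(due, key=lambda r: r.review_due)]


if __name__ == "__main__":
    records = [
        Replacement("mail-old", date(2024, 1, 10)),
        Replacement("fileserver-old", date(2024, 5, 1)),
        Replacement("db-old", date(2023, 11, 1), review_after_days=30),
    ]
    print(overdue_reviews(records, today=date(2024, 4, 1)))  # ['db-old', 'mail-old']
```

Running something like this from a weekly cron job, and mailing the output to the team, keeps the question from being forgotten.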
Finally, review your own time tracking system to see how you are spending your time. It's easy to fall into the rut of documenting a manual way of handling a task and then doing it that way every time. If you can automate a process (or even parts of a process), or make the documentation simple enough for anyone else to follow, you can reduce the time you spend on routine work and have more time for setting up new clients, investigating new software, and helping your users.
One thing to look for in your time tracking system is who is spending time on which tasks. If senior people repeatedly spend their time on the same task, it can be a sign that the task should be better documented, so that more people in your group can take care of it and the more expensive time of senior administrators can be used for more difficult work.
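Most time trackers can export entries as a spreadsheet or CSV, and a few lines of code are enough to spot this pattern. The sketch below assumes a hypothetical export format of `(person, task, hours)` tuples, a hand-maintained set of senior staff, and an arbitrary threshold of three repeats; it reports, for each senior person, the tasks they performed repeatedly and the total hours spent.

```python
from collections import Counter


def repeated_senior_tasks(entries, senior_staff, min_repeats=3):
    """Find tasks that senior staff keep doing by hand.

    entries: iterable of (person, task, hours) tuples (hypothetical export format).
    senior_staff: set of people whose time is most expensive.
    Returns {(person, task): (times performed, total hours)} for every
    senior person/task pair seen at least min_repeats times.
    """
    counts = Counter()
    hours = Counter()
    for person, task, spent in entries:
        if person in senior_staff:
            counts[(person, task)] += 1
            hours[(person, task)] += spent
    return {key: (n, hours[key]) for key, n in counts.items() if n >= min_repeats}


if __name__ == "__main__":
    log = (
        [("alice", "restore from backup", 1.5)] * 4
        + [("bob", "restore from backup", 1.0)] * 5
        + [("alice", "capacity planning", 3.0)]
    )
    print(repeated_senior_tasks(log, senior_staff={"alice"}))
    # {('alice', 'restore from backup'): (4, 6.0)}
```

Tasks that show up here are good candidates for better documentation, delegation, or automation.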
To summarize:
- Monitor everything and look for under- or over-utilized resources.
- Track your time. If you are spending lots of time on the same procedures, automate them. If you are responding to the same problems over and over again, find a way to fix them permanently.
- Watch your invoices. It's easy to pay support, bandwidth and power bills month after month, or year after year, without reviewing them to see if you can get a better deal elsewhere, or to see if non-critical infrastructure is costing you more money than you would like.
These ideas are all simple, but by considering them during your day-to-day operations, as well as during periodic reviews, you may find yourself spending less money, and using less time, on keeping your infrastructure running well. Reducing the cost of your existing infrastructure gives you more time and money to spend on improvements and new projects, rather than merely on maintaining what is already in place.