One of the self-evident truths of system administration is that you need to know what is going on with your systems. Monitoring - knowing that your systems are working as expected and, more importantly, knowing when they aren't - is what most people think of first when they realize that fact. Equally important, however, is trending - knowing what your systems were doing in the past. Trending lets you determine whether the current state of a system is normal, or whether something has changed that could signify a problem. It also lets you predict when your current systems will no longer be able to handle what is expected of them: when traffic to your website is about to outgrow your systems, you can see this coming and add capacity, either by adding servers or by replacing them with something more powerful, before the problems start to occur.
At OmniTI, we use a number of systems for trending, including Cacti and our very own Reconnoiter. Each system comes with a large number of monitors built in, allowing you to trend anything from network traffic to system load to disk space. Sometimes, however, you need metrics for which nothing is currently available. This is where these systems' extensibility comes into play.
The following example shows a situation where we needed information that our current monitoring/trending systems were not able to provide, and we needed to extend them with a custom trending solution:
One of our clients had a website that had become very popular and was suffering performance issues as a result. We suspected that at least part of the system was I/O bound, so we wanted to gather metrics on the I/O performance of the system over time. The systems in question were running Solaris 10 with the data on ZFS. The normal iostat command, for which a number of monitors exist, does not give true values for reads and writes performed by ZFS, as it can only see the read/write requests coming from filesystems. True I/O statistics can be obtained on the command line by running the zpool iostat command, which works in a similar way and produces output similar to the following:
# zpool iostat rpool 10 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       16.6G  57.7G      1      0  43.5K  2.06K
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
rpool       16.6G  57.7G      0      0      0      0
The statement above that our monitoring systems were not able to provide the information we needed isn't quite true. Elsewhere, we had a monitor that obtained zpool I/O statistics using a long-running zpool iostat process, extracting the values and entering them into a database, with a custom script that fetched the values from the database and fed them to Cacti. The system in question was a database server, so this method, while clunky, worked well enough for its purpose. For monitoring the web servers, however, the same method just wasn't practical, and we needed something better - something that didn't require a long-running process and a database server on the machine just for trending information.
The obvious choice here was to use SNMP. Cacti (like pretty much every monitoring/trending package) has built-in support for obtaining data over SNMP, and Net-SNMP (the SNMP agent in use on the server) has various ways of extending its functionality to provide custom metrics.
Having chosen SNMP, the next decision was how to get the data we needed and present it over SNMP. The seemingly obvious choice would be to run zpool iostat and parse the output as was done previously, presenting those values over SNMP. However, that either requires the long-running zpool iostat process, or means running it for a few seconds at a time to get a snapshot of the I/O over that period, which leads to inaccurate results (it tells us nothing about the performance of the system between checks). One of the things that Cacti (or rather rrdtool, which Cacti makes use of) is very good at is taking raw data and generating meaningful statistics from it. If we could somehow get raw, cumulative I/O counters rather than already-aggregated values such as 'n KB read over the past m seconds' and pass those to Cacti, then Cacti could do the work and we would get accurate values.
Enter open source. The source code to OpenSolaris is available, including the source code to the zpool command, which makes it possible to see how the zpool iostat command itself works. Once you trace the various calls it makes, it turns out that, underneath, the zpool iostat command uses libzfs to fetch exactly the raw values we were looking for. It was then a relatively simple matter to take that code and print out the raw values:
#include <stdio.h>
#include <sys/fs/zfs.h>
#include <libzfs.h>

/*
 * Sample code to demonstrate printing of raw zpool io stats.
 * Compile with: cc -lzfs -lnvpair zpoolio.c -o zpoolio
 */
static int
print_stats(zpool_handle_t *zhp, void *data)
{
        uint_t c;
        boolean_t missing;
        nvlist_t *nv, *config;
        vdev_stat_t *vs;

        if (zpool_refresh_stats(zhp, &missing) != 0)
                return (1);

        config = zpool_get_config(zhp, NULL);
        if (nvlist_lookup_nvlist(config,
            ZPOOL_CONFIG_VDEV_TREE, &nv) != 0)
                return (2);

        if (nvlist_lookup_uint64_array(nv,
            ZPOOL_CONFIG_STATS, (uint64_t **)&vs, &c) != 0)
                return (3);

        /* Cumulative operation and byte counts for the pool. */
        printf(
            "pool:%s read_ops:%llu write_ops:%llu "
            "read_bps:%llu write_bps:%llu\n",
            zpool_get_name(zhp),
            (u_longlong_t)vs->vs_ops[ZIO_TYPE_READ],
            (u_longlong_t)vs->vs_ops[ZIO_TYPE_WRITE],
            (u_longlong_t)vs->vs_bytes[ZIO_TYPE_READ],
            (u_longlong_t)vs->vs_bytes[ZIO_TYPE_WRITE]);
        return (0);
}

int
main(void)
{
        libzfs_handle_t *g_zfs;

        if ((g_zfs = libzfs_init()) == NULL)
                return (1);
        return (zpool_iter(g_zfs, print_stats, NULL));
}
Once this was done, the next step was to get the values exported over SNMP so that Cacti could view them. Net-SNMP has a pass directive that allows you to delegate an OID subtree to an external program and have that program print out the requested values. These values are then exported over SNMP, available for any of the above monitoring tools to make use of.
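As a sketch of the wiring (the OID here is the placeholder used in the snmpd.conf manual page, and the script name and paths are hypothetical): for a GET request, Net-SNMP invokes the pass program as "program -g OID" and expects three lines back - the OID, its type, and its value.

```
# In snmpd.conf, delegate an OID subtree to a wrapper script:
#   pass .1.3.6.1.4.1.2021.255 /usr/local/bin/zpoolio-snmp
#
# A hypothetical /usr/local/bin/zpoolio-snmp answering one such GET:
#!/bin/sh
case "$1" in
-g)
    echo "$2"       # the OID being queried
    echo counter    # its SNMP type
    # the value: read_ops for the first pool, parsed from zpoolio output
    /usr/local/bin/zpoolio | head -1 | \
        sed 's/.*read_ops:\([0-9]*\).*/\1/'
    ;;
esac
```

A real handler would map each OID in the subtree to one of the four counters per pool; for frequent polling, the pass_persist directive avoids forking a new process per request.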
In Cacti, it was then just a matter of creating an appropriate SNMP Data Query, adding some Graph Templates, and waiting for the pretty pictures to come flowing in.
This example shows how you might approach developing code to obtain custom metrics, and shows the benefit of having the source code available so that you can learn from tools that do most, but not all, of what you are trying to achieve. Sometimes what you need just isn't available, and you have to build a solution from the available pieces.