So why did InsightETE feel the need to completely turn the idea of how we monitor systems on its head?  Simple.  Someone had to do it.  Nobody else was going in the right direction, so we decided: “why NOT us?”

Here’s what we mean…

In short, popular end-user-facing systems have been around since the ’70s… systems monitoring started right around the same time.  Then came a revolution of technology.  The way we design user interfaces kept up… the way we utilized more and more processing power kept pace (more or less)… and how the systems impacted business became more and more complex.  There are a lot of reasons for that.  It is so much easier to write applications now than it was back then.  Resource constraints are far rarer and the languages are arguably easier to work in.  So IT systems have abstraction layer after abstraction layer built on top of them, increasing the complexity of figuring out where something went wrong…

And you know what didn’t advance?  The way we monitor and diagnose these systems.  Not the code… there have been a ton of advancements within the limited framework we were all given.  But the monitoring framework we all work within is ill-equipped.  And let me be even more specific:

The way we – as an industry – handle events is wrong, wrong, wrong!

Now, there are many good reasons why monitoring tools, when first created, handled events the way they did:

  • Storage was expensive
  • Memory was even more expensive
  • The processing power we have today would have been nearly unimaginable back then

As a result of those constraints, they had to make do.  Because they didn’t have the resources to keep historical trending data, that was thrown out.  So how can a tool look at the performance of a metric or server and determine if it is broken?  Well, you give it one more number, a threshold, to compare real-time data against.  If it breaches the threshold, then raise an event.
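As a rough illustration of the threshold approach described above, here is a minimal Python sketch.  The names (`Event`, `check_threshold`, the `cpu_percent` metric) are hypothetical and are not taken from any particular monitoring tool; the only idea shown is "compare the live reading to one fixed number."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """A raised event (hypothetical structure, for illustration only)."""
    metric: str
    value: float
    threshold: float


def check_threshold(metric: str, value: float, threshold: float) -> Optional[Event]:
    """Raise an event only when the live reading breaches the fixed threshold."""
    if value > threshold:
        return Event(metric, value, threshold)
    return None  # no breach, so the tool stays silent


print(check_threshold("cpu_percent", 97.0, 90.0))  # an Event is raised
print(check_threshold("cpu_percent", 42.0, 90.0))  # None: nothing is reported
```

Notice that no history is stored anywhere: the check needs only the current reading and one number, which is exactly why it was affordable when storage and memory were scarce.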

Totally makes sense, right?  Except that a single number will never describe the optimal health of any metric… and oftentimes you need to know when something isn’t being used as heavily as it normally would be… and sometimes no news is NOT good news… sometimes it is bad, really bad.  So sure, it makes sense that they didn’t address these issues then because of resource constraints…
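To make those gaps concrete, here is a small Python sketch of a check that compares a reading against its own history instead of a single fixed number.  This is only an illustration under stated assumptions, not the PAPA methodology or any product's actual logic: the function name, the sample numbers, and the choice of a "mean plus or minus a few standard deviations" band are all hypothetical.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def check_against_history(
    history: Sequence[float],
    latest: Optional[float],
    tolerance: float = 3.0,  # assumed band width, in standard deviations
) -> Optional[str]:
    """Compare the latest reading with past readings of the same metric.

    `latest` is None when no data arrived at all.
    Returns a short description of the problem, or None if all looks normal.
    """
    if latest is None:
        return "silent"  # no news is treated as news
    center = mean(history)
    spread = stdev(history)  # needs at least two historical readings
    if latest > center + tolerance * spread:
        return "unusually high"
    if latest < center - tolerance * spread:
        return "unusually low"
    return None


# Hypothetical transaction counts for the same hour on previous days.
history = [480.0, 510.0, 495.0, 505.0, 490.0, 520.0]

print(check_against_history(history, 500.0))  # None
print(check_against_history(history, 20.0))   # unusually low
print(check_against_history(history, None))   # silent
```

A fixed "raise an event above N" rule would stay quiet for both the reading of 20 and the missing reading, which are the two cases most likely to mean that something upstream has stopped working.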

But why haven’t they fixed it in 40 years?

We in the technology industry – or really, any of the sciences – like to think of ourselves as somehow above normal human psychology, as if things like “group-think” and “tradition” aren’t something to which we can fall prey.  However, no matter how logical and perfect a computer program may be… a human created it.  So my theories as to why these issues haven’t been resolved are as follows:

  • It is easier to change marketing material than to develop a new product (exceptions are sort-of events… so just say they are events and get that ITIL marketing power).
  • For new companies… people sometimes do things a certain way simply because that is the way they have always been done.  Innovation is hard to execute, not because it is always difficult to build the disruptive technology, but because it is hard to get our minds to see things in new ways.
  • The “no news is good news” mentality actually works 99.9% of the time… and so many customers may never see a negative impact of a tool that uses this mentality.  The problem, though, is that the 0.1% of the time it doesn’t work can cost someone their entire business!

So that’s why we decided to make PAPA an implementation standard

We could have just decided to make a product that addresses these gaps (we did that too!) but we also wanted to be sure that the rest of the industry could benefit from this new way of looking at things as well.  The standard is still very much in its early phases of development, but already massive utility companies and health care organizations are benefiting from the Universal Information Center, the very first tool based on the PAPA methodology.

So what does this methodology do?  We’ll get to that in our next blog post.  Stay tuned!

ABOUT THE AUTHOR
Matthew Bradford has been in the I.T. Performance Business for 15 years and has been critical to the success of many Fortune 500 Performance Management groups. He is currently the CTO of InsightETE, an I.T. Performance Management company specializing in passive monitoring and big data analytics with a focus on real business metrics.