Our Blog


The Sensation of Sensu

 

After only three months in service, Sonian's open source monitoring framework, Sensu, has stirred up many discussions and attracted the attention of many developers, techies, and cloud services in general. Sensu fills a void in monitoring cloud-based systems that neither open source nor commercial products had previously addressed.

As more users have begun to use the monitoring platform, questions have come up, and work is under way to answer them. One of our contributors, Joe Miller, has taken some of those questions and written a blog post in response. He describes Sensu in detail and then walks through installation and add-on components.

Read his blog here: http://joemiller.me/2012/01/19/getting-started-with-the-sensu-monitoring-framework/

Cloud Monitoring Framework Sensu at 30 Days

 
About a month ago, Sonian released its cloud-appropriate monitoring framework to the open source community. We are extremely proud of the Sonian team that created this framework. Great projects are typically born from the seeds of discontent, and Sensu is no different.

Check out the project here: http://github.com/sonian/sensu. Read lead developer Sean Porter's inaugural blog post here: http://blog.sonian.com/technology-blog/bid/77977/Sensu-A-Monitoring-Framework

Since we began working with the cloud, monitoring and alerting on a dynamic computing environment has been a challenge. We initially relied on Nagios and built a heavily customized implementation to manage our 500+ virtual compute nodes. But Nagios wasn’t designed for a world of “scale-up and scale-down,” which is one of the primary reasons to adopt cloud computing. So we had a pressing need to create a monitoring framework.

Sean Porter, @portertech, inspired by a recent trip to Japan, came up with the name “Sensu,” which means “fan.” Sensu is a modern take on cloud CPU and application monitoring. It uses an enterprise service bus queueing system so that monitoring agents and data consumers can coordinate with each other in a publish-subscribe fashion. Sensu’s framework approach is designed to allow easy customization and horizontal scaling.

Pete Cheslock, @petecheslock, who manages the team that created Sensu, shared this update on how Sensu has fared in its first 30 days:
  • 132 “watchers” are following the project on GitHub: http://github.com/sonian/sensu
  • 26 people have forked the project, and their work flows back to the main project
  • 2 external contributors have actively extended Sensu to work with Puppet and CentOS

Sensu - A Monitoring Framework

 

At Sonian, we monitor an ever-changing number of Amazon EC2 instances. As I write this post, that number is 476, and it is expected to rise and fall before the day is done. Given the "elastic" nature of our infrastructure, monitoring EC2 instances is not such a trivial task.

We have found the standard tools from the community toolbox to be inadequate when operating in the "cloud." Until recently, Sonian used several tools in conjunction to monitor systems and collect metrics: Nagios, Collectd, Graphite, and Ganglia.

The Evolution of Nagios at Sonian.

Our servers are grouped into "stacks", providing isolated environments that are globally distributed. In the past, a Nagios server would reside in each stack. These servers were responsible for monitoring the components of their stack, triggering notifications when something was amiss. Check coverage gradually increased over time, as applications began to require more moving parts. As the number of stacks increased, a centralized view of the organization was desired. To appease the engineering teams, a distributed Nagios solution was created. The monitoring server in each stack would forward its check results to a central Nagios server running the Nagios Service Check Acceptor (NSCA). The central server ran the Nagios web interface, displaying the status of every client and service under our control. Notifications could only be triggered by the central server, making it easier to silence notifications for a client or one of its services.

The Neanderthal.

Nagios is NOT designed for the cloud. It expects your environment to be fairly static, with every aspect of it under your control. The initial release was May 14th, 1999. The concept of elastic Infrastructure as a Service (IaaS) didn't exist at the time of its creation, and it has failed to adapt.

Nagios' inability to discover clients is an excellent indication of its antiquity. Nagios must know of every client, group, and service at startup. When a new server is spun up, the Nagios configuration must be updated and the service reloaded in order to begin monitoring it. Configuration Management is commonly used in this case as a semi-solution: it uses a method of server discovery, rewrites the configuration, and then triggers a service reload. It's only a semi-solution because the process usually runs only on a set interval or is too intensive for frequent changes. Distributing Nagios in a tiered fashion only complicates this further, making it far more difficult to begin monitoring a new server or deploy new checks. The following diagram depicts a sample of events that would require Nagios configuration changes.

[Image: central Nagios diagram]
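The Configuration Management "semi-solution" described above can be sketched roughly as follows. This is a minimal illustration, not Sonian's actual tooling: the `render_hosts` and `needs_reload` helpers, the server names and addresses, and the `generic-host` template are all assumptions made for the example. It only shows why every change in the server list forces a new configuration file and a reload.

```python
def render_hosts(servers):
    """Render one Nagios host object per discovered server.

    `servers` maps a host name to an address. The `generic-host`
    template name is an assumption borrowed from common sample configs.
    """
    blocks = []
    for name, address in sorted(servers.items()):
        blocks.append(
            "define host {\n"
            "    use        generic-host\n"
            f"    host_name  {name}\n"
            f"    address    {address}\n"
            "}\n"
        )
    return "\n".join(blocks)


def needs_reload(old_config, new_config):
    """Nagios only notices new hosts after a reload, so any change matters."""
    return old_config != new_config


# Hypothetical discovery results before and after a new instance is spun up.
before = render_hosts({"web1": "10.0.0.1"})
after = render_hosts({"web1": "10.0.0.1", "web2": "10.0.0.2"})

if needs_reload(before, after):
    print("configuration changed: write the file and reload Nagios")
```

In a real setup this would run on an interval, which is exactly the limitation described above: a server that appears between runs goes unmonitored until the next pass.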

Our problems with Nagios

  • Configuration is unpleasant & restrictive
  • Cannot discover new servers on its own
  • Easily overwhelmed with a high number of clients & checks
  • Difficult to extend & hack

A Brief Introduction to Sensu.

Enter Sensu, a monitoring framework that aims to be simple, malleable, and scalable.

[Image: Sensu diagram]

The Building Blocks.

In this modern world of computing, we're blessed with ever-improving Configuration Management tools, such as Opscode Chef and Puppet. These tools already gather the information needed to monitor your systems effectively and efficiently. Not only are these tools a rich source of data, but they can also handle the distribution of supporting libraries and plugins. Sensu was built with the intention of being paired with a CM tool.

Message-oriented middleware is commonly used by developers to decouple and distribute components of their applications. Sonian currently uses RabbitMQ for all sorts of job queues. For example, RabbitMQ allows a Rails application to communicate with a backend written in Clojure, without any knowledge of its status or implementation.

Sensu uses RabbitMQ to securely route check requests and results, making it possible to scale out and back in on demand.

Open source key-value data stores have been around for a long time, recently gaining a lot of attention with NoSQL being all the rage. Redis is a very fast in-memory "data structure server" with keys that can contain strings, hashes, lists, sets, and sorted sets. Its support for atomic operations and its ability to persist to disk have made it a common choice for new projects. Sensu uses Redis as a non-persistent database to store client and event data.

The Concept.

The idea behind Sensu is simple: schedule the remote execution of checks and collect their results. As mentioned above, Sensu uses RabbitMQ to route check requests and results; this is the secret sauce. Checks always have an intended target: servers with certain responsibilities, such as serving web pages (webserver) or data storage (elasticsearch). A Sensu client has a set of subscriptions based on its server’s responsibilities, and it executes the checks that are published to those subscriptions. A Sensu server has a result subscription, which is where clients publish check results. Since each component only connects to RabbitMQ, there is no need for an external discovery mechanism, and new servers are monitored immediately.
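To make the flow concrete, here is a small in-memory sketch of the publish-subscribe pattern described above. It is not Sensu's code and does not talk to RabbitMQ: the `Broker`, `Client`, and `Server` classes, the `results` topic name, and the check names are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for the message broker (not RabbitMQ itself)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every callback subscribed to this topic.
        for callback in list(self.subscribers[topic]):
            callback(message)


class Client:
    """Runs checks published to its subscriptions and publishes the results."""

    def __init__(self, broker, name, subscriptions, checks):
        self.broker = broker
        self.name = name
        self.checks = checks  # check name -> callable returning an exit status
        for subscription in subscriptions:
            broker.subscribe(subscription, self.handle_request)

    def handle_request(self, request):
        run = self.checks.get(request["check"])
        if run is None:
            return  # this client does not know the requested check
        result = {"client": self.name, "check": request["check"], "status": run()}
        self.broker.publish("results", result)


class Server:
    """Publishes check requests and collects whatever results come back."""

    def __init__(self, broker):
        self.broker = broker
        self.results = []
        broker.subscribe("results", self.results.append)

    def request_check(self, subscription, check):
        self.broker.publish(subscription, {"check": check})


broker = Broker()
server = Server(broker)
Client(broker, "web1", ["webserver"], {"http": lambda: 0})
Client(broker, "es1", ["elasticsearch"], {"es_health": lambda: 2})
server.request_check("webserver", "http")

# A new web server appears. Subscribing is all it takes to be monitored.
Client(broker, "web2", ["webserver"], {"http": lambda: 0})
server.request_check("webserver", "http")

for result in server.results:
    print(result)
```

Note that the server never holds a list of clients. The second request reaches `web2` only because `web2` subscribed, which is the property that removes the need for a discovery mechanism.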

Code.

Sensu is written entirely in Ruby, using the EventMachine library for single-process concurrency. This has produced a fully functional, clean, and small code base.
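For readers unfamiliar with event-driven concurrency, the sketch below shows the general idea using Python's asyncio rather than Ruby's EventMachine. It is only an analogy, not Sensu's code: the check names and delays are made up, and `asyncio.sleep` stands in for waiting on a slow check.

```python
import asyncio


async def run_check(name, delay, log):
    """Pretend check: waits without blocking the event loop, then records its name."""
    await asyncio.sleep(delay)
    log.append(name)


async def main():
    log = []
    # Both checks are in flight at once inside a single process and thread.
    await asyncio.gather(
        run_check("slow_check", 0.1, log),
        run_check("fast_check", 0.01, log),
    )
    return log


completed = asyncio.run(main())
print(completed)
```

The fast check finishes first even though it was started second, because the loop switches to other work whenever a check is waiting.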

Sonian has made it publicly available on GitHub.

[Image: Sensu code]

All configuration is done with JSON files, making it easy for Configuration Management and other automation tools to create and read them. The following are configuration snippets.

Client attributes

[Image: Sensu configuration, client attributes]

Check definition

[Image: Sensu configuration, check definition]
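The original snippets were images and are not reproduced here. As a rough idea of what JSON configuration of this kind might look like, and how an automation tool could read it, consider the sketch below. Every key, value, and name in it (including the `check_http.rb` plugin and the `checks_for` helper) is an assumption made for illustration, not a copy of the lost snippets.

```python
import json

# Hypothetical client attributes: who the client is and what it subscribes to.
client_config = json.loads("""
{
  "client": {
    "name": "web1",
    "address": "10.0.0.1",
    "subscriptions": ["webserver"]
  }
}
""")

# Hypothetical check definition: what to run, for whom, and how often.
checks_config = json.loads("""
{
  "checks": {
    "http_port": {
      "command": "check_http.rb -p 80",
      "subscribers": ["webserver"],
      "interval": 60
    }
  }
}
""")


def checks_for(client, checks):
    """Return the names of checks whose subscribers overlap the client's subscriptions."""
    subscriptions = set(client["client"]["subscriptions"])
    return sorted(
        name
        for name, check in checks["checks"].items()
        if subscriptions & set(check["subscribers"])
    )


print(checks_for(client_config, checks_config))
```

Because the files are plain JSON, a Configuration Management tool can generate them from data it already holds and any script can parse them with a standard library.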

I hope this very shallow dive into Sensu has piqued your interest. For the nitty-gritty, please check out the GitHub repository and jump on IRC (irc.freenode.net #sensu). I realize you probably have many questions; drop them into a comment below and I will do my best to fill the holes.

What’s Next?

My plan for the next few days is to produce documentation (wiki) and a Vagrantfile for provisioning a VM with either Opscode Chef or Puppet.
