LRRD - Linpro RRD




Background

LRRD is a server/client pair that graphs the data it gathers, generates HTML pages for the graphs, and optionally warns Nagios about the values. It is designed to make it very easy to graph new data sources.

The Client

lrrd-client

Lrrd-client is a small Perl script that listens on port 4949 using Net::Server. It reads all the scripts in /etc/lrrd/lrrd-client.d on startup. The client accepts the following commands:
list [node]
List available scripts for this node
nodes
List available nodes
config [script]
Output configuration for [script]
fetch [script]
Output script values
version
Output version string
quit
Disconnect
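
As an illustration of the command list above, here is a minimal Python sketch of a program that talks to lrrd-client. It assumes that commands are sent as newline-terminated text lines and that the reply may begin with a greeting line; neither detail is stated in this document. The function names build_command and query are hypothetical. To avoid guessing how a reply is terminated, the sketch sends "quit" right after the command and reads until lrrd-client closes the connection.

```python
import socket

# Command names taken from the list above.
COMMANDS = {"list", "nodes", "config", "fetch", "version", "quit"}


def build_command(command, argument=None):
    """Return one command line as bytes.

    Assumption: commands are plain text lines terminated by a newline.
    """
    if command not in COMMANDS:
        raise ValueError("unknown command: %r" % command)
    words = [command] if argument is None else [command, argument]
    return (" ".join(words) + "\n").encode("ascii")


def query(host, command, argument=None, port=4949, timeout=10.0):
    """Send one command followed by "quit" and return the raw reply text.

    The reply is returned unparsed; it may include a greeting line
    (assumption) in addition to the output of the command.
    """
    request = build_command(command, argument) + build_command("quit")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        while True:
            data = conn.recv(4096)
            if not data:  # lrrd-client closed the connection after "quit"
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

For example, query("localhost", "fetch", "open_files") would return whatever lrrd-client prints for the open_files script, provided a client is listening on that host.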

Scripts

These scripts can be written in your language of choice: bash, perl, python, C. A script can be run in two modes: with or without the "config" parameter. When run with "config" as its parameter, the script should output the configuration of the graph.
[ay@james:/etc/lrrd/lrrd-client.d] ./open_files config
graph_title File table usage
graph_args --base 1000 -l 0 --vertical-label number_of_files
used.label open files
max.label max open files
used.warning 7536
used.critical 8028
Useful options:
host_name
If you want this plugin to measure something that is relevant to another node, specify that node's fully qualified hostname.
graph_title
The title text of the graph. Defaults to the service name.
create_args
If set, the arguments will be passed on to rrdcreate
graph_args
Extra arguments to lrrd-graph
graph_order
In which order to draw the data sources. Can also include aliases of the form alias=domain:host:graph:datasource. See the FAQ for examples.
graph_vlabel
Y-axis label of the graph
graph
Set to "yes" or "no". Decides whether to draw the graph. Defaults to "yes".
update
Set to "yes" or "no". Decides whether lrrd-update should fetch data for the graph. Defaults to "yes".
{name}.label
REQUIRED. Name of the data source. You can have many data sources in one graph.
{name}.warning
Used by lrrd-nagios. Can be a maximum value or a range separated by a colon.
{name}.critical
Same as above
{name}.type
Type of data source: COUNTER, ABSOLUTE, DERIVE or GAUGE. Defaults to GAUGE. See the rrdcreate man page for more information.
{name}.draw
What to draw from the data source: AREA, LINE1, LINE2 or LINE3.
{name}.cdef
RPN-expression. Modify the values before graphing. See the FAQ for examples
{name}.min
Minimum value. If the fetched value is below "min", it will be discarded.
{name}.max
Maximum value. If the fetched value is above "max", it will be discarded.
{name}.negative
Name of field to 'mirror' on the opposite side of zero. See the FAQ for examples
{name}.hrule
Draw horizontal ruler on the graph
{name}.graph
Set to "yes" or "no". Decides whether to graph the data source. Defaults to "yes".
{name}.extinfo
Used by lrrd-nagios. If lrrd-nagios is about to send a warning to Nagios, it also sends the contents of extinfo.
{name}.special_stack
Of the form "alias=domain:host:graph:datasource ...". Stacks several other data sources on top of each other. See the FAQ for examples.
{name}.special_sum
Of the form "domain:host:graph:datasource ...". Sums up several other data sources and draws them as one. See the FAQ for examples.
{name}.filename
Override filename of rrd-file. Not used when creating/updating rrd-files, only when drawing graphs.
{name}.rrdfield
Override name of rrd-field. Not used when creating/updating rrd-files, only when drawing graphs.
{name} is limited to 19 characters.
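
To show how the options above fit together, here is a small Python sketch that splits the output of a "config" run into graph-level options and per-data-source options. It is not how lrrd itself parses the output; the function name parse_config is hypothetical, and the sketch assumes that every key containing a dot is a {name}.option pair. It enforces two rules stated above: {name}.label is required and {name} is limited to 19 characters.

```python
MAX_NAME_LENGTH = 19  # limit on {name} stated above


def parse_config(text):
    """Return (graph_options, datasources) parsed from "config" output.

    graph_options maps option -> value, for example "graph_title".
    datasources maps {name} -> {option: value}, for example
    {"used": {"label": "open files"}}.
    """
    graph_options = {}
    datasources = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # blank line
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if "." in key:
            # Assumption: a dotted key is always {name}.option.
            name, _, option = key.partition(".")
            if len(name) > MAX_NAME_LENGTH:
                raise ValueError("data source name too long: %r" % name)
            datasources.setdefault(name, {})[option] = value
        else:
            graph_options[key] = value
    missing = sorted(n for n, opts in datasources.items() if "label" not in opts)
    if missing:
        raise ValueError("missing required .label for: " + ", ".join(missing))
    return graph_options, datasources
```

Values are kept as strings, so a caller that needs numbers (for .min or .max, say) has to convert them itself.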

When run without parameters, the script should output only a "{name}.value (value)" line for each data source:

[ay@james:/etc/lrrd/lrrd-client.d] ./open_files       
used.value 363
max.value 8192
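
The two modes can be sketched as a small Python script modelled on the open_files output shown in this section. This is not the original open_files script. It assumes a Linux system where /proc/sys/fs/file-nr holds three numbers (allocated, free, maximum), and it assumes warning and critical thresholds of 92% and 98% of the maximum, which happen to reproduce the sample values 7536 and 8028 for a maximum of 8192.

```python
import sys

# Assumption: these ratios are chosen only because they reproduce the
# thresholds in the sample output; they are not taken from the original script.
WARNING_RATIO = 0.92
CRITICAL_RATIO = 0.98


def read_file_table(path="/proc/sys/fs/file-nr"):
    """Return (used, maximum) from the Linux file table counters.

    Assumption: Linux only; the file holds allocated, free and maximum.
    """
    with open(path) as handle:
        allocated, free, maximum = (int(field) for field in handle.read().split())
    return allocated - free, maximum


def config_lines(maximum):
    """Lines to print when the script is run with "config"."""
    return [
        "graph_title File table usage",
        "graph_args --base 1000 -l 0 --vertical-label number_of_files",
        "used.label open files",
        "max.label max open files",
        "used.warning %d" % int(maximum * WARNING_RATIO),
        "used.critical %d" % int(maximum * CRITICAL_RATIO),
    ]


def value_lines(used, maximum):
    """Lines to print when the script is run without parameters."""
    return ["used.value %d" % used, "max.value %d" % maximum]


def main(argv):
    try:
        used, maximum = read_file_table()
    except (OSError, ValueError) as error:
        print("cannot read file table: %s" % error, file=sys.stderr)
        return
    if len(argv) > 1 and argv[1] == "config":
        lines = config_lines(maximum)
    else:
        lines = value_lines(used, maximum)
    print("\n".join(lines))


if __name__ == "__main__":
    main(sys.argv)
```

Keeping the line-building functions separate from the code that reads the system makes the output format easy to check without a Linux machine.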

All script names containing "~" or "#", or starting with ".", are skipped.
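
The skip rule above can be expressed in a few lines of Python. This is a sketch of the rule as stated, not the code lrrd-client uses; the function names are hypothetical.

```python
import os


def is_skipped(script_name):
    """Return True if the name contains "~" or "#", or starts with "."."""
    return (
        "~" in script_name
        or "#" in script_name
        or script_name.startswith(".")
    )


def runnable_scripts(directory="/etc/lrrd/lrrd-client.d"):
    """List the names in `directory` that the rule above does not skip."""
    return sorted(name for name in os.listdir(directory) if not is_skipped(name))
```

The rule conveniently ignores editor backup files such as "open_files~" and "#open_files#", as well as hidden files.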

The Server

The server runs a cron job as the user lrrd every 5 minutes. The cron job runs lrrd-update, lrrd-nagios, lrrd-graph and lrrd-html one after another. Each script creates a lock file in $dbdir. Every time a script starts, it checks whether the pid in the lock file is still alive before going on.
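
The lock file check described above can be sketched in Python as follows. This is an illustration of the general pid-file technique, not the code used by the lrrd scripts. The function names and the lock file path passed in are hypothetical, the liveness test is POSIX only, and the check-then-write sequence is not atomic, so two processes starting at the same instant could both proceed.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_lock(lock_path):
    """Return True if this process may run, False if a live owner holds the lock."""
    try:
        with open(lock_path) as handle:
            owner = int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        owner = None  # no lock file, or unreadable content: treat as stale
    if owner is not None and owner > 0 and owner != os.getpid():
        if pid_is_alive(owner):
            return False
    with open(lock_path, "w") as handle:
        handle.write("%d\n" % os.getpid())
    return True


def release_lock(lock_path):
    """Remove the lock file if it exists."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

A stale lock file left behind by a crashed run is simply overwritten, because its pid no longer belongs to a running process.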

/etc/lrrd/lrrd-server.conf

This is the configuration file for all server scripts.
#Configfile for lrrd-server
dbdir   /var/lib/lrrd/
htmldir   /var/www/lrrd/
indextmpl  /etc/lrrd/templates/lrrd-overview.tmpl
nodetmpl   /etc/lrrd/templates/lrrd-nodeview.tmpl
servicetmpl /etc/lrrd/templates/lrrd-serviceview.tmpl
domaintmpl /etc/lrrd/templates/lrrd-domainview.tmpl
htaccess  /etc/lrrd/templates/lrrd-htaccess
templatedir /etc/lrrd/templates/
logdir    /var/log/lrrd

#To warn Nagios
#nsca /usr/bin/send_nsca
#nsca_server nagios.server.org
#nsca_config /etc/nagios/send_nsca.cfg

# Edit and uncomment the following to start surveillance
#
#<domain>
#  <testdomain.org>
#    <node>
#      <machine.testdomain.org>
#        address   localhost
#      </machine.testdomain.org>
#    </node>
#  </testdomain.org>
#</domain>
Explanation:
dbdir
Root directory for all rrd-files (files go into $dbdir/$domain/)
htmldir
Where the PNG images and HTML files end up
*tmpl
HTML templates
templatedir
Where the templates reside
htaccess
The default htaccess file
logdir
Where to send logs
nsca*
Nagios options. See the lrrd-nagios section.
The rest of the file is an HTML-like node tree. To add a new node, add an entry for it inside the <node> section and give it an address option.
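
To make the layout of the file concrete, here is a Python sketch that reads the format shown above: "key value" lines, "#" comments and nested <section> blocks. It is not the parser the lrrd scripts use, and the function names are hypothetical. It assumes each opening or closing tag sits on its own line, as in the sample.

```python
def parse_server_conf(text):
    """Parse "key value" lines and nested <section> blocks into nested dicts."""
    root = {}
    stack = [("", root)]  # (section name, dict being filled)
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("</") and line.endswith(">"):
            name = line[2:-1].strip()
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError("line %d: unexpected %s" % (number, line))
            stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            name = line[1:-1].strip()
            child = stack[-1][1].setdefault(name, {})
            stack.append((name, child))
        else:
            parts = line.split(None, 1)
            stack[-1][1][parts[0]] = parts[1] if len(parts) > 1 else ""
    if len(stack) != 1:
        raise ValueError("unclosed section <%s>" % stack[-1][0])
    return root


def node_addresses(conf):
    """Return {node name: address} for every node in the tree."""
    addresses = {}
    for domain in conf.get("domain", {}).values():
        for name, options in domain.get("node", {}).items():
            addresses[name] = options.get("address")
    return addresses
```

With the sample tree uncommented, node_addresses would map machine.testdomain.org to localhost.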

lrrd-update

Lrrd-update reads /etc/lrrd/lrrd-server.conf, searches for nodes, and connects to the lrrd-clients using the address field. When connected, it runs the list command to fetch the available scripts, then runs config for each script. The resulting configuration is expanded into the /etc/lrrd/lrrd-server.conf file and the rrd-databases are created. Configuration that has already been expanded is skipped. Lrrd-update then runs through its newly modified configuration file and runs fetch on all scripts.
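
The sequence of steps above can be sketched in Python for a single node. This is a simplified illustration, not the lrrd-update implementation: it keeps the configuration in a dictionary instead of rewriting lrrd-server.conf, it does not create or update rrd-databases, and it fetches the scripts returned by list rather than those in the configuration file. The names update_node, parse_values and the ask callback are hypothetical, and the sketch assumes that list returns script names separated by whitespace.

```python
def parse_values(text):
    """Turn "{name}.value (value)" lines into {name: value as string}."""
    suffix = ".value"
    result = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].endswith(suffix):
            result[parts[0][: -len(suffix)]] = parts[1].strip()
    return result


def update_node(address, ask, known_config):
    """Run one update pass against a single node.

    ask(address, command, argument) is a hypothetical transport function
    that returns the reply text for one command. known_config maps
    script name -> config text and is updated in place.
    """
    # Assumption: "list" returns whitespace-separated script names.
    scripts = ask(address, "list", None).split()
    for script in scripts:
        if script not in known_config:  # already expanded config is skipped
            known_config[script] = ask(address, "config", script)
    values = {}
    for script in scripts:
        values[script] = parse_values(ask(address, "fetch", script))
    return values
```

Passing the transport in as a function keeps the flow easy to exercise with canned replies instead of a live lrrd-client.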

lrrd-graph

Lrrd-graph reads /etc/lrrd/lrrd-server.conf and graphs all services unless [service].skipdraw is set. The following options are available in the configuration (names are limited to 19 characters):
[client].graph_title
The title of the graph
[client].graph_order
The order in which to graph the lines.
[client].graph_args
Extra arguments to the graph
[service].label
REQUIRED. The name of the value to be graphed.
[service].type
Type of value: COUNTER or GAUGE. Defaults to GAUGE.
[service].max
Maximum value for the graph

lrrd-html

Lrrd-html creates the html-pages for the graphs.
A useful option in the lrrd-server.conf file is:
node_order [node1] [node2] ....
In which order the nodes should be listed, defaults to sorted.

lrrd-nagios

Lrrd-nagios is an optional script that sends passive alerts to a Nagios server. For this to work, you need a Nagios NSCA server, a working send_nsca configuration and the following configuration in /etc/lrrd/lrrd-server.conf:
nsca   /usr/bin/send_nsca
nsca_config   /etc/nagios/send_nsca.cfg
nsca_server   [nsca-server] 
Then add .warning and .critical fields to your configuration or directly to your client scripts. The value of these fields can be a single maximum value or a colon-separated range:
processes.warning 10:300
processes.critical 5:500
A value lower than 10 or higher than 300 results in a warning to Nagios; a value lower than 5 or higher than 500 results in a critical to Nagios.

Other useful ranges:

[service].warning :400
is equal to:
[service].warning 400
Only warn if lower than 300:
[service].warning 300:
When a service has a .critical or .warning field, lrrd-nagios checks its status against the last fetched value. If the value is ok, a "{service}.ok" file is created in the $dbdir/$domain directory. If the value is not ok, this file is removed and lrrd-nagios updates Nagios every 5 minutes until the value is ok again and a new ".ok" file is created.
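
The range rules in this section can be sketched in Python as follows. This illustrates the rules as described here and is not the lrrd-nagios code; the function names are hypothetical. It assumes that values exactly on a limit count as ok (the text only says "lower than" and "higher than") and that critical takes precedence over warning.

```python
def parse_range(spec):
    """Return (low, high) for "max", "low:high", ":high" or "low:".

    None means that side of the range is unbounded.
    """
    spec = spec.strip()
    if ":" in spec:
        low_text, _, high_text = spec.partition(":")
    else:
        low_text, high_text = "", spec  # a single value is a maximum
    low = float(low_text) if low_text.strip() else None
    high = float(high_text) if high_text.strip() else None
    return low, high


def in_range(value, spec):
    """Return True if value lies within the range given by spec."""
    low, high = parse_range(spec)
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


def status(value, warning=None, critical=None):
    """Return "critical", "warning" or "ok" for the last fetched value."""
    if critical is not None and not in_range(value, critical):
        return "critical"
    if warning is not None and not in_range(value, warning):
        return "warning"
    return "ok"
```

With processes.warning 10:300 and processes.critical 5:500, status(7, "10:300", "5:500") gives "warning" and status(3, "10:300", "5:500") gives "critical".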