
Splunking Websphere MQ Queues and Topics


What is Websphere MQ

IBM WebSphere MQ, formerly known as MQSeries, is IBM's Message Oriented Middleware offering and has been the most widely implemented messaging system across multiple platforms over the last couple of decades.

What is Message Oriented Middleware

From Wikipedia :

“Message-oriented middleware (MOM) is software or hardware infrastructure supporting sending and receiving messages between distributed systems. MOM allows application modules to be distributed over heterogeneous platforms and reduces the complexity of developing applications that span multiple operating systems and network protocols. The middleware creates a distributed communications layer that insulates the application developer from the details of the various operating system and network interfaces. APIs that extend across diverse platforms and networks are typically provided by MOM.”

Where does MQ fit into the landscape

In the course of my career, I've architected and coded solutions across many different verticals (aviation, core banking, card payments, telco, industrial automation, utilities), and MQ has been a fundamental mainstay of the enterprise IT fabric in all of these industries, stitching these often very heterogeneous computing environments together. Ergo, the messages being sent to MQ queues and topics represent a massive source of valuable machine data that can be pulled into Splunk to derive operational visibility into the various systems and applications that communicate via MQ.

Enter Splunk

So how can we tap into MQ from Splunk? The JMS Messaging Modular Input (freely available on Splunkbase) is the answer, and I blogged about it in more detail recently.

When I first developed the JMS Messaging Modular Input, it wasn't particularly feasible to test it against every MOM system that has a JMS provider, so my testing was done against ActiveMQ, with the knowledge that JMS is just an API and, in theory, the modular input should work with all JMS providers. Upon release, the emails started coming in, and much to my delight, many users were successfully using the JMS Messaging Modular Input to Splunk their MQ environments. The theory had worked.

So recently, Splunk Client Architect Thomas Mann and I set about building a WebSphere MQ environment and hooking the JMS Messaging Modular Input into it, so we could test this end to end for ourselves.

I am not an MQ administrator by any means. I am quite familiar with JMS concepts and coding, but the installation, administration, and configuration of the MQ software is probably what took me the most time. Once MQ was set up properly, configuring the JMS Messaging Modular Input via the Splunk Manager UI was very quick and simple.

If you already have MQ in your environment, with MQ admins who know JMS concepts with respect to setting up the MQ side and configuring the client side, then you should find all of the setup steps quite trivial.

Setting up MQ

Prerequisites

  • WebSphere MQ version 7.x installed (used for this example; previous MQ versions are also compatible)

Create a Queue

Create an MQ queue under the default queue manager. This step is optional if you already have a destination that messages are going to.

Create the JMS Objects

Using a JNDI file context is the simplest approach, unless you want to set up a directory service to host your JMS objects. In this step you set the location of the .bindings file that MQ will create for you. The Provider URL and Factory Class will be used later in your JMS Modular Input configuration.

Configure a connection factory

In this case I called it SplunkConnectionFactory.  This name will also be used in the JMS Modular Input configuration.

Ensure that you set the Transport mode to Client, not Bindings.

Setup server name in bindings file

Right-click the SplunkConnectionFactory and open its properties. Select the Connection item on the right-hand side. Change localhost(1414) to <servername>(1414). This is the connection info between the .bindings file and the MQ host. If you don't specify this, Splunk will try to connect to MQ on localhost, which clearly won't work in a remote configuration.

Create a new JMS Destination

Create a new Destination under the JMS Administered Context.  This links what is published in the .bindings file with the associated queue you want to manage.

Associate the JMS destination with an MQ Queue Manager

Disable Channel Auth in MQ

This step should not be done in production.  In that scenario, work with your MQ admin to set up appropriate access in MQ 7x and later.

  1. Open a terminal and navigate to <mq install dir>/bin
  2. Run runmqsc
  3. Enter the following command: AlTER QMGR CHLAUTH(DISABLED), then hit Return.
  4. Type END to exit runmqsc

More details here

Create the Channel

This step only needs to be done if the MQ Admin doesn’t create a channel for you.  Also, this may be unnecessary if you disable Channel Auth as outlined above.

  1. First thing to do is make sure your listener is running.  runmqlsr –t tcp –m qmgr –p nnnn where qmgr is the name of your queue manager and nnnn is the port number your listener is on (default is 1414).  For my configuration, the command was:  runmqlsr –t tcp –m QMgr –p 1414
  2. Create the channel for Splunk: DEFINE CHANNEL(‘splunkChannel’) CHLTYPE(SVRCONN) TRPTYPE(TCP)  DESCR(‘Channel for use by splunk programs’)
  3. Create a Channel Authentication rule using the IP address of the Splunk indexer that will be reading the queues, and assign it to a user (splunk is a local user created on the MQ box, non-admin but in the mqm group). Run the command: SET CHLAUTH('splunkChannel') TYPE(ADDRESSMAP) ADDRESS('10.0.0.20') MCAUSER('splunk')
  4. Grant access to connect and inquire the Queue manager: SET AUTHREC OBJTYPE(QMgr) PRINCIPAL(‘splunk’) AUTHADD(CONNECT, INQ).  Be sure to replace QMgr with your queue manager name and splunk with your local username.
  5. Grant access to inquire / get / put messages on the queue. SET AUTHREC PROFILE(‘SplunkQueue’) OBJTYPE(QUEUE) PRINCIPAL(‘splunk’) AUTHADD(PUT, GET, INQ, BROWSE).  SplunkQueue is the name of the queue you created.  Replace splunk with the user id specified in step 3.

More details here

Setting up Splunk

Prerequisites

  • Splunk version 5x installed
  • JMS Messaging Modular Input installed

Jar files

You need to copy the MQ jar files into the modular input's lib directory at $SPLUNK_HOME/etc/apps/jms_ta/bin/lib.
These are the jars I ended up needing. Note that these four jars are already part of the core modular input release:

  • jmsmodinput.jar
  • jms.jar
  • splunk.jar
  • log4j-1.2.16.jar

Bindings file

MQ will create your bindings file for you and write it to the location that you specified.
If your Splunk instance is running locally to MQ, then you are good to go.
If your Splunk instance is running remotely from MQ, you can simply copy the bindings file to the remote Splunk host.
The directory location of the bindings file can be anywhere you like; the path gets specified as a parameter (jndi_provider_url) when you set up the modular input stanza.

Setup the Input Stanza

You can set up the JMS Modular Input stanza manually (as a stanza entry in an inputs.conf file) or via the Splunk Manager UI.

Browse to Data Inputs >> JMS Messaging

The values that you use for the setup will come from what you setup in MQ.

Optionally, you can also configure which components of the messages you wish to index, and whether you just want to browse the message queue rather than consuming the messages.

This is what the resulting stanza declaration written to inputs.conf will look like. The queue name, connection factory name, and JNDI settings are the values that come from your MQ setup.

[jms://queue/SplunkQueue]
browse_mode = all
browse_queue_only = 1
durable = 0
index = jms
index_message_header = 1
index_message_properties = 1
init_mode = jndi
jms_connection_factory_name = SplunkConnectionFactory
jndi_initialcontext_factory = com.sun.jndi.fscontext.RefFSContextFactory
jndi_provider_url = file:/home/damien/MQJNDI/
sourcetype = mq
strip_newlines = 1
browse_frequency = -1
disabled = 1

Queues: browsing or consuming?

The JMS Messaging Modular Input allows you to specify browse mode or consume mode (the default).
Browsing does not remove messages from the queue, whereas consuming does (so it is slightly more invasive).
However, browsing has issues in two main respects:

  1. you might miss messages if they are consumed before they are browsed
  2. you might get duplicate messages if you browse multiple times before the messages get consumed

So my preferred "least invasive" approach is actually to have the MQ admin set up an alias queue to which a copy of all the messages you are interested in gets sent; the modular input can then simply consume from this queue without impacting any other consumers of the source queues.

Testing

Expand the Queue Managers tab on the left-hand side. Select your queue manager and expand it (QMgr in this example). Expand Queues. Find the queue you have linked your JMS object to.

Right click the queue and select “Put Test Message”

You can put as many messages as you like in the queue. You will see these messages indexed if everything is configured correctly.


Modular Inputs Tools


Tools


I'm a tools kind of guy. I like things that make my life easier or allow me to accomplish some task that would otherwise be prohibitive. I also like Tool the band, but that's another blog.

And so it is with software. Languages, libraries, frameworks are just tools that make it easier for us to accomplish some task.

Modular Inputs

With the release of Splunk 5 came a great new feature called Modular Inputs.

Modular Inputs extend the Splunk framework to define a custom input capability. In many respects you can think of them as your old friend the "scripted input", but elevated to first-class citizen status in Splunk Manager. Splunk treats your custom input definitions as if they were part of Splunk's native inputs, and users interactively create and update the inputs via Splunk Manager just as they would for native inputs (tcp, files, etc.). The modular input's lifecycle, schema, validation, and configuration are all managed by Splunk. This is the big differentiator over scripted inputs, which are very loosely coupled to Splunk.
What attracts me most to Modular Inputs is the potential we have to build up a rich collection of these inputs and make it easier and quicker for users to get their data into Splunk.

Modular Inputs Tools

When I wrote my first modular input, there was certainly an initial learning curve in figuring out exactly how to do it. As powerful as modular inputs are, there are many semantics that have to be understood, both for development and for building the release.

So I have created two Modular Input frameworks that abstract the developer from having to understand all of these semantics up front, so they can instead just focus on developing their modular input's business logic, significantly lowering the technical barrier to entry and getting to the point of productivity faster.

You can write a modular input using any language, but for the most part my recommendation would be to stick with Python, as it is more seamlessly integrated into the Splunk runtime. The reason you might use another language is if there is a specific library or runtime environment that your modular input depends upon.

The two modular input frameworks that I have created are for Python and Java. They can be cloned from GitHub, and the best way to get started is to have a look at the hello world example implementations.

Python Modular Inputs framework

Github Repo

https://github.com/damiendallimore/SplunkModularInputsPythonFramework

Helloworld example

https://github.com/damiendallimore/SplunkModularInputsPythonFramework/tree/master/implementations/helloworld

Java Modular Inputs framework

Github Repo

https://github.com/damiendallimore/SplunkModularInputsJavaFramework

Helloworld example

https://github.com/damiendallimore/SplunkModularInputsJavaFramework/tree/master/helloworld

Developing Modular Inputs in C# – Part 1


One of the cool new features of Splunk 5.0 is modular inputs, and we’ve already seen some great examples of this, such as the built-in perfmon gathering modular input and the Splunk Addon for PowerShell. However, the examples that are provided in the documentation are in Python. When I started writing my own modular input, I saw that much of the process of writing a modular input is scaffolding and repeatable. Thus I set out to write an SDK that would alleviate much of the scaffolding and provide a good framework for writing modular inputs. This multi-part series will cover the same process by writing a C# version of the Twitter example from the documentation.

The first part of writing a modular input is to implement the introspection scheme. When Splunk starts up, it searches for defined modular inputs and runs each modular input with the --scheme parameter. Splunk expects an XML document back that defines the parameters and configuration of the modular input. This is the first part that I thought I could improve with some of the scaffolding. Rather than embed the XML into the program, why not produce a definition of the scheme programmatically and then serialize it with the standard C# XML serialization library?

Let’s look at my base program:

namespace Splunk.Twitter
{
    class Twitter
    {
        static Twitter twitterApp = new Twitter();

        static void Main(string[] args)
        {
            if (args.Length > 0 && args[0].ToLower().Equals("--scheme"))
            {
                twitterApp.Scheme();
                Environment.Exit(0);
            }
            else
            {
                Console.WriteLine("ERROR Not Implemented");
                Environment.Exit(1);
            }
        }

        public Twitter()
        {
        }
    }
}

Our program is a standard console application that looks for Splunk passing us the --scheme parameter and then runs the Scheme() method. Our Scheme() method constructs the introspection scheme programmatically and writes it to Console.Out (.NET's standard output stream):

public void Scheme()
{
    Scheme s = new Scheme
    {
        Title = "Twitter",
        Description = "Get data from Twitter",
        UseExternalValidation = true,
        StreamingMode = StreamingMode.SIMPLE
    };
    s.Endpoint.Arguments.Add(new EndpointArgument
    {
       Name = "username",
       Title = "Twitter ID/Handle",
       Description = "Your Twitter ID."
    });
    s.Endpoint.Arguments.Add(new EndpointArgument
    {
        Name = "password",
        Title = "Password",
        Description = "Your Twitter password."
    });
    Console.WriteLine(s.Serialize());
}

This is all fairly basic object creation stuff. There are a couple of enumerations that are important. Most notably in this code segment, the StreamingMode can be SIMPLE (which is a simple line-based output similar to a log file) or XML (where each event is encapsulated in XML before being transmitted to the Splunk server for indexing). We also define the endpoint. This drives the Splunk UI when defining the new data input within the Splunk Manager. In this case, the Splunk UI will ask for two parameters: a username and password.

Compile and run the Twitter.exe application with the --scheme argument and you will see the XML introspection scheme.

<?xml version="1.0" encoding="utf-16"?>
<scheme xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <title>Twitter</title>
  <description>Get data from Twitter</description>
  <use_external_validation>true</use_external_validation>
  <use_single_instance>false</use_single_instance>
  <endpoint>
    <args>
      <arg name="username">
        <title>Twitter ID/Handle</title>
        <description>Your Twitter ID.</description>
        <required_on_edit>false</required_on_edit>
        <required_on_create>false</required_on_create>
      </arg>
      <arg name="password">
        <title>Password</title>
        <description>Your Twitter password.</description>
        <required_on_edit>false</required_on_edit>
        <required_on_create>false</required_on_create>
      </arg>
    </args>
  </endpoint>
</scheme>

Compare this to the XML embedded in the Python version of the Twitter app and you will see that this version is a more compliant XML document (something that isn't required by Splunk) but is otherwise identical.

Next week, we will move on to instantiating the modular input and parsing the parameters you have configured in inputs.conf. Until then, you can follow my progress by pulling down my GitHub repository at http://github.com/adrianhall/splunk-csharp-modinputs-sdk.

Learn More about PowerShell and Modular Inputs


For over five years, I have been working with co-host Jonathan Walz on the PowerScripting Podcast, a weekly Internet radio show. The primary topic of the show is the Windows PowerShell scripting language. We like to talk about news, tips, and resources related to the PowerShell community, but the biggest part of most shows is the interview. We've had a wide variety of guests on the show, ranging from prolific scripters who enjoy sharing their work to PMs, architects, and engineers from the largest software and hardware vendors in the world, including Microsoft, IBM, Intel, NetApp, and more.

Recently, we caught up with Joel Bennett, a Windows PowerShell MVP awardee, who also happens to be my teammate on Splunk’s BD Labs team. Joel is the lead developer for the Splunk Add-on for Microsoft PowerShell, a modular input for Splunk 5 which enables you to easily and efficiently add data to Splunk. Please visit powerscripting.net to listen to the full episode or subscribe to the podcast feed.

Getting data from your REST APIs into Splunk


Overview

More and more products, services, and platforms these days are exposing their data and functionality via RESTful APIs.

REST has emerged over previous architectural approaches as the de facto standard for building and exposing web APIs that enable third parties to hook into your data and functionality. It is simple, lightweight, platform independent, language interoperable, and re-uses HTTP constructs. All good gravy. And of course, Splunk has its own REST API as well.

The Data Potential

I see a world of data out there available via REST that can be brought into Splunk, correlated and enriched against your existing data, or used for entirely new use cases that you might conceive of once you see what is available and where your data might take you.

What type of data is available? Well, here is a very brief list that came to mind as I typed:

  • Twitter
  • Foursquare
  • LinkedIn
  • Facebook
  • Fitbit
  • Amazon
  • Yahoo
  • Reddit
  • YouTube
  • Flickr
  • Wikipedia
  • GNIP
  • Box
  • Okta
  • Datasift
  • Google APIs
  • Weather Services
  • Seismic monitoring
  • Publicly available socio-economic data
  • Traffic data
  • Stock monitoring
  • Security service providers
  • Proprietary systems and platforms
  • Other “data related” software products

The REST "dataverse" is vast, but I think you get the point.

Getting the Data

I am most interested in the "getting data in" part of the Splunk equation. As our esteemed Ninja once said, "Data First, Sexy Next".

And I want to make it as easy, simple, and intuitive as possible for you to hook Splunk into your REST endpoints, get that data, and start writing searches.

Therefore, building a generic Splunk Modular Input for polling data from any REST API is the perfect solution. One input to rule them all, so to speak.

Building the REST Modular Input

From a development point of view it is actually quite a simple proposition for some pretty cool results.

For RESTful APIs we only need to be concerned with HTTP GET requests; this is the HTTP method we will use for getting the data.

And by building the Modular Input in Python, I can take advantage of the Python Requests library, which simplifies most of the HTTP REST plumbing for me.
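
To make this concrete, here is a minimal sketch (not the app's actual code) of the kind of polling loop the modular input performs under the hood with the Requests library. The endpoint URL, parameters, credentials, and interval below are illustrative assumptions.

# Minimal polling sketch using the Requests library.
# The endpoint, params, auth, and interval are placeholders only.
import time
import requests

ENDPOINT = "https://api.example.com/v1/events"   # hypothetical REST endpoint
POLL_INTERVAL = 60                               # seconds between polls

while True:
    response = requests.get(
        ENDPOINT,
        params={"since": "latest"},              # custom URL arguments
        headers={"Accept": "application/json"},  # custom header properties
        auth=("user", "secret"),                 # e.g. HTTP Basic auth
        timeout=30,
    )
    if response.status_code == 200:
        # The real modular input streams the response to Splunk for indexing;
        # printing here is just a stand-in.
        print(response.text)
    time.sleep(POLL_INTERVAL)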

Using my Python Modular Inputs utility on Github , I can also rapidly build the Modular Input implementation.

You can check out the REST Modular Input implementation on Github

Using the REST Modular Input

Or if you want to get straight into Splunking some REST data, make your way over to Splunkbase and download the latest release.

Installation is as simple as untarring the release to SPLUNK_HOME/etc/apps and restarting Splunk.

Configuration is via navigating to Manager->Data Inputs->REST

Then click "New" to create a new REST input. As you can see below, I have already created several that I used for testing.

Configuring your new REST input is simply a matter of filling in the fields

Then search your data! Many RESTful responses are in JSON format, which is very convenient for Splunk's automatic field extraction.

Key Features

  • Perform HTTP(s) GET requests to REST endpoints and output the responses to Splunk
  • Multiple authentication mechanisms
  • Add custom HTTP(s) Header properties
  • Add custom URL arguments
  • HTTP(s) Streaming Requests
  • HTTP(s) Proxy support
  • Response regex patterns to filter out responses
  • Configurable polling interval
  • Configurable timeouts
  • Configurable indexing of error codes

Authentication

The following authentication mechanisms are supported:

  • None
  • HTTP Basic
  • HTTP Digest
  • OAuth1
  • OAuth2 (with auto refresh of the access token)
  • Custom

Custom Authentication Handlers

You can provide your own custom Authentication Handler. This is a Python class that you should add to the
rest_ta/bin/authhandlers.py module.

You can then declare this class name and any parameters in the REST Input setup page.
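
As an illustration, here is a minimal sketch of what such a handler could look like. Because the modular input is built on the Requests library, a custom handler can follow the standard requests.auth.AuthBase pattern; the header name and token below are hypothetical, and the exact wiring expected by authhandlers.py may differ, so check the handlers already in that module.

# Hypothetical custom authentication handler following the standard
# requests.auth.AuthBase pattern from the Requests library.
import requests

class MyTokenAuthHandler(requests.auth.AuthBase):
    """Adds a static API token header to every outgoing request."""

    def __init__(self, token="changeme", **kwargs):
        # Parameters declared on the REST Input setup page would arrive here.
        self.token = token

    def __call__(self, request):
        # Requests invokes this hook before sending; mutate and return the request.
        request.headers["X-Api-Token"] = self.token
        return request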

Custom Response Handlers

You can provide your own custom Response Handler. This is a Python class that you should add to the
rest_ta/bin/responsehandlers.py module.

You can then declare this class name and any parameters in the REST Input setup page.
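
For example, a handler that reshapes each JSON response before it is indexed might look something like the sketch below. The class shape and the way events are emitted are assumptions for illustration; check the default handler in responsehandlers.py for the exact interface the app expects.

# Hypothetical custom response handler. The constructor receives any
# parameters declared on the REST Input setup page; the handler is then
# invoked with the raw response text so it can transform it before indexing.
import json

class PrettyPrintJSONHandler:
    def __init__(self, **kwargs):
        self.indent = int(kwargs.get("indent", 2))

    def __call__(self, raw_response_output):
        # Re-serialize the JSON payload with indentation before it is handed
        # to Splunk for indexing (print is a stand-in for the real output call).
        parsed = json.loads(raw_response_output)
        print(json.dumps(parsed, indent=self.indent))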

Command Modular Input


Simplifying the status quo

I’m often thinking about potential sources of data for Splunk and how to facilitate getting this data into Splunk in the simplest manner possible.

And what better source of data than existing programs on your operating system that already do the heavy lifting for you.

Now this is nothing new to Splunk; we've always been able to wrap a program in a scripted input, execute it, transform the output, and pipe it into Splunk.

But rather than creating many of these specific program wrappers for Splunk each time you need to capture a program's output, why not create one single Modular Input that can be used as a generic wrapper for whatever program output you want to capture?

Well, that's just what I have done. The Command Modular Input is quite simply a wrapper around whatever system programs you want to periodically execute and capture the output from (top, ps, iostat, sar, vmstat, netstat, tcpdump, tshark, etc.). It will work on all supported Splunk platforms.

Download and Install

Head on over to Splunkbase and download the Command Modular Input.

Untar to SPLUNK_HOME/etc/apps and restart Splunk

Setup

Login to Splunk and browse to Manager->Data Inputs

Setup a new command input

List command inputs you have setup

Search your command output

Custom Output Handlers

You may want to transform and process the raw command output before sending it to Splunk. To facilitate this, you can provide your own custom output handler.

This is a Python class that you should add to the command_ta/bin/outputhandlers.py module.

You can then declare this class name and any parameters in the Command setup page.
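
As a rough illustration, such a handler could tag each line of command output with a key=value field before it is indexed. The class shape below is an assumption; consult the default handler in outputhandlers.py for the exact interface the app expects.

# Hypothetical custom output handler that prefixes each line of command
# output with the command name as a key=value pair.
class TaggedLineOutputHandler:
    def __init__(self, **kwargs):
        self.command_name = kwargs.get("command_name", "unknown")

    def __call__(self, raw_output):
        for line in raw_output.splitlines():
            if line.strip():
                # print is a stand-in for writing the event out to Splunk.
                print('command="%s" %s' % (self.command_name, line))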

Streaming vs Non Streaming Command Output

Some commands will keep STD OUT open and stream results. An example of such a command might be tcpdump.

For these scenarios ensure you check the “streaming output” option on the setup page.

Making SNMP Simpler


Overview

From Wikipedia :

Simple Network Management Protocol (SNMP) is an “Internet-standard protocol for managing devices on IP networks”. Devices that typically support SNMP include routers, switches, servers, workstations, printers, modem racks, and more.

SNMP exposes management data in the form of variables on the managed systems.

The variables accessible via SNMP are organized in hierarchies. These hierarchies, and other metadata (such as type and description of the variable), are described by Management Information Bases (MIBs).

MIBs describe the structure of the management data of a device subsystem; they use a hierarchical namespace containing object identifiers (OID). Each OID identifies a variable that can be read or set via SNMP. MIBs use the notation defined by ASN.1.

SNMP agents can also send notifications, called traps, to an SNMP trap listening daemon.

Splunking SNMP Data

SNMP represents an incredibly rich source of data that you can get into Splunk for visibility across a very diverse IT landscape.

For as long as I have been involved with Splunk, one of the most recurring requests on Splunkbase Answers and in conversations has been "how do I get my SNMP data into Splunk?".

And whilst there has always been a way, it has involved cobbling together a few different steps.

For polling SNMP variables, this has typically involved writing a custom scripted input utilizing an existing program or library under the hood, such as snmpget or pysnmp.

And for capturing SNMP traps the approach has been to run a trap daemon such as snmptrapd on your Splunk server to capture the trap, dump it to a file and have Splunk monitor the file.

I think there is a much simpler way, one that is more natively integrated into Splunk: implementing SNMP data collection as a Splunk Modular Input.

So my colleague Scott Spencer and I set about doing just that.

SNMP Modular Input

The SNMP Modular Input allows you to configure connections to your SNMP devices, poll attribute values, and capture traps. It has no external dependencies; all of the functionality is built into the Modular Input, and it will run on all supported Splunk platforms.

Features overview

  • Simple UI based configuration via Splunk Manager
  • Capture SNMP traps (Splunk becomes an SNMP trap daemon in its own right)
  • Poll SNMP object attributes
  • Declare objects to poll in textual or numeric format
  • Ships with a wide selection of standard industry MIBs
  • Add in your own Custom MIBs
  • Walk object trees using GET BULK
  • Optionally index bulk results as individual events in Splunk
  • Monitor 1 or more Objects per stanza
  • Create as many SNMP input stanzas as you require
  • IPv4 and IPv6 support
  • Indexes SNMP events in key=value semantic format
  • Ships with some additional custom field extractions

SNMP version support

SNMP V1 and V2c support is currently implemented. SNMP V3 is in the pipeline, so you don't need to email me requesting it :)

Implementation

The Modular Input is implemented in Python, with pysnmp used under the hood as the library upon which it is built.

Getting started

Browse to Splunkbase and download the SNMP Modular Input

To install, simply untar it to SPLUNK_HOME/etc/apps and restart Splunk.

Configuration

Login to SplunkWeb and browse to Manager->Data Inputs->SNMP->New and setup your input stanza

View the SNMP inputs you have setup

Searching

You can then search over the SNMP data that gets indexed. In the example below, in addition to the SNMPv2-MIB, I have also loaded the Interface MIB (IF-MIB) to resolve the IF-MIB OID names and values to their textual representation.

A note about MIBs

Many industry standard MIBs ship with the Modular Input.
You can see which MIBs are available by looking in SPLUNK_HOME/etc/apps/snmp_ta/bin/mibs/pysnmp_mibs-0.1.4-py2.7.egg

Any additional custom MIBs need to be converted into Python Modules.

You can simply do this by using the build-pysnmp-mib tool that is part of the pysnmp installation

build-pysnmp-mib -o SOME-CUSTOM-MIB.py SOME-CUSTOM-MIB.mib

Then “egg” up your python MIB modules and place them in SPLUNK_HOME/etc/apps/snmp_ta/bin/mibs

In the configuration screen for the SNMP input in Splunk Manager, there is a field called "MIB Names" (see above).
Here you can specify the MIB names you want applied to the SNMP input definition, e.g. IF-MIB,DNS-SERVER-MIB,BRIDGE-MIB.
The MIB name is the same as the name of the MIB Python module in your egg package.

What’s next

Now it's your turn: go and download the Modular Input, plug it in, and Splunk some SNMP data. I'd love to hear your feedback on any way to make it better and even simpler. And as mentioned, SNMP V3 support is coming.

Developing Modular Inputs in C#: Part 2


I’m annoyed at our engineering team, but I’ll get over it. You see, just hours after I posted my first blog post on writing modular inputs in C#, the team up in Seattle released the latest edition of the C# SDK. Within that SDK is a bunch of class libraries that do a much better job than my work on the scaffolding needed to produce a modular input. I highly recommend you go over to their site and dig in to this. Within this blog post, I’m going to adjust my code to use the new scaffolding and take a look at actually running the modular input. Let’s start with the framework. Here is a starting recipe for a modular input:

using System;
using Splunk.ModularInputs;
using System.Collections.Generic;

namespace Splunk.Twitter
{
    internal class Twitter : Script
    {
        public override Scheme Scheme
        {
            get {
                throw new NotImplementedException();
            }
        }

        public static int Main(string[] args)
        {
            return Run(args);
        }

        public override void StreamEvents(InputDefinition inputDefinition)
        {
            throw new NotImplementedException();
        }
    }
}

As you can see, there isn’t much to it – we have a property that returns our Scheme. This is basically the same Scheme class that we used in part 1, but we implement it as a property now. We also need to implement a StreamEvents() method. This is the new method that is called to actually gather events. Let’s take a look at our new Scheme implementation:

        public override Scheme Scheme
        {
            get {
                return new Scheme
                {
                    Title = "Twitter",
                    Description = "Get data from twitter",
                    StreamingMode = StreamingMode.Simple,
                    Endpoint =
                    {
                        Arguments = new List<Argument> {
                            new Argument {
                                Name = "username",
                                Title = "Twitter ID/Handle",
                                Description = "Your Twitter ID"
                            },
                            new Argument {
                                Name = "password",
                                Title = "Twitter Password",
                                Description = "Your Twitter Password"
                            },
                        }
                    }
                };
            }
        }

Notice that it's pretty much the same as before, just formatted differently. I like this one better: I don't have to parse command-line arguments, serialize the XML data, or understand that the Scheme is returned from a --scheme command. It just happens for me. Now, on to the meat of today's post: actually dealing with the data. I'm not going to tell you how to connect to Twitter and pull data; there are better blog posts than mine on that subject. However, let's explore what happens when Splunk starts a modular input to receive data. Splunkd runs the modular input with no arguments and feeds it an XML document via stdin. This is captured by the Splunk C# framework, which turns it into an InputDefinition object and then calls StreamEvents(). Your StreamEvents() method should never end (unlike mine) and can access the parameters that the modular input was configured with. You will need a sample XML document to fully test this. Here is an example:

<?xml version="1.0" encoding="utf-8" ?>
<input>
  <server_host>DEN-IDX1</server_host>
  <server_uri>https://127.0.0.1:8089</server_uri>
  <session_key>123102983109283019283</session_key>
  <checkpoint_dir>C:\Program Files\SplunkUniversalForwarder\var\lib\splunk\modinputs\twitter</checkpoint_dir>
  <configuration>
    <stanza name="twitter://aaa">
      <param name="username">ahall</param>
      <param name="password">mypwd</param>
      <param name="disabled">0</param>
      <param name="index">default</param>
    </stanza>
  </configuration>
</input>

This is actually generated from the information you enter into the inputs.conf file or through the Manager. However, we need to hand-craft this when we are testing. My StreamEvents() method looks like this:

        public override void StreamEvents(InputDefinition inputDefinition)
        {
            Console.Out.WriteLine("# stanzas = " + inputDefinition.Stanzas.Count.ToString());
            foreach (string st in inputDefinition.Stanzas.Keys) {
                Console.Out.WriteLine(st + ":");
                Console.Out.WriteLine("\tUsername = " + inputDefinition.Stanzas[st].Parameters["username"]);
                Console.Out.WriteLine("\tPassword = " + inputDefinition.Stanzas[st].Parameters["password"]);
            }
            throw new NotImplementedException();
        }

I’m still throwing the NotImplementedException(), but first I’m printing some of the data we got from the input definition. Now you can use this to configure your modular input and start gathering data. From PowerShell, I can run this with the following command:

Get-Content MyXMLFile.xml | .\Twitter.exe

There are some great examples of modular inputs out there, including modular inputs for PowerShell execution and SNMP. Modular Inputs are a powerful method of gathering hard-to-get data, and I encourage you to explore your systems like they’ve never been explored before.


The Splunk SDK for Python gets modular input support


Support for modular inputs in Splunk 5.0 and later enables you to add new types of inputs to Splunk that are treated as native Splunk inputs.

Last week Jon announced updates to the Splunk SDKs for Java, Python, and JavaScript; now we'll take a deep dive into modular input support in the Splunk SDK for Python.

The latest release of the Splunk SDK for Python brings modular input support. The Splunk SDKs for C# (see Developing Modular Inputs in C#) and Java also have this functionality as of version 1.0.0.0 and 1.2, respectively. The Splunk SDK for Python enables you to use Python to create new modular inputs for Splunk.

Getting started

The Splunk SDK for Python comes with two example modular input apps: random numbers and GitHub forks. You can get the Splunk SDK for Python on dev.splunk.com. Once you have it, you can build the .spl files for these examples and install them via the app manager in Splunkweb. Do this by running the following commands in the root level of the SDK:

python setup.py build
python setup.py dist

The .spl files will be in the build directory.

Now I’ll walk you through the random numbers example.

Random numbers example

The random numbers example app will generate Splunk events containing a random number between the two specified values. Let’s get into the steps for creating this modular input.

Inherit from the Script class

As with all modular inputs, our script must inherit from the abstract base class Script in splunklib.modularinput.script from the Splunk SDK for Python. It must override the get_scheme and stream_events functions and, if the scheme returned by get_scheme has Scheme.use_external_validation set to True, the validate_input function.

Below, I’ve created a MyScript class in a new file called random_numbers.py which inherits from Script, and added the imports that will be used by the functions we will override.

import random, sys
from splunklib.modularinput import *
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

class MyScript(Script):
    # TODO: fill in this class

Override get_scheme

Now that we have a class set up, let’s override the get_scheme function from the Script class. We need to create a Scheme object, add some arguments, and return the Scheme object.

    def get_scheme(self):
        scheme = Scheme("Random Numbers")

        scheme.description = "Streams events containing a random number."
        # If you set external validation to True, without overriding
        # validate_input, the script will accept anything as valid.
        # Generally you only need external validation if there are
        # relationships you must maintain among the parameters,
        # such as requiring min to be less than max in this
        # example, or you need to check that some resource is
        # reachable or valid. Otherwise, Splunk lets you
        # specify a validation string for each argument
        # and will run validation internally using that string.
        scheme.use_external_validation = True
        scheme.use_single_instance = True

        min_argument = Argument("min")
        min_argument.data_type = Argument.data_type_number
        min_argument.description = "Minimum random number to be produced by this input."
        min_argument.required_on_create = True
        # If you are not using external validation, add something like:
        #
        # setValidation("min > 0")
        scheme.add_argument(min_argument)

        max_argument = Argument("max")
        max_argument.data_type = Argument.data_type_number
        max_argument.description = "Maximum random number to be produced by this input."
        max_argument.required_on_create = True
        scheme.add_argument(max_argument)

        return scheme

Optional: Override validate_input

Since we set scheme.use_external_validation to True in our get_scheme function, we need to specify some validation for our modular input in the validate_input function.

This is one of the great features of modular inputs: you're able to validate data before it gets into Splunk.

In this example, we are using external validation to verify that min is less than max. If validate_input does not raise an exception, the input is assumed to be valid. Otherwise it prints the exception as an error message when telling splunkd that the configuration is invalid.

   def validate_input(self, validation_definition):
        # Get the parameters from the ValidationDefinition object,
        # then typecast the values as floats
        minimum = float(validation_definition.parameters["min"])
        maximum = float(validation_definition.parameters["max"])

        if minimum >= maximum:
            raise ValueError("min must be less than max; found min=%f, max=%f" % (minimum, maximum))

Override stream_events

The stream_events function handles all the action: Splunk calls this modular input without arguments, streams XML describing the inputs to stdin, and waits for XML on stdout describing events.

    def stream_events(self, inputs, ew):
        # Go through each input for this modular input
        for input_name, input_item in inputs.inputs.iteritems():
            # Get the values, cast them as floats
            minimum = float(input_item["min"])
            maximum = float(input_item["max"])

            # Create an Event object, and set its data fields
            event = Event()
            event.stanza = input_name
            event.data = "number=\"%s\"" % str(random.uniform(minimum, maximum))

            # Tell the EventWriter to write this event
            ew.write_event(event)

Bringing it all together

Let's bring all the functions together for our complete MyScript class. In addition, we need to add these two lines at the end of random_numbers.py to actually run the modular input script:

if __name__ == "__main__":
    sys.exit(MyScript().run(sys.argv))

Here is the complete random_numbers.py:

import random, sys

from splunklib.modularinput import *

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

class MyScript(Script):
    def get_scheme(self):
        scheme = Scheme("Random Numbers")

        scheme.description = "Streams events containing a random number."
        # If you set external validation to True, without overriding
        # validate_input, the script will accept anything as valid.
        # Generally you only need external validation if there are
        # relationships you must maintain among the parameters,
        # such as requiring min to be less than max in this
        # example, or you need to check that some resource is
        # reachable or valid. Otherwise, Splunk lets you
        # specify a validation string for each argument
        # and will run validation internally using that string.
        scheme.use_external_validation = True
        scheme.use_single_instance = True

        min_argument = Argument("min")
        min_argument.data_type = Argument.data_type_number
        min_argument.description = "Minimum random number to be produced by this input."
        min_argument.required_on_create = True
        # If you are not using external validation, add something like:
        #
        # setValidation("min > 0")
        scheme.add_argument(min_argument)

        max_argument = Argument("max")
        max_argument.data_type = Argument.data_type_number
        max_argument.description = "Maximum random number to be produced by this input."
        max_argument.required_on_create = True
        scheme.add_argument(max_argument)

        return scheme

    def validate_input(self, validation_definition):
        # Get the parameters from the ValidationDefinition object,
        # then typecast the values as floats
        minimum = float(validation_definition.parameters["min"])
        maximum = float(validation_definition.parameters["max"])

        if minimum >= maximum:
            raise ValueError("min must be less than max; found min=%f, max=%f" % (minimum, maximum))

    def stream_events(self, inputs, ew):
        # Go through each input for this modular input
        for input_name, input_item in inputs.inputs.iteritems():
            # Get the values, cast them as floats
            minimum = float(input_item["min"])
            maximum = float(input_item["max"])

            # Create an Event object, and set its data fields
            event = Event()
            event.stanza = input_name
            event.data = "number=\"%s\"" % str(random.uniform(minimum, maximum))

            # Tell the EventWriter to write this event
            ew.write_event(event)

if __name__ == "__main__":
    sys.exit(MyScript().run(sys.argv))

Optional: set up logging

It's best practice for your modular input script to log diagnostic data to splunkd.log. Use an EventWriter's log method to write log messages, which include both a standard splunkd.log level (such as DEBUG or ERROR) and a descriptive message.
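
For example, logging from within stream_events might look like the sketch below. This is a minimal illustration; the severity constants and the log method are assumed from the SDK's EventWriter class, so verify them against the SDK version you have installed.

# Minimal sketch of logging from stream_events via the EventWriter.
# EventWriter.INFO / DEBUG / ERROR are assumed SDK constants; verify them
# against your installed version of the Splunk SDK for Python.
def stream_events(self, inputs, ew):
    ew.log(EventWriter.INFO, "random_numbers modular input starting up")
    try:
        for input_name, input_item in inputs.inputs.iteritems():
            ew.log(EventWriter.DEBUG, "processing stanza %s" % input_name)
            # ... generate and write events as shown above ...
    except Exception as e:
        ew.log(EventWriter.ERROR, "random_numbers failed: %s" % e)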

Add the modular input to Splunk

We’ve got our script ready, now let’s prepare to add this modular input to Splunk.

Package the script and the SDK library

To add a modular input that you’ve created in Python to Splunk, you’ll need to first add the script as a Splunk app.

  1. Create a directory that corresponds to the name of your modular input script—for instance, random_numbers—in a location such as your Documents directory. (You’ll copy the directory over to your Splunk directory at the end of this process.)
  2. In the directory you just created, create the following three empty directories:
    • bin
    • default
    • README
  3. From the root level of the Splunk SDK for Python, copy the splunklib directory into the bin directory you just created.
  4. Copy the modular input Python script (for instance, random_numbers.py) into the bin directory. Your app directory structure should now look like the following:
.../
  bin/
    app_name.py
    splunklib/
      __init__.py
      ...
  default/
  README/

Create an app.conf file

Within the default directory, create a file called app.conf. This file is used to maintain the state of an app or customize certain aspects of it in Splunk. The contents of the app.conf file can be very simple:

[install]
is_configured = 0

[ui]
is_visible = 1
label = My App

[launcher]
author = Splunk Inc
description = My app is awesome.
version = 1.0

For more examples of what to put in the app.conf file, see the corresponding files in the modular inputs examples.

Create an inputs.conf.spec file

You need to define the configuration for your modular input by creating an inputs.conf.spec file manually. See Create a modular input spec file in the main Splunk documentation for instructions, or take a look at the SDK samples’ inputs.conf.spec file, which is in the application’s README directory. For instance, the following is the contents of the random numbers example’s inputs.conf.spec file:

[random_numbers://<name>]
*Generates events containing a random floating point number.

min = <value>
max = <value>

Move the modular input script into your Splunk install

Your directory structure should look something like this:

.../
  bin/
    app_name.py
    splunklib/
      __init__.py
      ...
  default/
    app.conf
  README/
    inputs.conf.spec

The final step to install the modular input is to copy the app directory to the following path: $SPLUNK_HOME$/etc/apps/

Restart Splunk, and on the App menu, click Manage apps. If you wrote your modular input script correctly, the name of the modular input—for instance, Random Numbers—will appear here. If not, go back and double-check your script. You can do this by running python random_numbers.py --scheme and python random_numbers.py --validate-arguments from the bin directory of your modular input. These commands will verify that your scheme and arguments are configured correctly, and they will also catch any indentation issues that could cause errors.

If your modular input appears in the list of apps, in Splunk Manager (or, in Splunk 6.0 or later, the Settings menu), under Data, click Data inputs. Your modular input will also be listed here. Click Add new, fill in any settings your modular input requires, and click Save.

Congratulations, you’ve now configured an instance of your modular input as a Splunk input!

Splunking the World Cup 2014: Real Time Match Analysis


[Screenshot: World Cup stadium chart]

As an Englishman I’ve been waiting months – with very high expectations – for the World Cup to come around. Reading fellow Splunker Matt Davies’ blog post, “Splunking World Cup 2014. The winner will be…“, only heightened my excitement.

The tournament is now going into its second week, and I’ve started to look at the teams, players, and matches more closely. Which stadium holds the most people? Who’s the top scorer? Which referee hands out the most cards?

With these questions fresh in my mind I opened up Splunk and began to have a look at the huge amounts of information being streamed from the tournament. For this post I’m going to explore real-time match updates, including teams, scores, and match locations.

Prerequisites

Step 1: Choose the Data Sources You Want to Splunk

World Cup Match JSON Feed

There are lots of potential sources to grab World Cup data – from match reports to fan Twitter feeds. Software For Good have created a bunch of endpoints offering both match and team information.

For this project we’ll use their live match endpoint.

Step 2: Install the REST API Modular Input in Splunk

[Screenshot: REST API Modular Input on Splunkbase]

To get this feed into Splunk we’ll use Damien Dallimore’s REST API Modular input for Splunk. You can download the app here with full instructions on how to install it.

Step 3: Configure your RESTful Input

[Screenshot: REST input configuration]

In Splunk navigate to “Settings > Data Inputs > REST”, and select “Add new”.

Configuration options:

REST API Input Name: WorldCupMatchData (optional)
Endpoint URL: http://worldcup.sfg.io/matches
Response Type: JSON
Set Sourcetype: Manual
Sourcetype: _json
Host: SFG (optional)

You will see we set the “Response Type” to JSON because the feed being returned is in JSON format. It is also important to explicitly set the “Sourcetype” to “_json”. This ensures Splunk parses the JSON events correctly at search time. If your search returns grouped events, you’ve probably forgotten to set this.

Note: I have only included the fields that are essential to configure (unless stated otherwise). Everything else can be left blank or at its default (unless you need to enter a proxy to get out to the internet, etc.).

Step 4: Lets Play

[Screenshot: searching the World Cup data]

Note: this data source also contains future match data. If you’re not interested in this information, just add NOT status="future" to your search string.

Where have most matches been played so far? (Maracanã – Estádio Jornalista Mário Filho – 7 / Estadio Nacional – 7)

host="SFGFeed" NOT status="future" | top location

How many goals have been scored? (49)

host="SFGFeed" NOT status="future" | stats sum(away_team.goals) AS TotalAwayGoals sum(home_team.goals) AS TotalHomeGoals | eval TotalGoals = TotalAwayGoals + TotalHomeGoals | fields TotalGoals

Average goals per game? (~3)

host="SFGFeed" NOT status="future" | stats sum(away_team.goals) AS TotalAwayGoals sum(home_team.goals) AS TotalHomeGoals dc(match_number) AS TotalMatches | eval TotalGoals = TotalAwayGoals + TotalHomeGoals | eval GoalsPerGame = TotalGoals / TotalMatches

In which stadium were the most goals scored during the opening round of matches? (Arena Fonte Nova)

host="SFGFeed" NOT status="future" match_number>=1 match_number<=16 | stats sum(away_team.goals) AS TotalAwayGoals sum(home_team.goals) AS TotalHomeGoals dc(match_number) AS TotalMatches by location | eval TotalGoals = TotalAwayGoals + TotalHomeGoals | sort - TotalGoals

Which teams won their opening games? (USA, Switzerland, Netherlands, Mexico, Ivory Coast, Italy, Germany, France, Costa Rica, Colombia)

host="SFGFeed" NOT status="future" | where winner!="Draw" | top winner | fields - percent

(Note, the numbers will be out of date by the time you read this! Maybe England have won!)

Step 5: Extra Time

[Screenshot: World Cup winners chart]

I’ve only started to scratch the surface here. Remember, this data source is streaming information into Splunk in real time as matches are being played. Why not get Splunk up on a second big screen whilst you’re watching the game to analyse the stats (too much)?

Correlating the data from the Software for Good endpoints with other sources may also prove interesting. Does the number of goals scored during a game have any correlation with the heat? Or the distance travelled by teams before the match: how does that impact the final score?

Now I do believe there’s a soccer football match on…

Splunking Social Media: Tracking Tweets


[Screenshot: Twitter dashboard in Splunk]

So you use Twitter and have heard Splunk can do “Big Data”. By tapping into Twitter’s API you can use Splunk to investigate the stream of tweets being generated across the globe.

The great thing about using Splunk to do this is that you have complete control of the data, meaning it’s incredibly flexible as to what you can build. A few basic ideas I’ve had include tracking hashtags, following specific influencers, or tracking tweets by location in real time.

What’s more, it takes a matter of minutes before you can start analysing the wealth of data being generated. This post will show you how.

Prerequisites

Step 1: Create a Twitter App

[Screenshot: creating a Twitter app]

Go to: “dev.twitter.com” > “Sign in / up” > select “Create App”.

It doesn’t really matter what name you enter when creating the app (especially if it’s not going to be public) although I’d recommend using something you can remember. Same goes for description and website.

The callback field can be left blank. I won’t go into why or when this should be used in this post.

Step 2: Generate API Keys

[Screenshot: generating Twitter API keys]

Once your app has been created click the “API Keys” tab. You should see “Your access token” with a button “Create my access token”. Press this button.

You should now see your API keys. Don’t worry about noting them down, we can come back to this page at anytime. You will want to keep them secret though (the app above will be deleted by the time you read this!).

Step 3: Install the REST API Modular Input in Splunk

[Screenshot: REST API Modular Input on Splunkbase]

To get this feed into Splunk we’ll use Damien Dallimore’s REST API Modular input for Splunk. You can download the app here with full instructions on how to install it.

Step 4: Configure Your RESTful Twitter Input

[Screenshot: Twitter REST input configuration]

We’re on the home straight now! Now we just need to give Splunk the credentials to tap into the Twitter API.

In Splunk navigate to “Settings > Data Inputs > REST”, and select “Add new”.

4.1 OAuth settings:

REST API Input Name = TwitterFeed (optional)
Endpoint URL = https://stream.twitter.com/1.1/statuses/filter.json
HTTP Method: GET
Authentication Type = oauth1
OAUTH1 Client (Consumer) Key = <YOUR_CLIENT_KEY>
OAUTH1 Client (Consumer) Secret = <YOUR_CLIENT_SECRET>
OAUTH1 Access Token = <YOUR_ACCESS_KEY>
OAUTH1 Access Token Secret = <YOUR_ACCESS_SECRET>

If you need to retrieve your OAUTH keys created in Step 2 go to: “dev.twitter.com” > “My apps” > “[Your App]” > “API Keys” > “Test OAuth”.

Note: we will be using version 1.1 of Twitter’s API, which imposes rate limits on its endpoints. If you’re only collecting a small number of tweets every 15 minutes this shouldn’t be a problem. If you’re planning on polling thousands, you should probably read this first.

4.2 Argument settings:

At this point you should read the Twitter API docs if you are unfamiliar with the arguments that can be passed.

Example 1

URL Arguments: track=#worldcup^stall_warnings=true

Here I am using the “track” streaming API parameter. In this case, I am polling tweets that contain the hashtag #worldcup. Note that if you want to track multiple keywords, these are separated by a comma. However, the REST API configuration screen expects a comma delimiter between key=value pairs, so I have used a “^” delimiter instead, as I need to use commas for my track values.

Example 2

URL Arguments: follow=21756213^stall_warnings=true

Now I am collecting tweets using the “follow” streaming API parameter for the account @himynamesdave (that’s me). Note that when using the follow parameter you must use the user’s ID, not their username. If you’re unsure how to find a user ID, this site will help you.
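
If you prefer to look the ID up programmatically, a small script against the same v1.1 REST API can do it. This is a hedged sketch: it assumes the users/show.json endpoint and the requests_oauthlib package, and the credentials shown are placeholders for the keys generated in Step 2.

# Sketch: look up the numeric user ID for a screen name via Twitter API v1.1.
# Requires the requests and requests_oauthlib packages; keys are placeholders.
import requests
from requests_oauthlib import OAuth1

auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

resp = requests.get(
    "https://api.twitter.com/1.1/users/show.json",
    params={"screen_name": "himynamesdave"},
    auth=auth,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["id"])  # use this value with the "follow" parameter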

4.3 Response settings:

Response Type = json
Streaming Request = True
Request Timeout = 86400 (optional)
Delimiter: ^ (or whatever delimiter you used in the URL Arguments field)
Set Sourcetype: Manual
Sourcetype: tweets (optional)

Note: for steps 4.1 through 4.3, I have only included the fields that are essential to configure (unless stated otherwise). Everything else can be left blank or at its default (unless you need to enter a proxy to get out to the internet, etc.).

4.4 inputs.conf:

For reference, your new REST input configuration can also be found in $SPLUNK_HOME/etc/apps/launcher/local/inputs.conf.

Step 5: Check Your Input is Working

[Screenshot: searching indexed tweets]

Using a Splunk search will allow you to check your data is being received and indexed:

sourcetype="tweets"

Note: you will only start to see tweets after your input polls a new Twitter event (we are not able to pull tweets historically).

See the latest tweet:

sourcetype="tweets" | fields text | head 1

Look at Tweet volume over time:

sourcetype="tweets" | timechart count(_raw)

Or count the number of retweets:

sourcetype="tweets" | stats count(retweet_count)

… you get the idea.

Start polling other accounts or searches to build up a bigger picture of what’s happening by repeating the steps above.

Step 6: Enrich Your Tweets

[Screenshot: tweet sentiment analysis]

Why not start by analysing the sentiment of your tweets? Splunker David Carasso has built a Sentiment App for Splunk that will help you do this.

Alternatively use the REST API Modular Input to bring other social media sources into Splunk. Foursquare, Facebook and LinkedIn are just a few others that spring to mind.

Let me know what mashups you dream up (and build!).

Splunking web-pages


Have you ever had a situation where you found information on a webpage that you wanted to get into Splunk? I recently did and I wrote a free Splunk app called Website Input that makes it easy for everyone to extract information from web-pages and get it into a Splunk instance.

The Problem

There are many cases where web-pages include data that would be useful in Splunk but there is no API to get it. In my case, I needed to diagnose some networking problems that I suspected were related to my DSL connection. My modem has lots of details about the state of the connection, but only within its web interface. It supports a syslog feed, but the feed doesn’t include most of this information. Thus, to get at this information, I need to pull it directly from the web interface.

Some other use cases might be:

  • Integrity analysis of a website (so that you could alert if something goes wrong or if the site is defaced)
  • Identify errors on pages (like PHP warnings)
  • Retrieve contextual information that would help you understand the relevance of events in Splunk (like correlating failures with weather conditions)

The Solution

I wrote an app that includes a modular input for getting data from web-pages. Basically, you tell the app what page you want to monitor and what data to get out of the page. It will retrieve the requested data so that it can be searched and reported in Splunk. You identify the data you want to obtain using a CSS selector. The app will then get all of the text from under the nodes matching the selector.

Getting the Data into Splunk

Getting the web-page data into Splunk is fairly easy once you know the URL and the CSS selector that you want to use. You can get the data into Splunk in four steps.

Step 1: identify the URL

You’ll need to identify the URL of the page containing the data. In my case, I wanted to get data from my DSL modem and the URL containing the data was at http://192.168.1.1/statsadsl.html:

adsl_details

Step 2: identify the data

After identifying the URL, you’ll next need to make a selector that matches the data you want to obtain. If you don’t know how to use CSS selectors, Google “jQuery selector” or “CSS selector” for some good places to start.

The selector indicates what parts of the page the app should import into Splunk. For each element the selector matches, the app will get the text from the matching node and the child-nodes. Consider the following example. Assume we are attempting to get information from a page containing the following HTML table:

<table>
	<tr>
		<td></td>
		<td>Downstream</td>
		<td>Upstream</td>
	</tr>
	<tr>
		<td>Rate:</td>
		<td>3008</td>
		<td>512</td>
	</tr>
	<tr>
		<td>Attainable Rate:</td>
		<td>5600</td>
		<td>1224</td>
	</tr>
</table>

The table would look something like this:

Downstream Upstream
Rate: 3008 512
Attainable Rate: 5600 1224

If I enter a selector of “table”, then the app will match once on the entire table and produce a single value for the match field like this:

1 Downstream Upstream Rate: 3008 512 Attainable Rate: 5600 1224

This could easily be parsed in Splunk, but it would be easier if the results were broken up a bit more. You can do this by changing the selector to make multiple matches. If I use a selector of “td”, then I will get one value per td node (that is, per cell):

1 Downstream
2 Upstream
3 Rate:
4 3008
5 512
6 Attainable Rate:
7 5600
8 1224

Note that the app will make a single field (called “match”) with values for each match. Empty strings will be ignored.

Matching “td” works OK, but I would prefer to keep the values next to their descriptions. Thus, I would rather use a “tr” selector, which produces one value per row. That would yield:

1 Downstream Upstream
2 Rate: 3008 512
3 Attainable Rate: 5600 1224

This will be very easy to parse in Splunk. Once you get the selector and URL, you will be ready to make the input.
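
As a quick sketch of what that parsing might look like (assuming the default web_input sourcetype and the row layout shown above), a rex in your search can lift the rates into their own fields:

sourcetype=web_input | rex field=_raw "Rate: (?<rate_downstream>\d+) (?<rate_upstream>\d+)" | table _time rate_downstream rate_upstream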

Step 3: make the input

Make sure you have the Website Input app installed. Once you do, you can make a new input by going to the Splunk manager page for Data Inputs and selecting “Web-pages”:

inputs

Click “Add new” to make a new instance:

new_input

The configuration is straightforward once you know what page you are looking at and what selector you want to use. In my case, I needed to authenticate to my DSL modem, so I provided credentials as well. Also, you will likely want to set the sourcetype manually, especially if you want to apply props and transforms to the data; otherwise, the data will default to the sourcetype “web_input”. Below is my completed input, which grabs the data every minute and assigns it the sourcetype adsl_modem:

completed_input

Once the input is made, you should see the data in Splunk by running a search. In my case, I searched for “sourcetype=adsl_modem”:

data

The data is present in Splunk and is searchable, but it isn’t parsed. That leads to the last step.

Step 4: parsing

Finally, you will likely want to create props and transforms to extract the relevant data into fields that you could include on dashboards. I want to get the value for “Super frame errors” since I have determined it indicates when my DSL connection is having problems.

I can use rex in a search to parse out the information. The following extracts the fields “super_frame_errors_downstream” and “super_frame_errors_upstream”:

sourcetype=adsl_modem | head 5| rex field=_raw "Super Frame Errors: (?<super_frame_errors_downstream>\d*) (?<super_frame_errors_upstream>\d*)"

This gets me the information that I wanted in the appropriate fields:

results_rex_parsed

You may want to have the extractions done in props/transforms so that you don’t have to add rex to every search that needs the data parsed. In my case, I did this by adding the following to props.conf:

[adsl_modem]
EXTRACT-super-frame-errors = "Super Frame Errors: (?<super_frame_errors_downstream>\d*) (?<super_frame_errors_upstream>\d*)"

With the data extracted, I could make a chart to illustrate the errors over time:

chart

Getting the app

If you want to use the app, go to apps.splunk.com and download it (it’s free). If you need help, ask a question on answers.splunk.com.

Limitations

The app currently only supports HTTP authentication, which means you cannot use it to capture data from web-pages that require you to authenticate via a web form (this might be supported in a later version). Also, be careful about pulling data from others’ websites without approval: some websites have terms of use that disallow web-scraping.

New support for authoring modular inputs in Node.js


Modular inputs allow you to teach Splunk Enterprise new ways to pull in events from internal systems, third party APIs or even devices. Modular Inputs extend Splunk Enterprise and are deployed on the Splunk Enterprise instance or on a forwarder.  In version 1.4.0 of the Splunk SDK for JavaScript we added support for creating modular inputs in Node.js!

In this post, I’ll show you how to create a modular input with Node.js that pulls commit data from GitHub into Splunk.

Why Node.js

Node.js is designed for I/O intensive workloads. It offers great support for streaming data into and out of a Node application in an asynchronous manner. It also has great support for JSON out of the box. Finally, Node.js has a huge ecosystem of packages available via npm that are at your disposal. An input pulls data from a source and then streams those results directly into a Splunk instance. This makes modular inputs a great fit for Node.js.

Getting started

You can get the Splunk SDK for JavaScript from npm (npm install splunk-sdk), from the Splunk Developer Portal, or by grabbing the source from our GitHub repo. You can find out more about the SDK here. The SDK includes two sample modular inputs: random numbers and GitHub commits. For the remainder of this post we’ll look at the GitHub example.

This input indexes all commits on the master branch of a GitHub repository using GitHub’s API. The example illustrates how to pull in data from an external source, as well as how to create checkpoints when you are polling periodically, in order to prevent duplicate events from being created.

Prerequisites

Installing the example

  1. Set the $SPLUNK_HOME environment variable to the root directory of your Splunk Enterprise instance.
  2. Copy the GitHub example from
    /splunk-sdk-javascript/examples/modularinputs/github_commits

    to

    $SPLUNK_HOME/etc/apps
  3. Open a command prompt or terminal window and go to the following directory:
    $SPLUNK_HOME/etc/apps/github_commits/bin/app
  4. Then type npm install; this installs the required Node modules, including the splunk-sdk itself and the github module.
  5. Restart Splunk Enterprise by typing the following into the command line:
    $SPLUNK_HOME/bin/splunk restart

Configuring the GitHub commits modular input example

Modular Inputs integrate with Splunk Enterprise, allowing Splunk Administrators to create new instances and provide necessary configuration right in the UI similar to other inputs in Splunk. To see this in action, follow these steps:

  1. From Splunk Home, click the Settings menu. Under Data, click Data inputs, and find “GitHub commits”, the input you just added. Click Add new on that row.
  2. Click Add new and fill in:
    • name (whatever name you want to give this input)
    • owner (the owner of the GitHub repository, this is a GitHub username or org name)
    • repository (the name of the GitHub repository)
    • (optional) token if using a private repository and/or to avoid GitHub’s API limits

    To get a GitHub API token, visit the GitHub settings page and make sure the repo and public_repo scopes are selected.

  3. Save your input, and navigate back to Splunk Home.
  4. Do a search for sourcetype=github_commits and you should see some events indexed; if your repository has a large number of commits, indexing them may take a few moments.

Analyzing GitHub commit data

Now that your GitHub repository’s commit data has been indexed by Splunk Enterprise, you can leverage the power of Splunk’s Search Processing Language to do interesting things with your data. Below are some example searches you can run:

  • Want to know who the top contributors are for this repository? Run this search:
    sourcetype="github_commits" source="github_commits://[your input name]" | stats count by author | sort count DESC
    

    JS-SDK-contributer-table

  • Want to see a graph of the repository’s commits over time? Run this search:
    sourcetype="github_commits" source="github_commits://[your input name]" | timechart count(sha) as "Number of commits"

    Then click the Visualization tab, and select Line from the drop-down of visualization types (Pie may already be selected).

Write your own modular input with the Splunk SDK for JavaScript

Adding a modular input to Splunk Enterprise is a two-step process: First, write a modular input script, and then package the script with several accompanying files and install it as a Splunk app.

Writing a modular input

A modular input will:

  1. Return an introspection scheme. The introspection scheme defines the behavior and endpoints of the script.  When Splunk Enterprise starts, it runs the input to determine the modular input’s behavior and configuration.
  2. Validate the script’s configuration (optional). Whenever a user creates or edits an input, Splunk Enterprise can call the input to validate the configuration.
  3. Stream events into Splunk. The input streams event data that can be indexed by Splunk Enterprise. Splunk Enterprise invokes the input and waits for it to stream events.

To create a modular input in Node.js, first require the splunk-sdk Node module. In our examples, we’ve also assigned the classes we’ll be using to variables, for convenience. At the very least, we recommend defining a ModularInputs variable as shown here:

var splunkjs        = require("splunk-sdk");
var ModularInputs   = splunkjs.ModularInputs;
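
The snippets that follow refer to Scheme, Argument, Event, Logger and utils directly, so the full example also pulls those classes off the ModularInputs object up front. A minimal sketch of those convenience assignments (the GitHub example additionally requires Node’s fs and path modules, the github package and splunkjs.Async; see the full source linked below):

// Convenience references to the modular input classes used throughout this post
var Scheme   = ModularInputs.Scheme;
var Argument = ModularInputs.Argument;
var Event    = ModularInputs.Event;
var Logger   = ModularInputs.Logger;
var utils    = ModularInputs.utils;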

The preceding three steps are accomplished as follows using the Splunk SDK for JavaScript:

  1. Return the introspection scheme: Define the getScheme method on the exports object.
  2. Validate the script’s configuration (optional): Define the validateInput method on the exports object. This is required if you set the scheme returned by getScheme to use external validation (that is, set Scheme.useExternalValidation to true).
  3. Stream events into Splunk: Define the streamEvents method on the exports object.

In addition, you must run the script by calling the ModularInputs.execute method, passing in the exports object you just configured along with the module object which contains the state of this script:

ModularInputs.execute(exports, module);

To see the full GitHub commits input source code, see here.

Woah. Let’s take a deeper dive into the code so we can understand what’s really going on.

The getScheme method

When Splunk Enterprise starts, it looks for all the modular inputs defined in its configuration and tries to run them with the argument --scheme. The scheme lets your input tell Splunk which arguments need to be provided for the input; these arguments are then used to populate the UI when a user creates an instance of the input. Splunk expects each modular input to print a description of itself in XML to stdout. The SDK’s modular input framework takes care of all the details of formatting the XML and printing it. You only need to implement a getScheme method that returns a new Scheme object, which makes your job much easier!

As mentioned earlier, we will be adding all methods to the exports object.

Let’s begin by defining getScheme, creating a new Scheme object, and setting its description:

exports.getScheme = function() {
        var scheme = new Scheme("GitHub Commits"); 
        scheme.description = "Streams events of commits in the specified GitHub repository (must be public, unless setting a token).";

For this scheme, the modular input will show up as “GitHub Commits” in Splunk.

Next, specify whether you want to use external validation or not by setting the useExternalValidation property (the default is true). If you set external validation to true without implementing the validateInput method on the exports object, the script will accept anything as valid. We want to make sure the GitHub repository exists, so we’ll define validateInput once we finish with getScheme.

       scheme.useExternalValidation = true;

If you set useSingleInstance to true (the default is false), Splunk will launch a single process executing the script, and that process will handle all instances of the modular input; you are then responsible for implementing the proper handling for all instances within the script. Setting useSingleInstance to false allows us to set an optional interval parameter, in seconds or as a cron schedule (available under more settings when creating an input).

      scheme.useSingleInstance = false;

The GitHub commits example has three required arguments (name, owner, repository) and one optional argument (token). Let’s recap what these are for:

  • name: The name of this modular input definition (ex: Splunk SDK for JavaScript)
  • owner: The GitHub organization or user that owns the repository (ex: splunk)
  • repository: The GitHub repository (ex: splunk-sdk-javascript), don’t forget to set the token argument if the repository is private
  • token: A GitHub access token with at least the repo and public_repo scopes enabled. To get an access token, see the steps outlined earlier in this post.

Now let’s see how these arguments are defined within the Scheme. We need to set the args property of the Scheme object we just created to an array of Argument objects:

      scheme.args = [
            new Argument({
                name: "owner",
                dataType: Argument.dataTypeString,
                description: "GitHub user or organization that created the repository.",
                requiredOnCreate: true,
                requiredOnEdit: false
            }),
            new Argument({
                name: "repository",
                dataType: Argument.dataTypeString,
                description: "Name of a public GitHub repository, owned by the specified owner.",
                requiredOnCreate: true,
                requiredOnEdit: false
            }),
            new Argument({
                name: "token",
                dataType: Argument.dataTypeString,
                description: "(Optional) A GitHub API access token. Required for private repositories (the token must have the 'repo' and 'public_repo' scopes enabled). Recommended to avoid GitHub's API limit, especially if setting an interval.",
                requiredOnCreate: false,
                requiredOnEdit: false
            })
        ];

Each Argument constructor takes a JavaScript object with the required property name and the following optional properties:

  • dataType: What kind of data is this argument? (Argument.dataTypeBoolean, Argument.dataTypeNumber, or Argument.dataTypeString)
  • description: A description for the user entering this argument (string)
  • requiredOnCreate: Is this a required argument? (boolean)
  • requiredOnEdit: Does a new value need to be specified when editing this input? (boolean)

After adding the arguments to the scheme, we return the scheme and close the function:

        return scheme;
    };

The validateInput method

The validateInput method is where the configuration of an input is validated, and it is only needed if you’ve set your modular input to use external validation. If validateInput calls the done callback without an error argument, the input is assumed to be valid; if it passes an error to done, Splunk is told that the configuration is not valid.

When you use external validation, after splunkd calls the modular input with the --scheme argument to get the scheme, it calls it again with the --validate-arguments argument for each instance of the modular input in its configuration files, feeding XML on stdin to the modular input to validate all enabled inputs. Splunk calls the modular input the same way again whenever the modular input’s configuration is changed.

In our GitHub Commits example, we’re using external validation since we want to make sure the repository is valid. Our validateInput method contains logic that uses the GitHub API to check that there is at least one commit on the master branch of the specified repository:

    exports.validateInput = function(definition, done) { 
        var owner = definition.parameters.owner;
        var repository = definition.parameters.repository;
        var token = definition.parameters.token;

        var GitHub = new GitHubAPI({version: "3.0.0"});

        try {
            if (token && token.length > 0) {
                GitHub.authenticate({
                    type: "oauth",
                    token: token
                });
            }

            GitHub.repos.getCommits({
                headers: {"User-Agent": SDK_UA_STRING},
                user: owner,
                repo: repository,
                per_page: 1,
                page: 1
            }, function (err, res) {
                if (err) {
                    done(err);
                }
                else {
                    if (res.message) {
                        done(new Error(res.message));
                    }
                    else if (res.length === 1 && res[0].hasOwnProperty("sha")) {
                        done();
                    }
                    else {
                        done(new Error("Expected only the latest commit, instead found " + res.length + " commits."));
                    }
                }
            });
        }
        catch (e) {
            done(e);
        }
    };

The streamEvents method

Here’s the best and most important part, streaming events!

The streamEvents method is where the event streaming happens. Events are streamed into stdout using an InputDefinition object as input that determines what events are streamed. In the case of the GitHub commits example, for each input, the arguments are retrieved before connecting to the GitHub API. Then, we go through each commit in the repository on the master branch.

Creating Events and Checkpointing

For each commit, we’ll check to see if we’ve already indexed it by looking in a checkpoint file. This is a file that Splunk allows us to create in order to track which data has been already processed so that we can prevent duplicates. If we have indexed the commit, we simply move on – we don’t want to have duplicate commit data in Splunk. If we haven’t indexed the commit we’ll create an Event object, set its properties, write the event using the EventWriter, then append the unique SHA for the commit to the checkpoint file. We will create a new checkpoint file for each input (in this case, each repository).

The getDisplayDate function is used to transform the date we get back from the GitHub API into a more readable format.
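
The real implementation lives in the full example linked above; purely as a sketch, a helper along these lines would do the job (assuming it only needs to reformat the ISO-8601 string GitHub returns):

// Sketch only -- the SDK example's getDisplayDate may format the date differently.
function getDisplayDate(date) {
    var d = new Date(date);
    var months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
    // e.g. "Oct 22, 2015 at 12:11"
    return months[d.getMonth()] + " " + d.getDate() + ", " + d.getFullYear() +
        " at " + d.getHours() + ":" + ("0" + d.getMinutes()).slice(-2);
}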

exports.streamEvents = function(name, singleInput, eventWriter, done) {
        // Get the checkpoint directory out of the modular input's metadata.
        var checkpointDir = this._inputDefinition.metadata["checkpoint_dir"];

        var owner = singleInput.owner;
        var repository = singleInput.repository;
        var token      = singleInput.token;

        var alreadyIndexed = 0;

        var GitHub = new GitHubAPI({version: "3.0.0"});

        if (token && token.length > 0) {
            GitHub.authenticate({
                type: "oauth",
                token: token
            });
        }

        var page = 1;
        var working = true;

        Async.whilst(
            function() {
                return working;
            },
            function(callback) {
                try {
                    GitHub.repos.getCommits({
                        headers: {"User-Agent": SDK_UA_STRING},
                        user: owner,
                        repo: repository,
                        per_page: 100,
                        page: page
                    }, function (err, res) {
                        if (err) {
                            callback(err);
                            return;
                        }

                        if (res.meta.link.indexOf("rel=\"next\"") < 0) {
                            working = false;
                        }
                        
                        var checkpointFilePath  = path.join(checkpointDir, owner + " " + repository + ".txt");
                        var checkpointFileNewContents = "";
                        var errorFound = false;

                        var checkpointFileContents = "";
                        try {
                            checkpointFileContents = utils.readFile("", checkpointFilePath);
                        }
                        catch (e) {
                            fs.appendFileSync(checkpointFilePath, "");
                        }

                        for (var i = 0; i < res.length && !errorFound; i++) {
                            var json = {
                                sha: res[i].sha,
                                api_url: res[i].url,
                                url: "https://github.com/" + owner + "/" + repository + "/commit/" + res[i].sha
                            };

                            if (checkpointFileContents.indexOf(res[i].sha + "\n") < 0) {
                                var commit = res[i].commit;

                                json.message = commit.message.replace(/(\n|\r)+/g, " ");
                                json.author = commit.author.name;
                                json.rawdate = commit.author.date;
                                json.displaydate = getDisplayDate(commit.author.date.replace("T|Z", " ").trim());

                                try {
                                    var event = new Event({
                                        stanza: repository,
                                        sourcetype: "github_commits",
                                        data: JSON.stringify(json),
                                        time: Date.parse(json.rawdate)
                                    });
                                    eventWriter.writeEvent(event);

                                    checkpointFileNewContents += res[i].sha + "\n";
                                    Logger.info(name, "Indexed a GitHub commit with sha: " + res[i].sha);
                                }
                                catch (e) {
                                    errorFound = true;
                                    working = false;
                                    Logger.error(name, e.message, eventWriter._err);
                                    fs.appendFileSync(checkpointFilePath, checkpointFileNewContents);

                                    done(e);
                                    return;
                                }
                            }
                            else {
                                alreadyIndexed++;
                            }
                        }

                        fs.appendFileSync(checkpointFilePath, checkpointFileNewContents);

                        if (alreadyIndexed > 0) {
                            Logger.info(name, "Skipped " + alreadyIndexed.toString() + " already indexed GitHub commits from " + owner + "/" + repository);
                        }

                        page++;
                        alreadyIndexed = 0;
                        callback();
                    });
                }
                catch (e) {
                    callback(e);
                }
            },
            function(err) {
                done(err);
            }
        );
    };

Logging (optional)

Logging is an optional feature we’ve included with modular inputs in the Splunk SDK for JavaScript.

It’s best practice for your modular input script to log diagnostic data to splunkd.log ($SPLUNK_HOME/var/log/splunk/splunkd.log). Use a Logger method to write log messages, which include a standard splunkd.log severity level (such as “DEBUG”, “WARN”, “ERROR” and so on) and a descriptive message. For instance, the following code is from the GitHub Commits streamEvents example, and logs a message if any GitHub commits have already been indexed:

if (alreadyIndexed > 0) {
    Logger.info(name, "Skipped " + alreadyIndexed.toString() + " already indexed GitHub commits from " + owner + "/" + repository);
}

Here we call the Logger.info method to log a message with the info severity; we also pass in the name argument, which the user set when creating the input.

That’s all the code you have to write to get started with modular inputs using the Splunk SDK for JavaScript!

Add the modular input to Splunk Enterprise

With your modular input completed, you’re ready to integrate it into Splunk Enterprise. First, package the input, and then install the modular input as a Splunk app.

Package the input

Files

Create the following files with the content indicated. Wherever you see modinput_name — whether in the file name or its contents — replace it with the name of your modular input JavaScript file. For example, if your script’s file name is github_commits.js, give the file indicated as modinput_name.cmd the name github_commits.cmd.

If you haven’t already, now is a good time to set your $SPLUNK_HOME environment variable.

We need to make sure all the names match up here, or Splunk will have problems recognizing your modular input.

modinput_name.cmd

@"%SPLUNK_HOME%"\bin\splunk cmd node "%~dp0\app\modinput_name.js" %*

modinput_name.sh

#!/bin/bash

current_dir=$(dirname "$0")
"$SPLUNK_HOME/bin/splunk" cmd node "$current_dir/app/modinput_name.js" $@

package.json

When creating this file, replace the values given with the corresponding values for your modular input. All values (except the splunk-sdk dependency, which should stay at ">=1.4.0") can be changed.

{
    "name": "modinput_name",
    "version": "0.0.1",
    "description": "My great modular input",
    "main": "modinput_name.js",
    "dependencies": {
        "splunk-sdk": ">=1.4.0"
    },
    "author": "Me"
}

app.conf

When creating this file, replace the values given with the corresponding values for your modular input:

  • The is_configured value determines whether the modular input is preconfigured on install, or whether the user should configure it.
  • The is_visible value determines whether the modular input is visible to the user in Splunk Web.

[install]
is_configured = 0

[ui]
is_visible = 0
label = My modular input

[launcher]
author=Me
description=My great modular input
version = 1.0

inputs.conf.spec

When creating this file, in addition to replacing modinput_name with the name of your modular input’s JavaScript file, do the following:

  • After the asterisk (*), type a description for your modular input.
  • Add any arguments to your modular input as shown. You must list every argument that you define in the getScheme method of your script.

The file should look something like this:

[github_commits://<name>]
*Generates events of GitHub commits from a specified repository.

owner = <value>
repository = <value>
token = <value>

File structure

Next, create a directory that corresponds to the name of your modular input script—for instance, “modinput_name” — in a location such as your Documents directory. (It can be anywhere; you’ll copy the directory over to your Splunk Enterprise directory at the end of this process.)

  1. Within this directory, create the following directory structure:
    modinput_name/
        bin/
            app/
        default/
        README/
  2. Copy your modular input script (modinput_name.js) and the files you created in the previous section so that your directory structure looks like this:
    modinput_name/
        bin/
            modinput_name.cmd
            modinput_name.sh
            app/
                package.json
                modinput_name.js
        default/
            app.conf
        README/
            inputs.conf.spec

Install the modular input

Before using your modular input as a data input for your Splunk Enterprise instance, you must first install it.

  1. Set the SPLUNK_HOME environment variable to the root directory of your Splunk Enterprise instance.
  2. Copy the directory you created in Package the script to the following directory:
    $SPLUNK_HOME/etc/apps/
  3. Open a command prompt or terminal window and go to the following directory, where modinput_name is the name of your modular input script:
    $SPLUNK_HOME/etc/apps/modinput_name/bin/app
  4. Type the following, and then press Enter or Return: npm install
  5. Restart Splunk Enterprise: From Splunk Home, click the Settings menu. Under System, click Server Controls. Click Restart Splunk; alternatively you can just run
    $SPLUNK_HOME/bin/splunk restart

    from command prompt or terminal.

Your modular input should now appear alongside the native Splunk inputs: from Splunk Home, click the Settings menu, then under Data, click Data inputs, and find the name of the modular input you just created.

In Summary

In this post you’ve seen how to create a modular input using the Splunk SDK for JavaScript.

Now you can use your Node.js skills to extend Splunk and pull data from any source, even GitHub!

Custom Message Handling and HEC Timestamps with the Kafka Modular Input


Custom Message Handling

If you are a follower of any of my Modular Inputs on Splunkbase, you may have seen that I employ a similar design pattern across all of my offerings: the ability to declaratively plug in your own parameterizable custom message handler to act upon the raw received data in some manner before it gets output to Splunk for indexing. This affords many benefits:

  • Many of my Modular Inputs are very cross-cutting in terms of the numerous potential types and formats of data they will encounter once they are let loose in the wild. I can’t think of every data scenario. An extensibility design allows users and the community to customize the data handling as they require by creating their own custom handlers, alleviating me from having to hard-code logic for every data scenario into the Modular Input.
  • Custom message handlers allow you to pre-process data, perhaps filtering out data you don’t require or performing some data pre-computations.
  • Custom formatting of the event that gets sent to Splunk for indexing, such as transforming some ghastly XML into a simpler JSON event.
  • Handling non-standard data types, i.e. binary data that you might receive over a messaging channel, such as compressed or encrypted data, or some proprietary binary protocol or charset encoding that Splunk can’t parse (like EBCDIC).

A couple of simple examples with the Kafka Modular Input

By default the Kafka Modular Input will use its own internal DefaultMessageHandler. This simply wraps Kafka messages in a KV string along with some other event meta-data.
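
Purely as an illustration (these field names are my assumption, not the handler’s documented output; check the DefaultMessageHandler source for the exact format), a wrapped event might look something like:

topic=customer_orders partition=2 offset=101345 body={"orderId":"1234","total":99.95}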

I ship the Modular Input with two other message handlers that you can declaratively plug in to your config stanza (screenshots shown below), which are more oriented towards JSON payloads being received from Kafka, a pretty common scenario in the Kafka world.

BodyOnlyMessageHandler

This will simply index the original raw event received from Kafka, such as a JSON string, with no additional meta fields added. You can use this handler with the STDOUT or HEC output channels.

Source code

Screen Shot 2015-10-22 at 12.11.43 PM

JSONBodyWithTimeExtraction

This handler is designed to be applied when you are using the HEC output option and the data received from Kafka is a JSON string. The HEC payload format allows you to specify a “time” field that will be applied to your indexed event as the event timestamp. This handler lets you declare which field in the JSON received from Kafka contains the time data; that value is extracted and added to the “time” field of the HEC payload sent to Splunk for indexing.
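
As a sketch, suppose the Kafka message is the JSON {"orderId":"1234","created":1445526000} and you configure "created" as the time field (both field names are hypothetical). The "time" and "event" keys are part of the standard HEC JSON envelope, so the payload posted to Splunk would look roughly like:

{
  "time": 1445526000,
  "event": {"orderId": "1234", "created": 1445526000}
}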

Source code

Screen Shot 2015-10-22 at 12.11.28 PM


Summary

So I hope these simple tips come in handy and get you thinking about the extensibility of my Modular Inputs. If you need to create a custom message handler, start by reading the docs for the respective Modular Input and looking at the source code examples on GitHub. And as always, reach out to me at any time!

How’s my driving?


It was the summer of 2014. I was well into my big data addiction thanks to Splunk. I was looking for a fix anywhere: Splunk my home? Splunk my computer usage? Splunk my health? There were so many data points out there for me to Splunk, but none of them would pay off like Splunking my driving…

Rocky Road

At the time, my commute was rough. Roads with drastically changing speeds, backups at hills and merges, and ultimately way more stop and go than I could stomach. But how bad was my commute? Was I having as bad an impact on the environment as I feared? Was my fuel efficiency much worse than my quiet cruise-controlled trips between New York and Boston? With my 2007, I really had no way to know…that is, until I learned about Automatic.

Automatic is a dongle that plugs into your car’s On Board Diagnostic (OBD) port. This port is hiding in plain sight – typically right under your steering wheel – but usually only your mechanic knows to look for it. The OBD port is what mechanics use to talk to the car during service. It turns out there’s a ton of information available through that USB-like port, and there’s a slew of new devices and dongles (of which Automatic is one) that expose that information to you. Combine that car info with what your phone is capturing of the world around you and you’ve got yourself a delicious data stew!

The Dataman Cometh

Out of the box, Automatic provides really cool details about every trip it records: fuel efficiency, route path, drastic accelerations, sudden brakes, periods of speeding, and so on. This is all accessible for review on your smart phone. A little bit hidden is that Automatic provides a dashboard where all of your trips can be seen in aggregate (http://dashboard.automatic.com). This dashboard allows you to see some basic aggregate statistics (sums and averages) for a selected time period of trips. I’m sure you see where I’m going with this…

I wanted more from this data. I had already been spoiled by the power of Splunk and I knew that if I could Splunk this data, I could do so much more. That’s when I noticed in the bottom right corner of the dashboard an export option. EUREKA! I immediately downloaded the resulting CSV of my trip data and got to work adding it to Splunk.

Getting Dizzy with Vizzies

I created each and every search that immediately came to mind: total miles driven, total time behind the wheel, total number of trips. Then I got into the visualizations: number of trips over time, fuel efficiency trends relative to instances of poor driving (sudden brakes, fast accelerations, or speeding), fuel prices over time, and location of the cheapest gas observed.

Basic aggregate analytics of Automatic data within Splunk.

As I started seeing my data represented visually, I was reminded of something from Dr. Tom LaGatta’s .conf2014 talk “Splunk for Data Science” (http://conf.splunk.com/speakers/2014.html#search=lagatta). He spoke about how the human brain processes data more effectively in a visual manner. I was now seeing exactly what he meant! My data was no longer a table, or a graphic for a single trip. Instead I was able to use my entire trip history to create visualizations that demonstrated my driving trends and the impact of my behaviors on fuel efficiency – things I would never have captured by looking at individual events, nor by reading a spreadsheet of numbers. With this new enthusiasm, I took Dr. LaGatta’s premise to the max and used a D3 Chord Diagram to represent my travel frequency from zip code to zip code.

Density of zip code transits represented with a Chord Diagram

Dynamic Data

After posting this collection of insights as an app on SplunkBase, I met a wise Splunk Ninja named Todd Gow. Todd showed me the ways of the force by using Damien Dallimore’s REST API Modular Input to pull my Automatic data perpetually, rather than solely through manual CSV exports from dashboard.automatic.com. Thanks to Todd and Damien’s coaching, as well as some support from the developers at Automatic, the app was now able to pull a user’s driving behavior automatically.

This was a game changer. Not only did it ease the import of data, but it meant that users of Automatic and Splunk could now create Splunk alerts based on driving performance! Alerts that highlight everything from erratic driving, to poor fuel efficiency, to notifications that the vehicle has gone outside its typical geographic area (stolen?). The possible insights were growing now that new data could be analyzed against historical performance.

“Where we’re going, we don’t need roads”

Since Back to the Future II’s prediction for 2015 was off, let’s conclude by covering some final insights achieved thanks to Splunk + Automatic:

Besides the metrics mentioned above, I was able to calculate some interesting details about my fuel efficiency. Automatic provides audio tones to alert the driver to speeding, drastic acceleration, and sudden braking. This produced a Pavlovian response: over time, I could feel myself adjusting my driving behavior. So this led me to wonder: has Automatic saved me money on fuel by changing my driving behavior? Thanks to Splunk, I was able to calculate an answer!

First I calculated my average fuel efficiency from when I started with Automatic until now. Since each trip recorded by Automatic includes fuel consumption and fuel prices (as calculated from prices local to my trip location), I was able to compare the fuel efficiency increase against the average fuel prices and provide an estimate of the money saved. Considering the cost of Automatic’s dongle, I concluded that I MADE money by spending less on fuel thanks to my improved driving behavior!
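
For the curious, a very rough sketch of that kind of calculation in SPL, with entirely hypothetical sourcetype and field names (adjust them to however your Automatic data is indexed), might start like:

sourcetype=automatic_trips | timechart span=1mon avg(mpg) as avg_mpg, avg(fuel_price_per_gallon) as avg_price

Comparing avg_mpg before and after you start driving with Automatic, multiplied out against the fuel consumed and the average price per gallon, gives the savings estimate.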

Fuel efficiency data from Automatic represented in Splunk

To learn about the search techniques to draw these, and other, conclusions, download the app from SplunkBase and check out the dashboards for yourself.

“Get Outta My Dreams, Get Into My Car”

With my Automatic driving data in Splunk, my mind won’t stop “racing” with new insights to implement. In addition to the alerts proposed in this post, I plan to provide insights into vehicle usage for wear and tear, route path details, stronger fuel economy calculations, and webhook incorporations.

Before any of that, I have to finish what has become a complete app re-write. I’ve been rewriting the data ingestion modular input both to simplify app configuration and to take advantage of the new Automatic API and its new field names.

Down the “road”, I hope to collaborate with the Common Information Model team here at Splunk to define a vehicle-based data model. That way, users of any vehicle-related data capture (such as Automatic) could take advantage of the Automatic app on SplunkBase and get the same insights, regardless of the data’s differences in field names and formats.

Of course, if you’ve found cool insights into your driving data, post it below and share your discoveries! Let us know how you used Splunk to make your “machine data accessible, usable and valuable to everyone.” Thanks for reading, drive safe, and happy Splunking!

