Computer networks are becoming increasingly complex, with more and more devices connected each day. Gaining network visibility is crucial to ensure your traffic flows smoothly and transit costs are kept low. In this mini-series I will show you how to set up sFlow sampling on Linux, aggregate the data, and finally present it in flashy graphs.

Part 1: Sampling sFlow

In this episode we will talk about what sFlow is, why it is useful, and how to sample it on Linux-based systems.

What is sFlow?

To quote Wikipedia:

sFlow, short for “sampled flow”, is an industry standard for packet export at Layer 2 of the OSI model. It provides a means for exporting truncated packets, together with interface counters for the purpose of network monitoring. Maintenance of the protocol is performed by the sFlow.org consortium, the authoritative source of the sFlow protocol specifications. The current version of sFlow is v5.

sFlow agents (devices that can sample traffic and generate sFlow data) collect data by sampling packets at a specified rate.

For example, a router might sample one in every 1000 packets that pass through it. For each sampled packet it collects the available metadata and sends it over the network to a specified collector (a device that receives sFlow data). In this scenario, the sampling rate would be 1000.
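To make the idea concrete, here is a minimal Python sketch of 1-in-N sampling and of how a collector can scale the samples back up. The traffic is made up for illustration, the function names are hypothetical, and picking packets at random is an assumption of this sketch rather than a description of any particular router.

```python
import random

def sample_packets(packet_sizes, sampling_rate, rng):
    """Pick each packet with probability 1/sampling_rate (a 1-in-N sampler)."""
    return [size for size in packet_sizes if rng.random() < 1.0 / sampling_rate]

def estimate_totals(sampled_sizes, sampling_rate):
    """Scale the sampled counts back up to estimate the real traffic."""
    est_packets = len(sampled_sizes) * sampling_rate
    est_bytes = sum(sampled_sizes) * sampling_rate
    return est_packets, est_bytes

rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical traffic: one million packets cycling through three sizes.
traffic = [(64, 576, 1500)[i % 3] for i in range(1_000_000)]

samples = sample_packets(traffic, 1000, rng)
est_packets, est_bytes = estimate_totals(samples, 1000)

print(f"sampled {len(samples)} of {len(traffic)} packets")
print(f"estimated packets: {est_packets} (actual {len(traffic)})")
print(f"estimated bytes:   {est_bytes} (actual {sum(traffic)})")
```

The estimates will not match the real totals exactly, but with enough traffic they land close to them, which is what makes sampling usable for monitoring.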

Why is sFlow sampling useful?

Any time you want to collect information about the data flowing through your network, you need to capture at least part of that traffic. There are two broad approaches:

  • Flow Analysis
  • Packet Analysis

Packet Analysis

Packet analysis works by capturing everything flowing through a network interface. Every packet (possibly limited to those matching a capture filter) is captured in full, including the data it carries. Utilities like tcpdump and Wireshark work on this basis.

The benefit of packet analysis is that you get all the information contained in the packets, which helps when troubleshooting misconfigurations, network attacks and similar problems.

This comes at a cost, though. Since the amount of data is massive, the compute power required to collect, process and analyze the traffic is huge, and so is the storage needed to keep packet captures for later use.

Flow Analysis

Flow analysis gives insight into the data flowing through the network while keeping the overhead and the performance hit low.

Unlike with packet-based analysis, the network device captures only a summary of the data (metadata) flowing through it. To put that in technical terms, a flow agent only cares about packet headers. To lower the overhead even more, not every packet is sampled. In a gigabit network, sampling rates of 400 to 1000 are common, which means that the device only processes one in every 400 to 1000 packets.
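As a rough illustration of "only caring about headers", the Python sketch below pulls a few metadata fields out of the first 20 bytes of a raw IPv4 packet and ignores the payload. The function name and the example packet are hypothetical; a real agent exports more fields than this and also handles other protocols.

```python
import ipaddress
import struct

IPV4_HEADER = "!BBHHHBBH4s4s"  # fixed 20-byte part of an IPv4 header

def summarize_ipv4(packet: bytes) -> dict:
    """Extract flow metadata from the start of a raw IPv4 packet."""
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack(IPV4_HEADER, packet[:20])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return {
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        "protocol": protocol,    # 6 = TCP, 17 = UDP
        "length": total_length,  # size of the whole packet in bytes
    }

# Hypothetical packet: a 20-byte header followed by 1000 bytes of payload.
payload = b"x" * 1000
example = struct.pack(
    IPV4_HEADER, 0x45, 0, 20 + len(payload), 0, 0, 64, 6, 0,
    ipaddress.IPv4Address("192.168.1.10").packed,
    ipaddress.IPv4Address("192.168.1.37").packed,
) + payload

summary = summarize_ipv4(example)
print(summary)
```

The summary is a handful of small values no matter how large the payload is, which is where the low overhead comes from.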

Even though this may seem insufficient, you can still use flow analysis to learn a lot about what data flows through your network, how much of it there is, and where it goes. The amount of data to be stored is small, which makes long-term storage (for later analysis and auditing purposes) very cost-effective.

Sampling sFlow on Linux

Host sFlow, also known as hsflowd, is an sFlow agent for Linux, Windows, FreeBSD and other operating systems.

While you can configure it to sample data using a few different methods, today we will focus on making it work using NFLOG.

Here’s an example of what a configuration file can look like:

# cat /etc/hsflowd.conf
sflow {
  # Do not use DNS-based configuration (DNS-SD)
  DNSSD = off
  # How often counters are polled and exported (in seconds)
  polling = 20

  # Sampling rate settings
  sampling = 400
  sampling.100M = 400
  sampling.1G = 400
  sampling.10G = 400
  sampling.40G = 400

  # Which collectors the data should be sent to
  collector { ip=127.0.0.1 udpport=6343 }
  collector { ip=192.168.1.37 udpport=6343 }

  # How the data should be gathered
  nflog { group = 5  probability = 0.0025 }
}

In this configuration, hsflowd listens on NFLOG group 5 and expects the packets it receives there to have been selected with a probability of 0.0025 (1/sampling_rate = 1/400).
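Since the probability has to be kept in step with the sampling rate by hand, a small sanity check can help. The Python sketch below reads the configuration text and compares the two values. It is a hypothetical helper, not part of hsflowd, and it assumes the convention used in this article that probability should equal 1/sampling.

```python
import re

def nflog_probability_matches(conf_text: str, tolerance: float = 1e-9) -> bool:
    """Return True if the nflog probability equals 1/sampling in conf_text."""
    # Matches "sampling = 400" but not "sampling.1G = 400".
    sampling = re.search(r"^\s*sampling\s*=\s*(\d+)", conf_text, re.MULTILINE)
    probability = re.search(r"probability\s*=\s*([0-9.]+)", conf_text)
    if sampling is None or probability is None:
        raise ValueError("sampling or probability setting not found")
    expected = 1.0 / int(sampling.group(1))
    return abs(float(probability.group(1)) - expected) <= tolerance

example_conf = """
sflow {
  sampling = 400
  sampling.1G = 400
  nflog { group = 5  probability = 0.0025 }
}
"""
print(nflog_probability_matches(example_conf))  # True
```

To check a real file, the same function could be called with the contents of /etc/hsflowd.conf.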

Now we need to send the sampled traffic to NFLOG. This can be done using iptables:

iptables -A FORWARD -o br0 -m statistic --mode random --probability 0.00249999994 -j NFLOG --nflog-prefix SFLOW --nflog-group 5
iptables -A FORWARD -i br0 -m statistic --mode random --probability 0.00249999994 -j NFLOG --nflog-prefix SFLOW --nflog-group 5

We’re telling iptables to randomly select packets being forwarded through the interface br0 in either direction, with a probability of roughly 1/sampling_rate = 1/400, and to pass them to NFLOG group 5.
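The odd-looking 0.00249999994 in the rules is very likely just 0.0025 after a round trip through iptables. As far as I know, the statistic match stores the probability as an integer fraction of 2^31 and prints it back with 11 decimal places. The Python sketch below reproduces the value under that assumption; the function name is made up for illustration.

```python
def iptables_probability(probability: float) -> str:
    """Mimic how the statistic match is assumed to store and print a probability."""
    scale = 0x80000000                   # 2**31, the assumed fixed-point scale
    stored = round(probability * scale)  # integer the rule would actually hold
    return f"{stored / scale:.11f}"      # printed back with 11 decimals

sampling_rate = 400
shown = iptables_probability(1 / sampling_rate)
print(shown)  # 0.00249999994
```

In practice this means you can type --probability 0.0025 when adding the rule and still see 0.00249999994 when listing it.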

Host sFlow should now be generating sFlow datagrams and sending them to the specified collectors. You can verify this with tcpdump, for example by running tcpdump -ni any udp port 6343 on a collector.
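If you would rather see what is inside a datagram, the Python sketch below decodes the fixed header at the start of an sFlow version 5 datagram (version, agent address, sub-agent ID, sequence number, uptime and number of samples). The function names are hypothetical and the samples themselves are not decoded, so treat it as a quick check rather than a collector.

```python
import ipaddress
import socket
import struct

def parse_sflow_header(datagram: bytes) -> dict:
    """Decode the header of an sFlow v5 datagram; samples are left untouched."""
    version, addr_type = struct.unpack_from("!II", datagram, 0)
    if version != 5:
        raise ValueError(f"unexpected sFlow version {version}")
    if addr_type == 1:      # agent address is IPv4
        addr_len = 4
    elif addr_type == 2:    # agent address is IPv6
        addr_len = 16
    else:
        raise ValueError(f"unknown agent address type {addr_type}")
    agent = ipaddress.ip_address(datagram[8:8 + addr_len])
    sub_agent, sequence, uptime_ms, num_samples = struct.unpack_from(
        "!IIII", datagram, 8 + addr_len)
    return {
        "agent": str(agent),
        "sub_agent": sub_agent,
        "sequence": sequence,
        "uptime_ms": uptime_ms,
        "num_samples": num_samples,
    }

def listen_once(port: int = 6343, bind: str = "0.0.0.0"):
    """Wait for a single datagram on the sFlow port and return its header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind, port))
        datagram, sender = sock.recvfrom(65535)
    return sender, parse_sflow_header(datagram)
```

Calling listen_once() on a collector host blocks until the first datagram arrives. It will fail to bind if another collector is already listening on the same port, in which case tcpdump remains the easier check.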