
Traces Collection Framework


Please note that the current version of this document is not final. Several required corrections and modifications will be made in the near future.


Introduction

Because traffic flowing between testbed elements may not be directly under the researcher's control, the problem of remote traffic trace collection and analysis arises. This framework aims to provide a convenient toolkit for exposing inter-component communication for initial setup and “debugging” purposes at an early stage of an experiment. The framework can also facilitate experiments that study the functionality and compatibility of various components within a testbed. Finally, it addresses the problem of storing and sharing the results of an experiment after its completion. The framework provides means for real-time observation of interactions, including integration with Wireshark, a widely used network protocol analyzer that is the de facto standard for analyzing traffic flows on the local machine but does not yet have native support for displaying network activity from the interfaces of remote nodes.

Framework structure

Figure 1: Framework workflow

As shown in Figure 1, the framework has a distributed structure with two main components: a Traces Collection Client and a Traces Collection Server. The Traces Collection Client (henceforth TCC) is a software component running on the testbed nodes from which traffic traces are to be collected. It is responsible for intercepting packets and transmitting them to the Server in an appropriate format. The Traces Collection Server (TCS) is the central location where trace files are collected, stored and managed, and from which traffic captures can be exposed to a user either concurrently with the capturing process or afterwards in “offline” mode. In the scenario illustrated in Figure 1, Traces Collection Clients are deployed on resources A and B. This allows a user to observe packets that are sent between components A, B and C. Because no TCCs are deployed on resources C and D, a user has no platform-provided means to inspect what is happening on the interfaces between them. In this case the user has to either establish their own tracing infrastructure or rely on the logging mechanisms of the testbed resources.
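The visibility rule described above (a link can be observed when at least one of its endpoints runs a TCC) can be sketched as follows. The topology and the function name `observable_links` are hypothetical, chosen only to mirror the A, B, C, D scenario; the actual topology is defined by the figure.

```python
def observable_links(links, tcc_nodes):
    """Return the links on which at least one endpoint runs a TCC."""
    tcc_nodes = set(tcc_nodes)
    return [link for link in links if tcc_nodes & set(link)]


# Hypothetical topology, assumed for illustration only.
links = [("A", "B"), ("B", "C"), ("C", "D")]

# TCCs are deployed on A and B, as in the scenario described in the text.
visible = observable_links(links, ["A", "B"])
hidden = [link for link in links if link not in visible]

print("observable:", visible)
print("not observable:", hidden)
```

Under these assumptions the link between C and D is the only one left without platform-provided tracing.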

A Traces Collection Client consists of a capturing tool (tshark) and a TracesTransmitter, a program that relays traffic captures to the central server. Capturing is performed by tshark probes, which record traffic in the most general trace format, pcap. They support libpcap capture filters as well as Wireshark display filters. The operations specified on the T1 interface are used for deploying and configuring TCCs through the PTM. More than one Traces Collection Client may be installed in order to enable fine-grained filter setup, which facilitates capturing and reduces the amount of irrelevant data collected.
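As a rough illustration of the capturing side, the sketch below assembles a tshark command line that writes pcap data to standard output, where a relaying program could read it. The function names and the example filter are assumptions made for this sketch. It does not show how the TracesTransmitter itself invokes tshark.

```python
import subprocess


def build_tshark_command(interface, capture_filter=None):
    """Build a tshark command that writes pcap data to standard output.

    -i selects the interface, -F pcap selects the output file format,
    -w - writes the capture to stdout, -f sets a libpcap capture filter.
    """
    cmd = ["tshark", "-i", interface, "-F", "pcap", "-w", "-"]
    if capture_filter:
        cmd += ["-f", capture_filter]
    return cmd


def start_capture(cmd):
    """Start the capture; the caller reads pcap bytes from .stdout."""
    return subprocess.Popen(cmd, stdout=subprocess.PIPE)


# Example values (hypothetical): capture SIP signalling on eth0.
command = build_tshark_command("eth0", "port 5060")
print(" ".join(command))
```

A relaying program would call `start_capture(command)` and forward whatever it reads from the process output to the server. A narrow capture filter per client keeps irrelevant traffic out of the trace.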

Features

A set of parameters can be configured for every TCC individually.

The solution supports real-time observation. To simplify the routines required for obtaining the desired capture data and directing it to Wireshark, users are provided with execution scripts that wrap several important preparation steps. Two solutions have been developed: one for POSIX-compliant operating systems, the other for Windows. The scripts can be downloaded from a web interface. With minimal browser configuration they can be executed immediately, so that a remote observation can be started with a single click.

Executing the shell script in essence triggers three actions:

  1. Creation of a named pipe with a unique name
  2. Start of Wireshark, configured to wait for streaming input from that pipe
  3. An HTTP GET request to the server that retrieves the traces for the selected resource and forwards this data into the previously created named pipe

Linux: script_template

#!/bin/bash
# Find a pipe name that is not already in use.
echo "Checking for pipe collisions...."
count=0
name=/tmp/wireshark$count
echo "Start name: $name"

while [ -e "$name" ]
do
	count=$(( count + 1 ))
	name=/tmp/wireshark$count
	echo "Name: $name"
done

echo "Creating pipe $name"
mkfifo "$name"

# Start Wireshark reading from the pipe; -k begins capturing immediately.
echo "Opening wireshark..."
wireshark -k -S -i "$name" &

# Stream the trace into the pipe until the server closes the connection.
echo "Starting download process...."
wget -qO - "%%STREAMER_ADDRESS%%?trace=%%TRACE_ID%%" > "$name"

echo "Removing pipe..."
rm "$name"
echo "Removed."
echo "Done"

This script essentially consists of three main subroutines. The first one (lines X-XX) creates a named pipe with an automatically generated name, making sure that the name of the new pipe does not coincide with the name of a pipe already in use. This is important because a user is very likely to want to observe network activity from several nodes at once, or even from the same node with non-overlapping filter rules, and will therefore have several Wireshark windows open simultaneously. The second subroutine (lines X-XX) opens the protocol analyzer (Wireshark) with the necessary flags and the name of the pipe from which it should expect incoming data in pcap format. The last part (lines X-XX) uses the wget utility, which is included by default in most Linux distributions. It issues an HTTP GET request for the trace file and directs the output into the pipe created by the first subroutine. Importantly, wget does not stop the download when the Server has finished transmitting all currently available data, but only when the underlying connection is closed by the server. This allows the script to receive trace updates asynchronously and thus enables live observation. As soon as new packets are captured on a remote node, they are delivered to the TCS and immediately sent to the user. Because this data is forwarded to Wireshark through a pipe, Wireshark automatically recognizes the changes and refreshes its GUI to present the new data. In the listing above, %%STREAMER_ADDRESS%% and %%TRACE_ID%% are placeholders that are substituted during the script generation phase with the concrete IP address of a TCS and the id of a specific trace file, respectively.
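The two ideas described above, picking a collision-free pipe name and copying a stream until the sender closes it, can also be sketched in Python for POSIX systems. This is only an illustrative sketch with hypothetical function names. Unlike the shell script, it relies on `os.mkfifo` failing when the name is taken, which avoids the gap between checking for a name and creating the pipe.

```python
import io
import os
import stat
import tempfile


def create_unique_fifo(directory, prefix="wireshark"):
    """Create a named pipe under the first free name prefix0, prefix1, ..."""
    count = 0
    while True:
        path = os.path.join(directory, "%s%d" % (prefix, count))
        try:
            os.mkfifo(path)  # raises FileExistsError if the name is taken
            return path
        except FileExistsError:
            count += 1


def relay(source, sink, chunk_size=4096):
    """Copy chunks from source to sink until source reports end of stream."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the sender closed the stream
            return total
        sink.write(chunk)
        total += len(chunk)


# Demonstration in a temporary directory instead of /tmp.
demo_dir = tempfile.mkdtemp()
first = create_unique_fifo(demo_dir)
second = create_unique_fifo(demo_dir)
first_is_fifo = stat.S_ISFIFO(os.stat(first).st_mode)
os.remove(first)
os.remove(second)
os.rmdir(demo_dir)

# In-memory stand-ins for the HTTP response and the pipe.
sink = io.BytesIO()
copied = relay(io.BytesIO(b"x" * 10000), sink)
print(os.path.basename(first), os.path.basename(second), copied)
```

In a real client, `source` would be the open HTTP response and `sink` the pipe opened for writing, so the copy loop ends only when the server closes the connection.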

Developing a solution for Windows is more challenging because several useful tools used in the preceding case are absent. First, wget is not included in Windows distributions by default. Second, named pipes are not supported at the shell script level and can only be used through a native API. At the same time, burdening the user with additional software installation was undesirable. Therefore, a decision was made to develop a Python program that takes care of managing the named pipes and obtaining the traces, namely the routines that cannot be executed purely from Windows batch scripts. The content of this program is presented in the following listing:

Pipe.py:

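import os
import subprocess
import sys
import time

import pywintypes
import win32file
import win32pipe

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

# Assumption: the arguments match win_script.template, i.e. the Wireshark
# installation directory followed by the trace URL. The original listing
# does not show how they are read.
wireshark_dir, trace_url = sys.argv[1], sys.argv[2]

print(os.getcwd())

# Part 1: create a named pipe under the first free name.
count = 0
startName = r'\\.\pipe\wireshark'

while True:
    pipeName = startName + str(count)
    print(pipeName)
    try:
        p = win32pipe.CreateNamedPipe(
            pipeName,
            win32pipe.PIPE_ACCESS_OUTBOUND,
            win32pipe.PIPE_TYPE_MESSAGE | win32pipe.PIPE_WAIT,
            1, 65536, 65536,
            300,
            None)
        break
    except pywintypes.error:
        count = count + 1
        print("This pipe is busy, try another one")

# Part 2: start Wireshark on the pipe and wait until it connects.
# Assumption: the launch command is missing from the original listing;
# the -k and -i flags are taken from the Linux script.
subprocess.Popen([os.path.join(wireshark_dir, 'wireshark.exe'),
                  '-k', '-i', pipeName])
win32pipe.ConnectNamedPipe(p, None)

# Part 3: request the trace over HTTP and copy it into the pipe.
# Assumption: cf was undefined in the original listing; here it is taken
# to be the HTTP response for the trace URL.
cf = urlopen(trace_url)
delay = 0

while True:
    data = cf.read(10)
    if not data:
        # Nothing received: retry once per second, stop after 20 empty reads.
        print(delay)
        delay += 1
        time.sleep(1)
        if delay > 20:
            print('stop...')
            break
    else:
        delay = 0
        win32file.WriteFile(p, data)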

Analogous to the script from Listing X, the presented program also consists of three parts. Lines X to X manage the creation of uniquely named pipes using the win32pipe (LINK) Python library. On line X Wireshark is opened and connected to the created named pipe. The third part (lines X - XX) issues an HTTP GET request to the TracesStreamer on the TCS and writes the obtained data into the pipe. To allow the program to be executed on machines without a Python execution environment, it has been converted into a binary Windows executable, provided along with all required libraries in an archive called Traces4Win. This reduces the required setup to the following two simple steps:

  • Downloading and unpacking Traces4Win
  • Setting two environment variables. The first, called traces, has to point to the folder where Traces4Win has been unpacked. The second, called wireshark, has to point to the directory where Wireshark is installed.

The scripts that are downloaded and executed by the user for every observation basically consist of one execution command, in which %traces% and %wireshark% are automatically substituted by the respective environment variables. The template for these scripts is presented in Listing X:

Windows: win_script.template

echo Please, check the correctness of the following paths: 
echo Traces4win: %traces%
echo Wireshark: %wireshark%
"%traces%\trace.exe" "%wireshark%" %%STREAMER_ADDRESS%%?trace=%%RESOURCE_ID%% 


%%STREAMER_ADDRESS%% and %%RESOURCE_ID%% are replaced on the server side in exactly the same way as described previously for the Linux scripts. The developed solution thus automates and simplifies the procedure of opening traces in a network protocol analyzer, relieving the user of tedious preparation routines.
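The server-side substitution of the %%...%% placeholders could look roughly like the following sketch. The function name `render_script`, the address and the id are hypothetical examples. The way the TCS actually generates the scripts is not described here.

```python
import re

# Matches placeholders such as %%STREAMER_ADDRESS%% but not %traces%.
PLACEHOLDER = re.compile(r"%%([A-Z_]+)%%")


def render_script(template, values):
    """Replace every %%NAME%% in template with values["NAME"]."""
    def substitute(match):
        # A missing name raises KeyError instead of leaving a broken script.
        return values[match.group(1)]
    return PLACEHOLDER.sub(substitute, template)


# Last line of the Windows template shown above.
template = r'"%traces%\trace.exe" "%wireshark%" %%STREAMER_ADDRESS%%?trace=%%RESOURCE_ID%%'

# Hypothetical values, assumed for illustration only.
rendered = render_script(template, {
    "STREAMER_ADDRESS": "http://192.0.2.10/stream",
    "RESOURCE_ID": "42",
})
print(rendered)
```

The single-percent environment variables are left untouched, so the Windows shell can still expand them when the script runs.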

When dissecting packets and analyzing their content in depth is less important than being able to register that a specific type of interaction (SIP INVITE, DNS request, etc.) took place between specific peers, TCCs can be configured to collect the decoded output of tshark. This allows display filters to be applied already on the testbed resource side, which in turn decreases the amount of transmitted information. It also allows information about the network activity of remote nodes to be shown in any program capable of displaying textual data. An AJAX-based solution for displaying this kind of real-time capture in a browser has been developed (see Figure 2).

Figure 2: Browser integration
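One way a browser view of decoded text lines can be kept up to date is offset-based polling: the client remembers how many lines it has already seen and asks only for the rest. The sketch below illustrates that idea with a hypothetical `TextTraceBuffer` class and placeholder lines. It is an assumption about how such a view could work, not a description of the AJAX solution itself.

```python
class TextTraceBuffer:
    """Holds decoded trace lines and serves the ones a client has not seen."""

    def __init__(self):
        self._lines = []

    def append(self, line):
        """Store one decoded line as received from a TCC."""
        self._lines.append(line)

    def since(self, offset):
        """Return (new_lines, next_offset) for a client that has seen offset lines."""
        offset = max(offset, 0)
        new_lines = self._lines[offset:]
        return new_lines, offset + len(new_lines)


# Placeholder lines; real content would be decoded tshark output.
buffer = TextTraceBuffer()
buffer.append("line 1")
buffer.append("line 2")

first_batch, offset = buffer.since(0)        # initial page load
buffer.append("line 3")                      # a new packet arrives
second_batch, offset = buffer.since(offset)  # next poll returns only the new line
print(first_batch, second_batch, offset)
```

Each poll transfers only the lines added since the previous one, which keeps the browser updates small.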

The results of an experiment (generally raw packet traces) can be stored in a centralized location, the Traces Collection Server. This resource can remain persistent for longer than the rest of the booked testbed elements. All other resources can therefore be released without the need to download a considerable amount of data onto the user's hard drive. Results can thus be stored in a “cloud” for further analysis and processing. The storage space to be provided for saving the traces has to be estimated separately. This estimate will either limit trace collection to pure setup and debugging (mainly real-time observations) or make it possible to store the complete flow of an experiment.
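A first-order storage estimate can be derived from the pcap file layout, which has a 24-byte file header and a 16-byte record header in front of every captured packet. The sketch below applies this to an assumed traffic profile. The packet rate, average packet size and duration are made-up example values, not measurements.

```python
PCAP_FILE_HEADER_BYTES = 24    # written once per pcap file
PCAP_RECORD_HEADER_BYTES = 16  # written before every captured packet


def estimate_pcap_bytes(packets_per_second, avg_packet_bytes, duration_seconds):
    """Estimate the size of one pcap file for a steady traffic profile."""
    packets = packets_per_second * duration_seconds
    per_packet = PCAP_RECORD_HEADER_BYTES + avg_packet_bytes
    return PCAP_FILE_HEADER_BYTES + packets * per_packet


# Assumed profile: 1000 packets/s of 500 bytes each, captured for one hour.
estimate = estimate_pcap_bytes(1000, 500, 3600)
print("%d bytes (about %.2f GB)" % (estimate, estimate / 1e9))
```

Multiplying such an estimate by the number of TCCs and the planned retention period gives a rough upper bound for the space the server has to provide. Capture filters reduce it accordingly.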


Figure 3: Functional Decomposition
