type="application/json">{"props":{"pageProps":{"acceptedDate":"2011-05-25T00:00:00+01:00","authors":[{"name":"Finamore, Alessandro"},{"name":"Mellia, Marco"},{"name":"Meo, Michela"},{"name":"Munafo, M."},{"name":"Rossi, Dario Giacomo"}],"contributors":[],"createdDate":"2013-07-10T14:52:25+01:00","dataProvider":{"id":351,"name":"PORTO Publications Open Repository TOrino","url":"https://api.core.ac.uk/v3/data-providers/351","logo":"https://api.core.ac.uk/data-providers/351/logo"},"depositedDate":"2011-05-01T00:00:00+01:00","documentType":"research","doi":"10.1109/MNET.2011.5772055","downloadUrl":"https://core.ac.uk/download/pdf/11424131.pdf","fullText":"110-year Experience of Internet Traffic Monitoring\nwith Tstat\nA. Finamore M. Mellia M. Meo M. M. Munafo` D. Rossi\n1Politecnico di Torino 2TELECOM ParisTech\nemail: {lastname@tlc.polito.it} email: dario.rossi@enst.fr\nAbstract—Network monitoring has always played a key role in\nunderstanding telecommunication networks since the pioneering\ntime of the Internet. Today, monitoring traffic has become a\nkey element to characterize network usage and users’ activi-\nties, to understand how complex applications work, to identify\nanomalous or malicious behaviors. In this paper, we present our\nexperience in engineering and deploying Tstat, a free open source\npassive monitoring tool that has been developed in the past ten\nyears. Started as a scalable tool to continuously monitor packets\nthat flow on a link, Tstat has evolved into a complex application\nthat gives to network researchers and operators the possibility to\nderive extended and complex measurements via advanced traffic\nclassifiers. After discussing Tstat capabilities and internal design,\nwe present some examples of measurements collected deploying\nTstat at the edge of several ISP networks in the past years.\nWe then discuss the scalability issues that software based tools\nhave to cope with when deployed in real networks, showing the\nimportance of properly identifying bottlenecks.\nI. INTRODUCTION\nSince the Internet childhood, network monitoring has played\na vital role in network management, performance analysis and\ndiagnosis. Nowadays, with the increased complexity of the\nInternet infrastructure, the applications and services, this role\nhas become more crucial than ever. Over the years, a number\nof methodologies and tools have been engineered to assist\nthe daily routines of traffic monitoring and diagnosis and to\nunderstand the network performance and users’ behavior [1].\nTo analyze a system, researchers can follow experimental\nscience principles and devise controlled experiments to induce\nand measure cause-effect relationships, or, observational sci-\nence principles and, avoiding artificial interference, study the\nunperturbed system. In the specific field of network traffic\nmeasurement, the above two disciplines are referred to as\nactive and passive measurements, respectively. The active\napproach aims at interfering with the network to induce a\nmeasurable effect, which is the goal of the measurement\nitself. Active approaches generate traffic, e.g, by injecting\nspecifically crafted probe packets or alter the network state,\ne.g., by enforcing artificial packet loss. A number of Internet\nmonitoring tools are based on active probing, ranging from\nsimple operation management or network tomography via\n“ping” or “traceroute”, to more complex delay and capacity\nestimation via “capprobe” or “pathchar”. 
Finally, large and controlled testbeds can easily be set up using tools like "netem" or "dummynet". In the passive approach, pure observations are performed by means of dedicated tools, named "sniffers" by the Internet metrology community, that simply observe and analyze the traffic flowing on links. Several passive measurement tools are available. Some tools, such as "tcpdump" or "Wireshark", are designed to let researchers interactively analyze the captured packets. Other tools are instead automated, so that human interaction is minimized; examples are the flow-level monitoring tool "NetFlow", intrusion detection systems like "Snort" or "Bro", and the traffic classification tool "CoralReef". A comprehensive list of both active and passive tools can be found in [1].

Tstat is an example of an automated tool for passive monitoring. It has been developed by the networking research group at Politecnico di Torino since 2000 [2]. Tstat offers live and scalable traffic monitoring up to Gb/s speeds using off-the-shelf hardware. It implements traffic classification capabilities, including advanced behavioral classifiers [3], while at the same time offering performance characterization of both network usage and users' activities [4]. After more than ten years of development, Tstat has become a versatile and scalable application, used by several researchers and network operators worldwide. In this paper, we report our experience with Tstat development and use. As a case study, we illustrate the traffic evolution observed during the last year at different vantage points in Europe, and discuss some issues about the feasibility of Internet traffic monitoring with common PCs that can help researchers avoid common pitfalls that we have faced in the past.

II. TSTAT OVERVIEW

[Fig. 1. Tstat monitoring probe setup (a) and analysis workflow (b).]

Tstat started as an evolution of tcptrace [5], which was developed to track and analyze individual TCP flows, offering detailed statistics about each flow. Tstat's initial design objective was to automate the collection of TCP statistics of traffic aggregates, adding real-time traffic monitoring features. Over the years, Tstat evolved into a more complex tool offering rich statistics and functions. Developed in ANSI C for efficiency, Tstat is today an open source tool that allows sophisticated multi-gigabit-per-second traffic analysis to be run live using common hardware. Tstat's design is highly flexible, with several plugin modules offering different capabilities that are briefly described in the following. In addition, plugins can be activated and deactivated on the fly, without interrupting the monitoring. Being a passive tool, its typical usage scenario is live monitoring of Internet links, in which all flowing packets are observed. Fig. 1(a) sketches the common setup for a probe running Tstat: on the left there is the network to monitor, e.g., a campus network, connected to the Internet through an access link that carries all packets originated from and destined to terminals in the monitored network. The Tstat probe observes the packets and extracts the desired information. Note that this scenario is common to a wide set of passive monitoring tools; therefore, the problems faced when designing Tstat are common to other tools as well.
A. Monitored objects

The basic objects that passive monitoring tools consider are the IP packets transmitted on the monitored link. Flows are then typically defined by grouping, according to some rules, all packets identified by the same flowID that have been observed in a given time interval. A common choice is to consider flowID = (ipProtoType, ipSrcAddr, srcPort, ipDstAddr, dstPort), so that TCP and UDP flows are considered. For example, in the case of TCP, a new flow start is commonly identified when the TCP three-way handshake is observed; similarly, its end is triggered when either the proper TCP connection tear-down is seen, or no packets have been observed for some time. Similarly, in the case of UDP, a new flow is identified when the first packet is observed, and it is ended after an idle time.

As Internet conversations are generally bidirectional, the two opposite unidirectional flows (i.e., having symmetric source and destination addresses and ports) are typically grouped and tracked as connections. This makes it possible to gather separate statistics for the client-to-server and server-to-client flows, e.g., the size of HTTP client requests and server replies. Furthermore, the origin of the information can be distinguished, so that it is possible to separate local hosts from remote hosts in the Internet at large. As depicted in Fig. 1(a), traffic is then organized in four classes:
• incoming traffic: the source is remote and the destination is local;
• outgoing traffic: the source is local and the destination is remote;
• local traffic: both source and destination are local;
• external traffic: both source and destination are remote.
This classification allows statistics about incoming and outgoing traffic to be collected separately; for example, one could be interested in knowing how much incoming traffic is due to YouTube, and how many users access Facebook from the monitored network. Local and external traffic should not normally be observed on an access link, but in some scenarios they can be present.

At the packet, flow and application layers, a large set of statistics can be defined and possibly customized at the user's will. In the case of Tstat, several statistics are already available, and, Tstat being open source, they can be easily customized and extended. A detailed description of all available measurement indexes can be found in [2].
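To make the flow identification rules above concrete, the following is a minimal, self-contained sketch of per-flow bookkeeping keyed on the flowID 5-tuple. It is not Tstat's actual code: the structure names, the fixed-size table, the idle timeout value and the linear lookup are all illustrative simplifications (a TCP-aware probe would also track the three-way handshake and tear-down rather than relying on timeouts alone).

    /* Sketch of per-flow bookkeeping keyed on
     * flowID = (proto, srcIP, srcPort, dstIP, dstPort).
     * Hypothetical names and sizes; not Tstat's actual data structures. */
    #include <stdint.h>
    #include <string.h>

    #define MAX_FLOWS    4096
    #define IDLE_TIMEOUT 120.0   /* illustrative idle time (s) after which a flow is closed */

    struct flow_id {
        uint8_t  proto;          /* e.g., 6 = TCP, 17 = UDP */
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
    };

    struct flow {
        struct flow_id id;
        uint64_t pkts, bytes;
        double   first_seen, last_seen;
        int      in_use;
    };

    static struct flow table[MAX_FLOWS];

    /* Match a packet's flowID against a tracked flow in either direction,
     * so that the two unidirectional flows of a connection share one entry. */
    static int same_connection(const struct flow_id *a, const struct flow_id *b)
    {
        if (a->proto != b->proto)
            return 0;
        return (a->src_ip == b->src_ip && a->src_port == b->src_port &&
                a->dst_ip == b->dst_ip && a->dst_port == b->dst_port) ||
               (a->src_ip == b->dst_ip && a->src_port == b->dst_port &&
                a->dst_ip == b->src_ip && a->dst_port == b->src_port);
    }

    /* Called for every observed packet: update the matching flow, or start
     * a new one (here simply at the first packet seen). */
    void flow_update(const struct flow_id *id, uint32_t pkt_len, double now)
    {
        struct flow *f = NULL, *spare = NULL;

        for (int i = 0; i < MAX_FLOWS; i++) {
            if (!table[i].in_use) {
                if (!spare) spare = &table[i];
            } else if (now - table[i].last_seen > IDLE_TIMEOUT) {
                table[i].in_use = 0;          /* flow ended by idle timeout */
                if (!spare) spare = &table[i];
            } else if (same_connection(&table[i].id, id)) {
                f = &table[i];
            }
        }
        if (!f) {
            if (!spare) return;               /* table full: packet not tracked */
            f = spare;
            memset(f, 0, sizeof(*f));
            f->id = *id;
            f->first_seen = now;
            f->in_use = 1;
        }
        f->pkts++;
        f->bytes += pkt_len;
        f->last_seen = now;
    }

The linear scan over the whole table is acceptable only for illustration; Section IV discusses why a real probe must replace it with a properly sized hash table.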
B. Workflow analysis

As far as the analysis process is concerned, each observed packet is handed over to the analyzer plugins that are activated, as illustrated in Fig. 1(b). Following the Internet naming standard and going up the protocol stack, layer-2 (L2) frame de-encapsulation is performed first. Then, the network-layer (L3) header is processed. Given the datagram service offered by IP networks, at L3 only per-packet statistics, such as bitrate and packet length, are possible.

Going up to the transport-layer (L4) analysis, a set of common statistics for both TCP and UDP flows is maintained, e.g., packet and byte counters, round trip time (RTT) and throughput of the data download.

At the application layer (L7), the main goal of a monitoring tool is to perform traffic classification, that is, to identify the application that generated the traffic. As traffic classification is known to be prone to fallacies, several approaches have been studied in the literature [6], and each tool has its own peculiarities. In the case of Tstat, three different engines are available, each relying on a different technology. They are designed to work even when the complete packet payload is not available, which is a common situation in live network monitoring, since usually only a limited portion of each packet is exposed to the sniffer for privacy reasons.

The simplest engine is Pure Deep Packet Inspection (PDPI). It uniquely identifies applications by matching a signature in the application payload. All the application signatures are collected in a dictionary, defining a set of classification rules, and are then checked against the current packet payload until either a match is found or all the signatures have been tested. In the first case, the packet/flow is associated with the matching application, while in the second case it is labeled as "unknown". Signatures cover a large set of applications, ranging from standard email protocols to Peer-to-Peer applications like BitTorrent, eMule, Gnutella, PPLive, and Sopcast. Extending and updating the signatures is a key issue with PDPI, as we will discuss later.

The second engine, named Finite State Machine Deep Packet Inspection (FSMDPI), inspects more than one packet of a flow. Finite State Machines (FSMs) are used to verify that message exchanges conform to the protocol standard; to have a positive match, a specific sequence of matching rules has to be triggered. For example, if the first packet contains GET http:// and the response carries HTTP/1.0 OK, the flow can be considered HTTP. Using this approach, more complex signatures can be defined, allowing the identification of more web-based applications like YouTube, Vimeo, Facebook, Flickr, or chat services like MSN, XMPP/Jabber, and Yahoo. Finally, Voice over IP (VoIP) calls based on RTP/RTCP cannot be easily detected using PDPI, so FSMDPI classification is required.

To cope with applications that leverage encryption mechanisms, which make any DPI classifier useless, Tstat implements a Behavioral Classifier (BC) engine that exploits statistical properties of traffic to distinguish among applications. For example, packet sizes or inter-arrival times within a flow carry information about the application generating the content, so that VoIP flows have very different characteristics with respect to data download flows. Using this approach, Tstat identifies encrypted traffic such as that generated by Skype and the obfuscated P2P file sharing of BitTorrent and eMule [3].

In Section III we present some results that exploit the traffic classification capabilities of Tstat. While the performance and accuracy of the classifiers are out of the scope of this paper, overall they have been found to "outperform [other] signature based tools used in the literature" when compared by independent researchers [7].
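As an illustration of how a pure DPI engine operates, the sketch below matches a payload against a small dictionary of signatures and returns an application label, or "unknown" when nothing matches. The rules shown are toy examples and the names are hypothetical; Tstat's real dictionary is far larger and its matching is tuned for speed and anchored to protocol-specific offsets.

    /* Minimal sketch of PDPI matching: hypothetical rules and names,
     * not Tstat's actual dictionary or matching code. */
    #include <stddef.h>
    #include <string.h>

    struct dpi_rule {
        const char *label;
        const char *signature;       /* byte pattern searched in the payload */
    };

    static const struct dpi_rule rules[] = {
        { "http",       "GET "                        },
        { "smtp",       "EHLO "                       },
        { "bittorrent", "\x13" "BitTorrent protocol"  },
    };

    /* Return the label of the first matching rule, or "unknown".
     * Works on whatever payload portion the sniffer exposes (payload_len
     * may be only a few tens of bytes when capture is truncated). */
    const char *pdpi_classify(const unsigned char *payload, size_t payload_len)
    {
        for (size_t r = 0; r < sizeof(rules) / sizeof(rules[0]); r++) {
            size_t sig_len = strlen(rules[r].signature);
            if (sig_len > payload_len)
                continue;
            for (size_t off = 0; off + sig_len <= payload_len; off++)
                if (memcmp(payload + off, rules[r].signature, sig_len) == 0)
                    return rules[r].label;
        }
        return "unknown";
    }

A FSMDPI engine would additionally keep, per flow, the state reached so far in a sequence of such rules, declaring a match only once the full exchange (e.g., a request and the corresponding response) has been observed.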
C. Input data

Software-based monitoring tools like Tstat are designed to work in real time when installed in operational networks. The software tool runs on a "probe", i.e., a dedicated PC that "sniffs" traffic flowing on an operational link, as shown in Fig. 1(a). The libpcap library is the de-facto standard Application Programming Interface (API) to capture packets from standard Ethernet line cards under several operating systems. Dedicated high-end capture devices such as Endace DAG or AITIA S1GED cards are also available on the market (see http://www.endace.com and http://www.aitia.ai). They offer hardware packet monitoring solutions that offload the main CPU while guaranteeing higher performance than software-based solutions. Tstat supports both standard sniffing based on libpcap and hardware solutions such as the ones mentioned earlier.

Furthermore, Tstat can also be compiled as a "library" to allow easy integration with already existing tools, such as those typically deployed by an ISP that already has a monitoring solution. In the latter case, the ISP is free to customize Tstat and decide which packets should be further processed, e.g., to tune the amount of payload, filter packets, or anonymize addresses for privacy purposes. In our experience, this approach has been very successful in facilitating the integration of Tstat with the monitoring tools of several ISPs around Europe and with other traffic analysis tools developed by the research community.

Besides live traffic analysis, monitoring tools are also commonly adopted to process previously collected packet-level traces offline. In this case, the tool can be used to inspect specific traffic for post-mortem analysis, to develop more complex statistical analyses for advanced performance evaluation, or to double-check the accuracy of any new index that is being developed. Since several trace file formats are in use, a variety of dump file formats should be supported, such as pcap, erf, etherpeek and snoop, to name a few. Besides already providing a large set of trace file format input plugins, Tstat makes it easy to integrate new formats thanks to its open and flexible design.
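As a concrete, though much simplified, illustration of the libpcap-based input path, the following sketch opens a live interface and hands every captured packet to a callback. The interface name, snaplen and the empty per-packet handler are placeholder choices, and Tstat's actual input modules are considerably richer; an offline trace could be opened the same way with pcap_open_offline().

    /* Minimal libpcap "probe" sketch: capture packets on an interface and
     * pass each one to an analysis callback. */
    #include <pcap.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void per_packet(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
    {
        (void)user; (void)bytes;
        /* a real probe would de-encapsulate L2 here, then update per-flow state */
        printf("%ld.%06ld caplen=%u len=%u\n",
               (long)h->ts.tv_sec, (long)h->ts.tv_usec, h->caplen, h->len);
    }

    int main(int argc, char **argv)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        const char *dev = (argc > 1) ? argv[1] : "eth0";   /* placeholder interface */

        /* snaplen limited to 96 bytes: as noted above, live probes often see
         * only a small payload portion for privacy (and performance) reasons */
        pcap_t *p = pcap_open_live(dev, 96, 1 /* promiscuous */, 1000, errbuf);
        if (!p) {
            fprintf(stderr, "pcap_open_live(%s): %s\n", dev, errbuf);
            return EXIT_FAILURE;
        }
        if (pcap_loop(p, -1, per_packet, NULL) == -1)      /* -1: loop forever */
            fprintf(stderr, "pcap_loop: %s\n", pcap_geterr(p));
        pcap_close(p);
        return EXIT_SUCCESS;
    }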
D. Output statistics

Finally, each monitoring tool offers a set of output statistics that are strictly bound to the goal of the tool itself. For example, intrusion detection systems like Snort or Bro output the list of triggered alarms and violations, while traffic classification tools like Tie or CoralReef report statistics about application traffic shares. In Tstat, statistics are available at different granularities: per-packet, per-flow, and aggregated. At the finest level of granularity, packet traces can be dumped into trace files for further offline processing. This output format is extremely valuable when coupled with Tstat's classification capabilities: indeed, packets can be dumped per application into different files. For example, it is possible to instruct Tstat to dump only the packets generated by the Skype and BitTorrent applications, while discarding all other packets.

At an intermediate level of granularity, flow-level logs are text files providing detailed information for each monitored flow. A log file is arranged as a simple table where each column is associated with a specific piece of information and each line reports the two unidirectional flows of a connection. Several flow-level logs are available, e.g., the log of all UDP flows, or the log of all VoIP calls. The log information is a summary of the connection properties. For example, the starting time of a VoIP call, its duration, the number of suffered packet losses, and the jitter are all valuable metrics that allow the VoIP quality of service to be monitored. Flow-level logs use much less space than the original packet-level traces, and can be collected for much longer periods of time.

At the highest level of aggregation, Tstat gathers statistics about flow aggregates. Two formats are available in this case. Histograms are empirical frequency distributions of collected statistics over a set of flows. For example, the distribution of VoIP call duration is automatically computed by considering all VoIP flows observed during each 5-minute time interval. To overcome the storage space explosion of packet traces, flow-level logs and histograms over time, the second available format is the Round Robin Database (RRD) [8], which allows a database spanning several years to be built while keeping the amount of space limited. RRD handles historical data at different granularities: newer samples are stored at higher frequency, while older data are averaged over coarser time scales. This dramatically reduces the disk space requirements (configurable a priori) and, thanks to the tools provided by the RRD technology, it is possible to visually inspect the results. For example, RRD data collected by a Tstat probe can be queried in real time using a simple web interface [2], and plots of historical measurements over multiple sites can be shown. The results presented in this paper are obtained from the corresponding RRD data.
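To give a flavor of the histogram output described above, the sketch below bins a per-flow metric (here, flow duration) into an empirical distribution that is dumped and reset every 5 minutes. The bin width, the chosen metric and the function names are illustrative assumptions, not Tstat's actual histogram definitions.

    /* Sketch of a per-interval empirical distribution (histogram) of flow
     * durations; hypothetical names, not Tstat's real histogram code. */
    #include <stdio.h>

    #define BIN_WIDTH_S 10.0    /* one bin per 10 s of flow duration */
    #define NUM_BINS    64
    #define INTERVAL_S  300.0   /* dump and reset every 5 minutes */

    static unsigned long bins[NUM_BINS];
    static double interval_start;

    /* Called for every completed flow; dumps the distribution at interval end. */
    void histogram_on_flow_end(double flow_duration, double now)
    {
        if (now - interval_start >= INTERVAL_S) {
            for (int b = 0; b < NUM_BINS; b++)
                if (bins[b])
                    printf("%g-%g s: %lu flows\n",
                           b * BIN_WIDTH_S, (b + 1) * BIN_WIDTH_S, bins[b]);
            for (int b = 0; b < NUM_BINS; b++)
                bins[b] = 0;
            interval_start = now;
        }
        int b = (int)(flow_duration / BIN_WIDTH_S);
        if (b >= NUM_BINS)
            b = NUM_BINS - 1;   /* clamp very long flows into the last bin */
        bins[b]++;
    }

An RRD-style output would instead feed each interval's summary values into a round robin database, which keeps recent samples at full resolution and averages older ones into coarser time scales.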
III. TRAFFIC TRENDS FROM DIFFERENT VANTAGE POINTS

After having presented the main Tstat features and characteristics, we now show Tstat's capabilities through a few results and discuss some conclusions we have drawn from our long experience in using it.

We have been collecting measurement data since 2005 in collaboration with several ISPs. A Linux-based Tstat probe has been installed and properly configured in different Points-of-Presence (PoPs).

A. Probe description

TABLE I: PROBE CHARACTERISTICS
Location         Users   Technology     Type
Polish ISP       10k     ADSL           Home
Hungarian ISP    4k      ADSL           Home
Italian ISP      5k      ADSL           Home
Italian ISP      15k     FTTH           Home
Italian Campus   10k     LAN and WLAN   Campus

The main characteristics of the 5 probes are summarized in Tab. I, which reports the PoP location, the approximate number of aggregated users, the access technology and the type of customers, distinguishing between Home and Campus users. As can be observed, the set of probes is very heterogeneous: it includes Home users in three different countries, with ADSL or LAN and WLAN access technologies. Depending on the type of contract with the ISP and on the quality of the physical medium, ADSL technology offers users different bitrates, ranging from 2 to 20 Mb/s downstream and up to 1024 kb/s upstream. Fiber to the Home (FTTH) customers are offered 10 Mb/s full-duplex Ethernet connectivity, while Campus users are connected to a 10 Gb/s based campus network using either 100 Mb/s Ethernet or IEEE 802.11a/b/g WiFi access points. The Campus network is connected to the Internet via a single 1 Gb/s link, and a firewall is present to enforce strict policies, to block P2P traffic (unless obfuscated), and to grant access only to official servers inside the campus.

Probes were upgraded several times to update the Tstat version and to include advanced features, so as to enhance traffic classification accuracy and augment the number of protocol signatures. All probes are configured to continuously collect RRD information.

B. Traffic share and trends

We first present results covering the period from May 1st, 2009 to October 31st, 2010. Figure 2 shows the traffic breakdown for incoming traffic, i.e., traffic received by customers. The applications generating the largest amount of traffic are highlighted using different colors. Over time, we enhanced the classification portfolio of Tstat by adding both PDPI/FSMDPI rules and statistical signatures. For example, since June 2009 we have been collecting statistics about both streaming applications, such as YouTube, Vimeo, Google Video and other flash-based streaming services, and web-based file hosting services like RapidShare or MegaUpload that allow users to share large files. Light and dark pink colors highlight them in the plots. Developed and double-checked in the Campus network first, these capabilities were then deployed on the other probes. Similarly, since December 2009 BitTorrent obfuscated traffic (plotted in light green) is correctly identified by Tstat, and the more recent BitTorrent UDP-based data transport protocol named uTP [9] is correctly classified since July 2010 (dark red). This latter classifier was developed while investigating the cause of the sudden increase in UDP traffic share that is clearly visible at the Hungarian vantage point during February 2010. This is an example of using Tstat to effectively support traffic monitoring.

Several considerations can be derived from the presented results.

• Before BitTorrent adopted the uTP protocol, the volume of UDP traffic was marginal at all vantage points but the Italian ISP. This is due to this ISP offering Video on Demand (VoD) services over UDP, which makes the volume of VoD UDP traffic in this network about 10% of the total. Customers of the same operator are offered a native VoIP service using standard RTP/RTCP protocols over UDP. Still, the volume of traffic due to VoIP is almost negligible, accounting for less than 2% of total traffic (in purple in the figure). Nowadays, UDP traffic can top 20% of total volume, depending on the popularity of BitTorrent uTP or VoD applications. Therefore, the widely popular statement that UDP traffic is negligible does not hold anymore.

• Application usage is very different at different places. For example, in Poland the fraction of HTTP traffic is predominant, with more than 60% of traffic due to several applications adopting the HTTP protocol. In both Italian ISP PoPs, instead, Peer-to-Peer (P2P) applications amount to more than 50% of traffic, with eMule clearly being preferred over BitTorrent. In Hungary, on the contrary, BitTorrent is more popular (with a traffic share above 20%), while an almost negligible amount of traffic is due to eMule. Finally, note that in the Italian Campus network the fraction of P2P traffic is marginal, the firewall being very effective in blocking such traffic.

• Some slow long-term trends are clearly visible. For example, the P2P traffic share is generally decreasing, while streaming applications are becoming more and more popular, with a share that has reached more than 20% in Poland and is above 15% in the other PoPs. Interestingly, there is a corresponding positive trend for file hosting applications, which are eroding a significant share of traffic from P2P file sharing applications. Indeed, the same content can be retrieved by users through either P2P or file hosting technologies, and the latter is nowadays becoming more and more popular among users since it offers much better performance.

• While the above mentioned changes in traffic shares are typically slow, sudden changes are possible due to changes in the applications themselves.
For example, as already mentioned, the popular µTorrent application was updated during February 2010 to use the uTP transport protocol by default instead of TCP. Correspondingly, an increase in UDP traffic is clearly visible at some probes. Similarly, RapidShare changed its application protocol during September 2010, and this change fooled the PDPI classifier. An artificial drop in file hosting traffic is then observed at those vantage points where RapidShare is popular, e.g., in Poland.

• Traffic shares are very stable, and little variation is visible among different days. Only in the Campus network is the variability very high, with a clearly visible weekly pattern (see also the figure and related comments in the next section). Indeed, during the weekend few users are present on campus and little traffic flows on the link. This causes the application usage pattern to be different.

[Fig. 2. Comparison of traffic as observed on 5 different traffic probes: traffic share [%] from May 2009 to Oct 2010 at the Polish ISP, Hungarian ISP, Italian ISP ADSL, Italian ISP FTTH and Italian Campus vantage points.]

While a detailed analysis of Internet traffic trends and user habits is out of the scope of this paper, the presented results highlight the importance of constantly monitoring the network with a flexible tool that is constantly upgraded and enhanced to follow the network's changes.

IV. SCALABILITY ISSUES OF SOFTWARE-BASED MONITORING TOOLS

When implementing a live monitoring tool, knowing the maximum sustainable load that the probe can handle is one of the most critical issues that must be faced. Indeed, as seen in the previous section, Internet traffic changes widely over both time and space. At finer timescales, traffic is known to exhibit even larger variability at both the packet and flow levels. For example, packet-level burstiness can stress the sniffing hardware, since packet bursts can arrive at very high speed. Packet capturing, filtering and timestamping are then critical, especially if implemented in software. Similarly, bursts of new flows can stress per-flow operations, so that memory management typically becomes a bottleneck.

While Tstat is an example of an advanced traffic monitoring tool, most of the operations it performs are common to any flow-level sniffer and monitoring tool. Indeed, similar data structures must be used to store basic per-flow information such as the flow identifier, packet and byte counters, timestamps and the classification status.
Notice that flow structures must be accessed and updated for each packet: hence, efficient data structures like hash tables must be used, in which collisions are minimized and, when needed, handled by chaining. Further optimizations of memory management are also needed; freed structures should be handled manually as reuse lists by a garbage collector, so as to avoid generic and expensive garbage collection routines kicking in and slowing down the main analysis operations.
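The sketch below illustrates these two ideas, a flow hash table with collision chaining and a free list of recycled flow structures, under hypothetical names and a deliberately naive hash function; it is not Tstat's actual implementation.

    /* Sketch of a per-flow hash table with chaining plus a reuse (free) list,
     * so closed-flow structures are recycled instead of malloc()/free()d. */
    #include <stdint.h>
    #include <stdlib.h>

    #define HASH_SIZE 65536          /* must be sized for the expected flow load */

    struct flow {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
        uint64_t pkts, bytes;
        struct flow *next;           /* collision chain / free-list link */
    };

    static struct flow *buckets[HASH_SIZE];
    static struct flow *free_list;   /* recycled flow structures */

    static unsigned flow_hash(uint32_t s, uint32_t d, uint16_t sp, uint16_t dp)
    {
        return (s ^ d ^ ((uint32_t)sp << 16) ^ dp) % HASH_SIZE;   /* toy hash */
    }

    static struct flow *flow_alloc(void)
    {
        if (free_list) {             /* reuse a previously released structure */
            struct flow *f = free_list;
            free_list = f->next;
            return f;
        }
        return calloc(1, sizeof(struct flow));
    }

    /* Insert a newly allocated flow at the head of its bucket chain. */
    static void flow_insert(struct flow *f)
    {
        unsigned h = flow_hash(f->src_ip, f->dst_ip, f->src_port, f->dst_port);
        f->next = buckets[h];
        buckets[h] = f;
    }

    /* Called when a flow is closed; assumes the caller has already unlinked
     * f from its bucket chain. The structure goes back to the reuse list. */
    static void flow_release(struct flow *f)
    {
        f->next = free_list;
        free_list = f;
    }

    /* Per-packet lookup: walk the (short, if the table is well sized) chain. */
    static struct flow *flow_find(uint32_t s, uint32_t d,
                                  uint16_t sp, uint16_t dp, uint8_t proto)
    {
        for (struct flow *f = buckets[flow_hash(s, d, sp, dp)]; f; f = f->next)
            if (f->proto == proto && f->src_ip == s && f->dst_ip == d &&
                f->src_port == sp && f->dst_port == dp)
                return f;
        return NULL;
    }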
In [10] we extensively analyzed the computational complexity of the Tstat analysis workflow, showing that even with off-the-shelf hardware it is possible to run advanced analysis techniques on several Gb/s worth of traffic in real time.

[Fig. 3. Total link bitrate, number of flows and maximum CPU utilization during a typical week: (a) Italian ISP FTTH probe; (b) Italian Campus probe.]

To provide some examples of the typical workload that Tstat has to support, and to highlight some critical points in the design of a flow sniffer, Fig. 3 shows the evolution, over a one-week period, of the total link bitrate (gray line), the number of tracked flows (black line), and the maximum CPU utilization (dotted line), i.e., the total time spent by the CPU running Tstat, including both kernel- and user-space CPU time. Measurements refer to a time window of 5 minutes. The results for the Italian ISP FTTH and Italian Campus probes are reported in the top and bottom plots, respectively; results from the other probes are not reported for the sake of brevity.

Considering the total link bitrate, the two probes handle approximately the same amount of traffic, which tops out at nearly 500 Mb/s at the peaks. Notice that the peak hour occurs at different times, reflecting the different habits of Home and Campus users. The number of active flows is also very different, with the Campus probe having to handle a per-flow load that is about two times higher. This is due to the different traffic mix generated by Campus users, as previously shown in Fig. 2. Therefore, hash table sizes must be correctly tuned to support the various values of the load.

In the CPU load curves, we see a very different behavior: the Italian ISP probe exhibits a very low maximum CPU utilization, which is not correlated with either the packet- or flow-level patterns. On the contrary, the Campus maximum CPU utilization is always above 30%, and it tops 100% during sustained traffic load. Investigating further, we pinpointed this to be due to the packet capturing input module, which is based on a common Gigabit Ethernet line card in the Campus probe, while the Italian ISP probe relies on a dedicated Endace line card. Indeed, based on our experience, the major bottleneck is the line-card-to-memory communication, which can overload the CPU by generating a large number of Interrupt Requests (IRQs) per second, i.e., one for each received packet. Dedicated traffic capturing devices solve this problem by implementing timestamping functionalities and Direct Memory Access (DMA) based transfers of packet batches. The CPU utilization figures of the other probes, not shown in the paper due to lack of space, confirm this. All ISP probes are indeed equipped with dedicated hardware capture line cards, so that the maximum CPU utilization remains very limited even if they have to handle a large volume of traffic, topping out at about 1.5 Gb/s.

In summary, with common hardware it is possible to monitor several Gb/s of traffic in real time, provided that packet capturing is performed with efficient hardware that offloads from the CPU the per-packet memory copy and timestamping operations. Similarly, efficient memory management algorithms must be adopted for per-flow operations, optimizing both the flow lookup performed for every packet and the garbage collection mechanisms required to avoid memory starvation.

V. CONCLUSIONS

In this paper, we described our experience in using Tstat, a software-based Internet traffic monitoring tool that we have been developing for the past 10 years. Presenting measurements collected from several ISP networks, we have shown that Internet traffic changes widely over both time and space: application shares are different in different networks, even if common trends are visible due to slow changes in application popularity; however, sudden changes are observed after the deployment of disruptive technologies by the applications themselves. We then discussed the implications of using software-based solutions for traffic monitoring, showing that moderate volumes of traffic can be monitored with common hardware, provided that efficient packet capturing devices are used and proper memory management is implemented.

REFERENCES

[1] L. Cottrell, "Network Monitoring Tools Collection," http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html.
[2] Tstat Homepage, http://tstat.tlc.polito.it
[3] A. Finamore, M. Mellia, M. Meo, D. Rossi, "KISS: Stochastic Packet Inspection Classifier for UDP Traffic," IEEE/ACM Transactions on Networking, Vol. 18, No. 5, pp. 1505-1515, Oct. 2010.
[4] M. Mellia, R. Lo Cigno, F. Neri, "Measuring IP and TCP behavior on edge nodes with Tstat," Computer Networks, Vol. 47, No. 1, pp. 1-21, Jan. 2005.
[5] TCPTrace Homepage, http://www.tcptrace.org
[6] T. Nguyen, G. Armitage, "A survey of techniques for internet traffic classification using machine learning," IEEE Communications Surveys & Tutorials, Vol. 10, No. 4, pp. 56-76, 2008.
[7] M. Pietrzyk, J. Costeux, G. Urvoy-Keller, T. En-Najjary, "Challenging Statistical Classification for Operational Usage: the ADSL Case," ACM Internet Measurement Conference, Chicago, IL, Nov. 2009.
[8] RRDtool Homepage, http://oss.oetiker.ch/rrdtool/
[9] S. Shalunov, G. Hazel, J. Iyengar, "Low extra delay background transport (LEDBAT)," IETF Draft, Oct. 2010.
[10] D. Rossi, M. Mellia, "Real-Time TCP/IP Analysis with Common Hardware," IEEE International Conference on Communications (ICC'06), Istanbul, Turkey, June 2006.