How to manage network performance data

Monitoring the right things in your network and making them visible in dashboards can make the difference between proactive and reactive responses to network issues. I recently wrote about using log data in network management, and now it’s time to talk about managing network performance data.

What to monitor

Network monitoring systems must collect the data needed to detect problems and diagnose failures. Pay attention to the following data:

Interface state changes (up/down)
Errors
Dropped packets
Capacity utilization
Device memory
CPU usage

In addition to the above, collect data about routing and switching protocols and their neighbors, which helps diagnose packet forwarding problems. It is best to collect this data at intervals of one to five minutes so that problems are caught before they become widespread. Less frequent collection may hide significant events. As I wrote last year, “We often have difficulty conceptualizing how problems increase as our computer systems grow. Events that are supposed to be extremely rare occur more frequently than our minds would otherwise indicate.”
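To illustrate why the collection interval matters, here is a minimal sketch that computes average utilization from two polls of an interface octet counter. The function name, the 1 Gbps link speed, and the burst scenario are assumptions chosen for illustration; they are not taken from any particular monitoring product.

```python
def utilization_percent(octets_start, octets_end, interval_seconds,
                        link_bps, counter_bits=64):
    """Average link utilization (%) between two polls of an octet counter."""
    # Octet counters wrap around; the modulo handles a single wrap.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return 100.0 * (delta_octets * 8) / (interval_seconds * link_bps)


LINK_BPS = 1_000_000_000  # hypothetical 1 Gbps interface

# Hypothetical burst: the link is saturated for 60 seconds, then idle.
burst_octets = (LINK_BPS // 8) * 60

# Polled every minute, the burst is fully visible.
one_minute_view = utilization_percent(0, burst_octets, 60, LINK_BPS)

# Polled every five minutes, the same burst is averaged with four idle minutes.
five_minute_view = utilization_percent(0, burst_octets, 300, LINK_BPS)

print(f"1-minute poll: {one_minute_view:.0f}%")
print(f"5-minute poll: {five_minute_view:.0f}%")
```

In this scenario the one-minute poll reports a saturated link, while the five-minute poll reports only 20% utilization for the same traffic.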

Automated test results from a digital experience monitoring platform are another valuable source of data. Correlating and reporting on this data drives proactive detection of network problems.

Data collection

Recently, there has been a push to move from the “pull” model of SNMP data collection to the “push” model of telemetry. But these are not the only ways to collect data. There are also the command-line interface, Windows Management Instrumentation (WMI), and APIs, to name a few. The reality is that the method of data collection does not make a significant difference. The goal is to get the data and store it in a format that makes it useful for analysis.

Data storage

Network monitoring relies on several types of data. Some data is highly relational and is best stored in a relational database. Examples of this relational data include: device type, operating system version, hardware inventory, location, and technical contact. However, performance data is tied to the time it was collected and is best stored in a time series database.

Matching the data type to the corresponding database type provides the best-performing network monitoring system (NMS) platform. Systems that attempt to store performance data in a relational database typically require a large platform architecture with multiple collectors, and these systems have a correspondingly high implementation and management cost. One of the reasons relational databases don’t work well for performance data is that they are optimized for reads by maintaining indexes. Updating table indexes on every write makes storing performance data in a relational database much slower than writing the same data to a time series database.
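The split between the two kinds of data can be sketched as follows. This is only an illustration: the table layout, device name, and contact address are made up, and the in-memory dictionary of append-only lists is a stand-in for a real time series database, not a recommendation for one.

```python
import sqlite3
from collections import defaultdict

# Relational side: slowly changing inventory attributes.
inventory = sqlite3.connect(":memory:")
inventory.execute(
    "CREATE TABLE device ("
    "name TEXT PRIMARY KEY, device_type TEXT, os_version TEXT, "
    "location TEXT, contact TEXT)"
)
inventory.execute(
    "INSERT INTO device VALUES (?, ?, ?, ?, ?)",
    ("core-sw-1", "switch", "17.3.4", "HQ", "netops@example.com"),
)

# Time series side: append-only (timestamp, value) samples per series.
series = defaultdict(list)


def record(device, interface, metric, timestamp, value):
    """Append one sample; no secondary index is updated on write."""
    series[(device, interface, metric)].append((timestamp, value))


def device_location(name):
    """Look up a relational attribute for a device, or None if unknown."""
    row = inventory.execute(
        "SELECT location FROM device WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


record("core-sw-1", "Gi1/0/1", "in_errors", 1_700_000_000, 3)
record("core-sw-1", "Gi1/0/1", "in_errors", 1_700_000_060, 5)

print(device_location("core-sw-1"))
print(series[("core-sw-1", "Gi1/0/1", "in_errors")])
```

A report can then join the two: the time series supplies the measurements, and the inventory supplies the location and contact for the device that produced them.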

Performance dashboards

The network management system must convert the collected data into useful information. This is where many systems fall short. They may offer eye-catching pie charts and bar graphs showing the performance of parts of the network. But what is needed are views of the network elements that need attention, such as interfaces with high errors or drops, interfaces with high utilization, or firewall tables that are nearly full.

How do you identify the most problematic statistics? One of the most useful methods is through a “Top-10” report that shows the top ten items, sorted by percentage or count. Note that packet loss percentage thresholds should be on the order of 0.0001% of total packets to identify interfaces that affect TCP performance. Utilization figures should be based on the 95th percentile calculation as outlined in the article on the dangers of network average statistics.
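A Top-N report along these lines might be sketched as below. The interface names and counters are invented sample data, and the nearest-rank method is just one common way to compute a percentile; a real NMS may use a different definition.

```python
import math

LOSS_THRESHOLD_PCT = 0.0001  # threshold from the text, as a percentage


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def loss_percent(dropped, total):
    """Dropped packets as a percentage of total packets."""
    return 100.0 * dropped / total if total else 0.0


def top_n(items, key, n=10):
    """Return the n items with the largest key value."""
    return sorted(items, key=key, reverse=True)[:n]


# Hypothetical per-interface statistics.
interfaces = [
    {"name": "edge-1:Gi0/1", "total": 50_000_000, "dropped": 900,
     "util": [10] * 19 + [95]},
    {"name": "edge-2:Gi0/2", "total": 80_000_000, "dropped": 40,
     "util": [60] * 18 + [90, 90]},
    {"name": "core-1:Te1/1", "total": 10_000_000, "dropped": 5_000,
     "util": [30] * 20},
]

# Interfaces whose loss exceeds the threshold, worst first.
lossy = top_n(
    [i for i in interfaces
     if loss_percent(i["dropped"], i["total"]) > LOSS_THRESHOLD_PCT],
    key=lambda i: loss_percent(i["dropped"], i["total"]),
)

# Interfaces ranked by 95th percentile utilization rather than the average.
busiest = top_n(interfaces, key=lambda i: percentile_nearest_rank(i["util"], 95))

print([i["name"] for i in lossy])
print([i["name"] for i in busiest])
```

Note that the first interface has a single 95% sample that the 95th percentile ignores, while the second interface, which is busy most of the time, ranks at the top.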

Displaying all these statistics in a concise dashboard is quite a challenge, and this is where most network management platforms fail. Some displays need both the current value and a historical chart so you can determine the trend. Here is a concise mockup of a dashboard that packs in a lot of information. It takes advantage of Top-5 lists and separates network and security events into different categories. A more comprehensive dashboard would include additional statistics.

Additional network management data sources, such as digital experience monitoring alerts, can be incorporated using APIs to collect the data.

Problems of scale

Networks with more than a few thousand interfaces will start to run into scaling problems. The main factor is the number of interfaces to be monitored. A typical mid-sized network device has about fifty interfaces on average. Networks that use a significant number of virtual interfaces may have a higher average. Multiply the average interface count by the number of devices to arrive at an approximate number of managed items. This is the number most vendors will need to size a network management system (and to set its price).
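The sizing arithmetic can be sketched in a few lines. The device count, the six metrics per interface, and the one-minute polling interval are illustrative assumptions; only the fifty-interface average comes from the text above.

```python
def managed_items(device_count, avg_interfaces_per_device=50):
    """Approximate number of managed interfaces."""
    return device_count * avg_interfaces_per_device


def samples_per_day(items, metrics_per_item, poll_interval_seconds):
    """Number of data points written per day at a fixed polling interval."""
    polls_per_day = 86_400 // poll_interval_seconds
    return items * metrics_per_item * polls_per_day


# Hypothetical network: 200 devices, 6 metrics per interface, 60-second polling.
items = managed_items(200)
daily = samples_per_day(items, metrics_per_item=6, poll_interval_seconds=60)

print(f"{items:,} managed interfaces")
print(f"{daily:,} samples per day")
```

Even this modest example yields 10,000 managed interfaces and tens of millions of samples per day, which is why collection and storage design matter.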

Data collection, storage, analysis, and display of results all affect the size of the management system. Many network management systems address scale by periodically rolling up the data. A typical process averages each day's raw performance data into hourly values. The resulting loss of high-fidelity data makes detailed historical analysis impossible.
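The effect of such a rollup can be shown with a small sketch. The sample values are invented, and keeping the maximum alongside the average is one possible mitigation that I am assuming for illustration, not something every NMS does.

```python
from statistics import mean


def rollup(samples, bucket_size):
    """Collapse raw samples into buckets, keeping the average and the peak."""
    buckets = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        buckets.append({"avg": mean(bucket), "max": max(bucket)})
    return buckets


# Hypothetical hour of one-minute utilization samples:
# 55 quiet minutes at 10% followed by a 5-minute burst at 100%.
minute_util = [10.0] * 55 + [100.0] * 5

hourly = rollup(minute_util, bucket_size=60)
print(hourly)
```

The hourly average for this data is 17.5%, which gives no hint that the link was saturated for five minutes. Only the retained maximum preserves that fact, and even it cannot say how long the burst lasted.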

The user interface presents another scaling problem. How does the system display data collected from hundreds or thousands of devices and interfaces? That’s where the user interface needs to provide easy-to-use display filtering and concise reports that highlight useful information.

Summary

Network monitoring is a typical big data business application. Large volumes of data contain clues to network performance issues. Applying the right techniques can help you identify problems to improve network performance.
