Cisco Malware Detection: What Communications Folks Need to Know
Encrypted Traffic Analytics, a new method for detecting malware in encrypted data traffic, may also apply to encrypted SIP flows.
August 31, 2017
One of the most intriguing capabilities Cisco announced at its June Cisco Live conference is Encrypted Traffic Analytics (ETA). This solution can examine encrypted data traffic and identify threats such as viruses and malware. The company claims ETA detects these menaces with better than 99% accuracy, without decrypting the traffic.
This post describes how ETA works, why it requires a new generation of switches, and why we in the communications industry should care.
Encrypted Traffic Is Growing
According to an April 2017 study sponsored by Thales e-Security, enterprise use of encrypted data flows is already rising. It is expected to increase rapidly as companies roll out Internet of Things (IoT) programs and make encryption devices and policies the norm. Encrypting data gives far greater content security than leaving data flows open. However, hackers and malware authors have also begun using encrypted flows, which makes their threats much harder to detect.
Many organizations have responded to this increase in encrypted traffic by putting some sort of trusted "device in the middle" in the path. That device decrypts the traffic, performs deep packet inspection to look for threats, and then re-encrypts the data. This method works, but it doesn't scale well because of the investment and compute power it requires.
Cisco's ETA Approach for Malware Detection
With ETA, Cisco takes a different approach. Rather than decrypting traffic, it looks for patterns in malware-infected data flows while they remain encrypted. Many malware schemes leave unique fingerprints, or identifiable patterns, both while setting up a flow and as the flow progresses. Cisco trains a machine learning algorithm on known patterns of infected encrypted traffic. With that training, ETA can detect malware without decrypting the flow.
Two key elements establish a malware fingerprint in encrypted data: the initial data packet, and the sequence of packet lengths and arrival times during a flow.
Many encrypted data flows use Transport Layer Security (TLS), the cryptographic protocol that secures communication between two applications over a network. Most TLS handshake messages are unencrypted. As a result, Cisco switches in the flow path can gather this handshake information and use it as metadata. The first packet sent by the device initiating the flow is especially valuable, because it is a gold mine of TLS information and is not encrypted.
TLS handshaking for establishing a secure connection involves the following steps:
Agree on the version of the TLS protocol to use
Agree on the cryptographic algorithms to use
Exchange and validate digital certificates
Generate a shared secret key
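The first handshake message, the ClientHello, is sent in the clear. A passive observer can therefore read fields such as the offered protocol version, the list of cipher suites, and the extension types. The sketch below assumes the whole ClientHello fits in a single TLS record, and it does only minimal bounds checking. The function names are hypothetical illustrations and not part of any Cisco product.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass, field


@dataclass
class ClientHelloInfo:
    """Cleartext fields a passive observer can read from a ClientHello."""
    client_version: int
    cipher_suites: list[int]
    extension_types: list[int] = field(default_factory=list)


def parse_client_hello(record: bytes) -> ClientHelloInfo:
    """Parse one TLS record carrying a complete, unfragmented ClientHello (hypothetical helper)."""
    # Record header: content type (1 byte), version (2), length (2).
    content_type, _record_version, record_len = struct.unpack("!BHH", record[:5])
    if content_type != 0x16:  # 0x16 = handshake
        raise ValueError("not a TLS handshake record")
    body = record[5:5 + record_len]
    # Handshake header: message type (1 byte), 24-bit length (3).
    if body[0] != 0x01:  # 0x01 = client_hello
        raise ValueError("not a ClientHello")
    hello_len = int.from_bytes(body[1:4], "big")
    hello = body[4:4 + hello_len]

    (client_version,) = struct.unpack_from("!H", hello, 0)
    pos = 2 + 32  # skip client_version and the 32-byte random
    pos += 1 + hello[pos]  # skip session_id (1-byte length prefix)
    (suites_len,) = struct.unpack_from("!H", hello, pos)
    pos += 2
    suites = [struct.unpack_from("!H", hello, pos + i)[0] for i in range(0, suites_len, 2)]
    pos += suites_len
    pos += 1 + hello[pos]  # skip compression_methods (1-byte length prefix)

    ext_types: list[int] = []
    if pos + 2 <= len(hello):  # the extensions block is optional
        (ext_total,) = struct.unpack_from("!H", hello, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", hello, pos)
            ext_types.append(ext_type)
            pos += 4 + ext_len
    return ClientHelloInfo(client_version, suites, ext_types)


def build_client_hello(suites: list[int], ext_types: list[int]) -> bytes:
    """Build a minimal synthetic ClientHello record (hypothetical test helper)."""
    suite_bytes = b"".join(struct.pack("!H", s) for s in suites)
    ext_bytes = b"".join(struct.pack("!HH", t, 0) for t in ext_types)  # empty extension bodies
    hello = (
        struct.pack("!H", 0x0303)  # client_version: TLS 1.2
        + bytes(32)  # random (all zeros in this synthetic example)
        + b"\x00"  # empty session_id
        + struct.pack("!H", len(suite_bytes)) + suite_bytes
        + b"\x01\x00"  # one compression method: null
        + struct.pack("!H", len(ext_bytes)) + ext_bytes
    )
    handshake = b"\x01" + len(hello).to_bytes(3, "big") + hello
    return struct.pack("!BHH", 0x16, 0x0301, len(handshake)) + handshake
```

As a usage example, `parse_client_hello(build_client_hello([0xC02F, 0x002F], [0x0000, 0x000D]))` returns version 0x0303 (TLS 1.2). It also returns the two cipher suite codes (TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 and TLS_RSA_WITH_AES_128_CBC_SHA) and the server_name and signature_algorithms extension types. Fields like these are the kind of cleartext metadata a classifier can use.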
Cisco also collects the sequence of packet lengths and arrival times, because together they indicate what's happening inside an encrypted flow.
The figure above shows packet lengths (vertical lines) and arrival times (horizontal lines) for two different TLS sessions. The left side shows a typical Google search. The right side shows a session for the BestaFera trojan, which hackers used to collect users' online banking data and send it to a control server. The red lines at the start represent unencrypted TLS packets, while the gray lines represent encrypted data.
The Google search at the left proceeds as expected. The user begins typing, and the browser sends an outbound packet to Google. Google immediately responds with many packets containing possible auto-complete results, which its predictive algorithms derive from the letters or words typed so far. The small gray packets on top represent the user continuing to type the search terms. Google then sends updated results.
In the malware session on the right, the TLS handshake occurs, but the BestaFera server sends back a self-signed certificate. Note that the certificate is still shown in red, meaning unencrypted, so ETA can detect it. The malware then commands the user's device to begin sending a lot of data (Data Exfiltration), shown in the upper gray lines. Finally, the malware's server sends a command and control message (the C2 Message).
The point is that mapping packet sizes and arrival times, together with TLS handshake information, yields patterns that distinguish good from bad data in encrypted traffic flows.
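One way to turn packet lengths and arrival times into numbers a classifier can use is to bin them. You can then count how often one length bin follows another, which gives a Markov transition matrix, and build a histogram of the gaps between packets. The sketch below is a minimal version of that idea. The 150-byte length bins, the 50 ms timing bins, and the function names are illustrative assumptions, not Cisco's documented ETA parameters. The sketch also ignores packet direction.

```python
from __future__ import annotations


def spl_markov_features(lengths: list[int], bin_size: int = 150, n_bins: int = 10) -> list[float]:
    """Row-normalized Markov transition matrix over binned packet lengths, flattened."""
    # Map each packet length to a bin; lengths beyond the last bin go into it.
    bins = [min(length // bin_size, n_bins - 1) for length in lengths]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for current, nxt in zip(bins, bins[1:]):
        counts[current][nxt] += 1
    features: list[float] = []
    for row in counts:
        total = sum(row)
        # Each row becomes transition probabilities; empty rows stay all zero.
        features.extend(c / total if total else 0.0 for c in row)
    return features


def iat_histogram(times_s: list[float], bin_ms: int = 50, n_bins: int = 10) -> list[float]:
    """Normalized histogram of inter-arrival times; assumes times are sorted ascending."""
    hist = [0] * n_bins
    for t0, t1 in zip(times_s, times_s[1:]):
        gap_ms = (t1 - t0) * 1000
        hist[min(int(gap_ms // bin_ms), n_bins - 1)] += 1
    total = sum(hist)
    return [h / total if total else 0.0 for h in hist]
```

As a usage example, take a short flow with lengths `[100, 1400, 1400, 60]` at times `[0.0, 0.01, 0.2, 0.21]`. Its transition features show that a small packet is always followed by a large one, and its timing histogram puts two of the three gaps in the 0 to 50 ms bucket. A fixed-length vector like this can be fed to a classifier no matter how many packets the flow contains.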
Tuning the Machine Learning Algorithm
In the example above, Cisco used scikit-learn, a free machine learning library written in the Python programming language. Its sponsors include INRIA, a French technology institute; New York University; the Paris-Saclay Center for Data Science; Columbia University; and Google.
In simple terms, engineers can use scikit-learn to classify data or information. They can also use it to estimate values (regression) or to identify clusters. Cisco is using it to classify data flows as either malicious or benign.
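To make "classify flows as malicious or benign" concrete, here is a dependency-free logistic regression sketch. scikit-learn provides this kind of model as `sklearn.linear_model.LogisticRegression`. The article doesn't say which classifier Cisco trained, so several choices here are assumptions for illustration only: the use of logistic regression, the two hypothetical features (a fraction of outbound bytes and a self-signed-certificate flag), the learning rate, and the epoch count.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000):
    """Fit weights w and bias b by batch gradient descent on log loss (label 1 = malicious)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this flow: predicted probability minus true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability that flow x is malicious."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

As a usage example, train on six made-up flows. The three benign flows have feature vectors `[0.1, 0]`, `[0.2, 0]`, and `[0.15, 0]`. The three malicious flows have `[0.9, 1]`, `[0.8, 1]`, and `[0.85, 1]`. After training, `predict_proba` scores a new flow like `[0.12, 0]` below 0.5 and `[0.88, 1]` above 0.5. The probability output is what a tuning threshold is later applied to.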
Without going into too much detail, readers should understand that engineers can tweak and tune machine learning models. It takes judgment and skill to determine which tuning parameters will give the best results. The data below shows the results of Cisco's training with scikit-learn.
The data shown above illustrates the model's accuracy with different combinations of input data. Legacy, on the left, means typical NetFlow information, such as the duration of the flow and the number of packets and bytes exchanged by each side. Legacy/SPL adds the sequence of packet lengths (SPL), while TLS adds the TLS handshake data.
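The feature combinations can be pictured as concatenated groups of numbers, where each richer configuration appends more groups to the Legacy NetFlow fields. The sketch below makes that concrete. The `FlowRecord` field names and the feature-set labels are hypothetical, and they are not NetFlow's or Cisco's actual field names. The exact column layout of Cisco's results is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    """Hypothetical per-flow record; field names are illustrative only."""
    duration_s: float
    packets_out: int
    packets_in: int
    bytes_out: int
    bytes_in: int
    spl: list[float] = field(default_factory=list)  # sequence-of-packet-lengths features
    tls: list[float] = field(default_factory=list)  # encoded TLS handshake features


def feature_vector(flow: FlowRecord, feature_set: str) -> list[float]:
    """Concatenate the feature groups named by feature_set (hypothetical labels)."""
    # Legacy: classic NetFlow-style counters for each direction plus duration.
    legacy = [float(flow.duration_s), float(flow.packets_out), float(flow.packets_in),
              float(flow.bytes_out), float(flow.bytes_in)]
    groups = {
        "legacy": legacy,
        "legacy+spl": legacy + flow.spl,
        "legacy+spl+tls": legacy + flow.spl + flow.tls,
    }
    if feature_set not in groups:
        raise ValueError(f"unknown feature set: {feature_set!r}")
    return groups[feature_set]
```

The design choice here is that each configuration is a strict superset of the previous one. That makes it easy to train one model per configuration and compare their accuracy side by side.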
The most important data to examine are in the two bottom rows, because those models were trained on all available data. The tradeoff between detecting malware and producing false positives is clear. For example, at a tuning-parameter value of 0.5, the model correctly detects malware 99.35% of the time and correctly identifies benign flows 98.38% of the time. That 98.38% figure means that in 162 of every 10,000 benign flows (10,000 - 9,838), the model incorrectly flags the flow as malware (a false positive). When the tuning parameter is set to 0.99, the model classifies benign flows correctly 100% of the time, but it detects only 68.83% of malware flows. The point is that machine learning models are rarely 100% accurate on both benign and malware-laden flows at the same time. Thus, human judgment and understanding are still required, even when artificial intelligence and machine learning are in use.
When ETA predicts malware, it does not automatically quarantine the machine from the network. Rather, it raises an alarm, and quarantining the device requires manual, human intervention.
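Both the false-positive arithmetic and the threshold tradeoff can be checked with a short sketch. It assumes the "tuning parameter" is the score threshold at or above which a flow is labeled malware; the article doesn't define the parameter precisely. The function names are hypothetical, and the scores in the test are invented toy values, not Cisco's data.

```python
from __future__ import annotations


def rates_at_threshold(scores: list[float], labels: list[int], threshold: float) -> tuple[float, float]:
    """Return (malware detection rate, benign accuracy).

    A flow is flagged as malware when its score is >= threshold; label 1 = malware.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_malware = sum(labels)
    n_benign = len(labels) - n_malware
    return tp / n_malware, tn / n_benign


def false_positives_per(benign_accuracy: float, n_benign: int = 10_000) -> int:
    """Expected number of benign flows wrongly flagged among n_benign benign flows."""
    return round(n_benign * (1 - benign_accuracy))
```

As a usage example, `false_positives_per(0.9838)` returns 162, which matches the figure above. On a toy set of four benign and four malicious scores, raising the threshold from 0.5 to 0.99 pushes benign accuracy up to 100% while malware detection drops. This is the same tradeoff the table shows.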
ETA Is Really an Ecosystem of Products
An organization can set up ETA to examine some or all flows. For example, an organization that needs to approve content or application use for certain individuals may have to decrypt some data flows. For those flows, it would use the device-in-the-middle approach rather than ETA. For many other flows, however, ETA provides a scalable, cost-effective way to examine encrypted traffic for malware.
ETA is actually an ecosystem play, consisting of Cisco switches, NetFlow collectors, and cloud computing.
Collecting and computing the metadata ETA uses takes significant processing power. To provide it, Cisco has developed a new ASIC for its Catalyst 9300 series switches, giving them the CPU cycles required to generate these new data elements. Some Cisco ISR devices with a lower port count do not need the new ASIC, because their existing processing power is sufficient.
Once the switch computes the ETA parameters, it places them into a standard NetFlow stream for forwarding to Cisco Stealthwatch servers, which, among other things, collect and analyze the ETA data to detect anomalies. Cisco has coupled Stealthwatch with its cloud-based Cognitive Threat Analytics engine for correlating traffic with global threat behaviors to identify infected hosts, breaches, and suspicious traffic.
Why Should We in the Communications Space Care about ETA?
Researchers have spent years working on ways to identify content in encrypted data flows. Even so, Cisco's ETA represents a leap forward in commercializing this technology and putting it to constructive use. Perhaps we'll eventually see an ETA-like mechanism for detecting malicious threats in encrypted SIP flows, a capability that would be critical to our industry.
This might be done in the session border controller, so that SIP traffic no longer needs to be decrypted and re-encrypted to cross the network boundary. Almost all IP-based voice and video calls, as well as IM/presence flows, are encrypted, yet they continually go through this decrypt/encrypt cycle. I'm not aware of significant malware hiding in encrypted SIP flows at this point. Still, given how prolific voice, video, and chat traffic is, such malware is likely lurking in the wild and may one day spring upon us. It will be useful to have ETA and similar techniques available to safeguard these flows.
In summary, ETA puts encrypted traffic analysis to a good purpose: detecting malicious data in encrypted flows. However, similar techniques can also serve nefarious ends. For example, it has been reported that third parties can identify which Netflix videos you are watching even though the video stream is encrypted. That could hand a wealth of personal preference information to intermediary network providers. It could also reach anyone running such algorithms on the switches and routers that Netflix traffic traverses. It is easy to imagine that encrypted YouTube and other videos on cloud-based servers could be fingerprinted the same way.
I think we should all be aware that this technology exists. Like most technologies, it can serve great purposes, such as ETA, and it can also be exploited for less honorable ones.