My Story…

Throughout my life I have been pushing the boundaries of technology, with occasional forays into the artistic/music side of things.

Let the Story Begin…

I began my career in 1988 as an analytical/environmental chemist looking to apply computers, data collection, and analytics to chemistry. Data science and analysis have been a part of every stage of my career. My first stage as a developer started in 1991, working on embedded systems as lead architect for a message-based multi-processing system for automated mail sorting. In my next stage I moved into more distributed architectures and agent technology, first with Fidelity and then Sonalysts. During this stage I created a basement lab to begin to explore network security. I moved deeper into network security in 2004 by starting my own group within Sonalysts called Guardian Services. I created a new technology focused on aggregate behavior analysis in 2006 and won a contract with the Dept. of Homeland Security. I then wrote and received my first patent in 2013 based on the work I did for them. In 2018, at Secureworks, Inc., I moved into cloud technologies, private cloud first, then public cloud. For one of my projects, I created and implemented an annotation capability used to label threat security events.

Leveraging Open Source Technology to Start Up and Establish a Disruptive Cyber Defense Capability

In 2006, with initial seed capital of $1M (and tons of sweat equity) and an initial part-time team of 2, eventually expanding to 15 diverse individuals, we produced a tool that existing MSSPs and cyber defense tooling couldn’t replicate: Occulex.
The team spanned multiple public and private institutions, with which we shared data and ideas on the need to establish a threat-focused ontology (today we have MITRE ATT&CK), near-real-time risk, trust models (NATO paper), and moving-target defense strategies based on device trust. Toward the end, around 2010, we started to conceptualize fusing the cyber and physical domains, producing a journal article and presentations. Today that is known as IT/OT fusion.
This is an overview of that journey, and the lessons learned can be applied to a diverse set of industry sectors: LLMs, cyber security, healthcare, biomedical, genetics, and more.

Things Started in a Small Lab in My Basement in the 90s

I began experimenting with open source in the mid-to-late 90s (e.g. Slackware, OpenBSD, PF, OpenPCE, Snort) when I created a small network security lab in my basement. As time progressed, I installed honeypots, network flow analysis tools (US CERT SiLK), pen-testing tools (Metasploit, Nmap), network taps, NIDS (Snort, later Suricata and Bro), and firewalls built on OpenBSD, then pfSense. I really got to understand the data types, current threat TTPs, and sensor bias and fidelity.

That knowledge allowed me to cultivate the idea of how to fuse cyber telemetry in a completely different fashion.

Evolved the Lab to Focus on the Development of Aggregate Behavior Analysis

By 2006, that little lab, built by two people (a colleague and myself), had evolved into an ingestion capability (driven by an Endace DAG), an event streaming pipeline, transformation software (250k lines of C++ code, with OpenPCE), and HPC support to foster the development of ML techniques on the fused data sets. R was used for initial investigations before we attempted migrating algorithms to the OpenMPI cluster. Nowadays, that rack of servers could live in the cloud (in AWS: RDS, S3, MSK, EKS, SageMaker, Jupyter, and Spark).

Created a 3D Visualization Tool to Analyze Aggregate Data and Drill into Raw Data

In 2010, the data produced from the pipeline was so different and complex that we created an immersive 3D visualization tool in Java and OpenGL (~50k lines of Java UI code in a Model-View-Controller-style architecture). We integrated MaxMind for IP geolocation and could aggregate up to Autonomous Systems (AS). This tool allowed analysts to interact, zoom in, highlight device behaviors, and drill into the raw flow data. Someone called it a Cyber MRI, and later we called the platform Occulex. Lately, I have researched the use of Unreal Engine 5 to create immersive collaborative analytic environments.

The team was able to pick out emergent behaviors, qualify normal network behaviors, and share actionable intel with local and federal agencies. The approach logically compressed network flow data roughly 1000x, depending on the aggregation time window selected.
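
To make that compression concrete, here is a minimal sketch (in Python with pandas, an assumption; the original pipeline was C++ with OpenPCE) of how time-window aggregation collapses many raw flow records into one behavior aggregate per device and window. The field names are illustrative, not the original schema.

```python
# Sketch: time-window aggregation of raw flow records into per-device
# behavior aggregates. Many flows collapse into one row per window.
import pandas as pd

flows = pd.DataFrame({
    "ts":       pd.to_datetime(["2024-01-01 00:00:01", "2024-01-01 00:00:09",
                                "2024-01-01 00:04:59", "2024-01-01 00:05:02"]),
    "src_ip":   ["10.0.0.5", "10.0.0.5", "10.0.0.5", "10.0.0.9"],
    "dst_port": [443, 443, 53, 22],
    "bytes":    [1200, 800, 90, 400],
    "packets":  [10, 6, 1, 4],
})

# One row per (device, 5-minute window): the compression ratio grows with
# the window size and the flow rate.
agg = (flows
       .groupby(["src_ip", pd.Grouper(key="ts", freq="5min")])
       .agg(flow_count=("bytes", "size"),
            total_bytes=("bytes", "sum"),
            distinct_ports=("dst_port", "nunique")))
print(agg)
```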

Our work gave me opportunities to speak at NATO’s CCDCOE about the technique, to lecture at MIT, an invite to Obama’s National Cyber Leap Year (where I was part of the nature-inspired defense team), presentations at predictive analytics symposiums, talks in DC to the Dept. of Homeland Security and other agencies, and invites to SRI to build out startup knowledge on value propositions and elevator pitches.

The overall model and architecture were established in 2006, yet by layering the processing of data fed to ML models, the approach offers a deep level of eXplainable AI (XAI) perspective when analysts are trying to make sense of higher-level outcomes. Note, to be clear, much of the initial work was focused on establishing primitives at Layer 3.

That said, I dove into modeling the data, published my findings (Deriving Behavior Primitives from Aggregate Network Features using Support Vector Machines), and presented them to NATO’s CCDCOE in Tallinn, Estonia. I was able to derive behavior primitives using support vector machines (SVMs) on subsets of the derived, rich feature space.
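
For flavor, here is a minimal sketch of that kind of primitive derivation, assuming scikit-learn (the original work used R and custom code); the features and labels below are synthetic stand-ins for the derived feature space described in the paper.

```python
# Sketch: classify behavior primitives from aggregate flow features with an SVM.
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# Columns stand in for per-window aggregates: flow_count, total_bytes, distinct_ports.
X = rng.normal(size=(400, 3))
# Hypothetical primitives: 0 = "client-like", 1 = "scanner-like".
y = (X[:, 2] > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
# Scaling matters: aggregate features live on very different ranges,
# and the RBF kernel is sensitive to that.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```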


One common theme in this specific ABA approach is that when you combine high- and low-fidelity sensor telemetry (see my blog on sensor bias here), e.g. combining network flow/process behaviors with IDS events, you can “burn the haystack” to get the needle, depending on your approach to data transformations, aggregations, and fusion.

Establishing a Data Science Development Lifecycle Supporting ABA

Starting in 2006, my team and I needed to establish a data science lifecycle facilitating the development of a new type of cyber defense technology: aggregate behavior analysis (ABA). This lifecycle hinged on the unique transformation performed on network flow data, and later process data. It included the establishment of ground-truth data sets, a data aggregation/transformation step, data cleansing/normalization, feature selection, machine learning classification techniques (e.g. SVMs) to identify abstract primitives, and then model evaluation. A sketch of these stages appears below.
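
Here is that lifecycle sketched as a single pipeline, assuming scikit-learn purely for illustration (the original tooling predates it); the comments map each stage to the steps above.

```python
# Sketch: the ABA data science lifecycle as one evaluable pipeline.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for a ground-truth data set of aggregated, labeled windows.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

pipeline = Pipeline([
    ("normalize", StandardScaler()),              # cleansing/normalization
    ("select",    SelectKBest(f_classif, k=5)),   # feature selection
    ("classify",  SVC(kernel="rbf")),             # SVM primitive classification
])

# Model evaluation via cross-validation against the ground-truth set.
print(cross_val_score(pipeline, X, y, cv=5).mean())
```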

Summary of How We Evolved Aggregate Behavioral Analysis (ABA)

So it was in the summer of 2006, while on vacation with my family, that I received a phone call from a very good friend at work, Jane Goldsmith, telling me I had been awarded an SBIR grant through DHS. I was stunned, amazed, and my mind locked in on the challenge. The thesis for the idea was: how do we gather behaviors not only from rule-driven data, e.g. firewalls and IPS/IDS, but also from the underlying foundation of network communications, network flow data, e.g. NetFlow v9 (at the time)? That way we can uncover “what we don’t know we don’t know,” and fuse the derived data with rule-based data to “burn the haystack, to get closer to the needle.”

During the proposal writing, I assembled a team that year based on public and private partnerships, comprised of universities (University of Connecticut, Dalhousie University), a risk-based think tank (Delta Risk), and a consultant from RedJack. Early on we assembled workshops to focus the concept on finding botnets. After we gathered data and performed some data-science-based investigation, we established a proof of concept. I put together a value proposition, and later a business plan. I spent a summer writing a patent, got it approved, and Sonalysts, Inc. has established a multi-million-dollar business based on it, still evolving and alive today.


Cloud Development and the Use of Ontology

I went on to work with big, fast data systems at Secureworks, developing in the cloud. As an architect I created the platform’s first integration of MITRE ATT&CK data, used to annotate security events so our clients can better understand them, a kind of ontology.

Embedded, Message-based Systems

In the early 1990s, my first software-related job, after being an environmental chemist and studying software engineering, was with a startup working on a next-generation intelligent mail sorting platform. I was their lead architect. This was an incredible experience, and an awesome team.

I started creating real-time cyber-physical systems (CPS) back in the early 1990s with mail sorting equipment; a cyber-physical system is a computer system in which a mechanism is controlled or monitored by computer-based algorithms. I continued with automated manufacturing in clothes cutting systems at Gerber Garment Technology, then moved into distributed web-based applications, rolling out Fidelity’s first online web-based trading system during that time.

Opensource and a Simple Streaming Framework

Before the onrush of cyber security, I delved into open source network security systems in the late 1990s, establishing a small, focused group of researchers at Sonalysts. After Sonalysts, in the fall of 2014 (I broke my ankle and had some downtime before starting my next journey), I started to put together ideas for a book focused on the concept of behavior attribution analytics; I have a way to go.

In the mid-90s I started working on a simple event stream processing framework called the open pervasive computing environment (OpenPCE). It was written in C++. The goal was to create a streaming capability that would support transformation and custom statistical computations. I open-sourced it in the early 2000s (https://github.com/mccuskero/openpce, https://sourceforge.net/projects/open-pce/). The framework allowed for the creation of stream processors that fed off of sockets; keep in mind, this type of processing is now done by frameworks like Kafka and Flink. The goal was to provide a consistent framework for computationally acquiring statistics and behaviors, not only over server-based applications but also over embedded systems. The capability would be leveraged later on as I pursued more advanced and complex solutions working for the Department of Homeland Security (DHS). Stream processing capabilities are a cornerstone enabling technology in developing semantic applications driven by unique data transformations.
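
Here is a minimal sketch of the core idea, in Python rather than the original C++: a stream processor that feeds off a socket and maintains a running statistic per event. The newline-delimited numeric event format is an assumption for illustration.

```python
# Sketch: a socket-fed stream processor computing a running statistic.
import socket

def process_stream(host: str = "127.0.0.1", port: int = 9000) -> None:
    """Consume newline-delimited numeric events and maintain a running mean."""
    count, total = 0, 0.0
    with socket.create_connection((host, port)) as sock:
        buf = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # producer closed the stream
                break
            buf += chunk
            while b"\n" in buf:    # process one complete event at a time
                line, buf = buf.split(b"\n", 1)
                try:
                    value = float(line)
                except ValueError:
                    continue       # skip malformed events
                count += 1
                total += value
                print(f"events={count} mean={total / count:.2f}")

if __name__ == "__main__":
    process_stream()
```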

Summary and Future Work

Knowledge-driven event streaming platforms leverage multiple data sources, technology stacks, development strategies, and capabilities. Any way you look at the problem, data is the driving force; data drives the solution.

I have been developing such systems since the early 1990s and have seen the industries, technologies, and markets slowly evolve to balance needs with capabilities. I became a thought leader and tech evangelist in the creation of ontology-driven behavior analysis capabilities and their applications in cyber security.

Reach out to me sometime and I will tell you how I developed these techniques from core concepts evolved from omnifont-based optical character recognition used in the early 90s.

Data science is an evolving capability that escapes any single complete definition. Over the years it has grown from gathering and cleansing data sets to encompass data analysis, predictive analytics, data mining, machine learning, and business intelligence.

Story Timeline

12/01/2023-current

CrowdStrike: As a Data Engineer I am working with very large data sets.

11/01/2014-11/01/2023

Secureworks: Initially worked on the CTP platform, programming in Java with ActiveMQ, processing billions of events a day on servers we owned. We migrated to a private cloud (VMware) and automated deployment with CI/CD, Docker/K8s, and Jenkins. The last migration was to a public cloud (AWS) and Golang with Kafka, leveraging Terraform and GitLab CI.

01/01/2020

MITRE ATT&CK Integration: Architected, designed, and implemented a methodology that ETLs MITRE ATT&CK tactics/techniques to label alerts, enabling customers to better understand events in terms of a threat’s kill chain. A sketch of the annotation idea follows.
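
A hedged sketch of the annotation idea: enrich an alert with ATT&CK tactic/technique metadata via a lookup. T1046 and T1059 are real ATT&CK technique IDs, but the rule-to-technique mapping and alert schema below are hypothetical, not the production ETL or data model.

```python
# Sketch: label a security alert with MITRE ATT&CK context.
ATTACK_INDEX = {  # subset of real ATT&CK techniques, keyed by technique ID
    "T1046": {"name": "Network Service Discovery", "tactic": "Discovery"},
    "T1059": {"name": "Command and Scripting Interpreter", "tactic": "Execution"},
}

RULE_TO_TECHNIQUE = {  # hypothetical detector-rule-to-technique mapping
    "port-scan-detected": "T1046",
    "suspicious-powershell": "T1059",
}

def annotate(alert: dict) -> dict:
    """Attach ATT&CK context so the client sees the alert's kill-chain stage."""
    technique_id = RULE_TO_TECHNIQUE.get(alert.get("rule_id", ""))
    if technique_id:
        alert["attack"] = {"technique_id": technique_id,
                           **ATTACK_INDEX[technique_id]}
    return alert

print(annotate({"rule_id": "port-scan-detected", "host": "web-01"}))
```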

12/01/2022

File Analysis Platform: Part of a team that implemented a file analysis platform that acquired suspicious files detected on hosts, extracted metadata from the files, and reported back to the user. My focus was on file parsing and extraction for various file types using Golang, receiving AWS SNS topic events, and reading files from an S3 bucket.

11/1/2022

POC’d the Creation of an Extended Reality (XR) Platform Facilitating Data Analytics: This POC used Unreal Engine 5 (UE5) to create an immersive environment in which analysts can analyze high-dimensional data sets, simulating and visualizing aggregate network flow data as UE Actors and allowing characters to walk through the data sets. The main goal was to establish the capacity of the engine in terms of data volume and its effects on usability, and to establish architectural patterns that facilitate data transformations supporting analytics in XR-based cyber capabilities.

11/1/2019

POC’d Normal Model Development and Visualizations using D3.js in Identifying Abnormal System Behaviors: The goal of this project was to gather and transform system-level signals (e.g. memory usage, network usage, process usage), model what was normal for process archetypes, and measure their rate of transition to abnormal behaviors. A minimal sketch of the scoring idea follows.
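
A minimal sketch of the scoring idea, assuming a simple per-archetype Gaussian baseline and a 3-sigma threshold (illustrative choices, not the project’s actual model):

```python
# Sketch: baseline a system signal for a process archetype and flag
# observations that drift beyond 3 standard deviations of "normal".
import numpy as np

rng = np.random.default_rng(7)
baseline = rng.normal(loc=200.0, scale=15.0, size=500)  # normal memory MB
mu, sigma = baseline.mean(), baseline.std()

def is_abnormal(observed_mb: float, z_threshold: float = 3.0) -> bool:
    """Flag an observation that falls outside the learned normal band."""
    return abs(observed_mb - mu) / sigma > z_threshold

print(is_abnormal(210.0))  # within the normal band -> False
print(is_abnormal(400.0))  # far outside            -> True
```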

11/1/2016

POC’d Fusing Flow and Other Security Data to Feed a Graph-based Visualization: This POC focused on using Gephi to push on how to interpret the relationships found in heterogeneous host and network data sets. Though the images were static, lots of complex detail was uncovered in the relationships.

Aug 2013

Lectured in Tallinn, Estonia at NATO’s CCDCOE: I was one of two U.S. citizens asked to lecture there: General Keith Alexander, Director of the NSA (lecturing on the threat landscape), and myself, talking about aggregate behavior analysis. In 2014 General Alexander left the NSA to found IronNet (ironnet.com), a distributed defense capability focused on the use of behavior analytics.

Jan 2010-Jan 2013

Created and Was Awarded a Patent on Behavioral Aggregation Supporting a New Type of Analytics: I personally wrote a patent based on our work creating DMnet/Occulex, had it reviewed by patent attorneys, and filed it with the Patent Office. “A method of determining, within a deployed environment over a data communication network, network threats and their associated behaviors. The method includes the steps of acquiring sensor data that identifies a specific contact, normalizing the acquired sensor data to generate transformed sensor data, deriving, for the specific contact from the transformed sensor data, … “

Sep 2009

Introduced Behavior Analytics and DMnet to the Network Security Community at CATCH: This work started in 2006 with an SBIR contract with DHS.
This was my entrée into advocating for, and becoming a thought leader in, the use of behavior analytics in network security to anticipate cyber threats: flipping the current approach from focusing on assets to focusing on the threat itself, disrupting our current thinking on network defense, and creating data lakes rich for modeling.

August 2009

Member of the Nature-Inspired Cyber Defense Team at Obama’s NCLP: At the National Cyber Leap Year, I was part of a working group researching nature-inspired solutions for network security. I worked with Dr. Dipankar Dasgupta and Dr. Polly Matzinger (NIH).

Jan 2001-2003

Financial Sector Tech Lead for the Largest National Cyber Exercise to Date: At Sonalysts, working for the DOJ, I was the tech lead for the largest national cyber exercise to date, focused on protecting supply chains.

Jan 1999 – Jan 2004

Senior Software Engineer Developing a Weather Information/Fusion System: Created data storage capabilities in C/C++ using a hybrid database system, PostgreSQL with integrated PostGIS capabilities. On the side, started a security service for the company called Guardian Services.

Nov 2007

2007 RSA Conference: Met up with Dmitri Alperovitch, then at Secure Computing, and talked about aggregate behaviors and analytics. He later co-founded CrowdStrike in 2011, which uses behavior analytics to identify cyber threats.

Jan 2006

Created an Ontology-Driven, Distributed Cyber Fusion POC, DMnet: Starting in 2006, a small team at Sonalysts, Inc. started to create a distributed defense solution using aggregate behavior analysis, supporting behavioral analytics. This later became Occulex.

Jan 2004

Established Guardian Services Network Security Group: Created a small team of cyber professionals initially focused on WiFi security, slowly creating new technologies and services.

Jan 1998

Smartweb, a Web Analytics Startup: Served as one of three partners in a startup company that created a web-monitoring tool for gathering site statistics on a web farm. The company was valued at $2.5M by a marketing firm in CT. Worked with partners on product realization and technical market research to target our initial release of the product. Focused on creating a SAX-like HTML parser (before a SAX parser was available) for parsing and modifying HTML code on the fly; a sketch of the event-driven idea follows.
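
The analogous event-driven approach is now built into Python’s standard library; this small sketch (an analogy, not the original code) shows the SAX-like style: callbacks fire per tag instead of building a full DOM, so HTML can be inspected or rewritten on the fly.

```python
# Sketch: SAX-like, event-driven HTML parsing with Python's HTMLParser.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Record anchor hrefs as the HTML stream is parsed, no DOM built."""
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)

parser = LinkCollector()
parser.feed('<p>See <a href="/stats">stats</a> and <a href="/farm">farm</a>.</p>')
print(parser.hrefs)  # ['/stats', '/farm']
```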

Dec 1997

Researched, POC’d, and Gave a Talk on Distributed Hybrid Agent Systems: My focus while pursuing an M.S. at RPI was agent technology and distributed knowledge gathering and sharing. The prototype used a web crawler to gather data, then used the gathered meta-data to associate similarities between a user’s needs and the data gathered. I used JESS and KQML to enable this work, and was awarded first runner-up for my presentation at RPI. My POC, written in Java 1.x, was essentially a web spider that collected meta-data on sites. “Hybrid agents and hybrid agent systems offer a flexible alternative in developing agent frameworks compared to other agent types. In a hybrid framework different agent components can be assembled at run-time based on the task. In a dynamic environment, hybrid agents can offer better adaptability for the varying tasks needed by users.”

Dec 1996

Dictaphone: Asynchronous Digital Recording: One of the highlights of this work was using and expanding on the RIFF standard for securely storing audio. I had the chance to work with a few teams here, but focused mostly on using CORBA to provide backend storage for a communication recording solution. I also created some libraries for interfacing with Oracle DB.

Jan 1995

Network Security Basement Lab: Collected PC carcasses and started working with Slackware 3.0; later on, installed OpenBSD as a firewall. This soon became pfSense, based on FreeBSD, and on other boxes I installed honeypots, e.g. honeyd. Snort was installed in 1999, and Suricata a decade later. Over the years, I worked with my son scanning our smart TVs with Metasploit and found some interesting open ports. I also started collecting network flow data using CERT’s security tools: https://tools.netsa.cert.org/index.html

Oct 1995

Part of the Team that Created the 1st Online Trading Application at Fidelity: I worked closely with Paul Kraus (later a Director at Symantec Corp. and Solera Computing, network forensics). Here we developed Fidelity’s first reusable secure tunnel based on Secure Sockets Layer (SSL). I developed socket-based thread classes using Winsock 2.0, worked on system requirements for the NT-based SSL system, and designed a network monitoring program for tracking the quality of remote connections over phone lines; this system effectively caught bugs in Cisco routers as the overall system was being deployed. I implemented a secure socket framework based on SSLRef 2.0, designed and implemented secure proxies under Windows NT 3.51, interfaced an SSL server proxy to a Java prototype applet to check on Java security, and designed and implemented HTTP client test programs.

Nov 1993

Embedded Clothes Cutting at Gerber Garment Technology: I love developing things that you can see move, cyber-physical systems (CPS), in this case machine automation. This project focused on the creation of a near-real-time control system driving clothes cutting and manufacturing. The ask? It must use Windows NT, the newest multi-threaded COTS-based OS on the market at the time…

Oct 1991

(Startup) Intelligent Automated Mail Sorting: I joined a startup company to develop an intelligent, multiprocessing-based mail sorting capability to handle the most difficult addresses, yup, using COTS components, less a proprietary shared-memory capability. I was the lead architect. We used the 1st omnifont-based OCR capability, which was key to my future machine learning approaches. This is, at its core, a cyber-physical system (CPS).

Jan 1988

Environmental Chemist and LIMS Manager: I worked as an analytical chemist creating metals methods for fly ash and managed our LIMS system while studying computer science at UConn (the humble beginnings of analytics and data science, working with lab results).