Auburn, Alabama, 24 May 2024
This is a guest contribution by Dr. Robert A. Norton, Ph.D., Professor, Veterinary Infectious Diseases, Biosecurity and Public Health, Office of the Senior Vice President for Research, and LNO, National Security and Defense Projects
Introduction
When the IC conducted “Insight contests” that pitted analysts using traditional methods and legacy systems against Open Source Intelligence (OSINT), there was always a lot of tension in the room. After all, OSINT, by definition, didn’t need exquisite TOP SECRET collection platforms that cost billions of dollars. Materially, all it needed was a laptop with access to the internet and a steady supply of electricity. In one particularly boisterous contest, a high-value technical document was the target, the acquisition of which had preoccupied a participating three-letter agency for several months. Various means were considered to gain access to the document, including the very high-risk option of using a HUMINT asset.
The OSINT team found it in under 15 minutes at a total cost of $4.00 (not including pro-rated electricity and ISP charges that ran on the order of scores of cents). How did they accomplish this vital national security mission? The 300-page document was identified in the holdings of a foreign university library and purchased online with a credit card. America was saved from a terrible fate.
Like an early adopter waiting outside an Apple Store for the next big thing to drop, having validated this incredible new high-utility, low-cost solution to a difficult problem, the IC eagerly embraced the power of OSINT as a vital new tool for the mission.
Or at least that’s what common sense suggests should have happened. Instead, things got a little awkward.
For decades, the US intelligence community has enjoyed annual budgets in the vicinity of $85bn that it spends on a vast, complex collection apparatus composed of platforms from the seabed to the stars. When an essentially free technique delivered important results, instead of being seen as an opportunity, it was viewed as a challenge. The IC fell victim to the sunk cost fallacy, whereby ‘a person or organization is reluctant to abandon a strategy or course of action because they have invested heavily in it’. This was combined with the powerful incentives of the military industrial complex to support legacy solutions. As Upton Sinclair put it, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”
2024 OSINT Strategy for the IC
The sunk cost fallacy explains why it has taken the IC until 2024 to fully embrace OSINT with the release of the first official OSINT Strategy: The INT of First Resort: Unlocking the Value of OSINT.
Why the change? Follow the data. There has been an explosion of data. Consider this point: If one byte of data equals a grain of rice, a zettabyte would fill the Pacific Ocean. In 2023, the world created around 120 zettabytes (ZB) of data and the numbers only continue to rise. Fully 90% of the world’s data has been produced in the past 2 years. Only an infinitesimal fraction of that is classified data. Artificial Intelligence (AI) is the only answer to data at scale. It is a natural fit with OSINT because the data is in the open. As MIL has argued elsewhere, “The biggest secret in the IC is that, by volume, secrets rarely matter to the IC”.
What is OSINT?
OSINT extracts information from “Publicly Available Information” (PAI). The old definition for OSINT is that it is comprised of data sources that are not classified, including information derived from the press, public records, scientific journals, or industry and trade publications (“Gray Literature”), social media, videos, websites and even the “Dark Web”. The new definition of OSINT is more open, far more encompassing, nuanced and continually evolving. Given that almost all information is publicly available, some practitioners, like myself, define OSINT as, “everything”.
OSINT can be broken down into two large categories, passive and active OSINT. These refer to whether or not there is direct interaction between the collector and/or analyst and the target. For example, active OSINT is the near totality of the work of “Competitive Intelligence (CI)” or “Business Intelligence” practitioners. CI is used by companies and corporations in order to gain business advantage.
In the last decade, OSINT has become more technical, and has changed dramatically for the better in its speed and capabilities. For example, intelligence targets, be they people, places, things, or ideas, must be triangulated in order to be verified. Technological advancements, link analysis, multi-database interfaces, and machine learning have enabled far more rapid insight and empowered OSINT in particular to deliver ground truth.
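The idea of triangulation can be illustrated with a minimal Python sketch. It assumes a simple rule that is not taken from any specific tradecraft standard: a claim counts as verified only when at least `min_sources` distinct sources report it. The function name `triangulate`, the source labels, and the claims are all hypothetical.

```python
from collections import defaultdict

def triangulate(reports, min_sources=3):
    """Return the claims reported by at least `min_sources` distinct sources.

    `reports` is an iterable of (source, claim) pairs. The threshold of three
    is an illustrative assumption, not an established rule.
    """
    sources_by_claim = defaultdict(set)
    for source, claim in reports:
        # A set ignores repeat reports from the same source.
        sources_by_claim[claim].add(source)
    return {
        claim
        for claim, sources in sources_by_claim.items()
        if len(sources) >= min_sources
    }

# Hypothetical reports about the location of a target.
reports = [
    ("news_site", "facility at grid A"),
    ("imagery_vendor", "facility at grid A"),
    ("public_registry", "facility at grid A"),
    ("news_site", "facility at grid A"),   # same source again, no extra weight
    ("forum_post", "facility at grid B"),  # single source, stays unverified
]

verified = triangulate(reports)
```

Counting distinct sources rather than raw reports is the point of the sketch: one outlet repeating itself should not look like corroboration. A real workflow would also have to judge whether the sources are truly independent of one another.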
Validating or “cleaning” the data means removing error. The practice is essential to every intelligence discipline (“INT”), but especially so for OSINT. Data validation will always be one of OSINT’s most challenging problems for practitioners and AI engineers alike. Without it properly applied, as with all things involving computers, the result is “garbage in, garbage out”. Garbage-filled AI is certainly useless and potentially dangerous.
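As a rough illustration of what “cleaning” can mean in practice, the Python sketch below drops records with missing fields, rejects impossible coordinates, and removes duplicates. The record layout (`name`, `lat`, `lon`) and the function `clean_records` are assumptions made for the example, not a description of any particular OSINT tool.

```python
def clean_records(records):
    """Return records that pass basic checks, with duplicates removed.

    Each record is a dict with hypothetical keys "name", "lat" and "lon".
    """
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip()
        lat, lon = rec.get("lat"), rec.get("lon")
        # Drop records with a missing name or missing coordinates.
        if not name or lat is None or lon is None:
            continue
        # Drop coordinates that cannot exist on Earth.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue
        # Treat the same name at the same rounded position as a duplicate.
        key = (name.lower(), round(lat, 4), round(lon, 4))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "lat": lat, "lon": lon})
    return cleaned

# Hypothetical raw collection with typical defects.
raw = [
    {"name": "Depot", "lat": 32.6, "lon": -85.5},
    {"name": " depot ", "lat": 32.6, "lon": -85.5},  # duplicate after normalizing
    {"name": "Port", "lat": 132.0, "lon": 10.0},     # latitude out of range
    {"name": "", "lat": 1.0, "lon": 1.0},            # missing name
    {"name": "Mill", "lat": None, "lon": 3.0},       # missing coordinate
]

cleaned = clean_records(raw)
```

Checks like these catch only mechanical errors. Deciding whether a well-formed record is actually true remains an analytical problem.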
The Merging of the INTs
This raises another major trend - the merging of the INTs. What was once the preserve of highly classified agencies - 1m satellite image resolution, real time communications and financial transfers, GPS navigation to 1m accuracy, propaganda creation and distribution, encrypted chat, photo reconnaissance, live voice recording (in person or cellphone); all of these, and a lot more besides, are in the palm of your hand right now. Your cellphone is the most advanced intelligence device ever made and has more computing power than the Apollo 11 spacecraft. Smartphones are a leading cause of the explosion of data. OSINT practitioners swim like fish in the river of data - a target rich environment even in the most remote regions of the world. Maasai tribesmen angle for the best spot prices for their cattle on the Nairobi stock exchange. There are no landlines in the Rift Valley. There are, however, cell towers and Starlink dishes.
Smartphones have been a godsend to intelligence collection and analysis. There are a lot of dead Russians who refused to turn off their cell phones in combat zones. Smartphone “digital dust”, meaning the data traces left behind by online activities, is critically important in developing pattern or linkage analysis. Compromised credentials, including passwords, email addresses, etc., are also now available in commercial databases. These are usually used for the purposes of cyber security but can also be used for linkage analysis. Previously, many of the capabilities derived from those types of analyses would have been solely the preserve of Signals Intelligence (SIGINT). In other words, the work of three-letter agencies.
As the OSINT signal grows stronger, high value insight opportunities follow. But so too does noise. Data noise is a raging threat, which, left unmanaged, will overwhelm the attainment of insight and complicate analysis. The signal to noise ratio causes not just technological challenges, but also staffing and, perhaps most vexing, authorities and permissions issues.
Commercial firms are now producing very high resolution imagery that even a few years ago would have been reserved for the IC and military, but is now available for purchase. Imagery Intelligence (IMINT) and Geospatial Intelligence (GEOINT) data sources are therefore becoming readily available to OSINT practitioners inside and outside of government and the military, some at daily refresh rates and at sub-meter resolution equal to or better than that provided by the legacy national assets. The U.S. government contracts for many of these high-end services.
Measurement and Signature Intelligence (MASINT), which utilizes hyperspectral sensors, is also no longer solely the preserve of National Intelligence Assets. Like commercially available imagery, “MASINT-like” Commercially Owned Technologies (COT), rather than just Government Owned Technologies (GOT), are constantly being improved by businesses, and increasingly used for commercial purposes.
Of all the new commercially available technological advancements, MASINT is one of the most promising for the further development of OSINT but also one of the most challenging in terms of maintaining the security of personally identifiable information (PII). A hyperspectral signature of an individual is equivalent to biometric and facial recognition data rolled up together. Once the hyperspectral signature of an individual is gathered, the collector has it forever, enabling that individual to be tracked wherever they go. China already uses these technologies to track individuals of interest to them.[1]
As OSINT becomes more sophisticated, the distinction between OSINT and “All Source” intelligence fades. OSINT practitioners will need to be trained in multiple “INTs”, and will thereby be better able to fuse and layer disparate lines of data. Done right, these practitioners will enable faster and more accurate insight. The power of OSINT is its agility, but also its ability to drive requirements for the traditional “INTs” resident within the three-letter agencies and Military Intelligence (MI).
Finding patterns in the Layers
Data layering is the analytical process of integrating disparate data sources to form a coherent picture or narrative related to a target. Consider the following simple analogy: mixing paint colors for home improvement projects. In this analogy there is a particular color (i.e., “insight”) that is desired. Yellow and blue make green, but there are hundreds of shades of green available. Only one is right for your living room. Mixing colors in different ratios and concentrations makes it possible to produce a full spectrum of precision colors (e.g., the precise shade of green needed). Likewise, mixing different data sources, using a process that prioritizes them by significance and accounts for their varying levels of fidelity, enables rapidly produced and validated priority insight. For any given insight there may be different “mixtures of data” necessary to create the most precise insight. That is the job of data scientists, working with subject matter experts: to determine how to produce the most rapidly available validated insight.
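One simple way to picture the “mixture” is a weighted average, sketched below in Python. This is an illustrative assumption rather than the method used in any real program: each layer contributes a score between 0 and 1, weighted by an analyst-assigned priority and by the fidelity of the source. The `Layer` class, the `fuse` function, and every number are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    score: float     # how strongly this layer supports the hypothesis, 0..1
    priority: float  # analyst-assigned significance weight (hypothetical)
    fidelity: float  # trust placed in the source, 0..1 (hypothetical)

def fuse(layers):
    """Combine layer scores into one value, weighted by priority * fidelity."""
    total_weight = sum(layer.priority * layer.fidelity for layer in layers)
    if total_weight == 0:
        raise ValueError("no layer carries any weight")
    weighted_sum = sum(
        layer.score * layer.priority * layer.fidelity for layer in layers
    )
    return weighted_sum / total_weight

# Hypothetical layers describing the same target.
layers = [
    Layer("commercial_imagery", score=0.9, priority=3.0, fidelity=0.8),
    Layer("social_media", score=0.4, priority=1.0, fidelity=0.5),
    Layer("trade_records", score=0.7, priority=2.0, fidelity=0.9),
]

insight = fuse(layers)
```

Changing the priorities or fidelities changes the result, which is the paint-mixing point: the same sources blended in different ratios yield a different shade of insight.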
Data layering enables ‘pattern of life’ analysis, which is a surveillance method that is used to model behaviors, whether by individuals, groups, or communities, and then look for anomalous behaviors. I am personally involved in one ongoing Biosurveillance OSINT effort designed to detect disease before it happens via a combination of hyperspectral sensors and pattern of life analysis. Healthy animals, plants and environments have discernible features in both their hyperspectral signatures and in their pattern of life. Biosurveillance is designed to detect changes from the norm, meaning the emergence of conditions conducive to the development of disease or the presence of actual disease.
In this effort, the baseline for pattern of life analysis will eventually be 600 different data sources. With time we expect the data sources to double, triple or multiply, perhaps by orders of magnitude. No human can develop or even predict how a particular layer of data might interact with other layers of data. Remember that each layer of data is reflective of something in the real world (e.g., temperature, atmospheric pressure, soil type, season, animal species, bacterial species, animal-human interface…on and on…), hence the critical need for rapidly trainable and surety validated AI that can assist analysts in detecting subtle, but perhaps impactful change(s).
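The baseline-and-deviation idea behind pattern of life analysis can be sketched for a single layer of data in a few lines of Python. This is a deliberately simple stand-in, a z-score test against a baseline, and not the AI described above. The readings are invented animal body temperatures, and the function `find_anomalies` and its threshold of three standard deviations are assumptions made for the example.

```python
from statistics import mean, stdev

def find_anomalies(baseline, observations, threshold=3.0):
    """Return (index, value) pairs for observations far from the baseline.

    An observation is flagged when it lies more than `threshold` sample
    standard deviations from the baseline mean.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variation to compare against")
    return [
        (i, x)
        for i, x in enumerate(observations)
        if abs(x - mu) / sigma > threshold
    ]

# Hypothetical baseline: body temperature of healthy animals, in degrees C.
baseline = [38.5, 38.6, 38.4, 38.7, 38.5, 38.6, 38.5, 38.4]

# Hypothetical new readings; the third departs sharply from the norm.
new_readings = [38.5, 38.6, 40.1, 38.4]

anomalies = find_anomalies(baseline, new_readings)
```

With hundreds of interacting layers, a per-layer test like this quickly becomes inadequate, which is why the effort looks to trainable and validated AI to assist the analyst.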
The future of OSINT
AI has become a forcing function for the IC to finally embrace OSINT. Secrets can and will continue to be discovered by OSINT. Mysteries will also continue to be solved. Insight will become faster, more reliable, and economical as the tradecraft further matures. OSINT will certainly never totally replace any of the traditional “INTs” but will continue to be able to equal them in value and perhaps occasionally best them on developing given insights. AI is driving OSINT as the critical node in any intelligence activity and product. It will become the integrator.
How is OSINT going to evolve in the future? The potential will be limited only by the willingness of decision makers to rely more on it as a valued and trusted producer of insight. Good OSINT, effective OSINT, will require increased funding, whether by government or business. Acceptance, trust and funding will make the magic happen, nurturing both the necessary people and the technology.
Dr. Robert A. Norton is a Professor of Veterinary Infectious Diseases, Biosecurity and Public Health, serving as a Special Coordinator of National Security and Defense Projects in the Office of the Senior Vice President for Research and Economic Development at Auburn University. Before entering academia, Dr. Norton served in the U.S. Army at Ft. Detrick, Maryland. His current efforts include Open Source Intelligence (OSINT) collection and analysis, food and agriculture biosecurity, CBRNE-related technical intelligence integration and critical infrastructure protection. He has served for many years as a consultant for the U.S. Government and business. He is also a regular contributor to the industry press, having written extensively on the developing threats to food and agriculture and cyber security.
[1] Future U.S. OSINT practitioners who use this combination of sensor technologies will again have to assiduously adhere to the associated ethical norms and legal requirements.