So the other day I somewhat audaciously tweeted that I disagreed with @marcusjcarey and @itza11hYp3 regarding the statement that OSINT (open source intelligence) was basically a new name for something intelligence analysts have always done. I say audacious because I am not, in fact, an intel analyst by trade, whereas they have a stint at NSA and long experience in the federal sphere, respectively. Now, I know a lot of people have theorized that I work for the NSA – and I am quite proud of my hat, which came straight from the (classified!) gift shop at Ft. Meade – but I don’t. My skills are focused on obtaining information rather than analyzing it.

Still, I think there are a few reasons why modern OSINT is different from traditional methods. Whether that qualifies as something really new is a matter of semantics. To clarify, OSINT refers to gathering intelligence from “open sources” – newspapers, court records, web sites, etc. Certainly, analysts have always done this, and obviously it has gotten easier since PCs and the Internet became more widely used. Not only can you search a lot of archives quickly and from a distance, but people and organizations are also publishing more information about themselves.

The first reason why I think it’s substantially different is the collection method. It’s not just a matter of reading online versus reading paper. Part of it is being able to Google well (and use LexisNexis, etc.), but part is being able to find relevant information in a very large haystack without wasting time on false positives. The only way to do this in a timely manner is algorithmically, and the only way to implement such complex algorithms is in software. Example: say you have a suspected domestic terror group whose known members have purchased warehouse space in three cities. It might be useful to search all the real estate purchases in the U.S. for other purchases matching similar criteria to ferret out other cells. Given the amount of real estate transaction data available online, this should be doable, but the analyst is going to have to be able to carefully delineate the parameters for such a search – not just what to look for, but what to exclude. Or, perhaps more likely, they’ll explain the general idea to an IT person who will codify it.
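To make the "what to look for, but also what to exclude" point concrete, here is a minimal Python sketch of that kind of search. Everything in it is an assumption for illustration: the `Purchase` fields, the size range, the cash-payment criterion, and the exclusion lists are hypothetical, and real transaction data would be far messier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purchase:
    # Hypothetical record layout; real property records will differ.
    buyer: str
    city: str
    property_type: str
    square_feet: int
    paid_cash: bool


def find_candidates(purchases, known_cities, excluded_buyers):
    """Return purchases that fit the (assumed) profile and survive the exclusions."""
    hits = []
    for p in purchases:
        if p.property_type != "warehouse":
            continue  # inclusion rule: only warehouse space
        if p.city in known_cities:
            continue  # exclusion rule: cities we already know about
        if p.buyer in excluded_buyers:
            continue  # exclusion rule: known-legitimate bulk buyers
        # Assumed profile: mid-sized space, paid for in cash.
        if p.paid_cash and 5_000 <= p.square_feet <= 20_000:
            hits.append(p)
    return hits
```

The exclusion lists do most of the work of cutting false positives: without them, every logistics company in the country would show up as a "cell".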

Reason 2: Correlation. A big benefit of having so much digital data available is being able to correlate it. Forget those scenes in cop shows where they have all their clues written on pieces of paper and tacked to a cork board. I’d be spidering Craigslist, Twitter, etc., and running the data through something like Splunk to look for unforeseen correlations. Merely by doing math, software can point out possible correlations that might never occur to a human. Given the number of job postings at NSA for people with experience in data mining over the past decade, you can be sure this is happening. (Hey, I just deduced what the NSA is doing from their public job postings. OSINT!)
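As a toy illustration of "merely by doing math", the sketch below counts how often pairs of identifiers (phone numbers, handles, place names) show up together across scraped records. This is not how Splunk works internally; it is just plain co-occurrence counting, and the record format (a list of tokens per scraped item) is an assumption.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(records, min_count=2):
    """Count token pairs that appear together in at least min_count records.

    records: iterable of token lists, one list per scraped post or page.
    """
    counts = Counter()
    for tokens in records:
        # Deduplicate and sort so each pair is counted once per record
        # and always appears in the same order.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}
```

A pair that keeps recurring across unrelated sources is only a lead for a human to look at, not a conclusion.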

Reason 3: Meta-data. Analysts are no longer limited to the “published” data in a record. We’ve all heard the stories of info being leaked in the meta-data of a Word doc, EXIF data in an image, geo data in a tweet, ID3 tags in MP3s, whatever. Being able to access, search and understand this information is critical and, again, requires either a technically minded analyst or a resident techie to write the tools.
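For a sense of how little it takes to get at this stuff, here is a minimal sketch that reads the old ID3v1 tag from an MP3. ID3v1 is simply the last 128 bytes of the file, starting with the marker `TAG`. The function name is my own, the sketch ignores the newer ID3v2 format entirely, and it assumes Latin-1 text.

```python
def read_id3v1(data: bytes):
    """Parse an ID3v1 tag from the raw bytes of an MP3 file, or return None."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None  # no ID3v1 tag present

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with NULs (or spaces).
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
    }
```

Usage would be something like `read_id3v1(open("song.mp3", "rb").read())`, where `song.mp3` is a placeholder file name.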

Reason 4: The so-called “deep web”. Not every bit of information on the Internet is indexed by Google; in fact, a great deal of it doesn’t even use HTTP. Even “innocuous” protocols like ICMP and DNS can transport extra data around. If nmap can figure out an operating system by looking at administrative data, what can we figure out at the semantic level? E.g., if you’re monitoring a server that you suspect of storing malware being used by a certain group, and every time one of the group members walks into a cafe with a laptop, the ping time to that server increases, it stands to reason that they’re doing something that’s increasing the load on the server.
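The ping example boils down to comparing latency when the suspect is present against latency when they are not. A minimal sketch, assuming you already have round-trip times paired with a present/absent flag (how you collect either one is outside this sketch, and the sample format is hypothetical):

```python
from statistics import mean


def latency_shift(samples):
    """Return mean RTT while present minus mean RTT while absent, in ms.

    samples: list of (rtt_ms, present) tuples, where present is a bool.
    Returns None if either group is empty.
    """
    present = [rtt for rtt, here in samples if here]
    absent = [rtt for rtt, here in samples if not here]
    if not present or not absent:
        return None
    return mean(present) - mean(absent)
```

A consistently positive shift is only a hint: network congestion at the cafe or time-of-day effects could produce the same pattern, so it would need many observations before it meant anything.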

Fifth and final reason: There are more types of data now: pictures, videos, audio, etc., due to the advent of cheap digital cameras and other devices. I know for a fact that certain (probably all) intelligence agencies examine the pictures that suspected terrorists post on Facebook, Flickr, etc., to try to identify their location, associates, etc. In the past, pictures and video were generally only available if a HUMINT operator was surveilling the suspect already. Of course, all of the aforementioned reasons apply to multimedia, too.

I’m not writing this to defend my position or anything, just to point out things I find interesting about the process. Certainly, as someone who doesn’t want “Johnny Cocaine” to be associated with his real identity, I consider these avenues of attack and attempt to mitigate them, as I’ve discussed previously. Not that I need to keep it secret from nation-state level adversaries – although I feel confident that I could do so – but it’s just fun. Anyway, hats off to @marcusjcarey and @itza11hYp3 for making me think about this and replying to me. I’ll buy you guys a drink at the next spook convention.


One Response to “OSINT”

  1. itza11hyp3 Says:

    Thinking about what you have written, and some musings I’ve had over the past couple of weeks, I have to lean towards your opinion on this. There is far more in-depth information out there, and several new ways (Maltego and some different note-taking applications come to mind) to associate the information. It’s not as simple as reviewing news sources, or several of us (and I’m not necessarily saying the national intelligence community) would not be gainfully employed 🙂 I go to Shmoocon and plan on BH/Defcon next year instead of just goofing off in LV for a week 🙂
