Attacks on (my) anonymity

In case you haven’t realized, Johnny Cocaine is not my real name.  If I were using my father’s actual surname, it would be Johnny Drunk-Ass-Moron.  But I digress.  I do keep my real-life identity a secret on the Interwebs, mostly because my day job could suffer otherwise.  Also, not everyone I know IRL would appreciate my attitude and crude vocabulary.  Not everyone online appreciates it either, but they can always choose to stop reading / watching / following.

Anyway, I don’t go to great lengths to disguise what I do in public (Twitter, this blog, YouTube, Myspace, etc.).  I don’t use military / intelligence grade tools or tactics.  But it got me wondering: if I weren’t me, and I wanted to find out who I really was, how would I go about it?

Well, there are pics and videos of me, right?  So one way would be to wander around the U.S. and other countries looking for me.  It could work.  It occasionally works for those kids on the milk cartons, right?

OK, not likely.  Well, there are different possible classes of people who would look for me, right?  Let’s start with the most impotent – another individual who uses the same sites as me.  Maybe I offended someone with my arrogant Twitter posts.  Now, as far as I can figure out, one user of a web site has no way to find out, after the fact, which IP another user was connecting from.  Even if there were, I proxy my traffic through an unsuspecting domain name, as mentioned previously.  So let’s ignore that and look at the semantic layer.

Well, what info does the attacker have about me?  They have my blog posts, my tweets, everything on my myspace page, my videos, any comments that Google can come up with, maybe even captures of my IRC conversations.  Possibly, at some point, code snippets.

Maybe the first thing I’d do is write a tool to organize all of the data on a timeline.  Then I could see, for example, all of Johnny Cocaine’s activity for a given day or week.  This would immediately point out something: most of my activity is during the day in the Western Hemisphere.  Perhaps even a certain time zone.  That implies that I either A) work a swing shift, B) am unemployed, or C) have a job where I can do some things online.  Of course, blog posts, videos, etc., could have been prepared any time; only things like tweets and forum posts are likely to be done in real time.  (This blog post is being written late at night, but I probably won’t post it until tomorrow.)  I just said I have a job, so we can rule out B).  I called it a day job, but that’s a figure of speech.  I’d rule out college student because the posts are consistent, and rare on weekends; students’ schedules tend to be more sporadic.
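Here is a rough sketch of what the core of that timeline tool might look like in Python.  The sample events, the function names, and the assumption that every timestamp has already been normalized to UTC are all made up for illustration; a real tool would first have to scrape and clean the data.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample data: (source, ISO-8601 timestamp with UTC offset).
# Only real-time items (tweets, forum posts) are worth including, since
# blog posts and videos can be prepared long before they are published.
EVENTS = [
    ("twitter", "2009-03-02T15:04:00+00:00"),
    ("twitter", "2009-03-02T15:40:00+00:00"),
    ("forum", "2009-03-03T18:15:00+00:00"),
    ("twitter", "2009-03-04T21:30:00+00:00"),
    ("forum", "2009-03-07T16:00:00+00:00"),
]


def _parse(stamp):
    """Parse an ISO-8601 timestamp and express it in UTC."""
    return datetime.fromisoformat(stamp).astimezone(timezone.utc)


def activity_by_hour(events):
    """Count events per hour of day (0-23, UTC)."""
    return Counter(_parse(stamp).hour for _source, stamp in events)


def activity_by_weekday(events):
    """Count events per weekday (0 = Monday ... 6 = Sunday, UTC)."""
    return Counter(_parse(stamp).weekday() for _source, stamp in events)


if __name__ == "__main__":
    for hour, count in sorted(activity_by_hour(EVENTS).items()):
        print(f"{hour:02d}:00 UTC  {'#' * count}")
```

A cluster of busy hours hints at a time zone and a work schedule, and a weekday-heavy count is what separates the 9-to-5 guess from the student guess.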

One can assume from the technical content of my posts that I’m a fairly competent programmer or sys admin or computer engineer or architect of some kind.  This would give me access to a computer while I’m at work, and senior technical people rarely work swing shifts.  So we lean toward an actual 9-to-5 job.

Next thing I’d look for is location – references to places, events, sports teams, weather, etc.  Anyone reading my posts might conclude that I travel a fair amount, as I’ve mentioned being in several cities on both coasts of the U.S., as well as Mexico.  That could be true, or it could be bullshit that I put out there so this kind of analysis is harder.

I could look at any pics or videos I’ve posted.  Backgrounds might give some information.  (E.g., in one video we hear a “meow” and I refer to “my cat”.)  Metadata might also contain information, e.g., the EXIF data in my photo(s).  (A fake threat to bomb a school was “thwarted” in ’07 when members of 4chan found the perp’s father’s name in EXIF info – http://en.wikipedia.org/wiki/4chan.)

All of this needs to be shoved into a database, tagged, dated, etc., so you can do queries and run possible scenarios.  But, unless I, as Johnny, am very stupid, you probably won’t get anything too specific.  The next thing to do – and you’re not going to like this – is to start looking at who I’m friends with, who I follow on Twitter and who follows me, whose comments I respond to, etc.  If you’re lucky, you’ll find a conversation that sounds like I really know the person in question; now you can start attacking their identity as well.  If it’s not obvious, you can start correlating a circle of people; if I’m friends with the same 4 or 5 people on several networks, it’s likely that they know me fairly well, maybe even IRL.
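The "circle of people" correlation can be sketched in a few lines of Python.  Everything here is hypothetical: the contact lists, the names, and especially the assumption that the same person uses the same handle on every network, which in practice is the hard part.

```python
from collections import Counter

# Hypothetical contact lists, one set of handles per network.
CONTACTS = {
    "twitter": {"alice", "bob", "carol", "dave"},
    "myspace": {"alice", "bob", "erin"},
    "youtube": {"alice", "bob", "carol"},
}


def recurring_contacts(contacts, min_networks=2):
    """Return handles that appear on at least `min_networks` networks."""
    counts = Counter()
    for handles in contacts.values():
        counts.update(handles)  # each network contributes at most 1 per handle
    return sorted(name for name, n in counts.items() if n >= min_networks)


if __name__ == "__main__":
    print(recurring_contacts(CONTACTS))
    print(recurring_contacts(CONTACTS, min_networks=3))
```

Raising `min_networks` shrinks the list toward the handful of people who show up everywhere, which are the ones most likely to know the target IRL.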

The ultimate tool would be to trick me into visiting a web site that has dangerous JavaScript in it.  If you’re good, you can load it into my browser and start watching everything I do.  I’ll probably slip up at some point and pass some real personal information.

Finally, you could start doing semantic analysis on what I say.  For example, references to certain bits of pop culture might reveal my age, specific technologies could narrow down my niche in the work world, certain phrases could be slang from different areas, etc.  You could take certain phrases, links I post, etc., and google them to find out if someone else posted them – especially around the same time.  I might have posted a link to Twitter and then you find the same link posted around the same time on another web site, under a different name.  Once is probably coincidence, but a few times is, as they say, enemy action.
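The "same link, same time, different name" check might look something like this in Python.  The posts, the author names, and the one-hour window are all invented for the example; how to actually collect other people's posts is left out entirely.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical data: (author, url, ISO-8601 timestamp).
POSTS = [
    ("johnny", "http://example.com/a", "2009-03-02T15:00:00"),
    ("johnny", "http://example.com/b", "2009-03-03T16:00:00"),
    ("johnny", "http://example.com/c", "2009-03-04T17:00:00"),
    ("someone_else", "http://example.com/a", "2009-03-02T15:20:00"),
    ("someone_else", "http://example.com/b", "2009-03-03T16:45:00"),
    ("someone_else", "http://example.com/c", "2009-03-04T16:30:00"),
    ("bystander", "http://example.com/a", "2009-03-02T23:00:00"),
]


def coposting_counts(posts, target, window=timedelta(hours=1)):
    """Count, per other author, posts of a URL within `window` of the
    target posting that same URL."""
    parsed = [(a, u, datetime.fromisoformat(t)) for a, u, t in posts]
    mine = [(u, t) for a, u, t in parsed if a == target]
    hits = Counter()
    for author, url, when in parsed:
        if author == target:
            continue
        if any(url == u and abs(when - t) <= window for u, t in mine):
            hits[author] += 1
    return hits


if __name__ == "__main__":
    for author, n in coposting_counts(POSTS, "johnny").most_common():
        print(author, n)
```

One hit means nothing; an author who keeps landing inside the window is the one worth a second look.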

Frankly, on the technical level, that’s about as far as you can go.  (Anyone have any more ideas?)

A web site I frequent, especially a big one like Myspace, would have better information; in addition to all of that, they could look at my IP, to try to defeat my obfuscation.  They could look at how often I log in, how long I stay on, etc., and log every transaction I make.  Maybe the tone of one private message indicates I know that person IRL, or we share an in-joke about something?

Then, of course, there are ISPs.  My last-hop provider can’t do much; my traffic is encrypted to a proxy and I transferred the key out-of-band to defeat the attack I talked about before.  I also connect from a lot of different places.  But given that most Internet traffic flows over one of only a few networks, at least in the U.S., we can assume that any one of them can capture a fair amount of my traffic and correlate it across their network.  E.g., my local connection, my proxy server, and Myspace’s data center may all get Internet service from AT&T.

Finally, of course, there’s the FBI, NSA, CIA and anyone who has moles in those organizations.  One can assume they are building data warehouses full of every damn bit that flows across the Internet.  I’m not intending to defeat that kind of surveillance.  Oh, I could.  Maybe I do, for some activities.  But I don’t think they care that I make smart ass comments and talk about computer security issues.  They don’t have time to arrest every arrogant geek!  Plus, they’re run by people who still think that a Muslim in Iraq with a grenade is a bigger threat than a hacker with a budget.  Sigh.

OK, I have shown you the way.  First person to uncover my identity will suffer the death of a thousand (cyber) cuts.
