Operator Speaking by Zachary Constantine
 

Posts Tagged ‘consumer data mining’

Welcome to my tar pit…

Thursday, November 5th, 2009

Actually, if you are reading this, you are not presently a resident fossil in the brand-new Operator Speaking Tar Pit.

… but watch yourself – you never know …


Tar Pit Operation

  1. Record the incoming user agent and IP address
  2. Compare the user agent and IP address to the Black List
  3. On a match, apply the protocol for the ban level (deny or poison)
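
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the tar pit script itself (which is PHP and not shown here): the `TarPit` class, its in-memory dictionaries, and the case-insensitive substring match on the user agent are all assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TarPit:
    # Hypothetical in-memory stand-ins for the Black List and the hit log.
    banned_agents: dict = field(default_factory=dict)   # agent substring -> ban level
    banned_ips: dict = field(default_factory=dict)      # integer IP -> ban level
    banned_ranges: list = field(default_factory=list)   # (start, end, ban level)
    hits: list = field(default_factory=list)

    def check(self, user_agent: str, ip: str) -> str:
        """Return 'pass', 'deny' or 'poison' for one request and log the hit."""
        ip_int = int(ipaddress.IPv4Address(ip))
        disposition = "pass"

        # Step 2: compare the user agent, then the IP address, to the Black List.
        for name, level in self.banned_agents.items():
            if name.lower() in user_agent.lower():  # assumed matching rule
                disposition = level
                break
        else:
            if ip_int in self.banned_ips:
                disposition = self.banned_ips[ip_int]
            else:
                for start, end, level in self.banned_ranges:
                    if start <= ip_int <= end:
                        disposition = level
                        break

        # Step 1: record the hit, together with what was decided about it.
        self.hits.append((datetime.now(timezone.utc), ip_int, user_agent, disposition))

        # Step 3: the caller applies the deny or poison protocol based on this value.
        return disposition
```

A caller would build one `TarPit`, fill in the bans, and branch on the string that `check()` returns for each request.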

Tar Pit Administration

Review MySQL database for suspicious / undesirable hits and insert bans accordingly.


What this means for those I have banned: this site simply will not work the way you’d expect. Strange things will happen. Binary will be served (very slowly) instead of the text that you were anticipating. Your feed monitoring software will let you down.
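
To give a feel for what "binary served very slowly" might look like, here is a small Python sketch of a poison response. The function name `poison_stream`, the chunk size, and the delay are all made-up values for illustration; the real script's behaviour may differ.

```python
import os
import time
from typing import Iterator


def poison_stream(total_bytes: int = 4096, chunk_size: int = 16,
                  delay_seconds: float = 1.0, sleep=time.sleep) -> Iterator[bytes]:
    """Yield random binary in small chunks, pausing before each chunk.

    All parameters are hypothetical defaults; `sleep` is injectable so the
    pacing can be checked without actually waiting.
    """
    sent = 0
    while sent < total_bytes:
        sleep(delay_seconds)                    # the "very slowly" part
        size = min(chunk_size, total_bytes - sent)
        yield os.urandom(size)                  # junk bytes instead of the expected text
        sent += size
```

A web handler would iterate over the generator and write each chunk to the response, keeping the banned client's connection busy for roughly `total_bytes / chunk_size * delay_seconds` seconds.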

I was plotting to release a WordPress plugin when I realized that the tar pit script is simple enough to work with just about any PHP-driven application… so I’ll have some source code to post as soon as I come up with a working administrative interface (though if you really want to play around with the code and you’re comfortable using SQL, e-mail me and I’ll send something your way).

For now, the MySQL schema:

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

CREATE TABLE IF NOT EXISTS `ban_agents` (
  `ban_agent_id` int(11) NOT NULL AUTO_INCREMENT,
  `ban_agent_name` varchar(150) COLLATE utf8_bin NOT NULL,
  `ban_level` enum('deny','poison') COLLATE utf8_bin NOT NULL DEFAULT 'poison',
  `ban_agent_reason` varchar(150) COLLATE utf8_bin NOT NULL,
  `ban_agent_timestamp` datetime NOT NULL,
  PRIMARY KEY (`ban_agent_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1 ;

CREATE TABLE IF NOT EXISTS `ban_ips` (
  `ban_ip_id` int(11) NOT NULL AUTO_INCREMENT,
  `ban_ip_name` int(11) NOT NULL,
  `ban_level` enum('deny','poison') COLLATE utf8_bin NOT NULL DEFAULT 'poison',
  `ban_ip_reason` text COLLATE utf8_bin NOT NULL,
  `ban_ip_timestamp` datetime NOT NULL,
  PRIMARY KEY (`ban_ip_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1 ;

CREATE TABLE IF NOT EXISTS `ban_ranges` (
  `ban_range_id` int(11) NOT NULL AUTO_INCREMENT,
  `ban_range_start` int(11) NOT NULL,
  `ban_range_end` int(11) NOT NULL,
  `ban_level` enum('deny','poison') COLLATE utf8_bin NOT NULL DEFAULT 'poison',
  `ban_range_reason` varchar(150) COLLATE utf8_bin NOT NULL,
  `ban_range_timestamp` datetime NOT NULL,
  PRIMARY KEY (`ban_range_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1 ;

CREATE TABLE IF NOT EXISTS `hits` (
  `hit_id` int(11) NOT NULL AUTO_INCREMENT,
  `hit_context` enum('page','feed','feed-atom','feed-rss','feed-rss2','sitemap') COLLATE utf8_bin NOT NULL DEFAULT 'page',
  `hit_ip_address` int(11) NOT NULL,
  `hit_user_agent` varchar(150) COLLATE utf8_bin NOT NULL,
  `hit_timestamp` datetime NOT NULL,
  `hit_disposition` enum('pass','tested','deny','poison') COLLATE utf8_bin NOT NULL DEFAULT 'pass',
  `ban_reason_table` enum('none','ban_agents','ban_ips','ban_networks','ban_ranges') COLLATE utf8_bin NOT NULL DEFAULT 'none',
  `ban_reason_id` int(11) NOT NULL,
  PRIMARY KEY (`hit_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1 ;
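
The schema keeps IP addresses in integer columns. Assuming the integer form is the usual 32-bit packing of a dotted IPv4 address (the same mapping MySQL's `INET_ATON()` performs), the conversion can be sketched in Python as below. The helper names are hypothetical. One thing the sketch makes visible: a signed MySQL `INT` tops out at 2147483647, so addresses from 128.0.0.0 upward need either an unsigned column or a signed conversion on the application side. Which of those the tar pit script relies on is not stated here.

```python
import ipaddress

SIGNED_INT_MAX = 2**31 - 1  # largest value a signed 32-bit INT column can hold


def ip_to_int(ip: str) -> int:
    """Pack a dotted IPv4 address into an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))


def int_to_ip(value: int) -> str:
    """Unpack an unsigned 32-bit integer back into dotted notation."""
    return str(ipaddress.IPv4Address(value))


def fits_signed_int(ip: str) -> bool:
    """True if the packed address fits in a signed 32-bit INT column."""
    return ip_to_int(ip) <= SIGNED_INT_MAX
```

With this packing, a range ban becomes a simple comparison: an address is inside a range when `ban_range_start <= ip_to_int(address) <= ban_range_end`.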

Project Gaydar: Data-Mining Social Networks

Monday, September 21st, 2009

Using data from the social network Facebook, they made a striking discovery: just by looking at a person’s online friends, they could predict whether the person was gay. They did this with a software program that looked at the gender and sexuality of a person’s friends and, using statistical analysis, made a prediction. The two students had no way of checking all of their predictions, but based on their own knowledge outside the Facebook world, their computer program appeared quite accurate for men, they said. People may be effectively “outing” themselves just by the virtual company they keep.

“When they first did it, it was absolutely striking – we said, ‘Oh my God – you can actually put some computation behind that,’ ” said Hal Abelson, a computer science professor at MIT who co-taught the course. “That pulls the rug out from a whole policy and technology perspective that the point is to give you control over your information – because you don’t have control over your information.”

. . .

Facebook spokesman Simon Axten could not respond to Jernigan and Mistree’s analysis, since it is not public, but pointed out that it is something that happens every day.

- Project ‘Gaydar’ by Carolyn Y. Johnson
Boston Globe
2009-09-20

Oops! Your sexual preference is showing.

Keep in mind that the research performed by these students is far from “high-tech”. Their research isn’t published, but it seems safe to assume that they put together something as simple as counting each individual’s connections: wherever an individual with an unknown sexual preference was connected to an individual with a known sexual preference, the program added to the unknown individual’s homosexuality indicator.

This would be a highly iterative process. However, knowing the actual sexual preference of only a small percentage of individuals, and then extrapolating across connections from unknown to unknown based upon what is known, would allow the data mining program to indicate with better-than-random probability the sexual preference of everyone with a connection.
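
To show how little machinery that kind of guesswork needs, here is a Python sketch of iterative score propagation over a friend graph. It is emphatically not the students' method, which is unpublished; it is only one simple way to implement the guess described above. The function name `propagate`, the neutral starting score of 0.5, and the plain neighbour average are all assumptions, and the attribute is treated as a generic yes/no label.

```python
def propagate(friends, known, rounds=10):
    """Spread a binary attribute across a friend graph.

    friends: dict mapping each person to the set of people they are connected to
             (every person mentioned must also appear as a key).
    known:   dict mapping some people to 1.0 (has the attribute) or 0.0 (does not).
    Returns a dict mapping every person to a score between 0.0 and 1.0.
    """
    # Unknown people start at a neutral 0.5 (an arbitrary choice for this sketch).
    scores = {person: known.get(person, 0.5) for person in friends}

    for _ in range(rounds):
        updated = {}
        for person, neighbours in friends.items():
            if person in known or not neighbours:
                # Known labels stay fixed; isolated people cannot be inferred.
                updated[person] = scores[person]
            else:
                # An unknown person's score is the average of their friends' scores,
                # so information flows from known to unknown, then unknown to unknown.
                updated[person] = sum(scores[f] for f in neighbours) / len(neighbours)
        scores = updated

    return scores
```

Anyone whose score drifts away from 0.5 has been "predicted" purely from the company they keep, which is the whole point of the warning.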

Take it a few steps further and start analyzing the other information provided – favorite books, movies, musical acts, their addresses, the content posted by users on each other’s profile pages, content posted in online journals, the content of sites linked from each user’s profile, even their names… you can build a statistically probable representation of an individual down to his or her ideology.

So, who wants to be first up against the wall?

Like your identity? Hold on to your card.

Thursday, September 17th, 2009

While I understand that you need to check identification, I do not consent to having my personal information stored in your computer systems for any length of time. I value my privacy and I do not want my location or purchasing habits tracked. Moreover, my personal information is valuable and you have provided me with no assurances regarding the security of your systems.

- Swiping Your Identity [ID recorded at liquor store checkout]
by Sherri Davidoff of Philosecurity
2009-09-12

Problem: There are legitimate reasons a commercial entity might ask for your identification card; however, tracking your name, address, and license number is not one of them.

This seems like a novel practice – one that’s sure to catch on elsewhere – for getting your details for data mining’s sake if you should decide to pay cash (the store owner can already do as much data mining as desired if you’re paying with plastic).

Solution: Ask how your identification will be used before handing it over and, if you don’t approve of what is about to occur, shop elsewhere.

Yes, it is a hassle and some people will look at you as though you’re mad… some may even complain about the hold-up in line.

Perhaps they’ll bleat as loudly in the line to the slaughterhouse?