Google scientist's book raises real, fictional privacy concerns

October 24, 2011 |  6:00 am

In the data laboratories of the Internet's biggest companies -- the Googles, Facebooks and Yahoos of the world -- statisticians practice the mysterious art of spinning vast troves of personal consumer data into marketing gold.

As with the famously confidential Coca-Cola recipe, the companies do not share their secret alchemical strategies for taking data gleaned from consumers' phones, tablets and PCs and using it to build intricate behavioral profiles of users, the better to sell them products they are most likely to buy.

As we wrote in our recent story on digital privacy, the lack of transparency about these practices has led to widespread concern about how Internet companies are using the information they collect -- including how long they keep the data, with whom they share it, and the types of conclusions they can draw about individual behavior.

But Shumeet Baluja, a data scientist at Google, has made an end run around the wall of secrecy that protects corporate data practices. Instead of sharing Google's approach to data security and privacy, he's written a fictional account of a Google-like company -- and what happens when the wrong people get access to its huge storehouse of private information.

The book, "The Silicon Jungle," came out in the spring. Its dust jacket describes it as raising "serious ethical questions about today's technological innovations and how our most confidential activities...can be routinely pieced together into rich profiles that reveal our habits, goals and secret desires."

We asked Baluja about the story behind his book, and his own thinking on the state of consumer privacy and data collection.

How would you rate the general public's awareness of what's done with their data?

It's pretty poor. People in general know they should be concerned about privacy, but I think very few people understand what it means to have privacy. They certainly don't understand what data mining is, or what the capabilities are of the companies out there that are looking at their data. This was a big impetus for writing this book: I wanted to show from a personal level to a national level -- from every level -- the ramifications of giving these little bits of data away.

Is there any way for people to figure out the kinds of data that are being collected about them?

That's absolutely key to having trust in any company at this point. How much will they actually tell you about what they’re doing with your data?  If you look at the big companies out there, the major players, we're in contact with them every day as users -- whenever we post updates or do searches, we reveal a lot about ourselves.  

A few [companies] have started allowing you to say, "I would like you to get rid of some of this information you have on me." And that's extremely important. Without that, I would suggest users be very careful.

But are the leading companies offering much of a window into what they're collecting?

Even when users can see the individual pieces of data they’ve given, what's harder to figure out is what inferences can be drawn from that.  And that’s what data mining is about -- drawing inferences from the small pieces of data you have.

[Those inferences] get harder to reveal because they're obviously very proprietary. So it’s a little bit of a tricky game as far as that’s concerned. 

The fact that I bought some golf shoes, or took a vacation to Hawaii, or drove home on the 405 Freeway doesn't seem too interesting by itself, but what about when you put all of the pieces together?

Well, let's go through your example: Besides drawing inferences about where you live and your vacation habits, perhaps you could tell what your demographic status is, your income level for example. We could look at the type of hobbies we'd expect you to have, the types of products you'd be interested in buying. And by further looking at what you search for, what products you buy, and where you travel -- we can then revise our hypotheses.

So it’s kind of like a living profile?

Absolutely. When someone considers a profile -- it's not the case that it's created once and then forgotten about. Every interaction you have then goes back to feeding that profile and either reinforcing our conclusions or making us come up with new ones.

What does Google think about the book?

As you can imagine, I’ve been very careful not to talk about my company in this interview. But Google has been extremely supportive, which has been awesome of them. That being said, talking about the book in general is fine but I’d shy away from talking about any policies they’d have.

You've been intimately involved in data mining for major companies, and you've written a nightmare-scenario book about what could go wrong if all this data were leaked. Should people take your book as a warning sign, even if it is fictional?

As a scientist who's worked in this field for 15 years, I think that besides talking about the great things that have come from it, it's also important to talk about the things that could go wrong -- it's not so much to scare, but to clearly inform people that there are consequences to sharing so much personal information.


-- David Sarno

