Apple Touts 'Differential Privacy' Data Gathering Technique in iOS 10
Jun 14, 2016 4:02 am PDT by Tim Hardwick
With the announcement of iOS 10 at WWDC on Monday, Apple mentioned its adoption of "Differential Privacy" – a mathematical technique that allows the company to collect user information that helps it enhance its apps and services while keeping the data of individual users private.


During the company's keynote address, Senior Vice President of Software Engineering Craig Federighi – a vocal advocate of personal privacy – summarized the concept in the following way:
We believe you should have great features and great privacy. Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable…crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale.
Wired has now published an article on the subject that lays out in clearer detail some of the practical implications and potential pitfalls of Apple's latest statistical data gathering technique.
Differential privacy, translated from Apple-speak, is the statistical science of trying to learn as much as possible about a group while learning as little as possible about any individual in it. With differential privacy, Apple can collect and store its users' data in a format that lets it glean useful notions about what people do, say, like and want. But it can't extract anything about a single, specific one of those people that might represent a privacy violation. And neither, in theory, could hackers or intelligence agencies.
Wired notes that the technique claims to offer a mathematically "provable guarantee" that its generated data sets are impervious to outside attempts to de-anonymize the information. It does, however, caution that such complicated techniques rely on the rigor of their implementation to retain any guarantee of privacy during transmission.
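
Apple has not published the details of its mechanism, but the "noise injection" Federighi mentioned can be illustrated with a classic local technique called randomized response, in which each device perturbs its own answer before anything leaves the phone. The following is a minimal, hypothetical sketch of that idea – not Apple's actual algorithm:

```python
import random

# Randomized response: a classic local differential privacy mechanism.
# Each device flips coins before reporting a sensitive yes/no value, so any
# individual report is deniable, yet the aggregate rate can still be estimated.
# (Illustrative sketch only – not Apple's published mechanism.)

def randomized_response(truth: bool) -> bool:
    if random.random() < 0.5:
        return truth                  # half the time, report the truth
    return random.random() < 0.5      # otherwise, report a fair coin flip

def estimate_true_rate(reports) -> float:
    # If p is the observed fraction of "yes" reports, then p = 0.5 * rate + 0.25,
    # so the underlying rate is approximately 2p - 0.5.
    p = sum(reports) / len(reports)
    return 2 * p - 0.5

# Simulate 100,000 users, 30% of whom actually have the sensitive attribute.
truths = [random.random() < 0.3 for _ in range(100_000)]
reports = [randomized_response(t) for t in truths]
print(estimate_true_rate(reports))    # prints roughly 0.3
```

Because any single report is wrong with substantial probability, seeing one user's answer reveals little; only the aggregate statistic carries a meaningful signal.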

You can read the full article on the subject of differential privacy here.

Note: Due to the political nature of the discussion regarding this topic, the discussion thread is located in our Politics, Religion, Social Issues forum. All forum members and site visitors are welcome to read and follow the thread, but posting is limited to forum members with at least 100 posts.

Tag: privacy

Top Rated Comments


26 months ago
Good that they are taking privacy seriously. Looking forward to some expert reviews of this approach.
Rating: 15 Votes
26 months ago

While I don't know the exact technique they are using, it is common to use a "double blind" addressing technique to keep anonymity, making it impossible to trace back and ID someone. There are descriptions of this technique a search away.


Background: my PhD advisor is a main contributor to the differential privacy literature, and my department overall has a few professors working on differential privacy. Although my own research doesn't deal with differential privacy, some of my past work has been in statistical privacy.

Response to quoted text: while Apple is, without a doubt, anonymizing all identifiers in the data (i.e. your name, address, and other contact info is 100% certain to have been stripped), this does not describe what differential privacy does (rather, anonymizing data is a prerequisite for all practical data privacy methodology). Differential privacy provides a probabilistic guarantee on the data-masking algorithm that, in layman's terms, if you have two datasets that differ only in one user, the outputs of the algorithm on the two datasets are indistinguishable in some precise sense. There are various ways to construct this algorithm so that it is differentially private.
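
To make "indistinguishable in some precise sense" concrete: the standard definition says a randomized algorithm M is ε-differentially private if, for any two datasets D and D′ that differ in a single user's record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. One standard construction – shown here purely as an illustration, not as Apple's published algorithm – is the Laplace mechanism, which adds noise scaled to the query's sensitivity:

```python
import numpy as np

# Laplace mechanism: one standard way to answer a counting query with
# epsilon-differential privacy. A count has sensitivity 1 (adding or removing
# one user changes it by at most 1), so Laplace noise with scale 1/epsilon
# suffices. (Illustrative sketch only – not Apple's published algorithm.)

def private_count(records, predicate, epsilon=1.0):
    true_count = sum(1 for r in records if predicate(r))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Two "neighboring" datasets that differ in exactly one user have true counts of
# 300 and 299, but their noisy outputs are statistically hard to tell apart:
# the output distributions differ by at most a factor of e^epsilon.
users = ["uses_new_feature"] * 300 + ["other"] * 700
neighbor = users[1:]  # identical except that one "uses_new_feature" user is removed
print(private_count(users, lambda r: r == "uses_new_feature"))
print(private_count(neighbor, lambda r: r == "uses_new_feature"))
```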

The take-away is (and I'm addressing the other commenter): no, even if you are absolutely unique in the dataset, differential privacy guarantees you will be entirely indistinguishable. In other words, it is a guarantee that any attacker will never be able to verify or determine the true value for any entry in the protected data (e.g. the value of any variable for any particular individual).

Many argue that this concept, although it is an interesting mathematical tool, is too strong for use in practice, in that it cannot be implemented in any real-world scenario without removing all useful signal in the data. I can't name any companies or even government agencies that claim their data are algorithmically protected with differentially private guarantees. What Apple has done here is truly revolutionary, and I sincerely doubt any of its competitors are close to being able to do what they're doing today. Maybe in a decade or two?

Never thought I'd say this, but they've finally made all my years of learning stats for my Econ degree sound interesting!

Quite intrigued to see how this actually works out. My guess is that they take this individual-level data but perhaps apply it on a macro scale? But I can't see it being completely unbreakable.


See my other reply for a more detailed response. In particular, differential privacy is a guarantee that no matter how any attacker aggregates the data, there is no way to pick out individual values for any of the variables collected, for any user.
Rating: 14 Votes
26 months ago
Apple once again doing the right thing
Rating: 7 Votes
26 months ago

That's very good and all, but this is MacRumors (macRumors ;-)), I'm sure we can find a negative way to spin this.


Nothing ever progresses if all you have are positive comments. Ever heard the expression, "Tell me what I need to hear, not what I want to hear"? The question is whether the negative comments have merit; if so, and most do, then someone at Apple should be listening. We cannot count on the media to say what everyone is thinking, because they need access to Apple.
Rating: 5 Votes
26 months ago
That's very good and all, but this is MacRumors (macRumors ;-)), I'm sure we can find a negative way to spin this.
Rating: 4 Votes
26 months ago

That's very good and all, but this is MacRumors (macRumors ;-)), I'm sure we can find a negative way to spin this.


Are you fishing for these comments? ;)

Welcome to the Internet. I've yet to find a forum where it's just positive news...
Rating: 3 Votes
26 months ago
So basically Apple is selling people's personal info.
Rating: 2 Votes
26 months ago

Me too, but what I have heard is that as long as you are doing the same as everyone else, your privacy is protected, but if you stand out in any way then you can be identified.

While I don't know the exact technique they are using, it is common to use a "double blind" addressing technique to keep anonymity, making it impossible to trace back and ID someone. There are descriptions of this technique a search away.
Rating: 2 Votes
26 months ago
I think this is a very positive thing. I think it's the very antithesis of the likes of Facebook and Google's approach to using user data (and increasingly Microsoft's too). So frankly, even if it doesn't work at all, it's better than the alternatives - at least they are trying!

I think even the most privacy-conscious users will concede there are useful types of data that can improve services and software when developers can access such data. Unfortunately, users' privacy is sometimes the 'collateral damage' in that trade-off, so it's great if Apple are finding ways to have the best of both worlds. I would guess it's only really possible if your target is improving the product for the user, rather than identifying the user in order to sell ads to them specifically, so it could become a real unique selling point for paid software development on iOS.

Still, they need to be careful and very sure that it works. There have been lots of instances in the past where claims of 'anonymised data' have been proven trivially easy to de-anonymise.
Rating: 2 Votes
26 months ago

So this is what Apple has been hard at work creating. I'm impressed.
Rating: 1 Votes
