Apple has added a new post to its Machine Learning Journal that explains how it’s using differential privacy to protect users, even when collecting very sensitive data such as keystrokes and the sites users visit.
This type of data collection occurs when users opt in to share usage analytics from macOS or iOS, allowing Apple to collect “privatized records”.
Apple introduced differential privacy in iOS 10 in support of new data collection aimed at improving QuickType, emoji suggestions, Spotlight suggestions, and media playback features in Safari.
The system works on the basis that statistical noise can be added to data on the device before it’s shared with Apple.
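To make the idea concrete, here is a minimal sketch of classic randomized response, the textbook way to add noise on the device before a value ever leaves it. It is not Apple's actual mechanism. The function name `randomize_bit` and the choice of a single yes/no bit are illustrative assumptions. Each device reports its true bit only with a probability set by the privacy parameter ε (epsilon), so any single report is deniable.

```python
import math
import random


def randomize_bit(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Hypothetical on-device randomizer using classic randomized response.

    Reports the true bit with probability e^eps / (1 + e^eps) and the flipped
    bit otherwise. The ratio of the probabilities of any given report under the
    two possible inputs is at most e^eps, which is epsilon-local differential
    privacy.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    # Only the (possibly flipped) bit would be sent off the device.
    return true_bit if rng.random() < p_truth else not true_bit


# Usage: ten noisy reports of the same true value with epsilon = 1.
rng = random.Random(0)
reports = [randomize_bit(True, 1.0, rng) for _ in range(10)]
```

A smaller ε means more flipping and stronger privacy for each report, at the cost of noisier aggregate statistics on the server.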
The post, Learning with Privacy at Scale, is the seventh issue in Volume 1 of Apple’s Machine Learning Journal, which details the company’s machine-learning projects and how they affect its products. This one takes a deeper look at Apple’s differential privacy framework and aims to reassure users that the company is not slurping up extremely private information.
Apple says its on-device approach to differential privacy allows data to be “randomized before being sent from the device, so the server never sees or receives raw data”.
The records arrive at a restricted-access server where IP addresses are dropped. Apple says that, at that point, it can’t tell whether an emoji record and a Safari web domain record came from the same user. Apple then computes aggregate statistics from the records and shares them with the relevant teams at Apple.
When users opt in to share device analytics, Apple defines a “per-event privacy parameter” and limits the number of records that are transmitted by each user per day.
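A per-event privacy parameter combined with a daily record cap bounds how much any one user can reveal in a day. The sketch below is a hypothetical client-side limiter: the class and method names and the example numbers are assumptions, not Apple's values. It relies on the standard sequential-composition result that k reports, each satisfying ε-local differential privacy, together satisfy at most (k × ε)-local differential privacy.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DailyRecordLimiter:
    """Hypothetical limiter: each record costs `epsilon_per_event`, and at most
    `max_records_per_day` records may leave the device per calendar day."""

    epsilon_per_event: float
    max_records_per_day: int
    _day: Optional[date] = field(default=None, init=False)
    _sent_today: int = field(default=0, init=False)

    def try_send(self, today: date) -> bool:
        """Return True if a record may be sent today, and count it if so."""
        if today != self._day:
            # New calendar day: reset the counter.
            self._day = today
            self._sent_today = 0
        if self._sent_today >= self.max_records_per_day:
            return False  # Daily cap reached; the record is not transmitted.
        self._sent_today += 1
        return True

    def worst_case_daily_epsilon(self) -> float:
        """Upper bound on one day's privacy loss via sequential composition."""
        return self.max_records_per_day * self.epsilon_per_event
```

For example, with `epsilon_per_event=2.0` and `max_records_per_day=3`, a fourth record on the same day is refused and `worst_case_daily_epsilon()` returns 6.0.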
Users can see the reports in iOS by going to Settings > Privacy > Analytics > Analytics Data and looking for entries that begin with ‘DifferentialPrivacy’. Mac users can find them in the Console app under System Reports. Apple’s post also includes sample images showing how to identify the reports.
Apple has what it calls an ‘ingestor’, which removes metadata such as record timestamps and groups the records by use case. The records are then passed to an ‘aggregator’ for statistical analysis.
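The ingestor step can be pictured as a simple filter-and-group pass. This is a hypothetical illustration: the record layout, the field names (`timestamp`, `ip_address`, `use_case`, `payload`) and the `ingest` function are assumptions. IP addresses appear only because the article says they are dropped when records reach the server.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical metadata fields: timestamps (removed by the ingestor) and
# IP addresses (dropped when records arrive at the server).
METADATA_FIELDS = {"timestamp", "ip_address"}


def ingest(records: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Strip metadata from each privatized record and group records by use case."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        cleaned = {k: v for k, v in record.items() if k not in METADATA_FIELDS}
        grouped[cleaned["use_case"]].append(cleaned)
    return dict(grouped)
```

Each resulting group, such as all emoji records, would then go to the aggregator on its own, so records from different use cases are analyzed separately.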
The end result of all this processing is that Apple can, for example, tell which emoji are most popular among users of different languages, which in turn helps it improve predictive emoji on the iOS keyboard.
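Working out the most popular emoji from noisy reports is a frequency-estimation problem. As a stand-in for Apple's own algorithms, the sketch below uses k-ary randomized response, a textbook local differential privacy technique: each device reports its true emoji index with probability p = e^ε / (e^ε + k - 1) and otherwise a different index chosen uniformly at random, and the server inverts that known noise to get unbiased counts. The function names and the integer encoding of emoji are illustrative assumptions.

```python
import math
import random
from collections import Counter
from typing import List


def krr_report(value: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Client side: k-ary randomized response over values 0..k-1."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    # Otherwise report one of the other k - 1 values, uniformly at random.
    other = rng.randrange(k - 1)
    return other if other < value else other + 1


def estimate_counts(reports: List[int], k: int, epsilon: float) -> List[float]:
    """Server side: unbiased estimates of how many clients hold each value."""
    n = len(reports)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P(report == true value)
    q = 1.0 / (math.exp(epsilon) + k - 1)  # P(report == a specific other value)
    observed = Counter(reports)
    # E[observed[v]] = true[v] * p + (n - true[v]) * q, solved for true[v].
    return [(observed[v] - n * q) / (p - q) for v in range(k)]
```

Individual reports stay noisy, but with enough participants the estimated counts converge on the true ones, which is why this kind of analysis relies on collecting data at scale.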
Apple can also identify websites that are energy and memory hogs in Safari on iOS and macOS. Apple’s browser can detect these domains and report them to Apple using its differential privacy framework.
It also helps identify websites where users want Auto-play enabled, a behavior Safari began blocking automatically with macOS High Sierra.
A third benefit to Apple is that it can discover new words, which helps it improve its on-device lexicons and autocorrect.