In an algorithm-driven world where data is king, one mis-step can lead to a royal mess. Netflix discovered this after it released an anonymised set of subscribers' film ratings in 2006 as part of a public competition. By cross-matching those ratings with reviews posted on another website, data sleuths showed they could identify individual subscribers and what they had been watching. A gay customer sued for breach of privacy in 2009; Netflix settled.
That episode is still cited today by academics seeking ways of sifting useful information from data without outing the individuals who provide it. Where anonymisation failed, synthetic data might yet succeed.
It is, as its name suggests, artificially generated. Most often it is created by funnelling real-world data through a noise-adding algorithm to construct a new data set, one that captures the statistical features of the original without being a giveaway replica. Its usefulness hinges on a principle known as differential privacy: roughly, that any analysis of the data should yield almost the same result whether or not any one individual's records are included. Anybody mining such data can therefore draw the same statistical inferences as they would from the true data, without being able to pin down individual contributions.
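The noise-adding idea can be sketched with the simplest differentially private primitive, the Laplace mechanism. The sketch below is illustrative only, not a description of any company's system; the data set and function names are hypothetical. A counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so adding Laplace noise of scale 1/ε satisfies ε-differential privacy.

```python
import math
import random

def laplace_sample(scale: float) -> float:
    # Draw one sample from a zero-mean Laplace distribution
    # via inverse-CDF sampling of a uniform variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records, predicate, epsilon: float) -> float:
    """Release a count perturbed by Laplace noise.

    A counting query has sensitivity 1: one individual's presence or
    absence changes the true count by at most 1. Noise of scale
    1/epsilon therefore yields epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_sample(1.0 / epsilon)

# Hypothetical viewing records: did each subscriber watch a given film?
watched = [True, False, True, True, False]
noisy = private_count(watched, lambda w: w, epsilon=0.5)
```

A smaller ε means more noise and stronger privacy; the true count of 3 is blurred, so no single subscriber's record can be teased out of the released figure, yet averages over many such queries remain statistically faithful.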