Of the tech giants, Microsoft and arguably Apple are the only two to have so far kept their noses clean when it comes to personal data exploitation scandals. Microsoft may now have lost its squeaky-clean record after the Windows and Microsoft Office software giant was reported to have ‘quietly’ deleted what has been described as the world’s largest online database of personal images.
The database, which Microsoft has stated was intended for academic purposes only, is said to have contained over 10 million images of over 100,000 individuals. The images, scraped from the web, were theoretically of ‘celebrities’ whose public personas, based on the Creative Commons license, made them fair game if their publicly available pictures were used for non-commercial research purposes.
However, the Financial Times, which has conducted an investigation into several such databases, found that many of the individuals whose images were included could not justifiably be labelled public figures or celebrities. Rather, they are private individuals with ‘normal’ jobs who happen to have a digital identity because their work is published online, such as journalists. These arguably ‘private’ individuals were not asked for their consent to be included in the ‘Celebs’ database.
The Microsoft database was used to train facial recognition software based on machine learning algorithms.
Microsoft itself is said to have used the images to train its own facial recognition software. However, the FT reports that citations in AI papers indicate the dataset has also been used by IBM, Panasonic, Nvidia and Hitachi, as well as Chinese companies Sensetime and Megvii. All are unarguably commercial rather than academic organisations, regardless of how they might frame certain elements of the R&D work they conduct.
The two Chinese companies, Sensetime and Megvii, both sell equipment and software to the Chinese authorities. The customer of both that is likely to cause the most outrage is the local government of the province of Xinjiang, which is home to Uighurs and several other Muslim minorities. Chinese officials have been accused of using the latest facial recognition technology in their alleged policy of systematic persecution of these minorities.
Adam Greenfield, a technology writer and urbanist, images of whom were part of the Microsoft data set, responded when contacted by the Financial Times investigation:
“I am in no sense a public person, there is no way in which I’ve ceded my right to privacy. It’s indicative of Microsoft’s inability to hold their own researchers to integrity and probity that this was not torpedoed before it left the building”.
“To me, it is indicative of a profound misunderstanding of what privacy is.”
Two other online data sets identified by the investigation have also been taken offline: the Duke MTMC surveillance data set, created by researchers at Duke University, and a data set called Brainwash, created by Stanford University researchers from live-streamed footage taken in a café of the same name.
Michael Veale, a technology policy researcher at the Alan Turing Institute, commented:
“They are likely to have taken it down because their lawyers expressed concern that they do not have a basis to process special category data such as faces under Article 9 of GDPR. They may not have a get-out clause for processing biometric data for the purposes of ‘uniquely identifying a natural person’.
“Particularly as the use of the data set has moved from a purely research use to something that products are being built with. There is reason to believe that the people in data set cannot be considered to expressly and clearly have made their faces public.”