A new tool, Data Provenance Explorer, lets users pick through the questionable provenance of many large data sets used for AI training. A new online tool allows users to identify, track and learn ...
The FDAP stack brings enhanced data processing capabilities to large volumes of data. Apache Arrow acts as a cross-language development platform for in-memory data, facilitating efficient data ...
Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...
Katharine Kemp is a member of the Expert Panel of the Consumer Policy Research Centre and the Australian Privacy Foundation. Photos of Australian children have been ...