A newborn baby dressed in white, fast asleep in his crib. Two curly-haired toddlers helping their mother prepare dinner. A girl snuggled in her father’s lap, clutching a rainbow flag at a local Pride Parade. Beaming teenagers at their high school graduation.
In the absence of strong child data privacy laws, the personal photos of these and other children are being used to build powerful artificial intelligence (AI) tools without their knowledge or consent. In turn, these tools are being used to create malicious deepfakes that put even more children at risk of exploitation and harm.
Our research found that LAION-5B, a large, open data set that was built by scraping most of the internet and is used to train popular AI tools, contains links to identifiable photos of children, including those described above. Some children’s names are listed in the accompanying caption or in the URL where the image is stored. In many cases, their identities are easily traceable, as is information about when and where each child was when their photo was taken.
One such photo, for example, features a baby smiling at the camera, dressed in a cheerful onesie embroidered with a cartoon bee. Her name is listed in the image’s URL. Information embedded in the photo reveals the precise location of her grandparents’ house in a rural town in Tennessee, where the photo was taken six years ago on a Monday afternoon.
The photos we found in the LAION-5B data set were posted by children, their parents, or their grandparents on personal blogs and photo- and video-sharing sites. Many were uploaded years or even a decade before LAION-5B was created, had been seen by very few people, and appear to have previously enjoyed a measure of privacy. For example, we could not find the photo of the smiling baby through an online search.
In response to our research, LAION, the non-profit that manages LAION-5B, pledged to remove the children’s data that we found. The group disputed the claim that AI models trained on LAION-5B could reproduce personal data.
However, the children’s data that we found is likely to be the tip of the iceberg. Our search of LAION-5B was far from exhaustive: we found 41 children’s personal photos after reviewing 600 images in a data set that contains 5.85 billion images. There are likely many more children whose privacy remains at risk.
All Children’s Privacy Under Threat
Once their data is swept up and fed into AI systems, children face further threats to their privacy due to flaws in the technology. AI models, including those trained on LAION-5B, are notorious for leaking private information. The models can reproduce identical copies of the material they were trained on, including medical records and photos of real people. Some companies have set guardrails to prevent the leakage of sensitive data, but users and researchers have repeatedly circumvented these safeguards.
These privacy risks pave the way for further harm. Training on photos of real children has enabled AI models to create convincing clones of any child, based on a handful of photos, or even a single image. Now, all children whose photos or videos are posted online are at risk. Someone could steal their likeness and use AI to make it appear as if they are saying or doing things that they never said or did.
Scammers have demanded ransom from parents using AI-generated clones of their children’s voices begging for help. Others have retraumatized families by creating videos of their missing or murdered children recounting their own deaths. Malicious actors have also used LAION-trained AI tools to generate explicit imagery of children using innocuous photos taken from their social media accounts, as well as graphic scenes of child survivors whose images of sexual abuse were scraped into LAION-5B.
One 14-year-old reported that a stranger sent them fake explicit images of themselves and threatened to share them with the child’s friends unless the child paid a ransom. “The images look SCARY real and there’s even a video of me doing disgusting things that also look SCARY real,” the child told the National Center for Missing and Exploited Children. The teen sent their debit card information to the predator.
Fabricated media have always existed. But in the past, creating such images required time, resources, and specialized expertise, and the results were often unconvincing. Today’s AI tools create lifelike outputs in seconds, are often free, and are so easy to use that children have used them to harass classmates.
Congressional Action Needed
Fifty-four attorneys general, along with lawmakers in two dozen U.S. states, have proposed banning AI-generated sexually explicit images of children. These efforts are urgent and important. But they tackle only one symptom of the deeper problem: children’s personal data remain largely unprotected from misuse and exploitation.
Children deserve privacy. They deserve to safely learn, grow, and play online, without fear that their identities might be stolen and weaponized against them.
The U.S. Congress should pass a child data privacy law that comprehensively protects children’s data from being collected or used in ways that can harm them, regardless of the technology used to do so. Such a law should prohibit scraping children’s personal data into AI systems, given the privacy risks involved and the potential for new forms of misuse as the technology evolves.
The law should also prohibit the nonconsensual digital replication or manipulation of children’s likenesses. Finally, it should provide children who experience harm with mechanisms to seek meaningful justice and remedy. As Congress considers dozens of pieces of legislation related to AI, it should also ensure that proposed regulations incorporate data privacy protections for everyone, and especially for children.
The harm that children are experiencing is not an unfortunate necessity for technological progress. Generative AI is still a very young technology, and its consequences for society are not inevitable. It has been only two years since the world was introduced to the notion of creating images simply by describing what one wants to see. Society can describe, and should insist on, an alternative vision for AI: that it be developed and used to build a just and joyful world for children.