404 Media has published a concerning report, based on internal documents it obtained from Automattic, that the company is preparing to sell user data to Midjourney and OpenAI. Automattic is the parent company of WordPress.com and Tumblr.
This blog is hosted on WordPress.com. I'm going to have to see whether there is an opt-out option and read the terms and conditions attached to it. If one is available, I would hope that being opted out is the default. People should be able to opt in if they want to. Subterfuge shouldn't need to be used.
A concern raised in the report is that the data dump compiled from Tumblr for Midjourney/OpenAI included data that shouldn't have been there. According to Cyle Gage (a product manager at Tumblr), this included:
- private posts on public blogs
- posts on deleted or suspended blogs
- unanswered asks (normally these are not public until they’re answered)
- private answers (these only show up to the receiver and are not public)
- posts that are marked ‘explicit’ / NSFW / ‘mature’ by our more modern standards (this may not be a big deal, I don’t know)
- content from premium partner blogs (special brand blogs like Apple’s former music blog, for example, who spent money with us on an ad campaign) that may have creative that doesn’t belong to us, and we don’t have the rights to share with third-parties; this one is kinda unknown to me, what deals are in place historically and what they should prevent us from doing.
Tumblr and Wordpress to Sell Users’ Data to Train AI Tools (Sam Cole/404 Media)
The benefit of having my own site is that I can move if I feel I need to. I'll have to consider other options, whether that means moving to a new platform like Ghost or finding another hosting service.
It is disappointing to see Automattic moving in this direction. They have described themselves as the guardians of the open web, but this decision will have people considering whether to remove their Tumblrs or blogs to avoid having them included in a training set for a large language model.
The promise of the open web was that it allowed people to connect with each other in a new way. As Gita Jackson wrote:
The internet has been broken in a fundamental way. It is no longer a repository of people communicating with people; increasingly, it is just a series of machines communicating with machines.
The Internet Is Full of AI Dogshit (Gita Jackson/Aftermath)
This decision by Automattic, if the report is accurate, will make this problem worse in the short term. There's no guarantee that it will improve in the medium to long term either. Companies like OpenAI have made great promises of progress in the past, only to renege on them when it suited them. Unfortunately, I have little faith that this will be any different.
I could be wrong. I hope that I am.