  • Automattic May Start Selling Users' Data to Train AI Tools

    404 Media published a concerning report, based on internal documents it obtained, saying that Automattic is preparing to sell user data to Midjourney and OpenAI. Automattic is the parent company of WordPress.com and Tumblr.

    This blog is hosted on WordPress.com, so I'm going to have to see whether there is an opt-out option and read the terms and conditions attached to it. If there is one, I would hope that being opted out is the default, with people able to opt in if they want to. Subterfuge shouldn't be necessary.

    One concern raised in the report: when Tumblr compiled a data dump for Midjourney/OpenAI, Cyle Gage (a product manager at Tumblr) said it included data that shouldn't have been there, such as:

    • private posts on public blogs
    • posts on deleted or suspended blogs
    • unanswered asks (normally these are not public until they’re answered)
    • private answers (these only show up to the receiver and are not public)
    • posts that are marked ‘explicit’ / NSFW / ‘mature’ by our more modern standards (this may not be a big deal, I don’t know)
    • content from premium partner blogs (special brand blogs like Apple’s former music blog, for example, who spent money with us on an ad campaign) that may have creative that doesn’t belong to us, and we don’t have the rights to share with third-parties; this one is kinda unknown to me, what deals are in place historically and what they should prevent us from doing.
    Tumblr and WordPress to Sell Users’ Data to Train AI Tools (Samantha Cole/404 Media)

    The benefit of having my own site is that I can move if I feel I need to. I'll have to consider other options, whether that's moving to a new platform like Ghost or finding another hosting service.

    It is disappointing to see Automattic moving in this direction. They have described themselves as guardians of the open web, but this decision will have people considering whether to delete their Tumblrs or blogs to avoid their work being included in a training set for a large language model.

    The promise of the open web was that it allowed people to connect with each other in a new way. As Gita Jackson wrote:

    The internet has been broken in a fundamental way. It is no longer a repository of people communicating with people; increasingly, it is just a series of machines communicating with machines.

    The Internet Is Full of AI Dogshit (Gita Jackson/Aftermath)

    If the report is true, this decision by Automattic will make the problem worse in the short term, and there's no guarantee it will improve in the medium to long term either. Companies like OpenAI have made great promises of progress in the past, only to renege on them when it suited them. Unfortunately, I have little faith that this will be any different.

    I could be wrong. I hope that I am.

    → 11:03 PM, Feb 27
  • Meredith Whittaker on AI Hype

    Meredith Whittaker, the President of Signal and chief advisor to the AI Now Institute, appeared on the Big Technology Podcast, and she had some interesting things to say about OpenAI, Microsoft, and the hype that has built up around AI since the release of ChatGPT.

    ChatGPT itself is not an innovation. It's an advertisement that was very, very expensive that was placed by Microsoft to advertise the capacities of generative AI and to advertise their Azure GPT APIs that they were selling after effectively absorbing OpenAI as a Microsoft subsidiary. But the technology and frameworks on which ChatGPT is based date from 2017.

    So, Microsoft puts up this ad, everyone gets a little experience of communicating with something that seems strikingly like a sentient interlocutor. You have a supercharged chat bot that everyone can experience and have a kind of story about. It's a bit like those viral "upload your face and we'll tell you what kind of person you are" data collection schemes that we saw across Facebook in the 2010s and then an entire narrative of innovation or a narrative of scientific progress gets built around this sort of ChatGPT moment.

    Suddenly generative AI is the new kind of AI. Suddenly there are claims about sentience, about superintelligence, about AI being on the cusp of breaking into full consciousness and perhaps endangering human life. All of this almost religious rhetoric builds up in response to ChatGPT.

    I'm not a champion of Google but I think we need to be very careful about how are we defining innovation and how are we defining progress in AI because what I'm seeing is a reflexive narrative building around what is a very impressive ad for a large, generative language model but not anything we should understand as constitutionally innovative.

    Meredith Whittaker on ChatGPT

    She also talks about the dangers of trusting the models to return factual information.

    I didn't say useless. I said not that useful in most serious contexts or that's what I think. If it's a low stakes lit review, a scan of these docs could point you in the right direction. It also might not. It also might miss certain things because you're looking for certain terms but actually, there's an entire field of the literature that uses different terms and actually if you want to research this and understand it, you should do the reading.

    Maybe not trust a proxy that is only as good as the data it's trained on, and the data it's trained on is the internet plus whatever fine-tuning data you're using.

    I'm not saying it's useless. I'm saying it is vastly over-hyped, and the claims being made around it are, I think, leading to a regulatory environment that is a bit disconnected from reality and to a popular understanding of these technologies that is far too credulous about their capabilities.

    Any serious context where factuality matters is not somewhere where you can trust one of these systems.

    Meredith Whittaker on AI Hype and Doing the Reading

    I remember Ezra Klein talking about the importance of doing the reading and the connections that form in your mind as the material becomes more familiar. That depth of knowledge can provoke the insights that create something new or improve an existing service. Loading all your books into an expert system does not enable this kind of thinking if you never read them yourself.

    Productivity in knowledge work is still measured by volume rather than quality. There's a great story about Bill Atkinson from when Apple decided to track productivity by the number of lines of code each engineer wrote in a week. According to Folklore.org:

    Bill Atkinson, the author of Quickdraw and the main user interface designer, who was by far the most important Lisa implementer, thought that lines of code was a silly measure of software productivity. He thought his goal was to write as small and fast a program as possible, and that the lines of code metric only encouraged writing sloppy, bloated, broken code.

    He recently was working on optimizing Quickdraw's region calculation machinery, and had completely rewritten the region engine using a simpler, more general algorithm which, after some tweaking, made region operations almost six times faster. As a by-product, the rewrite also saved around 2,000 lines of code.

    -2000 Lines Of Code (Andy Hertzfeld/Folklore.org)
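
    To make Atkinson's point concrete, here is a trivial sketch of my own (nothing to do with QuickDraw): both functions below behave identically, but a lines-of-code metric would score the bloated one as several times more productive.

    ```python
    def clamp_verbose(value: float, low: float, high: float) -> float:
        # The "productive" version by a lines-of-code metric:
        # more lines, more branches, more places for bugs to hide.
        if value < low:
            result = low
        elif value > high:
            result = high
        else:
            result = value
        return result

    def clamp(value: float, low: float, high: float) -> float:
        # The rewrite: one line, same behavior, easier to verify.
        return max(low, min(value, high))
    ```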

    I'm afraid that the diligence and craft displayed by Bill Atkinson would not be rewarded today when developers are encouraged to crank out as much code as possible using GitHub Copilot or some other AI assistant.

    → 11:24 PM, Jan 17
  • Do Users Write More Insecure Code with AI Assistants?

    According to this study from Stanford University, the answer is yes. From the conclusion:

    We conducted the first user study examining how people interact with an AI code assistant (built with OpenAI’s Codex) to solve a variety of security related tasks across different programming languages. We observed that participants who had access to the AI assistant were more likely to introduce security vulnerabilities for the majority of programming tasks, yet were also more likely to rate their insecure answers as secure compared to those in our control group.

    Do Users Write More Insecure Code with AI Assistants?
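
    To give a concrete sense of what "introducing a security vulnerability" looks like, here is a minimal sketch of my own (not an example from the paper): the kind of SQL query an assistant might happily autocomplete, alongside the parameterized version that avoids injection.

    ```python
    import sqlite3

    def find_user_insecure(conn: sqlite3.Connection, name: str):
        # Vulnerable: user input is interpolated straight into the SQL text,
        # so a name like "x' OR '1'='1" returns every row (SQL injection).
        return conn.execute(
            f"SELECT id, name FROM users WHERE name = '{name}'"
        ).fetchall()

    def find_user_secure(conn: sqlite3.Connection, name: str):
        # Safe: the ? placeholder binds the value as data, so the input
        # can never change the structure of the query.
        return conn.execute(
            "SELECT id, name FROM users WHERE name = ?", (name,)
        ).fetchall()
    ```

    Both versions return the same rows for well-behaved input, which helps explain the study's other finding: participants rated their insecure answers as secure.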
    → 8:59 PM, Jan 17
  • Yes, Google Results Have Gotten Worse

    404 Media reported on a study, "Is Google Getting Worse? A Longitudinal Investigation of SEO Spam in Search Engines", published by German researchers from Leipzig University, Bauhaus-University Weimar, and the Center for Scalable Data Analytics and Artificial Intelligence.

    Google isn't the only search engine dealing with this issue. Jason Koebler writes:

    Notably, Google, Bing, and DuckDuckGo all have the same problems, and in many cases, Google performed better than Bing and DuckDuckGo by the researchers' measures.

    Google Search Really Has Gotten Worse, Researchers Find (Jason Koebler/404 Media)

    The research highlights how much damage search engine optimization (SEO) has done to the ecosystem of the internet, and the release of generative AI is only going to make the problem worse. Amazon is already dealing with product titles and reviews generated using ChatGPT.

    David Roth had a good piece on Defector about the promises made by the developers and boosters of AI and its actual use in the present day.

    One reason it is not very interesting is that everything they have touted as the future of some essential human thing or other—the future of art, or money—has mostly crashed out in ways that left behind very little useful residue. Another is that the ways in which AI is used in the present, by your lower-effort plagiarists and scammers, are so manifestly not the future of anything that works, but rather both the present and the future of shitting-up web search results, which is roughly analogous to saying that robocalls about homeowners insurance are the future of human communication.

    The Future Of E-Commerce Is A Product Whose Name Is A Boilerplate AI-Generated Apology (David Roth/Defector)
    → 11:09 PM, Jan 16
  • Restoring the Tech Worker's Dream

    I love this video of Cory Doctorow explaining how the dreams of tech workers have changed over the past 15 years.

    https://youtu.be/XwvqecNDHF0

    This topic also appeared in the speech he gave at DEF CON earlier this year.

    Remember when tech workers dreamed of working for a big company for a few years, before striking out on their own to start their own company that would knock that tech giant over?

    Then that dream shrank to: work for a giant for a few years, quit, do a fake startup, get acqui-hired by your old employer, as a complicated way of getting a bonus and a promotion.

    Then the dream shrank further: work for a tech giant for your whole life, get free kombucha and massages on Wednesdays.

    And now, the dream is over. All that’s left is: work for a tech giant until they fire your ass, like those 12,000 Googlers who got fired six months after a stock buyback that would have paid their salaries for the next 27 years.

    We deserve better than this. We can get it.

    An Audacious Plan to Halt the Internet’s Enshittification and Throw It Into Reverse
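
    Doctorow's 27-year figure checks out as back-of-the-envelope arithmetic if you assume the buyback in question is the roughly $70 billion repurchase Alphabet authorized in 2022 (my assumption; the speech doesn't name the figure):

    ```python
    # Rough check of Doctorow's claim, assuming Alphabet's ~$70B
    # buyback authorization from 2022 (my assumption, not from the speech).
    buyback_usd = 70e9
    laid_off = 12_000   # Googlers let go in January 2023
    years = 27

    implied_salary = buyback_usd / (laid_off * years)
    print(f"${implied_salary:,.0f} per year")  # ≈ $216,049, plausible for Google
    ```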

    If tech workers need an example of the power they currently possess, they need look no further than what happened at OpenAI when Sam Altman was fired by the board. He would not be CEO today if the workers had not threatened to leave.

    It's a small example, and it will be interesting to see how Altman and OpenAI try to break that solidarity in the future.

    → 12:13 AM, Dec 5
  • Brian Stelter on Breaking News

    I was scrolling through my RSS feeds on Friday when I came across the news of Sam Altman being fired as CEO of OpenAI. After the success of OpenAI's DevDay the previous week, I was surprised by this. After reading the blog post announcing the decision, I still didn't understand why exactly he had been fired.

    According to the OpenAI board:

    Mr. Altman’s departure follows a deliberative review process by the board, which concluded that he was not consistently candid in his communications with the board, hindering its ability to exercise its responsibilities. The board no longer has confidence in his ability to continue leading OpenAI.

    OpenAI announces leadership transition

    This raised far more questions than it answered. I spent over an hour trying to find out what was going on, but then I remembered a quote Brian Stelter gave in an interview about breaking news.

    We oftentimes have the most interest in a news story when there's the least amount of information. You know, something's breaking news and we really know absolutely nothing about it but that's when everybody wants to know everything and by the time we know all the facts, everybody's moved on.

    Brian Stelter on the Offline podcast

    It's important for me to remember that I don't need to keep up with events like this. It will work itself out eventually, and I can deal with it then. There's no point responding to speculation about things that haven't happened yet; it's usually wasted energy.

    → 6:08 PM, Nov 22