Hello, dear readers!
If you’re just joining my newsletter, welcome! I typically offer a deep-dive into one topic a month. This month’s topic is a rundown of common apps that creators use and what their policies are with respect to training the language models (LLMs and SLMs) that power generative AI.
As I pointed out earlier, when I mentioned that conversations involving AI are complicated AF, generative AI is one type of artificial intelligence that has already supplanted human workers in a variety of fields. Writers and artists are just two types of creators already at risk. And yet, in order for generative AI to remain effective, LLMs require a steady supply of new training data. Still confused? Here’s a primer on Large Language Models, AI copyright, and how reliable and accurate AI is. If you’re doing your own research, please be mindful that it’s challenging to sort “shiny” articles promoting generative AI from the confusing reality. There’s a lot of investment driving the adoption of AI, and there’s quite a bit that still needs to be figured out for us plebes.
In the past few months, I’ve noticed a significant uptick in marketing designed to make AI more palatable, along with the rapid rollout of new AI-driven tools in programs I typically use. Even operating systems like Windows 11 and macOS are integrating AI as part of their basic functionality. Search tools (Google, Microsoft Edge) either display AI results by default or integrate them into their functionality with no option to turn AI off. If you want search results that don’t include AI, you’ll need to use browser extensions or switch to an alternative like StartPage.
In the interest of helping you make decisions and be aware, here are some common apps writers use and what their policies are.
Google Docs: According to Google, what you input into Google Docs is not used to train AI, but your data is used in other ways. Since this is an online service, however, there is some speculation that your data could be scraped for training if it’s publicly available. There appears to be a distinction between customer data and publicly available data, and I’m not sure if that’s Google-specific terminology or industry-wide.
LibreOffice: LibreOffice is OpenOffice’s successor. I was unable to find any information about their AI policies, but searching their forums did turn up some community discussion about integrating AI. At present, I have to assume that LibreOffice is not collecting customer data to train AI.
MS Word: The link will take you to a post claiming MS Word automatically uses your data for AI training, which also walks you through how to turn off training. Microsoft later refuted this claim, and their stated AI policies are here. When this article dropped, I did follow the steps myself regardless. YMMV.
Scrivener: Currently, Scrivener has no plans to incorporate AI into the program itself. Their statement can be found here.
According to my cursory research, most social media platforms are more liberal with respect to data collection and AI training.
Bluesky: The platform recently issued a statement that they have no plans to scrape content.
Meta: Covers Facebook, Threads, and Instagram. Under Meta’s AI policy, most of your content is used for AI training. You can only opt out if you live in Europe or the UK.
Twitter/X: As of November 15th, X does use your content by default, and there’s no way to opt out.
Here are a couple of newsletter services and their policies.
Buttondown: This is a paid newsletter service that does not use your data to train AI.
Mailchimp: The newsletter service was purchased by Intuit, and is “all in” on AI. I was unable to find the exact language regarding AI training. Here is an announcement and Mailchimp’s standard TOS.
Substack: Substack hasn’t issued a statement, but the platform does use content for training by default. To opt out, go to your Publisher Dashboard. Then, in the left-hand menu, open the Privacy tab and click the “Block AI Training” toggle.
Lastly, here are a few other tools writers commonly use and what their policies are.
Evernote: The language in their Supplemental Terms is a bit convoluted, but there is a key passage I want to point out: “We do not use Input or Output to train our artificial intelligence models or tools unless you direct us to do so.” This Reddit thread clarifies this a bit further.
LinkedIn: Every time there’s a shiny new feature, LinkedIn adopts it and makes you opt out. So, yes, LinkedIn is utilizing your data. Here’s how to opt out.
Notion: Notion does use the data you input, and their documentation explains how. Here’s their security and privacy policy.
Zoho: Zoho offers a suite of project management apps, like Notebook. The service has integrated with OpenAI and ChatGPT. Their Data Privacy and Security page spells this out.
Want even more ways to opt out? As it turns out, several generative AI programs do allow you to opt out of having your data used. It just might be a little tricky to find the specific link, toggle, or place to request it, but they do exist.
Like this post? You can buy me a coffee, subscribe, or share. I appreciate your support and hope this information helps empower you. I do plan on covering image-related apps as a follow-up, too.
Imagine that: a company thinking it’s OK to train its AI with your imagination.