The fiction analytics site Prosecraft, known for its data-driven analysis of word usage and writing-style markers, has shut down following vehement objections from authors. The central issue was the site’s use of copyrighted works without permission.
Prosecraft.io used the text of more than 25,000 copyrighted books to build an extensive data library. The library displayed writing-style markers for each title, including word count, passive voice frequency, and vividness. Authors who discovered that their works had been used without consent objected immediately, setting off a significant backlash against the project.
How DARE you, @benji_smith
I demand you take my book off your site immediately. I do not consent to this, and never did. And I know my publisher never would pic.twitter.com/QvPkRme5pr
— Zach Rosenberg’s Debut Is Out Now! (@ZachRoseWriter) August 7, 2023
The incident initially gained traction when author Zach Rosenberg highlighted Prosecraft’s operations to fellow writers on X, the platform formerly known as Twitter. Subsequently, numerous authors, including high-profile figures like Jeff VanderMeer, Indra Das, and Gretchen Felker-Martin, joined the protest against the infringement of their creative rights.

How did Prosecraft operate?
A key part of the controversy was Prosecraft’s acknowledgment that it used “AI algorithms.” In an October 5, 2018 blog post, Benji Smith, the creator of both Prosecraft and the writing program Shaxpir, described using machine learning algorithms to recognize word usage patterns. These algorithms were trained on a large collection of fiction text, encompassing over 560 million words from more than 5,800 books by more than 3,300 well-known writers. Smith did not, however, disclose where the texts came from or whether permission had been obtained.
Remove all books and analysis for Jeff VanderMeer. You absolutely do not need a title by title run down. Just run a search on your own damn site.
— Jeff VanderMeer (@jeffvandermeer) August 7, 2023
Although Prosecraft’s technology was not a generative large language model like ChatGPT, concerns that it could evolve toward such capabilities were understandable given its extensive library of books. In response to mounting opposition, Smith voluntarily took Prosecraft down and explained the decision in a detailed Medium blog post.
While Prosecraft displayed excerpts from copyrighted texts alongside summary statistics, it did not secure permission from authors or publishers to build a database from entire works. Smith’s reliance on fair use, as set out in his blog post, drew criticism from authors, who argued that using complete copyrighted works as training data crossed ethical boundaries.
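How Prosecraft computed its summary statistics is not described here. As a rough illustration only, the Python sketch below shows how two of the markers mentioned above, word count and passive voice frequency, might be approximated. The function names and the regular-expression heuristic are assumptions made for this example, not Prosecraft’s code.

```python
import re

# Forms of "to be" that commonly introduce a passive construction.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"

# Crude heuristic (an assumption for this sketch): a "be" verb followed by a
# word ending in -ed or -en. It misses irregular participles ("was sung")
# and wrongly flags some non-verbs ("is often").
PASSIVE_RE = re.compile(rf"\b{BE_FORMS}\s+\w+(?:ed|en)\b", re.IGNORECASE)


def split_sentences(text):
    """Split on ., !, or ? followed by whitespace; good enough for a sketch."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def style_stats(text):
    """Return word count, sentence count, and the share of sentences flagged passive."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = split_sentences(text)
    passive = sum(1 for s in sentences if PASSIVE_RE.search(s))
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "passive_share": passive / len(sentences) if sentences else 0.0,
    }


sample = (
    "The book was written in a year. "
    "She mailed it to her editor. "
    "It was taken seriously."
)
print(style_stats(sample))
```

A statistic like this needs the full text as input even though only an aggregate number is shown, which is why displaying summary figures alone did not settle the authors’ objections.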

The incident sheds light on a broader trend of authors’ escalating unease regarding the integration of AI in their field. Recent incidents, including unauthorized scans of books into expansive datasets, have ignited debates about intellectual property rights and the potential for AI-generated content to erode creators’ roles in their own artistic process.
The proliferation of AI tools, coupled with the surge in self-publishing, has fueled concerns about the prevalence of unethical practices. Instances of AI-generated travel guides inundating platforms like Amazon, and even the emergence of AI-authored children’s books, have underscored the need for more stringent ethical guidelines.

As AI technology advances, its integration into the creative sphere remains a contentious subject. While concerns persist about AI’s ability to replicate the artistry and individuality of human authors, there’s also growing apprehension that AI’s power might reshape traditional marketing and promotional strategies.
In essence, the Prosecraft incident serves as a poignant reminder that the creative world is at a crossroads. It’s grappling with the challenge of harnessing AI’s potential while upholding ethical standards and safeguarding the unique contributions of human authors. This incident prompts a broader conversation about how AI and creative expression can coexist harmoniously, with respect for the integrity and rights of creators.
Featured image credit: Prosecraft