After selling Spectrum Labs, entrepreneur Justin Davis is back with another AI startup dubbed Nurdle. The company is coming out of stealth today with the same backers and a focus on transforming the landscape of AI deployment for enterprises.
Nurdle has funding from notable investors like Greycroft, Intel Capital, and Twilio Ventures. The company has stepped out of the shadows after the acquisition of Spectrum Labs by ActiveFence. Interestingly, Nurdle was incubated inside Spectrum Labs.
Leveraging groundbreaking technology, the company seeks to transform the landscape of AI deployment for enterprises, offering faster, cheaper, and more accurate custom language models.
Nurdle is an evolution of the promising project initiated within Spectrum Labs, designed to streamline and refine custom private Large Language Models (LLMs) through a proprietary “lookalike” data technology.
This approach fuses the precision of human-generated and labeled data with the agility, volume, and cost-effectiveness of synthetic data, rendering custom AI applications more accessible for various businesses.
Davis, CEO of Nurdle, said, “Spectrum Labs was the testing ground for Nurdle’s groundbreaking AI solution. Our technology initially focused on understanding human behavior for content moderation, which was one of the most intricate problems in Natural Language Understanding. With Nurdle, we extend these innovations to bolster AI teams, driving high performance across multiple communication-focused applications.”
The company’s methodology for content creation, data labeling, and quality assurance has cut data scientist hours by a factor of five to 10 while reaching 92% of the accuracy of human-generated and human-labeled data, all at 5% of the usual cost. The process has also shortened time-to-production by several months.
Before the acquisition, Spectrum Labs, which did text-based content moderation using AI-driven natural language understanding (NLU), was valued at $146 million in its last funding round, according to PitchBook data, with clients like Riot Games, Grindr, and The Meet Group. The recent acquisition of Spectrum Labs by ActiveFence was an all-cash deal and a notable move in the AI landscape.
“We were a part of Spectrum Labs for the last year, and we were basically trying to find ways to create better datasets for Spectrum so that we could ultimately build better performing models for detecting hate speech across languages or cheating or spam or whatever the thing was for across our wide variety of clients. And what we stumbled upon was basically building a second LLM,” Davis said in an interview with VentureBeat.
That second model took in a lot of data to simulate human content on the internet, such as in-game chat, and then used those simulations to create more datasets, he said.
“And that proved to be really effective. Once we figured out that Nurdle could create models that were 92% as accurate as human-labeled models for classification, we realized that was where we needed to go. We need to go take this technology to basically every company that’s like Spectrum, of our size or bigger, that is building a chatbot or a classifier or an LLM.”
Josh Newman, CTO of Nurdle, said in a statement, “Existing LLM applications such as ChatGPT present limitations for specific businesses and products due to their generalized nature, leading to inaccuracies that impair trust. This is the fundamental challenge Nurdle tackles by crafting tailored, high-quality synthetic datasets.”
NurdleGPT, a second-generation synthetic data generator, facilitates the creation of domain-specific unstructured text data, particularly for training language models. Unlike previous synthetic data generators geared toward structured, regulated industries, NurdleGPT focuses on unstructured text data from diverse domains, such as chat logs, call transcripts, emails, and articles.
The company is extending its insights and solutions through a series of workshops catering to product managers, data scientists, and AI/ML engineers, aiming to democratize AI applications. Interested parties can access further information about these workshops via Nurdle’s platform.
Nurdle empowers product teams by providing cutting-edge AI solutions, making AI deployment faster, more cost-effective, and simpler. The NurdleGPT data technology allows for the creation of highly precise, specialized “lookalike” datasets that redefine the accessibility and practicality of AI applications across various industries. Interested parties can explore and test their data for free at Nurdle.ai.
How this came to pass
AI products need to be privacy-safe and safe for kids.
“We felt like we had pretty good IP to do it. So once that became apparent, it became really clear we needed to find a good home,” Davis said.
The interesting and rare part was that Davis was able to proceed with a second startup while still working at Spectrum Labs. Slack managed to do that as it pivoted from games to enterprise communications. Others have done that too.
“We really cared about fighting spam and hate speech. But we didn’t really know the right technical solution that would emerge from the soup six years later. And it turns out that contextual AI was the right way to solve that problem. And we were right about that during our journey,” Davis said. “We did experience specific key problems in developing some of the most advanced AI that you can conceive, which is understanding human behavior on the internet for automated content moderation.”
Then came the rise of OpenAI and ChatGPT, and it introduced a new problem that wasn’t about content moderation per se.
“It’s about anyone trying to build an AI company period, and how expensive and how timely and how hard it is to get your hands on the right data, especially with all the privacy stuff that exists for good reasons,” Davis said. “And so we need ways to accelerate and get around that. Ultimately we did. So I think that’s normal, right? You’re an entrepreneur, you solve a problem that you feel and then ideally, experience another set of problems and that journey and then the cycle continues. That’s actually what we did here.”
ActiveFence disclosed that it bought Spectrum Labs in September. Spectrum and its existing investors funded Nurdle.
“At this point, we’ve got a few years of runway, we’ve got plenty of capital in the bank, and a team that’s done this before,” Davis said.
While at Spectrum, Davis learned that a lot of game companies were trying to build their own classifiers, which Spectrum had to compete against, and that they were experimenting with generative AI for intellectual property creation.
“So we were getting asked more and more if we could help build these classifiers, or build other kinds of AI. So we had a choice of doing this for the gaming market as a custom dev shop or going upstream from that and help a company build its own AI on top of the tools it has,” he said. “It became really clear it was a data strategy. We can help gaming companies simulate, create and label more in-game chat or chatter than what they actually have. And we can organize any of that content or game content in a much more effective way so that when they’re going out trying to build new NPCs, or new chatbots for whatever mechanism, or new gen AI tools for their communities, that they have better datasets to do that with.”
He added, “So we don’t want to be on the hook building AI. We’re on the hook for helping them with synthetic data to do a better job of building AI for themselves. And that seems like a better way for us to cater to a wider variety of use cases at gaming companies and also other types of verticals that we never got to work with before like healthcare, financial services and martech companies.”
How it works
Many companies have datasets. Game companies might have data for a non-player character. They could have that NPC talk to players in a game, but they first need to tune that data so that an 18-year-old NPC doesn’t sound like a 50-year-old NPC. The data gets tuned based on how people talk, and then Nurdle creates privacy-compliant synthetic data from it. The company then uses that data to train the NPC to speak in the right way and to understand when someone is talking in the jargon of the game.
“That’s fine-tuning,” he said. “But you want it to be accurate too and deal with the problem of hallucinations that people talk about. That’s solved through another data problem, which is a process called retrieval augmented generation, where you’re basically telling it that before it answers a question, go refer to a certain set of data. So this could be like in the case of like an NPC, like the rules of the game, gaming manuals, catalogs or whatever is available. So then it doesn’t answer and make up something that doesn’t exist in the game, right, which it is prone to do.”
That process of making sure that that system works right is called RAG, or retrieval augmented generation. To do that properly, you have to have datasets to be able to test it against. So a lot of people can build this stuff with their own data. And then they have no way to benchmark it or test it to see if it’s actually accurate, Davis said.
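To make the idea concrete, here is a minimal Python sketch of the retrieval step Davis describes, plus a small test set to benchmark it against. It is an illustration only, not Nurdle’s method: the documents, the questions, and the function names (retrieve, build_prompt, retrieval_accuracy) are all hypothetical, and simple word overlap stands in for whatever retrieval technique a production system would use.

```python
import re

# Hypothetical reference documents (stand-ins for game rules or manuals).
DOCS = [
    "Players earn a shield after winning three matches in a row.",
    "The blacksmith NPC sells swords and armor in the village.",
    "Fishing is only available in the lake zone after level 10.",
]

# Hypothetical test set: each question is paired with the index of the
# document a correct retrieval step should return.
TEST_SET = [
    ("How do players earn a shield?", 0),
    ("Where can I buy armor?", 1),
    ("When is fishing available?", 2),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, docs, k=1):
    """Return the k documents that share the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs):
    """Put the retrieved context ahead of the question before it goes to a model."""
    context = "\n".join(retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def retrieval_accuracy(test_set, docs):
    """Fraction of test questions whose top retrieved document is the expected one."""
    hits = sum(1 for question, idx in test_set if retrieve(question, docs)[0] == docs[idx])
    return hits / len(test_set)


if __name__ == "__main__":
    print(build_prompt("Where can I buy armor?", DOCS))
    print("retrieval accuracy:", retrieval_accuracy(TEST_SET, DOCS))
```

Without a labeled test set like TEST_SET, there is nothing to score the system against, which is the gap Davis says custom datasets are meant to fill.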
“Our datasets created custom for this help you do that,” he said. “Everybody gets super excited about ChatGPT. But a lot of businesses realized that they can’t trust it or it doesn’t have the domain knowledge that you need for your purposes.”
Once you have this capability, you can properly assess if you can run with a cheaper version of the technology without sacrificing accuracy. The company has 18 people.
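As a rough sketch of that kind of assessment, the Python below scores several candidate models on the same labeled test set and picks the cheapest one whose accuracy stays within a tolerance of the baseline. Everything here is assumed for illustration: the toy keyword classifiers, the cost figures, the 0.02 tolerance, and the function names are hypothetical and do not describe Nurdle’s product.

```python
def accuracy(predict, labeled):
    """Fraction of (text, label) pairs that the predict function labels correctly."""
    correct = sum(1 for text, label in labeled if predict(text) == label)
    return correct / len(labeled)


def cheapest_acceptable(models, labeled, baseline_name, tolerance=0.02):
    """Return the name of the cheapest model within `tolerance` of the baseline's accuracy.

    `models` maps a name to a (predict_fn, cost) pair; costs are arbitrary units.
    """
    baseline_acc = accuracy(models[baseline_name][0], labeled)
    acceptable = [
        (cost, name)
        for name, (predict, cost) in models.items()
        if accuracy(predict, labeled) >= baseline_acc - tolerance
    ]
    return min(acceptable)[1]


def keyword_classifier(keywords):
    """Build a toy spam classifier that flags any text containing one of the keywords."""
    def predict(text):
        return "spam" if any(word in text.split() for word in keywords) else "ok"
    return predict


# Hypothetical labeled test set of in-game chat messages.
LABELED = [
    ("buy gold now", "spam"),
    ("gg well played", "ok"),
    ("free coins click here", "spam"),
    ("nice shot", "ok"),
]

# Hypothetical candidates: (classifier, cost per 1,000 messages).
MODELS = {
    "large": (keyword_classifier({"buy", "free", "click"}), 10.0),
    "medium": (keyword_classifier({"buy", "free"}), 3.0),
    "small": (keyword_classifier({"free"}), 1.0),
}

if __name__ == "__main__":
    for name, (predict, cost) in MODELS.items():
        print(name, "accuracy:", accuracy(predict, LABELED), "cost:", cost)
    print("cheapest acceptable:", cheapest_acceptable(MODELS, LABELED, "large"))
```

In this toy setup the smallest model misses one spam message, so the mid-priced model is the cheapest option that keeps the baseline’s accuracy.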
ActiveFence was involved in the process of vetting the tech for Nurdle as they needed new datasets to make the classifiers from Spectrum Labs even better.
“We continue to work on trust and safety,” Davis said. “These game companies are all going to need AI and the datasets for in-game chat features. If you’ve never had in-game chat, you need simulated data and content to better test these systems and figure out what you know. There is a lot of utility here and we are just starting to scratch the surface on it.”
The company ran pilot tests while under Spectrum Labs, which is why it is already ready to roll out as a separate entity, Davis said.
“We’re in private betas with some early partners and customers, and we are keen to onboard a few more before we make this broadly available,” he said. “We have quite a few brewing with gaming customers and a variety of different verticals. It’s primetime. We’re ready to rock.”