MOSTLY AI CEO Tobias Hann

(US and Canada) MOSTLY AI, which pioneered the creation of AI-generated synthetic data, announced today that it has raised a US$25 million Series B round of funding led by Molten Ventures with participation from existing investors Earlybird and 42CAP, and new investor Citi Ventures.

The New York-based company will use the funds to execute on its vision of building a smarter and fairer future grounded in responsible AI. Building on its established leadership in banking and insurance, MOSTLY AI plans to further accelerate its growth in Europe, aggressively capture more of the U.S. market, and ramp up hiring worldwide. MOSTLY AI is already working with multiple Fortune 100 banks and insurers in North America and Europe.

“MOSTLY AI is leading this emerging and rapidly-growing space in terms of both customer deployments and expertise,” said Christoph Hornung, Investment Director at Molten Ventures. “It is the top platform for structured synthetic data worldwide, and we are excited that together we can strengthen that position and accelerate MOSTLY AI’s hyper growth in the banking and insurance space.”

MOSTLY AI CEO Tobias Hann said that a drive towards responsible AI is helping fuel interest in the company’s synthetic data solutions. “2022 will be the year of synthetic data,” said Hann. “Synthetic data helps solve some of the industry’s most vexing issues when it comes to AI. It eliminates concerns about data privacy, it can be freely shaped and formed in order to accelerate AI initiatives, and it enables enterprises to augment and de-bias their data sets. We’re extremely excited about the future of synthetic data, and to partner with Molten Ventures, which shares our vision for fundamentally changing how companies work with data.”

According to Gartner, by 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.

Synthetic data sets look just as real as a company’s original customer data, reflecting behaviors and patterns with up to 99% accuracy, but contain none of the original personal data points. This helps companies comply with privacy protection regulations such as GDPR while still uncovering insights from the data. Unlike original data, synthetic data can be generated quickly and in abundance, and is proven to drastically improve machine learning model performance. As a result, it is often used for advanced analytics and AI training, such as predictive algorithms, fraud detection and pricing models, as well as for software testing. It also allows enterprises to use sensitive data in cloud environments.

Recent MOSTLY AI projects include:

  • Creating synthetic data sets for an insurer to retrain algorithms whose performance had degraded and which were exhibiting bias
  • Synthesizing 15,000 home addresses and linking the synthetic geodata to weather patterns for better insurance risk prediction
  • Narrowing the gap between high-earning men and women from 20% to 2% in a U.S. census dataset
  • Evaluating a crime/fraud prediction dataset and then creating synthetic data that reduced a racially biased skew from 24% to just 1%

MOSTLY AI’s synthetic data technology has been proven to reduce time-to-data by 90%, save larger companies more than US$10 million annually on data provisioning and internal overhead, and boost available data by 85% for test data generation through data synthesis. MOSTLY AI has developed deep expertise in helping companies achieve business value from synthetic data – a goal that many enterprises have found elusive.

In addition to the funding, other recent MOSTLY AI milestones include:

  • Launching MOSTLY AI 2.0, the first synthetic data platform that can automatically synthesize complex data structures, making it well suited to software testing as well.
  • Becoming the first synthetic data provider to achieve ISO 27001 certification.
  • Creating a new training program to help train the next generation of synthetic data superusers within enterprises. Several clients have leveraged this first-of-its-kind program to kickstart their synthetic data journeys, with very positive results.