By Gary Drenik, Forbes Contributor covering AI, analytics, and innovation

Artificial Intelligence (AI) stands at a pivotal juncture, facing a critical decision: continue down the path of large-scale data accumulation, with its inherent challenges, or embrace more focused, high-quality data sources to achieve meaningful outcomes. The stakes of that decision are underscored by the high failure rate of AI projects, with some estimates indicating that over 80% of AI initiatives do not succeed.

Challenges in AI Data Quality and Large Language Models (LLMs)

AI systems, particularly those utilizing Large Language Models (LLMs), encounter significant obstacles related to data quality and reasoning capabilities:

  1. Data Quality Issues: High-quality data is crucial for AI models to deliver accurate and reliable outcomes. Poor data quality can lead to incorrect predictions and flawed insights. According to Gartner, organizations measure data quality along dimensions such as accuracy, completeness, reliability, relevance, and timeliness; a brief sketch of how a few of these dimensions can be checked in practice follows this list.
  2. Data Disappearance Concerns: The phenomenon of “data disappearing” refers to the loss or unavailability of data over time, which can hinder AI model training and performance. A study by the Data Provenance Initiative, led by MIT researchers, found that many web sources used for training AI models have restricted the use of their data, leading to a rapid decline in accessible information. Ensuring consistent data availability is essential for maintaining AI system effectiveness.
  3. LLM Reasoning Limitations: Recent studies, including one by Apple’s AI research team, have uncovered significant weaknesses in the reasoning abilities of LLMs. These models often struggle with mathematical reasoning and exhibit performance declines when faced with variations in problem statements.
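
As a concrete illustration of the data-quality dimensions in item 1, the minimal sketch below scores a survey dataset for completeness, uniqueness, and timeliness. The column names (respondent_id, spending_intent, collected_at) and the 90-day freshness threshold are hypothetical choices made for this example, not Gartner's or any vendor's actual methodology.

```python
# A minimal, illustrative data-quality check on a survey dataset.
# Column names and the 90-day freshness threshold are hypothetical,
# chosen only for this example.
import pandas as pd

def data_quality_report(df: pd.DataFrame, now: pd.Timestamp) -> dict:
    completeness = 1.0 - df["spending_intent"].isna().mean()    # share of non-missing answers
    uniqueness = df["respondent_id"].nunique() / len(df)        # duplicate-response check
    fresh = (now - df["collected_at"]) <= pd.Timedelta(days=90) # timeliness: collected recently
    return {
        "completeness": round(completeness, 3),
        "uniqueness": round(uniqueness, 3),
        "timeliness": round(fresh.mean(), 3),
    }

# Example usage with a tiny synthetic sample:
df = pd.DataFrame({
    "respondent_id": [1, 2, 2, 4],
    "spending_intent": [3.0, None, 4.0, 2.0],
    "collected_at": pd.to_datetime(["2024-09-01", "2024-06-15", "2024-09-20", "2023-12-01"]),
})
print(data_quality_report(df, pd.Timestamp("2024-10-01")))
```

In practice, each dimension would feed a threshold or alert before the data is allowed into model training, rather than being inspected by hand.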

The Road Less Traveled: Embracing High-Quality, Privacy-Compliant Data

In light of these challenges, a growing number of organizations are exploring alternative approaches that prioritize data quality over quantity. One such approach involves leveraging high-quality, privacy-compliant data sources like zero-party consumer surveys.

Morgan Slade, CEO of Exponential Technology, a global macro and institutional equity investment forecasting firm, emphasizes the importance of data depth and quality in AI initiatives: “The success of AI projects hinges not on the volume of data but on its relevance and accuracy. Leveraging high-quality, consented data sources like Prosper allows us to develop prediction models that are both effective and compliant.”

Key Outcomes:

  • Sales Forecasting and Predictive Analytics: Prosper’s dataset enables highly accurate predictive analytics for retail sales and individual company revenues. With a historical foundation and continual updates, the data captures consumer trends and shifts in spending intentions, allowing revenue forecasts for public retail companies up to two to three quarters in advance with over 90% accuracy. As discussed here on September 18, 2024, this provides a powerful tool for stock analysts and retail companies to anticipate sales cycles and seasonal fluctuations.
  • Macroeconomic Signal Forecasting: Beyond retail, consumer survey data provides signals for broader economic indicators such as the Consumer Price Index (CPI) and unemployment trends. By analyzing consumer spending intentions and economic attitudes, Prosper’s data can anticipate CPI movements approximately four weeks before official government releases, maintaining a directional accuracy rate exceeding 90% (a brief sketch of how directional accuracy is computed follows this list). That lead time allows users to make timely, data-driven decisions ahead of the official releases, a critical edge for understanding market shifts and inflationary pressures.
  • Stock Portfolio Management: Investment firms have leveraged survey data to craft more responsive and agile stock portfolios. Understanding consumer spending trends allows portfolio managers to make informed decisions about investments in the retail and consumer sectors, aligning portfolios with anticipated market movements. For instance, as first published in this column in 2022, Morgan Slade’s data science team developed a stock portfolio management solution built on Prosper’s data, achieving an annualized return of 30.62% and a Sharpe Ratio of 3.1, indicating exceptional risk-adjusted returns (a sketch of how these measures are computed also follows this list). Notably, over 60% of the portfolio’s performance is attributed to unique alpha, suggesting that the returns are not replicable by other strategies and are largely independent of traditional market movements.
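
To make the directional-accuracy figure concrete, here is a minimal sketch of how such a metric is typically computed: compare the predicted direction of the month-over-month CPI change with the realized direction once the official number is released. The forecast and actual series below are made-up illustrations, not Prosper’s forecasts or real CPI data.

```python
# Illustrative computation of directional accuracy for a CPI forecast.
# The forecast and actual values below are made-up numbers, not real data.
predicted_mom_change = [0.3, -0.1, 0.2, 0.4, -0.2, 0.1]   # forecast, percentage points
actual_mom_change    = [0.2, -0.2, 0.3, 0.1,  0.1, 0.1]   # realized CPI month-over-month change

hits = sum(
    (p > 0) == (a > 0)                 # was the direction (up vs. down) called correctly?
    for p, a in zip(predicted_mom_change, actual_mom_change)
)
directional_accuracy = hits / len(actual_mom_change)
print(f"Directional accuracy: {directional_accuracy:.0%}")   # 5 of 6 correct -> 83%
```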
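
Similarly, the annualized return and Sharpe Ratio cited for the portfolio are standard risk-adjusted performance measures. The sketch below shows the usual way they are computed from a series of monthly returns; the return series and the risk-free rate are hypothetical placeholders, not the actual track record.

```python
# Standard annualized return and Sharpe ratio from monthly returns.
# The monthly return series and risk-free rate are hypothetical placeholders.
import numpy as np

monthly_returns = np.array([0.031, -0.012, 0.025, 0.018, 0.040, -0.005,
                            0.022, 0.015, 0.028, -0.008, 0.019, 0.033])
risk_free_annual = 0.03                                   # assumed annual risk-free rate

annualized_return = (1 + monthly_returns).prod() ** (12 / len(monthly_returns)) - 1
excess = monthly_returns - risk_free_annual / 12          # monthly excess returns
sharpe_ratio = excess.mean() / excess.std(ddof=1) * np.sqrt(12)   # annualized Sharpe

print(f"Annualized return: {annualized_return:.2%}")
print(f"Sharpe ratio: {sharpe_ratio:.2f}")
```

A Sharpe Ratio of 3.1 on a 30.62% annualized return, as reported above, would imply very low volatility relative to the excess return earned.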

Tim Geannopulos, Partner at Broadhaven Capital and former CEO and Chairman of Trading Technologies, highlights the competitive advantage of using survey data: “The ability to predict both the quarterly revenues of hundreds of public companies and also some of the governmental economic indicators (i.e. CPI) with over 95% accuracy (and to do it a month or two before the rest of the industry) speaks for itself. This is changing the way that the trading industry views AI and predictive analytics in general; especially when you consider that the traders using such data outperform the S&P by more than 30%. Thus, it shouldn’t be a surprise that many of the trading industry’s tier one buy side companies have been long time customers of Prosper. The competitive edge of using survey data not only reduces the risk of adverse market reactions but it also enables investors to capitalize on trends before they are widely recognized.”

A Sustainable and Successful Path Forward

By focusing on high-quality, privacy-compliant data sources, organizations can reduce the need for extensive server infrastructure, leading to more sustainable and cost-effective AI initiatives. This approach not only addresses data quality challenges but also aligns with growing concerns about the environmental impact of large-scale data storage and processing.

Dr. Demirhan Yenigun, Chief Strategy Officer at Ereteam, a leading data analytics services company, notes: “Incorporating high-quality, privacy-compliant data into AI projects not only enhances model accuracy but also promotes sustainability by minimizing the computational resources required.”

As AI continues to evolve, the choice between pursuing large-scale data accumulation and embracing focused, high-quality data sources will significantly influence the success and sustainability of AI projects. Organizations that opt for the latter path may find themselves better equipped to navigate the complexities of AI implementation and achieve meaningful, lasting outcomes.
