
Salesforce Launches AI Research Initiatives with CRMArena-Pro to Address Enterprise AI Failures

 

Salesforce is doubling down on artificial intelligence research to address one of the toughest challenges for enterprises: AI agents that perform well in demonstrations but falter in complex business environments. The company announced three new initiatives this week, including CRMArena-Pro, a simulation platform described as a “digital twin” of business operations. The goal is to test AI agents under realistic conditions before deployment, helping enterprises avoid costly failures.  

Silvio Savarese, Salesforce’s chief scientist, likened the approach to flight simulators that prepare pilots for difficult situations before real flights. By simulating challenges such as customer escalations, sales forecasting issues, and supply chain disruptions, CRMArena-Pro aims to prepare agents for unpredictable scenarios. The effort comes as enterprises face widespread frustration with AI. A report from MIT found that 95% of generative AI pilots do not reach production, while Salesforce’s research indicates that large language models succeed only about a third of the time in handling complex cases.

CRMArena-Pro differs from traditional benchmarks by focusing on enterprise-specific tasks with synthetic but realistic data validated by business experts. Salesforce has also been testing the system internally before making it available to clients. Alongside this, the company introduced the Agentic Benchmark for CRM, a framework for evaluating AI agents across five metrics: accuracy, cost, speed, trust and safety, and sustainability. The sustainability measure stands out by helping companies match model size to task complexity, balancing performance with reduced environmental impact. 
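The idea of combining several evaluation dimensions into a comparable score can be sketched roughly as follows. The five metric names come from the article; the weighting scheme, normalization, and function names below are illustrative assumptions, not Salesforce's actual framework.

```python
# Hypothetical sketch of scoring an AI agent across the five metrics
# named in the article (accuracy, cost, speed, trust & safety,
# sustainability). The weighted-average scheme is an assumption
# made for illustration only.

METRICS = ("accuracy", "cost", "speed", "trust_safety", "sustainability")

def score_agent(results: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-metric scores (each normalized to 0..1) into one number."""
    missing = set(METRICS) - results.keys()
    if missing:
        raise ValueError(f"missing metric scores: {missing}")
    total_weight = sum(weights.get(m, 1.0) for m in METRICS)
    return sum(results[m] * weights.get(m, 1.0) for m in METRICS) / total_weight

# Example: an agent that is accurate but expensive and slow,
# evaluated with extra weight on accuracy.
scores = {"accuracy": 0.9, "cost": 0.4, "speed": 0.5,
          "trust_safety": 0.8, "sustainability": 0.7}
print(round(score_agent(scores, {"accuracy": 2.0}), 3))  # prints 0.7
```

A scheme like this also makes the sustainability trade-off visible: a smaller model that loses a little accuracy but scores much higher on cost and sustainability can come out ahead overall, which matches the article's point about matching model size to task complexity.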

A third initiative highlights the importance of clean data for AI success. Salesforce’s new Account Matching feature uses fine-tuned language models to identify and merge duplicate records across systems. This improves data accuracy and saves time by reducing the need for manual cross-checking. One major customer achieved a 95% match rate, significantly improving efficiency. 
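The underlying duplicate-matching problem can be illustrated with a minimal sketch. Salesforce's Account Matching uses fine-tuned language models; this stand-in instead uses simple string normalization plus fuzzy similarity to show the basic idea, and the suffix list and threshold are assumptions chosen for the example.

```python
# Illustrative duplicate-record matching, the data-cleaning problem the
# article says Account Matching addresses. This is NOT Salesforce's
# method: it swaps the fine-tuned LLM for stdlib fuzzy matching.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase and strip punctuation and common corporate suffixes."""
    name = name.lower().replace(",", "").replace(".", "")
    for suffix in (" incorporated", " corporation", " inc", " corp", " llc", " ltd"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name.strip()

def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two account names as the same entity above a similarity cutoff."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(is_duplicate("Acme Corp.", "ACME Corporation"))  # prints True
print(is_duplicate("Acme Corp.", "Globex Ltd."))       # prints False
```

A real system would compare many fields (addresses, domains, tax IDs) and learn the similarity function rather than hard-coding it, which is where the language-model approach earns its keep.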

The announcements come during a period of heightened security concerns. Earlier this month, more than 700 Salesforce customer instances were affected in a campaign that exploited OAuth tokens from a third-party chat integration. Attackers were able to steal credentials for platforms like AWS and Snowflake, underscoring the risks tied to external tools. Salesforce has since removed the compromised integration from its marketplace. 

By focusing on simulation, benchmarking, and data quality, Salesforce hopes to close the gap between AI’s promise and its real-world performance. The company is positioning its approach as “Enterprise General Intelligence,” emphasizing the need for consistency across diverse business scenarios. These initiatives will be showcased at Salesforce’s Dreamforce conference in October, where more AI developments are expected.

Customized AI Models and Benchmarks: A Path to Ethical Deployment

 

As artificial intelligence (AI) models continue to advance, industry collaboration and tailored testing benchmarks become increasingly crucial for organizations seeking the right fit for their specific needs.

Ong Chen Hui, the assistant chief executive of the business and technology group at Infocomm Media Development Authority (IMDA), emphasized the importance of such efforts. As enterprises seek out large language models (LLMs) customized for their verticals and countries aim to align AI models with their unique values, collaboration and benchmarking play key roles.

Ong raised the question of whether relying solely on one large foundation model is the optimal path forward, or if there is a need for more specialized models. She pointed to Bloomberg's initiative to develop BloombergGPT, a generative AI model specifically trained on financial data. Ong stressed that as long as expertise, data, and computing resources remain accessible, the industry can continue to propel developments forward.

Red Hat, a software vendor and a member of Singapore's AI Verify Foundation, is committed to fostering responsible and ethical AI usage. The foundation aims to leverage the open-source community to create test toolkits that guide the ethical deployment of AI. Singapore boasts the highest adoption of open-source technologies in the Asia-Pacific region, with numerous organizations, including port operator PSA Singapore and UOB bank, using Red Hat's solutions to enhance their operations and cloud development.

Transparency is a fundamental aspect of AI ethics, according to Ong. She emphasized the importance of open collaboration in developing test toolkits, citing cybersecurity as a model where open-source development has thrived. Ong highlighted the need for continuous testing and refinement of generative AI models to ensure they align with an organization's ethical guidelines.

However, some concerns have arisen regarding major players like OpenAI withholding technical details about their LLMs. A group of academics from the University of Oxford highlighted issues related to accessibility, replicability, reliability, and trustworthiness (AART) stemming from the lack of information about these models.

Ong suggested that organizations adopting generative AI will fall into two camps: those opting for proprietary large language AI models and those choosing open-source alternatives. She emphasized that businesses focused on transparency can select open-source options.

As generative AI applications become more specialized, customized test benchmarks will become essential. Ong stressed that these benchmarks will be crucial for testing AI applications against an organization's or country's AI principles, ensuring responsible and ethical deployment.
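A customized benchmark of the kind Ong describes can be sketched in miniature: run a model over a set of prompts and score its outputs against an organization's own rules. The banned-terms policy, the toy model, and all names below are assumptions made for illustration, not part of any real toolkit such as those from the AI Verify Foundation.

```python
# Minimal sketch of testing model outputs against organization-specific
# principles. The policy rules and the stand-in "model" are hypothetical.

BANNED_TERMS = {"guaranteed returns", "risk-free"}  # example org policy

def violates_policy(response: str) -> bool:
    """Flag responses containing terms the organization disallows."""
    text = response.lower()
    return any(term in text for term in BANNED_TERMS)

def run_benchmark(model, prompts: list[str]) -> float:
    """Return the fraction of prompts answered without a policy violation."""
    passed = sum(0 if violates_policy(model(p)) else 1 for p in prompts)
    return passed / len(prompts)

# Stand-in model that misbehaves on investment questions.
def toy_model(prompt: str) -> str:
    if "invest" in prompt:
        return "This product offers guaranteed returns."
    return "Happy to help."

print(run_benchmark(toy_model, ["How should I invest?", "What are your hours?"]))  # prints 0.5
```

Real toolkits replace the keyword check with richer evaluations (classifiers, human review, scenario suites), but the shape is the same: a benchmark encodes the organization's or country's principles as executable tests, which is what makes continuous testing and refinement possible.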

In conclusion, collaboration, transparency, and benchmarking across the AI industry are essential to cater to specific needs and to align AI models with ethical and responsible usage. The development of specialized generative AI models and comprehensive test benchmarks will be pivotal in achieving these objectives.