No Jitter Midroll: AI Thrives on Good Data – Getting It Is the Hard Part
Two surveys highlight the need for consistent, high-quality data in building reliable AI systems.
October 16, 2024
‘Garbage in, garbage out’ is the oft-cited refrain with respect to data as the foundation for AI systems, generative and predictive alike. According to Hansa Iyengar, Senior Principal Analyst, Enterprise IT, with Omdia, “AI thrives on quality data, and ensuring that data is clean, accessible, and ready to power intelligent systems is key to unlocking its full potential.”
A pair of reports released on Tuesday, October 15, 2024, highlighted some of the challenges associated with providing AI with good data. For example, Salesforce’s CIO survey found that security/privacy threats and a lack of trusted data were chief among CIO concerns. Theta Lake’s survey of financial services firms’ use of unified communications and collaboration (UCC) solutions found that siloed data sources are a key issue – 75% of its respondents reported challenges in identifying, locating or retrieving information for reviews/audits.
“Organizations often have data scattered across multiple systems,” Iyengar said. “To prepare for AI, they must consolidate their data into a unified platform—whether it's a data lake, warehouse, or cloud solution—ensuring easy access for AI models.”
Salesforce: CIO Survey
On Tuesday, October 15, Salesforce released the results of its survey of 150 verified CIOs of companies with 1,000 or more employees. The study found, among other things, that IT is focusing on data initiatives before embarking on AI projects – CIOs report spending a median of 20% of their budgets on data infrastructure and management, versus 5% on AI.
The topic of ‘data’ more generally is apparently top-of-mind for CIOs. In an email, Salesforce said that “when we asked the open-ended question 'What is your biggest barrier to implementing AI?', 'data' was mentioned in 1/3 of the responses.”
The following chart illustrates CIOs’ biggest fears regarding AI: security or privacy threats and a lack of trusted data top the list. Salesforce said it defined data trustworthiness in terms of recency, accuracy, and level of integration.
“I think most CIOs today are now recognizing that we need to treat AI like we've treated other technologies, and make sure that we have the appropriate investments in data, the appropriate investments in infrastructure and security, and that we manage AI with responsibility,” said Juan Perez, Salesforce CIO, in a prebriefing. “Some CIOs are also being careful that they don't see the proliferation of what I call ‘shadow AI’ to the point that this [type of] AI will be unmanageable in the enterprise and can ultimately harm the business.”
Theta Lake: Compliance & Security Report 2024/25
Theta Lake’s report provides insights into how financial services firms use unified communication and collaboration tools, as well as into some of the challenges those companies encounter in a UCC-driven work environment. The survey was conducted by an independent third party in Q3 2024 and drew 500 total respondents (350 in the US and 150 in the UK).
The survey found that 35% of firms are using 7 or more UCC tools and 58% reported challenges with their current email archiving and voice recording solutions. The main concerns here relate to recordkeeping, reconciliation and reporting challenges which, at least in part, result from having to integrate audio, text and visual communications – which is what UCC excels at.
Seventy-five percent of respondents reported challenges in identifying, locating or retrieving information for investigations, regulatory examinations, FOIA or data subject access requests; 27% said they require significant manual resources to search multiple systems and modes of communications. This points to the siloing of records in email archives, voice systems, enterprise social media platforms and office productivity tools.
When asked how effective using artificial intelligence has been for supervision (reviewing calls for compliance, etc.), 34% of respondents said they were still working to improve the quality of the underlying data, while 28% said that using AI for supervision has been resource-intensive to implement and/or update. Theta Lake says this highlights the complexities and time demands associated with fragmented data sources and incomplete records.
Dealing with Data
During the prebriefing, Salesforce’s Andy White, SVP, Business Technology, provided some advice on how CIOs could set their AI deployments up for success. First, create guidelines to ensure safe, trustworthy AI practices. Second, get data ‘in order’ and update security and purchasing guidelines to keep your data safe. Third, create clear metrics for success. “Many of our customers look at things like hours saved, deals closed, customer service, feedback and other data points to measure the success of the AI implementation,” White said.
Theta Lake, focused as it is on compliance and security for UCC, advocates for digital communications governance and archiving (DCGA) solutions – a term which refers to the frameworks, policies, and technologies organizations use to manage, monitor, and secure digital communications. This approach is particularly important in regulated industries like financial services and health care, but it also applies to following regulations such as GDPR and CCPA to protect customer data and ensure ethical and responsible data collection practices and AI usage.
According to Rohit Jain, Theta Lake's Distinguished Machine Learning Engineer, Theta Lake works with its customers to clean up data, anonymize it and then use parts of that data to train its algorithms. “When clients work with Theta Lake we don’t destroy the original data, but instead create a new cleaned up set of data to work with, without risk of losing the original.”
Keeping that original data – phone call recordings, transcripts, chat/email messages – serves as a source of ‘truth’ that can be referred to in the event of a dispute…or when the AI gets something wrong.
Omdia’s Iyengar said that ‘imbalanced data,’ such as a customer feedback dataset with far more positive than negative reviews, can skew AI predictions (not to mention human analysis). “Techniques like oversampling the minority class or using synthetic data can help maintain balance,” she said.
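The oversampling technique Iyengar mentions can be sketched in a few lines. This is a minimal illustration, not drawn from either report: the review dataset and the 90/10 split are hypothetical, and it uses simple random duplication of the minority class (libraries such as imbalanced-learn offer more sophisticated variants like SMOTE).

```python
import random

random.seed(0)

# Hypothetical imbalanced dataset: far more positive than negative
# reviews, as in Iyengar's customer-feedback example.
reviews = [("great product", "pos")] * 90 + [("broke quickly", "neg")] * 10

def oversample_minority(samples):
    """Randomly duplicate minority-class samples until all classes match the largest."""
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Re-draw at random (with replacement) to reach the target size.
        balanced.extend(random.choices(items, k=target - len(items)))
    return balanced

balanced = oversample_minority(reviews)
counts = {
    label: sum(1 for _, lbl in balanced if lbl == label)
    for label in ("pos", "neg")
}
print(counts)  # both classes now the same size
```

Duplicating real minority samples is the simplest option; generating synthetic examples, the other approach Iyengar names, trades that simplicity for more variety in the balanced set.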
Ultimately, AI models are only as good as the data they learn from. “This means cleaning up errors, handling missing data, and standardizing formats like dates and currency,” Iyengar said. “Consistent, high-quality data is vital for building reliable AI systems.”
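The cleanup steps Iyengar lists – fixing errors, handling missing data, and standardizing formats like dates and currency – look something like the following sketch. The records, field names, and the choice to impute missing amounts with a default are all hypothetical, for illustration only.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent date and currency formats
# and a missing value -- the kinds of issues Iyengar describes.
raw = [
    {"date": "10/15/2024", "amount": "$1,200.50"},
    {"date": "2024-10-16", "amount": "950"},
    {"date": "16 Oct 2024", "amount": None},
]

DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d %b %Y")

def normalize_date(value):
    """Try each known input format; standardize to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")

def normalize_amount(value, default=0.0):
    """Strip currency symbols and thousands separators; fill missing values."""
    if value is None:
        return default  # simple imputation; real pipelines may drop or flag instead
    return float(value.replace("$", "").replace(",", ""))

clean = [
    {"date": normalize_date(r["date"]), "amount": normalize_amount(r["amount"])}
    for r in raw
]
print(clean)
```

A real pipeline would also validate the results and log records that fail every known format rather than silently coercing them, which is part of what “cleaning up errors” means in practice.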