Data drives smart decision-making in modern industries, but the old saying still holds true: “Garbage in, garbage out.” The quality and completeness of the data pulled for analysis play a huge role in the accuracy and effectiveness of the results.
Whether you’re accessing data from internal sources such as sensors, customer transactions or marketing campaigns or purchasing data from one or more vendors, due diligence is essential to ensure your findings aren’t skewed or incomplete. Below, 20 members of Forbes Technology Council share important steps to take when analyzing and selecting data sources. Follow their tips to empower analysis that’s enlightening, not misleading.
1. Seek Out Comprehensive Data
First, select relevant and critical data sources; if you include only one geographical area when your customers are global, the results will be biased. Second, check the quality of the data: Are there duplicates? Is the data standardized? Are any attributes missing? Third, enrich your data with third-party data (such as point-of-interest data and demographics) for better insights. – Tendu Yogurtcu, Precisely
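The quality questions in this tip can be turned into a quick profile of a dataset. The following is a minimal sketch, not a method endorsed by the contributor: the function name `profile_records`, the field names and the sample rows are all hypothetical, and it assumes records arrive as plain Python dictionaries.

```python
from collections import Counter


def profile_records(records, required_fields, key_fields):
    """Summarize duplicates, missing attributes and per-region coverage."""
    # Count how often each key combination appears; extras are duplicates.
    seen = Counter(tuple(r.get(k) for k in key_fields) for r in records)
    duplicate_rows = sum(count - 1 for count in seen.values() if count > 1)

    # A field counts as missing when it is absent, None or an empty string.
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in required_fields
    }

    # Coverage by region shows whether one geography dominates the data.
    regions = Counter(r.get("region") or "UNKNOWN" for r in records)

    return {
        "rows": len(records),
        "duplicate_rows": duplicate_rows,
        "missing": missing,
        "regions": dict(regions),
    }


# Hypothetical sample data, for illustration only.
sample = [
    {"customer_id": 1, "region": "EMEA", "email": "a@example.com"},
    {"customer_id": 1, "region": "EMEA", "email": "a@example.com"},
    {"customer_id": 2, "region": "EMEA", "email": ""},
    {"customer_id": 3, "region": None, "email": "c@example.com"},
]
report = profile_records(sample, ["region", "email"], ["customer_id"])
print(report)
```

A report like this does not fix anything by itself, but it makes a one-region skew or a high share of missing attributes visible before analysis starts.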
2. Measure Against A Known Good Dataset
Starting with quality data sources may seem obvious, but the challenge is identifying the highest-quality solution. A company often assumes that its current data source is the standard to measure against, but it can be troublesome if you are not 100% sure of its quality. Start with a truth set of known good data and measure against that first. – David Finkelstein, BDEX
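One way to picture measuring against a truth set is to score each candidate source on how many known-good records it covers and how often its values agree with them. This is an illustrative sketch under stated assumptions: records are keyed by a shared identifier, and the function name `score_against_truth`, the fields and the sample values are hypothetical.

```python
def score_against_truth(truth, candidate, fields):
    """Return (coverage, per-field accuracy) of a candidate vs. a truth set.

    Both arguments map a record id to a dict of field values.
    """
    shared = truth.keys() & candidate.keys()
    # Coverage: share of known-good records the candidate contains at all.
    coverage = len(shared) / len(truth) if truth else 0.0

    # Accuracy: share of shared records where the candidate value matches.
    accuracy = {}
    for field in fields:
        correct = sum(
            1 for key in shared
            if truth[key].get(field) == candidate[key].get(field)
        )
        accuracy[field] = correct / len(shared) if shared else 0.0
    return coverage, accuracy


# Hypothetical truth set and vendor sample, for illustration only.
truth_set = {
    "r1": {"city": "Austin"},
    "r2": {"city": "Boston"},
    "r3": {"city": "Denver"},
    "r4": {"city": "Miami"},
}
vendor_sample = {
    "r1": {"city": "Austin"},
    "r2": {"city": "Boston"},
    "r3": {"city": "Dallas"},  # disagrees with the truth set
}
coverage, accuracy = score_against_truth(truth_set, vendor_sample, ["city"])
print(coverage, accuracy)
```

Running the same scoring over several candidate sources gives a like-for-like comparison that does not depend on assuming the current source is correct.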
3. Ensure The Dataset Is Fully Representative
When selecting data for analysis, it’s important that the dataset be fully representative of the system being measured and evaluated. The dataset should not be skewed by the manner in which it is collected—for example, we can’t track device failures only for those devices that log a problem. Otherwise, the analysis may not provide an accurate picture of how the system is behaving. – William Bain, ScaleOut Software, Inc.
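A simple check for collection skew is to compare how groups are represented in the dataset with their known share of the overall system. The sketch below assumes those population shares are available from an independent source, such as a device inventory; the function name `representation_gaps`, the group labels and the 5% tolerance are hypothetical choices.

```python
from collections import Counter


def representation_gaps(sample_labels, population_shares, tolerance=0.05):
    """Return groups whose observed share differs from the expected share.

    Values are observed minus expected, so a positive gap means the group
    is overrepresented in the sample.
    """
    counts = Counter(sample_labels)
    total = len(sample_labels)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = round(observed - expected, 3)
    return gaps


# Hypothetical example: failure logs come mostly from one device model,
# although both models are equally common in the deployed fleet.
logged = ["model_a"] * 8 + ["model_b"] * 2
fleet_shares = {"model_a": 0.5, "model_b": 0.5}
gaps = representation_gaps(logged, fleet_shares)
print(gaps)
```

A large gap does not prove the analysis is wrong, but it is a prompt to ask whether the collection method, rather than the system itself, produced the pattern.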
4. Take A ‘Decision Back’ Approach
Focusing on data and analytics with a value-first drive is critical. To do this, a company must start with its business problem(s), not the data, and take a “decision back” approach to achieve a greater impact on the business. Value comes first, and data comes second. – Deepak Jose, Mars
5. Augment Primary Data As Needed
In a manufacturing environment, big data is only of value if one can extract insights that guide decision-making. To do this, aspects including data quality, accessibility, potential bias and coverage of the behavior or operational states of a plant or asset should be considered. Augmenting sensor data with other sources, such as fault reports or physics simulations, can strengthen and balance observations for model-building and improve insights. – Heiko Claussen, Aspen Technology, Inc.
6. Request Data Tests From Multiple Vendors
Being the leader of a big data company, I know the uncertainties of purchasing data—you never truly know what you’re getting. The most effective method I have found to validate data is to request a specific, quick-turnaround data test from three or four vendors. That’s the best trick in the book to prevent data hacks or manipulation and accurately assess the true quality of the data. – Ariel Katz, H1
7. Ensure The Data Relates To The Problem Or Question You’re Addressing
One key consideration when selecting data sources for analysis is the relevance and reliability of the data. It’s essential to ensure that the data being utilized is directly related to the problem or question being addressed and that it’s accurate and trustworthy. The choice of data source can impact the quality of insights derived in several ways. – Tina Chakrabarty, Sanofi Pharmaceutical
8. Trace The Data’s Origins
If your source is traceable, it typically means you can get reliable information on where it comes from, whether it respects relevant intellectual property and privacy considerations, what quality assessments have been performed on it, and whether it is suitably representative of the population to which your use case applies. – Shameek Kundu, TruEra
9. Balance Data Criticality With Ease Of Integration
The truth is in the data, and if the data is not trustworthy, the truth derived from it is impaired. The key consideration is balancing the criticality of the data source with the ease of integration with standard, out-of-the-box adapters. This balance can identify the low-hanging fruit that can enable you to prioritize and make fast progress. – Manoj Gujarathi, Dematic
10. Give AI Enough Data To Find Meaningful Patterns
It’s important to make sure that your data is wide-ranging enough for artificial intelligence to find meaningful patterns in it. If you limit yourself to a narrow slice of data that fits in with your preconceived hypothesis of what is happening, you run the risk of missing out on key insights that you have not considered—insights that AI can find if you consider enough data. – Michael Amori, Virtualitics
11. Include Sources That Rank Highly In Terms Of Suitability, Quality And Compliance
In data analysis, the output is only as good as the input. To get meaningful and actionable insights, include sources that rank highly for suitability, quality and compliance. Ensure sources fit the intended purpose of analysis, and qualify them based on their accuracy, completeness and reliability. Top it off with a compliance lens to safeguard the confidentiality of data protected by privacy laws. – Anupriya Ramraj, PricewaterhouseCoopers
12. Make Sure The Data Is Truly ‘Available’
I’ve participated in countless projects where we planned to extract value by combining data from different sources, but after looking deeper, it turned out the system vendor that held the data did not provide an API or exports to make the data easily accessible. Pro tip: Check true availability early. It will save you a lot of time. – Erik Aasberg, eSmart Systems
13. Keep Information Content, Accuracy And Timeliness In Mind
The most important thing about data source selection is to think about the total information content of your dataset, as well as the accuracy and timeliness of the data source. You will get inaccurate insights if your AI system does not correct for variations automatically or if you don’t do some data preparation outside the system. – Gaurav Banga, Balbix
14. Avoid Overcollecting Or Hoarding Data
A common misconception about data analysis is that the more data you have access to, the better your analysis. It is quite the opposite. You need to be specific in terms of what you are looking to accomplish and review. This is why data-cleansing initiatives are often needed today—too much data is being collected. – Anna Frazzetto, Airswift
15. Seek To Fully Hear Your Customers
It is imperative that organizations understand the voice of the customer. Companies must get real-time insights from user feedback, from reviews to requests for tech support. Data sources should include app reviews, social media, support tickets, surveys and more. The best sources are where your users are active. This will provide a roadmap to improvements in product quality. – Christian Wiklund, unitQ
16. Comply With Ethical Standards And Regulations
Ethical considerations regarding privacy, anonymization and consent must be considered when selecting data sources. Comply with ethical standards and regulations, such as the GDPR, the CCPA, HIPAA and similar regulations. The quality of the insights derived will depend on several other factors as well, but adhering to the law will ensure you do not encounter trouble later on. – Nitesh Sinha, Sacumen
17. Clean And Normalize The Data
Data analysis is most effective when the data is clean and normalized to a standard taxonomy. In addition, you should consider the outcomes you are trying to produce or the questions you are trying to answer and input the data sources necessary to achieve the results you’re looking for. If the quality of the data is low, the quality of the results will be low (garbage in, garbage out). – James Carder, Eptura
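Normalizing to a standard taxonomy often comes down to mapping the many spellings found in raw data onto one canonical value. The following is a minimal sketch using country labels as the example; the mapping table `CANONICAL` and the function `normalize_country` are hypothetical, and a real taxonomy would be far larger.

```python
# Hypothetical mapping from cleaned raw labels to canonical country codes.
CANONICAL = {
    "us": "US",
    "usa": "US",
    "united states": "US",
    "uk": "GB",
    "united kingdom": "GB",
}


def normalize_country(raw):
    """Map a free-text country label to a canonical code, or None."""
    if raw is None:
        return None
    # Clean first: trim, lowercase, drop periods, collapse repeated spaces.
    key = " ".join(raw.strip().lower().replace(".", "").split())
    # Unknown labels return None so they can be reviewed, not guessed.
    return CANONICAL.get(key)


raw_values = ["U.S.A.", "  United   States ", "uk", "Narnia", None]
normalized = [normalize_country(value) for value in raw_values]
print(normalized)
```

Returning `None` for unrecognized labels keeps bad values out of the analysis and leaves a clear list of entries that need a human decision.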
18. Pay Attention To The Data’s Life Cycle
Within most organizations, data has a life cycle, and its importance and relevance to an analysis process is typically determined by that life cycle. Also, most organizations have multiple copies of their data and must use data intelligence tools to ensure they are leveraging the proper version of their data in the analysis process. – Russ Kennedy, Nasuni
19. Answer These Three Questions
The key consideration when selecting data sources is to find a fit between the desired business outcome and the data source so that it generates ROI. This can be accomplished by answering three questions: (1) What is the expectation and use case on the business side? (2) How consistent and predictable is the structure and frequency of the data? (3) How much human context needs to be added to the raw data? – Akash Mukherjee, Chartmetric
20. Hunt For The ‘Unsexy’ Data That Drives Real Value
Very often, the most impactful data isn’t flashy or readily available. It might be buried within your systems, require cleaning and transformation, or come from sources your competitors are ignoring. Remember, the quality of the data is directly connected to the value of the insights you can generate. – Adrian Dunkley, StarApple AI