In the world of tech and innovation, there’s a perception that new and disruptive technologies originate solely from a select group of tech elites in Silicon Valley. While this is an obvious exaggeration, it contains a nugget of truth: by the time a technology reaches the masses, most of us encounter it simply as consumers, handing over our data without a clear understanding of how it is used to deliver a service.
If you’re in a role where you influence how technology affects communities, this article is for you. For policymakers, understanding the fundamentals of database design is crucial for making informed decisions about the future of technology.
What Is a Database? What Is Database Design?
Almost every company operates with a database. Essentially, a database is a repository for storing data; it provides a structured framework for sorting and categorizing information, which makes managing and analyzing that information much easier. Database design is the process of organizing data within that framework: deciding, for example, which tables exist, what each field holds, and how records relate to one another. Put simply, a database is a structured system for organizing data efficiently, while database design is the method of organizing data within such a system.
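To make the distinction concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (users, survey_responses, age_group, and so on) are hypothetical illustrations, not taken from any real system. The database is the container; the design is the set of choices about tables, columns, and how they link.

```python
import sqlite3

# The database itself: an in-memory SQLite store (nothing is written to disk).
conn = sqlite3.connect(":memory:")

# The database *design*: which tables exist, what each column holds, and how
# the tables relate. All names here are hypothetical.
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    age_group TEXT NOT NULL,
    region    TEXT NOT NULL
);
CREATE TABLE survey_responses (
    response_id INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    question    TEXT NOT NULL,
    answer      TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "18-29", "urban"), (2, "60+", "rural")])
conn.executemany(
    "INSERT INTO survey_responses (user_id, question, answer) VALUES (?, ?, ?)",
    [(1, "sleep_quality", "poor"), (2, "sleep_quality", "good")],
)

# Because the design links each response to a user, one query can answer
# "how did each age group respond?"
rows = conn.execute("""
    SELECT u.age_group, r.answer
    FROM survey_responses AS r
    JOIN users AS u ON u.user_id = r.user_id
    ORDER BY u.user_id
""").fetchall()
print(rows)  # [('18-29', 'poor'), ('60+', 'good')]
```

A different design (for instance, storing answers without a link to the user) would make that question impossible to answer, which is why design choices matter as much as the data itself.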
The term “database design” can include aspects such as data collection, cleaning, transformation, and modeling, among other things. While each component is important, I will focus primarily on data collection and cleaning, and on what can happen when policymakers overlook them.
To illustrate this, consider a scenario in which a policymaker must conduct due diligence on a hypothetical startup called Help123. Help123 is a telemedicine company that aims to bridge the gap between people with mental health needs and psychiatric treatment. The policymaker must evaluate the startup’s potential large-scale societal impact on mental healthcare.
How Policymakers Influence Societal Outcomes
The Help123 app personalizes mental health care pathways by matching users with appropriate medications for depression based on survey information regarding their demographic background and mental health history.
While this solution sounds like a promising way to address the global mental health crisis, a policymaker who does not understand Help123’s backend database design can miss risks with serious consequences.
Help123 uses a database management tool to gather and store data, including users’ medical and mental health histories. Without a basic understanding of how this data is collected, a policymaker can easily overlook the biases it may introduce. For example, data collection bias occurs when certain demographic groups are overrepresented or underrepresented in the collected data, which can skew treatment recommendations for the groups that are underrepresented.
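One simple check a reviewer could ask for is a comparison between who appears in the data and who the service is meant to serve. The sketch below assumes each record carries a demographic field; the field name age_group, the function representation_gaps, and every number are hypothetical, made up purely for illustration. A real review would need reliable population figures.

```python
from collections import Counter

def representation_gaps(records, population_shares, key="age_group"):
    """Compare each group's share of the collected records with its share of
    the reference population. A positive gap means overrepresented."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to check")
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        gaps[group] = round(observed - expected, 2)
    return gaps

# Hypothetical, made-up survey records: 8 of 10 users are aged 18-29.
records = [{"age_group": "18-29"}] * 8 + [{"age_group": "60+"}] * 2
# Hypothetical reference shares for the population the app is meant to serve.
population = {"18-29": 0.5, "60+": 0.5}

print(representation_gaps(records, population))  # {'18-29': 0.3, '60+': -0.3}
```

In this toy data, older users make up half the intended population but only a fifth of the records, so any recommendations learned from the data rest mostly on younger users’ experiences.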
Additionally, data integration bias can arise when different data sources are merged improperly. For example, if side-effect data for one medication comes from one source and side-effect data for another medication comes from a separate source, the two sources may not track the same side effects or record them with the same completeness. The merged data then gives an incomplete picture of the full range of side effects across medications, which can bias the final dataset and reduce the accuracy of treatment recommendations.
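The side-effect example can be sketched in a few lines. The two sources, their fields (tracked, reports), and the medication names are hypothetical assumptions for illustration. The point is that a naive merge cannot distinguish “no side effect was reported” from “this source never measured that side effect”.

```python
# Hypothetical source A: reports on "med_a" and tracks three side effects.
source_a = {
    "tracked": {"nausea", "insomnia", "weight_gain"},
    "reports": {"med_a": {"nausea", "weight_gain"}},
}
# Hypothetical source B: reports on "med_b" but tracks only two side effects.
source_b = {
    "tracked": {"insomnia", "headache"},
    "reports": {"med_b": {"insomnia"}},
}

def naive_merge(*sources):
    """Combine reports without recording what each source actually measured."""
    merged = {}
    for src in sources:
        merged.update(src["reports"])
    return merged

def coverage_gaps(*sources):
    """For each medication, list side effects that some source tracks but the
    source reporting on that medication never measured."""
    all_tracked = set().union(*(s["tracked"] for s in sources))
    gaps = {}
    for src in sources:
        for med in src["reports"]:
            gaps[med] = sorted(all_tracked - src["tracked"])
    return gaps

merged = naive_merge(source_a, source_b)
# The naive merge makes med_b look free of nausea...
print("nausea" in merged["med_b"])  # False
# ...but source B never measured nausea or weight gain at all.
print(coverage_gaps(source_a, source_b))
# {'med_a': ['headache'], 'med_b': ['nausea', 'weight_gain']}
```

Keeping track of what each source measured, not just what it reported, is one way to make such gaps visible before the data feeds a recommendation.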
Even after proper data collection, the processing, cleaning, and transformation methods that follow can significantly skew outcomes. Labeling bias, for example, occurs when data is labeled inaccurately, such as when a patient’s response to a medication is recorded incorrectly; a “medication recommender model” trained on those labels then learns the wrong associations and gives poor recommendations. Feature selection bias is another concern that can affect treatment outcomes. Feature selection is the process of deciding which data features (for example, age or prior medications) are important enough to be included in a model. Bias can arise if relevant features are omitted or irrelevant ones are included, making the final recommendations unreliable.
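Both failure modes show up even in a toy example. The sketch below uses a deliberately simplistic “recommender”, an assumption for illustration rather than how Help123 or any real clinical system works: it picks whichever medication had the highest recorded improvement rate. The records, medication names, and the recommend function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, made-up records: (age_group, medication, improved?)
RECORDS = (
    [("18-29", "med_a", True)] * 5 + [("18-29", "med_a", False)] * 1
    + [("18-29", "med_b", True)] * 1 + [("18-29", "med_b", False)] * 5
    + [("60+", "med_a", False)] * 2
    + [("60+", "med_b", True)] * 2
)

def recommend(records, age_group=None):
    """Toy recommender: pick the medication with the highest improvement rate.
    If age_group is None, the age feature is left out of the model."""
    tallies = defaultdict(lambda: [0, 0])  # medication -> [improved, total]
    for group, med, improved in records:
        if age_group is None or group == age_group:
            tallies[med][0] += int(improved)
            tallies[med][1] += 1
    return max(tallies, key=lambda m: tallies[m][0] / tallies[m][1])

# With the age feature, each group gets the medication with the higher
# improvement rate for that group in this toy data.
print(recommend(RECORDS, "18-29"), recommend(RECORDS, "60+"))  # med_a med_b

# Feature selection bias: dropping age lets the larger group's result
# (med_a) become the recommendation for everyone.
print(recommend(RECORDS))  # med_a

# Labeling bias: if outcomes for users aged 60+ were recorded backwards,
# the same model now recommends the other medication for that group.
mislabeled = [(g, m, (not imp) if g == "60+" else imp) for g, m, imp in RECORDS]
print(recommend(mislabeled, "60+"))  # med_a
```

Nothing in the model’s code changed between these runs; only the choice of features and the accuracy of the labels did, which is why these steps deserve scrutiny.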
Final Thoughts
For policymakers, a solid grasp of database design basics is about more than technical proficiency: it allows them to play a key role in advancing technology while also ensuring that data is handled safely. To reduce the need for future congressional hearings like those faced by Mark Zuckerberg of Facebook and Shou Chew of TikTok, policymakers should keep building their understanding of the essentials of database design.