Dr. George Ng, Co-Founder and CTO of GGWP.
Since 2018, privacy laws such as the GDPR have reshaped the digital landscape, mandating transparent data usage and minimal storage. These laws, while crucial, often lack explicit guidance on which data types may be analyzed, how that analysis may be applied or how long data should be retained.
Navigating these regulations, many platforms have adopted a cautious approach, setting fixed retention periods based on broad data categories. This conservatism stems partly from the operational challenges in tagging diverse data classes and the complexity of nuanced legal interpretations. Most critically, it requires extensive coordination between business and product owners (who understand the data’s intended purposes) and legal and data management teams.
The impending enforcement of safety regulations like the Digital Services Act (DSA) places greater responsibility on companies for on-platform content. For example, a standard 30-day data retention policy, common in AAA gaming for chat data, may fall short in addressing issues like child grooming.
This article examines several innovative technical approaches that can better harmonize user privacy with effective content moderation, along with their limitations.
• Variable data retention. Tiered, risk-based retention categorizes data by risk level, complying with privacy laws while addressing safety concerns. For instance, extending data retention for users exhibiting prior risky behavior enables more thorough investigations, aligning with privacy regulations and providing essential context for moderation decisions.
• Precision in data access. Lexical scoping limits data access to moderation necessities, representing a commitment to data minimization. This approach might involve restricting access to non-incident data or not storing data for users with positive behavior histories, ensuring moderation remains effective without compromising privacy.
• Anonymizing flags. Hashing, or converting data into irreversible, one-way hashes, facilitates moderation while maintaining anonymity. Effective for identifiable content, it becomes less viable for voice and audio, where context is paramount. Addressing ambiguities and content beyond known databases presents an ongoing challenge.
• In-memory processing. Implementing simpler, more explainable triaging rules often leads to suboptimal, one-size-fits-all risk assessments. In-memory processing, which does not involve long-term storage, can generate additional indicators for deeper analysis. For instance, detecting aggressive tone in voice data can justify reprocessing and potentially storing relevant conversations for further investigation. Examples like this help preserve privacy but may also introduce additional model bias.
• Local processing. On-device processing keeps user data private, as the platform never stores it. If needed only for training, federated learning can adjust models without extensive data transfer. For large platforms, client-based sampling can limit data transfer, and data augmentation can enhance privacy before central processing. Each step incurs additional development costs and can introduce some noise into model tuning, though this is less significant for large samples.
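The tiered-retention idea above can be sketched in a few lines. This is a minimal illustration, not an implementation from any particular platform; the tier names, day counts and record shape are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention tiers, in days. The names and values are
# illustrative assumptions, not recommendations.
RETENTION_DAYS = {
    "baseline": 30,        # standard chat-log retention
    "prior_risk": 180,     # users with prior risky behavior
    "open_incident": 365,  # data tied to an active investigation
}

def retention_deadline(tier: str, captured_at: datetime) -> datetime:
    """Return the moment after which a record in this tier may be purged."""
    return captured_at + timedelta(days=RETENTION_DAYS[tier])

def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Keep only records whose tier-specific retention window is still open."""
    return [r for r in records
            if retention_deadline(r["tier"], r["captured_at"]) > now]
```

The key property is that the purge deadline follows the record's risk tier rather than a single fixed window, so routine chat data ages out quickly while flagged material stays available for investigation.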
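Hash-based flagging, as described above, can be sketched as matching one-way fingerprints against a list of known-violating content. The normalization step and the sample "banned" phrase are assumptions for illustration; a production system would use a curated database of fingerprints.

```python
import hashlib

def fingerprint(text: str) -> str:
    """One-way SHA-256 fingerprint of a whitespace- and case-normalized message."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical fingerprints of known-violating content; only hashes
# are stored, never the plaintext.
KNOWN_BAD = {fingerprint("example banned phrase")}

def flag_message(text: str) -> bool:
    """Flag a message by fingerprint match without retaining its plaintext."""
    return fingerprint(text) in KNOWN_BAD
```

Because the hash is irreversible, the platform can detect exact matches to known content without keeping the message itself, which is precisely why the technique struggles with paraphrases, ambiguity and audio, where no exact match exists.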
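The in-memory processing pattern can be sketched as well: score a conversation transiently and persist it only when an indicator crosses a review threshold. The keyword heuristic and threshold below are crude stand-ins for a real tone-detection model, used only to show the control flow.

```python
# Illustrative stand-ins for a real aggression model; assumptions, not
# a recommended lexicon or threshold.
AGGRESSION_TERMS = {"hate", "kill", "stupid"}
REVIEW_THRESHOLD = 0.2

def aggression_score(transcript: str) -> float:
    """Fraction of words matching the (toy) aggression lexicon."""
    words = transcript.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in AGGRESSION_TERMS)
    return hits / len(words)

def triage(transcript: str, store: list) -> bool:
    """Process in memory; persist only conversations flagged for review."""
    score = aggression_score(transcript)
    if score >= REVIEW_THRESHOLD:
        store.append({"transcript": transcript, "score": score})
        return True
    return False  # transcript is dropped; nothing is stored
```

Most conversations never touch long-term storage; only those that trip the indicator are retained for deeper investigation, which is where the model-bias caveat in the bullet above comes in.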
Organizational Execution
1. Assess the feasibility of implementing AI-driven approaches, including vendor collaborations, and recognize the limitations of smart tagging.
2. Evaluate the benefits of improved or new detections against the costs of implementation, maintenance and additional data storage.
3. Capitalize on existing data tagging infrastructure, such as GDPR-compliant data deletion systems, to streamline content moderation enhancements.
Adapting to evolving safety regulations and risk landscapes also necessitates ongoing updates to risk assessment and data access protocols. Flexibility and regulatory awareness are essential for effective content management strategies.
Conclusion
For tech companies, striking the right balance between user safety and privacy is not merely a regulatory requirement but a key aspect of building digital trust. By adopting methods like tiered retention, lexical scoping and hashing, companies can responsibly moderate content while respecting user privacy. In the dynamic digital era, continuous innovation and adherence to legal standards are fundamental in cultivating a secure and private online environment.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.