Hugging Face, a leading platform for open-source machine learning projects, has acquired XetHub, a Seattle-based startup specializing in file management for artificial intelligence projects. The move is intended to strengthen Hugging Face’s AI storage capabilities so that developers can work with larger models and datasets more efficiently.
XetHub was founded by Yucheng Low, Ajit Banerjee and Rajat Arya, who previously worked at Apple, where they built and scaled the company’s internal ML infrastructure. The founders have a strong background in machine learning and data management. Low also co-founded Turi, an ML/AI company that Apple acquired in 2016.
The startup raised $7.5 million in seed financing led by Seattle-based venture capital firm Madrona Ventures.
To appreciate the impact of this acquisition, it helps to understand Git Large File Storage (LFS). Git LFS is an open-source Git extension that keeps large files out of the repository itself: Git tracks a small pointer file, while the actual contents are stored on a separate server. Hugging Face currently uses Git LFS as its storage backend, but the system has a key limitation: each large file is handled as a single object. When developers update an AI model or dataset on Hugging Face’s platform, they must re-upload the entire file, which is slow for files that run to many gigabytes.
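To make the limitation concrete, here is a minimal Python sketch of how a Git LFS pointer is derived. The pointer layout (a version line, an `oid sha256:` line and a `size` line) follows the Git LFS specification; the helper name `lfs_pointer` and the sample bytes are hypothetical. Because the object ID is a hash of the whole file, changing a single byte produces a different object, which then has to be uploaded in full.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS pointer file for the given content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for a large model file (hypothetical data).
original = b"\x00" * 1_000_000
modified = original[:-1] + b"\x01"  # change only the last byte

pointer_before = lfs_pointer(original)
pointer_after = lfs_pointer(modified)

# Same size, different oid: the two versions are unrelated objects.
print(pointer_before)
print(pointer_after)
```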
XetHub’s platform addresses this by splitting AI models and datasets into smaller chunks. Developers upload only the chunks they have modified rather than the entire file. The result is a sharp reduction in upload times, which helps keep AI development workflows agile.
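The idea can be sketched in a few lines of Python. This is an illustration only, not XetHub’s actual implementation: it assumes fixed-size chunks, SHA-256 chunk IDs and an in-memory dictionary standing in for remote storage, and the names `CHUNK_SIZE`, `split_chunks`, `upload` and `download` are hypothetical.

```python
import hashlib
import random

CHUNK_SIZE = 64 * 1024  # assumed chunk size, for illustration only


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload(data: bytes, store: dict[str, bytes]) -> tuple[list[str], int]:
    """Send only chunks the store lacks; return (manifest, bytes sent)."""
    manifest, sent = [], 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
            sent += len(chunk)
        manifest.append(digest)
    return manifest, sent


def download(manifest: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble a file version from its ordered list of chunk IDs."""
    return b"".join(store[digest] for digest in manifest)


store: dict[str, bytes] = {}  # stand-in for remote storage

# Version 1: 16 chunks of pseudo-random bytes (hypothetical model weights).
v1 = random.Random(0).randbytes(16 * CHUNK_SIZE)
manifest_v1, sent_v1 = upload(v1, store)

# Version 2: flip one byte inside the sixth chunk.
edited = bytearray(v1)
edited[5 * CHUNK_SIZE + 10] ^= 0xFF
v2 = bytes(edited)
manifest_v2, sent_v2 = upload(v2, store)

print(sent_v1)  # 1048576: every chunk is new
print(sent_v2)  # 65536: only the edited chunk is sent
```

Both versions remain retrievable from their manifests via `download`, even though the second upload transferred one sixteenth of the data. Fixed-size chunks are a simplification: inserting bytes near the start of a file shifts every later boundary, so systems of this kind commonly use content-defined chunking instead.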
Furthermore, XetHub’s platform provides additional features to streamline the AI development process, including:
- Advanced Version Control: Enabling precise tracking of changes across iterations of AI models and datasets.
- Collaborative Tools: Facilitating seamless teamwork on complex AI projects.
- Neural Network Visualization: Providing intuitive representations of AI model architectures for easier analysis and optimization.
By integrating XetHub’s technology, Hugging Face is poised to overcome its current storage limitations. The upgrade will allow the platform to host substantially larger models and datasets, with support for individual files exceeding 1 TB and total repository sizes surpassing 100 TB. This capability is vital to Hugging Face’s ambition to maintain the world’s most comprehensive collection of foundation models and datasets.
For users of the platform, the acquisition promises several benefits. Upload times for large AI models and datasets should drop sharply, enabling faster iteration and deployment cycles. Distributed AI development teams should be able to collaborate more efficiently. Stronger version control will improve the tracking and reproducibility of machine learning workflows, which matters for quality and consistency in AI projects. Perhaps most importantly, the added scalability will support larger and more complex AI projects that push the boundaries of current technologies.
The ability to efficiently handle larger models and datasets is particularly crucial as AI continues to evolve. Recent advancements in areas such as large language models (e.g., GPT-3, BERT) and computer vision have highlighted the importance of working with massive datasets and increasingly complex model architectures. Hugging Face’s enhanced infrastructure will enable developers to keep pace with these rapid advancements, potentially catalyzing new breakthroughs in AI research and applications.
With XetHub integrated, working with Hugging Face models and datasets will resemble working with Docker, which uses a layered file system so that the entire container image does not have to be uploaded or downloaded every time. Developers will push or pull only the portion of a file that has been modified.
This strategic acquisition by Hugging Face is set to accelerate the democratization of AI technologies. By removing the technical barriers associated with managing large-scale AI projects, Hugging Face is making advanced AI development more accessible to a global community of researchers, developers and businesses.
Hugging Face’s acquisition of XetHub is an important step toward accelerating the adoption of open-weight models. By addressing critical limitations in data storage and management, this move solidifies Hugging Face’s leadership position within the AI development ecosystem.