Protege

Data Infrastructure and Analytics

New York City, New York 4,829 followers

The AI training data platform. Connecting data holders with vetted data users.

Discover all 76 employees

About us

The biggest unmet need in AI today is getting access to the right training data. Data holders often don’t know where to start and are rightly concerned about governance, intellectual property, and security implications. AI companies can spend years finding and negotiating access to the data they need. Protege is solving these problems by providing an easy-to-use platform to connect data holders with vetted data users.

Website: http://www.withprotege.ai
Industry: Data Infrastructure and Analytics
Company size: 11-50 employees
Headquarters: New York City, New York
Type: Privately Held
Founded: 2024
Specialties: data, AI, and training data

Locations

Primary

New York City, New York, US

Updates

Protege

4,829 followers
1d
Announcing Clothing Data X Motion Capture for AI development 👕🏃♂️

Protege and Render Ready have partnered to deliver what may be our coolest data yet: multimodal clothing datasets that pair high-fidelity garment captures with real human motion! The GIF does a great job showing how this works.

What’s included:
🩳 High-fidelity garment data (photos, 3D scans, measurements, materials)
🏃➡️ Motion capture sequences aligned to those garments
☑️ Production-ready deliverables (textured 3D assets + structured metadata)

This helps teams better model how clothes actually move, fold, wrinkle, and interact on bodies in motion. We all wear clothes, and rendering that correctly is especially pivotal for image and video generation.

We see this being used in many areas!
🎮️ Gaming + simulation teams
🔬 ML + research teams
🤖 Robotics + digital twin builders
👚 Apparel + manufacturing innovators

If your team needs production-ready 3D assets and metadata aligned to real motion, we’d love to talk. 👇️

🤸♂️ More cool examples in our case study - see the comments!
7 Comments

Protege

4,829 followers
2d
We’ve seen firsthand how healthcare teams building AI products and models can struggle. But the issue usually isn’t lack of internal expertise or clarity of vision. Again and again, we’re seeing that the struggle is getting cohort data fast enough for live development cycles, especially when the data needs to be connected to existing datasets or involves multiple modalities.

In our newest case study (link in comments!), we show how Protege, Loopback Analytics, and other healthcare data providers unlocked connections between patient cohorts in record time. In this case, the partnership helped a healthcare AI company connect patient-level data with new EHR data… all with:
⏱️ 30-day turnaround from contract signature to cohort delivery
🧩 Single data provider network: one access point to multiple EHR partners, unified into de-identified datasets
📈 Scalable cohort expansion strategy: ongoing refreshes and new partner data scoped to expand the matched cohort over time, all using existing pipelines
💰 New revenue for providers: healthcare data partners earn revenue with new AI training data deals using assets previously used for other Protege opportunities

If you’re building healthcare AI, or sitting on underutilized healthcare data assets, this short case study is for you.

🔗 Full case study in the comments!
2 Comments

Protege

4,829 followers
6d
What does it take to unlock AI’s true value? 🤔

In a recent interview with Authority Magazine, Protege CEO Bobby Samuels explains why AI’s potential starts with access to high-quality training and evaluation data. Even the most advanced algorithms are only as effective as the data behind them, and today, much of the world’s most valuable data remains siloed, sensitive, and inaccessible.

💡 Full interview linked in comments!
3 Comments

Protege

4,829 followers
1w
We’re excited to partner with PiZetta Media to bring emotion-rich, real-world audiovisual storytelling into responsible AI development.

PiZetta’s de-identified video interviews and mini-documentaries capture authentic, lived experiences communicated by real people in real time. This content is hard to recreate synthetically, but crucial for building more human-centered AI. The recordings are de-identified for privacy protection and ready to help AI systems reflect the full range of human emotion and experience.

Why this matters:
🎙️ Real dialogue about lived experiences that show emotional depth
😶🌫️ Genuine emotion: facial expressions, tone shifts, conversational depth

“As AI becomes something people turn to for support and connection, I want to ensure these systems can respond to real human emotion authentically,” said Michael VanZetta, co-founder and CEO of PiZetta Media. “Partnering with Protege allows us to contribute to AI development in a way that prioritizes emotional storytelling and human-centered content over polished, sanitized narratives.”

“Emotional authenticity is one of AI’s big challenges right now,” said Ashley Snow, Protege’s Audio Partnerships Lead. “PiZetta Media's content provides genuine, emotional data that AI needs to learn from if we want these systems to interact with humans more empathetically and authentically.”

🔗 Full announcement in the comments!

#AI #ResponsibleAI #TrainingData #MultimodalAI #MentalHealth #EmotionalIntelligence

7 Comments

Protege reposted this
Healthcare AI Guy

8,115 followers
2w Edited
NEW: Protege, an AI data platform for licensed real-world datasets, raised a $30M Series A led by Andreessen Horowitz ($65M total raised). Protege helps AI companies (incl. many of the Mag 7) access and curate trusted data across healthcare, media, & audio, at scale. Congrats to the team & investors! cc: a16z Bio + Health, Bobby Samuels, Richard Ho, Engy Ziedan, Daisy Wolf, Travis May, Eva Steinman
2 Comments

Protege reposted this
Don Addington
2w
A good friend once said: Life moves pretty fast. If you don't stop and look around once in a while, you could miss it. Ok, so that was Ferris Bueller. But he was right. Life is, indeed, moving quite fast, especially here at Protege. And we'll move even faster with this exciting investment from our friends at Andreessen Horowitz...

In addition, since Nov (when I joined Bobby Samuels, Engy Ziedan, and team at Protege), we've continued to cement our position as THE premier partner to researchers building the most important, exciting, and impactful models ever - if you can think of a company doing important work building models, we're probably partnered with them, or fielding hard questions from them daily.

As the infrastructure for real-world data in AI development, we're providing our customers with the data they need to provide the most authentic representation of how people and systems actually behave. In order for AI to achieve the greatest heights, it must be grounded in truth. Real-world truth. And Protege is hard at work sourcing, curating, advising on, and delivering the highest quality data sets to help drive some of the biggest opportunities of our time.

Thank you to our Data Partners, our incredible (and intensely collaborative) customers, team members, and, of course, investors. The journey has just begun and I'm incredibly honored to be part of this team (which is also growing rapidly ;)....
Bobby Samuels

CEO at Protege | Unlocking Data Access for AI Development
2w Edited

Today, I’m thrilled to announce that Protege has raised a new $30M funding round led by Andreessen Horowitz. When Travis May, Richard Ho, Engy Ziedan, and I first started Protege in 2024, we saw three bottlenecks to AI’s progress — compute, models, and data. Compute has scaled dramatically, models continue to improve, but access to the right data remains the hardest part. That’s why Protege exists: to be the infrastructure for real-world data in AI development. In the few months since our last round of funding, we’ve felt the urgent demand pull across industries, domains, and modalities. We are rapidly expanding to meet those needs. When AI builders come to Protege, they’re looking for real-world data: the most authentic signal of how people and systems actually behave. This is not synthetic data created by AI nor manufactured data created to simulate human behavior. Across every stage of the AI development lifecycle — from pre-training to post-training to fine-tuning to evaluation — AI builders need this data. They’re looking across modalities and industries: healthcare, video, audio, motion capture, gaming, manufacturing, life sciences, real estate, finance, education, and many more. Foundational, multi-modal model-builders (including the majority of the Magnificent Seven) now work with us across multiple domains along with dozens of other model builders. Thanks to Daisy Wolf, Eva Steinman, and the rest of the a16z team for joining us on this journey as well as our existing investors also participating in this round — Nikhil Basu Trivedi at Footwork, Saar Gur at CRV, Bloomberg Beta, Flex Capital, Shaper Capital, and others. Lastly, a huge thank you to the Protege team for making this all possible. Details linked in comments. Emily Lindemer, PhD Don Addington Rebecca Houser Hillary Acer Dave Davis 🔜 CES Caleb Lee #AIDevelopment #EnterpriseAI #DataInfrastructure #a16z #StartupFunding #AI #Protege
14 Comments

Protege reposted this
Dave Davis
1w Edited
Thank you to Jiangsu Broadcasting for a great interview at Asia TV Forum & Market (ATF) in December. True story, ATF was so busy that I had forgotten about this interview until my colleague Weiguang Shi serendipitously saw it on WeChat! Protege June Yeoh Xenia Shevnina

4 Comments

Protege reposted this
Shayla W.
1w Edited
🚀 We’re hiring Software & Data Engineers

Protege is building the AI training data platform that gives AI teams access to real-world data that isn’t publicly available. We recently closed a major funding round (details in comments) and we’re growing our core engineering team under Richard Ho's (Co-Founder, CTO) leadership!

We’re looking for strong backend/infra/data engineers who want to:
- Build foundational systems from the ground up
- Work on real data problems with real impact on AI models
- Join early and help shape architecture, tooling, and technical direction
- Have high ownership, move fast, and build things that have an impact
- Work with a team that works hard and also has fun 🕺

⭐ We're hiring across levels, prioritizing mid-staff

If this sounds like a fit for you or someone in your network, please apply below. Any referrals are extremely appreciated!

Link to apply: https://lnkd.in/gtXXsvts

cc: Richard Ho

**US-based
11 Comments

Protege reposted this
Carlos Sanchez
1w
I’ll be at Content Americas in Miami to participate on the panel "How Generative AI is being used to supercharge the New Content Economy."

I’m honored to explore this topic alongside a powerhouse lineup:
Erik Barmack (Wild Sheep Content)
John Attard (Showdog Studio)
Ana Maria Bond (AlgoVerde.ai)

We’ll be covering the full spectrum of the AI revolution, from innovations in production workflows to new monetization opportunities.

👇 Join the session:
🗓️ Thursday, January 22nd at 3:00 PM (EST)
📍 Hilton Miami Downtown

👋 Let's catch up: I will be at the conference from Tuesday through Thursday. If you are a rights holder managing a library and want to discuss what value looks like in the current AI landscape, send me a DM. I'd be happy to find a time to meet.

Protege #ContentAmericas
8 Comments

Protege reposted this
Dave Davis
1w Edited
Thanks to Peter Csathy and Victor Harwood for a great Digital Hollywood panel at CES - good enough, in fact, for Rob Salkowitz at Forbes to do an article about it! (Link to article in comments.)

Here's my favorite part: "Dave Davis, chief content officer for Protege, a new company set up to license content to AI companies under equitable financial terms for creators, says the company saw revenue grow from zero to “eight figures” in 2025 as tech companies lined up to secure access to rich, well annotated multi-modal data (video, audio, text) through legitimate channels. That suggests that legal challenges may have altered the smash-and-grab mindset that characterized the first phase of the AI data gold rush, opening the door for better deals for creators willing to play ball."

Rebecca Grossman-Cohen, Emi Wayner, Jason Henderson, Esq. Protege
7 Comments


Funding

Protege 2 total rounds

Last Round

Series A Sep 13, 2025

US$ 25.0M

Investors

Footwork + 4 Other investors

See more info on crunchbase

Employees at Protege

Mike Smith

James Golden

Xenia Shevnina

Dave Davis
