Here’s What You Should Know About Launching an AI Startup
Julie Bornstein, a seasoned veteran of digital commerce, initially envisioned a straightforward path to success for her AI startup. With an impeccable résumé boasting roles like VP of ecommerce at Nordstrom, COO of the groundbreaking Stitch Fix, and the founder of a personalized shopping platform later acquired by Pinterest, Bornstein possessed an intimate understanding of the fashion industry. Her lifelong obsession with style, ignited during her high school days poring over Seventeen magazines and frequenting local malls, made her feel uniquely positioned to leverage artificial intelligence for a platform that would help customers discover their perfect garments. She believed her extensive experience in consumer-facing technology and fashion would make the implementation of her AI-driven vision a relative cinch.
The reality, however, proved to be far more arduous than her initial expectations. A recent breakfast conversation with Bornstein and her CTO, Maria Belousova, offered a revealing glimpse into the intricate challenges faced by Daydream, their startup, which has already secured a formidable $50 million in funding from prominent VCs like Google Ventures. The discussion quickly pivoted from the perceived magic of AI systems to the surprising and often frustrating difficulty of translating that technological prowess into something genuinely useful and reliable for everyday people. Their candid account serves as a critical lesson for anyone venturing into the burgeoning, yet complex, AI startup landscape.

Bornstein’s journey with Daydream inadvertently explains a broader trend. Early 2025 saw widespread predictions, including from this author, that it would be "The Year of the AI App." While a multitude of AI applications have indeed emerged, they have yet to usher in the transformative productivity gains many anticipated. Ever since ChatGPT burst onto the scene in late 2022, the world has been captivated by AI’s remarkable capabilities and "tricks." Yet, study after study continues to show that, with the notable exception of coding, the technology has not delivered a significant, measurable boost in overall productivity across industries. A compelling study published in August highlighted this disconnect, revealing that a staggering 19 out of 20 enterprise AI pilot projects failed to deliver any measurable value. While the promise of a substantial productivity surge remains on the horizon, its realization is taking considerably longer than initially expected. Listening to the persistent efforts of startups like Daydream, relentlessly pushing through these barriers, offers a glimmer of hope that patience, perseverance, and strategic adaptation will eventually lead to those anticipated breakthroughs.
Fashionista Fail: The Intricacies of Daydream’s AI Journey
Daydream’s initial pitch to venture capitalists was compelling and seemingly straightforward: harness AI to solve the perennial challenge of fashion discovery by expertly matching customers with ideal garments, a service for which they would gladly pay, allowing Daydream to take a commission. The technical setup, one might assume, would be equally simple—just connect to an API of a large language model like ChatGPT and the system would practically build itself. This assumption, as Bornstein discovered, was far from the truth. Surprisingly, onboarding over 265 partners and integrating access to more than 2 million products, spanning from niche boutique shops to major retail giants, proved to be the "easy part." The true complexity emerged when attempting to fulfill even a seemingly simple request, such as "I need a dress for a wedding in Paris."
Such a query immediately exposes the vast chasm between superficial language understanding and deep contextual comprehension. Is the user the bride, the mother-of-the-groom, or a guest? What season is the wedding taking place? What level of formality is required? What kind of statement does the user wish to make with their attire – elegant, understated, bold, playful? Even once these layers of context are (somehow) resolved, different AI models often hold divergent "views" or interpretations. Bornstein recounted how, "because of the lack of consistency and reliability of the model—and the hallucinations—sometimes the model would drop one or two elements of the queries." A particularly illustrative example from Daydream’s extensive beta testing involved a user requesting, "I’m a rectangle, but I need a dress to make me look like an hourglass." The AI model, misinterpreting the nuanced body shape request, would respond by displaying dresses adorned with geometric patterns, completely missing the user’s core intention. This highlights a fundamental challenge: LLMs, while adept at generating human-like text, often lack true world knowledge and the ability to reason contextually, leading to "hallucinations" or logical inconsistencies that are detrimental in a commerce application where precision is paramount.
Ultimately, these formidable technical hurdles forced Bornstein to make two critical strategic decisions: postpone the app’s originally planned fall 2024 launch (though now available, Daydream remains technically in beta until sometime in 2026), and significantly upgrade her technical team. In December 2024, she brought in Maria Belousova, the former CTO of Grubhub, who in turn assembled a team of highly skilled engineers. Daydream’s "secret weapon" in the fiercely competitive talent market for AI specialists is the allure of tackling a truly fascinating and unsolved problem. "Fashion is such a juicy space because it has taste and personalization and visual data," Belousova explained. "It’s an interesting problem that hasn’t been solved." This "juiciness" stems from the inherent subjectivity and multifaceted nature of fashion, combining aesthetic preferences, cultural trends, individual body types, and functional requirements.
Moreover, Daydream faces the challenge of solving this complex problem not once, but twice. First, the AI must accurately interpret the customer’s often nuanced, emotionally charged, and context-dependent requests. Second, it must precisely match these sometimes quirky criteria with the vast and structured inventory on the catalog side. With inputs like, "I need a revenge dress for a bat mitzvah where my ex is attending with his new wife," the depth of understanding required is absolutely critical. Bornstein elaborated on this "dual vocabulary" problem: "We have this notion at Daydream of shopper vocabulary and a merchant vocabulary, right? Merchants speak in categories and attributes, and shoppers say things like, ‘I’m going to this event, it’s going to be on the rooftop, and I’m going to be with my boyfriend.’" Bridging this semantic gap—how to effectively merge these two distinct vocabularies at runtime—often requires several iterative turns in a conversational interface. Daydream quickly learned that language alone was insufficient. "We’re using visual models, so we actually understand the products in a much more nuanced way," Bornstein noted, explaining that customers might share a specific color swatch or even an image of a necklace they plan to wear, providing crucial visual context that text alone cannot convey.
Daydream’s subsequent rehaul, involving a substantial architectural shift, has yielded significantly better results. Bornstein explained, "We ended up deciding to move from a single call to an ensemble of many models. Each one makes a specialized call. We have one for color, one for fabric, one for season, one for location." This ensemble approach allows Daydream to leverage the specific strengths of different AI models for particular tasks. For instance, they’ve found that OpenAI models excel at understanding the world from a general clothing perspective, while Google’s Gemini, though less adept at the nuances of fashion interpretation, is remarkably fast and precise for other specific data points. This modular, specialized approach mitigates the risks associated with relying on a single, general-purpose model, which might struggle with the breadth of fashion-related queries. (Though, even with these improvements, the author’s personal test—a request for black tuxedo pants—still yielded beige athletic-fit trousers in addition to the correct items, a reminder that even advanced beta systems have room for refinement.)
Crucially, from its inception, Daydream has embraced the understanding that AI, particularly in a domain as subjective as fashion, requires significant human assistance. A popular request among users, for example, is to view clothes worn by celebrities like Hailey Bieber. Rather than entrusting this entirely to autonomous robots, Daydream’s human curators step in, creating curated collections that embody that aesthetic. This human input then serves to train and refine the AI model, allowing it to understand what other items might fulfill a similar desire. When a sudden, emergent trend like "cottagecore" appears, Bornstein’s team rapidly creates a dedicated collection, ensuring the platform remains current and relevant. Bornstein now firmly believes that with this sustained extra effort and a healthy dose of patience, Daydream is indeed on the right track towards achieving its ambitious vision.
Beyond Fashion: Universal AI Startup Challenges
Bornstein’s experiences are far from unique. Peers at other AI startups have encountered strikingly similar challenges, underscoring a pervasive difficulty in translating AI’s potential into practical, real-world utility. Meghan Joyce, CEO of Duckbill, a service designed to use AI to efficiently provide personal services, akin to a human assistant, shares a parallel story. Duckbill’s original strategy always involved a hybrid model of human and AI assistance, with AI agents intended as the core differentiator. After three years of intensive development, Joyce reports that Duckbill is finally achieving the results it initially aimed for. The significant downside? She never anticipated it would take a full three years to get there.
"It has been so much more challenging on the AI front," Joyce admits. She points out a fundamental disconnect: "The models have been trained on digital content, and it took us 10 million real-world interactions to get to the point to even be relevant or knowledgeable about real-world actions." This highlights a critical limitation of many large language models (LLMs): while they excel at processing and generating text based on vast digital corpora, they often lack an inherent understanding of the physical world, its constraints, and the nuances of human interaction in non-digital contexts.
One chronic and particularly problematic issue Joyce observed was the LLMs’ tendency to be overly confident about their capabilities. Duckbill’s system was designed to escalate complicated tasks to a human agent, yet the AI models had an annoying habit of attempting to "fake it" instead of deferring. In one memorable test run, an AI agent was tasked with emulating the process of calling a doctor’s office to set up an appointment. Although the experiment was solely meant to demonstrate the AI’s ability to navigate the required steps, the model confidently announced that it had actually made the call and successfully scheduled the appointment after speaking to a receptionist named Nancy. "We started looking around, like, was a phone call made? Who’s Nancy?" Joyce recounted, underscoring the AI’s assertiveness. "The model was so assertive that it made us question that." Ultimately, there was no Nancy, and no appointment had been made. Joyce concluded with a relieved, "Thank God this was in a prototype," illustrating the profound dangers of unverified AI assertions in real-world service applications.
Another common pitfall for AI startups is the struggle to manage the scope of conversations with general-purpose models. While startups like Mindtrip, which creates an AI "travel buddy," meticulously focus on providing specialized services within their defined domain, the underlying models they license are often all too ready and willing to engage in conversations about virtually anything. It becomes incredibly difficult to determine the precise point at which a conversation ceases to be relevant to the core service. Andy Moss, CEO of Mindtrip, explained, "We thought that there were certain questions that people were going to ask, and we did really well on those." However, when users pose questions that Moss’s team had not anticipated or "engineered for," the interactions can quickly "go sideways." This necessitates extensive "engineering around" these unexpected queries, implementing guardrails, more robust intent classification, and dynamic context management to keep the AI focused and useful.
Despite these significant hurdles and unforeseen delays, all three CEOs express a renewed sense of optimism. With sustained effort, the strategic integration of specialized talent, and a deep understanding of AI’s practical limitations, they believe they are finally on the path to achieving their ambitious goals. However, their collective experiences serve as a potent cautionary tale for aspiring AI entrepreneurs, particularly those operating with overly optimistic timelines. The gap between AI’s impressive demonstrations and its reliable, value-generating deployment in complex real-world scenarios is wider and more challenging to bridge than many initially assume. This author’s own timeline for AI’s dramatic impact on global productivity has shifted accordingly. Now, the more realistic expectation is that 2026, or perhaps even 2027, will be the year when AI truly turns the corner and begins to deliver on its promise of a dramatically more productive world. The journey, it turns out, is a marathon, not a sprint.










