Osmo digitizes smell and cuts AI costs by 200x with Meta Llama on AWS

Osmo CEO Alex Wiltschko

As part of its journey toward digitizing scent, Osmo builds AI-powered tools that
generate fragrance formulas from creative briefs. Model development presents a
uniquely complex challenge because results can only be evaluated by physically
producing the scent. The team initially relied on proprietary frontier model
APIs, but this became unsustainable due to limited customization, lack of model
ownership, and high costs. By moving to Meta Llama on Amazon Web Services (AWS),
Osmo gained full control of its models and infrastructure, cutting training costs
by 10 to 20 times and inference costs by up to 200 times. In evaluations by
trained perfumers,
its open-model system matched the performance of leading frontier models.

Computers have learned to see. They’ve learned to hear. They can detect light,
motion, and sound at
scales and speeds that far exceed human ability. But for most of computing
history, one sense has remained out of reach: smell. Osmo is on a mission to change
that. The company combines machine learning, perfumery, chemistry, and
neuroscience to build what it calls “Olfactory Intelligence,” giving AI
systems the ability to sense and interpret the chemical world.

Today, Osmo primarily serves the fragrance industry while also expanding how
scent can be
used as a medium for brand storytelling. Through its Studio platform, users
start from a sentence, a mood board, or an image, then set guardrails such as notes
to avoid, budget, and clean standards. Studio translates those inputs into
manufacturing-ready formulas that are delivered to customers’ doorsteps as
physical samples. By lowering the barriers to scent creation, Osmo is
transforming fragrance from an exclusive luxury into an accessible, powerful
tool for modern brand expression.

Building this technology is a uniquely difficult machine learning problem. Smell is
extraordinarily high-dimensional. The human nose has roughly 400 distinct
receptor types, compared to just three for vision, and odor perception depends
on complex molecular combinations. Human descriptions of scents are notoriously
inconsistent, making labeled data hard to collect. “Unlike most AI
systems, we can’t evaluate results instantly. We have to physically produce the
fragrance and have experts smell it,” says Richard Whitcomb, CTO of Osmo.
Each evaluation cycle takes approximately two weeks. Trained evaluators assess
samples for how well they match the input brief along with qualitative
attributes like balance, wearability, and overall quality as a fine fragrance.
This creates a tightly coupled feedback loop between model output and
real-world evaluation. Because each loop requires physical production,
iteration is slow and expensive, making the cost and flexibility of the
underlying AI infrastructure critically important.

Osmo initially approached the problem using large frontier model APIs, but the
constraints of
proprietary platforms became hard to ignore as its ambitions grew. Training
methods were limited to what the platform permitted, and in a domain this
specialized, the ability to deeply customize model behavior wasn’t optional.
Neither was ownership: in a field where fine-tuned weights and proprietary
training data are the competitive advantage, building on models Osmo didn’t
control was an increasingly uncomfortable place to be.

Meta Llama models on AWS gave Osmo the control and flexibility it needed to build
its system the
right way. “We think about fragrance creation like
writing—building formulas ingredient by ingredient, similar to how language
models generate text,” says Wesley Qian, VP of engineering and research at
Osmo. “You lay down the major components first, then fill in the gaps, and the model
learns those patterns across data.” With Llama’s open weights, Osmo was able to
better support that structured, sequential process by shaping model behavior,
customizing training approaches, and retaining control over its infrastructure
and models. That included implementing techniques such as Direct Preference
Optimization (DPO), building agentic loops for iterative fragrance design, and
exploring reinforcement learning approaches like Group Relative Policy
Optimization (GRPO).
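
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for a single preference pair, where the "chosen" output is the one evaluators preferred. It illustrates the published objective only and is not Osmo's implementation; the function name, argument names, and the default `beta` are assumptions made for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed token log-probability of a full output
    under either the policy being trained or the frozen reference model.
    beta (assumed default 0.1) scales how strongly the policy is pushed
    away from the reference.
    """
    # How much more the policy likes each output than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy identical to the reference: the loss is log(2).
neutral = dpo_loss(-12.0, -12.0, -12.0, -12.0)
# Policy favors the chosen output more than the reference does: lower loss.
improved = dpo_loss(-10.0, -14.0, -12.0, -12.0)
```

The loss falls as the policy assigns relatively more probability to the preferred output than the reference model does, which is what makes expert preference judgments usable as a training signal.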

To support this approach at scale, Osmo built a flexible training and deployment
pipeline on AWS. Amazon SageMaker powers Osmo’s training and fine-tuning
workflows on proprietary datasets, while model weights are stored in Amazon
Simple Storage Service (Amazon S3) and imported into Amazon Bedrock for
deployment. Amazon Bedrock provides auto-scaling for inference endpoints,
leading to cost-efficient serving across both
experimentation and production workloads. “Moving to Llama on AWS changed
how we think about building AI systems,” says Whitcomb. “Instead of
adapting our product to fit a model, we can now adapt the model to fit our
product.”
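
As a rough illustration of the S3-to-Bedrock step, the sketch below assembles a request for the Amazon Bedrock `CreateModelImportJob` API, which boto3 exposes as `create_model_import_job`. The helper name, bucket, role ARN, account ID, and model names are all hypothetical placeholders; the article does not describe Osmo's actual configuration.

```python
def build_import_job_request(job_name, model_name, role_arn, s3_uri):
    """Build keyword arguments for Bedrock's create_model_import_job call.

    This helper is hypothetical; it only shapes and sanity-checks the request.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("model weights must be referenced by an s3:// URI")
    return {
        "jobName": job_name,
        "importedModelName": model_name,
        "roleArn": role_arn,
        "modelDataSource": {"s3DataSource": {"s3Uri": s3_uri}},
    }


# Placeholder values for illustration only.
request = build_import_job_request(
    job_name="import-llama-finetune-001",
    model_name="fragrance-llama-70b",
    role_arn="arn:aws:iam::123456789012:role/ExampleBedrockImportRole",
    s3_uri="s3://example-model-bucket/llama-finetune/",
)

# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("bedrock").create_model_import_job(**request)
```

Keeping the request construction separate from the API call makes it easy to check the configuration before any job is started.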

Throughout the process, AWS teams worked closely with Osmo, meeting weekly to
evaluate data preparation
strategies and model configurations. They provided reference implementations,
debugging support, and training and inference scripts so Osmo could move
quickly from experimentation to a stable production pipeline. After evaluating
Meta Llama variants, AWS and Osmo identified the Meta Llama 3.3 70B Instruct
model as the strongest performer, with the Llama 3.1 8B model also producing
viable outputs. The collaboration extended into how Osmo refines its models
over time: formulas that perform well in evaluation are fed back into the
training dataset through a filtered augmentation loop, so that only validated
results inform future model generations. “What AWS
enabled for us is a true end-to-end workflow from training to deployment that
we fully understand and control,” says Whitcomb.
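
A filtered augmentation loop of this kind might look like the minimal sketch below: only formulas that clear a quality bar are added to the training set, and duplicates are skipped. The data structure, the 0-to-10 `panel_score` scale, the `min_score` threshold, and the ingredient names are illustrative assumptions, not details from Osmo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluatedFormula:
    brief: str           # the creative brief the formula was generated for
    formula: tuple       # (ingredient, fraction) pairs; tuples so they hash
    panel_score: float   # hypothetical 0-10 score from trained evaluators


def augment_dataset(dataset, evaluated, min_score=8.0):
    """Return a new dataset with validated, non-duplicate formulas appended."""
    seen = {(example["brief"], example["formula"]) for example in dataset}
    augmented = list(dataset)  # leave the input dataset untouched
    for item in evaluated:
        key = (item.brief, item.formula)
        if item.panel_score >= min_score and key not in seen:
            augmented.append({"brief": item.brief, "formula": item.formula})
            seen.add(key)
    return augmented


# Placeholder examples for illustration only.
base = [{"brief": "fresh citrus",
         "formula": (("bergamot", 0.6), ("cedarwood", 0.4))}]
evaluated = [
    # Already in the dataset, so it is skipped despite the high score.
    EvaluatedFormula("fresh citrus", (("bergamot", 0.6), ("cedarwood", 0.4)), 9.0),
    # Clears the bar and is new, so it is added.
    EvaluatedFormula("warm woods", (("cedarwood", 0.7), ("vanilla", 0.3)), 8.5),
    # Scores below the bar, so it is filtered out.
    EvaluatedFormula("warm woods", (("cedarwood", 0.9), ("vanilla", 0.1)), 4.0),
]
next_round = augment_dataset(base, evaluated)
```

Filtering before augmentation matters because each physical evaluation is slow and expensive, so unvalidated outputs should not dilute the training signal.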

Evaluations conducted by trained evaluators showed that the Meta Llama 3.3 70B
model achieved performance parity with Osmo’s production model built on large
frontier APIs. There was no
statistically significant difference across core metrics, including fit to
descriptors and qualitative fragrance attributes. “We knew we were on the right
track when our sensory panels couldn’t distinguish between the Meta Llama 3.3
70B outputs and those from much larger frontier models,” says Sam Barnett,
senior machine learning engineer at Osmo. Most significantly, training costs
dropped by approximately 10–20 times, and inference costs fell by up to 200
times depending on model size and throughput. It was a fundamental shift in
what’s economically possible. As Whitcomb says, “We can experiment more,
iterate faster, and ultimately deliver better results to our customers, all
while maintaining ownership of our models and data.”
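
The article does not say which statistical test was used to compare the two systems. One common way to check for a difference between two sets of panel scores is a permutation test, sketched below with the standard library. The function name and parameters are assumptions, and the scores in the usage lines are made-up placeholders rather than Osmo's data.

```python
import random
from statistics import mean


def permutation_p_value(scores_a, scores_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Returns the fraction of random relabelings whose mean difference is at
    least as large as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Placeholder panel scores, invented for illustration only.
open_model_scores = [7.5, 8.0, 6.5, 7.0, 8.5, 7.5]
frontier_scores = [7.0, 8.0, 7.0, 7.5, 8.0, 7.5]
p_value = permutation_p_value(open_model_scores, frontier_scores)
```

A large p-value means the data give no evidence of a difference; it does not by itself prove that the two systems are equivalent.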

By eliminating dependency on proprietary AI providers, Osmo also removed the risk
of forced
deprecations or unexpected behavior changes. “This is really about moving
from renting intelligence to building it ourselves,” says Whitcomb,
“and creating a digital perfumer that we fully control and can
continuously improve.” With costs reduced and control fully in hand, Osmo
can iterate faster, generate more data, and refine its core models. “When you
control the model and infrastructure, you can explore ideas that just aren’t possible
in a more constrained system,” says Qian. This accelerates its broader mission
to make olfactory intelligence as accessible and powerful as the tools we
already have for sight and sound. “Once machines can interpret scent, you
unlock entirely new applications,” says Whitcomb, “from flagging spoiled
food or sensing mold in a building to detecting disease and identifying
environmental hazards.”

Learn how Amazon SageMaker and Amazon Bedrock open up new possibilities for
building, customizing, and deploying generative AI models with full control.