Rec2Vec
Dense retrievers struggle to handle logical constraints such as negation, conjunction, and disjunction in product search, particularly when relevance depends on a small number of attribute-level differences. In Rec2Vec, we address this limitation by supervising dense embeddings with attribute-level edit distance, defined as the minimum number of feature changes required for a product to satisfy a Boolean query. We build a large-scale training dataset from the ESCI corpus by constructing contrastive triplets consisting of a positive product, a hard negative that violates specific attributes, and an easy negative sampled at random. To enable fine-grained supervision, we use large language models to extract synthetic product features and generate natural-language Boolean queries, allowing the embedding space to reflect logical structure rather than pure semantic similarity.
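To make the attribute-level edit distance concrete, here is a minimal sketch under stated assumptions; it is not the Rec2Vec implementation. It assumes the Boolean query is given in disjunctive normal form (an OR of AND-clauses), that each attribute appears at most once per clause, and that any attribute can be changed to any value. Under those assumptions, the distance for one clause is the number of literals the product violates, and the distance for the query is the minimum over clauses. The names `Literal`, `literal_satisfied`, and `edit_distance` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (an assumption, not taken from the source):
# a literal is (attribute, value, must_match); must_match=False encodes negation.
Literal = Tuple[str, str, bool]
Clause = List[Literal]   # literals joined by AND
Query = List[Clause]     # clauses joined by OR (disjunctive normal form)


def literal_satisfied(product: Dict[str, str], literal: Literal) -> bool:
    """Return True if the product's attribute agrees with the literal."""
    attribute, value, must_match = literal
    return (product.get(attribute) == value) == must_match


def edit_distance(product: Dict[str, str], query: Query) -> int:
    """Minimum number of attribute changes needed to satisfy the query.

    Assumes each attribute occurs at most once per clause, so every
    violated literal costs exactly one change.
    """
    if not query:
        raise ValueError("query must contain at least one clause")
    return min(
        sum(not literal_satisfied(product, lit) for lit in clause)
        for clause in query
    )


# Query: (color = red AND NOT material = leather) OR (color = blue)
query: Query = [
    [("color", "red", True), ("material", "leather", False)],
    [("color", "blue", True)],
]

positive = {"color": "red", "material": "cotton"}          # satisfies clause 1
hard_negative = {"color": "green", "material": "leather"}  # one change away

print(edit_distance(positive, query))
print(edit_distance(hard_negative, query))
```

In this sketch a product at distance 0 satisfies the query and would play the role of the positive in a triplet, while a product at a small nonzero distance corresponds to a hard negative that violates specific attributes.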
Contributors
- Matthew Toles
- Shachaf Rispler
- Eugene Wu