One of the most promising applications of Large Language Models (LLMs) is the creation of “Digital Twins”: AI agents designed to simulate the behavior, preferences, and decision-making processes of specific human individuals. This research initiative lays the foundations for silicon sampling, a scalable complement to traditional human-subject research. We introduce Twin-2K-500, a large-scale benchmark dataset of over 2,000 digital twins based on real humans, and conduct a mega-study across 19 domains to evaluate their fidelity. Our findings reveal that while digital twins capture relative heterogeneity across individuals, they struggle with precise individual-level prediction and exhibit a “blue-shift” bias, in which richer persona descriptions paradoxically skew simulated responses toward more progressive positions.