About the Role
Join remote teams building high-quality mathematical datasets and evaluations for AI systems. You will create rigorous problem sets spanning algebra, calculus, probability, statistics, and discrete math; validate step-by-step solutions; assess chain-of-thought quality; and run prompt evaluations for reasoning tasks. The work also includes RLHF preference labeling, rubric-driven QA, and large language model evaluation, all aimed at improving training data quality and model performance across diverse domains.



