Function-based Code Annotator (Remote)

Lead the creation of high-quality, function-level algorithm problems—from statement design to robust testing and validation. This role blends competitive programming expertise, bilingual technical writing (Chinese and English), and rigorous quality assurance to build fair, scalable benchmarks for real-world and research use cases.


About the Role

• Design function-level algorithm problems spanning foundational to advanced topics in algorithms and data structures.
• Write precise, bilingual (Chinese and English) problem statements with unambiguous input/output formats and constraints.
• Build comprehensive test datasets covering edge and extreme cases to ensure correctness and discourage cheating.
• Provide reference solutions, implement automatic validators/checkers, and perform difficulty grading and quality assessment.
• Participate in cross-reviews to maintain scientific validity, fairness, and innovation across the benchmark suite.

Key Responsibilities

• Propose and refine problem ideas; abstract vague concepts into clear algorithmic tasks at the function level.
• Author bilingual problem descriptions with explicit formats, constraints, and examples.
• Design robust test suites: small/large cases, adversarial inputs, randomized sets, and hack-style instances.
• Implement reference solutions and auxiliary tooling (validators/checkers, generators) in Python or C/C++; a minimal validator sketch follows this list.
• Calibrate difficulty and quality via internal rubrics; monitor performance across solution variants.
• Collaborate in peer reviews to ensure correctness, fairness, and novelty; iterate based on feedback.
• Document assumptions, edge conditions, time/space complexity expectations, and intended solution strategies.
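
To make the tooling expectation concrete, here is a minimal sketch of an input validator in Python for a hypothetical "find any pair summing to t" task. The task, bounds, and input format are illustrative assumptions, not an actual problem from this role.

```python
# Minimal input validator sketch for a hypothetical "pair summing to t" task.
# Assumed input format: first line "n t", second line n space-separated integers.
# Assumed bounds: 2 <= n <= 2*10^5, |a_i| <= 10^9 (illustrative only).
import sys

def validate(stream) -> None:
    tokens = stream.read().split()
    n, t = int(tokens[0]), int(tokens[1])  # ValueError/IndexError = invalid input
    assert 2 <= n <= 2 * 10**5, f"n out of range: {n}"
    assert abs(t) <= 2 * 10**9, f"t out of range: {t}"
    values = tokens[2:]
    assert len(values) == n, "value count does not match n"
    for tok in values:
        a = int(tok)  # raises ValueError on malformed integers
        assert abs(a) <= 10**9, f"a_i out of range: {a}"

if __name__ == "__main__":
    validate(sys.stdin)
    print("OK")
```

A strict validator like this runs over every generated test file, so malformed data never reaches solvers or models.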

Required Skills

• Proven experience creating or reviewing problems for programming contests (e.g., ACM/ICPC, Codeforces).
• Strong command of algorithms and data structures, with specialization in at least one area (e.g., graph theory, dynamic programming, string algorithms).
• Proficiency in Python or C/C++ for efficient algorithm implementation and for writing validators/checkers.
• Excellent problem abstraction skills: able to convert high-level ideas into precise, testable specifications.
• Ability to write clear technical content in both Chinese and English.

Professional Background

• Background in Computer Science, Software Engineering, Mathematics, or Artificial Intelligence, or experience as an Algorithm Engineer or algorithm-focused Software R&D Engineer.
• Experience evaluating code generation models is a plus.

Bonus Points

• Understanding of the capability boundaries and common failure modes of major Code LLMs.
• Experience writing algorithm blogs or teaching algorithm-related courses.
• Prior work in benchmark design or contributions to open-source Online Judge (OJ) systems.
• Demonstrated skill in crafting complex problems that assess logical reasoning in AI models.

Compensation and Work Setup

• Compensation: USD $120–150/day, based on skills and experience.
• Work Model: Fully remote.
• Collaboration: Part of a distributed team with periodic reviews and feedback cycles.

Frequently Asked Questions

  • Q: Is this role fully remote?

    Yes. The position is fully remote, with collaboration handled via online tools and scheduled cross-reviews.

  • Q: Is bilingual proficiency required?

    Yes. You will write problem statements and specifications in both Chinese and English, so clear technical writing in both languages is essential.

  • Q: What programming languages are preferred?

    Python or C/C++ is required for reference solutions and validators/checkers. Familiarity with scripting for test generation is helpful.

  • Q: What does a complete problem deliverable include?

    A bilingual statement (ZH/EN), explicit I/O formats and constraints, sample cases, reference solution(s), validators/checkers, data generators, comprehensive test sets (including edge and adversarial cases), difficulty tagging, and brief editorial notes outlining intended approaches.
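
As one concrete illustration of the generator piece of that deliverable, here is a sketch of a seeded Python generator for the same hypothetical pair-sum task used in the validator sketch above; the case mix, bounds, and file naming are assumptions for illustration.

```python
# Seeded test generator sketch for the hypothetical pair-sum task.
# A fixed-seed RNG keeps every test reproducible for cross-review.
import random

def gen_case(n: int, lo: int, hi: int, seed: int) -> str:
    rng = random.Random(seed)
    values = [rng.randint(lo, hi) for _ in range(n)]
    # Plant a guaranteed answer: pick two positions and use their sum as t.
    i, j = rng.sample(range(n), 2)
    t = values[i] + values[j]
    return f"{n} {t}\n{' '.join(map(str, values))}\n"

if __name__ == "__main__":
    cases = [gen_case(10, -100, 100, seed=s) for s in range(5)]   # small randomized cases
    cases.append(gen_case(2 * 10**5, -10**9, 10**9, seed=42))     # extreme case at the bounds
    for idx, data in enumerate(cases, start=1):
        with open(f"{idx:02d}.in", "w") as f:   # assumed naming scheme
            f.write(data)
```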

  • Q: How is difficulty determined?

    Difficulty is proposed by the author and calibrated through internal rubrics and cross-reviews, considering algorithmic complexity, implementation traps, and solver performance.

  • Q: Do I need prior experience evaluating code generation models?

    It is not required but considered a plus, especially for designing tests that surface typical LLM failure modes.

  • Q: Which algorithm domains are most relevant?

    Breadth is valuable, but specialization in at least one core area—such as graph theory, dynamic programming, or string algorithms—is expected.

  • Q: How are anti-cheating and robustness addressed?

    Through carefully designed constraints, randomized and adversarial datasets, strict validators/checkers, and coverage of edge/extreme cases to discourage hard-coded or heuristic shortcuts.
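
For illustration, here is a sketch of a property-based checker in Python for the same hypothetical pair-sum task: verifying the defining property of the answer, rather than diffing against one stored output, accepts any valid pair and makes hard-coding leaked outputs useless.

```python
# Property-based checker sketch for the hypothetical pair-sum task.
# Exit code 0 = accepted, 1 = wrong answer (a common OJ convention,
# assumed here; real judges define their own protocol).
import sys

def check(input_path: str, output_path: str) -> int:
    with open(input_path) as f:
        n, t = map(int, f.readline().split())
        a = list(map(int, f.readline().split()))
    with open(output_path) as f:
        i, j = map(int, f.read().split())  # malformed output crashes = rejected
    if not (0 <= i < n and 0 <= j < n and i != j):
        return 1  # wrong answer: indices out of range or equal
    if a[i] + a[j] != t:
        return 1  # wrong answer: the claimed pair does not sum to t
    return 0  # accepted: any valid pair passes

if __name__ == "__main__":
    sys.exit(check(sys.argv[1], sys.argv[2]))
```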

  • Q: What is the compensation structure?

    The role offers USD $120–150 per day, commensurate with skills and experience.

230+ Domains Covered
120K+ PhDs, Specialists, and Experts Onboarded
50+ Countries Represented

Industry-Leading Compensation

We believe exceptional intelligence deserves exceptional pay. Our platform consistently offers rates above the industry average, rewarding experts for their true value and real impact on frontier AI. Here, your expertise isn’t just appreciated—it's properly compensated.

Work Remotely, Work Freely

No office. No commute. No constraints. Our fully remote workflow gives experts complete flexibility to work at their own pace, from any country, any time zone. You focus on meaningful tasks—we handle the rest.

Respect at the Core of Everything

AI trainers are the heart of our company. We treat every expert with trust, humanity, and genuine appreciation. From personalized support to transparent communication, we build long-term relationships rooted in respect and care.

Ready to shape the future of code annotation?

Apply below.

I'M INTERESTED