Repo-based Code Annotator

Remote opportunity for experienced engineers to build reproducible Docker-based test environments, strengthen unit test coverage, validate SWE-Bench/Terminal-Bench workflows, and write clear, standardized task documentation. Compensation: USD $80–$120 per day, based on skills and experience.


About the Role

As a Repo-based Code Annotator, you will design and maintain reproducible, standardized test environments that replicate known issues or produce expected outputs according to defined procedures. You will review and improve unit test coverage, validate the completeness and soundness of test sets, and ensure that task workflows align precisely with SWE-Bench and Terminal-Bench requirements. Documentation quality and reproducibility are central to the role.

Key Responsibilities

• Build deterministic Docker images and environments to reproduce issues and generate expected outputs.
• Review, extend, and refactor unit tests to evaluate correctness, stability, and coverage of target code.
• Validate test set completeness and soundness; align task workflows with SWE-Bench and Terminal-Bench requirements.
• Write clear, standardized documentation (e.g., task.yaml and README) to ensure consistency and reproducibility.
• Develop Python-based task harnesses, automation scripts, and utilities supporting testing workflows.
• Maintain clean Git/GitHub practices, producing high-quality, reproducible pull requests.
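One concrete pattern for the deterministic images described above, as a minimal sketch only: the base image, file names, and versions are placeholders, not a prescribed setup.

```dockerfile
# Sketch of a reproducible test image. For strict determinism, pin the
# base image by digest (e.g. python:3.11-slim@sha256:...) rather than by tag.
FROM python:3.11-slim
WORKDIR /app
# Install dependencies from a fully pinned requirements file so every
# rebuild resolves to the same package versions.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the repository under test and run its suite by default.
COPY . .
CMD ["pytest", "-q"]
```

Pinning versions and avoiding network fetches at test time are what make outputs comparable across machines and over time.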

Required Skills

• Strong proficiency with Linux command line and Shell scripting; comfort with tools such as grep, sed, awk, curl, and jq.
• Expert-level Python for building task harnesses, writing unit tests, and automating workflows.
• Solid Docker experience, including authoring Dockerfiles and building reproducible environments.
• Familiarity with pytest (or similar), with techniques for mocking data and controlling randomness.
• Competence with Git/GitHub workflows for collaborative, reproducible development.
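The mocking and randomness-control techniques mentioned above can be sketched as follows; the helper and test names are hypothetical, purely for illustration.

```python
import random
import time


def pick_sample(items, k, seed=0):
    """Hypothetical helper: an explicit seed makes sampling reproducible."""
    rng = random.Random(seed)
    return rng.sample(items, k)


def test_pick_sample_is_deterministic():
    # Same seed, same result: the test cannot flake on randomness.
    items = list(range(100))
    assert pick_sample(items, 5, seed=42) == pick_sample(items, 5, seed=42)


def test_time_is_mocked(monkeypatch):
    # pytest's monkeypatch fixture replaces a nondeterministic dependency
    # with a fixed value for the duration of the test.
    monkeypatch.setattr(time, "time", lambda: 1_700_000_000.0)
    assert time.time() == 1_700_000_000.0
```

Seeding a local `random.Random` instance (rather than the global generator) keeps tests isolated from each other, which matters when suites run in parallel.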

Professional Background

• Degree or equivalent experience in Computer Science, Software Engineering, Artificial Intelligence, or related fields.
• Relevant experience in Software Development, Test Engineering, DevOps, or Data Engineering.
• Preference for contributors to open-source projects, especially in automated testing, CI/CD, and containerization.

Bonus Points

• Proficiency in Go or Rust for performance-critical tooling.
• Familiarity with Docker Compose, Podman, and other sandbox technologies.
• Ability to design datasets and tasks that mitigate task cheating.
• Understanding of scientific benchmark design principles: fairness, repeatability, scalability.
• Experience with automated testing systems or CI/CD, and a cross-disciplinary perspective.

Compensation

USD $80–$120 per day, commensurate with demonstrated skills and experience. The rate reflects the role's emphasis on reproducibility, robust testing, precise workflow alignment, and comprehensive documentation.

Work Setup & Collaboration

• Fully remote, suitable for distributed teams and asynchronous collaboration.
• Collaboration through GitHub issues, pull requests, and code reviews.
• Documentation-first approach with standardized procedures and reproducible outcomes.
• Emphasis on deterministic builds, clear test artifacts, and traceable changes.

Tools & Technologies

• Linux, Shell scripting (grep, sed, awk, curl, jq).
• Python, pytest, and testing utilities for mocking and randomness control.
• Docker/Dockerfiles; familiarity with Docker Compose/Podman is a plus.
• Git/GitHub, CI/CD systems.
• SWE-Bench and Terminal-Bench task workflows and validation.

Success Indicators

• Deterministic, reproducible builds and environments.
• Measurable improvements in unit test coverage and stability.
• Clear, actionable task.yaml/README documentation that enables consistent execution.
• Validated test sets and workflows aligned with SWE-Bench/Terminal-Bench guidelines.
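As an illustration of the documentation artifacts listed above, a task description might look like the sketch below. The field names are hypothetical, not an official SWE-Bench or Terminal-Bench schema; the benchmark's own guidelines take precedence.

```yaml
# Illustrative only: field names are placeholders, not an official schema.
task_id: example-repo-issue-1234
description: >
  Reproduce issue #1234 in a pinned environment and verify the fix
  with the targeted unit tests.
environment:
  image: example/repro-env:1.0   # deterministic image built for this task
run:
  - pip install -e .
  - pytest tests/test_issue_1234.py -q
success: all targeted tests pass
```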

Location & Schedule

• Remote role with flexibility across time zones.
• Output-focused collaboration; occasional overlap for reviews or syncs may be requested.

Frequently Asked Questions

  • Q: What does a Repo-based Code Annotator do day-to-day?

    You will build reproducible Docker environments to replicate issues or expected outputs, create and refine unit tests, validate SWE-Bench/Terminal-Bench task workflows, write task.yaml/README documentation, and implement Python-based harnesses and automation with reproducible Git/GitHub practices.

  • Q: How strong should my Python and Docker skills be?

    You should be comfortable authoring Dockerfiles, building deterministic images, and writing Python test harnesses and automation. Familiarity with pytest, mocking, and controlling randomness is expected.

  • Q: Is this position fully remote?

    Yes. The role is fully remote and suited to asynchronous collaboration across different time zones.

  • Q: What is the compensation range?

    USD $80–$120 per day, based on demonstrated skills and relevant experience.

  • Q: Which tools are used most frequently?

    Linux CLI tools (grep, sed, awk, curl, jq), Python, pytest, Docker, Git/GitHub, and CI/CD systems. Familiarity with SWE-Bench and Terminal-Bench workflows is beneficial.

  • Q: Are open-source contributions required?

    They are not required but are preferred—especially contributions in automated testing, CI/CD, or containerization—as they demonstrate strong reproducibility and code quality practices.

  • Q: How is success measured in this role?

    Success includes deterministic builds, improved test coverage and stability, validated and sound test sets, benchmark-aligned workflows, and high-quality documentation that enables reproducible execution.

  • Q: Are Go or Rust needed for this role?

    They are not required, but proficiency in Go or Rust is a plus for performance-focused utilities.

  • Q: Will I design tasks or datasets?

    Yes, you may. Designs should consider preventing task cheating and follow benchmark principles such as fairness, repeatability, and scalability.

  • Q: What Git/GitHub workflow is expected?

    Use standard branching, clear commits, thorough tests, and documentation. Pull requests should be reproducible and easy to review.

230+ Domains Covered
120K+ PhDs, Specialists, and Experts Onboarded
50+ Countries Represented

Industry-Leading Compensation

We believe exceptional intelligence deserves exceptional pay. Our platform consistently offers rates above the industry average, rewarding experts for their true value and real impact on frontier AI. Here, your expertise isn’t just appreciated—it's properly compensated.

Work Remotely, Work Freely

No office. No commute. No constraints. Our fully remote workflow gives experts complete flexibility to work at their own pace, from any country, any time zone. You focus on meaningful tasks—we handle the rest.

Respect at the Core of Everything

AI trainers are the heart of our company. We treat every expert with trust, humanity, and genuine appreciation. From personalized support to transparent communication, we build long-term relationships rooted in respect and care.

Ready to shape the future of code annotation?

Apply below.

I'M INTERESTED