AI Prompt Engineer Jobs in the United States

AI prompt engineers design, test, and optimize prompts and evaluation workflows that improve large language model behavior in real production settings. On Rex.zone, these remote, full-time roles support LLM training pipelines through prompt engineering, RLHF-style human feedback, prompt evaluation, QA evaluation, and content safety labeling across NLP, tool use, and multimodal use cases. You will translate product requirements into prompt strategies, build prompt test suites, measure model performance improvement, and iterate with engineers and researchers to reduce hallucinations and increase instruction-following. Explore Rex.zone to find United States prompt engineer jobs spanning AI labs, tech startups, BPOs, and annotation vendors, including contract, freelance, entry-level, and senior pathways.


AI Prompt Engineer Jobs in the United States (Remote)

Title: AI Prompt Engineer Jobs in the United States
Date: 25-02-2026
Company: Rexzone
Country: US
Remote Type: Remote
Employment Type: FULL_TIME
Experience Level: Mid-Senior
Industry: Technology
Job Function: Engineering
Skills: Prompt Engineering, LLM Evaluation, RLHF, Prompt Evaluation, QA Evaluation, LLM Training Pipelines, NLP, Content Safety Labeling, Tool Use, RAG
Salary Currency: USD
Salary Min: 63360
Salary Max: 126720
Pay Period: YEAR

About the Role

You will own prompt engineering and prompt evaluation workflows for LLM-powered features, focusing on reliability, safety, and measurable model performance improvement. This includes writing and versioning system prompts, building prompt libraries, crafting adversarial and edge-case test prompts, and running A/B prompt experiments. You will partner with engineering to integrate prompts into applications (tool calling, function calling, RAG), and with data teams to define labeling taxonomies and QA evaluation rubrics aligned to annotation guidelines compliance. You will help operationalize human feedback loops similar to RLHF, including preference data collection, rubric-based scoring, and prompt regression testing to prevent quality drift.
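Prompt regression testing of the kind described above can start very simply: pin expected behaviors to a small gold set and re-run it whenever a prompt changes. A minimal Python sketch, where `call_model` is a hypothetical stub standing in for a real LLM client call:

```python
# Minimal prompt regression check: re-run a gold set whenever a prompt changes
# and flag any case whose output no longer satisfies its acceptance check.

def call_model(system_prompt: str, user_input: str) -> str:
    # Stub for illustration only; a real implementation would call an LLM API.
    if "summarize" in system_prompt.lower():
        return user_input.split(".")[0] + "."
    return user_input

GOLD_SET = [
    {
        "input": "The launch succeeded. Crews celebrated afterward.",
        # Acceptance check: the summary must be shorter than the input.
        "check": lambda out, inp: len(out) < len(inp),
    },
]

def run_regression(system_prompt: str) -> list[bool]:
    """Return pass/fail for each gold-set case under the given prompt."""
    return [case["check"](call_model(system_prompt, case["input"]), case["input"])
            for case in GOLD_SET]

results = run_regression("Summarize the user's text in one sentence.")
```

In practice the gold set grows to cover edge cases and adversarial inputs, and a failing check blocks the prompt change from shipping, which is what prevents quality drift.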

Key Responsibilities

  • Design and iterate system prompts, developer prompts, and evaluation prompts for instruction-following, reasoning, and tool use.
  • Create prompt test suites with coverage for safety, policy, and content quality, including red-teaming-style adversarial prompts.
  • Define evaluation criteria and scoring rubrics for prompt evaluation, QA evaluation, and human feedback collection.
  • Analyze model outputs for failure modes (hallucinations, refusal errors, policy gaps) and propose mitigations.
  • Collaborate on RLHF-adjacent workflows: preference ranking, rubric scoring, and dataset iteration with labeling teams.
  • Work with engineers to integrate prompts into production workflows (RAG grounding, retrieval prompts, tool schemas, guardrails).
  • Document prompt standards, prompt versioning practices, and prompt deployment checklists for repeatable operations.
  • Contribute to content safety labeling guidance and escalation procedures for sensitive content.
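Rubric-based scoring, mentioned in several of the responsibilities above, often reduces to a weighted average of per-dimension human ratings. A small sketch; the dimensions and weights here are illustrative assumptions, not a standard rubric:

```python
# Hypothetical rubric aggregation: combine per-dimension human ratings (1-5)
# into a single weighted score, the kind of record a QA pipeline might store.

RUBRIC = {
    "instruction_following": 0.4,
    "factuality": 0.4,
    "tone": 0.2,
}

def rubric_score(ratings: dict[str, int]) -> float:
    """Weighted average of 1-5 ratings across all rubric dimensions."""
    missing = set(RUBRIC) - set(ratings)
    if missing:
        raise ValueError(f"missing rubric dimensions: {missing}")
    return sum(RUBRIC[dim] * ratings[dim] for dim in RUBRIC)

score = rubric_score({"instruction_following": 5, "factuality": 4, "tone": 3})
```

Requiring every dimension to be rated (rather than averaging whatever is present) keeps scores comparable across raters and across prompt versions.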

Required Qualifications

  • Experience designing prompts for LLM applications and evaluating output quality with structured rubrics.
  • Strong understanding of NLP concepts, LLM behavior, and common failure modes.
  • Ability to build practical evaluation frameworks (gold sets, acceptance tests, regression suites) for prompt-driven systems.
  • Comfort working with cross-functional partners across engineering, product, and AI/ML data operations.
  • Clear writing skills for prompt guidelines, annotation guidelines compliance, and QA documentation.

Preferred Qualifications

  • Hands-on experience with RLHF concepts, preference data, and evaluation datasets.
  • Experience with RAG, embeddings, retrieval quality, and grounding strategies.
  • Familiarity with content safety labeling, policy taxonomies, and risk-based evaluation.
  • Experience with multimodal prompting (text + image) or computer vision annotation workflows.
  • Ability to use lightweight scripting or notebooks to analyze evaluation results and model outputs.

Workflows and Domains You May Support

  • NLP prompt engineering for summarization, extraction, classification, and dialogue.
  • Named entity recognition-style extraction via structured prompting and schema validation.
  • Prompt evaluation and QA evaluation for customer support, sales enablement, and internal productivity copilots.
  • LLM training pipelines with human feedback loops, rubric scoring, and prompt regression testing.
  • Content safety labeling and policy-aligned refusals, including sensitive and regulated topics.
  • Multimodal prompting and CV-adjacent tasks such as image captioning evaluation or visual QA.
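The NER-style extraction workflow above typically prompts the model to return JSON and then validates that JSON against a schema before accepting it. A lightweight sketch using only the standard library; the entity types and field names are illustrative assumptions:

```python
# NER-style extraction via structured prompting: ask the model for a JSON
# list of entities, then validate each record against a lightweight schema
# before accepting the output downstream.
import json

REQUIRED_FIELDS = {"text": str, "type": str}
ALLOWED_TYPES = {"PERSON", "ORG", "LOCATION"}

def validate_entities(raw: str) -> list[dict]:
    """Parse model output and reject records that violate the schema."""
    entities = json.loads(raw)
    if not isinstance(entities, list):
        raise ValueError("expected a JSON list of entities")
    for ent in entities:
        for field, ftype in REQUIRED_FIELDS.items():
            if not isinstance(ent.get(field), ftype):
                raise ValueError(f"bad or missing field: {field}")
        if ent["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown entity type: {ent['type']}")
    return entities

model_output = '[{"text": "Rex.zone", "type": "ORG"}]'
entities = validate_entities(model_output)
```

Validation failures are themselves useful evaluation signals: a rising rejection rate after a prompt change is exactly what a prompt regression suite should catch.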

Employment Types and Common Modifiers

This page covers remote, full-time roles, and also reflects common market modifiers: contract, freelance, entry-level, and senior prompt engineer opportunities. Employers may include AI labs, tech startups, BPOs, and annotation vendors, depending on the project and the LLM deployment stage.

How to Apply on Rex.zone

Create or update your Rex.zone profile, highlight prompt engineering portfolios, and include examples of prompt test suites, evaluation rubrics, and measurable improvements. Search for United States roles filtered by remote type, employment type, domain (NLP, content safety, tool use, RAG), and experience level, then apply directly through the Rex.zone job flow.

Frequently Asked Questions

  • Q: What does an AI prompt engineer do in a real LLM workflow?

    They design prompts and guardrails, run prompt evaluation and QA evaluation, and iterate based on human feedback loops (often RLHF-adjacent) to improve model reliability, safety, and task success in production.

  • Q: Are these AI prompt engineer jobs in the United States remote?

    Yes. The roles on this page are marked Remote and located in the US, aligning with remote hiring and distributed LLM product teams.

  • Q: What skills should match the AI prompt engineer keyword intent?

    Prompt engineering, LLM evaluation, RLHF concepts, prompt evaluation, QA evaluation, NLP, content safety labeling, and LLM training pipelines are core, with tool use and RAG commonly required in production apps.

  • Q: How is prompt evaluation different from QA evaluation?

    Prompt evaluation focuses on how prompt changes affect model behavior and outcomes, while QA evaluation checks output quality against defined standards, rubrics, and acceptance criteria to ensure consistency and compliance.

  • Q: Do prompt engineers work with data labeling teams?

    Often yes. Prompt engineers may define rubrics, labeling taxonomies, and annotation guidelines compliance so that human feedback data is consistent and useful for improving model behavior.

  • Q: What domains commonly hire prompt engineers?

    NLP-heavy products, content safety programs, AI assistants with tool use, RAG-based knowledge systems, and multimodal applications that require careful evaluation and controlled output behavior.

  • Q: Can entry-level or contract candidates find opportunities too?

    Yes. While this page focuses on full-time mid-senior hiring intent, the market commonly includes contract, freelance, entry-level, and senior roles depending on project scope and evaluation maturity.

  • Q: How should candidates present a portfolio for prompt engineering?

    Include prompt libraries, before/after examples, prompt test suites, evaluation rubrics, regression testing results, and any measurable model performance improvement from prompt iterations.

230+ Domains Covered
120K+ PhDs, Specialists, and Experts Onboarded
50+ Countries Represented

Industry-Leading Compensation

We believe exceptional intelligence deserves exceptional pay. Our platform consistently offers rates above the industry average, rewarding experts for their true value and real impact on frontier AI. Here, your expertise isn't just appreciated—it's properly compensated.

Work Remotely, Work Freely

No office. No commute. No constraints. Our fully remote workflow gives experts complete flexibility to work at their own pace, from any country, any time zone. You focus on meaningful tasks—we handle the rest.

Respect at the Core of Everything

AI trainers are the heart of our company. We treat every expert with trust, humanity, and genuine appreciation. From personalized support to transparent communication, we build long-term relationships rooted in respect and care.

Ready to Shape the Future of AI/ML Engineering?

Apply Now.