
AI Gateway ADR 002: Exposing proxy endpoints to AI providers

Summary

AI Gateway exposes proxy endpoints to AI providers so that existing client libraries in GitLab-Rails can access them. This is a drop-in replacement that should be used until stage groups move to single-purpose endpoints. We are deviating from our ultimate desired architecture in order to bring these features to market for self-managed GitLab instances faster.

Context

The original iteration of the blueprint suggested having a single-purpose endpoint for each AI-powered feature. There were multiple reasons for this:

  • Avoid hard-coding AI-related logic in the GitLab monolith codebase to minimize the time required for customers to adopt our latest features.
  • Retain the flexibility to make changes in our product without breaking support for a long-tail of older instances.

In issue 454543, we discussed various options to enable existing AI features in self-managed GitLab.

Decision

In the issue, we decided to introduce proxy endpoints to AI providers so that our Ruby client libraries Anthropic::Client and VertexAi::Client work as-is. The reasons are:

  • It's challenging to rewrite the existing business logic in the Python AI Gateway:
    • Some of the business logic uses dependencies that are only available in the GitLab monolith (e.g. Feature Flag, Caching in Redis). We would have to work around these dependencies, which is error-prone.
    • Due to the heavy use of inheritance in the Gitlab::Llm namespace, it's hard to extract the business logic that actually takes effect.
    • We lack a tool to evaluate whether the quality and functionality of the feature remain consistent before and after changes.
  • Duo Chat became GA even though the existing POST /v1/chat/agent endpoint serves as a proxy endpoint. Technically, this is not a single-purpose endpoint yet.

Technical details

Here is an overview of the request flow:

flowchart LR
    subgraph AIGateway
        Proxy["Proxy"]
    end

    subgraph Provider1["Anthropic"]
    direction LR
    Model1(["Claude 2.1"])
    end

    subgraph Provider2["VertexAI"]
    direction LR
    Model2(["text-bison"])
    end

  subgraph SM or SaaS GitLab
    DuoFeatureA["Duo feature A"]
    DuoFeatureB["Duo feature B"]
  end

  DuoFeatureA -- POST /v1/proxy/anthropic/v1/complete --- Proxy
  DuoFeatureB -- POST /v1/proxy/vertex-ai/v1/text-bison:predict --- Proxy
  Proxy -- POST /v1/complete --- Provider1
  Proxy -- POST /v1/text-bison:predict --- Provider2
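
The path mapping in the diagram can be sketched as follows. This is a minimal illustration, not the AI Gateway implementation: the function name `split_proxy_path` and the constants are hypothetical, and the only behavior assumed is what the diagram shows, namely that the `/v1/proxy/{provider}` prefix is removed and the remainder is forwarded to the provider.

```python
from typing import Optional, Tuple

# Hypothetical constants; the prefix and provider names are taken from the diagram.
PROXY_PREFIX = "/v1/proxy/"
PROVIDERS = ("anthropic", "vertex-ai")


def split_proxy_path(path: str) -> Optional[Tuple[str, str]]:
    """Split a gateway path into (provider, upstream_path).

    Returns None when the path is not a proxy path for a known provider.
    """
    if not path.startswith(PROXY_PREFIX):
        return None
    provider, sep, rest = path[len(PROXY_PREFIX):].partition("/")
    if provider not in PROVIDERS or not sep or not rest:
        return None
    # The upstream path is everything after the provider segment.
    return provider, "/" + rest
```

For example, `split_proxy_path("/v1/proxy/anthropic/v1/complete")` yields the provider `anthropic` and the upstream path `/v1/complete`.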

Anthropic

Expose the following HTTP/1.1 endpoint in AI Gateway:

POST /v1/proxy/anthropic/(*path)

path can be forwarded to the following endpoints:

Vertex AI

Expose the following HTTP/1.1 endpoint in AI Gateway:

POST /v1/proxy/vertex-ai/(*path)

path can be forwarded to the following endpoints:

  • /v1/{endpoint}:predict
    • endpoint must be one of: chat-bison, code-bison, codechat-bison, text-bison, textembedding-gecko@003.
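
The endpoint allowlist above could be checked along these lines. This is a sketch under assumptions: the helper name `is_allowed_vertex_path` and the regular expression are illustrative only, and the check is applied to the upstream path after the `/v1/proxy/vertex-ai` prefix has been removed.

```python
import re

# Endpoint names copied from the allowlist above.
ALLOWED_VERTEX_ENDPOINTS = frozenset({
    "chat-bison",
    "code-bison",
    "codechat-bison",
    "text-bison",
    "textembedding-gecko@003",
})

# Matches /v1/{endpoint}:predict and captures {endpoint}.
_PREDICT_PATH = re.compile(r"/v1/(?P<endpoint>[^/:]+):predict")


def is_allowed_vertex_path(upstream_path: str) -> bool:
    """Return True only for /v1/{endpoint}:predict with an allowlisted endpoint."""
    match = _PREDICT_PATH.fullmatch(upstream_path)
    return match is not None and match.group("endpoint") in ALLOWED_VERTEX_ENDPOINTS
```

A path that fails this check would be treated as unsupported by the proxy.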

Common behavior

  • Request body is sent to AI providers as-is.
  • Request headers are filtered or replaced by AI Gateway. For example, only accept, content-type, and anthropic-version are allowed and the rest are filtered out; x-api-key is added.
  • Response body is returned to clients as-is.
  • Response headers are filtered or replaced by AI Gateway. For example, only date, content-type, and transfer-encoding are allowed and the rest are filtered out.
  • Response status is returned to clients as-is.
  • HTTP Streaming is supported.
  • If an unsupported path is specified, AI Gateway responds with a 404 Not Found error.
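
The header handling described above can be sketched as an allowlist filter. The function names are hypothetical and the allowlists contain only the example headers named in the list, so a real implementation may allow more. Header names are compared case-insensitively, as HTTP requires.

```python
from typing import Dict, FrozenSet, Mapping

# Example allowlists taken from the bullets above; not necessarily exhaustive.
ALLOWED_REQUEST_HEADERS = frozenset({"accept", "content-type", "anthropic-version"})
ALLOWED_RESPONSE_HEADERS = frozenset({"date", "content-type", "transfer-encoding"})


def filter_headers(headers: Mapping[str, str], allowed: FrozenSet[str]) -> Dict[str, str]:
    """Keep only allowlisted headers, normalizing names to lower case."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() in allowed
    }


def build_anthropic_request_headers(
    client_headers: Mapping[str, str], api_key: str
) -> Dict[str, str]:
    """Filter client headers, then add the gateway's own x-api-key."""
    headers = filter_headers(client_headers, ALLOWED_REQUEST_HEADERS)
    # Any x-api-key sent by the client was dropped above; the gateway's key is used.
    headers["x-api-key"] = api_key
    return headers
```

Because the client's headers are filtered before `x-api-key` is added, a client cannot supply its own provider credentials or forward its GitLab JWT to the provider.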

Access control

  • Clients must send a JWT issued by GitLab.com or Customer Dot.
    • This JWT contains scopes that indicate the permissions given to the GitLab instance. These scopes vary per Duo subscription tier.
    • To access these proxy endpoints, scopes must include one of: explain_vulnerability, resolve_vulnerability, generate_description, summarize_all_open_notes, summarize_submitted_review, generate_commit_message, summarize_review, fill_in_merge_request_template, analyze_ci_job_failure.
    • Requests that do not meet the specified criteria will result in a 401 Unauthorized Access error.
  • Clients must send the X-Gitlab-Feature-Usage header in HTTP requests.
    • This header indicates the purpose of the API request.
    • To access these proxy endpoints, X-Gitlab-Feature-Usage must be one of: explain_vulnerability, resolve_vulnerability, generate_description, summarize_all_open_notes, summarize_submitted_review, generate_commit_message, summarize_review, fill_in_merge_request_template, analyze_ci_job_failure.
    • Requests that do not meet the specified criteria will result in a 401 Unauthorized Access error.
  • For logging, we add the value of the X-Gitlab-Feature-Usage header to the access logs in AI Gateway.
  • For metrics, we instrument the concurrent requests with ModelRequestInstrumentator and input/output tokens with TextGenModelInstrumentator in AI Gateway. These metrics should be labeled with X-Gitlab-Instance-Id, X-Gitlab-Global-User-Id and X-Gitlab-Feature-Usage.
  • For telemetry, we add Internal Event Tracking for each feature in GitLab-Rails. Alternatively, we could use the existing Snowplow tracker in AI Gateway, which requires additional work to introduce a unified schema.
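
The two access checks above (JWT scopes and the X-Gitlab-Feature-Usage header) can be sketched as a single predicate. This is an illustration only: `is_authorized` is a hypothetical name, the JWT is assumed to have been verified and decoded already, and the sketch implements exactly the two conditions stated in the list. Whether the header value must also appear in the token's scopes is not specified here, so that check is not included.

```python
from typing import Iterable, Optional

# Feature names copied from the list above.
ALLOWED_FEATURES = frozenset({
    "explain_vulnerability",
    "resolve_vulnerability",
    "generate_description",
    "summarize_all_open_notes",
    "summarize_submitted_review",
    "generate_commit_message",
    "summarize_review",
    "fill_in_merge_request_template",
    "analyze_ci_job_failure",
})


def is_authorized(scopes: Iterable[str], feature_usage: Optional[str]) -> bool:
    """Return True if the request may use the proxy endpoints.

    A False result corresponds to the 401 response described above.
    """
    # Condition 1: the JWT scopes include at least one allowed feature.
    if not ALLOWED_FEATURES.intersection(scopes):
        return False
    # Condition 2: the X-Gitlab-Feature-Usage header names an allowed feature.
    return feature_usage in ALLOWED_FEATURES
```

A request with the scope `explain_vulnerability` and the header value `explain_vulnerability` passes, while a request with a missing header or with only unrelated scopes is rejected.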

For further access control improvements, see this issue.

Consequences

  • Experimental AI features are enabled on self-managed instances.
  • Stage groups can start improving the business logic of their features. The proxy work can proceed in parallel.
  • Stage groups don't need to rush refactoring business logic in Python AI Gateway for GA release. They can take time post-GA.
  • We can detect abusers by checking X-Gitlab-Instance-Id, X-Gitlab-Global-User-Id and X-Gitlab-Feature-Usage in logs and metrics.
  • We can block abusers by gating access at the Cloud Connector LB (Cloudflare) or in AI Gateway middleware.