Bulk export handles the full pipeline: discover conversations in your LangSmith project, export the complete run trees (including child LLM and tool runs), parse into Trajectories, and upload to the Trajectory platform.
Prerequisites
Before running a bulk export, you need three pieces of information from LangSmith:
- API key — your LangSmith API key (starts with lsv2_sk_... or lsv2_pt_...)
- Workspace ID — the tenant/workspace UUID
- Destination ID — a pre-configured bulk export destination UUID
Finding your Workspace ID
- Go to LangSmith and open Settings (gear icon)
- Under Workspaces, select your workspace
- The workspace ID is in the URL: https://smith.langchain.com/o/.../workspaces/{workspace_id}
Alternatively, you can find it via the API:
import requests

resp = requests.get(
    "https://api.smith.langchain.com/workspaces",
    headers={"X-API-Key": "lsv2_sk_..."},
)
for ws in resp.json():
    print(f"{ws['display_name']}: {ws['id']}")
Finding your Destination ID
A bulk export destination tells LangSmith where to write the exported data (e.g. a GCS bucket). Destinations are configured in the LangSmith UI:
- Go to Settings → Bulk Exports in LangSmith
- Create or select an export destination (e.g. a GCS bucket)
- Copy the destination ID
You can also list existing destinations via the API:
import requests

resp = requests.get(
    "https://api.smith.langchain.com/api/v1/bulk-export-destinations",
    headers={
        "X-API-Key": "lsv2_sk_...",
        "X-Tenant-Id": "your-workspace-id",
    },
)
for dest in resp.json():
    print(f"{dest['display_name']}: {dest['id']}")
Finding your Project ID
- Go to your project in LangSmith
- The project ID is in the URL: https://smith.langchain.com/o/.../projects/p/{project_id}
E2E Bulk Export
With all three IDs, the full pipeline is three calls:
import trajectory_sdk as tj

tj.init(
    provider="langsmith",
    api_key="lsv2_sk_...",  # or set the LANGSMITH_API_KEY env var
    project_id="your-project-id",
    workspace_id="your-workspace-id",
    destination_id="your-destination-id",
    trajectory_api_key="your-trajectory-api-key",  # or set the TRAJECTORY_API_KEY env var
)

# Export everything, parse, and return Trajectories
trajectories = tj.import_conversations(bulk=True)

# Upload to the Trajectory platform
tj.upload(trajectories, dataset="my_dataset")
What happens under the hood
1. Discover trace IDs — lists all root runs in the project and collects their trace_id values. This is necessary because child runs (LLM calls, tool calls) don't carry thread_id metadata, so filtering by thread ID alone would miss them.
2. Trigger bulk export — sends a POST to the LangSmith bulk exports API with an in(trace_id, [...]) filter, which captures the full run tree for each conversation (parents and all children).
3. Poll for completion — the export job runs asynchronously. The SDK polls every 5 seconds until it completes (typically 1-2 minutes).
4. Download parquet — fetches the exported parquet file from the configured GCS destination bucket.
5. Parse into Trajectories — groups runs by conversation_id (from metadata) or trace_id, builds run trees, extracts messages, and constructs Trajectory objects using multiprocessing for speed.
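The polling step can be sketched as a simple loop. This is a minimal illustration, not the SDK's actual implementation; poll_until_done and get_status are hypothetical names, and the terminal status strings are assumptions:

```python
import time

def poll_until_done(get_status, interval=5, timeout=600):
    """Poll get_status() until the export job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("Completed", "Failed"):
            return status
        time.sleep(interval)  # the SDK waits 5 seconds between checks
    raise TimeoutError("bulk export did not finish within the timeout")
```

A bounded timeout like this is worth having in any polling loop so a stuck export job fails loudly instead of hanging forever.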
Time-Scoped Export
Export only recent conversations by passing the since parameter:
from datetime import timedelta
# Export conversations from the last hour
trajectories = tj.import_conversations(bulk=True, since=timedelta(hours=1))
You can also pass an absolute datetime:
from datetime import datetime

trajectories = tj.import_conversations(
    bulk=True,
    since=datetime(2025, 3, 1),
)
When since is omitted, the SDK exports all conversations in the project.
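Accepting either form means a relative since must be resolved against the current time before it can be used as a cutoff. A minimal sketch of that normalization (resolve_since is a hypothetical helper, not part of the SDK):

```python
from datetime import datetime, timedelta, timezone

def resolve_since(since):
    """Normalize `since` to an absolute UTC datetime, or None for 'all'."""
    if since is None:
        return None  # no cutoff: export everything
    if isinstance(since, timedelta):
        return datetime.now(timezone.utc) - since
    return since  # already an absolute datetime
```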
From a Local Parquet File
If you already have a parquet file (e.g. from a previous export or manual download), you can skip the export steps:
import trajectory_sdk as tj

tj.init(
    provider="langsmith",
    project_id="your-project-id",
)

trajectories = tj.import_conversations(
    bulk=True,
    source="./langsmith_export.parquet",
)

tj.save(trajectories, "./exports")
No workspace_id or destination_id is needed when providing a local file.
Upload
After importing, upload trajectories to the Trajectory platform:
tj.upload(trajectories, dataset="my_dataset")
trajectory_api_key must be set in tj.init() (or via the TRAJECTORY_API_KEY env var) for upload to work.
Limits and Caveats
- The LangSmith bulk export API has a maximum of 100 runs per page when listing root runs. The SDK auto-paginates through all pages.
- Export jobs are asynchronous and typically take 1-2 minutes to complete.
- The destination_id must point to a GCS bucket that your service account can read from.
- Empty conversations (root runs with no child LLM/tool runs) produce Trajectories with 0 steps. The upload step automatically skips these.
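If you want to drop zero-step trajectories yourself (e.g. before tj.save, which does not skip them the way upload does), a small sketch — Trajectory here is a stand-in dataclass, assuming the SDK's objects expose a steps list:

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """Stand-in for the SDK's Trajectory; assumes a `steps` list attribute."""
    steps: list = field(default_factory=list)

def drop_empty(trajectories):
    """Filter out trajectories with no steps."""
    return [t for t in trajectories if t.steps]
```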