Bulk export handles the full pipeline: discover conversations in your LangSmith project, export the complete run trees (including child LLM and tool runs), parse into Trajectories, and upload to the Trajectory platform.

Prerequisites

Before running a bulk export, you need four pieces of information from LangSmith:
  1. API key — your LangSmith API key (starts with lsv2_sk_... or lsv2_pt_...)
  2. Workspace ID — the tenant/workspace UUID
  3. Destination ID — a pre-configured bulk export destination UUID
  4. Project ID — the UUID of the tracing project to export from

Finding your Workspace ID

  1. Go to LangSmith and open Settings (gear icon)
  2. Under Workspaces, select your workspace
  3. The workspace ID is in the URL: https://smith.langchain.com/o/.../workspaces/{workspace_id}
Alternatively, you can find it via the API:
import requests
resp = requests.get(
    "https://api.smith.langchain.com/workspaces",
    headers={"X-API-Key": "lsv2_sk_..."},
)
for ws in resp.json():
    print(f"{ws['display_name']}: {ws['id']}")

Finding your Destination ID

A bulk export destination tells LangSmith where to write the exported data (e.g. a GCS bucket). Destinations are configured in the LangSmith UI:
  1. Go to Settings → Bulk Exports in LangSmith
  2. Create or select an export destination (e.g. a GCS bucket)
  3. Copy the destination ID
You can also list existing destinations via the API:
import requests

resp = requests.get(
    "https://api.smith.langchain.com/api/v1/bulk-export-destinations",
    headers={
        "X-API-Key": "lsv2_sk_...",
        "X-Tenant-Id": "your-workspace-id",
    },
)
for dest in resp.json():
    print(f"{dest['display_name']}: {dest['id']}")

Finding your Project ID

  1. Go to your project in LangSmith
  2. The project ID is in the URL: https://smith.langchain.com/o/.../projects/p/{project_id}

E2E Bulk Export

With all three IDs in hand, the end-to-end pipeline is three calls:
import trajectory_sdk as tj

tj.init(
    provider="langsmith",
    api_key="lsv2_sk_...",                  # or set LANGSMITH_API_KEY env var
    project_id="your-project-id",
    workspace_id="your-workspace-id",
    destination_id="your-destination-id",
    trajectory_api_key="your-trajectory-api-key",  # or set TRAJECTORY_API_KEY env var
)

# Export everything, parse, and return Trajectories
trajectories = tj.import_conversations(bulk=True)

# Upload to the Trajectory platform
tj.upload(trajectories, dataset="my_dataset")

What happens under the hood

  1. Discover trace IDs — lists all root runs in the project and collects their trace_id values. This is necessary because child runs (LLM calls, tool calls) don’t carry thread_id metadata, so filtering by thread ID alone would miss them.
  2. Trigger bulk export — sends a POST to the LangSmith bulk exports API with an in(trace_id, [...]) filter, which captures the full run tree for each conversation (parents and all children).
  3. Poll for completion — the export job runs asynchronously. The SDK polls every 5 seconds until it completes (typically 1-2 minutes).
  4. Download parquet — fetches the exported parquet file from the configured GCS destination bucket.
  5. Parse into Trajectories — groups runs by conversation_id (from metadata) or trace_id, builds run trees, extracts messages, and constructs Trajectory objects using multiprocessing for speed.
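The poll-until-complete step (3) can be sketched as a simple loop. This is an illustrative sketch, not the SDK's actual implementation; `get_status` is a hypothetical callable standing in for whatever internal status check the SDK performs against the bulk exports API:

```python
import time


def poll_until_complete(get_status, interval=5.0, timeout=600.0):
    """Poll a status callable every `interval` seconds until the export
    job reports 'completed', raising on failure or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("bulk export job failed")
        time.sleep(interval)
    raise TimeoutError("bulk export did not finish within the timeout")
```

With the documented defaults (5-second interval, jobs finishing in 1-2 minutes), a loop like this typically makes a few dozen status calls at most.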

Time-Scoped Export

Export only recent conversations by passing the since parameter:
from datetime import timedelta

# Export conversations from the last hour
trajectories = tj.import_conversations(bulk=True, since=timedelta(hours=1))
You can also pass an absolute datetime:
from datetime import datetime

trajectories = tj.import_conversations(
    bulk=True,
    since=datetime(2025, 3, 1),
)
When since is omitted, the SDK exports all conversations in the project.
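The two forms of since can be thought of as resolving to a single absolute cutoff. The helper below is a hypothetical sketch of that resolution logic, not the SDK's own code:

```python
from datetime import datetime, timedelta, timezone


def resolve_since(since):
    """Turn a `since` value (timedelta, datetime, or None) into an
    absolute UTC cutoff; None means 'export everything'."""
    if since is None:
        return None
    if isinstance(since, timedelta):
        # Relative window: subtract from the current time.
        return datetime.now(timezone.utc) - since
    # Already an absolute datetime.
    return since
```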

From a Local Parquet File

If you already have a parquet file (e.g. from a previous export or manual download), you can skip the export steps:
import trajectory_sdk as tj

tj.init(
    provider="langsmith",
    project_id="your-project-id",
)

trajectories = tj.import_conversations(
    bulk=True,
    source="./langsmith_export.parquet",
)

tj.save(trajectories, "./exports")
No workspace_id or destination_id is needed when providing a local file.
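When parsing a local export, the grouping rule described earlier (conversation_id from metadata when present, otherwise trace_id) looks roughly like the sketch below. The row shape is assumed for illustration; the SDK's actual parquet schema may differ:

```python
def group_runs(runs):
    """Group exported run rows into conversations, keyed by the
    conversation_id in run metadata when present, else the trace_id."""
    groups = {}
    for run in runs:
        metadata = run.get("metadata") or {}
        key = metadata.get("conversation_id") or run["trace_id"]
        groups.setdefault(key, []).append(run)
    return groups
```

Each resulting group corresponds to one Trajectory: the root run plus all its child LLM and tool runs.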

Upload

After importing, upload trajectories to the Trajectory platform:
tj.upload(trajectories, dataset="my_dataset")
trajectory_api_key must be set in tj.init() (or via the TRAJECTORY_API_KEY env var) for upload to work.

Limits and Caveats

  • The LangSmith bulk export API has a maximum of 100 runs per page when listing root runs. The SDK auto-paginates through all pages.
  • Export jobs are asynchronous and typically take 1-2 minutes to complete.
  • The destination_id must point to a GCS bucket that your service account can read from.
  • Empty conversations (root runs with no child LLM/tool runs) produce Trajectories with 0 steps. The upload step automatically skips these.
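The 100-runs-per-page pagination caveat can be sketched generically. `fetch_page` below is a hypothetical stand-in for whatever paged listing call the SDK makes against the LangSmith API:

```python
def paginate(fetch_page, page_size=100):
    """Collect all items from a paged endpoint that returns at most
    `page_size` items per call (LangSmith caps root-run listing at 100)."""
    items, offset = [], 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        items.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means we've reached the end.
            return items
        offset += page_size
```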