Bulk export handles the full pipeline: discover conversations in your LangSmith project, export the complete run trees (including child LLM and tool runs), parse into Trajectories, and upload to the Trajectory platform.
Prerequisites
Before running a bulk export, you need three pieces of information from LangSmith:
- API key — your LangSmith API key (starts with lsv2_sk_... or lsv2_pt_...)
- Workspace ID — the tenant/workspace UUID
- Destination ID — a pre-configured bulk export destination UUID
Finding your Workspace ID
- Go to LangSmith and open Settings (gear icon)
- Under Workspaces, select your workspace
- The workspace ID is in the URL: https://smith.langchain.com/o/.../workspaces/{workspace_id}
Alternatively, you can find it via the API:
import requests

resp = requests.get(
    "https://api.smith.langchain.com/workspaces",
    headers={"X-API-Key": "lsv2_sk_..."},
)
for ws in resp.json():
    print(f"{ws['display_name']}: {ws['id']}")
Finding your Destination ID
A bulk export destination tells LangSmith where to write the exported data (e.g. a GCS bucket). Destinations are configured in the LangSmith UI:
- Go to Settings → Bulk Exports in LangSmith
- Create or select an export destination (e.g. a GCS bucket)
- Copy the destination ID
You can also list existing destinations via the API:
import requests

resp = requests.get(
    "https://api.smith.langchain.com/api/v1/bulk-export-destinations",
    headers={
        "X-API-Key": "lsv2_sk_...",
        "X-Tenant-Id": "your-workspace-id",
    },
)
for dest in resp.json():
    print(f"{dest['display_name']}: {dest['id']}")
Finding your Project ID
- Go to your project in LangSmith
- The project ID is in the URL: https://smith.langchain.com/o/.../projects/p/{project_id}
E2E Bulk Export
With all three IDs, the full pipeline is three calls:
import trajectory_sdk as tj

tj.init(
    provider="langsmith",
    api_key="lsv2_sk_...",  # or set the LANGSMITH_API_KEY env var
    project_id="your-project-id",
    workspace_id="your-workspace-id",
    destination_id="your-destination-id",
    trajectory_api_key="your-trajectory-api-key",  # or set the TRAJECTORY_API_KEY env var
)

# Export everything, parse, and return Trajectories
trajectories = tj.import_conversations(bulk=True)

# Upload to the Trajectory platform
tj.upload(trajectories, dataset="my_dataset")
What happens under the hood
1. Discover trace IDs — lists all root runs in the project and collects their trace_id values. This is necessary because child runs (LLM calls, tool calls) don't carry thread_id metadata, so filtering by thread ID alone would miss them.
2. Trigger bulk export — sends a POST to the LangSmith bulk exports API with an in(trace_id, [...]) filter, which captures the full run tree for each conversation (parents and all children).
3. Poll for completion — the export job runs asynchronously. The SDK polls every 5 seconds until it completes (typically 1-2 minutes).
4. Download parquet — fetches the exported parquet file from the configured GCS destination bucket.
5. Parse into Trajectories — groups runs by conversation_id (from metadata) or trace_id, builds run trees, extracts messages, and constructs Trajectory objects using multiprocessing for speed.
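The polling step can be sketched as a simple loop. This is a minimal illustration, not the SDK's actual implementation; poll_until_done and get_status are hypothetical names, and the terminal status strings are assumptions:

```python
import time

def poll_until_done(get_status, interval=5, timeout=600):
    """Poll get_status() until the export job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("Completed", "Failed"):
            return status
        time.sleep(interval)  # the SDK waits 5 seconds between checks
    raise TimeoutError("bulk export did not finish within the timeout")
```

A bounded timeout like this is worth having in any polling loop so a stuck export job fails loudly instead of hanging forever.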
Time-Scoped Export
Export only recent conversations by passing the since parameter:
from datetime import timedelta
# Export conversations from the last hour
trajectories = tj.import_conversations(bulk=True, since=timedelta(hours=1))
You can also pass an absolute datetime:
from datetime import datetime

trajectories = tj.import_conversations(
    bulk=True,
    since=datetime(2025, 3, 1),
)
When since is omitted, the SDK exports all conversations in the project.
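Accepting either form means a relative since must be resolved against the current time before it can be used as a cutoff. A minimal sketch of that normalization (resolve_since is a hypothetical helper, not part of the SDK):

```python
from datetime import datetime, timedelta, timezone

def resolve_since(since):
    """Normalize `since` to an absolute UTC datetime, or None for 'all'."""
    if since is None:
        return None  # no cutoff: export everything
    if isinstance(since, timedelta):
        return datetime.now(timezone.utc) - since
    return since  # already an absolute datetime
```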
From a Local Parquet File
If you already have a parquet file (e.g. from a previous export or manual download), you can skip the export steps:
import trajectory_sdk as tj

tj.init(
    provider="langsmith",
    project_id="your-project-id",
)

trajectories = tj.import_conversations(
    bulk=True,
    source="./langsmith_export.parquet",
)

tj.save(trajectories, "./exports")
No workspace_id or destination_id is needed when providing a local file.
Upload
After importing, upload trajectories to the Trajectory platform:
tj.upload(trajectories, dataset="my_dataset")
trajectory_api_key must be set in tj.init() (or via the TRAJECTORY_API_KEY env var) for upload to work.
Limits and Caveats
- The LangSmith bulk export API has a maximum of 100 runs per page when listing root runs. The SDK auto-paginates through all pages.
- Export jobs are asynchronous and typically take 1-2 minutes to complete.
- The destination_id must point to a GCS bucket that your service account can read from.
- Empty conversations (root runs with no child LLM/tool runs) produce Trajectories with 0 steps. The upload step automatically skips these.
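If you want to drop zero-step trajectories yourself (e.g. before tj.save, which does not skip them the way upload does), a small sketch — Trajectory here is a stand-in dataclass, assuming the SDK's objects expose a steps list:

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """Stand-in for the SDK's Trajectory; assumes a `steps` list attribute."""
    steps: list = field(default_factory=list)

def drop_empty(trajectories):
    """Filter out trajectories with no steps."""
    return [t for t in trajectories if t.steps]
```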