Dataset Registration

After uploading a file to S3, the first step is to register it in V7. See the vendor reference: Registering items from external storage.

Prepare in V7

  1. Log on to V7 → navigate to your workspace → Dataset → Create New Dataset. Name the dataset with a single word, or with multiple words joined by - or _ (no spaces).
  2. Log on to V7 → Settings → API Keys → New API Key.
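
The naming rule in step 1 can be checked before you create the dataset. The sketch below is only an illustration: the helper name is_valid_dataset_name is hypothetical, and treating a "word" as letters and digits only is an assumption, since this guide only says to avoid spaces and to join words with - or _.

```python
import re

# Assumption: a "word" is one or more ASCII letters or digits.
# Words may be joined by a single "-" or "_"; spaces are not allowed.
DATASET_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*")


def is_valid_dataset_name(name):
    """Return True if name follows the naming rule described in step 1."""
    return DATASET_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_dataset_name("enface_mahnaz2"))
print(is_valid_dataset_name("my dataset"))
```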

Values you will need

team_slug    = "your-team-slug-here"        # V7 workspace name with - added to it
dataset_slug = "your-dataset-slug-here"     # Dataset name you created earlier on V7
storage_name = "your-storage-bucket-name"   # Your S3 prod bucket

Before running the multi-file scripts you must be connected to AWS via the AWS CLI. If you use TechOps ACE Infra, connect via AWS command line → AWS CLI Access → Steps 1–5.
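
One way to confirm that the AWS connection is active before running the multi-file script is to ask AWS STS which identity your credentials resolve to. This is a minimal sketch; the helper name aws_identity_summary is hypothetical. It takes an STS client as an argument so that it can be tried without boto3 installed.

```python
def aws_identity_summary(sts_client):
    """Return a one-line summary of the identity behind the current AWS credentials."""
    # get_caller_identity() needs no special permissions and fails if
    # credentials are missing or expired.
    identity = sts_client.get_caller_identity()
    return f"account={identity['Account']} arn={identity['Arn']}"


# Typical use, once you are connected via the AWS CLI:
#   import boto3
#   print(aws_identity_summary(boto3.client("sts")))
```

If the call raises an exception instead of printing an account and ARN, reconnect via the AWS CLI before continuing.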

Register a single file

import requests
 
api_key = "YOURAPIKEY"  # Generate this key on the V7 portal
 
team_slug = "gene-gred-ace-nlp"          # Workspace name with - added to it
dataset_slug = "data"                    # Dataset name you created earlier on V7
storage_name = "v7-gene-gred-ace-nlp-prod"  # S3 bucket name
 
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}",
}
 
payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "as_frames": "false",
                    "slot_name": "1",
                    "storage_key": "data/000000000.png",  # s3 folder/filename
                    "file_name": "000000000.png",
                }
            ],
            "name": "000000000.png",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
 
response = requests.post(
    f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing",
    headers=headers,
    json=payload,
)
if response.status_code != 200:
    # Check the status first: an error response may not contain JSON.
    print("request failed", response.text)
else:
    body = response.json()
    blocked = body.get("blocked_items", [])
    registered = body.get("items", [])
    if blocked:
        print("failed to register items:")
        for item in blocked:
            print("\t - ", item)
        if registered:
            print("successfully registered items:")
            for item in registered:
                print("\t - ", item)
    else:
        print("success")
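
The payload in the script above has the same shape for every single-slot file, so it can be built from the S3 key alone. The helpers below are a sketch: the names build_item and build_payload are hypothetical, and the slot fields are copied from the script above rather than taken from the V7 reference.

```python
def build_item(storage_key, path="/"):
    """Build one register_existing item for a single-slot file stored at storage_key."""
    file_name = storage_key.split("/")[-1]  # last component of the S3 key
    return {
        "path": path,
        "slots": [
            {
                "as_frames": "false",
                "slot_name": "1",
                "storage_key": storage_key,
                "file_name": file_name,
            }
        ],
        "name": file_name,
    }


def build_payload(storage_keys, dataset_slug, storage_slug):
    """Build the full request body for a list of S3 keys."""
    return {
        "items": [build_item(key) for key in storage_keys],
        "dataset_slug": dataset_slug,
        "storage_slug": storage_slug,
    }


payload = build_payload(["data/000000000.png"], "data", "v7-gene-gred-ace-nlp-prod")
print(payload["items"][0]["name"])
```

The resulting payload can be passed to requests.post exactly as in the script above.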

Register multiple files (from a bucket / subfolder)

import boto3
import requests
 
# Connect to the S3 bucket
s3 = boto3.client("s3")
 
# Your AWS bucket name
bucket_name = "v7-roche-pred-opm-prod"
 
# List objects within the bucket.
# Note: list_objects_v2 returns at most 1,000 keys per call; larger buckets need pagination.
objects = s3.list_objects_v2(Bucket=bucket_name)
 
# List objects within the bucket and subfolder if needed
# objects = s3.list_objects_v2(Bucket=bucket_name, Prefix="data/enface_mahnaz2/")
 
# V7 API setup
api_key = "YOURAPIKEY"
team_slug = "roche-pred-opm"        # CHANGE-IT
dataset_slug = "enface_mahnaz2"     # CHANGE-IT
storage_name = "v7-roche-pred-opm-prod"  # CHANGE-IT
 
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}",
}
 
# Initialize payload
payload = {
    "items": [],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
 
# Iterate over each object in the bucket
for obj in objects.get("Contents", []):
    storage_key = obj["Key"]  # full S3 key (folder/filename)
    if storage_key.endswith(".jpg"):  # CHANGE-IT depending on your file type
        file_name = storage_key.split("/")[-1]  # file name without the folder part
        payload["items"].append(
            {
                "path": "/",
                # The slots value changes by object type (.img, .avi, .pdf, etc.).
                # See https://docs.v7labs.com/docs/registering-items-from-external-storage#the-basics-1
                "slots": [
                    {
                        "as_frames": "false",
                        "slot_name": "1",
                        "storage_key": storage_key,
                        "file_name": file_name,
                    }
                ],
                "name": file_name,
            }
        )
 
# To test before loading to V7, uncomment the print and comment everything below it.
# print(payload)
 
# Send request to V7
response = requests.post(
    f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing",
    headers=headers,
    json=payload,
)
 
# Process response
if response.status_code != 200:
    # Check the status first: an error response may not contain JSON.
    print("request failed", response.text)
else:
    body = response.json()
    blocked = body.get("blocked_items", [])
    registered = body.get("items", [])
    if blocked:
        print("failed to register items:")
        for item in blocked:
            print("\t - ", item)
        if registered:
            print("successfully registered items:")
            for item in registered:
                print("\t - ", item)
    else:
        print("success")
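
A single list_objects_v2 call returns at most 1,000 keys, so the script above can miss files in larger buckets. The sketch below uses the boto3 paginator for list_objects_v2 to walk every page. The helper name iter_matching_keys is hypothetical, and the default suffix ".jpg" simply mirrors the filter in the script above.

```python
def iter_matching_keys(s3_client, bucket, prefix="", suffix=".jpg"):
    """Yield every key under prefix in bucket that ends with suffix, across all pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no "Contents" entry when nothing matched.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                yield obj["Key"]


# Typical use with the client from the script above:
#   for key in iter_matching_keys(s3, bucket_name, prefix="data/enface_mahnaz2/"):
#       print(key)
```

Each yielded key can be appended to payload["items"] in the same way as in the loop above.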

⚠️ Hitting ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED]? See SSL Certificate Error.