Dataset Registration

After uploading files to S3, register them as items in a V7 dataset. Vendor reference: Registering items from external storage (https://docs.v7labs.com/docs/registering-items-from-external-storage).

Before you start

Complete these steps before running any registration script:

  1. Create a dataset. Log in to V7, navigate to your workspace, open Dataset, then click Create New Dataset. For the dataset name, use a single word or multiple words joined with - or _ (no spaces).

  2. Generate an API key. In V7, go to Settings > API Keys > New API Key.

  3. Connect to AWS via the AWS CLI. The script needs this connection to read objects from your S3 prod bucket. If you use Techops ACE Infra, follow the onboarding and AWS CLI access steps (steps 1-5) in the Okta Single Sign-On Broker guide.
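
The naming rule in step 1 can be checked before you create the dataset. The sketch below is a hypothetical helper, not part of V7. It assumes a "word" means a run of ASCII letters and digits, which the rule itself does not spell out.

```python
import re

# Assumption: a "word" is one or more ASCII letters or digits.
# Words may be joined by a single - or _; spaces are not allowed.
_DATASET_NAME = re.compile(r"[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*")


def is_valid_dataset_name(name: str) -> bool:
    """Return True for one word, or several words joined by - or _."""
    return _DATASET_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["enface_mahnaz2", "my-dataset", "my dataset", "-bad"]:
        print(candidate, is_valid_dataset_name(candidate))
```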

Prepare values

team_slug    = "your-team-slug"      # V7 workspace name, with words joined by -
dataset_slug = "your-dataset-slug"   # dataset name you created earlier in V7
storage_name = "your-storage-bucket" # your S3 prod bucket
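
To keep the API key out of the script, these values could be read from environment variables instead of being typed into the file. This is only a sketch: the variable names (V7_API_KEY and so on) and the load_settings helper are hypothetical, so choose names that fit your setup.

```python
import os
from typing import Mapping

# Hypothetical environment variable names, mapped to the script's variable names.
SETTINGS = {
    "api_key": "V7_API_KEY",
    "team_slug": "V7_TEAM_SLUG",
    "dataset_slug": "V7_DATASET_SLUG",
    "storage_name": "V7_STORAGE_NAME",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read all four values from env, failing early if any is missing or empty."""
    missing = [var for var in SETTINGS.values() if not env.get(var)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return {name: env[var] for name, var in SETTINGS.items()}
```

Calling load_settings() at the top of a registration script gives a dictionary with the same four names used in the examples.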

Single-file registration example

import requests

api_key = "YOUR_API_KEY"
team_slug = "your-team-slug"
dataset_slug = "your-dataset-slug"
storage_name = "your-storage-bucket"

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}",
}

payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "as_frames": "false",
                    "slot_name": "1",
                    "storage_key": "data/000000000.png",
                    "file_name": "000000000.png",
                }
            ],
            "name": "000000000.png",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

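response = requests.post(
    f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing",
    headers=headers,
    json=payload,
)

# Print the status code and body so a failed registration is not silently ignored
print(response.status_code, response.text)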
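
The item dictionary above can also be built from the S3 key instead of being typed by hand. A minimal sketch, assuming the same single-slot layout as the example; build_item and build_payload are hypothetical helper names, not part of the V7 API.

```python
import posixpath


def build_item(storage_key: str, path: str = "/") -> dict:
    """Build one single-slot item; the item name is the key's base name."""
    file_name = posixpath.basename(storage_key)
    return {
        "path": path,
        "slots": [
            {
                "as_frames": "false",
                "slot_name": "1",
                "storage_key": storage_key,
                "file_name": file_name,
            }
        ],
        "name": file_name,
    }


def build_payload(storage_keys, dataset_slug: str, storage_slug: str) -> dict:
    """Wrap the items in the request body used by register_existing."""
    return {
        "items": [build_item(key) for key in storage_keys],
        "dataset_slug": dataset_slug,
        "storage_slug": storage_slug,
    }
```

For example, build_payload(["data/000000000.png"], dataset_slug, storage_name) produces the same payload as the one written out above.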

Multi-file (batch) registration example

Use this to register many files from an S3 bucket at once. Make sure you have connected to AWS via the AWS CLI first (see Before you start).

import boto3
import requests

# Connect to the S3 bucket
s3 = boto3.client('s3')

# Your AWS bucket name
bucket_name = 'v7-roche-pred-opm-prod'

# List objects within the bucket (a single list_objects_v2 call returns at most 1,000 keys)
objects = s3.list_objects_v2(Bucket=bucket_name)

# List objects within the bucket and subfolder if needed
# objects = s3.list_objects_v2(Bucket=bucket_name, Prefix='data/enface_mahnaz2/')

# V7 API setup
api_key = "YOUR_API_KEY"
team_slug = "roche-pred-opm"            # CHANGE-IT
dataset_slug = "enface_mahnaz2"         # CHANGE-IT
storage_name = "v7-roche-pred-opm-prod" # CHANGE-IT

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}",
}

# Initialize payload
payload = {
    "items": [],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

# Iterate over each object in the bucket
for obj in objects.get('Contents', []):
    storage_key = obj['Key']                # full S3 key, including any folder prefix
    file_name = storage_key.split('/')[-1]  # base name, used as the item name in V7
    if storage_key.endswith('.jpg'):  # CHANGE-IT depending on your file type
        payload['items'].append({
            "path": "/",
            # The slots value changes depending on the object type (image, video, PDF, etc.).
            # See: https://docs.v7labs.com/docs/registering-items-from-external-storage#the-basics-1
            "slots": [
                {
                    "as_frames": "false",
                    "slot_name": "1",
                    "storage_key": storage_key,
                    "file_name": file_name,
                }
            ],
            "name": file_name,
        })

# To test before sending anything to V7, uncomment the print and comment out everything below it
# print(payload)

# Send request to V7
response = requests.post(
    f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing",
    headers=headers,
    json=payload,
)

# Process response (parse JSON only after the status check, since an error body may not be JSON)
if response.status_code != 200:
    print("request failed", response.text)
else:
    body = response.json()
    blocked = body.get('blocked_items', [])
    registered = body.get('items', [])
    if blocked:
        print("failed to register items:")
        for item in blocked:
            print("\t - ", item)
        if registered:
            print("successfully registered items:")
            for item in registered:
                print("\t - ", item)
    else:
        print("success")
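
A single list_objects_v2 call returns at most 1,000 keys, so a script that calls it once only sees the first page of a larger bucket. The sketch below uses the boto3 paginator to walk every page and groups the keys into fixed-size batches, so each batch can be sent as its own request. iter_keys, chunked, and the batch size of 500 are illustrative assumptions, not V7 limits or recommendations.

```python
from typing import Iterable, Iterator, List


def iter_keys(s3_client, bucket: str, prefix: str = "", suffix: str = ".jpg") -> Iterator[str]:
    """Yield every key in the bucket (optionally under prefix) that ends with suffix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    kwargs = {"Bucket": bucket}
    if prefix:
        kwargs["Prefix"] = prefix
    for page in paginator.paginate(**kwargs):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                yield obj["Key"]


def chunked(keys: Iterable[str], size: int = 500) -> Iterator[List[str]]:
    """Group keys into lists of at most size elements."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With s3 = boto3.client('s3'), looping over chunked(iter_keys(s3, bucket_name)) yields one list of keys per batch. Build a payload from each list and post it the same way as in the script above.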

If you hit certificate issues, see SSL Certificate Error.