Dataset Registration
After uploading files to S3, register them as items in a V7 dataset. Vendor reference: Registering items from external storage.
Before you start
Complete these steps before running any registration script:
- Create a dataset. Log in to V7, navigate to your workspace, open Dataset, then click Create New Dataset. Use a single word or multiple words joined with - or _ (no spaces).
- Generate an API key. In V7, go to Settings → API Keys → New API Key.
- Connect to AWS via the AWS CLI. This is required so the script can read objects from your S3 prod bucket. If you use Techops ACE Infra, follow onboarding and the AWS CLI access steps (steps 1-5) in the Okta Single Sign-On Broker guide.
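The naming rule in the first step can be checked before you create the dataset. The sketch below is only an illustration: the function name is hypothetical, and it assumes a "word" consists of letters and digits, which is narrower than what V7 may actually accept.

```python
import re

# Assumption: a "word" is letters and digits only; words are joined by - or _.
# V7 may allow other characters, so treat this as a conservative local check.
_NAME_RE = re.compile(r"[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*")


def is_valid_dataset_name(name):
    """Return True if name is one word, or several words joined by - or _."""
    return _NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_dataset_name("my dataset")` is rejected because of the space, while `is_valid_dataset_name("my-dataset")` passes.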
Prepare values
team_slug = "your-team-slug"  # V7 workspace name with words joined by -
dataset_slug = "your-dataset-slug"  # name of the dataset you created earlier in V7
storage_name = "your-storage-bucket"  # your S3 prod bucket
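If you would rather not hard-code the API key in the script, one option is to read these values from environment variables. This is a minimal sketch, not part of the vendor example; the variable names (V7_API_KEY and so on) and the `load_settings` helper are hypothetical.

```python
import os

# Hypothetical environment variable names; rename them to suit your setup.
REQUIRED_VARS = ("V7_API_KEY", "V7_TEAM_SLUG", "V7_DATASET_SLUG", "V7_STORAGE_NAME")


def load_settings(env=os.environ):
    """Return the registration values, failing early if any of them is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {
        "api_key": env["V7_API_KEY"],
        "team_slug": env["V7_TEAM_SLUG"],
        "dataset_slug": env["V7_DATASET_SLUG"],
        "storage_name": env["V7_STORAGE_NAME"],
    }
```

Calling `load_settings()` with no arguments reads the real environment; passing a plain dict makes the helper easy to try out without exporting anything.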
Single-file registration example
import requests

api_key = "YOUR_API_KEY"
team_slug = "your-team-slug"
dataset_slug = "your-dataset-slug"
storage_name = "your-storage-bucket"

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}",
}

# One item with a single slot that points at the object's key in the bucket
payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "as_frames": "false",
                    "slot_name": "1",
                    "storage_key": "data/000000000.png",
                    "file_name": "000000000.png",
                }
            ],
            "name": "000000000.png",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

response = requests.post(
    f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing",
    headers=headers,
    json=payload,
)
# Show the outcome so a failed registration does not go unnoticed
print(response.status_code, response.text)
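The item entry in the payload follows a fixed shape, so it can be produced from the storage key alone. The helper below is a sketch that mirrors the single-slot layout used above; `build_item` is a hypothetical name, and other file types may need different slot fields (see the vendor reference).

```python
import posixpath


def build_item(storage_key, path="/", slot_name="1"):
    """Build one entry for the payload's "items" list from an S3 object key."""
    # S3 keys use "/" as a separator, so posixpath gives the bare file name.
    file_name = posixpath.basename(storage_key)
    return {
        "path": path,
        "slots": [
            {
                "as_frames": "false",
                "slot_name": slot_name,
                "storage_key": storage_key,
                "file_name": file_name,
            }
        ],
        "name": file_name,
    }
```

With this in place, the payload's item list for the example above could be written as `[build_item("data/000000000.png")]`.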
Multi-file (batch) registration example
Use this to register many files from an S3 bucket at once. Make sure you have connected to AWS via the AWS CLI first (see Before you start).
import boto3
import requests

# Connect to S3 using your configured AWS credentials
s3 = boto3.client('s3')

# Your AWS bucket name
bucket_name = 'v7-roche-pred-opm-prod'  # CHANGE-IT

# List objects within the bucket (a single call returns at most 1,000 objects)
objects = s3.list_objects_v2(Bucket=bucket_name)
# List objects within the bucket and subfolder if needed
# objects = s3.list_objects_v2(Bucket=bucket_name, Prefix='data/enface_mahnaz2/')

# V7 API setup
api_key = "YOUR_API_KEY"
team_slug = "roche-pred-opm"  # CHANGE-IT
dataset_slug = "enface_mahnaz2"  # CHANGE-IT
storage_name = "v7-roche-pred-opm-prod"  # CHANGE-IT

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}",
}

# Initialize payload
payload = {
    "items": [],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

# Iterate over each object in the bucket
for obj in objects.get('Contents', []):
    file_name = obj['Key']
    if file_name.endswith('.jpg'):  # CHANGE-IT depending on your file type
        payload['items'].append({
            "path": "/",
            # The slots value changes depending on the object type (.img, .avi, .pdf, etc.).
            # See: https://docs.v7labs.com/docs/registering-items-from-external-storage#the-basics-1
            "slots": [
                {
                    "as_frames": "false",
                    "slot_name": "1",
                    "storage_key": file_name,
                    "file_name": file_name.split('/')[-1],
                }
            ],
            "name": file_name.split('/')[-1],
        })

# To test before loading to V7, uncomment the print and comment everything below it
# print(payload)

# Send request to V7
response = requests.post(
    f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing",
    headers=headers,
    json=payload,
)

# Process response
if response.status_code != 200:
    print("request failed", response.text)
else:
    # Parse the body only after the status check; an error body may not be JSON
    body = response.json()
    if body.get('blocked_items'):
        print("failed to register items:")
        for item in body['blocked_items']:
            print("\t - ", item)
        if body.get('items'):
            print("successfully registered items:")
            for item in body['items']:
                print("\t - ", item)
    else:
        print("success")
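A single `list_objects_v2` call returns at most 1,000 objects, so the script above can miss files in larger buckets. The sketch below shows one way to walk every page with boto3's paginator and to split the resulting keys into smaller groups, one request per group. The helper names and the idea of sending several smaller requests are assumptions for illustration; this page does not state a per-request limit for V7.

```python
def iter_keys(s3_client, bucket, prefix="", suffix=".jpg"):
    """Yield every object key under prefix that ends with suffix, across all pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for empty prefixes carry no "Contents" entry at all.
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(suffix):
                yield obj["Key"]


def chunked(items, size):
    """Split a list into consecutive lists of at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

With a real client this might be used as `keys = list(iter_keys(boto3.client("s3"), bucket_name))`, followed by one `register_existing` request per group returned by `chunked(keys, size)`, where the group size is a value you choose.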
If you hit certificate issues, see SSL Certificate Error.