brainscore_core.supported_data_standards.brainio.packaging

PACKAGING MODULE - Upload assemblies and stimulus sets to S3

PURPOSE:

This module handles packaging and uploading Brain-Score data (assemblies and stimulus sets) to AWS S3 storage. It creates the standardized file formats and directory structures used by the Brain-Score ecosystem.

KEY FUNCTIONS:

  • package_data_assembly(): Upload neural/behavioral data to S3 as NetCDF

  • package_stimulus_set(): Upload stimulus collections to S3 as CSV+ZIP

  • write_netcdf(): Convert assemblies to NetCDF format with compression

  • upload_to_s3(): Handle S3 uploads with progress tracking and user tagging

CORE FUNCTIONALITY:

  • Creates NetCDF files from xarray DataArrays

  • Packages stimuli into ZIP archives with CSV metadata

  • Validates stimulus sets for consistency and naming conventions

  • Generates SHA1 hashes for data integrity verification

  • Handles S3 authentication and upload progress
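
The integrity hashing mentioned above can be illustrated with the standard library. This is a minimal sketch, not the module's own code: the helper names `sha1_of_file` and `demo_sha1` and the chunk size are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=64 * 1024):
    """Return the hex SHA1 digest of a file, reading it in fixed-size chunks.

    Hypothetical helper: the name and chunk size are illustrative only.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo_sha1():
    """Write a small temporary file and return its SHA1 digest."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"abc")
        return sha1_of_file(path)
    finally:
        os.remove(path)
```

Reading in chunks keeps memory use flat for large NetCDF or ZIP files. A consumer can recompute the digest after download and compare it with the stored value.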

CATALOG SYSTEM REMOVED:

Original BrainIO packaging functions required catalog_identifier parameters to register uploads in CSV lookup tables. These are now optional/removed since Brain-Score uses direct S3 references instead.

Functions

check_experiment_stimulus_set(stimulus_set)

Checks that the stimulus set files are not corrupt and are named/numbered sequentially.

check_image_format(image, identifier)

check_naming_convention(name)

check_stimulus_naming_convention(name)

check_stimulus_numbers(stimulus_set)

create_stimulus_csv(proto_stimulus_set, ...)

create_stimulus_zip(proto_stimulus_set, ...)

Create a zip file for the stimuli in a StimulusSet. Files in the zip follow a flat directory structure, with each row's filename equal to the stimulus_id by default, or to stimulus_path_within_store if passed.

:param proto_stimulus_set: a StimulusSet with a get_stimulus method (stimulus_id -> local path), a stimulus_id column, and optionally a stimulus_path_within_store column.
:param target_zip_path: path to write the zip file to.
:return: SHA1 hash of the zip file.
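
The flat layout described above might be sketched as follows, using only the standard library. This is an illustration under stated assumptions, not the function's implementation: `zip_flat` and `demo_zip` are hypothetical names, plain dicts stand in for the StimulusSet and its get_stimulus method, and no file-extension handling is attempted.

```python
import os
import tempfile
import zipfile


def zip_flat(id_to_path, target_zip_path, id_to_arcname=None):
    """Write each stimulus file into the zip under a single flat name.

    id_to_path: dict of stimulus_id -> local file path (stands in for get_stimulus).
    id_to_arcname: optional dict of stimulus_id -> name inside the zip
        (stands in for the stimulus_path_within_store column).
    Returns the names written to the archive, in insertion order.
    """
    id_to_arcname = id_to_arcname or {}
    arcnames = []
    with zipfile.ZipFile(target_zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for stimulus_id, local_path in id_to_path.items():
            # Default to the stimulus_id; the source directory is not preserved.
            arcname = id_to_arcname.get(stimulus_id, stimulus_id)
            zf.write(local_path, arcname=arcname)
            arcnames.append(arcname)
    return arcnames


def demo_zip():
    """Zip two files from nested directories and list the archive contents."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        for stimulus_id in ("stim_0", "stim_1"):
            src_dir = os.path.join(tmp, "nested", stimulus_id)
            os.makedirs(src_dir)
            path = os.path.join(src_dir, "image.png")
            with open(path, "wb") as f:
                f.write(stimulus_id.encode())
            paths[stimulus_id] = path
        zip_path = os.path.join(tmp, "stimuli.zip")
        zip_flat(paths, zip_path, {"stim_1": "renamed_1"})
        with zipfile.ZipFile(zip_path) as zf:
            return sorted(zf.namelist())
```

In this sketch `demo_zip()` lists `renamed_1` and `stim_0`: the nested source directories disappear, and only the entry with an explicit override takes a name other than its stimulus_id.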

extract_specific(proto_stimulus_set)

get_user_info(sts_client)

package_data_assembly(catalog_identifier, ...)

Package a set of data along with its metadata for the BrainIO system.

:param catalog_identifier: The name of the lookup catalog to add the data assembly to.
:param proto_data_assembly: An xarray DataArray containing experimental measurements and all related metadata.
    * The dimensions of the DataArray must be appropriate for the DataAssembly class:
        * NeuroidAssembly and its subclasses: "presentation", "neuroid"[, "time_bin"]
            * except for SpikeTimesAssembly: "event"
        * MetaDataAssembly: "event"
        * BehavioralAssembly: should have a "presentation" dimension, but can be flexible about its other dimensions.
    * A presentation dimension must have a stimulus_id coordinate and should have coordinates for presentation-level metadata such as repetition. The presentation dimension should not have coordinates for stimulus-specific metadata; these will be drawn from the StimulusSet based on stimulus_id.
    * The neuroid dimension must have a neuroid_id coordinate and should have coordinates for as much neural metadata as possible (e.g. region, subregion, animal, row in array, column in array, etc.)
    * The time_bin dimension should have coordinates time_bin_start and time_bin_end.
:param assembly_identifier: A dot-separated string starting with a lab identifier.
    * For published: <lab identifier>.<first author e.g. 'Rajalingham' or 'MajajHong' for shared first-author><YYYY year of publication>
    * For requests: <lab identifier>.<b for behavioral|n for neuroidal>.<m for monkey|h for human>.<proposer e.g. 'Margalit'>.<pull request number>
:param stimulus_set_identifier: The unique name of an existing StimulusSet in the BrainIO system.
:param assembly_class_name: The name of a DataAssembly subclass.
:param bucket_name: The name of the bucket to upload to.
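
The dimension rules above can be expressed as a small pre-flight check. This is a sketch under stated assumptions rather than part of the module: `check_dims` is a hypothetical helper, it only knows the four class names spelled out in the docstring, and it treats the "should have" rule for BehavioralAssembly as a requirement.

```python
def check_dims(assembly_class_name, dims):
    """Return True if dims satisfy the documented rules for the given class.

    Hypothetical helper. Only the class names listed in the docstring are
    handled; other NeuroidAssembly subclasses would need to be added.
    """
    dims = tuple(dims)
    if len(set(dims)) != len(dims):
        return False  # repeated dimension names are never valid
    if assembly_class_name in ("SpikeTimesAssembly", "MetaDataAssembly"):
        return dims == ("event",)
    if assembly_class_name == "BehavioralAssembly":
        # Other dimensions are flexible; only "presentation" is checked here.
        return "presentation" in dims
    if assembly_class_name == "NeuroidAssembly":
        allowed = ({"presentation", "neuroid"},
                   {"presentation", "neuroid", "time_bin"})
        return set(dims) in allowed
    raise ValueError(f"unknown assembly class: {assembly_class_name}")
```

With an xarray DataArray, such a check could be called as `check_dims("NeuroidAssembly", proto_data_assembly.dims)` before packaging, so that a mismatch surfaces before any file is written.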

package_data_assembly_locally(...[, ...])

Package a set of data locally to the Downloads folder instead of uploading to S3.

:param proto_data_assembly: An xarray DataArray containing experimental measurements and all related metadata.
    * The dimensions of the DataArray must be appropriate for the DataAssembly class:
        * NeuroidAssembly and its subclasses: "presentation", "neuroid"[, "time_bin"]
            * except for SpikeTimesAssembly: "event"
        * MetaDataAssembly: "event"
        * BehavioralAssembly: should have a "presentation" dimension, but can be flexible about its other dimensions.
    * A presentation dimension must have a stimulus_id coordinate and should have coordinates for presentation-level metadata such as repetition. The presentation dimension should not have coordinates for stimulus-specific metadata; these will be drawn from the StimulusSet based on stimulus_id.
    * The neuroid dimension must have a neuroid_id coordinate and should have coordinates for as much neural metadata as possible (e.g. region, subregion, animal, row in array, column in array, etc.)
    * The time_bin dimension should have coordinates time_bin_start and time_bin_end.
:param assembly_identifier: A dot-separated string starting with a lab identifier.
    * For published: <lab identifier>.<first author e.g. 'Rajalingham' or 'MajajHong' for shared first-author><YYYY year of publication>
    * For requests: <lab identifier>.<b for behavioral|n for neuroidal>.<m for monkey|h for human>.<proposer e.g. 'Margalit'>.<pull request number>
:param stimulus_set_identifier: The unique name of an existing StimulusSet in the BrainIO system.
:param assembly_class_name: The name of a DataAssembly subclass.
:param downloads_path: Optional path to save files. Defaults to ~/Downloads/brainscore_packages/
:param extras: Optional dictionary of additional DataArrays to include in the NetCDF file.

package_stimulus_set(catalog_name, ...[, ...])

Package a set of stimuli along with their metadata for the BrainIO system.

:param catalog_name: The name of the lookup catalog to add the stimulus set to.
:param proto_stimulus_set: A StimulusSet containing one row for each stimulus, and the columns {'stimulus_id', ['stimulus_path_within_store' (optional to structure zip directory layout)]} and columns for all stimulus-set-specific metadata but not the column 'filename'.
:param stimulus_set_identifier: A unique name identifying the stimulus set <lab identifier>.<first author e.g. 'Rajalingham' or 'MajajHong' for shared first-author><YYYY year of publication>.
:param bucket_name: The name of the bucket to upload to.
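
The identifier format for published data, <lab identifier>.<first author><YYYY>, lends itself to a quick sanity check before packaging. The following is a sketch only: `parse_published_identifier` is a hypothetical helper, the allowed characters in the lab identifier are an assumption, and the module's own check_naming_convention may apply different rules.

```python
import re

# Assumption: lab identifiers use letters, digits, underscores, or hyphens;
# author names are letters only; the year is exactly four digits.
PUBLISHED_ID = re.compile(
    r"(?P<lab>[A-Za-z0-9_-]+)\.(?P<author>[A-Za-z]+)(?P<year>\d{4})"
)


def parse_published_identifier(identifier):
    """Split an identifier into lab, author, and year, or return None.

    Hypothetical helper for illustration; not part of the packaging module.
    """
    match = PUBLISHED_ID.fullmatch(identifier)
    if match is None:
        return None
    parts = match.groupdict()
    parts["year"] = int(parts["year"])
    return parts
```

For example, `parse_published_identifier("mylab.Rajalingham2020")` (the lab name and year are made up) yields the lab `mylab`, the author `Rajalingham`, and the year `2020`, while an identifier without a lab prefix returns `None`.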

package_stimulus_set_locally(...[, ...])

Package a set of stimuli locally to the Downloads folder instead of uploading to S3.

:param proto_stimulus_set: A StimulusSet containing one row for each stimulus, and the columns {'stimulus_id', ['stimulus_path_within_store' (optional to structure zip directory layout)]} and columns for all stimulus-set-specific metadata but not the column 'filename'.
:param stimulus_set_identifier: A unique name identifying the stimulus set <lab identifier>.<first author e.g. 'Rajalingham' or 'MajajHong' for shared first-author><YYYY year of publication>.
:param downloads_path: Optional path to save files. Defaults to ~/Downloads/brainscore_packages/.

upload_to_s3(source_file_path, bucket_name, ...)

write_netcdf(assembly, target_netcdf_file[, ...])

Write a DataAssembly object to a netCDF file.