brainscore_core.supported_data_standards.brainio.packaging
PACKAGING MODULE - Upload assemblies and stimulus sets to S3
PURPOSE:
This module handles packaging and uploading Brain-Score data (assemblies and stimulus sets) to AWS S3 storage. It creates the standardized file formats and directory structures used by the Brain-Score ecosystem.
KEY FUNCTIONS:
package_data_assembly(): Upload neural/behavioral data to S3 as NetCDF
package_stimulus_set(): Upload stimulus collections to S3 as CSV+ZIP
write_netcdf(): Convert assemblies to NetCDF format with compression
upload_to_s3(): Handle S3 uploads with progress tracking and user tagging
CORE FUNCTIONALITY:
Creates NetCDF files from xarray DataArrays
Packages stimuli into ZIP archives with CSV metadata
Validates stimulus sets for consistency and naming conventions
Generates SHA1 hashes for data integrity verification
Handles S3 authentication and upload progress
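As a rough illustration of the upload-progress handling listed above, here is a minimal sketch. It assumes uploads go through boto3's `S3.Client.upload_file`, whose `Callback` argument is called repeatedly with the number of bytes sent since the previous call. `UploadProgress` is a hypothetical name, not part of brainscore_core, and authentication is not shown.

```python
import os
import threading


class UploadProgress:
    """Hypothetical progress tracker for boto3's upload_file(Callback=...).

    boto3 may invoke the callback from several transfer threads, so the
    running byte count is updated under a lock.
    """

    def __init__(self, filepath: str) -> None:
        self.filepath = filepath
        self.total_bytes = os.path.getsize(filepath)
        self.seen_bytes = 0
        self._lock = threading.Lock()

    @property
    def percent(self) -> float:
        # An empty file counts as fully uploaded.
        if self.total_bytes == 0:
            return 100.0
        return 100.0 * self.seen_bytes / self.total_bytes

    def __call__(self, bytes_amount: int) -> None:
        with self._lock:
            self.seen_bytes += bytes_amount
            # Overwrite the same console line with the latest progress.
            print(f"\r{self.filepath}: {self.seen_bytes}/{self.total_bytes} bytes "
                  f"({self.percent:.1f}%)", end="", flush=True)
```

Usage, assuming boto3 is installed and credentials are configured: `boto3.client("s3").upload_file(path, bucket_name, key, Callback=UploadProgress(path))`.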
CATALOG SYSTEM REMOVED:
Original BrainIO packaging functions required catalog_identifier parameters to register uploads in CSV lookup tables. These are now optional/removed since Brain-Score uses direct S3 references instead.
Functions
Checks that the stimulus set files are non-corrupt and named/numbered sequentially.
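The naming part of this check can be sketched as follows, assuming "numbered sequentially" means every file stem is a non-negative integer and the numbers form a run with no gaps or duplicates. `check_sequential_numbering` is hypothetical, and the corruption check (which would require actually decoding each file) is not shown.

```python
import re
from pathlib import PurePath
from typing import Iterable


def check_sequential_numbering(filenames: Iterable[str]) -> None:
    """Hypothetical check: raise ValueError unless file stems are consecutive integers."""
    numbers = []
    for name in filenames:
        stem = PurePath(name).stem
        # ASCII digits only, so the stem can be parsed with int().
        if not re.fullmatch(r"[0-9]+", stem):
            raise ValueError(f"{name!r} is not numbered (stem {stem!r})")
        numbers.append(int(stem))
    if not numbers:
        raise ValueError("no stimulus files given")
    if len(set(numbers)) != len(numbers):
        raise ValueError("duplicate stimulus numbers")
    # With n distinct numbers starting at `start`, a gap-free run is start..start+n-1.
    start = min(numbers)
    expected = set(range(start, start + len(numbers)))
    missing = sorted(expected - set(numbers))
    if missing:
        raise ValueError(f"numbering has gaps; missing {missing}")
```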
Create a zip file for the stimuli in a StimulusSet. Files in the zip follow a flat directory structure, with each row's filename equal to its stimulus_id by default, or to its stimulus_path_within_store if that column is present.
:param proto_stimulus_set: a StimulusSet with a get_stimulus method (mapping stimulus_id to a local path), a stimulus_id column, and optionally a stimulus_path_within_store column.
:param target_zip_path: path to write the zip file to.
:return: SHA1 hash of the zip file.
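The zip layout and SHA1 return value described above can be sketched with the standard library. This is a hypothetical stand-in, not the module's implementation: `rows` plays the role of the StimulusSet's rows, `get_stimulus` the role of its stimulus_id to local path method, and appending the source file's extension to the archive name is an assumption.

```python
import hashlib
import zipfile
from pathlib import Path
from typing import Callable, Mapping, Sequence


def create_stimulus_zip_sketch(
    rows: Sequence[Mapping[str, str]],
    get_stimulus: Callable[[str], str],
    target_zip_path: str,
) -> str:
    """Hypothetical: write stimuli into a zip and return the zip's SHA1 hex digest."""
    with zipfile.ZipFile(target_zip_path, "w") as zf:
        for row in rows:
            local_path = Path(get_stimulus(row["stimulus_id"]))
            # Prefer stimulus_path_within_store when the row provides one.
            arcname = row.get("stimulus_path_within_store") or row["stimulus_id"]
            # Assumption: keep the source file's extension in the archive name.
            zf.write(local_path, arcname=arcname + local_path.suffix)
    # Hash the finished archive in 1 MiB chunks so large zips need not fit in memory.
    sha1 = hashlib.sha1()
    with open(target_zip_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha1.update(chunk)
    return sha1.hexdigest()
```

Hashing the finished archive, rather than the individual stimuli, gives a single value that can be compared after download to verify the whole package.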
Package a set of data along with its metadata for the BrainIO system.
:param catalog_identifier: (Legacy; optional) The name of the lookup catalog to add the data assembly to.
:param proto_data_assembly: An xarray DataArray containing experimental measurements and all related metadata.
  * The dimensions of the DataArray must be appropriate for the DataAssembly class:
    * NeuroidAssembly and its subclasses: "presentation", "neuroid"[, "time_bin"]
      * except for SpikeTimesAssembly: "event"
    * MetaDataAssembly: "event"
    * BehavioralAssembly: should have a "presentation" dimension, but can be flexible about its other dimensions.
  * A presentation dimension must have a stimulus_id coordinate and should have coordinates for presentation-level metadata such as repetition. The presentation dimension should not have coordinates for stimulus-specific metadata; these will be drawn from the StimulusSet based on stimulus_id.
  * The neuroid dimension must have a neuroid_id coordinate and should have coordinates for as much neural metadata as possible (e.g. region, subregion, animal, row in array, column in array).
  * The time_bin dimension should have coordinates time_bin_start and time_bin_end.
:param assembly_identifier: A dot-separated string starting with a lab identifier.
  * For published data: <lab identifier>.<first author e.g. 'Rajalingham' or 'MajajHong' for shared first-author><YYYY year of publication>
  * For requests: <lab identifier>.<b for behavioral|n for neuroidal>.<m for monkey|h for human>.<proposer e.g. 'Margalit'>.<pull request number>
:param stimulus_set_identifier: The unique name of an existing StimulusSet in the BrainIO system.
:param assembly_class_name: The name of a DataAssembly subclass.
:param bucket_name: The name of the bucket to upload to.
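The two assembly_identifier conventions above can be expressed as regular expressions. This is a sketch with assumed character classes (letters, digits and underscores in the lab identifier; letters only for author and proposer names); `identifier_kind` is a hypothetical helper, not part of brainscore_core, and the identifiers in the test are illustrative only.

```python
import re

# Hypothetical patterns transcribed from the docstring's naming convention.
# Published: <lab>.<FirstAuthor(s)><YYYY>, e.g. "<lab>.MajajHong2015"
PUBLISHED = re.compile(r"[A-Za-z0-9_]+\.[A-Z][A-Za-z]*\d{4}")
# Request: <lab>.<b|n>.<m|h>.<proposer>.<pull request number>
REQUEST = re.compile(r"[A-Za-z0-9_]+\.[bn]\.[mh]\.[A-Za-z]+\.\d+")


def identifier_kind(identifier: str) -> str:
    """Return 'published' or 'request', or raise ValueError if neither pattern matches."""
    if PUBLISHED.fullmatch(identifier):
        return "published"
    if REQUEST.fullmatch(identifier):
        return "request"
    raise ValueError(f"{identifier!r} does not follow either naming convention")
```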
Package a set of data locally to the Downloads folder instead of uploading to S3.
:param proto_data_assembly: An xarray DataArray containing experimental measurements and all related metadata.
  * The dimensions of the DataArray must be appropriate for the DataAssembly class:
    * NeuroidAssembly and its subclasses: "presentation", "neuroid"[, "time_bin"]
      * except for SpikeTimesAssembly: "event"
    * MetaDataAssembly: "event"
    * BehavioralAssembly: should have a "presentation" dimension, but can be flexible about its other dimensions.
  * A presentation dimension must have a stimulus_id coordinate and should have coordinates for presentation-level metadata such as repetition. The presentation dimension should not have coordinates for stimulus-specific metadata; these will be drawn from the StimulusSet based on stimulus_id.
  * The neuroid dimension must have a neuroid_id coordinate and should have coordinates for as much neural metadata as possible (e.g. region, subregion, animal, row in array, column in array).
  * The time_bin dimension should have coordinates time_bin_start and time_bin_end.
:param assembly_identifier: A dot-separated string starting with a lab identifier.
  * For published data: <lab identifier>.<first author e.g. 'Rajalingham' or 'MajajHong' for shared first-author><YYYY year of publication>
  * For requests: <lab identifier>.<b for behavioral|n for neuroidal>.<m for monkey|h for human>.<proposer e.g. 'Margalit'>.<pull request number>
:param stimulus_set_identifier: The unique name of an existing StimulusSet in the BrainIO system.
:param assembly_class_name: The name of a DataAssembly subclass.
:param downloads_path: Optional path to save files. Defaults to ~/Downloads/brainscore_packages/.
:param extras: Optional dictionary of additional DataArrays to include in the NetCDF file.
Package a set of stimuli along with their metadata for the BrainIO system.
:param catalog_name: (Legacy; optional) The name of the lookup catalog to add the stimulus set to.
:param proto_stimulus_set: A StimulusSet containing one row for each stimulus, with a 'stimulus_id' column, an optional 'stimulus_path_within_store' column (to structure the zip directory layout), and columns for all stimulus-set-specific metadata, but no 'filename' column.
:param stimulus_set_identifier: A unique name identifying the stimulus set: <lab identifier>.<first author e.g. 'Rajalingham' or 'MajajHong' for shared first-author><YYYY year of publication>.
:param bucket_name: The name of the bucket to upload to.
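The column requirements for proto_stimulus_set can be checked before anything is written. This is a minimal sketch using the csv module, with rows given as plain dicts standing in for the StimulusSet's rows; `write_stimulus_csv` is hypothetical, and treating duplicate stimulus_id values as an error is an assumption drawn from the "one row for each stimulus" requirement.

```python
import csv
from typing import Mapping, Sequence


def write_stimulus_csv(rows: Sequence[Mapping[str, object]], csv_path: str) -> None:
    """Hypothetical: validate stimulus metadata rows, then write them as CSV."""
    if not rows:
        raise ValueError("stimulus set is empty")
    # Columns are taken from the first row; csv.DictWriter rejects extra keys later.
    columns = list(rows[0].keys())
    if "stimulus_id" not in columns:
        raise ValueError("missing required column 'stimulus_id'")
    if "filename" in columns:
        raise ValueError("column 'filename' must not be present")
    ids = [row["stimulus_id"] for row in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("stimulus_id values must be unique (one row per stimulus)")
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
```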
Package a set of stimuli locally to the Downloads folder instead of uploading to S3.
:param proto_stimulus_set: A StimulusSet containing one row for each stimulus, with a 'stimulus_id' column, an optional 'stimulus_path_within_store' column (to structure the zip directory layout), and columns for all stimulus-set-specific metadata, but no 'filename' column.
:param stimulus_set_identifier: A unique name identifying the stimulus set: <lab identifier>.<first author e.g. 'Rajalingham' or 'MajajHong' for shared first-author><YYYY year of publication>.
:param downloads_path: Optional path to save files. Defaults to ~/Downloads/brainscore_packages/.
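The local packaging variants fall back to ~/Downloads/brainscore_packages/ when downloads_path is omitted. A small sketch of that default resolution with pathlib; `resolve_package_dir` and its `create` flag are hypothetical, not part of brainscore_core.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_package_dir(downloads_path: Optional[Union[str, Path]] = None,
                        create: bool = True) -> Path:
    """Hypothetical helper: pick the local output directory for packaged files."""
    if downloads_path is None:
        # Default documented for the local packaging functions.
        target = Path.home() / "Downloads" / "brainscore_packages"
    else:
        # Allow '~' in user-supplied paths.
        target = Path(downloads_path).expanduser()
    if create:
        target.mkdir(parents=True, exist_ok=True)
    return target
```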
Write a DataAssembly object to a NetCDF file.