SDK_Document_Lens_Bulk
Installation
[ ]:
!python3 -m pip install bioturing_connector
1. Connect to host server:
Run this step before any further analysis.
The user token is generated on the host website.
[42]:
import numpy as np
import pandas as pd
from bioturing_connector.typing import Species
from bioturing_connector.typing import ChunkSize
from bioturing_connector.typing import StudyType
from bioturing_connector.typing import StudyUnit
from bioturing_connector.typing import InputMatrixType
from bioturing_connector.lens_bulk_connector import LensBulkConnector
connector = LensBulkConnector(
host="https://talk2data.bioturing.com/lens_bulk/",
token="930e9375d5164aa7a4a36593a52c6cd5",
ssl=True
)
[43]:
connector.test_connection()
Connecting to host at https://talk2data.bioturing.com/lens_bulk/api/v1/test_connection
Connection successful: BioTuring Lens Bulk server
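A token written directly into a notebook is easy to leak when the notebook is shared. The following is a minimal sketch of one alternative, assuming the token has been exported as an environment variable beforehand. The variable name BIOTURING_TOKEN and the helper load_token are hypothetical and are not part of bioturing_connector.

```python
import os


def load_token(env_var: str = "BIOTURING_TOKEN") -> str:
    """Return the API token stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your token")
    return token


# Usage (not executed here): pass token=load_token() to LensBulkConnector
# in place of the literal token string.
```

The helper fails early with a clear message when the variable is unset, which is easier to diagnose than a rejected connection.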
2. List groups, studies and s3
2.1. Get info of available groups
[4]:
user_groups = connector.get_user_groups()
user_groups
[4]:
[{'group_id': '48ba44afb7f14f51a6f6f1dc6f4c3ea9', 'group_name': 'Demo'},
{'group_id': 'all_members', 'group_name': 'All members'},
{'group_id': 'bioturing_public_studies',
'group_name': 'BioTuring Public Studies'},
{'group_id': 'd32297be0bb543688994cca14f58b14e',
'group_name': 'BioTuring Spatial'},
{'group_id': 'personal', 'group_name': 'Personal workspace'}]
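Later steps take a group_id rather than a group name. As a small sketch, the list of dictionaries returned above can be searched by name; find_group_id is a hypothetical helper, and the sample data is copied from the output shown in this step.

```python
from typing import Dict, List


def find_group_id(groups: List[Dict[str, str]], group_name: str) -> str:
    """Return the group_id whose group_name matches, ignoring case."""
    wanted = group_name.lower()
    for group in groups:
        if group["group_name"].lower() == wanted:
            return group["group_id"]
    raise KeyError(f"No group named {group_name!r}")


# Sample entries copied from the output of get_user_groups() above
user_groups = [
    {"group_id": "48ba44afb7f14f51a6f6f1dc6f4c3ea9", "group_name": "Demo"},
    {"group_id": "all_members", "group_name": "All members"},
    {"group_id": "personal", "group_name": "Personal workspace"},
]

print(find_group_id(user_groups, "demo"))
```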
2.2. List all available studies in a group
[7]:
# Using group_id from step 2.1
study_list = connector.get_all_studies_info_in_group(
group_id='personal',
species=Species.HUMAN.value,
)
study_list
[7]:
[{'uuid': 'f6f4c94460af44fabaa07ac77087351c',
'study_title': 'TBD',
'study_hash_id': 'MERGED_VISIUM',
'created_by': 'dev@bioturing.com'},
{'uuid': 'b25ff33bead3453680e802963d3e9caf',
'study_title': 'TBD',
'study_hash_id': 'GEOMX',
'created_by': 'dev@bioturing.com'}]
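Sections 4 and 5 need the study uuid, while the readable identifier is study_hash_id. A minimal sketch of an index from one to the other, assuming each entry has the keys shown in the output above; index_studies is a hypothetical helper name.

```python
def index_studies(study_list):
    """Map study_hash_id to uuid, refusing ambiguous duplicates."""
    index = {}
    for study in study_list:
        hash_id = study["study_hash_id"]
        if hash_id in index:
            raise ValueError(f"Duplicate study_hash_id: {hash_id}")
        index[hash_id] = study["uuid"]
    return index


# Entries copied from the output of get_all_studies_info_in_group() above
study_list = [
    {"uuid": "f6f4c94460af44fabaa07ac77087351c", "study_title": "TBD",
     "study_hash_id": "MERGED_VISIUM", "created_by": "dev@bioturing.com"},
    {"uuid": "b25ff33bead3453680e802963d3e9caf", "study_title": "TBD",
     "study_hash_id": "GEOMX", "created_by": "dev@bioturing.com"},
]

study_uuid = index_studies(study_list)["MERGED_VISIUM"]
print(study_uuid)
```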
2.3. List all S3 buckets of the current user
[ ]:
connector.get_user_s3()
[{'id': '505e49d2abee405f8a7b4ce2628d5270',
'bucket': 'bioturingdebug',
'prefix': ''},
{'id': 'd938706094354d7eb4726d6c9b07de9c',
'bucket': 'talk2data',
'prefix': ''}]
3. Submit study
NOTE: Get group_id from step “2.1. Get info of available groups”
3.1. Option 1: Submit study from s3
Parameters:
----
group_id: str
ID of the group to submit the data to.
s3_id: str
ID of s3 bucket. Default: None
If s3_id is not provided, we will use the first s3 bucket configured on the platform.
batch_info: List[dict]
File path and batch name information. The paths DO NOT include the bucket path configured on the platform.
Example:
For DSP format:
[{
'matrix': 's3_path/data_1/matrix.xlsx',
'image': 's3_path/data_1/image.ome.tiff',
}, {...}]
For Visium format:
[{
'matrix': 's3_path/data_1/matrix.h5',
'image': 's3_path/data_1/image.tiff',
'position': 's3_path/data_1/tissue_positions_list.csv',
'scale': 's3_path/data_1/scalefactors_json.json'
}, {...}]
For Visium RDS format:
[{
'matrix': 's3_path/GSE128223_1.rds'
}, {...}]
For Visium Anndata format:
[{
'matrix': 's3_path/GSE128223_1.h5ad'
}, {...}]
study_id: str
Used as the ID of the study (e.g. VISIUM_PBMC).
If no value is provided, the default ID is a random UUIDv4 string.
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support:
Species.HUMAN.value
Species.MOUSE.value
Species.NON_HUMAN_PRIMATE.value
Species.OTHERS.value
study_type: int
Format of the study
Support:
StudyType.DSP.value
StudyType.VISIUM.value
StudyType.VISIUM_RDS.value
StudyType.VISIUM_ANN.value
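The batch_info examples above imply a set of keys for each format. The sketch below checks a batch list against those keys before submission. The required keys are inferred from the examples in this section and are an assumption, not the SDK's own validation; the format labels are plain strings, and validate_batch_info is a hypothetical helper.

```python
# Keys per format, inferred from the batch_info examples above (assumption)
REQUIRED_KEYS = {
    "DSP": {"matrix", "image"},
    "VISIUM": {"matrix", "image", "position", "scale"},
    "VISIUM_RDS": {"matrix"},
    "VISIUM_ANN": {"matrix"},
}


def validate_batch_info(batch_info, study_format):
    """Return a list of human-readable problems (empty when none are found)."""
    required = REQUIRED_KEYS[study_format]
    problems = []
    for i, batch in enumerate(batch_info):
        if not isinstance(batch, dict):
            # Catches a leftover {...} placeholder, which Python parses as a set
            problems.append(f"batch {i}: expected a dict, got {type(batch).__name__}")
            continue
        missing = sorted(required - set(batch))
        if missing:
            problems.append(f"batch {i}: missing keys {missing}")
    return problems


print(validate_batch_info([{"matrix": "s3_path/GSE128223_1.h5ad"}], "VISIUM_ANN"))
```

Returning a list instead of raising on the first problem lets every batch be reported in one pass.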
3.1.1. Visium format
[52]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
'matrix': 'demo_data/visium_test/Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5',
'image': 'demo_data/visium_test/tissue_hires_image.png',
'position': 'demo_data/visium_test/tissue_positions.csv',
'scale': 'demo_data/visium_test/scalefactors_json.json',
}]  # add one dict per additional batch
connector.submit_study_from_s3(
group_id='personal',
batch_info=batch_info,
study_id='visium_test',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.VISIUM.value
)
[2023-09-26 06:24] Waiting in queue
[2023-09-26 06:24] Downloading demo_data/visium_test/Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5 from s3: 262.1 KB / 19.0 MB
[2023-09-26 06:24] Downloading demo_data/visium_test/tissue_hires_image.png from s3: 262.1 KB / 4.4 MB
[2023-09-26 06:24] Downloading demo_data/visium_test/tissue_positions.csv from s3: 191.8 KB / 191.8 KB
[2023-09-26 06:24] Downloading demo_data/visium_test/scalefactors_json.json from s3: 148 B / 148 B
[2023-09-26 06:24] File downloaded
[2023-09-26 06:24] Reading batch: visium_test
[2023-09-26 06:24] [visium_test] Indexing matrix
[2023-09-26 06:24] [visium_test] Indexing images
[2023-09-26 06:24] Finish batch: visium_test
[2023-09-26 06:24] Preprocessing expression matrix: 3460 cells x 17943 genes
[2023-09-26 06:24] Filtered: 3460 cells remain
[2023-09-26 06:24] Waiting in queue (matrix processing)
[2023-09-26 06:24] Normalizing expression matrix (matrix processing)
[2023-09-26 06:24] Running PCA (matrix processing)
[2023-09-26 06:24] Running venice binarizer (matrix processing)
[2023-09-26 06:25] Study was successfully submitted
[2023-09-26 06:25] DONE!!!
Study submitted successfully!
[52]:
True
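When several Visium batches share the same layout, the batch dictionaries can be generated rather than typed by hand. This is a sketch only: visium_batch is a hypothetical helper, and the default file names are taken from the example above, so they may differ for other datasets.

```python
import posixpath


def visium_batch(prefix, matrix_name,
                 image_name="tissue_hires_image.png",
                 position_name="tissue_positions.csv",
                 scale_name="scalefactors_json.json"):
    """Build one Visium batch dict with paths relative to the configured bucket."""
    if prefix.startswith("s3://"):
        # The paths must not include the bucket path configured on the platform
        raise ValueError("prefix must be relative to the bucket, without s3://")
    return {
        "matrix": posixpath.join(prefix, matrix_name),
        "image": posixpath.join(prefix, image_name),
        "position": posixpath.join(prefix, position_name),
        "scale": posixpath.join(prefix, scale_name),
    }


batch_info = [
    visium_batch(
        "demo_data/visium_test",
        "Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5",
    ),
]
print(batch_info[0]["matrix"])
```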
3.1.2. DSP format
[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
'matrix': 's3_path/data_1/matrix.xlsx',
'image': 's3_path/data_1/image.ome.tiff',
}]  # add one dict per additional batch
connector.submit_study_from_s3(
group_id='personal',
batch_info=batch_info,
study_id='visium_test',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.DSP.value
)
3.1.3. Visium Scanpy object
[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
'matrix': 's3_path/GSE128223_1.h5ad'
}]  # add one dict per additional batch
connector.submit_study_from_s3(
group_id='personal',
batch_info=batch_info,
study_id='visium_test',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.VISIUM_ANN.value
)
3.1.4. Visium Seurat object
[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
'matrix': 's3_path/GSE128223_1.rds'
}]  # add one dict per additional batch
connector.submit_study_from_s3(
group_id='personal',
batch_info=batch_info,
study_id='visium_test',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.VISIUM_RDS.value
)
3.2. Option 2: Submit study from local machine
Parameters:
------
group_id: str
ID of the group to submit the data to.
batch_info: List[dict]
File path and batch name information
Example:
For DSP format:
[{
'name': 'data_1',
'matrix': 'local_path/data_1/matrix.xlsx',
'image': 'local_path/data_1/image.ome.tiff',
}, {...}]
For Visium format:
[{
'name': 'data_1',
'matrix': 'local_path/data_1/matrix.h5',
'image': 'local_path/data_1/image.tiff',
'position': 'local_path/data_1/tissue_positions_list.csv',
'scale': 'local_path/data_1/scalefactors_json.json'
}, {...}]
For Visium RDS format:
[{
'matrix': 'local_path/GSE128223_1.rds'
}, {...}]
For Visium Anndata format:
[{
'matrix': 'local_path/GSE128223_1.h5ad'
}, {...}]
study_id: str
If no value is provided, default id will be a random uuidv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support:
Species.HUMAN.value
Species.MOUSE.value
Species.NON_HUMAN_PRIMATE.value
Species.OTHERS.value
study_type: int
Format of the study
Support:
StudyType.DSP.value
StudyType.VISIUM.value
StudyType.VISIUM_RDS.value
StudyType.VISIUM_ANN.value
chunk_size: int
Size of each chunk the upload is split into. Default: ChunkSize.CHUNK_100_MB.value
Support:
ChunkSize.CHUNK_5_MB.value
ChunkSize.CHUNK_100_MB.value
ChunkSize.CHUNK_500_MB.value
ChunkSize.CHUNK_1_GB.value
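To get a feel for what chunk_size implies, the number of upload pieces for a file can be estimated with a ceiling division. This is a sketch under stated assumptions: sizes are in bytes and one MB is taken as 1024 * 1024 bytes, which may not match how the SDK defines its ChunkSize values. count_chunks is a hypothetical helper.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def count_chunks(file_size_bytes: int, chunk_size_bytes: int) -> int:
    """Number of chunks needed to cover the file (ceiling division)."""
    if chunk_size_bytes <= 0:
        raise ValueError("chunk_size_bytes must be positive")
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


# A 19 MB matrix file with 5 MB and 100 MB chunks
print(count_chunks(19 * MB, 5 * MB))
print(count_chunks(19 * MB, 100 * MB))
```

Smaller chunks mean more requests but less data to resend if one request fails; larger chunks do the opposite.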
3.2.1. Visium format
[4]:
## Support multiple batches per submission
batch_info = [{
'name': 'test_visium',
'matrix': '/data/dev/SonVo/visium_test/Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5',
'image': '/data/dev/SonVo/visium_test/tissue_hires_image.png',
'position': '/data/dev/SonVo/visium_test/tissue_positions.csv',
'scale': '/data/dev/SonVo/visium_test/scalefactors_json.json',
}]
connector.submit_study_from_local(
group_id='personal',
batch_info=batch_info,
study_id='test_visium',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.VISIUM.value,
)
test_visiummatrix.h5 - chunk_0: 18%|█████████████████████████████▋ | 18.1M/100M [00:00<00:01, 84.8MMB/s]
test_visiumhires.png - chunk_0: 4%|██████▉ | 4.24M/100M [00:00<00:01, 62.5MMB/s]
test_visiumposition.csv - chunk_0: 0%|▎ | 188k/100M [00:00<00:13, 7.72MMB/s]
test_visiumscale.json - chunk_0: 0%| | 510/100M [00:00<1:06:43, 26.2kMB/s]
[2023-09-26 06:36] Waiting in queue
[2023-09-26 06:36] Reading batch: test_visium
[2023-09-26 06:36] [test_visium] Indexing matrix
[2023-09-26 06:36] [test_visium] Indexing images
[2023-09-26 06:36] Finish batch: test_visium
[2023-09-26 06:36] Preprocessing expression matrix: 3460 cells x 17943 genes
[2023-09-26 06:36] Filtered: 3460 cells remain
[2023-09-26 06:36] Waiting in queue (matrix processing)
[2023-09-26 06:36] Normalizing expression matrix (matrix processing)
[2023-09-26 06:36] Running PCA (matrix processing)
[2023-09-26 06:36] Running kNN (matrix processing)
[2023-09-26 06:36] Study was successfully submitted
[2023-09-26 06:36] DONE!!!
Study submitted successfully!
[4]:
True
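A local submission reads every path listed in batch_info, so a mistyped path is worth catching before the upload starts. A minimal sketch follows; find_missing_files is a hypothetical helper, and it assumes that every key other than 'name' holds a file path, as in the examples above.

```python
import os


def find_missing_files(batch_info, non_path_keys=("name",)):
    """Return every path in batch_info that is not an existing file."""
    missing = []
    for batch in batch_info:
        for key, value in batch.items():
            if key in non_path_keys:
                continue  # 'name' is a label, not a path
            if not os.path.isfile(value):
                missing.append(value)
    return missing


# Usage (not executed here): call find_missing_files(batch_info) before
# connector.submit_study_from_local(...) and stop if the list is not empty.
```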
3.2.2. DSP format
[ ]:
## Support multiple batches per submission
batch_info = [{
'name': 'data_1',
'matrix': 'local_path/data_1/matrix.xlsx',
'image': 'local_path/data_1/image.ome.tiff',
}]  # add one dict per additional batch
connector.submit_study_from_local(
group_id='personal',
batch_info=batch_info,
study_id='test_visium',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.DSP.value,
)
3.2.3. Visium Scanpy object
[ ]:
## Support multiple batches per submission
batch_info = [{
'matrix': 'local_path/GSE128223_1.h5ad'
}]  # add one dict per additional batch
connector.submit_study_from_local(
group_id='personal',
batch_info=batch_info,
study_id='test_visium',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.VISIUM_ANN.value,
)
3.2.4. Visium Seurat object
[ ]:
## Support multiple batches per submission
batch_info = [{
'matrix': 'local_path/GSE128223_1.rds'
}]  # add one dict per additional batch
connector.submit_study_from_local(
group_id='personal',
batch_info=batch_info,
study_id='test_visium',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.VISIUM_RDS.value,
)
4. Submit metadata
NOTE: Get group_id from step “2.1. Get info of available groups” and study_id (uuid) from step “2.2. List all available studies in a group”
4.1. Submit a dataframe directly
This is an example metadata table. The barcodes must be the DataFrame index.
[9]:
meta_df = pd.read_csv('MERGED_VISIUM_metadata.tsv', sep='\t', index_col=0)
meta_df
[9]:
Barcodes | Batches |
---|---|
spatial_TATGGCAGACTTTCGA-1 | spatial |
spatial_CTTCGTGCCCGCATCG-1 | spatial |
spatial_AAACGGGTTGGTATCC-1 | spatial |
spatial_TGCAAACCCACATCAA-1 | spatial |
spatial_GACGGGATGTCTTATG-1 | spatial |
... | ... |
visium_test_AGTATACACAGCGACA-1 | visium_test |
visium_test_TGTGGTTGCTAAAGCT-1 | visium_test |
visium_test_TGATTCCCGGTTACCT-1 | visium_test |
visium_test_AACATTGTGACTCGAG-1 | visium_test |
visium_test_GCTCTTTCCGCTAGTG-1 | visium_test |
7495 rows × 1 columns
[22]:
connector.submit_metadata_from_dataframe(
species=Species.HUMAN.value,
study_id='f6f4c94460af44fabaa07ac77087351c',
group_id='personal',
df=meta_df
)
[22]:
'Successful'
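Since the barcodes identify the rows, it can help to compare the metadata index with the study's barcodes before submitting. The sketch below uses plain sets; compare_barcodes is a hypothetical helper, and whether the server requires every study barcode to be present is not stated here, so treat the result as a diagnostic only.

```python
def compare_barcodes(metadata_barcodes, study_barcodes):
    """Report barcodes present on one side but not the other."""
    meta = set(metadata_barcodes)
    study = set(study_barcodes)
    return {
        "missing_from_metadata": sorted(study - meta),
        "unknown_in_metadata": sorted(meta - study),
    }


# Toy example with barcodes taken from the table above and one made-up entry
report = compare_barcodes(
    ["spatial_TATGGCAGACTTTCGA-1", "not_in_study-1"],
    ["spatial_TATGGCAGACTTTCGA-1", "spatial_CTTCGTGCCCGCATCG-1"],
)
print(report)
```

With the objects from this notebook, the inputs would be meta_df.index and the list returned by connector.get_barcodes(...).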
4.2. Submit file from local / server
[23]:
connector.submit_metadata_from_local(
species=Species.HUMAN.value,
study_id='f6f4c94460af44fabaa07ac77087351c',
group_id='personal',
file_path='./MERGED_VISIUM_metadata.tsv'
)
[23]:
'Successful'
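The metadata file used here is tab-separated with the barcodes in the first column, matching how it is read in step 4.1. A minimal sketch of producing such a file with the standard library; write_metadata_tsv is a hypothetical helper, and the header label "Barcodes" follows the example table rather than a documented requirement.

```python
import csv


def write_metadata_tsv(path, rows, columns):
    """Write metadata as TSV.

    rows: dict mapping barcode -> dict of column name -> value
    columns: ordered list of metadata column names
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["Barcodes", *columns])
        for barcode, values in rows.items():
            writer.writerow([barcode, *(values[column] for column in columns)])


# Usage (not executed here):
# write_metadata_tsv("my_metadata.tsv",
#                    {"spatial_TATGGCAGACTTTCGA-1": {"Batches": "spatial"}},
#                    ["Batches"])
```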
4.3. Submit file from s3
[ ]:
connector.submit_metadata_from_s3(
species=Species.HUMAN.value,
study_id='f6f4c94460af44fabaa07ac77087351c',
group_id='personal',
# This path DOES NOT include the bucket path configured on the platform (e.g. s3://bioturing_bucket)
file_path='test_bucket/GSE128223_meta.tsv'
)
5. Access study data
NOTE: Get study_id (uuid) from step “2.2. List all available studies in a group”
5.1. Get barcodes
[29]:
barcodes = np.array(connector.get_barcodes(
study_id='f6f4c94460af44fabaa07ac77087351c',
species=Species.HUMAN.value,
))
print(barcodes)
['spatial_TATGGCAGACTTTCGA-1' 'spatial_CTTCGTGCCCGCATCG-1'
'spatial_AAACGGGTTGGTATCC-1' ... 'visium_test_TGATTCCCGGTTACCT-1'
'visium_test_AACATTGTGACTCGAG-1' 'visium_test_GCTCTTTCCGCTAGTG-1']
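In the output above, each barcode appears to carry its batch name as a prefix, separated from the spot barcode by the last underscore. Assuming that convention holds for the whole study, a quick per-batch count can be sketched as follows; batch_of and count_by_batch are hypothetical helpers.

```python
from collections import Counter


def batch_of(barcode: str) -> str:
    """Return the text before the last underscore (assumed to be the batch name)."""
    return barcode.rsplit("_", 1)[0]


def count_by_batch(barcodes):
    """Count barcodes per inferred batch name."""
    return Counter(batch_of(barcode) for barcode in barcodes)


# Barcodes copied from the printed output above
sample = [
    "spatial_TATGGCAGACTTTCGA-1",
    "spatial_CTTCGTGCCCGCATCG-1",
    "spatial_AAACGGGTTGGTATCC-1",
    "visium_test_TGATTCCCGGTTACCT-1",
    "visium_test_AACATTGTGACTCGAG-1",
    "visium_test_GCTCTTTCCGCTAGTG-1",
]
counts = count_by_batch(sample)
print(counts)
```

Splitting on the last underscore keeps batch names such as visium_test intact. A barcode without any underscore is returned unchanged.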
5.2. Get features
[30]:
features = np.array(connector.get_features(
study_id='f6f4c94460af44fabaa07ac77087351c',
species=Species.HUMAN.value,
))
print(features)
['5S_RRNA' '5_8S_RRNA' '7SK' ... 'AL121908.1' 'AP000527.1' 'AL035681.1']
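The position of a gene in this list is useful for locating its column in a whole-matrix query, on the assumption that the feature order matches the matrix columns. A small sketch of a case-insensitive lookup table; build_feature_index is a hypothetical helper, and when a name occurs more than once the first position is kept.

```python
def build_feature_index(features):
    """Map upper-cased feature name to its first position in the list."""
    index = {}
    for position, name in enumerate(features):
        index.setdefault(str(name).upper(), position)
    return index


# First entries copied from the printed output above
feature_index = build_feature_index(["5S_RRNA", "5_8S_RRNA", "7SK"])
print(feature_index["7SK"])
```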
5.3. Get metadata dataframe
[32]:
metadata = connector.get_metadata(
study_id='f6f4c94460af44fabaa07ac77087351c',
species=Species.HUMAN.value
)
metadata.iloc[:5, :5]
[32]:
 | Barcodes | Batches | Batches (1) | Batches (2) | Number of genes |
---|---|---|---|---|---|
0 | spatial_TATGGCAGACTTTCGA-1 | spatial | spatial | spatial | 6782 |
1 | spatial_CTTCGTGCCCGCATCG-1 | spatial | spatial | spatial | 6948 |
2 | spatial_AAACGGGTTGGTATCC-1 | spatial | spatial | spatial | 6972 |
3 | spatial_TGCAAACCCACATCAA-1 | spatial | spatial | spatial | 8065 |
4 | spatial_GACGGGATGTCTTATG-1 | spatial | spatial | spatial | 6229 |
5.4. Get embeddings
5.4.1. List all embeddings
[34]:
embeddings = connector.list_all_custom_embeddings(
study_id='f6f4c94460af44fabaa07ac77087351c',
species=Species.HUMAN.value,
)
embeddings
[34]:
[{'embedding_id': '8e31785c43b6458c8fdc7aa06d2e1028',
'embedding_name': 'PCA (no batch corrected)'},
{'embedding_id': '1a23d7b23f164d258bbd24d83658f194',
'embedding_name': 'tSNE (perplexity=30)'}]
5.4.2. Access an embedding
[35]:
chosen_embedding = connector.retrieve_custom_embedding(
study_id='f6f4c94460af44fabaa07ac77087351c',
species=Species.HUMAN.value,
embedding_id='8e31785c43b6458c8fdc7aa06d2e1028',
)
chosen_embedding
[35]:
array([[-24.439453 , 1.8610998 , -4.453333 , ..., -0.23137376,
0.70885265, 0.513127 ],
[-25.617645 , 2.945625 , -5.5363693 , ..., 0.13122909,
-0.7750866 , 0.21110779],
[-25.639389 , 1.7325156 , -3.724022 , ..., 0.02746161,
-0.19130976, -0.55235636],
...,
[ 27.012297 , -13.96414 , 2.4462044 , ..., -0.89074665,
2.0481367 , 1.2320619 ],
[ 26.676434 , -9.607573 , 2.836241 , ..., -1.4937118 ,
-3.8412411 , 3.404403 ],
[ 26.823132 , -0.6082494 , 8.160352 , ..., 0.42766818,
-3.8642507 , 4.8920965 ]], dtype=float32)
5.5. Query genes
Parameters:
----
group_id: str
ID of the group to submit the data to.
study_id: str
ID of the study (uuid)
gene_names: List[str], default=[]
If the list is empty, the whole matrix is returned
unit: str
Support:
StudyUnit.UNIT_RAW.value
StudyUnit.UNIT_LOGNORM.value
[36]:
gene_exp = connector.query_genes(
species=Species.HUMAN.value,
study_id='f6f4c94460af44fabaa07ac77087351c',
gene_names=['CD3D', 'CD8A'],
unit=StudyUnit.UNIT_RAW.value,
)
gene_exp
[36]:
<7495x2 sparse matrix of type '<class 'numpy.float32'>'
with 7006 stored elements in Compressed Sparse Column format>
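The result is stored in Compressed Sparse Column (CSC) format, where each column's non-zero values sit in one contiguous slice. The sketch below shows how a single column is read from the three CSC arrays, in plain Python and on a toy matrix for illustration. With SciPy, the same arrays are available as the .data, .indices and .indptr attributes of the matrix. csc_column is a hypothetical helper.

```python
def csc_column(data, indices, indptr, n_rows, col):
    """Return one column of a CSC matrix as a dense list."""
    dense = [0.0] * n_rows
    start, end = indptr[col], indptr[col + 1]  # slice holding this column
    for row, value in zip(indices[start:end], data[start:end]):
        dense[row] = value
    return dense


# Toy 4x2 matrix: column 0 has values at rows 0 and 2, column 1 at row 3
data = [1.0, 5.0, 2.0]
indices = [0, 2, 3]
indptr = [0, 2, 3]

first_gene = csc_column(data, indices, indptr, n_rows=4, col=0)
second_gene = csc_column(data, indices, indptr, n_rows=4, col=1)
print(first_gene, second_gene)
```

Because columns are contiguous, reading the expression of one gene across all barcodes is cheap in this layout.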
6. Standardize your metadata
NOTE: Get group_id from step “2.1. Get info of available groups” and study_id (uuid) from step “2.2. List all available studies in a group”
6.1. Retrieve ontology tree
Returns
----------
Ontologies tree : Dict[Dict]
In which:
'name': name of the node, which will be used in further steps
[ ]:
connector.get_ontologies_tree(
species=Species.HUMAN.value,
group_id='bioturing_public_studies'
)
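Because the 'name' values are what later steps need, it can be handy to list them all. The exact shape of the returned tree is not shown here, so the sketch below makes only one assumption: nodes are dictionaries with a 'name' key, nested inside other dictionaries or lists. The 'children' key in the toy tree is hypothetical, as is the helper collect_names.

```python
def collect_names(node):
    """Recursively gather every string stored under a 'name' key."""
    names = []
    if isinstance(node, dict):
        if isinstance(node.get("name"), str):
            names.append(node["name"])
        for value in node.values():
            names.extend(collect_names(value))
    elif isinstance(node, list):
        for item in node:
            names.extend(collect_names(item))
    return names


# Toy tree for illustration only; the real structure may differ
toy_tree = {"tissue": {"name": "tissue", "children": [{"name": "lung"}]}}
print(collect_names(toy_tree))
```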
6.2. Assign standardized terms
Parameters
-----
species: str
Species of the study.
Support:
Species.HUMAN.value
Species.MOUSE.value
Species.NON_HUMAN_PRIMATE.value
Species.OTHERS.value
group_id: str
ID of the group to submit the data to.
study_id: str
ID of the study (uuid)
metadata_field: str
column name of meta dataframe in platform (eg: author's tissue)
metadata_value: str
metadata value within the metadata field (eg: normal lung)
root_name: str
name of root in btr ontologies tree (eg: tissue)
leaf_name: str
name of leaf in btr ontologies tree (eg: lung)
[ ]:
# This function is only usable in a group (not 'personal')
connector.assign_standardized_meta(
species=Species.HUMAN.value,
group_id='bioturing_public_studies',
study_id='a1558f8ed6064095be86a091a4118c4a',
metadata_field='Cell type',
metadata_value='TCRV delta 1 gamma-delta T cell',
root_name='cell type',
leaf_name='gamma-delta T cell',
)
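A metadata field usually holds several values that each need a standardized term. One way to keep this manageable is to prepare the keyword arguments for every call up front, as sketched below. build_assignments is a hypothetical helper; the parameter names come from the list above, and the species string and the second mapping entry are placeholders.

```python
def build_assignments(species, group_id, study_id, metadata_field,
                      root_name, value_to_leaf):
    """Return one kwargs dict per metadata value to standardize."""
    return [
        dict(
            species=species,
            group_id=group_id,
            study_id=study_id,
            metadata_field=metadata_field,
            metadata_value=value,
            root_name=root_name,
            leaf_name=leaf,
        )
        for value, leaf in value_to_leaf.items()
    ]


assignments = build_assignments(
    species="human",  # placeholder; use Species.HUMAN.value in the notebook
    group_id="bioturing_public_studies",
    study_id="a1558f8ed6064095be86a091a4118c4a",
    metadata_field="Cell type",
    root_name="cell type",
    value_to_leaf={
        "TCRV delta 1 gamma-delta T cell": "gamma-delta T cell",
        "example author label": "example ontology leaf",  # placeholder pair
    },
)
print(len(assignments))

# Usage (not executed here):
# for kwargs in assignments:
#     connector.assign_standardized_meta(**kwargs)
```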