SDK_Document_Lens_SC
Installation
[ ]:
!python3 -m pip install bioturing_connector
1. Connect to host server
You must run this step before any further analyses.
The user's token is generated from the host website.
[13]:
import numpy as np
import pandas as pd
from bioturing_connector.typing import Species
from bioturing_connector.typing import ChunkSize
from bioturing_connector.typing import StudyType
from bioturing_connector.typing import StudyUnit
from bioturing_connector.typing import InputMatrixType
from bioturing_connector.lens_sc_connector import LensSCConnector
connector = LensSCConnector(
host="https://talk2data.bioturing.com/lens_sc/",
token="cb8a76d79a264a55af79a2991f982ef7",
ssl=True
)
[14]:
connector.test_connection()
Connecting to host at https://talk2data.bioturing.com/lens_sc/api/v1/test_connection
Connection successful
2. List groups, studies, and S3 buckets
2.1. Get info of available groups
[8]:
user_groups = connector.get_user_groups()
user_groups
[8]:
[{'id': 'all_members',
'name': 'All members',
'visible': 1,
'creator': 'admin'},
{'id': 'bioturing_public_studies',
'name': 'BioTuring Public Studies',
'visible': 1,
'creator': 'admin'},
{'id': 'personal',
'name': 'Personal workspace',
'visible': 1,
'creator': 'admin'}]
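The id values in this list are what the submit and metadata calls below expect as group_id. A minimal sketch, assuming the structure shown above, for looking up a group's id by its display name:
[ ]:
# Look up a group's id by its display name (convenience sketch, not part of the SDK)
group_name_to_id = {g['name']: g['id'] for g in user_groups}
personal_group_id = group_name_to_id['Personal workspace']
personal_group_id  # 'personal'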
2.2. List all available studies in a group
[9]:
# Using group_id from step 2.1
study_list = connector.get_all_studies_info_in_group(
group_id='personal',
species=Species.HUMAN.value,
)
study_list
[9]:
[{'uuid': '5c470f3b799d474e91d0ca65aec3cf56',
'study_title': 'TBD',
'study_hash_id': 'SMALL_COSMX',
'created_by': 'dev@bioturing.com'},
{'uuid': '9b1d980887944d0199719ef8d3ddb17a',
'study_title': 'TBD',
'study_hash_id': 'XENIUM_BREAST_SMALL',
'created_by': 'dev@bioturing.com'}]
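The uuid field is the study_id that the metadata and data-access calls in sections 4-6 expect. A minimal sketch, assuming the fields shown above, for resolving a study's uuid from its study_hash_id:
[ ]:
# Resolve a study's uuid from its human-readable study_hash_id (convenience sketch)
hash_to_uuid = {s['study_hash_id']: s['uuid'] for s in study_list}
small_cosmx_uuid = hash_to_uuid['SMALL_COSMX']
small_cosmx_uuid  # '5c470f3b799d474e91d0ca65aec3cf56'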
2.3. List all S3 buckets of the current user
[ ]:
connector.get_user_s3()
[{'id': '505e49d2abee405f8a7b4ce2628d5270',
'bucket': 'bioturingdebug',
'prefix': ''},
{'id': 'd938706094354d7eb4726d6c9b07de9c',
'bucket': 'talk2data',
'prefix': ''}]
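When several buckets are configured, the id field of the desired bucket can be passed as s3_id in the submit calls of section 3. A minimal sketch, assuming the list structure shown above:
[ ]:
# Pick a specific bucket's id instead of relying on the platform default (convenience sketch)
s3_buckets = connector.get_user_s3()
bucket_to_id = {b['bucket']: b['id'] for b in s3_buckets}
chosen_s3_id = bucket_to_id['talk2data']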
3. Submit study
NOTE: Get group_id from step “2.1. Get info of available groups”
3.1. Submit single-cell spatial dataset (COSMX, VISIUM, VIZGEN, …)
3.1.1. Option 1: Submit study from s3
Parameters:
----
group_id: str
ID of the group to submit the data to.
s3_id: str
ID of the S3 bucket. Default: None
If s3_id is not provided, the first S3 bucket configured on the platform is used.
batch_info: List[dict]
File path and batch name information; the path DOES NOT include the bucket path configured on the platform!
Example:
[{
'name': 'study_1',
'folder': 's3_path/study_folder',
}, {...}]
study_id: str
If no value is provided, the default id will be a random UUIDv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
study_type: int
Format of the study
Support: StudyType.VIZGEN.value
StudyType.COSMX.value
StudyType.XENIUM.value
min_counts: int. Default: 0
Minimum number of counts required for a cell to pass filtering.
min_genes: int. Default: 0
Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int. Default: inf
Maximum number of counts required for a cell to pass filtering.
max_genes: int. Default: inf
Maximum number of genes expressed required for a cell to pass filtering.
neg_controls_percentage: int. Default: 100
Maximum percentage of control/negative gene counts allowed for a cell to pass filtering.
Ranges from 0 to 100.
[19]:
batch_info = [{
'name': 'dataset1',
'folder': 's3_path/data_fol_1',
}, {
'name': 'dataset2',
'folder': 's3_path/data_fol_2',
}]
# --------
## Demo submission
## The path DOES NOT include the bucket path configured on the platform
batch_info = [{
'name': 'small_cosmx',
'folder': 'demo_data/small_cosmx',
}]
connector.submit_study_from_s3_lens_sc(
group_id='personal',
batch_info=batch_info,
study_id='COSMX_SMALL_DATASET',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.COSMX.value,
min_genes=15,
neg_controls_percentage=5,
)
[2023-09-26 08:29] Waiting in queue
[2023-09-26 08:29] Downloading from s3: demo_data/small_cosmx/tx_file.csv
[2023-09-26 08:29] Downloading from s3: demo_data/small_cosmx/R5779_TMA2-S6_fov_positions_file.csv
[2023-09-26 08:29] Downloading from s3: demo_data/small_cosmx/CellLabels
[2023-09-26 08:29] [List folder demo_data/small_cosmx/CellLabels] Files: demo_data/small_cosmx/CellLabels/CellLabels_F001.tif | demo_data/small_cosmx/CellLabels/CellLabels_F002.tif | demo_data/small_cosmx/CellLabels/CellLabels_F003.tif | demo_data/small_cosmx/CellLabels/CellLabels_F004.tif | demo_data/small_cosmx/CellLabels/CellLabels_F005.tif ; Folders:
[2023-09-26 08:30] Downloading from s3: demo_data/small_cosmx/RawMorphologyImages
[2023-09-26 08:30] [List folder demo_data/small_cosmx/RawMorphologyImages] Files: demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F001.TIF | demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F002.TIF | demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F003.TIF | demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F004.TIF | demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F005.TIF ; Folders:
[2023-09-26 08:30] All files downloaded
[2023-09-26 08:30] Reading batch: small_cosmx
[2023-09-26 08:30] [small_cosmx] Preprocess data
[2023-09-26 08:30] [small_cosmx] Indexing cell boundaries
[2023-09-26 08:31] Finish: create_cell_boundaries_and_centers 54.29751372337341
[2023-09-26 08:31] [small_cosmx] Indexing sample images
[2023-09-26 08:35] Finish: indexing sample images 237.9231081008911
[2023-09-26 08:35] [small_cosmx] Indexing transcripts
[2023-09-26 08:40] Finish: create_cell_boundaries_and_centers 325.7730453014374
[2023-09-26 08:40] [small_cosmx] Indexing matrix
[2023-09-26 08:40] Finish batch: small_cosmx
[2023-09-26 08:40] Preprocessing expression matrix: 12658 cells x 63702 genes
[2023-09-26 08:40] Filtered: 11814 cells remain
[2023-09-26 08:40] Waiting in queue (matrix processing)
[2023-09-26 08:40] Normalizing expression matrix (matrix processing)
[2023-09-26 08:40] Running PCA (matrix processing)
[2023-09-26 08:40] Running venice binarizer (matrix processing)
[2023-09-26 08:41] Study was successfully submitted
[2023-09-26 08:41] DONE!!!
Study submitted successfully!
[19]:
True
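Because s3_id was omitted above, the first configured bucket was used. A sketch of the same submission with an explicit bucket, reusing chosen_s3_id from section 2.3; the study_id and name here are placeholders:
[ ]:
# Same COSMX submission, but targeting a specific S3 bucket via s3_id
connector.submit_study_from_s3_lens_sc(
    group_id='personal',
    s3_id=chosen_s3_id,                    # from connector.get_user_s3()
    batch_info=batch_info,
    study_id='COSMX_SMALL_DATASET_V2',     # placeholder id
    name='COSMX demo with explicit s3_id',
    authors=['Huy Nguyen', 'Thao Truong'],
    species=Species.HUMAN.value,
    study_type=StudyType.COSMX.value,
    min_genes=15,
    neg_controls_percentage=5,
)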
3.1.2. Option 2: Submit study from local machine / server
Parameters:
------
group_id: str
ID of the group to submit the data to.
batch_info: List[dict]
File path and batch name information
Example:
[{
'name': 'dataset_1',
'folder': 'server_path/dataset_folder_1',
}, {...}]
study_id: str
If no value is provided, the default id will be a random UUIDv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
study_type: int
Format of the study
Support: StudyType.VIZGEN.value
StudyType.COSMX.value
StudyType.XENIUM.value
min_counts: int. Default: 0
Minimum number of counts required for a cell to pass filtering.
min_genes: int. Default: 0
Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int. Default: inf
Maximum number of counts required for a cell to pass filtering.
max_genes: int. Default: inf
Maximum number of genes expressed required for a cell to pass filtering.
neg_controls_percentage: int. Default: 100
Maximum percentage of control/negative gene counts allowed for a cell to pass filtering.
Ranges from 0 to 100.
chunk_size: int
Size of each upload chunk. Default: ChunkSize.CHUNK_100_MB.value
Support:
ChunkSize.CHUNK_5_MB.value
ChunkSize.CHUNK_100_MB.value
ChunkSize.CHUNK_500_MB.value
ChunkSize.CHUNK_1_GB.value
[20]:
batch_info = [{
'name': 'batch1',
'folder': 'local_path/dataset_folder_1',
}, {
'name': 'batch2',
'folder': 'local_path/dataset_folder_2',
}, {...}]
#----
## Demo submission
batch_info = [{
'name': 'batch1',
'folder': '/mnt/gvol8080/demo_data/cosmx/small_cosmx',
}]
connector.submit_study_from_local_lens_sc(
group_id='personal',
batch_info=batch_info,
study_id='COSMX_SMALL_DATASET',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.COSMX.value,
min_genes=15,
neg_controls_percentage=5,
)
Zipping neccesary files of batch [batch1].
Location: /mnt/gvol8080/data/SonVo/sonvo_ssd/sc_spatial/cosmx/small_cosmx/batch1.zip
adding: tx_file.csv (deflated 74%)
adding: R5779_TMA2-S6_fov_positions_file.csv (deflated 46%)
adding: CellLabels/ (stored 0%)
adding: CellLabels/CellLabels_F001.tif (deflated 31%)
adding: CellLabels/CellLabels_F002.tif (deflated 23%)
adding: CellLabels/CellLabels_F003.tif (deflated 23%)
adding: CellLabels/CellLabels_F004.tif (deflated 23%)
adding: CellLabels/CellLabels_F005.tif (deflated 22%)
adding: RawMorphologyImages/ (stored 0%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F001.TIF (deflated 7%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F002.TIF (deflated 6%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F003.TIF (deflated 9%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F004.TIF (deflated 6%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F005.TIF (deflated 5%)
/data/dev/SonVo/btr_connector_notebook
Uploading all files to server...
batch1.zip - chunk_0: 100MMB [00:00, 122MMB/s]
batch1.zip - chunk_1: 100MMB [00:00, 117MMB/s]
batch1.zip - chunk_2: 100MMB [00:00, 113MMB/s]
batch1.zip - chunk_3: 100MMB [00:00, 117MMB/s]
batch1.zip - chunk_4: 100MMB [00:00, 113MMB/s]
batch1.zip - chunk_5: 100MMB [00:00, 119MMB/s]
batch1.zip - chunk_6: 100MMB [00:00, 122MMB/s]
batch1.zip - chunk_7: 72%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 71.8M/100M [00:01<00:00, 54.0MMB/s]
Delete zip files: [/mnt/gvol8080/data/SonVo/sonvo_ssd/sc_spatial/cosmx/small_cosmx/batch1.zip]
[2023-09-26 08:49] Waiting in queue
[2023-09-26 08:49] Reading batch: batch1
[2023-09-26 08:49] [batch1] Preprocess data
[2023-09-26 08:49] [batch1] Indexing cell boundaries
[2023-09-26 08:50] Finish: create_cell_boundaries_and_centers 54.93383240699768
[2023-09-26 08:50] [batch1] Indexing sample images
[2023-09-26 08:54] Finish: indexing sample images 246.05648136138916
[2023-09-26 08:54] [batch1] Indexing transcripts
[2023-09-26 08:59] Finish: create_cell_boundaries_and_centers 327.1916997432709
[2023-09-26 08:59] [batch1] Indexing matrix
[2023-09-26 08:59] Finish batch: batch1
[2023-09-26 08:59] Preprocessing expression matrix: 12658 cells x 63702 genes
[2023-09-26 08:59] Filtered: 11814 cells remain
[2023-09-26 08:59] Waiting in queue (matrix processing)
[2023-09-26 08:59] Normalizing expression matrix (matrix processing)
[2023-09-26 08:59] Running PCA (matrix processing)
[2023-09-26 08:59] Running venice binarizer (matrix processing)
[2023-09-26 08:59] Running t-SNE (matrix processing)
[2023-09-26 08:59] Study was successfully submitted
[2023-09-26 08:59] DONE!!!
Study submitted successfully!
[20]:
True
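The upload above used the default 100 MB chunks; for batches with large image folders, a larger chunk_size may reduce overhead. A sketch of the same local submission with 500 MB chunks; study_id and name are placeholders:
[ ]:
# Same local COSMX submission, uploaded in 500 MB chunks
connector.submit_study_from_local_lens_sc(
    group_id='personal',
    batch_info=batch_info,
    study_id='COSMX_SMALL_DATASET_500MB',  # placeholder id
    name='COSMX demo with larger upload chunks',
    authors=['Huy Nguyen', 'Thao Truong'],
    species=Species.HUMAN.value,
    study_type=StudyType.COSMX.value,
    min_genes=15,
    neg_controls_percentage=5,
    chunk_size=ChunkSize.CHUNK_500_MB.value,
)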
3.2. Submit proteomics dataset (CODEX, AKOYA, …)
3.2.1. Option 1: Submit study from s3
Parameters:
-----
group_id: str
ID of the group to submit the data to.
s3_id: str
ID of the S3 bucket. Default: None
If s3_id is not provided, the first S3 bucket configured on the platform is used.
batch_info: Dict
Image path information; the path DOES NOT include the bucket path configured on the platform!
Example:
{
'image': 's3_path/image.ome.tiff'
}
study_id: str
If no value is provided, the default id will be a random UUIDv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
min_counts: int. Default: 0
Minimum number of counts required for a cell to pass filtering.
min_genes: int. Default: 0
Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int. Default: inf
Maximum number of counts required for a cell to pass filtering.
max_genes: int. Default: inf
Maximum number of genes expressed required for a cell to pass filtering.
[ ]:
## ONLY accept 1 image per submission
## The path DOES NOT include the bucket path configured on platform
batch_info = {
'image': 's3_path/image.qptiff',
}
batch_info = {
'image': 's3_path/image.ome.tiff',
}
connector.submit_study_from_s3_proteomics(
group_id='personal',
batch_info=batch_info,
study_id='PROTEOMICS_BRAIN',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
)
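The proteomics submitter accepts the same s3_id and cell-filtering parameters documented above. A sketch combining them, reusing chosen_s3_id from section 2.3; the image path, study_id, and thresholds are placeholders:
[ ]:
# Proteomics submission from S3 with an explicit bucket and basic cell filtering
connector.submit_study_from_s3_proteomics(
    group_id='personal',
    s3_id=chosen_s3_id,                              # from connector.get_user_s3()
    batch_info={'image': 's3_path/image.ome.tiff'},  # placeholder path
    study_id='PROTEOMICS_BRAIN_FILTERED',            # placeholder id
    name='Proteomics demo with filtering',
    authors=['Huy Nguyen', 'Thao Truong'],
    species=Species.HUMAN.value,
    min_counts=10,
    min_genes=5,
)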
3.2.2. Option 2: Submit study from local machine / server
Parameters:
------
group_id: str
ID of the group to submit the data to.
batch_info: Dict
Image path information
Example:
{
'image': 'server_path/image.ome.tiff'
}
study_id: str
If no value is provided, the default id will be a random UUIDv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
min_counts: int. Default: 0
Minimum number of counts required for a cell to pass filtering.
min_genes: int. Default: 0
Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int. Default: inf
Maximum number of counts required for a cell to pass filtering.
max_genes: int. Default: inf
Maximum number of genes expressed required for a cell to pass filtering.
chunk_size: int
Size of each upload chunk. Default: ChunkSize.CHUNK_100_MB.value
Support:
ChunkSize.CHUNK_5_MB.value
ChunkSize.CHUNK_100_MB.value
ChunkSize.CHUNK_500_MB.value
ChunkSize.CHUNK_1_GB.value
[ ]:
## ONLY accept 1 image per submission
batch_info = {
'image': 'local_path/image.qptiff',
}
batch_info = {
'image': 'local_path/image.ome.tiff',
}
connector.submit_study_from_local_proteomics(
group_id='personal',
batch_info=batch_info,
study_id='PROTEOMICS_BRAIN',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
)
4. Submit metadata
NOTE: Get group_id and study_id (uuid) from step “2. List groups and studies”
4.1. Submit a dataframe directly
This is an example metadata file. The Barcodes column must be the DataFrame index.
[2]:
meta_df = pd.read_csv('SMALL_COSMX_metadata.tsv', sep='\t', index_col=0)
meta_df
[2]:
| Barcodes | fov |
|---|---|
| 1_1 | fov_1 |
| 1_2 | fov_1 |
| 1_3 | fov_1 |
| 1_4 | fov_1 |
| 1_5 | fov_1 |
| ... | ... |
| 5_3301 | fov_5 |
| 5_3313 | fov_5 |
| 5_3314 | fov_5 |
| 5_3321 | fov_5 |
| 5_3324 | fov_5 |
11814 rows × 1 columns
[3]:
connector.submit_metadata_from_dataframe(
species=Species.HUMAN.value,
study_id="5c470f3b799d474e91d0ca65aec3cf56",
group_id='personal',
df=meta_df
)
[3]:
'Successful'
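The metadata does not have to come from a file: any pandas DataFrame whose index holds the barcodes can be submitted the same way. A minimal sketch with illustrative barcodes and values (in practice the index should cover the study's barcodes):
[ ]:
# Build a metadata DataFrame in memory; barcodes go into the index,
# exactly as required by submit_metadata_from_dataframe above
barcodes_subset = ['1_1', '1_2', '1_3']        # illustrative barcodes
meta_df_in_memory = pd.DataFrame(
    {'fov': ['fov_1', 'fov_1', 'fov_1']},      # illustrative values
    index=pd.Index(barcodes_subset, name='Barcodes'),
)
meta_df_in_memory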
4.2. Submit file from local / server
[4]:
connector.submit_metadata_from_local(
species=Species.HUMAN.value,
study_id='5c470f3b799d474e91d0ca65aec3cf56',
group_id='personal',
file_path='./SMALL_COSMX_metadata.tsv'
)
[4]:
'Successful'
4.3. Submit file from s3
[ ]:
connector.submit_metadata_from_s3(
species=Species.HUMAN.value,
study_id='5c470f3b799d474e91d0ca65aec3cf56',
group_id='personal',
file_path='demo_data/SMALL_COSMX_metadata.tsv'  # This path DOES NOT include the bucket path configured on the platform, e.g. s3://bioturing_bucket
)
5. Access study data
NOTE: Get study_id (uuid) from step “2.2. List all available studies in a group”
5.1. Get barcodes
[7]:
barcodes = np.array(connector.get_barcodes(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
))
barcodes
[7]:
array(['1_1', '1_2', '1_3', ..., '5_3314', '5_3321', '5_3324'],
dtype='<U6')
5.2. Get features
[8]:
features = np.array(connector.get_features(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
))
features
[8]:
array(['5S_RRNA', '5_8S_RRNA', '7SK', ..., 'NEGPRB18', 'NEGPRB10',
'NEGPRB15'], dtype='<U26')
5.3. Get metadata dataframe
[9]:
metadata = connector.get_metadata(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value
)
metadata.iloc[:5, :5]
[9]:
|   | Barcodes | Alexa-488_Histone_Nuclei | Alexa-546_G_None | Alexa-594_rRNA_CD298_B2M_Membrane | Alexa-647_GFAP_Astrocytes |
|---|---|---|---|---|---|
| 0 | 1_1 | 49 | 4 | 19 | 68 |
| 1 | 1_2 | 46 | 4 | 25 | 96 |
| 2 | 1_3 | 77 | 5 | 22 | 49 |
| 3 | 1_4 | 57 | 5 | 27 | 8 |
| 4 | 1_5 | 38 | 5 | 30 | 17 |
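get_metadata returns a plain pandas DataFrame with a Barcodes column. A quick sketch for indexing it by barcode so it can be joined with other per-cell results:
[ ]:
# Index the metadata by barcode for easy joins with embeddings or expression data
metadata_by_barcode = metadata.set_index('Barcodes')
metadata_by_barcode.shape, list(metadata_by_barcode.columns)[:5]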
5.4. Get embeddings
5.4.1. List all embeddings
[10]:
embeddings = connector.list_all_custom_embeddings(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
)
embeddings
[10]:
[{'embedding_id': 'c4529a43ceaf40e98935f857aa1caa5c',
'embedding_name': 'PCA (no batch corrected)'},
{'embedding_id': '63270cea38374086ae38c9bc142a1b30',
'embedding_name': 'tSNE (perplexity=30)'}]
5.4.2. Access an embedding
[11]:
chosen_embedding = connector.retrieve_custom_embedding(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
embedding_id='c4529a43ceaf40e98935f857aa1caa5c',
)
chosen_embedding
[11]:
array([[-3.2380335e-03, -2.1599566e-03, 8.6972013e-04, ...,
-6.6192023e-04, 2.0368092e-04, 7.6390570e-05],
[-3.0471983e-03, -2.6254782e-03, 1.5224112e-03, ...,
8.3375332e-04, 1.8367210e-03, 9.4208797e-04],
[-3.9695334e-03, -3.1506929e-03, 8.5247034e-04, ...,
8.9647510e-04, 1.2072887e-04, -6.8749214e-04],
...,
[-8.9350082e-03, -1.0427534e-02, 5.4663382e-03, ...,
-3.6334249e-03, -1.3702468e-03, 1.4806709e-03],
[-7.5779855e-03, -7.6015377e-03, -2.1946256e-04, ...,
-3.1667415e-03, -4.9561551e-03, -3.7561799e-03],
[-3.5920746e-03, -7.0776208e-03, -3.0932256e-06, ...,
-8.8387710e-04, 3.7067404e-03, -2.5200413e-03]], dtype=float32)
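Each row of the embedding is one cell; the rows are assumed here to follow the same barcode order as get_barcodes and get_metadata. A plotting sketch (matplotlib is assumed to be installed, and the fov column comes from the metadata submitted in section 4):
[ ]:
# Scatter the first two embedding dimensions, colored by a categorical metadata column
import matplotlib.pyplot as plt

fov_labels = metadata['fov'].astype('category')
plt.figure(figsize=(6, 5))
plt.scatter(
    chosen_embedding[:, 0],
    chosen_embedding[:, 1],
    c=fov_labels.cat.codes,
    s=2,
    cmap='tab10',
)
plt.xlabel('dim 1')
plt.ylabel('dim 2')
plt.title('PCA embedding colored by fov')
plt.show()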
5.5. Query genes
Parameters:
----
group_id: str
ID of the group containing the study.
study_id: str
ID of the study (uuid).
gene_names: List[str], default=[]
If the list is empty, the whole expression matrix is returned.
unit: str
Support:
StudyUnit.UNIT_RAW.value
StudyUnit.UNIT_LOGNORM.value
[12]:
gene_exp = connector.query_genes(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
gene_names=['CD3D', 'CD8A'],
unit=StudyUnit.UNIT_RAW.value,
)
gene_exp
[12]:
<11814x2 sparse matrix of type '<class 'numpy.float32'>'
with 1649 stored elements in Compressed Sparse Column format>
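The result is a SciPy sparse matrix with one row per barcode and one column per queried gene. A sketch for turning it into a labeled dense DataFrame, assuming the columns follow the order of gene_names and the rows follow the barcodes from section 5.1:
[ ]:
# Convert the sparse expression matrix into a labeled DataFrame
gene_exp_df = pd.DataFrame(
    gene_exp.toarray(),
    index=barcodes,                  # from connector.get_barcodes(...)
    columns=['CD3D', 'CD8A'],        # same order as the gene_names argument
)
gene_exp_df.head()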
6. Standardize your metadata
NOTE: Get group_id and study_id (uuid) from step “2. List groups and studies”
6.1. Retrieve ontology tree
Returns
----------
Ontologies tree : Dict[Dict]
In which:
'name': name of the node, which will be used in further steps
[ ]:
connector.get_ontologies_tree(
species=Species.HUMAN.value,
group_id='bioturing_public_studies'
)
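The exact nesting of the tree depends on the server, but each node carries a 'name' field. A recursive sketch that collects every 'name' in the returned structure, to make it easier to find valid root_name / leaf_name values for the next step (the traversal of nested dicts/lists is an assumption, not part of the SDK):
[ ]:
# Collect all node names from the ontology tree (generic traversal sketch)
ontologies_tree = connector.get_ontologies_tree(
    species=Species.HUMAN.value,
    group_id='bioturing_public_studies',
)

def collect_names(node, out):
    """Recursively gather every 'name' value in a nested dict/list structure."""
    if isinstance(node, dict):
        if 'name' in node:
            out.append(node['name'])
        for value in node.values():
            collect_names(value, out)
    elif isinstance(node, list):
        for item in node:
            collect_names(item, out)
    return out

all_node_names = collect_names(ontologies_tree, [])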
6.2. Assign standardized terms
Parameters
-----
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
group_id: str
ID of the group to submit the data to.
study_id: str
ID of the study (uuid)
metadata_field: str
Column name of the metadata dataframe on the platform (e.g. author's tissue)
metadata_value: str
Metadata value within the metadata field (e.g. normal lung)
root_name: str
Name of the root node in the BioTuring ontologies tree (e.g. tissue)
leaf_name: str
Name of the leaf node in the BioTuring ontologies tree (e.g. lung)
[ ]:
# This function is only usable in a group (not 'personal')
connector.assign_standardized_meta(
species=Species.HUMAN.value,
group_id='bioturing_public_studies',
study_id='5c470f3b799d474e91d0ca65aec3cf56',
metadata_field='Cell type',
metadata_value='TCRV delta 1 gamma-delta T cell',
root_name='cell type',
leaf_name='gamma-delta T cell',
)