SDK_Document_BBrowserX

Installation

[ ]:
!python3 -m pip install -U bioturing_connector

1. Connect to host server

Run this step before any further analysis.

The user token is generated on the host website.

[39]:
import numpy as np
import pandas as pd
from bioturing_connector.typing import Species
from bioturing_connector.typing import ChunkSize
from bioturing_connector.typing import StudyType
from bioturing_connector.typing import StudyUnit
from bioturing_connector.typing import InputMatrixType
from bioturing_connector.bbrowserx_connector import BBrowserXConnector

connector = BBrowserXConnector(
  host="https://talk2data.bioturing.com/t2d_index_tool/",
  token="98592aac0b284c899ebf5dd0ff2eff90",
  ssl=True
)
[20]:
connector.test_connection()
Connecting to host at https://talk2data.bioturing.com/t2d_index_tool/api/v1/test_connection
Connection successful
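
The cell above hard-codes the token in the notebook. As a minimal sketch, assuming you would rather keep the token out of shared notebooks, the connection settings can be read from environment variables. The variable names `BBROWSERX_HOST` and `BBROWSERX_TOKEN` and the helper `load_connector_settings` are hypothetical and not part of the SDK; only the `host`, `token` and `ssl` arguments come from the cell above.

```python
import os


def load_connector_settings(env=None):
    """Collect BBrowserXConnector keyword arguments from environment variables.

    BBROWSERX_HOST and BBROWSERX_TOKEN are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    host = env.get("BBROWSERX_HOST", "").strip()
    token = env.get("BBROWSERX_TOKEN", "").strip()
    if not host or not token:
        raise ValueError("Set BBROWSERX_HOST and BBROWSERX_TOKEN before connecting")
    # Derive ssl from the URL scheme so the two settings cannot disagree.
    return {"host": host, "token": token, "ssl": host.startswith("https://")}


# Example with an explicit mapping; the token is a placeholder.
settings = load_connector_settings({
    "BBROWSERX_HOST": "https://talk2data.bioturing.com/t2d_index_tool/",
    "BBROWSERX_TOKEN": "<your-token>",
})
# connector = BBrowserXConnector(**settings)
```

The resulting dictionary can be unpacked into `BBrowserXConnector(...)` exactly like the keyword arguments shown above.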

2. List groups, studies and s3

2.1. Get info of available groups

[2]:
user_groups = connector.get_user_groups()
user_groups
[2]:
[{'group_id': 'all_members', 'group_name': 'All members'},
 {'group_id': 'bioturing_public_studies',
  'group_name': 'BioTuring Public Studies'},
 {'group_id': 'personal', 'group_name': 'Personal workspace'}]

2.2. List all available studies in a group

[3]:
# Using group_id from step 2.1

study_list = connector.get_all_studies_info_in_group(
  group_id='personal',
  species=Species.HUMAN.value,
)
study_list
[3]:
[{'uuid': '80d76fc8136c4dfe807e3aa2beefca76',
  'study_title': 'TBD',
  'study_hash_id': 'COSMX_HUMAN_CORTEX',
  'created_by': 'sonvo@bioturing.com'},
 {'uuid': 'a1558f8ed6064095be86a091a4118c4a',
  'study_title': 'TBD',
  'study_hash_id': 'GSE128223',
  'created_by': 'sonvo@bioturing.com'}]
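
Later steps need the study `uuid` rather than the `study_hash_id`. The following sketch shows one way to pick a record out of a list of dictionaries such as the one returned above. The helper `find_record` is hypothetical and not part of the SDK; the sample data is copied from the output above. The same pattern should work for other list-of-dict results, such as the group list in step 2.1.

```python
def find_record(records, key, value):
    """Return the single dict in `records` whose `key` equals `value`."""
    matches = [r for r in records if r.get(key) == value]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one record with {key}={value!r}, found {len(matches)}"
        )
    return matches[0]


# Sample copied from the output of step 2.2.
study_list = [
    {'uuid': '80d76fc8136c4dfe807e3aa2beefca76', 'study_title': 'TBD',
     'study_hash_id': 'COSMX_HUMAN_CORTEX', 'created_by': 'sonvo@bioturing.com'},
    {'uuid': 'a1558f8ed6064095be86a091a4118c4a', 'study_title': 'TBD',
     'study_hash_id': 'GSE128223', 'created_by': 'sonvo@bioturing.com'},
]

study_uuid = find_record(study_list, 'study_hash_id', 'GSE128223')['uuid']
print(study_uuid)  # a1558f8ed6064095be86a091a4118c4a
```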

2.3. List all s3 buckets of the current user

[ ]:
connector.get_user_s3()
[{'id': '505e49d2abee405f8a7b4ce2628d5270',
  'bucket': 'bioturingdebug',
  'prefix': ''},
 {'id': 'd938706094354d7eb4726d6c9b07de9c',
  'bucket': 'talk2data',
  'prefix': ''}]

2.4. List all shared s3 buckets of a group

[ ]:
connector.get_shared_s3_of_group('all_members')
[]

3. Submit study

NOTE: Get group_id from step “2.1. Get info of available groups”

3.1. Option 1: Submit study from s3

Parameters:
----
group_id: str
      ID of the group to submit the data to.
s3_id: str
      ID of the s3 bucket. Default: None
      If s3_id is not provided, the first s3 bucket configured on the platform is used.
batch_info: List[dict]
      File path and batch name information. The path DOES NOT include the bucket path configured on the platform!
      Example:
        For H5AD format:
          [{
            'matrix': 's3_path/GSE128223_1.h5ad'
          }, {...}]
        For RDS format:
          [{
            'matrix': 's3_path/GSE128223_1.rds'
          }, {...}]
        For MTX_10X format:
          [{
            'matrix': 's3_path/data_1/matrix.mtx',
            'features': 's3_path/data_1/features.tsv',
            'barcodes': 's3_path/data_1/barcodes.tsv',
          }, {...}]
        For TILE_DB format:
          [{
            'folder': 's3_path/GSE128223_1'
          }, {...}]
study_id: str
      Identifier of the study (e.g. GSE128223).
      If no value is provided, the default ID is a random UUIDv4 string.
name: str
      Name of the study.
authors: List[str]
      Authors of the study.
abstract: str
      Abstract of the study.
species: str
      Species of the study.
      Support:
            Species.HUMAN.value
            Species.MOUSE.value
            Species.NON_HUMAN_PRIMATE.value
            Species.OTHERS.value
skip_dimred: bool
      Skip the BioTuring pipeline if set to True (only applicable when the input is a scanpy/seurat object).
input_matrix_type: str
      Is the input matrix already normalized or not?
      Support:
            InputMatrixType.NORMALIZED.value (will skip BioTuring normalization, h5ad: use adata.X)
            InputMatrixType.RAW.value (apply BioTuring normalization, h5ad: use adata.raw.X)
study_type: int
      Format of dataset
      Support:
            StudyType.BBROWSER.value
            StudyType.H5_10X.value
            StudyType.H5AD.value
            StudyType.MTX_10X.value
            StudyType.BCS.value
            StudyType.RDS.value
            StudyType.TSV.value
            StudyType.TILE_DB.value
min_counts: int
      Minimum number of counts required for a cell to pass filtering.
min_genes: int
      Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int
      Maximum number of counts allowed for a cell to pass filtering.
max_genes: int
      Maximum number of expressed genes allowed for a cell to pass filtering.
mt_percentage: int
      Maximum mitochondrial gene percentage allowed for a cell to pass filtering.
      Ranges from 0 to 100.
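
When a submission has several MTX_10X batches, writing each entry by hand is repetitive. The sketch below builds `batch_info` from a list of folders, using the three file names from the MTX_10X example above. The helper `build_mtx_10x_batch_info` and the folder names are hypothetical; the sketch assumes every folder contains files named exactly `matrix.mtx`, `features.tsv` and `barcodes.tsv`.

```python
import posixpath


def build_mtx_10x_batch_info(folders):
    """Build one batch entry per folder, using the file names from the MTX_10X example."""
    batch_info = []
    for folder in folders:
        batch_info.append({
            'matrix': posixpath.join(folder, 'matrix.mtx'),
            'features': posixpath.join(folder, 'features.tsv'),
            'barcodes': posixpath.join(folder, 'barcodes.tsv'),
        })
    return batch_info


# Placeholder folders, relative to the bucket path configured on the platform.
batch_info = build_mtx_10x_batch_info(['s3_path/data_1', 's3_path/data_2'])
for batch in batch_info:
    print(batch)
```

`posixpath` is used instead of `os.path` so that the separators stay as `/` on every operating system, which is what s3 keys expect.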

3.1.1. 10X Matrix format

[37]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'matrix': 'GSE128223/raw/matrix.mtx',
    'features': 'GSE128223/raw/features.tsv',
    'barcodes': 'GSE128223/raw/barcodes.tsv',
}, {...}]
connector.submit_study_from_s3(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.MTX_10X.value
)
[2023-09-26 06:08] Waiting in queue
[2023-09-26 06:08] Downloading GSE128223/raw/barcodes.tsv from s3: 262.1 KB / 539.5 KB
[2023-09-26 06:08] Downloading GSE128223/raw/features.tsv from s3: 262.1 KB / 322.8 KB
[2023-09-26 06:08] Downloading GSE128223/raw/matrix.mtx from s3: 262.1 KB / 927.0 MB
[2023-09-26 06:09] File downloaded
[2023-09-26 06:09] Reading batch: raw
[2023-09-26 06:09] Preprocessing expression matrix: 20923 cells x 35756 genes
[2023-09-26 06:09] Filtered: 20923 cells remain
[2023-09-26 06:09] Start processing study
[2023-09-26 06:09] Normalizing expression matrix
[2023-09-26 06:09] Running PCA
[2023-09-26 06:09] Running kNN
[2023-09-26 06:09] Running venice binarizer
[2023-09-26 06:09] Running t-SNE
[2023-09-26 06:09] Study was successfully submitted
[2023-09-26 06:09] DONE!!!
Study submitted successfully!
[37]:
True
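
Before submitting, it can help to check that every batch carries the keys listed in the `batch_info` examples of the parameter description in 3.1. The following is a local sanity check only, not something the SDK provides: `REQUIRED_KEYS` and `check_batch_info` are hypothetical names, and the format names are plain strings rather than `StudyType` values. Note that the `{...}` placeholder used in the examples is valid Python (a set containing `Ellipsis`), so the check also flags a placeholder left in by mistake.

```python
# Keys taken from the batch_info examples in the parameter description of 3.1.
REQUIRED_KEYS = {
    'H5AD': {'matrix'},
    'RDS': {'matrix'},
    'MTX_10X': {'matrix', 'features', 'barcodes'},
    'TILE_DB': {'folder'},
}


def check_batch_info(batch_info, format_name):
    """Return a list of problems; an empty list means every batch looks complete."""
    required = REQUIRED_KEYS[format_name]
    problems = []
    for i, batch in enumerate(batch_info):
        if not isinstance(batch, dict):
            problems.append(f"batch {i}: expected a dict, got {type(batch).__name__}")
            continue
        missing = sorted(required - set(batch))
        if missing:
            problems.append(f"batch {i}: missing {', '.join(missing)}")
    return problems


# The first batch lacks 'barcodes'; the second is a leftover placeholder.
example = [
    {'matrix': 'GSE128223/raw/matrix.mtx', 'features': 'GSE128223/raw/features.tsv'},
    {...},
]
problems = check_batch_info(example, 'MTX_10X')
for line in problems:
    print(line)
```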

3.1.2. Scanpy object

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'matrix': 's3_path/GSE128223_1.h5ad',
}, {...}]

connector.submit_study_from_s3(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.H5AD.value
)

3.1.3. Seurat object

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'matrix': 's3_path/GSE128223_1.rds',
}, {...}]

connector.submit_study_from_s3(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.RDS.value
)

3.1.4. Tile DB format

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'folder': 's3_path/GSE128223_1',
}, {...}]

connector.submit_study_from_s3(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.TILE_DB.value
)

3.1.5. Full matrix dataframe

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'matrix': 's3_path/GSE128223_1.tsv',
}, {...}]

connector.submit_study_from_s3(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.TSV.value
)

3.2. Option 2: Submit study from local machine

Parameters:
------
group_id: str
      ID of the group to submit the data to.
batch_info: List[dict]
      File path and batch name information.
      Example:
        For H5AD format:
          [{
            'matrix': 'local_path/GSE128223_1.h5ad'
          }, {...}]
        For RDS format:
          [{
            'matrix': 'local_path/GSE128223_1.rds'
          }, {...}]
        For MTX_10X format:
          [{
            'name': 'data_1',
            'matrix': 'local_path/data_1/matrix.mtx',
            'features': 'local_path/data_1/features.tsv',
            'barcodes': 'local_path/data_1/barcodes.tsv',
          }, {...}]
study_id: str
      If no value is provided, the default ID is a random UUIDv4 string.
name: str
      Name of the study.
authors: List[str]
      Authors of the study.
abstract: str
      Abstract of the study.
species: str
      Species of the study.
      Support:
            Species.HUMAN.value
            Species.MOUSE.value
            Species.NON_HUMAN_PRIMATE.value
            Species.OTHERS.value
skip_dimred: bool
      Skip the BioTuring pipeline if set to True (only applicable when the input is a scanpy/seurat object).
input_matrix_type: str
      Is the input matrix already normalized or not?
      Support:
          InputMatrixType.NORMALIZED.value (will skip BioTuring normalization, h5ad: use adata.X)
          InputMatrixType.RAW.value (apply BioTuring normalization, h5ad: use adata.raw.X)
study_type: int
      Format of dataset
      Support:
            StudyType.BBROWSER.value
            StudyType.H5_10X.value
            StudyType.H5AD.value
            StudyType.MTX_10X.value
            StudyType.BCS.value
            StudyType.RDS.value
            StudyType.TSV.value
min_counts: int
      Minimum number of counts required for a cell to pass filtering.
min_genes: int
      Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int
      Maximum number of counts allowed for a cell to pass filtering.
max_genes: int
      Maximum number of expressed genes allowed for a cell to pass filtering.
mt_percentage: int
      Maximum mitochondrial gene percentage allowed for a cell to pass filtering.
      Ranges from 0 to 100.
chunk_size: int
      Size of each chunk for uploading. Default: ChunkSize.CHUNK_100_MB.value
      Support:
            ChunkSize.CHUNK_5_MB.value
            ChunkSize.CHUNK_100_MB.value
            ChunkSize.CHUNK_500_MB.value
            ChunkSize.CHUNK_1_GB.value
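
To get a feel for how `chunk_size` affects an upload, the number of chunks is the file size divided by the chunk size, rounded up. The sketch below is plain arithmetic and does not call the SDK. It assumes the chunk sizes are binary megabytes (1 MB = 1024 * 1024 bytes), which the documentation does not state; `MIB` and `chunk_count` are hypothetical names.

```python
MIB = 1024 * 1024  # assumption: chunk sizes are measured in binary megabytes


def chunk_count(file_size_bytes, chunk_size_bytes):
    """Number of upload chunks needed for a file of the given size (ceiling division)."""
    if chunk_size_bytes <= 0:
        raise ValueError("chunk size must be positive")
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


# A file of about 927 MB (decimal megabytes) split into 100 MiB chunks.
n_chunks = chunk_count(927_000_000, 100 * MIB)
print(n_chunks)  # 9
```

If the 927.0 MB reported for `matrix.mtx` in 3.1.1 is in decimal megabytes, nine chunks would be consistent with the `chunk_0` to `chunk_8` lines in the upload output of 3.2.1, but that reading is an inference, not a documented behavior.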

3.2.1. 10X Matrix format

[38]:
## Support multiple batches per submission
batch_info = [{
    'name': 'GSE128223',
    'matrix': '/data/dev/example_dataset/GSE128223/raw/matrix.mtx',
    'features': '/data/dev/example_dataset/GSE128223/raw/features.tsv',
    'barcodes': '/data/dev/example_dataset/GSE128223/raw/barcodes.tsv',
}, {...}]

connector.submit_study_from_local(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.MTX_10X.value
)
GSE128223matrix.mtx - chunk_0: 100MMB [00:08, 12.2MMB/s]
GSE128223matrix.mtx - chunk_1: 100MMB [00:09, 11.5MMB/s]
GSE128223matrix.mtx - chunk_2: 100MMB [00:08, 12.4MMB/s]
GSE128223matrix.mtx - chunk_3: 100MMB [00:10, 10.3MMB/s]
GSE128223matrix.mtx - chunk_4: 100MMB [00:10, 10.1MMB/s]
GSE128223matrix.mtx - chunk_5: 100MMB [00:11, 9.27MMB/s]
GSE128223matrix.mtx - chunk_6: 100MMB [00:11, 8.90MMB/s]
GSE128223matrix.mtx - chunk_7: 100MMB [00:07, 13.7MMB/s]
GSE128223matrix.mtx - chunk_8:  84%|████████▍ | 84.0M/100M [00:02<00:00, 38.6MMB/s]
GSE128223features.tsv - chunk_0:   0%|          | 316k/100M [00:00<00:11, 8.95MMB/s]
GSE128223barcodes.csv - chunk_0:   1%|          | 527k/100M [00:00<00:05, 17.9MMB/s]
[2023-09-26 06:15] Waiting in queue
[2023-09-26 06:15] Reading batch: GSE128223
[2023-09-26 06:15] Preprocessing expression matrix: 20923 cells x 35756 genes
[2023-09-26 06:15] Filtered: 20923 cells remain
[2023-09-26 06:15] Start processing study
[2023-09-26 06:15] Normalizing expression matrix
[2023-09-26 06:15] Running PCA
[2023-09-26 06:15] Running kNN
[2023-09-26 06:15] Running venice binarizer
[2023-09-26 06:15] Running t-SNE
[2023-09-26 06:15] Study was successfully submitted
[2023-09-26 06:15] DONE!!!
Study submitted successfully!
[38]:
True

3.2.2. Scanpy object

[ ]:
## Support multiple batches per submission
batch_info = [{
    'matrix': 'local_path/GSE128223_1.h5ad',
}, {...}]

connector.submit_study_from_local(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.H5AD.value
)

3.2.3. Seurat object

[ ]:
## Support multiple batches per submission
batch_info = [{
    'matrix': 'local_path/GSE128223_1.rds',
}, {...}]

connector.submit_study_from_local(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.RDS.value
)

3.2.4. Full matrix dataframe

[ ]:
## Support multiple batches per submission
batch_info = [{
    'matrix': 'local_path/GSE128223_1.tsv',
}, {...}]

connector.submit_study_from_local(
  group_id='personal',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.TSV.value
)

3.3. Option 3: Submit study with shared s3 of a group

Parameters:
----
group_id: str
      ID of the group to submit the data to.
shared_s3_id: str
      ID of the shared s3 bucket (see step 2.4).
batch_info: List[dict]
      File path and batch name information. The path DOES NOT include the bucket path configured on the platform!
      Example:
        For H5AD format:
          [{
            'matrix': 's3_path/GSE128223_1.h5ad'
          }, {...}]
        For RDS format:
          [{
            'matrix': 's3_path/GSE128223_1.rds'
          }, {...}]
        For MTX_10X format:
          [{
            'matrix': 's3_path/data_1/matrix.mtx',
            'features': 's3_path/data_1/features.tsv',
            'barcodes': 's3_path/data_1/barcodes.tsv',
          }, {...}]
        For TILE_DB format:
          [{
            'folder': 's3_path/GSE128223_1'
          }, {...}]
study_id: str
      Identifier of the study (e.g. GSE128223).
      If no value is provided, the default ID is a random UUIDv4 string.
name: str
      Name of the study.
authors: List[str]
      Authors of the study.
abstract: str
      Abstract of the study.
species: str
      Species of the study.
      Support:
            Species.HUMAN.value
            Species.MOUSE.value
            Species.NON_HUMAN_PRIMATE.value
            Species.OTHERS.value
skip_dimred: bool
      Skip the BioTuring pipeline if set to True (only applicable when the input is a scanpy/seurat object).
input_matrix_type: str
      Is the input matrix already normalized or not?
      Support:
            InputMatrixType.NORMALIZED.value (will skip BioTuring normalization, h5ad: use adata.X)
            InputMatrixType.RAW.value (apply BioTuring normalization, h5ad: use adata.raw.X)
study_type: int
      Format of dataset
      Support:
            StudyType.BBROWSER.value
            StudyType.H5_10X.value
            StudyType.H5AD.value
            StudyType.MTX_10X.value
            StudyType.BCS.value
            StudyType.RDS.value
            StudyType.TSV.value
            StudyType.TILE_DB.value
min_counts: int
      Minimum number of counts required for a cell to pass filtering.
min_genes: int
      Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int
      Maximum number of counts allowed for a cell to pass filtering.
max_genes: int
      Maximum number of expressed genes allowed for a cell to pass filtering.
mt_percentage: int
      Maximum mitochondrial gene percentage allowed for a cell to pass filtering.
      Ranges from 0 to 100.
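
The five filtering parameters can be read as independent per-cell thresholds. The sketch below illustrates that reading on made-up numbers. It is not the platform's implementation: the inclusive bounds and the convention that `None` disables a check are assumptions, and `passes_filter` is a hypothetical helper.

```python
def passes_filter(n_counts, n_genes, mt_percent,
                  min_counts=None, min_genes=None,
                  max_counts=None, max_genes=None, mt_percentage=None):
    """Return True if the cell meets every threshold that is set (None disables a check)."""
    checks = [
        min_counts is None or n_counts >= min_counts,
        min_genes is None or n_genes >= min_genes,
        max_counts is None or n_counts <= max_counts,
        max_genes is None or n_genes <= max_genes,
        mt_percentage is None or mt_percent <= mt_percentage,
    ]
    return all(checks)


# Three made-up cells: (total counts, expressed genes, mitochondrial percentage).
cells = [(5000, 1800, 4.2), (300, 150, 2.0), (7000, 2500, 35.0)]
kept = [c for c in cells
        if passes_filter(*c, min_counts=500, min_genes=200, mt_percentage=20)]
print(kept)  # [(5000, 1800, 4.2)]
```

The second cell fails the minimum thresholds and the third exceeds the mitochondrial limit, so only the first one remains.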

3.3.1. 10X Matrix format

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'matrix': 'GSE128223/raw/matrix.mtx',
    'features': 'GSE128223/raw/features.tsv',
    'barcodes': 'GSE128223/raw/barcodes.tsv',
}, {...}]
connector.submit_study_from_shared_s3(
  group_id='6b3cfc27fa694779a1b2a5015e438b94',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.MTX_10X.value,
  shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'
)

3.3.2. Scanpy object

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'matrix': 's3_path/GSE128223_1.h5ad',
}, {...}]

connector.submit_study_from_shared_s3(
  group_id='6b3cfc27fa694779a1b2a5015e438b94',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.H5AD.value,
  shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'
)

3.3.3. Seurat object

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'matrix': 's3_path/GSE128223_1.rds',
}, {...}]

connector.submit_study_from_shared_s3(
  group_id='6b3cfc27fa694779a1b2a5015e438b94',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.RDS.value,
  shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'
)

3.3.4. Tile DB format

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'folder': 's3_path/GSE128223_1',
}, {...}]

connector.submit_study_from_shared_s3(
  group_id='6b3cfc27fa694779a1b2a5015e438b94',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.TILE_DB.value,
  shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'
)

3.3.5. Full matrix dataframe

[ ]:
## The path DOES NOT include the bucket path configured on platform
## Support multiple batches per submission
batch_info = [{
    'matrix': 's3_path/GSE128223_1.tsv',
}, {...}]

connector.submit_study_from_shared_s3(
  group_id='6b3cfc27fa694779a1b2a5015e438b94',
  batch_info=batch_info,
  study_id='GSE128223',
  name='This is my first study',
  authors=['Huy Nguyen', 'Thao Truong'],
  species=Species.HUMAN.value,
  input_matrix_type=InputMatrixType.RAW.value,
  study_type=StudyType.TSV.value,
  shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'
)

4. Submit metadata

NOTE: Get group_id and study_id (uuid) from step “2. List groups, studies and s3”

4.1. Submit a dataframe directly

This is an example metadata table. The barcodes column must be the DataFrame index.

[8]:
meta_df = pd.read_csv('GSE128223_metadata.tsv', sep='\t', index_col=0)
meta_df
[8]:
                            Cell type
Barcodes
donor1_d1_AAACCTGGTAGAGGAA  TCRV delta 1 gamma-delta T cell
donor1_d1_AAACGGGCAGACACTT  TCRV delta 1 gamma-delta T cell
donor1_d1_AAAGCAAAGAGTAATC  TCRV delta 1 gamma-delta T cell
donor1_d1_AAAGCAATCATGCATG  TCRV delta 1 gamma-delta T cell
donor1_d1_AAAGCAATCCTCAACC  TCRV delta 1 gamma-delta T cell
...                         ...
pbmc_8k_TTTGTCATCATGTCCC    naive CD8 T cell
pbmc_8k_TTTGTCATCCGATATG    naive CD8 T cell
pbmc_8k_TTTGTCATCGTCTGAA    monocyte
pbmc_8k_TTTGTCATCTCGAGTA    CD8 T cell
pbmc_8k_TTTGTCATCTGCTTGC    naive CD8 T cell

19121 rows × 1 columns

[12]:
connector.submit_metadata_from_dataframe(
    species=Species.HUMAN.value,
    study_id='a1558f8ed6064095be86a091a4118c4a',
    group_id='personal',
    df=meta_df
)
[12]:
'Successful'
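
Because the metadata is matched to cells through the barcodes in the index, it may be worth comparing them with the study barcodes before submitting. The sketch below uses plain Python sets; `compare_barcodes` is a hypothetical helper, and whether the platform requires a complete match is not stated here. In the notebook the two inputs would be `meta_df.index` and the result of `connector.get_barcodes(...)` from step 5.1.

```python
def compare_barcodes(metadata_barcodes, study_barcodes):
    """Report barcodes that appear on only one side."""
    meta = set(metadata_barcodes)
    study = set(study_barcodes)
    return {
        'missing_from_metadata': sorted(study - meta),
        'unknown_to_study': sorted(meta - study),
    }


# Small illustration using barcodes from the example table, plus one deliberate mismatch.
study_barcodes = ['donor1_d1_AAACCTGGTAGAGGAA', 'donor1_d1_AAACGGGCAGACACTT',
                  'pbmc_8k_TTTGTCATCTGCTTGC']
metadata_barcodes = ['donor1_d1_AAACCTGGTAGAGGAA', 'pbmc_8k_TTTGTCATCTGCTTGC',
                     'not_in_study_BARCODE']
report = compare_barcodes(metadata_barcodes, study_barcodes)
print(report)
```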

4.2. Submit file from local / server

[14]:
connector.submit_metadata_from_local(
    species=Species.HUMAN.value,
    study_id='a1558f8ed6064095be86a091a4118c4a',
    group_id='personal',
    file_path='./GSE128223_metadata.tsv'
)
[14]:
'Successful'

4.3. Submit file from s3

[ ]:
connector.submit_metadata_from_s3(
    species=Species.HUMAN.value,
    study_id='a1558f8ed6064095be86a091a4118c4a',
    group_id='personal',
    file_path='test_bucket/GSE128223_meta.tsv'        #This path DOES NOT include the bucket path configured on platform e.g. s3://bioturing_bucket
)

4.4. Submit file from shared s3 of a group

[ ]:
connector.submit_metadata_from_shared_s3(
    species=Species.HUMAN.value,
    study_id='a1558f8ed6064095be86a091a4118c4a',
    group_id='bioturing_public_studies',              # This function DOES NOT apply to group_id='personal'
    file_path='test_bucket/GSE128223_meta.tsv',        #This path DOES NOT include the bucket path configured on platform e.g. s3://bioturing_bucket
    shared_s3_id='ce26142487ed4a3697bb8902bf9d9670'
)
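
The s3 functions expect paths relative to the bucket path configured on the platform. If you start from a full `s3://` URI, a sketch like the following strips the configured part. `to_relative_s3_path` is a hypothetical helper, and `s3://bioturing_bucket` is the example bucket path from the comment above, not necessarily your configuration.

```python
def to_relative_s3_path(full_uri, configured_bucket_path):
    """Strip the configured bucket path from a full s3:// URI."""
    base = configured_bucket_path.rstrip('/') + '/'
    if not full_uri.startswith(base):
        raise ValueError(f"{full_uri!r} is not under {configured_bucket_path!r}")
    return full_uri[len(base):]


relative = to_relative_s3_path(
    's3://bioturing_bucket/test_bucket/GSE128223_meta.tsv',
    's3://bioturing_bucket',
)
print(relative)  # test_bucket/GSE128223_meta.tsv
```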

5. Access study data

NOTE: Get study_id (uuid) from step “2.2. List all available studies in a group”

5.1. Get barcodes

[18]:
barcodes = np.array(connector.get_barcodes(
  study_id='a1558f8ed6064095be86a091a4118c4a',
  species=Species.HUMAN.value,
))
print(barcodes)
['donor1_d1_AAACCTGGTAGAGGAA' 'donor1_d1_AAACGGGCAGACACTT'
 'donor1_d1_AAAGCAAAGAGTAATC' ... 'pbmc_8k_TTTGTCATCGTCTGAA'
 'pbmc_8k_TTTGTCATCTCGAGTA' 'pbmc_8k_TTTGTCATCTGCTTGC']

5.2. Get features

[19]:
features = np.array(connector.get_features(
  study_id='a1558f8ed6064095be86a091a4118c4a',
  species=Species.HUMAN.value,
))
print(features)
['5S_RRNA' '5_8S_RRNA' '7SK' ... 'THRA1/BTR' 'UTAT33' 'ZSCAN5CP']

5.3. Get metadata dataframe

[22]:
metadata = connector.get_metadata(
  study_id='a1558f8ed6064095be86a091a4118c4a',
  species=Species.HUMAN.value
)
metadata.iloc[:5, :5]
[22]:
   Barcodes                    Cell type                        Cell type (1)                    Cell type (2)                    Cmv status
0  donor1_d1_AAACCTGGTAGAGGAA  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  CMV+
1  donor1_d1_AAACGGGCAGACACTT  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  CMV+
2  donor1_d1_AAAGCAAAGAGTAATC  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  CMV+
3  donor1_d1_AAAGCAATCATGCATG  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  CMV+
4  donor1_d1_AAAGCAATCCTCAACC  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  TCRV delta 1 gamma-delta T cell  CMV+

5.4. Get embeddings

5.4.1. List all embeddings

[24]:
embeddings = connector.list_all_custom_embeddings(
  study_id='a1558f8ed6064095be86a091a4118c4a',
  species=Species.HUMAN.value,
)
embeddings
[24]:
[{'embedding_id': 'bee0c214d7d44dc1882313cc803aece3',
  'embedding_name': '_x_pca'},
 {'embedding_id': '0c856f67796b4f4b86dbedb812974ff1',
  'embedding_name': '_x_tsne'},
 {'embedding_id': '5ab6ae13ce344381a81aa7d6afb26616',
  'embedding_name': 'PCA (no batch corrected)'},
 {'embedding_id': '21f767838c1c4d5095249dcdab9388eb',
  'embedding_name': 'tSNE (perplexity=30)'}]

5.4.2. Access an embedding

[25]:
chosen_embedding = connector.retrieve_custom_embedding(
  study_id='a1558f8ed6064095be86a091a4118c4a',
  species=Species.HUMAN.value,
  embedding_id='bee0c214d7d44dc1882313cc803aece3',
)
chosen_embedding
[25]:
array([[-5.3032417 ,  7.8890934 ,  3.359574  , ...,  0.21355404,
        -0.64777076, -1.6085205 ],
       [-2.9219244 ,  0.11274821,  2.3836405 , ...,  0.06213907,
        -0.1660905 ,  0.24691239],
       [-5.4160094 , 12.229488  ,  7.7536416 , ..., -0.5595666 ,
         1.1389648 ,  0.28183457],
       ...,
       [17.052692  ,  8.085365  , -6.64449   , ...,  0.6446202 ,
        -0.95552135, -1.0086697 ],
       [-2.2584836 , -3.0889986 ,  2.9076786 , ...,  1.5332366 ,
        -0.38599294, -0.29490623],
       [-2.2893648 , -7.0735717 ,  1.3277851 , ..., -0.13736992,
        -1.7899635 ,  0.07911549]], dtype=float32)

5.5. Query genes

Parameters:
----
group_id: str
    ID of the group to submit the data to.
study_id: str
    ID of the study (uuid), from step 2.2.
gene_names: List[str], default=[]
    If the list is empty, the whole matrix is returned.
unit: str
    Support:
          StudyUnit.UNIT_LOGNORM.value
          StudyUnit.UNIT_RAW.value
[26]:
gene_exp = connector.query_genes(
  study_id='a1558f8ed6064095be86a091a4118c4a',
  species=Species.HUMAN.value,
  gene_names=['CD3D', 'CD8A'],
  unit=StudyUnit.UNIT_RAW.value,
)
gene_exp
[26]:
<19121x2 sparse matrix of type '<class 'numpy.float32'>'
        with 17584 stored elements in Compressed Sparse Column format>
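
The result is a sparse matrix, so only non-zero entries are stored. As a quick worked example, the fraction of stored entries can be computed from the shape and the stored-element count printed above. This is plain arithmetic with a hypothetical helper name, `stored_fraction`, and it does not call the SDK.

```python
def stored_fraction(n_rows, n_cols, n_stored):
    """Fraction of matrix entries that are explicitly stored."""
    total = n_rows * n_cols
    if total == 0:
        raise ValueError("matrix has no entries")
    return n_stored / total


# Numbers copied from the output above: 19121 x 2 matrix with 17584 stored elements.
fraction = stored_fraction(19121, 2, 17584)
print(f"{fraction:.1%}")  # 46.0%
```

Under the usual convention that stored elements are non-zero, roughly 46% of the cell and gene pairs for CD3D and CD8A have a non-zero raw count in this study.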

6. Standardize your metadata

NOTE: Get group_id and study_id (uuid) from step “2. List groups, studies and s3”

6.1. Retrieve ontology tree

Returns
----------
Ontologies tree : Dict[Dict]
  In which:
    'name': name of the node, which will be used in further steps
[ ]:
connector.get_ontologies_tree(
    species=Species.HUMAN.value,
    group_id='bioturing_public_studies'
)
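
To see which node names are available for the next step, the tree can be walked recursively. The exact layout of the returned tree is not documented here beyond the `'name'` key, so the sketch below makes no assumption about it other than that it consists of nested dicts and lists. `collect_names` is a hypothetical helper and `example_tree` is an invented shape for illustration only.

```python
def collect_names(node):
    """Walk nested dicts and lists and return every string stored under a 'name' key."""
    names = []
    if isinstance(node, dict):
        if isinstance(node.get('name'), str):
            names.append(node['name'])
        for value in node.values():
            names.extend(collect_names(value))
    elif isinstance(node, list):
        for item in node:
            names.extend(collect_names(item))
    return names


# Invented tree shape; the real return value of get_ontologies_tree may differ.
example_tree = {
    'tissue': {'name': 'tissue', 'children': [{'name': 'lung'}, {'name': 'liver'}]},
    'cell type': {'name': 'cell type', 'children': [{'name': 'gamma-delta T cell'}]},
}
names = collect_names(example_tree)
print(names)
```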

6.2. Assign standardized terms

Parameters
-----
species: str
      Species of the study.
      Support:  Species.HUMAN.value
                Species.MOUSE.value
                Species.NON_HUMAN_PRIMATE.value
                Species.OTHERS.value
group_id: str
      ID of the group to submit the data to.
study_id: str
      ID of the study (uuid)
metadata_field: str
      Column name of the metadata dataframe on the platform (e.g. author's tissue)
metadata_value: str
      Metadata value within the metadata field (e.g. normal lung)
root_name: str
      Name of the root in the BioTuring ontologies tree (e.g. tissue)
leaf_name: str
      Name of the leaf in the BioTuring ontologies tree (e.g. lung)
[ ]:
# This function is only usable in a group (not 'personal')

connector.assign_standardized_meta(
    species=Species.HUMAN.value,
    group_id='bioturing_public_studies',
    study_id='a1558f8ed6064095be86a091a4118c4a',
    metadata_field='Cell type',
    metadata_value='TCRV delta 1 gamma-delta T cell',
    root_name='cell type',
    leaf_name='gamma-delta T cell',
)
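
A metadata field usually has several values to standardize. One way to organize this, sketched below, is to prepare the per-value keyword arguments first and then loop over them. `build_assignments` is a hypothetical helper; the keyword names are the ones documented above, and the second mapping entry uses a leaf name that is assumed for illustration and may not exist in the ontology tree.

```python
def build_assignments(metadata_field, root_name, value_to_leaf):
    """Build one set of keyword arguments per metadata value."""
    return [
        {
            'metadata_field': metadata_field,
            'metadata_value': value,
            'root_name': root_name,
            'leaf_name': leaf,
        }
        for value, leaf in value_to_leaf.items()
    ]


assignments = build_assignments('Cell type', 'cell type', {
    'TCRV delta 1 gamma-delta T cell': 'gamma-delta T cell',
    'naive CD8 T cell': 'CD8 T cell',  # hypothetical leaf name
})

# In the notebook, each entry would then be passed to the call shown above:
# for kwargs in assignments:
#     connector.assign_standardized_meta(
#         species=Species.HUMAN.value,
#         group_id='bioturing_public_studies',
#         study_id='a1558f8ed6064095be86a091a4118c4a',
#         **kwargs,
#     )
```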