SDK_Document_Lens_SC
Installation
[ ]:
!python3 -m pip install bioturing_connector
1. Connect to host server
You must run this step before any further analyses.
The user's token is generated from the host website.
[13]:
import numpy as np
import pandas as pd
from bioturing_connector.typing import Species
from bioturing_connector.typing import ChunkSize
from bioturing_connector.typing import StudyType
from bioturing_connector.typing import StudyUnit
from bioturing_connector.typing import InputMatrixType
from bioturing_connector.lens_sc_connector import LensSCConnector
connector = LensSCConnector(
host="https://talk2data.bioturing.com/lens_sc/",
token="cb8a76d79a264a55af79a2991f982ef7",
ssl=True
)
[14]:
connector.test_connection()
Connecting to host at https://talk2data.bioturing.com/lens_sc/api/v1/test_connection
Connection successful
2. List groups, studies, and S3 buckets
2.1. Get info of available groups
[8]:
user_groups = connector.get_user_groups()
user_groups
[8]:
[{'id': 'all_members',
'name': 'All members',
'visible': 1,
'creator': 'admin'},
{'id': 'bioturing_public_studies',
'name': 'BioTuring Public Studies',
'visible': 1,
'creator': 'admin'},
{'id': 'personal',
'name': 'Personal workspace',
'visible': 1,
'creator': 'admin'}]
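The id values in this list are what the submit and metadata calls below expect as group_id. A minimal sketch, assuming the structure shown above, for looking up a group's id by its display name:
[ ]:
# Look up a group's id by its display name (convenience sketch, not part of the SDK)
group_name_to_id = {g['name']: g['id'] for g in user_groups}
personal_group_id = group_name_to_id['Personal workspace']
personal_group_id  # 'personal'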
2.2. List all available studies in a group
[9]:
# Using group_id from step 2.1
study_list = connector.get_all_studies_info_in_group(
group_id='personal',
species=Species.HUMAN.value,
)
study_list
[9]:
[{'uuid': '5c470f3b799d474e91d0ca65aec3cf56',
'study_title': 'TBD',
'study_hash_id': 'SMALL_COSMX',
'created_by': 'dev@bioturing.com'},
{'uuid': '9b1d980887944d0199719ef8d3ddb17a',
'study_title': 'TBD',
'study_hash_id': 'XENIUM_BREAST_SMALL',
'created_by': 'dev@bioturing.com'}]
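The uuid field is the study_id that the metadata and data-access calls in sections 4-6 expect. A minimal sketch, assuming the fields shown above, for resolving a study's uuid from its study_hash_id:
[ ]:
# Resolve a study's uuid from its human-readable study_hash_id (convenience sketch)
hash_to_uuid = {s['study_hash_id']: s['uuid'] for s in study_list}
small_cosmx_uuid = hash_to_uuid['SMALL_COSMX']
small_cosmx_uuid  # '5c470f3b799d474e91d0ca65aec3cf56'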
2.3. List all S3 buckets of the current user
[ ]:
connector.get_user_s3()
[{'id': '505e49d2abee405f8a7b4ce2628d5270',
'bucket': 'bioturingdebug',
'prefix': ''},
{'id': 'd938706094354d7eb4726d6c9b07de9c',
'bucket': 'talk2data',
'prefix': ''}]
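When several buckets are configured, the id field of the desired bucket can be passed as s3_id in the submit calls of section 3. A minimal sketch, assuming the list structure shown above:
[ ]:
# Pick a specific bucket's id instead of relying on the platform default (convenience sketch)
s3_buckets = connector.get_user_s3()
bucket_to_id = {b['bucket']: b['id'] for b in s3_buckets}
chosen_s3_id = bucket_to_id['talk2data']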
3. Submit study
NOTE: Get group_id from step “2.1. Get info of available groups”
3.1. Submit single-cell spatial dataset (COSMX, VISIUM, VIZGEN, …)
3.1.1. Option 1: Submit study from s3
Parameters:
----
group_id: str
ID of the group to submit the data to.
s3_id: str
ID of the S3 bucket. Default: None
If s3_id is not provided, the first S3 bucket configured on the platform is used.
batch_info: List[dict]
File path and batch name information; the path DOES NOT include the bucket path configured on the platform!
Example:
[{
'name': 'study_1',
'folder': 's3_path/study_folder',
}, {...}]
study_id: str
If no value is provided, the default id will be a random UUIDv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
study_type: int
Format of the study
Support: StudyType.VIZGEN.value
StudyType.COSMX.value
StudyType.XENIUM.value
min_counts: int. Default: 0
Minimum number of counts required for a cell to pass filtering.
min_genes: int. Default: 0
Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int. Default: inf
Maximum number of counts required for a cell to pass filtering.
max_genes: int. Default: inf
Maximum number of genes expressed required for a cell to pass filtering.
neg_controls_percentage: int. Default: 100
Maximum percentage of control/negative gene counts allowed for a cell to pass filtering.
Ranges from 0 to 100.
[19]:
batch_info = [{
'name': 'dataset1',
'folder': 's3_path/data_fol_1',
}, {
'name': 'dataset2',
'folder': 's3_path/data_fol_2',
}]
# --------
## Demo submission
## The path DOES NOT include the bucket path configured on the platform
batch_info = [{
'name': 'small_cosmx',
'folder': 'demo_data/small_cosmx',
}]
connector.submit_study_from_s3_lens_sc(
group_id='personal',
batch_info=batch_info,
study_id='COSMX_SMALL_DATASET',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.COSMX.value,
min_genes=15,
neg_controls_percentage=5,
)
[2023-09-26 08:29] Waiting in queue
[2023-09-26 08:29] Downloading from s3: demo_data/small_cosmx/tx_file.csv
[2023-09-26 08:29] Downloading from s3: demo_data/small_cosmx/R5779_TMA2-S6_fov_positions_file.csv
[2023-09-26 08:29] Downloading from s3: demo_data/small_cosmx/CellLabels
[2023-09-26 08:29] [List folder demo_data/small_cosmx/CellLabels] Files: demo_data/small_cosmx/CellLabels/CellLabels_F001.tif | demo_data/small_cosmx/CellLabels/CellLabels_F002.tif | demo_data/small_cosmx/CellLabels/CellLabels_F003.tif | demo_data/small_cosmx/CellLabels/CellLabels_F004.tif | demo_data/small_cosmx/CellLabels/CellLabels_F005.tif ; Folders:
[2023-09-26 08:30] Downloading from s3: demo_data/small_cosmx/RawMorphologyImages
[2023-09-26 08:30] [List folder demo_data/small_cosmx/RawMorphologyImages] Files: demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F001.TIF | demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F002.TIF | demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F003.TIF | demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F004.TIF | demo_data/small_cosmx/RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F005.TIF ; Folders:
[2023-09-26 08:30] All files downloaded
[2023-09-26 08:30] Reading batch: small_cosmx
[2023-09-26 08:30] [small_cosmx] Preprocess data
[2023-09-26 08:30] [small_cosmx] Indexing cell boundaries
[2023-09-26 08:31] Finish: create_cell_boundaries_and_centers 54.29751372337341
[2023-09-26 08:31] [small_cosmx] Indexing sample images
[2023-09-26 08:35] Finish: indexing sample images 237.9231081008911
[2023-09-26 08:35] [small_cosmx] Indexing transcripts
[2023-09-26 08:40] Finish: create_cell_boundaries_and_centers 325.7730453014374
[2023-09-26 08:40] [small_cosmx] Indexing matrix
[2023-09-26 08:40] Finish batch: small_cosmx
[2023-09-26 08:40] Preprocessing expression matrix: 12658 cells x 63702 genes
[2023-09-26 08:40] Filtered: 11814 cells remain
[2023-09-26 08:40] Waiting in queue (matrix processing)
[2023-09-26 08:40] Normalizing expression matrix (matrix processing)
[2023-09-26 08:40] Running PCA (matrix processing)
[2023-09-26 08:40] Running venice binarizer (matrix processing)
[2023-09-26 08:41] Study was successfully submitted
[2023-09-26 08:41] DONE!!!
Study submitted successfully!
[19]:
True
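Because s3_id was omitted above, the first configured bucket was used. A sketch of the same submission with an explicit bucket, reusing chosen_s3_id from section 2.3; the study_id and name here are placeholders:
[ ]:
# Same COSMX submission, but targeting a specific S3 bucket via s3_id
connector.submit_study_from_s3_lens_sc(
    group_id='personal',
    s3_id=chosen_s3_id,                    # from connector.get_user_s3()
    batch_info=batch_info,
    study_id='COSMX_SMALL_DATASET_V2',     # placeholder id
    name='COSMX demo with explicit s3_id',
    authors=['Huy Nguyen', 'Thao Truong'],
    species=Species.HUMAN.value,
    study_type=StudyType.COSMX.value,
    min_genes=15,
    neg_controls_percentage=5,
)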
3.1.2. Option 2: Submit study from local machine / server
Parameters:
------
group_id: str
ID of the group to submit the data to.
batch_info: List[dict]
File path and batch name information
Example:
[{
'name': 'dataset_1',
'folder': 'server_path/dataset_folder_1',
}, {...}]
study_id: str
If no value is provided, the default id will be a random UUIDv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
study_type: int
Format of the study
Support: StudyType.VIZGEN.value
StudyType.COSMX.value
StudyType.XENIUM.value
min_counts: int. Default: 0
Minimum number of counts required for a cell to pass filtering.
min_genes: int. Default: 0
Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int. Default: inf
Maximum number of counts required for a cell to pass filtering.
max_genes: int. Default: inf
Maximum number of genes expressed required for a cell to pass filtering.
neg_controls_percentage: int. Default: 100
Maximum percentage of control/negative gene counts allowed for a cell to pass filtering.
Ranges from 0 to 100.
chunk_size: int
Size of each upload chunk. Default: ChunkSize.CHUNK_100_MB.value
Support:
ChunkSize.CHUNK_5_MB.value
ChunkSize.CHUNK_100_MB.value
ChunkSize.CHUNK_500_MB.value
ChunkSize.CHUNK_1_GB.value
[20]:
batch_info = [{
'name': 'batch1',
'folder': 'local_path/dataset_folder_1',
}, {
'name': 'batch2',
'folder': 'local_path/dataset_folder_2',
}, {...}]
#----
## Demo submission
batch_info = [{
'name': 'batch1',
'folder': '/mnt/gvol8080/demo_data/cosmx/small_cosmx',
}]
connector.submit_study_from_local_lens_sc(
group_id='personal',
batch_info=batch_info,
study_id='COSMX_SMALL_DATASET',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
study_type=StudyType.COSMX.value,
min_genes=15,
neg_controls_percentage=5,
)
Zipping neccesary files of batch [batch1].
Location: /mnt/gvol8080/data/SonVo/sonvo_ssd/sc_spatial/cosmx/small_cosmx/batch1.zip
adding: tx_file.csv (deflated 74%)
adding: R5779_TMA2-S6_fov_positions_file.csv (deflated 46%)
adding: CellLabels/ (stored 0%)
adding: CellLabels/CellLabels_F001.tif (deflated 31%)
adding: CellLabels/CellLabels_F002.tif (deflated 23%)
adding: CellLabels/CellLabels_F003.tif (deflated 23%)
adding: CellLabels/CellLabels_F004.tif (deflated 23%)
adding: CellLabels/CellLabels_F005.tif (deflated 22%)
adding: RawMorphologyImages/ (stored 0%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F001.TIF (deflated 7%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F002.TIF (deflated 6%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F003.TIF (deflated 9%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F004.TIF (deflated 6%)
adding: RawMorphologyImages/20230505_010716_S2_C902_P99_N99_F005.TIF (deflated 5%)
/data/dev/SonVo/btr_connector_notebook
Uploading all files to server...
batch1.zip - chunk_0: 100MMB [00:00, 122MMB/s]
batch1.zip - chunk_1: 100MMB [00:00, 117MMB/s]
batch1.zip - chunk_2: 100MMB [00:00, 113MMB/s]
batch1.zip - chunk_3: 100MMB [00:00, 117MMB/s]
batch1.zip - chunk_4: 100MMB [00:00, 113MMB/s]
batch1.zip - chunk_5: 100MMB [00:00, 119MMB/s]
batch1.zip - chunk_6: 100MMB [00:00, 122MMB/s]
batch1.zip - chunk_7: 72%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 71.8M/100M [00:01<00:00, 54.0MMB/s]
Delete zip files: [/mnt/gvol8080/data/SonVo/sonvo_ssd/sc_spatial/cosmx/small_cosmx/batch1.zip]
[2023-09-26 08:49] Waiting in queue
[2023-09-26 08:49] Reading batch: batch1
[2023-09-26 08:49] [batch1] Preprocess data
[2023-09-26 08:49] [batch1] Indexing cell boundaries
[2023-09-26 08:50] Finish: create_cell_boundaries_and_centers 54.93383240699768
[2023-09-26 08:50] [batch1] Indexing sample images
[2023-09-26 08:54] Finish: indexing sample images 246.05648136138916
[2023-09-26 08:54] [batch1] Indexing transcripts
[2023-09-26 08:59] Finish: create_cell_boundaries_and_centers 327.1916997432709
[2023-09-26 08:59] [batch1] Indexing matrix
[2023-09-26 08:59] Finish batch: batch1
[2023-09-26 08:59] Preprocessing expression matrix: 12658 cells x 63702 genes
[2023-09-26 08:59] Filtered: 11814 cells remain
[2023-09-26 08:59] Waiting in queue (matrix processing)
[2023-09-26 08:59] Normalizing expression matrix (matrix processing)
[2023-09-26 08:59] Running PCA (matrix processing)
[2023-09-26 08:59] Running venice binarizer (matrix processing)
[2023-09-26 08:59] Running t-SNE (matrix processing)
[2023-09-26 08:59] Study was successfully submitted
[2023-09-26 08:59] DONE!!!
Study submitted successfully!
[20]:
True
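The upload above used the default 100 MB chunks; for batches with large image folders, a larger chunk_size may reduce overhead. A sketch of the same local submission with 500 MB chunks; study_id and name are placeholders:
[ ]:
# Same local COSMX submission, uploaded in 500 MB chunks
connector.submit_study_from_local_lens_sc(
    group_id='personal',
    batch_info=batch_info,
    study_id='COSMX_SMALL_DATASET_500MB',  # placeholder id
    name='COSMX demo with larger upload chunks',
    authors=['Huy Nguyen', 'Thao Truong'],
    species=Species.HUMAN.value,
    study_type=StudyType.COSMX.value,
    min_genes=15,
    neg_controls_percentage=5,
    chunk_size=ChunkSize.CHUNK_500_MB.value,
)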
3.2. Submit proteomics dataset (CODEX, AKOYA, …)
3.2.1. Option 1: Submit study from s3
Parameters:
-----
group_id: str
ID of the group to submit the data to.
s3_id: str
ID of the S3 bucket. Default: None
If s3_id is not provided, the first S3 bucket configured on the platform is used.
batch_info: Dict
Image path information; the path DOES NOT include the bucket path configured on the platform!
Example:
{
'image': 's3_path/image.ome.tiff'
}
study_id: str
If no value is provided, the default id will be a random UUIDv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
min_counts: int. Default: 0
Minimum number of counts required for a cell to pass filtering.
min_genes: int. Default: 0
Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int. Default: inf
Maximum number of counts required for a cell to pass filtering.
max_genes: int. Default: inf
Maximum number of genes expressed required for a cell to pass filtering.
[ ]:
## ONLY accept 1 image per submission
## The path DOES NOT include the bucket path configured on platform
batch_info = {
'image': 's3_path/image.qptiff',
}
batch_info = {
'image': 's3_path/image.ome.tiff',
}
connector.submit_study_from_s3_proteomics(
group_id='personal',
batch_info=batch_info,
study_id='PROTEOMICS_BRAIN',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
)
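The proteomics submitter accepts the same s3_id and cell-filtering parameters documented above. A sketch combining them, reusing chosen_s3_id from section 2.3; the image path, study_id, and thresholds are placeholders:
[ ]:
# Proteomics submission from S3 with an explicit bucket and basic cell filtering
connector.submit_study_from_s3_proteomics(
    group_id='personal',
    s3_id=chosen_s3_id,                              # from connector.get_user_s3()
    batch_info={'image': 's3_path/image.ome.tiff'},  # placeholder path
    study_id='PROTEOMICS_BRAIN_FILTERED',            # placeholder id
    name='Proteomics demo with filtering',
    authors=['Huy Nguyen', 'Thao Truong'],
    species=Species.HUMAN.value,
    min_counts=10,
    min_genes=5,
)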
3.2.2. Option 2: Submit study from local machine / server
Parameters:
------
group_id: str
ID of the group to submit the data to.
batch_info: Dict
Image path information
Example:
{
'image': 'server_path/image.ome.tiff'
}
study_id: str
If no value is provided, the default id will be a random UUIDv4 string
name: str
Name of the study.
authors: List[str]
Authors of the study.
abstract: str
Abstract of the study.
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
min_counts: int. Default: 0
Minimum number of counts required for a cell to pass filtering.
min_genes: int. Default: 0
Minimum number of genes expressed required for a cell to pass filtering.
max_counts: int. Default: inf
Maximum number of counts required for a cell to pass filtering.
max_genes: int. Default: inf
Maximum number of genes expressed required for a cell to pass filtering.
chunk_size: int
Size of each upload chunk. Default: ChunkSize.CHUNK_100_MB.value
Support:
ChunkSize.CHUNK_5_MB.value
ChunkSize.CHUNK_100_MB.value
ChunkSize.CHUNK_500_MB.value
ChunkSize.CHUNK_1_GB.value
[ ]:
## ONLY accept 1 image per submission
batch_info = {
'image': 'local_path/image.qptiff',
}
batch_info = {
'image': 'local_path/image.ome.tiff',
}
connector.submit_study_from_local_proteomics(
group_id='personal',
batch_info=batch_info,
study_id='PROTEOMICS_BRAIN',
name='This is my first study',
authors=['Huy Nguyen', 'Thao Truong'],
species=Species.HUMAN.value,
)
4. Submit metadata
NOTE: Get group_id and study_id (uuid) from step “2. List groups and studies”
4.1. Submit a dataframe directly
This is an example metadata file. The Barcodes column must be the DataFrame index.
[2]:
meta_df = pd.read_csv('SMALL_COSMX_metadata.tsv', sep='\t', index_col=0)
meta_df
[2]:
| Barcodes | fov |
|---|---|
| 1_1 | fov_1 |
| 1_2 | fov_1 |
| 1_3 | fov_1 |
| 1_4 | fov_1 |
| 1_5 | fov_1 |
| ... | ... |
| 5_3301 | fov_5 |
| 5_3313 | fov_5 |
| 5_3314 | fov_5 |
| 5_3321 | fov_5 |
| 5_3324 | fov_5 |
11814 rows × 1 columns
[3]:
connector.submit_metadata_from_dataframe(
species=Species.HUMAN.value,
study_id="5c470f3b799d474e91d0ca65aec3cf56",
group_id='personal',
df=meta_df
)
[3]:
'Successful'
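The metadata does not have to come from a file: any pandas DataFrame whose index holds the barcodes can be submitted the same way. A minimal sketch with illustrative barcodes and values (in practice the index should cover the study's barcodes):
[ ]:
# Build a metadata DataFrame in memory; barcodes go into the index,
# exactly as required by submit_metadata_from_dataframe above
barcodes_subset = ['1_1', '1_2', '1_3']        # illustrative barcodes
meta_df_in_memory = pd.DataFrame(
    {'fov': ['fov_1', 'fov_1', 'fov_1']},      # illustrative values
    index=pd.Index(barcodes_subset, name='Barcodes'),
)
meta_df_in_memory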
4.2. Submit file from local / server
[4]:
connector.submit_metadata_from_local(
species=Species.HUMAN.value,
study_id='5c470f3b799d474e91d0ca65aec3cf56',
group_id='personal',
file_path='./SMALL_COSMX_metadata.tsv'
)
[4]:
'Successful'
4.3. Submit file from s3
[ ]:
connector.submit_metadata_from_s3(
species=Species.HUMAN.value,
study_id='5c470f3b799d474e91d0ca65aec3cf56',
group_id='personal',
file_path='demo_data/SMALL_COSMX_metadata.tsv'  # This path DOES NOT include the bucket path configured on the platform, e.g. s3://bioturing_bucket
)
5. Access study data
NOTE: Get study_id (uuid) from step “2.2. List all available studies in a group”
5.1. Get barcodes
[7]:
barcodes = np.array(connector.get_barcodes(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
))
barcodes
[7]:
array(['1_1', '1_2', '1_3', ..., '5_3314', '5_3321', '5_3324'],
dtype='<U6')
5.2. Get features
[8]:
features = np.array(connector.get_features(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
))
features
[8]:
array(['5S_RRNA', '5_8S_RRNA', '7SK', ..., 'NEGPRB18', 'NEGPRB10',
'NEGPRB15'], dtype='<U26')
5.3. Get metadata dataframe
[9]:
metadata = connector.get_metadata(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value
)
metadata.iloc[:5, :5]
[9]:
|   | Barcodes | Alexa-488_Histone_Nuclei | Alexa-546_G_None | Alexa-594_rRNA_CD298_B2M_Membrane | Alexa-647_GFAP_Astrocytes |
|---|---|---|---|---|---|
| 0 | 1_1 | 49 | 4 | 19 | 68 |
| 1 | 1_2 | 46 | 4 | 25 | 96 |
| 2 | 1_3 | 77 | 5 | 22 | 49 |
| 3 | 1_4 | 57 | 5 | 27 | 8 |
| 4 | 1_5 | 38 | 5 | 30 | 17 |
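get_metadata returns a plain pandas DataFrame with a Barcodes column. A quick sketch for indexing it by barcode so it can be joined with other per-cell results:
[ ]:
# Index the metadata by barcode for easy joins with embeddings or expression data
metadata_by_barcode = metadata.set_index('Barcodes')
metadata_by_barcode.shape, list(metadata_by_barcode.columns)[:5]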
5.4. Get embeddings
5.4.1. List all embeddings
[10]:
embeddings = connector.list_all_custom_embeddings(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
)
embeddings
[10]:
[{'embedding_id': 'c4529a43ceaf40e98935f857aa1caa5c',
'embedding_name': 'PCA (no batch corrected)'},
{'embedding_id': '63270cea38374086ae38c9bc142a1b30',
'embedding_name': 'tSNE (perplexity=30)'}]
5.4.2. Access an embedding
[11]:
chosen_embedding = connector.retrieve_custom_embedding(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
embedding_id='c4529a43ceaf40e98935f857aa1caa5c',
)
chosen_embedding
[11]:
array([[-3.2380335e-03, -2.1599566e-03, 8.6972013e-04, ...,
-6.6192023e-04, 2.0368092e-04, 7.6390570e-05],
[-3.0471983e-03, -2.6254782e-03, 1.5224112e-03, ...,
8.3375332e-04, 1.8367210e-03, 9.4208797e-04],
[-3.9695334e-03, -3.1506929e-03, 8.5247034e-04, ...,
8.9647510e-04, 1.2072887e-04, -6.8749214e-04],
...,
[-8.9350082e-03, -1.0427534e-02, 5.4663382e-03, ...,
-3.6334249e-03, -1.3702468e-03, 1.4806709e-03],
[-7.5779855e-03, -7.6015377e-03, -2.1946256e-04, ...,
-3.1667415e-03, -4.9561551e-03, -3.7561799e-03],
[-3.5920746e-03, -7.0776208e-03, -3.0932256e-06, ...,
-8.8387710e-04, 3.7067404e-03, -2.5200413e-03]], dtype=float32)
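Each row of the embedding is one cell; the rows are assumed here to follow the same barcode order as get_barcodes and get_metadata. A plotting sketch (matplotlib is assumed to be installed, and the fov column comes from the metadata submitted in section 4):
[ ]:
# Scatter the first two embedding dimensions, colored by a categorical metadata column
import matplotlib.pyplot as plt

fov_labels = metadata['fov'].astype('category')
plt.figure(figsize=(6, 5))
plt.scatter(
    chosen_embedding[:, 0],
    chosen_embedding[:, 1],
    c=fov_labels.cat.codes,
    s=2,
    cmap='tab10',
)
plt.xlabel('dim 1')
plt.ylabel('dim 2')
plt.title('PCA embedding colored by fov')
plt.show()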
5.5. Query genes
Parameters:
----
group_id: str
ID of the group containing the study.
study_id: str
ID of the study (uuid).
gene_names: List[str], default=[]
If the list is empty, the whole expression matrix is returned.
unit: str
Support:
StudyUnit.UNIT_RAW.value
StudyUnit.UNIT_LOGNORM.value
[12]:
gene_exp = connector.query_genes(
study_id='5c470f3b799d474e91d0ca65aec3cf56',
species=Species.HUMAN.value,
gene_names=['CD3D', 'CD8A'],
unit=StudyUnit.UNIT_RAW.value,
)
gene_exp
[12]:
<11814x2 sparse matrix of type '<class 'numpy.float32'>'
with 1649 stored elements in Compressed Sparse Column format>
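The result is a SciPy sparse matrix with one row per barcode and one column per queried gene. A sketch for turning it into a labeled dense DataFrame, assuming the columns follow the order of gene_names and the rows follow the barcodes from section 5.1:
[ ]:
# Convert the sparse expression matrix into a labeled DataFrame
gene_exp_df = pd.DataFrame(
    gene_exp.toarray(),
    index=barcodes,                  # from connector.get_barcodes(...)
    columns=['CD3D', 'CD8A'],        # same order as the gene_names argument
)
gene_exp_df.head()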
6. Standardize your metadata
NOTE: Get group_id and study_id (uuid) from step “2. List groups and studies”
6.1. Retrieve ontology tree
Returns
----------
Ontologies tree : Dict[Dict]
In which:
'name': name of the node, which will be used in further steps
[ ]:
connector.get_ontologies_tree(
species=Species.HUMAN.value,
group_id='bioturing_public_studies'
)
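The exact nesting of the tree depends on the server, but each node carries a 'name' field. A recursive sketch that collects every 'name' in the returned structure, to make it easier to find valid root_name / leaf_name values for the next step (the traversal of nested dicts/lists is an assumption, not part of the SDK):
[ ]:
# Collect all node names from the ontology tree (generic traversal sketch)
ontologies_tree = connector.get_ontologies_tree(
    species=Species.HUMAN.value,
    group_id='bioturing_public_studies',
)

def collect_names(node, out):
    """Recursively gather every 'name' value in a nested dict/list structure."""
    if isinstance(node, dict):
        if 'name' in node:
            out.append(node['name'])
        for value in node.values():
            collect_names(value, out)
    elif isinstance(node, list):
        for item in node:
            collect_names(item, out)
    return out

all_node_names = collect_names(ontologies_tree, [])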
6.2. Assign standardized terms
Parameters
-----
species: str
Species of the study.
Support: Species.HUMAN.value
Species.MOUSE.value
Species.PRIMATE.value
Species.OTHERS.value
group_id: str
ID of the group to submit the data to.
study_id: str
ID of the study (uuid)
metadata_field: str
Column name of the metadata dataframe on the platform (e.g. author's tissue)
metadata_value: str
Metadata value within the metadata field (e.g. normal lung)
root_name: str
Name of the root node in the BioTuring ontologies tree (e.g. tissue)
leaf_name: str
Name of the leaf node in the BioTuring ontologies tree (e.g. lung)
[ ]:
# This function is only usable in a group (not 'personal')
connector.assign_standardized_meta(
species=Species.HUMAN.value,
group_id='bioturing_public_studies',
study_id='5c470f3b799d474e91d0ca65aec3cf56',
metadata_field='Cell type',
metadata_value='TCRV delta 1 gamma-delta T cell',
root_name='cell type',
leaf_name='gamma-delta T cell',
)