{ "cells": [ { "cell_type": "markdown", "id": "ed56c80a-2066-42ec-a199-3576aae50968", "metadata": {}, "source": [ "# SDK_Document_Lens_Bulk" ] }, { "cell_type": "markdown", "id": "95aea447-4b8c-43d0-a00b-84cc3cbe7a8e", "metadata": {}, "source": [ "## Installation" ] }, { "cell_type": "code", "execution_count": null, "id": "f1db9dd1-0d5e-48d7-b4ab-0934a2bad2cb", "metadata": {}, "outputs": [], "source": [ "!python3 -m pip install bioturing_connector" ] }, { "cell_type": "markdown", "id": "deb8a6d8-dbbe-49bd-854d-0172eb84928a", "metadata": {}, "source": [ "## 1. Connect to host server:" ] }, { "cell_type": "markdown", "id": "df88f7b3-de4a-4e6a-8730-c6275c260ce4", "metadata": {}, "source": [ "
You must run this step before any further analysis.
" ] }, { "cell_type": "markdown", "id": "09b17bd8-c0d1-4f9c-b7d2-620774f2c6e4", "metadata": {}, "source": [ "The user token is generated on the host website." ] }, { "cell_type": "code", "execution_count": 42, "id": "25c56829-e579-4354-8acf-3ce658c36212", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from bioturing_connector.typing import Species\n", "from bioturing_connector.typing import ChunkSize\n", "from bioturing_connector.typing import StudyType\n", "from bioturing_connector.typing import StudyUnit\n", "from bioturing_connector.typing import InputMatrixType\n", "from bioturing_connector.lens_bulk_connector import LensBulkConnector\n", "\n", "connector = LensBulkConnector(\n", " host=\"https://talk2data.bioturing.com/lens_bulk/\",\n", " token=\"930e9375d5164aa7a4a36593a52c6cd5\",\n", " ssl=True\n", ")" ] }, { "cell_type": "code", "execution_count": 43, "id": "cb203d7a-3fee-4bce-9bb1-1df74c3b29c8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connecting to host at https://talk2data.bioturing.com/lens_bulk/api/v1/test_connection\n", "Connection successful: BioTuring Lens Bulk server\n" ] } ], "source": [ "connector.test_connection()" ] }, { "cell_type": "markdown", "id": "8789beab-c2bc-4c1c-a474-63f75f0ebf11", "metadata": {}, "source": [ "## 2. List groups, studies and S3 buckets" ] }, { "cell_type": "markdown", "id": "13ee4cb3-ee8d-4691-9078-81c5b6e5af27", "metadata": {}, "source": [ "### 2.1. 
Get info of available groups" ] }, { "cell_type": "code", "execution_count": 4, "id": "dcdc00de-3e39-43a9-89d2-3d82a336129e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'group_id': '48ba44afb7f14f51a6f6f1dc6f4c3ea9', 'group_name': 'Demo'},\n", " {'group_id': 'all_members', 'group_name': 'All members'},\n", " {'group_id': 'bioturing_public_studies',\n", " 'group_name': 'BioTuring Public Studies'},\n", " {'group_id': 'd32297be0bb543688994cca14f58b14e',\n", " 'group_name': 'BioTuring Spatial'},\n", " {'group_id': 'personal', 'group_name': 'Personal workspace'}]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_groups = connector.get_user_groups()\n", "user_groups" ] }, { "cell_type": "markdown", "id": "cdd6ea7e-918f-4900-a575-e7de250d1fa3", "metadata": {}, "source": [ "### 2.2. List all available studies in a group" ] }, { "cell_type": "code", "execution_count": 7, "id": "fe08fe63-b90b-4e8f-a06b-5b27b3e55fef", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'uuid': 'f6f4c94460af44fabaa07ac77087351c',\n", " 'study_title': 'TBD',\n", " 'study_hash_id': 'MERGED_VISIUM',\n", " 'created_by': 'dev@bioturing.com'},\n", " {'uuid': 'b25ff33bead3453680e802963d3e9caf',\n", " 'study_title': 'TBD',\n", " 'study_hash_id': 'GEOMX',\n", " 'created_by': 'dev@bioturing.com'}]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using group_id from step 2.1\n", "\n", "study_list = connector.get_all_studies_info_in_group(\n", " group_id='personal',\n", " species=Species.HUMAN.value,\n", ")\n", "study_list" ] }, { "cell_type": "markdown", "id": "be359b70", "metadata": {}, "source": [ "### 2.3. 
List all S3 buckets of the current user" ] }, { "cell_type": "code", "execution_count": null, "id": "efa712bb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': '505e49d2abee405f8a7b4ce2628d5270',\n", " 'bucket': 'bioturingdebug',\n", " 'prefix': ''},\n", " {'id': 'd938706094354d7eb4726d6c9b07de9c',\n", " 'bucket': 'talk2data',\n", " 'prefix': ''}]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "connector.get_user_s3()" ] }, { "cell_type": "markdown", "id": "5286ae51", "metadata": {}, "source": [ "### 2.4. List all shared S3 buckets of a group" ] }, { "cell_type": "code", "execution_count": null, "id": "b6aac4c4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "connector.get_shared_s3_of_group('all_members')" ] }, { "cell_type": "markdown", "id": "b34a16e5-feff-457e-8a67-d9619a2c668a", "metadata": { "tags": [] }, "source": [ "## 3. Submit study" ] }, { "cell_type": "markdown", "id": "953e4af0-d715-4835-8251-485982ebdf75", "metadata": {}, "source": [ "
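Every submission function in this section takes batch_info as a list of plain dicts, one dict per batch. The `{...}` in the parameter examples only stands for further dicts; it is not valid input, because in Python `{...}` evaluates to a set containing Ellipsis. When many Visium batches share the same file layout, the list can be generated instead of typed by hand. This is a minimal sketch, assuming the key names from the Visium examples ('matrix', 'image', 'position', 'scale', plus 'name' for local uploads); the helper build_visium_batch_info and the file names in VISIUM_FILES are hypothetical and should be adapted to your own data.

```python
from typing import Dict, List

# Hypothetical file names; only the keys come from the Visium batch_info examples.
VISIUM_FILES = {
    'matrix': 'filtered_feature_bc_matrix.h5',
    'image': 'tissue_hires_image.png',
    'position': 'tissue_positions.csv',
    'scale': 'scalefactors_json.json',
}


def build_visium_batch_info(folders: List[str], with_name: bool = False) -> List[Dict[str, str]]:
    # Build one plain dict per batch folder, never a {...} placeholder.
    batch_info = []
    for folder in folders:
        folder = folder.rstrip('/')
        entry = {key: f'{folder}/{fname}' for key, fname in VISIUM_FILES.items()}
        if with_name:
            # The local-upload examples also give each batch a 'name'.
            entry = {'name': folder.split('/')[-1], **entry}
        batch_info.append(entry)
    return batch_info


# Example: two S3 batches (paths exclude the bucket path configured on the platform)
batch_info = build_visium_batch_info(['demo_data/sample_a', 'demo_data/sample_b'])
```

The resulting list can be passed as batch_info to any of the three submission options; the local Visium example includes a 'name' key, which is what with_name=True adds.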
NOTE: Get group_id from step \"2.1. Get info of available groups\"
" ] }, { "cell_type": "markdown", "id": "adbb4166-03c2-4423-b1ff-77cae7e7bf04", "metadata": {}, "source": [ "### 3.1. Option 1: Submit a study from S3" ] }, { "cell_type": "markdown", "id": "2d14334c-787f-410a-8eaa-d9960b7fb05c", "metadata": {}, "source": [ "```\n", "Parameters:\n", "----\n", "group_id: str\n", " ID of the group to submit the data to.\n", "s3_id: str\n", " ID of the S3 bucket. Default: None\n", " If s3_id is not provided, the first S3 bucket configured on the platform is used.\n", "batch_info: List[dict]\n", " File paths for each batch. The paths DO NOT include the bucket path configured on the platform!\n", " Example:\n", " For DSP format:\n", " [{\n", " 'matrix': 's3_path/data_1/matrix.xlsx',\n", " 'image': 's3_path/data_1/image.ome.tiff',\n", " }, {...}]\n", " For Visium format:\n", " [{\n", " 'matrix': 's3_path/data_1/matrix.h5',\n", " 'image': 's3_path/data_1/image.tiff',\n", " 'position': 's3_path/data_1/tissue_positions_list.csv',\n", " 'scale': 's3_path/data_1/scalefactors_json.json',\n", " }, {...}]\n", " For Visium RDS format:\n", " [{\n", " 'matrix': 's3_path/GSE128223_1.rds'\n", " }, {...}]\n", " For Visium Anndata format:\n", " [{\n", " 'matrix': 's3_path/GSE128223_1.h5ad'\n", " }, {...}]\n", "study_id: str\n", " Identifier of the study (e.g. VISIUM_PBMC).\n", " If no value is provided, a random UUIDv4 string is used.\n", "name: str\n", " Name of the study.\n", "authors: List[str]\n", " Authors of the study.\n", "abstract: str\n", " Abstract of the study.\n", "species: str\n", " Species of the study.\n", " Support:\n", " Species.HUMAN.value\n", " Species.MOUSE.value\n", " Species.NON_HUMAN_PRIMATE.value\n", " Species.OTHERS.value\n", "study_type: int\n", " Format of the study\n", " Support:\n", " StudyType.DSP.value\n", " StudyType.VISIUM.value\n", " StudyType.VISIUM_RDS.value\n", " StudyType.VISIUM_ANN.value\n", " \n", "```" ] }, { "cell_type": "markdown", "id": "e962ec20", "metadata": {}, "source": [ "#### 3.1.1. 
Visium format" ] }, { "cell_type": "code", "execution_count": 52, "id": "7d47fc3e-295a-4757-a6e4-79e8fa213381", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2023-09-26 06:24] Waiting in queue\n", "[2023-09-26 06:24] Downloading demo_data/visium_test/Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5 from s3: 262.1 KB / 19.0 MB\n", "[2023-09-26 06:24] Downloading demo_data/visium_test/tissue_hires_image.png from s3: 262.1 KB / 4.4 MB\n", "[2023-09-26 06:24] Downloading demo_data/visium_test/tissue_positions.csv from s3: 191.8 KB / 191.8 KB\n", "[2023-09-26 06:24] Downloading demo_data/visium_test/scalefactors_json.json from s3: 148 B / 148 B\n", "[2023-09-26 06:24] File downloaded\n", "[2023-09-26 06:24] Reading batch: visium_test\n", "[2023-09-26 06:24] [visium_test] Indexing matrix\n", "[2023-09-26 06:24] [visium_test] Indexing images\n", "[2023-09-26 06:24] Finish batch: visium_test\n", "[2023-09-26 06:24] Preprocessing expression matrix: 3460 cells x 17943 genes\n", "[2023-09-26 06:24] Filtered: 3460 cells remain\n", "[2023-09-26 06:24] Waiting in queue (matrix processing) \n", "[2023-09-26 06:24] Normalizing expression matrix (matrix processing) \n", "[2023-09-26 06:24] Running PCA (matrix processing) \n", "[2023-09-26 06:24] Running venice binarizer (matrix processing) \n", "[2023-09-26 06:25] Study was successfully submitted\n", "[2023-09-26 06:25] DONE!!!\n", "Study submitted successfully!\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## The path DOES NOT include the bucket path configured on platform\n", "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 'demo_data/visium_test/Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5',\n", " 'image': 'demo_data/visium_test/tissue_hires_image.png',\n", " 'position': 'demo_data/visium_test/tissue_positions.csv',\n", " 'scale': 
'demo_data/visium_test/scalefactors_json.json',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_s3(\n", " group_id='personal',\n", " batch_info=batch_info,\n", " study_id='visium_test',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM.value\n", ")" ] }, { "cell_type": "markdown", "id": "272f32d8", "metadata": {}, "source": [ "#### 3.1.2. DSP format" ] }, { "cell_type": "code", "execution_count": null, "id": "2784823c", "metadata": {}, "outputs": [], "source": [ "## The path DOES NOT include the bucket path configured on platform\n", "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 's3_path/data_1/matrix.xlsx',\n", " 'image': 's3_path/data_1/image.ome.tiff',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_s3(\n", " group_id='personal',\n", " batch_info=batch_info,\n", " study_id='visium_test',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.DSP.value\n", ")" ] }, { "cell_type": "markdown", "id": "35837d6b", "metadata": {}, "source": [ "#### 3.1.3. Visium Scanpy object" ] }, { "cell_type": "code", "execution_count": null, "id": "577ce035", "metadata": {}, "outputs": [], "source": [ "## The path DOES NOT include the bucket path configured on platform\n", "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 's3_path/GSE128223_1.h5ad',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_s3(\n", " group_id='personal',\n", " batch_info=batch_info,\n", " study_id='visium_test',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM_ANN.value\n", ")" ] }, { "cell_type": "markdown", "id": "a2dc3fe0", "metadata": {}, "source": [ "#### 3.1.4. 
Visium Seurat object" ] }, { "cell_type": "code", "execution_count": null, "id": "533a33fa", "metadata": {}, "outputs": [], "source": [ "## The path DOES NOT include the bucket path configured on platform\n", "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 's3_path/GSE128223_1.rds',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_s3(\n", " group_id='personal',\n", " batch_info=batch_info,\n", " study_id='visium_test',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM_RDS.value\n", ")" ] }, { "cell_type": "markdown", "id": "77a5cca2-708d-4e12-84a1-fb8f5e2da94e", "metadata": {}, "source": [ "### 3.2. Option 2: Submit a study from a local machine" ] }, { "cell_type": "markdown", "id": "3dc20cec-48a3-45bd-910e-1b87c90f20b1", "metadata": {}, "source": [ "```\n", "Parameters:\n", "------\n", "group_id: str\n", " ID of the group to submit the data to.\n", "batch_info: List[dict]\n", " File paths (and, for DSP and Visium, a batch name) for each batch.\n", " Example:\n", " For DSP format:\n", " [{\n", " 'name': 'data_1',\n", " 'matrix': 'local_path/data_1/matrix.xlsx',\n", " 'image': 'local_path/data_1/image.ome.tiff',\n", " }, {...}]\n", " For Visium format:\n", " [{\n", " 'name': 'data_1',\n", " 'matrix': 'local_path/data_1/matrix.h5',\n", " 'image': 'local_path/data_1/image.tiff',\n", " 'position': 'local_path/data_1/tissue_positions_list.csv',\n", " 'scale': 'local_path/data_1/scalefactors_json.json',\n", " }, {...}]\n", " For Visium RDS format:\n", " [{\n", " 'matrix': 'local_path/GSE128223_1.rds'\n", " }, {...}]\n", " For Visium Anndata format:\n", " [{\n", " 'matrix': 'local_path/GSE128223_1.h5ad'\n", " }, {...}]\n", "study_id: str\n", " If no value is provided, a random UUIDv4 string is used.\n", "name: str\n", " Name of the study.\n", "authors: List[str]\n", " Authors of the study.\n", "abstract: str\n", " Abstract of the study.\n", "species: str\n", " 
Species of the study.\n", " Support:\n", " Species.HUMAN.value\n", " Species.MOUSE.value\n", " Species.NON_HUMAN_PRIMATE.value\n", " Species.OTHERS.value\n", "study_type: int\n", " Format of the study\n", " Support:\n", " StudyType.DSP.value\n", " StudyType.VISIUM.value\n", " StudyType.VISIUM_RDS.value\n", " StudyType.VISIUM_ANN.value\n", "chunk_size: int\n", " Size of each chunk for uploading. Default: ChunkSize.CHUNK_100_MB.value\n", " Support:\n", " ChunkSize.CHUNK_5_MB.value\n", " ChunkSize.CHUNK_100_MB.value\n", " ChunkSize.CHUNK_500_MB.value\n", " ChunkSize.CHUNK_1_GB.value\n", "```" ] }, { "cell_type": "markdown", "id": "c371619a", "metadata": {}, "source": [ "#### 3.2.1. Visium format" ] }, { "cell_type": "code", "execution_count": 4, "id": "c6168f9e-d417-40c5-99c3-fce09036c607", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "test_visiummatrix.h5 - chunk_0: 18%|█████████████████████████████▋ | 18.1M/100M [00:00<00:01, 84.8MMB/s]\n", "test_visiumhires.png - chunk_0: 4%|██████▉ | 4.24M/100M [00:00<00:01, 62.5MMB/s]\n", "test_visiumposition.csv - chunk_0: 0%|▎ | 188k/100M [00:00<00:13, 7.72MMB/s]\n", "test_visiumscale.json - chunk_0: 0%| | 510/100M [00:00<1:06:43, 26.2kMB/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2023-09-26 06:36] Waiting in queue\n", "[2023-09-26 06:36] Reading batch: test_visium\n", "[2023-09-26 06:36] [test_visium] Indexing matrix\n", "[2023-09-26 06:36] [test_visium] Indexing images\n", "[2023-09-26 06:36] Finish batch: test_visium\n", "[2023-09-26 06:36] Preprocessing expression matrix: 3460 cells x 17943 genes\n", "[2023-09-26 06:36] Filtered: 3460 cells remain\n", "[2023-09-26 06:36] Waiting in queue (matrix processing) \n", "[2023-09-26 06:36] Normalizing expression matrix (matrix processing) \n", "[2023-09-26 06:36] Running PCA (matrix processing) \n", "[2023-09-26 06:36] Running kNN (matrix processing) \n", "[2023-09-26 06:36] Study was successfully 
submitted\n", "[2023-09-26 06:36] DONE!!!\n", "Study submitted successfully!\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## Support multiple batches per submission\n", "batch_info = [{\n", " 'name': 'test_visium',\n", " 'matrix': '/data/dev/SonVo/visium_test/Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5',\n", " 'image': '/data/dev/SonVo/visium_test/tissue_hires_image.png',\n", " 'position': '/data/dev/SonVo/visium_test/tissue_positions.csv',\n", " 'scale': '/data/dev/SonVo/visium_test/scalefactors_json.json',\n", "}]\n", "connector.submit_study_from_local(\n", " group_id='personal',\n", " batch_info=batch_info,\n", " study_id='test_visium',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM.value,\n", ")" ] }, { "cell_type": "markdown", "id": "45e448cc", "metadata": {}, "source": [ "#### 3.2.2. DSP format" ] }, { "cell_type": "code", "execution_count": null, "id": "5b9158d1", "metadata": {}, "outputs": [], "source": [ "## Support multiple batches per submission\n", "batch_info = [{\n", " 'name': 'data_1',\n", " 'matrix': 'local_path/data_1/matrix.xlsx',\n", " 'image': 'local_path/data_1/image.ome.tiff',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_local(\n", " group_id='personal',\n", " batch_info=batch_info,\n", " study_id='test_visium',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.DSP.value,\n", ")" ] }, { "cell_type": "markdown", "id": "ed8acf46", "metadata": {}, "source": [ "#### 3.2.3. 
Visium Scanpy object" ] }, { "cell_type": "code", "execution_count": null, "id": "29e0f659", "metadata": {}, "outputs": [], "source": [ "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 'local_path/GSE128223_1.h5ad',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_local(\n", " group_id='personal',\n", " batch_info=batch_info,\n", " study_id='test_visium',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM_ANN.value,\n", ")" ] }, { "cell_type": "markdown", "id": "a84d5faa", "metadata": {}, "source": [ "#### 3.2.4. Visium Seurat object" ] }, { "cell_type": "code", "execution_count": null, "id": "6ff928b0", "metadata": {}, "outputs": [], "source": [ "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 'local_path/GSE128223_1.rds',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_local(\n", " group_id='personal',\n", " batch_info=batch_info,\n", " study_id='test_visium',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM_RDS.value,\n", ")" ] }, { "cell_type": "markdown", "id": "b3d3a66f", "metadata": {}, "source": [ "### 3.3. 
Option 3: Submit a study from the shared S3 of a group" ] }, { "cell_type": "markdown", "id": "8d43a75a", "metadata": {}, "source": [ "```\n", "Parameters:\n", "----\n", "group_id: str\n", " ID of the group to submit the data to.\n", "shared_s3_id: str\n", " ID of the group's shared S3 bucket.\n", "batch_info: List[dict]\n", " File paths for each batch. The paths DO NOT include the bucket path configured on the platform!\n", " Example:\n", " For DSP format:\n", " [{\n", " 'matrix': 's3_path/data_1/matrix.xlsx',\n", " 'image': 's3_path/data_1/image.ome.tiff',\n", " }, {...}]\n", " For Visium format:\n", " [{\n", " 'matrix': 's3_path/data_1/matrix.h5',\n", " 'image': 's3_path/data_1/image.tiff',\n", " 'position': 's3_path/data_1/tissue_positions_list.csv',\n", " 'scale': 's3_path/data_1/scalefactors_json.json',\n", " }, {...}]\n", " For Visium RDS format:\n", " [{\n", " 'matrix': 's3_path/GSE128223_1.rds'\n", " }, {...}]\n", " For Visium Anndata format:\n", " [{\n", " 'matrix': 's3_path/GSE128223_1.h5ad'\n", " }, {...}]\n", "study_id: str\n", " Identifier of the study (e.g. VISIUM_PBMC).\n", " If no value is provided, a random UUIDv4 string is used.\n", "name: str\n", " Name of the study.\n", "authors: List[str]\n", " Authors of the study.\n", "abstract: str\n", " Abstract of the study.\n", "species: str\n", " Species of the study.\n", " Support:\n", " Species.HUMAN.value\n", " Species.MOUSE.value\n", " Species.NON_HUMAN_PRIMATE.value\n", " Species.OTHERS.value\n", "study_type: int\n", " Format of the study\n", " Support:\n", " StudyType.DSP.value\n", " StudyType.VISIUM.value\n", " StudyType.VISIUM_RDS.value\n", " StudyType.VISIUM_ANN.value\n", " \n", "```" ] }, { "cell_type": "markdown", "id": "35684112", "metadata": {}, "source": [ "#### 3.3.1. 
Visium format" ] }, { "cell_type": "code", "execution_count": null, "id": "8ff9a760", "metadata": {}, "outputs": [], "source": [ "## The path DOES NOT include the bucket path configured on platform\n", "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 'demo_data/visium_test/Visium_FFPE_Human_Prostate_IF_filtered_feature_bc_matrix.h5',\n", " 'image': 'demo_data/visium_test/tissue_hires_image.png',\n", " 'position': 'demo_data/visium_test/tissue_positions.csv',\n", " 'scale': 'demo_data/visium_test/scalefactors_json.json',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_shared_s3(\n", " group_id='6b3cfc27fa694779a1b2a5015e438b94',\n", " batch_info=batch_info,\n", " study_id='visium_test',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM.value,\n", " shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'\n", ")" ] }, { "cell_type": "markdown", "id": "14bbba24", "metadata": {}, "source": [ "#### 3.3.2. DSP format" ] }, { "cell_type": "code", "execution_count": null, "id": "efa6523a", "metadata": {}, "outputs": [], "source": [ "## The path DOES NOT include the bucket path configured on platform\n", "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 's3_path/data_1/matrix.xlsx',\n", " 'image': 's3_path/data_1/image.ome.tiff',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_shared_s3(\n", " group_id='6b3cfc27fa694779a1b2a5015e438b94',\n", " batch_info=batch_info,\n", " study_id='visium_test',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.DSP.value,\n", " shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'\n", ")" ] }, { "cell_type": "markdown", "id": "c48f9912", "metadata": {}, "source": [ "#### 3.3.3. 
Visium Scanpy object" ] }, { "cell_type": "code", "execution_count": null, "id": "b22c5073", "metadata": {}, "outputs": [], "source": [ "## The path DOES NOT include the bucket path configured on platform\n", "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 's3_path/GSE128223_1.h5ad',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_shared_s3(\n", " group_id='6b3cfc27fa694779a1b2a5015e438b94',\n", " batch_info=batch_info,\n", " study_id='visium_test',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM_ANN.value,\n", " shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'\n", ")" ] }, { "cell_type": "markdown", "id": "69a15dee", "metadata": {}, "source": [ "#### 3.3.4. Visium Seurat object" ] }, { "cell_type": "code", "execution_count": null, "id": "25c59740", "metadata": {}, "outputs": [], "source": [ "## The path DOES NOT include the bucket path configured on platform\n", "## Support multiple batches per submission\n", "batch_info = [{\n", " 'matrix': 's3_path/GSE128223_1.rds',\n", "}]  # append one dict per additional batch\n", "\n", "connector.submit_study_from_shared_s3(\n", " group_id='6b3cfc27fa694779a1b2a5015e438b94',\n", " batch_info=batch_info,\n", " study_id='visium_test',\n", " name='This is my first study',\n", " authors=['Huy Nguyen', 'Thao Truong'],\n", " species=Species.HUMAN.value,\n", " study_type=StudyType.VISIUM_RDS.value,\n", " shared_s3_id='15de18d355b4ce0a1u512a5b45c8e3c'\n", ")" ] }, { "cell_type": "markdown", "id": "b53fe51a-7a2b-4a22-ad70-0473fd7f8538", "metadata": {}, "source": [ "## 4. Submit metadata" ] }, { "cell_type": "markdown", "id": "d2240ae0-e9ec-4687-adce-36a3807d8be9", "metadata": {}, "source": [ "
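submit_metadata_from_dataframe expects the barcodes as the DataFrame index (step 4.1), whereas get_metadata in step 5.3 returns them as a regular Barcodes column. This is a minimal sketch of converting the second form into the first, assuming the column is named Barcodes as in the step 5.3 output. The helper to_submittable_metadata is hypothetical; it only checks that barcodes are unique, not that they exist in the study.

```python
import pandas as pd


def to_submittable_metadata(df: pd.DataFrame, barcode_col: str = 'Barcodes') -> pd.DataFrame:
    # Move the barcode column into the index, as step 4.1 requires.
    if barcode_col in df.columns:
        df = df.set_index(barcode_col)
    # Duplicate barcodes would make the metadata ambiguous.
    if not df.index.is_unique:
        raise ValueError('barcodes must be unique')
    return df


# Toy table shaped like the step 5.3 output (barcodes taken from this notebook)
meta = pd.DataFrame({
    'Barcodes': ['spatial_TATGGCAGACTTTCGA-1', 'visium_test_AGTATACACAGCGACA-1'],
    'Batches': ['spatial', 'visium_test'],
})
meta_df = to_submittable_metadata(meta)
```

The result can then be passed as df to submit_metadata_from_dataframe.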
NOTE: Get group_id and study_id (uuid) from step \"2. List groups and studies\"
" ] }, { "cell_type": "markdown", "id": "026597a6-87e0-4926-8d1e-83baa57aed9e", "metadata": {}, "source": [ "### 4.1. Submit a dataframe directly" ] }, { "cell_type": "markdown", "id": "e8ed9de0-e633-4da4-9570-021f38514732", "metadata": {}, "source": [ "This is an example metadata table. The barcodes must be the DataFrame index, not a regular column." ] }, { "cell_type": "code", "execution_count": 9, "id": "18383247-bb97-4950-bc3d-5c5e06b7a927", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Batches\n", "Barcodes \n", "spatial_TATGGCAGACTTTCGA-1 spatial\n", "spatial_CTTCGTGCCCGCATCG-1 spatial\n", "spatial_AAACGGGTTGGTATCC-1 spatial\n", "spatial_TGCAAACCCACATCAA-1 spatial\n", "spatial_GACGGGATGTCTTATG-1 spatial\n", "... ...\n", "visium_test_AGTATACACAGCGACA-1 visium_test\n", "visium_test_TGTGGTTGCTAAAGCT-1 visium_test\n", "visium_test_TGATTCCCGGTTACCT-1 visium_test\n", "visium_test_AACATTGTGACTCGAG-1 visium_test\n", "visium_test_GCTCTTTCCGCTAGTG-1 visium_test\n", "\n", "[7495 rows x 1 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "meta_df = pd.read_csv('MERGED_VISIUM_metadata.tsv', sep='\\t', index_col=0)\n", "meta_df" ] }, { "cell_type": "code", "execution_count": 22, "id": "df777667-9090-4a8c-a7da-381d9d08a91b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Successful'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "connector.submit_metadata_from_dataframe(\n", " species=Species.HUMAN.value,\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " group_id='personal',\n", " df=meta_df\n", ")" ] }, { "cell_type": "markdown", "id": "f28f3b69-5847-4c94-abdc-c6aa12811ed6", "metadata": {}, "source": [ "### 4.2. Submit file from local / server" ] }, { "cell_type": "code", "execution_count": 23, "id": "638fa955-7096-44c4-8851-ffe1fe2b1e07", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Successful'" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "connector.submit_metadata_from_local(\n", " species=Species.HUMAN.value,\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " group_id='personal',\n", " file_path='./MERGED_VISIUM_metadata.tsv'\n", ")" ] }, { "cell_type": "markdown", "id": "fbafd539-da4d-43b0-b26b-dfe0c672142f", "metadata": {}, "source": [ "### 4.3. 
Submit file from S3" ] }, { "cell_type": "code", "execution_count": null, "id": "881ea5a8-f696-42a4-b492-a400503e3d48", "metadata": {}, "outputs": [], "source": [ "connector.submit_metadata_from_s3(\n", " species=Species.HUMAN.value,\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " group_id='personal',\n", " file_path='test_bucket/GSE128223_meta.tsv',  # This path DOES NOT include the bucket path configured on the platform, e.g. s3://bioturing_bucket\n", ")" ] }, { "cell_type": "markdown", "id": "e9f048b2", "metadata": {}, "source": [ "### 4.4. Submit file from the shared S3 of a group" ] }, { "cell_type": "code", "execution_count": null, "id": "1ebb3109", "metadata": {}, "outputs": [], "source": [ "connector.submit_metadata_from_shared_s3(\n", " species=Species.HUMAN.value,\n", " study_id='a1558f8ed6064095be86a091a4118c4a',\n", " group_id='bioturing_public_studies',  # This function does not apply to group_id='personal'\n", " file_path='test_bucket/GSE128223_meta.tsv',  # This path DOES NOT include the bucket path configured on the platform, e.g. s3://bioturing_bucket\n", " shared_s3_id='ce26142487ed4a3697bb8902bf9d9670'\n", ")" ] }, { "cell_type": "markdown", "id": "7cab4732-521a-4605-a103-c36363621f46", "metadata": {}, "source": [ "## 5. Access study data" ] }, { "cell_type": "markdown", "id": "9ebd2818-18ba-4147-ab23-ff556c5b4e85", "metadata": {}, "source": [ "
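The accessors in this section return unlabeled data: the examples wrap the results of get_barcodes and get_features in np.array, and query_genes returns a SciPy sparse matrix (7495x2, Compressed Sparse Column format, in step 5.5). This is a minimal sketch of attaching labels with pandas, assuming that the matrix rows follow the order of get_barcodes and the columns follow the order of gene_names; that ordering is an assumption, not something this notebook confirms. The helper label_expression and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def label_expression(matrix, barcodes, gene_names):
    # Assumes rows follow get_barcodes order and columns follow gene_names order.
    if matrix.shape != (len(barcodes), len(gene_names)):
        raise ValueError('matrix shape does not match barcodes x gene_names')
    # Keep the data sparse: each column becomes a pandas SparseDtype column.
    return pd.DataFrame.sparse.from_spmatrix(
        sparse.csc_matrix(matrix), index=list(barcodes), columns=list(gene_names)
    )


# Toy stand-ins for the outputs of get_barcodes and query_genes
barcodes = np.array(['spatial_TATGGCAGACTTTCGA-1',
                     'visium_test_AGTATACACAGCGACA-1',
                     'visium_test_GCTCTTTCCGCTAGTG-1'])
gene_exp = sparse.csc_matrix(np.array([[0.0, 2.0], [1.0, 0.0], [0.0, 0.0]]))
expr_df = label_expression(gene_exp, barcodes, ['CD3D', 'CD8A'])
```

Keeping the columns sparse avoids densifying a large expression matrix; call expr_df.sparse.to_dense() only for small gene lists.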
NOTE: Get study_id (uuid) from step \"2.2. List all available studies in a group\"
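The study_id chosen at submission appears to correspond to study_hash_id in the step 2.2 output (an assumption based on that example), while the calls in this section need the uuid. This is a minimal sketch that looks up the uuid in the list returned by get_all_studies_info_in_group; the helper find_study_uuid is hypothetical, and the sample list copies the step 2.2 output.

```python
from typing import Dict, List


def find_study_uuid(study_list: List[Dict[str, str]], study_hash_id: str) -> str:
    # Return the uuid of the single study whose study_hash_id matches.
    matches = [s['uuid'] for s in study_list if s.get('study_hash_id') == study_hash_id]
    if len(matches) != 1:
        raise KeyError(f'{len(matches)} studies match {study_hash_id!r}')
    return matches[0]


# Structure copied from the output of get_all_studies_info_in_group in step 2.2
study_list = [
    {'uuid': 'f6f4c94460af44fabaa07ac77087351c', 'study_title': 'TBD',
     'study_hash_id': 'MERGED_VISIUM', 'created_by': 'dev@bioturing.com'},
    {'uuid': 'b25ff33bead3453680e802963d3e9caf', 'study_title': 'TBD',
     'study_hash_id': 'GEOMX', 'created_by': 'dev@bioturing.com'},
]
study_id = find_study_uuid(study_list, 'MERGED_VISIUM')
```

Raising on zero or multiple matches avoids silently querying the wrong study.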
" ] }, { "cell_type": "markdown", "id": "d9b81b3d-27f3-4aa4-a181-818e94843fe4", "metadata": {}, "source": [ "### 5.1. Get barcodes" ] }, { "cell_type": "code", "execution_count": 29, "id": "34ce2be9-6bd0-4ddf-b7bc-75ee2d307c3e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['spatial_TATGGCAGACTTTCGA-1' 'spatial_CTTCGTGCCCGCATCG-1'\n", " 'spatial_AAACGGGTTGGTATCC-1' ... 'visium_test_TGATTCCCGGTTACCT-1'\n", " 'visium_test_AACATTGTGACTCGAG-1' 'visium_test_GCTCTTTCCGCTAGTG-1']\n" ] } ], "source": [ "barcodes = np.array(connector.get_barcodes(\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " species=Species.HUMAN.value,\n", "))\n", "print(barcodes)" ] }, { "cell_type": "markdown", "id": "3785035a-dd98-4fd5-b312-06698108900c", "metadata": {}, "source": [ "### 5.2. Get features" ] }, { "cell_type": "code", "execution_count": 30, "id": "8a377a4d-adbc-4c3a-bd32-b578217bc4d0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['5S_RRNA' '5_8S_RRNA' '7SK' ... 'AL121908.1' 'AP000527.1' 'AL035681.1']\n" ] } ], "source": [ "features = np.array(connector.get_features(\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " species=Species.HUMAN.value,\n", "))\n", "print(features)" ] }, { "cell_type": "markdown", "id": "4bd5793d-6f41-4cf0-835f-da9b7388c548", "metadata": {}, "source": [ "### 5.3. Get metadata dataframe" ] }, { "cell_type": "code", "execution_count": 32, "id": "3225fb73-1ff3-42f2-ae0b-28db50ba3392", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Barcodes Batches Batches (1) Batches (2) \\\n", "0 spatial_TATGGCAGACTTTCGA-1 spatial spatial spatial \n", "1 spatial_CTTCGTGCCCGCATCG-1 spatial spatial spatial \n", "2 spatial_AAACGGGTTGGTATCC-1 spatial spatial spatial \n", "3 spatial_TGCAAACCCACATCAA-1 spatial spatial spatial \n", "4 spatial_GACGGGATGTCTTATG-1 spatial spatial spatial \n", "\n", " Number of genes \n", "0 6782 \n", "1 6948 \n", "2 6972 \n", "3 8065 \n", "4 6229 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metadata = connector.get_metadata(\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " species=Species.HUMAN.value\n", ")\n", "metadata.iloc[:5, :5]" ] }, { "cell_type": "markdown", "id": "ae37260a-47a5-419b-b60b-7945509bb24d", "metadata": {}, "source": [ "### 5.4. Get embeddings" ] }, { "cell_type": "markdown", "id": "4a133f3b-1edb-4cc3-8156-5faec669bd42", "metadata": {}, "source": [ "#### 5.4.1. List all embeddings" ] }, { "cell_type": "code", "execution_count": 34, "id": "b9d5a84c-b9aa-492d-ab5f-ce452aadb53e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'embedding_id': '8e31785c43b6458c8fdc7aa06d2e1028',\n", " 'embedding_name': 'PCA (no batch corrected)'},\n", " {'embedding_id': '1a23d7b23f164d258bbd24d83658f194',\n", " 'embedding_name': 'tSNE (perplexity=30)'}]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "embeddings = connector.list_all_custom_embeddings(\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " species=Species.HUMAN.value,\n", ")\n", "embeddings" ] }, { "cell_type": "markdown", "id": "460e1bd8-3731-4804-abcb-09e80c8fc2b8", "metadata": {}, "source": [ "#### 5.4.2. Access an embedding" ] }, { "cell_type": "code", "execution_count": 35, "id": "7adbcbc2-8a98-49e2-9b03-aaf63e815d99", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-24.439453 , 1.8610998 , -4.453333 , ..., -0.23137376,\n", " 0.70885265, 0.513127 ],\n", " [-25.617645 , 2.945625 , -5.5363693 , ..., 0.13122909,\n", " -0.7750866 , 0.21110779],\n", " [-25.639389 , 1.7325156 , -3.724022 , ..., 0.02746161,\n", " -0.19130976, -0.55235636],\n", " ...,\n", " [ 27.012297 , -13.96414 , 2.4462044 , ..., -0.89074665,\n", " 2.0481367 , 1.2320619 ],\n", " [ 26.676434 , -9.607573 , 2.836241 , ..., -1.4937118 ,\n", " -3.8412411 , 3.404403 ],\n", " [ 26.823132 , -0.6082494 , 8.160352 , ..., 0.42766818,\n", " -3.8642507 , 4.8920965 ]], dtype=float32)" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chosen_embedding = connector.retrieve_custom_embedding(\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " species=Species.HUMAN.value,\n", " embedding_id='8e31785c43b6458c8fdc7aa06d2e1028',\n", ")\n", "chosen_embedding" ] }, { "cell_type": "markdown", "id": "4f0f891f-cae6-4f43-98fe-ba8a6bedc049", "metadata": {}, "source": [ "### 5.5. Query genes" ] }, { "cell_type": "markdown", "id": "ada2bf19-a504-41cc-a1fa-d49a63214b6b", "metadata": {}, "source": [ "```\n", "Parameters\n", "-----\n", "species: str\n", " Species of the study.\n", " Support: Species.HUMAN.value\n", " Species.MOUSE.value\n", " Species.PRIMATE.value\n", " Species.OTHERS.value\n", "study_id: str\n", " ID of the study (uuid)\n", "gene_names: List[str], default=[]\n", " Names of the genes to query. If the list is empty, the whole expression matrix is returned\n", "unit: str\n", " Support:\n", " StudyUnit.UNIT_RAW.value\n", " StudyUnit.UNIT_LOGNORM.value\n", "```" ] }, { "cell_type": "code", "execution_count": 36, "id": "e9df1399-872a-42b7-b30c-8c37513bbbfe", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<7495x2 sparse matrix of type ''\n", "\twith 7006 stored elements in Compressed Sparse Column format>" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gene_exp = connector.query_genes(\n", " species=Species.HUMAN.value,\n", " study_id='f6f4c94460af44fabaa07ac77087351c',\n", " gene_names=['CD3D', 'CD8A'],\n", " unit=StudyUnit.UNIT_RAW.value,\n", ")\n", "gene_exp" ] }, { "cell_type": "markdown", "id": "7a3d58f2-5efd-47ca-9b62-de2225782295", "metadata": {}, "source": [ "## 6. Standardize your metadata" ] }, { "cell_type": "markdown", "id": "137d4673-56ec-4d74-b7dd-6422a05f0cf4", "metadata": {}, "source": [ "
NOTE: Get group_id and study_id (uuid) from step \"2. List groups, studies and s3\"
" ] }, { "cell_type": "markdown", "id": "c62e55fe-8a03-4586-9cce-9051a527849b", "metadata": {}, "source": [ "### 6.1. Retrieve ontology tree" ] }, { "cell_type": "markdown", "id": "e0e49b2d", "metadata": {}, "source": [ "```\n", "Returns\n", "----------\n", "Ontologies tree : Dict[Dict]\n", " In which:\n", " 'name': name of the node, which will be used in further steps\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "247967ab-e07a-44d2-bb04-8bb06e46a171", "metadata": { "tags": [] }, "outputs": [], "source": [ "connector.get_ontologies_tree(\n", " species=Species.HUMAN.value,\n", " group_id='bioturing_public_studies'\n", ")" ] }, { "cell_type": "markdown", "id": "06224852-3faa-4888-ae0a-4b1c30842167", "metadata": {}, "source": [ "### 6.2. Assign standardized terms" ] }, { "cell_type": "markdown", "id": "cc0926ff-2fca-42e9-88a1-2632cf8e4f84", "metadata": {}, "source": [ "```\n", "Parameters\n", "-----\n", "species: str\n", " Species of the study.\n", " Support: Species.HUMAN.value\n", " Species.MOUSE.value\n", " Species.PRIMATE.value\n", " Species.OTHERS.value\n", "group_id: str\n", " ID of the group that contains the study.\n", "study_id: str\n", " ID of the study (uuid)\n", "metadata_field: str\n", " Name of the metadata column on the platform (e.g. author's tissue)\n", "metadata_value: str\n", " A value within that metadata field (e.g. normal lung)\n", "root_name: str\n", " Name of the root node in the btr ontology tree (e.g. tissue)\n", "leaf_name: str\n", " Name of the leaf node in the btr ontology tree (e.g. lung)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "a7a22d45-4420-4c58-8387-02468eea522b", "metadata": {}, "outputs": [], "source": [ "# This function is only usable in a group (not 'personal')\n", "\n", "connector.assign_standardized_meta(\n", " species=Species.HUMAN.value,\n", " group_id='bioturing_public_studies',\n", " study_id='a1558f8ed6064095be86a091a4118c4a',\n", " metadata_field='Cell type',\n", " metadata_value='TCRV delta 1 gamma-delta T cell',\n", " root_name='cell type',\n", " leaf_name='gamma-delta T cell',\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }