
Using an AI large model to organize sample-information tables from the GEO database

GBh28zHK posted on 2025-3-15 08:21:47
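The progress of artificial intelligence has been astonishing, and the capabilities it has shown are genuinely impressive. In practical work, using a large model as a capable assistant can noticeably improve efficiency. Below I use data wrangling as an example and share some experience, in the hope that it carries over to similar tasks.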

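In bioinformatics, downloading data from the GEO (Gene Expression Omnibus) database is a common task. A widespread problem, however, is that sample information is often scattered across different tables. To analyze the data in depth, that scattered information has to be consolidated into a single table, a process that is tedious, error-prone, and costly in time and effort.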

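With the help of AI, things are quite different now. AI can easily handle this kind of repetitive, fiddly task, which can greatly improve the efficiency and accuracy of data wrangling.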

Below I walk through the process in detail, using one dataset from the GEO database as an example. The dataset is at: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE195832 .

I need to put together a table that records the sample information for each file, as shown below, but this information is spread across different tables and has to be merged:

meta.tsv
ID        SRR        GSM        Pt        Status        Combined        Bam        Bai
3219_SR_1        SRR17839312        GSM5851565        Pt1        Pre        Pt1Pre        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839312/3219_SR_1possorted_genome_bam.bam.1        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839312/3219_SR_1possorted_genome_bam.bam.1.bai
3219_SR_2        SRR17839311        GSM5851566        Pt1        Post        Pt1Post        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839311/3219_SR_2possorted_genome_bam.bam.1        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839311/3219_SR_2possorted_genome_bam.bam.1.bai
3219_SR_3        SRR17839310        GSM5851567        Pt2        Pre        Pt2Pre        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839310/3219_SR_3possorted_genome_bam.bam.1        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839310/3219_SR_3possorted_genome_bam.bam.1.bai
3219_SR_4        SRR17839309        GSM5851568        Pt2        Post        Pt2Post        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839309/3219_SR_4possorted_genome_bam.bam.1        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839309/3219_SR_4possorted_genome_bam.bam.1.bai
3219_SR_5        SRR17839308        GSM5851569        Pt3        Pre        Pt3Pre        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839308/3219_SR_5possorted_genome_bam.bam.1        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839308/3219_SR_5possorted_genome_bam.bam.1.bai
3219_SR_6        SRR17839307        GSM5851570        Pt3        Post        Pt3Post        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839307/3219_SR_6possorted_genome_bam.bam.1        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839307/3219_SR_6possorted_genome_bam.bam.1.bai
3521_SR_1        SRR17839306        GSM5851571        Pt4        Pre        Pt4Pre        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839306/3521_SR_1possorted_genome_bam.bam.1        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839306/3521_SR_1possorted_genome_bam.bam.1.bai
3521_SR_2        SRR17839305        GSM5851572        Pt4        Post        Pt4Post        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839305/3521_SR_2possorted_genome_bam.bam.1        ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839305/3521_SR_2possorted_genome_bam.bam.1.bai
The information in this table comes from the following sources.
Search ENA and tick the fields you need. Data download page: https://www.ebi.ac.uk/ena/browser/view/PRJNA802247
[Screenshot w2.jpg: selecting the data in the ENA browser]
This gives the download links, Table 1:
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839311/3219_SR_2possorted_genome_bam.bam.1.bai
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839309/3219_SR_4possorted_genome_bam.bam.1.bai
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839309/3219_SR_4possorted_genome_bam.bam.1
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839307/3219_SR_6possorted_genome_bam.bam.1
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839312/3219_SR_1possorted_genome_bam.bam.1
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839305/3521_SR_2possorted_genome_bam.bam.1
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839312/3219_SR_1possorted_genome_bam.bam.1.bai
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839306/3521_SR_1possorted_genome_bam.bam.1
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839308/3219_SR_5possorted_genome_bam.bam.1
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839310/3219_SR_3possorted_genome_bam.bam.1
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839308/3219_SR_5possorted_genome_bam.bam.1.bai
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839306/3521_SR_1possorted_genome_bam.bam.1.bai
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839310/3219_SR_3possorted_genome_bam.bam.1.bai
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839307/3219_SR_6possorted_genome_bam.bam.1.bai
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839305/3521_SR_2possorted_genome_bam.bam.1.bai
wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839311/3219_SR_2possorted_genome_bam.bam.1
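For readers who would rather script this step, the download lines in Table 1 already contain both the SRR accession and the file ID. The following is a minimal sketch, not the method used in this post: the function name `parse_wget_lines` and the regular expression are my own assumptions, written to match the path layout of the links shown above.

```python
import re
from collections import defaultdict

# Assumed path layout, taken from the links above:
#   .../<SRR>/<ID>possorted_genome_bam.bam.1[.bai]
LINK_RE = re.compile(
    r"ftp://(?P<path>\S+/(?P<srr>SRR\d+)/(?P<id>\w+?)"
    r"possorted_genome_bam\.bam\.1(?P<bai>\.bai)?)$"
)

def parse_wget_lines(lines):
    """Group the BAM and BAI paths by SRR accession (hypothetical helper)."""
    records = defaultdict(dict)
    for line in lines:
        match = LINK_RE.search(line.strip())
        if match is None:
            continue  # ignore anything that is not a recognised download line
        record = records[match["srr"]]
        record["ID"] = match["id"]
        # The optional ".bai" suffix decides which column the path belongs to.
        record["Bai" if match["bai"] else "Bam"] = match["path"]
    return dict(records)

lines = [
    "wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839312/3219_SR_1possorted_genome_bam.bam.1",
    "wget -nc ftp://ftp.sra.ebi.ac.uk/vol1/run/SRR178/SRR17839312/3219_SR_1possorted_genome_bam.bam.1.bai",
]
files = parse_wget_lines(lines)
print(files["SRR17839312"]["ID"])
```

The stored paths drop the `ftp://` prefix so that they match the Bam and Bai columns of meta.tsv.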
This information is spread across different tables and needs to be consolidated; here an AI model can do the work for us:
Table 2: GSM accessions and sample IDs
GSM5851565        ScRNA Pt1 Pre
GSM5851566        ScRNA Pt1 Post
GSM5851567        ScRNA Pt2 Pre
GSM5851568        ScRNA Pt2 Post
GSM5851569        ScRNA Pt3 Pre
GSM5851570        ScRNA Pt3 Post
GSM5851571        ScRNA Pt4 Pre
GSM5851572        ScRNA Pt4 Post
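The Pt, Status and Combined columns of meta.tsv can be derived from the Table 2 descriptions. A small sketch, assuming every description has exactly three space-separated words as in the rows above; `split_description` is a name I made up for illustration:

```python
def split_description(description):
    """Split e.g. 'ScRNA Pt1 Pre' into patient, status and a combined label."""
    # Unpacking fails loudly if a description does not have exactly three words.
    _assay, patient, status = description.split()
    return {"Pt": patient, "Status": status, "Combined": patient + status}

# Two rows copied from Table 2; the other six follow the same pattern.
table2 = {
    "GSM5851565": "ScRNA Pt1 Pre",
    "GSM5851566": "ScRNA Pt1 Post",
}
labels = {gsm: split_description(text) for gsm, text in table2.items()}
print(labels["GSM5851566"]["Combined"])
```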
Table 3: SRR accessions and their GSM accessions:
Run        Library Name        tissue        tissue_type        time_point
SRR17839305        GSM5851572        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        post
SRR17839306        GSM5851571        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        pre
SRR17839307        GSM5851570        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        post
SRR17839308        GSM5851569        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        pre
SRR17839309        GSM5851568        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        post
SRR17839310        GSM5851567        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        pre
SRR17839311        GSM5851566        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        post
SRR17839312        GSM5851565        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        pre
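Table 3 is the bridge between the other two: it maps each SRR run (the key in Table 1) to a GSM accession (the key in Table 2). As a rough sketch of the same merge done deterministically, the code below joins the three tables with the Python standard library only. `merge_tables` and the trimmed-down inputs are illustrative assumptions, not what the AI model does internally.

```python
import csv
import io

# Two samples copied from the tables above; the other six follow the same pattern.
SRR_TO_ID = {"SRR17839312": "3219_SR_1", "SRR17839311": "3219_SR_2"}  # Table 1
GSM_TO_DESC = {"GSM5851565": "ScRNA Pt1 Pre", "GSM5851566": "ScRNA Pt1 Post"}  # Table 2
TABLE3_TSV = (
    "Run\tLibrary Name\ttime_point\n"
    "SRR17839311\tGSM5851566\tpost\n"
    "SRR17839312\tGSM5851565\tpre\n"
)

def merge_tables(srr_to_id, gsm_to_desc, table3_tsv):
    """Join the three tables, using Table 3 as the SRR-to-GSM bridge."""
    rows = []
    for rec in csv.DictReader(io.StringIO(table3_tsv), delimiter="\t"):
        srr, gsm = rec["Run"], rec["Library Name"]
        _assay, patient, status = gsm_to_desc[gsm].split()
        # Table 2 and Table 3 both encode pre/post, so they should agree.
        if status.lower() != rec["time_point"].lower():
            raise ValueError(f"pre/post mismatch for {gsm}")
        rows.append({
            "ID": srr_to_id[srr], "SRR": srr, "GSM": gsm,
            "Pt": patient, "Status": status, "Combined": patient + status,
        })
    return sorted(rows, key=lambda row: row["ID"])

merged = merge_tables(SRR_TO_ID, GSM_TO_DESC, TABLE3_TSV)
for row in merged:
    print("\t".join(row.values()))
```

A missing SRR or GSM raises a `KeyError` here rather than producing an incomplete row, which is usually the safer behaviour for sample metadata.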
Merging the tables with AI:

Here I used Doubao, which is a bit faster; DeepSeek can also do the job but is rather sluggish. The prompt and the process are shown below:

[Screenshot w3.jpg: the prompt and the merging process]

The final result is as follows:
File name        Library Name        Description        Tissue        Tissue Type        Time Point
3219_SR_4        GSM5851568        ScRNA Pt2 Post        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        post
3219_SR_6        GSM5851570        ScRNA Pt3 Post        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        post
3219_SR_1        GSM5851565        ScRNA Pt1 Pre        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        pre
3521_SR_2        GSM5851572        ScRNA Pt4 Post        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        post
3521_SR_1        GSM5851571        ScRNA Pt4 Pre        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        pre
3219_SR_5        GSM5851569        ScRNA Pt3 Pre        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        pre
3219_SR_3        GSM5851567        ScRNA Pt2 Pre        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        pre
3219_SR_2        GSM5851566        ScRNA Pt1 Post        Tumor        Head And Neck Oral Cavity Squamous Cell Carcinoma        post
You can open this link to read the conversation: https://www.doubao.com/thread/w882cbc4f971511f0
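Whichever model produces the merged table, a quick consistency check is cheap to run. This is only a sketch under my own assumptions: `check_merged_rows` is a hypothetical helper, and the column names are the English headers of the final table (File name, Library Name, Description, Time Point). It checks that every expected GSM appears exactly once and that the last word of each description agrees with the time point.

```python
def check_merged_rows(rows, expected_gsms):
    """Return a list of human-readable problems found in a merged table."""
    problems = []
    seen = [row["Library Name"] for row in rows]
    for gsm in sorted(set(expected_gsms) - set(seen)):
        problems.append(f"{gsm} is missing")
    for gsm in sorted({g for g in seen if seen.count(g) > 1}):
        problems.append(f"{gsm} appears more than once")
    for row in rows:
        # 'ScRNA Pt2 Post' should end with the same word as the Time Point column.
        if row["Description"].split()[-1].lower() != row["Time Point"].lower():
            problems.append(f"{row['File name']}: description and time point disagree")
    return problems

# Two rows copied from the final table above.
rows = [
    {"File name": "3219_SR_4", "Library Name": "GSM5851568",
     "Description": "ScRNA Pt2 Post", "Time Point": "post"},
    {"File name": "3219_SR_1", "Library Name": "GSM5851565",
     "Description": "ScRNA Pt1 Pre", "Time Point": "pre"},
]
print(check_merged_rows(rows, ["GSM5851565", "GSM5851568"]))  # prints []
```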


