Yan Fu - Academy of Mathematics and Systems Science, Chinese Academy of Sciences


Yan Fu, Associate Professor
Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Research Areas

Bioinformatics, biostatistics, data mining and machine learning. We are currently focusing on utilizing and developing powerful computational and statistical algorithms and software tools for mass spectrometry-based proteomics studies, e.g., protein identification; post-translational modification identification, discovery and localization; protein quantification; proteotypic peptide prediction; false discovery rate control; and multiple hypothesis testing.

Our research is interdisciplinary, spanning statistics, computation and biology, with a current focus on computational and statistical proteomics. Our representative research results are summarized below.

1. Algorithms and software for protein and post-translational modification identification and quantification

Searching mass spectrometry data against protein databases to identify protein sequences and post-translational modifications is central to proteomics research. In 2004, we proposed a new scoring function named "kernelized Spectral Vector Dot Product (KSDP)" and developed pFind 1.0, the first protein identification search engine in China (Bioinformatics, 2004, 20: 1948–1954). Since then, pFind has been developed continuously and has evolved into the well-known pFind protein identification system and the pFind research group (http://pfind.ict.ac.cn).

The huge number of unexpected post-translational modifications on proteins is considered the "dark matter" of proteomic data. We have developed a variety of modification discovery algorithms. We proposed pMatch, an open spectral library search algorithm that discovers unexpected modifications by comparing the similarity between modified and unmodified spectra. The pMatch paper was accepted and presented at ISMB 2010, one of the top bioinformatics conferences, and published in Bioinformatics (2010). pMatch has since become a frequently cited algorithm in the fields of spectral library search and modification discovery. Building on pMatch, we recently developed pMatchGlyco, a glycosylation identification algorithm (BioMed Research International, 2018).

We developed DeltAMT, an algorithm that clusters mass spectra using peptide mass and retention time information to discover high-abundance modification types (Molecular & Cellular Proteomics, 2011). In a study of core-fucosylated glycoprotein identification carried out in collaboration with the State Key Laboratory of Proteomics of China, DeltAMT and other data analysis methods were used to identify the largest set of core-fucosylated sites reported at that time (Molecular & Cellular Proteomics, 2009).
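The observation behind DeltAMT, that a modified peptide and its unmodified form tend to appear as pairs of spectra separated by a characteristic mass difference, can be illustrated with a minimal sketch. It is not the published DeltAMT algorithm: the function name `frequent_mass_shifts`, the fixed retention-time window, the rounding-based binning and all default parameters are hypothetical choices made for illustration, whereas DeltAMT is a statistical algorithm that uses mass and retention time information in its own way.

```python
from collections import Counter
from itertools import combinations


def frequent_mass_shifts(precursors, rt_window=2.0, min_shift=0.5, decimals=2, top=5):
    """Count precursor mass differences between spectra that elute close together.

    Hypothetical helper for illustration only (not DeltAMT itself).
    precursors: list of (neutral_mass_in_Da, retention_time_in_min) tuples.
    Returns the `top` most frequent mass shifts, rounded to `decimals`, with counts.
    """
    counts = Counter()
    for (m1, rt1), (m2, rt2) in combinations(precursors, 2):
        if abs(rt1 - rt2) > rt_window:
            continue  # assumption: only pair spectra within a fixed retention-time window
        shift = abs(m1 - m2)
        if shift < min_shift:
            continue  # skip near-identical masses, e.g. repeated scans of one peptide
        counts[round(shift, decimals)] += 1  # crude binning by rounding
    return counts.most_common(top)


# Made-up toy data: two peptides observed with and without a shift close to the
# phosphorylation mass (79.966 Da), plus one pair differing by about 14.016 Da.
precursors = [
    (1000.500, 10.0), (1080.466, 10.5),
    (1500.700, 20.0), (1580.666, 20.8),
    (1200.300, 30.0), (1214.316, 31.0),
]
print(frequent_mass_shifts(precursors))  # prints [(79.97, 2), (14.02, 1)]
```

A shift that recurs across many unrelated peptides is a candidate abundant modification; a real analysis would also need a background model to decide which counts are significant, which this sketch omits.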
We developed PTMiner, a high-accuracy probabilistic algorithm for modification localization and quality control in open (mass-tolerant) database search (Molecular & Cellular Proteomics, 2019). Through an iterative process, the algorithm automatically learns the prior probability, the mass-matching error distribution and the matched-peak intensity distribution from the mass spectral data, and uses the continuously updated prior probability and these two distributions to estimate the posterior probability of each modification site more accurately. We used PTMiner to analyze the modifications present in the massive dataset of the human proteome draft and localized more than one million modifications at 1% FDR, systematically characterizing known and unknown modifications in the human proteome. When published online, the paper was at one point the second "most read" paper. Based on the PTMiner algorithm, we developed SAVControl, a quality control method for protein amino acid mutations (which can be treated as a special type of modification), published in Journal of Proteomics (2018).

In protein quantification, mass spectrometry measurements are highly random: 1) some peptides are detected while others are not, and 2) peptides at the same concentration may differ greatly in signal intensity. This randomness seriously reduces the accuracy of protein quantification. To address these problems, we proposed the concept of the quantitative mass spectrometry efficiency of peptides and developed LFAQ, a new protein absolute quantification algorithm based on predicted peptide quantitative efficiencies (Analytical Chemistry, 2019a). We then proposed incorporating peptide digestibility into peptide detectability prediction and developed AP3, a peptide detectability prediction algorithm based on the random forest machine learning method (Analytical Chemistry, 2019b).
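The efficiency correction behind LFAQ can be illustrated with a short sketch: if a peptide's observed intensity is roughly its amount multiplied by a peptide-specific quantitative efficiency, then dividing by that efficiency puts all peptides of a protein back on a common scale before they are combined. This is an illustration under assumptions only: the efficiencies are taken as given (LFAQ predicts them with its own model, which is not reproduced here), the multiplicative intensity model and the plain averaging are our simplifications, and `protein_abundance` is a hypothetical name.

```python
from statistics import mean


def protein_abundance(intensities, efficiencies):
    """Efficiency-corrected abundance estimate for one protein (illustrative sketch).

    intensities:  observed MS intensities of the protein's peptides.
    efficiencies: assumed predicted quantitative efficiency of each peptide,
                  i.e. signal produced per unit of peptide amount.
    """
    if len(intensities) != len(efficiencies):
        raise ValueError("need exactly one efficiency per peptide")
    # Assumed model: intensity = efficiency * amount, so amount = intensity / efficiency.
    corrected = [i / e for i, e in zip(intensities, efficiencies)]
    return mean(corrected)


# Toy protein: three peptides present at the same amount (100 units) but with
# different efficiencies, so their raw intensities differ up to five-fold.
raw = [20.0, 100.0, 50.0]
eff = [0.2, 1.0, 0.5]
print(mean(raw))                    # naive average, dragged down by poorly responding peptides
print(protein_abundance(raw, eff))  # prints 100.0
```

The naive average depends on which peptides happen to respond well, so two proteins at equal amounts can look very different; correcting each peptide first removes that bias to the extent the predicted efficiencies are accurate.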
2. Proteomics data FDR control methods and applications

While big data give us big opportunities to discover new knowledge, they also carry big risks and pitfalls of false discoveries. False discovery rate (FDR) analysis in high-dimensional statistical inference is considered one of the most important advances in statistics. In multiple hypothesis testing, the FDR is defined as the expected proportion of falsely rejected hypotheses among all rejected hypotheses. The original paper proposing the FDR (Benjamini and Hochberg, J. R. Stat. Soc. B, 1995) has been cited more than 57,000 times, showing its importance and influence. Leading researchers on FDR include the well-known statisticians Bradley Efron, John Storey and Emmanuel Candès.

In particular, how to accurately estimate the FDR of subgroups of hypothesis tests is a difficult problem, first raised by Bradley Efron (Ann. Appl. Stat. 2: 197–223, 2008). This problem is of practical importance in proteomics. For the first time, we mathematically studied the problem of FDR estimation for subgroups of peptide identifications (such as modified peptides) in proteomic data analysis. Using Bayesian analysis, we proved theoretically that the subgroup FDR and the combined FDR are not equal under the same score threshold, and thus proposed the principle of separate subgroup filtering and FDR estimation and derived a series of insightful theoretical results (Statistics and Its Interface, 2012).
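Why a single score threshold can give very different FDRs for different subgroups can be seen in a simple two-group Bayesian model. The sketch below assumes, purely for illustration, that false and correct identifications have the same score distributions in every subgroup and that only the prior proportion of false matches, `pi0`, differs between subgroups (for example, modified-peptide identifications may contain a larger share of false matches). The function names and all numbers are hypothetical; this is not the model of the Statistics and Its Interface paper.

```python
def tail_fdr(pi0, s0, s1):
    """Bayesian tail-area FDR of one group at a fixed score threshold t.

    pi0: prior probability that an identification in this group is false (null).
    s0:  P(score >= t | false);  s1: P(score >= t | correct).
    Returns P(false | score >= t) = pi0*s0 / (pi0*s0 + (1 - pi0)*s1).
    """
    false_mass = pi0 * s0
    return false_mass / (false_mass + (1 - pi0) * s1)


def combined_fdr(groups, s0, s1):
    """FDR of all groups pooled together at the same threshold.

    groups: list of (weight, pi0) pairs, where weight is the group's share
    of all identifications (weights sum to 1).
    """
    false_mass = sum(w * pi0 * s0 for w, pi0 in groups)
    total_mass = sum(w * (pi0 * s0 + (1 - pi0) * s1) for w, pi0 in groups)
    return false_mass / total_mass


# Hypothetical numbers: 90% unmodified identifications with 30% false matches,
# 10% modified identifications with 90% false matches, one shared threshold.
s0, s1 = 0.01, 0.8
unmodified, modified = (0.9, 0.3), (0.1, 0.9)
print(combined_fdr([unmodified, modified], s0, s1))  # about 0.007
print(tail_fdr(modified[1], s0, s1))                 # about 0.10
```

In this toy setting the pooled FDR looks comfortably below 1%, while roughly one in ten accepted modified-peptide identifications is false, which is the kind of gap that motivates filtering a subgroup and estimating its FDR separately.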
Based on the above theoretical analysis, we proposed a simpler and more intuitive relationship between the subgroup FDR and the combined FDR, and further developed TransferFDR, an accurate FDR estimation method for small subgroups of peptide identifications (Molecular & Cellular Proteomics, 2014). The rationale of TransferFDR is as follows. When the modification to be identified has low abundance, direct FDR estimation is severely inaccurate because the sample size is insufficient. Based on observation and analysis of real data, we devised a method for estimating the conditional probability that an incorrectly identified peptide is a modified peptide. From this estimate, a quantitative relationship between the subgroup FDR of modified peptides and the combined FDR of all peptides is obtained. Through this relationship, the subgroup FDR can be predicted indirectly from the combined FDR, which can usually be estimated accurately. This overcomes the difficulty of estimating the FDR of small subgroups with too few samples.

We applied the above subgroup FDR analysis and the transferred FDR method to a number of special identification problems. For example, in a study of FDR estimation for novel genes identified by six-frame translation in proteogenomics, we found that if the combined FDR is used, the gene annotation ratio is the dominant factor affecting the real FDR of novel genes (novel peptides) (Bioinformatics, 2015). The TransferFDR method was also successfully applied to quality control of open modification search (Molecular & Cellular Proteomics, 2019) and amino acid mutation identification (Journal of Proteomics, 2018). In addition, TransferFDR was successfully used in a collaborative study of primate-specific gene identification (Genome Research, 2019).

3. Statistical inference and data mining

While analyzing biological data, we developed several general statistical inference and data mining methods, going one step further from applied research to methodological and theoretical research.
The target-decoy competition (TDC) strategy is the gold standard for FDR control of proteomic data. Although it has been used for many years, it is still an empirical method that lacks a theoretical foundation. In this method, the ratio of the number of decoy results to the number of target results is usually used as the FDR estimate, but whether this actually controls the FDR (that is, keeps the real FDR below a specified threshold) was unknown. We found that a +1 correction to this estimate (adding 1 to the decoy count) strictly controls the FDR, and gave a theoretical proof of this conclusion (arXiv, 2015).

Further, and more importantly, we extended the corrected TDC method to the general multiple hypothesis testing problem (arXiv, 2018). Previous FDR control methods for multiple hypothesis testing were usually based on a null distribution of the test statistic. However, all types of null distributions, including theoretical, permutation-based and empirical ones, have inherent drawbacks. For example, a theoretical null distribution fails if the assumptions about the sample distribution are wrong. In addition, many FDR control methods require estimating the proportion of true null hypotheses, which is difficult and has not been well resolved. We proposed a general TDC-based FDR control method using random permutations. Our method needs to estimate neither the null distribution of the statistic nor the proportion of true null hypotheses; it relies only on the ranking of the tests by some statistic or score. It constructs competing decoy hypotheses from random permutations of the samples. We proved that this method rigorously controls the FDR. Simulation experiments show that our method controls the FDR more effectively than Bayes and empirical Bayes methods and has greater statistical power.
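The +1 corrected TDC rule, and the idea of building decoy hypotheses from random permutations, can be sketched as follows. This is a simplified illustration, not the published procedure: `tdc_plus1`, `abs_mean_diff` and `permutation_tdc` are hypothetical names, and the test statistic (absolute difference of group means), the use of a single random label permutation per hypothesis and the random tie-breaking are our own assumptions; the exact construction in the arXiv papers may differ.

```python
import random
from statistics import mean


def tdc_plus1(scores, is_decoy, alpha):
    """Target-decoy competition with the +1 correction.

    scores[i]: winning score of competition i; is_decoy[i]: True if the decoy won.
    Accepts the targets among the top k competitions for the largest k whose
    estimate (decoys + 1) / max(targets, 1) is at most alpha. Tied scores are
    ordered arbitrarily here, a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    decoys = targets = 0
    cut = 0  # number of top-ranked competitions kept
    for rank, i in enumerate(order, start=1):
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        if (decoys + 1) / max(targets, 1) <= alpha:
            cut = rank
    return [i for i in order[:cut] if not is_decoy[i]]


def abs_mean_diff(values, labels):
    """Assumed test statistic: |mean(group True) - mean(group False)|."""
    a = [v for v, g in zip(values, labels) if g]
    b = [v for v, g in zip(values, labels) if not g]
    return abs(mean(a) - mean(b))


def permutation_tdc(data, labels, alpha, seed=0):
    """Compete each hypothesis against a decoy built from one random label permutation.

    data: one list of per-sample measurements per hypothesis (e.g. per feature);
    labels: per-sample group flags (True/False). Returns indices of accepted hypotheses.
    """
    rng = random.Random(seed)
    scores, is_decoy = [], []
    for values in data:
        permuted = list(labels)
        rng.shuffle(permuted)
        target = abs_mean_diff(values, labels)
        decoy = abs_mean_diff(values, permuted)
        # The larger statistic wins the competition; exact ties are broken at random.
        decoy_wins = decoy > target or (decoy == target and rng.random() < 0.5)
        scores.append(max(target, decoy))
        is_decoy.append(decoy_wins)
    return tdc_plus1(scores, is_decoy, alpha)


# Toy competition: targets at ranks 1-10, a decoy at rank 11, a target at rank 12.
toy_scores = [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_decoy = [False] * 10 + [True, False]
print(tdc_plus1(toy_scores, toy_decoy, alpha=0.1))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Without the +1, the uncorrected ratio at rank 12 would be 1/11 ≤ 0.1 and the target ranked below the decoy would also be accepted; with it, the rule becomes slightly conservative, and at alpha = 0.1 nothing is accepted until at least 10 targets have been seen.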

Recent Publications


Qingbo Shu#, Mengjie Li#, Lian Shu#, Zhiwu An, Jifeng Wang, Hao Lv, Ming Yang, Tanxi Cai, Tony Hu, Yan Fu* and Fuquan Yang*. Large-scale Identification of N-linked Glycopeptides in Human Serum using HILIC Enrichment and Spectral Library Search. Molecular & Cellular Proteomics, 2020, 19: 672–689.

Zhiqiang Gao#, Cheng Chang#, Jinghan Yang, Yunping Zhu*, Yan Fu*. AP3: An Advanced Proteotypic Peptide Predictor for Targeted Proteomics by Incorporating Peptide Digestibility. Analytical Chemistry, 2019, 91: 8705–8711.

Zhiwu An#, Linhui Zhai#, Wantao Ying, Xiaohong Qian, Fuzhou Gong*, Minjia Tan* and Yan Fu*. PTMiner: Localization and Quality Control of Protein Modifications Detected in an Open Search and Its Application to Comprehensive Post-translational Modification Characterization in Human Proteome. Molecular & Cellular Proteomics, 2019, 18(2): 391–405.

Cheng Chang#, Zhiqiang Gao#, Wantao Ying#, Yan Fu*, Yan Zhao, Songfeng Wu, Mengjie Li, Guibin Wang, Xiaohong Qian*, Yunping Zhu*, Fuchu He*. LFAQ: towards unbiased label-free absolute protein quantification by predicting peptide quantitative factors. Analytical Chemistry, 2019, 91: 1335–1343.

Yi Shao, Chunyan Chen, Hao Shen, Bin Z. He, Daqi Yu, Shuai Jiang, Shilei Zhao, Zhiqiang Gao, Zhenglin Zhu, Xi Chen, Yan Fu, Hua Chen, Ge Gao, Manyuan Long, Yong E. Zhang. GenTree, an integrated resource for analyzing the evolution and function of primate-specific coding genes. Genome Research, 2019, 29(4): 682–696.

Xinpei Yi#, Bo Wang#, Zhiwu An, Fuzhou Gong*, Jing Li*, Yan Fu*. Quality control of single amino acid variations detected by tandem mass spectrometry. Journal of Proteomics, 2018, 187: 144–151.

Zhiwu An#, Qingbo Shu#, Hao Lv, Lian Shu, Jifeng Wang, Fuquan Yang*, Yan Fu*. N-Linked Glycopeptide Identification Based on Open Mass Spectral Library Search. BioMed Research International, 2018, doi.org/10.1155/2018/1564136.

Yan Fu. Data Analysis Strategies for Protein Modification Identification. In Klaus Jung (Ed.): Statistical Analysis in Proteomics, Humana Press, New York, NY, 2016, 1362: 265–275.

Kun Zhang#, Yan Fu*, Wen-Feng Zeng, Kun He, Hao Chi, Chao Liu, Yan-Chang Li, Yuan Gao, Ping Xu*, Si-Min He*. A note on the false discovery rate of novel peptides in proteogenomics. Bioinformatics, 2015: 3249–3253.

Shan Lu, Sheng-Bo Fan, Bing Yang, Yu-Xin Li, Jia-Ming Meng, Long Wu, Pin Li, Kun Zhang, Mei-Jun Zhang, Yan Fu, Jin-Cai Luo, Rui-Xiang Sun, Si-Min He, Meng-Qiu Dong. Mapping native disulfide bonds at a proteome scale. Nature Methods, 2015, 12: 329–331.

Yan Fu*, Xiaohong Qian. Transferred subgroup false discovery rate for rare post-translational modifications detected by mass spectrometry. Molecular & Cellular Proteomics, 2014, 13(5): 1359–1368.

Yan Fu. Kernel Methods and Applications in Bioinformatics. In Nikola K. Kasabov (Ed.): Handbook of Bio-/Neuro-Informatics, Springer-Verlag Berlin and Heidelberg GmbH & Co. K, 2013, pp. 275–285.

Yan Fu*. Bayesian false discovery rates for post-translational modification proteomics. Statistics and Its Interface, 2012, 5(1): 47–59.

Yan Fu*, Li-Yun Xiu, Wei Jia, Ding Ye, Rui-Xiang Sun, Xiao-Hong Qian, Si-Min He. DeltAMT: A Statistical Algorithm for Fast Detection of Protein Modifications From LC-MS/MS Data. Molecular & Cellular Proteomics, 2011, 10(5): 1–15.

Yan Fu#*, Rong Pan, Qiang Yang, Wen Gao. Query-Adaptive Ranking with Support Vector Machines for Protein Homology Prediction. In Proceedings of the 7th International Symposium on Bioinformatics Research and Applications (ISBRA 2011). Lecture Notes in Bioinformatics, 2011, 6674: 320–331.

Ding Ye#, Yan Fu*, Rui-Xiang Sun*, Hai-Peng Wang, Zuo-Fei Yuan, Hao Chi, Si-Min He. Open MS/MS spectral library search to identify unanticipated post-translational modifications and increase spectral identification rate. In Proceedings of the 18th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB 2010). Bioinformatics, 2010, 26(12): i399–i406.

Wei Jia#, Zhuang Lu#, Yan Fu#, Hai-Peng Wang, Le-Heng Wang, Hao Chi, Zuo-Fei Yuan, Zhao-Bin Zheng, Li-Na Song, Huan-Huan Han, Yi-Min Liang, Jing-Lan Wang, Yun Cai, Yu-Kui Zhang, Yu-Lin Deng, Wan-Tao Ying*, Si-Min He*, Xiao-Hong Qian*. A Strategy for Precise and Large Scale Identification of Core Fucosylated Glycoproteins. Molecular & Cellular Proteomics, 2009, 8(5): 913–923.

Yan Fu#*, Wei Jia, Zhuang Lu, Haipeng Wang, Zuofei Yuan, Hao Chi, You Li, Liyun Xiu, Wenping Wang, Chao Liu, Leheng Wang, Ruixiang Sun, Wen Gao, Xiaohong Qian, Si-Min He. Efficient discovery of abundant post-translational modifications and spectral pairs using peptide mass and retention time differences. In Proceedings of the 7th Asia-Pacific Bioinformatics Conference (APBC 2009). BMC Bioinformatics, 2009, 10: S50.

Zhiwu An#, Yan Fu*. Mass spectrometry-based algorithms for protein modification localization (in Chinese). Chemistry of Life, 2017, 37(1): 104–112.
