Sun Le - University of Chinese Academy of Sciences - School of Computer Science and Technology

Profile

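Admission major: 081202 - Computer Software and Theory
Admission research direction: Chinese information processing and information retrieval

Education:
1995-05 to 1998-05, Nanjing University of Science and Technology, Ph.D.
1992-09 to 1995-03, Nanjing University of Science and Technology, Master's degree
1988-09 to 1992-07, Nanjing University of Science and Technology, Bachelor's degree

Work experience:
2010-03 to present, Institute of Software, Chinese Academy of Sciences, Research Professor
2004-12 to 2005-12, Department of Computer Science, University of Montreal, Canada, Visiting Scholar
2003-03 to 2003-09, Centre for Corpus Research, University of Birmingham, UK, Visiting Scholar
2001-01 to 2010-12, Institute of Software, Chinese Academy of Sciences, Associate Research Professor
1998-07 to 2000-10, Institute of Software, Chinese Academy of Sciences, Postdoctoral Researcher

Courses taught: Intelligent Information Retrieval

Patents:
(1) A machine translation method, invention, 2010, second inventor, patent no. 201010191769.8
(2) A parallel data processing method based on the latent Dirichlet allocation model, invention, 2010, second inventor, patent no. ZL 200810126728.3
(3) A phonetic-to-character conversion method, invention, 2010, second inventor, patent no. ZL 200910079270.5
(4) An efficient data processing method and system for correlated topic models, invention, 2010, second inventor, patent no. ZL 200810057989.4
(5) An information search method that mines and clusters the subtopics of query statements, invention, 2012, first inventor, patent no. ZL201210004772.3

Translated book:
Daniel Jurafsky & James H. Martin, translated by Feng Zhiwei and Sun Le, 《自然语言处理综论》, Publishing House of Electronics Industry, June 2005 (Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition) (second edition, to be published in 2014)

Ongoing projects:
* Implicit relevance feedback retrieval models based on query semantic analysis and reasoning, principal investigator, National Natural Science Foundation of China
* Search problems in smart cities, participant, 863 Program

Selected completed research projects:
* Resource sharing and evaluation methods for semantic computing and understanding, cultivation project of a Major Research Plan of the National Natural Science Foundation of China
* Development of a website classification system, National Development and Reform Commission project, in cooperation with People's Search (人民搜索)
* Next-generation information retrieval, sub-project of a Key Program project of the National Natural Science Foundation of China, in cooperation with Harbin Institute of Technology and Tsinghua University
* NLP-based high-precision text retrieval models, General Program project of the National Natural Science Foundation of China
* Semantic understanding and classification of large-scale web text data, 863 Program project
* Key machine translation technologies for cross-language search, sub-project of an 863 Program key project, in cooperation with the Institute of Computing Technology (CAS), Harbin Institute of Technology, Xiamen University, and the Institute of Automation (CAS)
* Semantic understanding of short texts, 242 project
* Statistical translation models based on linguistic knowledge bases, National Natural Science Foundation of China
* Client-side user behavior in Web information retrieval, Fujitsu international cooperation project
* Evaluation of Chinese information processing and intelligent human-machine interface technologies, sub-project of an 863 Program project
* Web-based domain-specific bilingual lexical relations, Fujitsu international cooperation project
* Research and implementation of a time-series-based monitoring system for hot topics on BBS, Fujitsu international cooperation project
* Beijing Nova Program (Category A)

Awards:
2008 Outstanding Paper, National Conference on Information Retrieval and Content Security
2007 Outstanding Supervisor Award, Institute of Software, Chinese Academy of Sciences
2006 Outstanding Paper, National Student Workshop on Computational Linguistics
2006 AFNLP-Nagao Fund for COLING/ACL Participation Award (awarded by the Asian Federation of Natural Language Processing)

International conference participation:
2014: ACL (Reviewer), SIGIR (Reviewer)
2012: ACL (Demo Reviewer), AIRS (General Co-Chair), COLING, IALP, IWSLT
2011: SIGIR (Paper Reviewer, Mentoring Committee member, Local committee member), IJCNLP (Publicity co-chair), AIRS, MT Summit (local committee co-chair), CLIA, PACLIC
2010: COLING-2010 (local committee co-chair), SIGIR2010 (Reviewer), CLP2010 (General Chair)
2009: NTCIR (MOAT task Co-organizer), AIRS 2009
2008: NTCIR (MOAT task Co-organizer); Fourth Asia Information Retrieval Symposium AIRS 2008; WWW 2008 Workshop, NLP Challenges in the Information Explosion Era; International Conference on Asian Language Processing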

Research Areas

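Knowledge-based natural language understanding, text information retrieval, information extraction and question answering, cross-language information retrieval, and related topics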

Recent Publications

International conferences:
[1] Xianpei Han, Le Sun and Jun Zhao. Collective Entity Linking in Web Text: A Graph-Based Method. In Proceedings of the 34th Annual ACM SIGIR Conference (SIGIR 2011). Beijing, China, July 24-28, 2011. pp. 765-774.
[2] Xianpei Han, Le Sun. A Generative Entity-Mention Model for Linking Entities with Knowledge Base. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL-HLT 2011). Portland, Oregon, USA, June 19-24, 2011.
[3] Zhenzhong Zhang and Le Sun. Improving Word Sense Induction by Exploiting Semantic Relevance. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP 2011). Chiang Mai, Thailand, Nov. 8-13, 2011.
[4] Xue Jiang, Xianpei Han and Le Sun. ISCAS at Subtopic Mining Task in NTCIR9. In Proceedings of NII Test Collection for IR Systems (NTCIR2011). Tokyo, Japan, December 6-9, 2011.
[5] Yunping Huang, Le Sun. Query Model Refinement Using Word Graphs. In Proceedings of the 19th International Conference on Information and Knowledge Management (CIKM 2010). Toronto, Canada, October 26-30, 2010.
[6] Wenbo Li, Le Sun, Zhenzhong Zhang, Xue Jiang, Weiru Zhang. TC-DCA: A System for Text Classification Based on Document's Content Allocation. In Proceedings of the 19th International Conference on Information and Knowledge Management (CIKM 2010). Toronto, Canada, October 26-30, 2010.
[7] Yunping Huang, Le Sun. A Unified Iterative Optimization Algorithm for Query Model and Ranking Refinement. The Sixth Asia Information Retrieval Societies Conference (AIRS 2010).
[8] Dakun Zhang, Le Sun, Wenbo Li. Improving Phrase-based SMT Model with Flattened Bilingual Parse Tree. In Proceedings of the 6th IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLPKE2010).
[9] Zhenzhong Zhang, Le Sun, Wenbo Li. ISCAS: A System of Chinese Word Sense Induction Based on K-means Algorithm. In Proceedings of the 1st CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP2010).
[10] Le Sun, Zhenzhong Zhang, Qiang Dong. Overview of Chinese Word Sense Induction at Task-4 at CLP2010. In Proceedings of the 1st CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP2010).
[11] Yunping Huang, Le Sun, Jian-Yun Nie. Smoothing Document Language Model with Local Word Graph. In Proceedings of the 18th International Conference on Information and Knowledge Management (CIKM 2009).
[12] Yunping Huang, Le Sun, Zhe Wang. A Unified Graph-Based Iterative Reinforcement Approach to Personalized Search. The Fifth Asia Information Retrieval Symposium (AIRS 2009).
[13] Dakun Zhang, Le Sun, Wenbo Li. A Structured Prediction Approach for Statistical Machine Translation. In Proceedings of the 3rd International Joint Conference on Natural Language Processing (IJCNLP 2008), pp. 649-654. Hyderabad, India.
[14] Le Sun. Introduction of the HTRDP Chinese IR Evaluation Task. In Proceedings of the First International Workshop on Evaluating Information Access (EVIA 2007, invited paper).
[15] Li Jing, Le Sun, Kit Chun Yu, J. Webster. A Query-focused Multi-document Summarizer based on Lexical Chains. DUC Workshop (DUC2007), 2007.
[16] Yuanyong Feng, Ruihong Huang, Le Sun. Two-step Chinese Named Entity Recognition Based on Conditional Random Fields. Proceedings of the SIGHAN Workshop (SIGHAN2007), 2007.
[17] Ruihong Huang, Longxi Pan, Le Sun. ISCAS in Opinion Analysis Pilot Task: Experiment with Sentimental Dictionary based Classifier and CRF Model. Proceedings of the NTCIR Workshop Meeting (NTCIR2007). Tokyo, Japan, May 2007.
[18] Yuanhua Lv, Le Sun, et al. An Iterative Implicit Feedback Approach to Personalized Search. Proceedings of COLING/ACL 2006 (COLING/ACL2006). Sydney.
[19] Yuanyong Feng, Le Sun, Yuanhua Lv. Chinese Word Segmentation and Named Entity Recognition Based on Conditional Random Fields Models. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing (SIGHAN2006). Sydney, 2006.
[20] Quan Zhou, Le Sun, Yuanhua Lv. ISCAS at DUC06. Proceedings of the Document Understanding Conferences (DUC2006).
[21] Jinming Min, Le Sun and Junlin Zhang. ISCAS in English-Chinese CLIR at NTCIR-5. Proceedings of the Fifth NTCIR Workshop on Research in Information Access Technologies: Information Retrieval, Question Answering and Summarization (NTCIR2005). Tokyo, Japan, 2005.
[22] Quan Zhou, Le Sun, Jian-Yun Nie. A Multi-Document Summarizer based on Document Index Graphic and Lexical Chains. Proceedings of the Document Understanding Conferences (DUC 2005).
[23] Junlin Zhang, Le Sun. Using the Web Corpus to Translate the Queries in Cross-Lingual Information Retrieval. 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLPKE2005). Oct. 2005.
[24] Yuanyong Feng, Le Sun and Junlin Zhang. Early Results for Chinese Named Entity Recognition Using Conditional Random Fields Model, HMM and Maximum Entropy. 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLPKE2005). Oct. 2005.
[25] Junlin Zhang, Le Sun, Quan Zhou. A Cue-based Hub-Authority Approach for Multi-Document Text Summarization. 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (NLPKE2005). Oct. 2005.
[26] Zhang Junlin, Sun Le, Lv Yuanhua, Zhang Wei. Relevance Feedback by Exploring the Different Feedback Source and Collection Structure. In Proceedings of the Text REtrieval Conference (TREC2005).
[27] Sun Le, Zhang Junlin, Sun Yufang. ISCAS at TREC2004: HARD Track. Proceedings of the Text REtrieval Conference (TREC2004).
[28] Zhang Junlin, Sun Le, Qu Weimin, Sun Yufang. A Trigger Language Model-based IR System. The 20th International Conference on Computational Linguistics (COLING2004). Geneva, Switzerland, Vol. 1, pp. 680-686, Aug. 2004.
[29] Zhang Junlin, Sun Le, Yongchen Zhang. Applying Language Model into IR Task. NTCIR Workshop Fourth Meeting (NTCIR2004), 2004.
[30] Zhang JL, Sun Le, Qu WM, et al. A Three Level Cache-based Adaptive Chinese Language Model. 1st International Joint Conference on Natural Language Processing (IJCNLP 2004), Mar. 22-24, 2004.
[31] Zeng Wu, Lin Du, Le Sun, Shiwei Ye. TREC12 HARD Track at ISCAS. Proceedings of the Text REtrieval Conference (TREC 2003).
[32] Sun Le, Qu Wei-min, Xue Song. Constructing of a Large-Scale Chinese-English Parallel Corpus. In COLING 2002, the 3rd Workshop on Asian Language Resources and International Standardization (COLING2002). Taiwan, 2002.
[33] Zhang Jun-lin, Sun Le, Qu Wei-min, Du Lin, Xue Song. ISCAS in NTCIR-3. NTCIR-3 (NTCIR2002). Tokyo, Japan, 2002.
[34] Sun Le, Zhang YiBo, Zhang JunLin, Sun YuFang. PECAT: A Computer-Aided Translation Tool Based On Bilingual Corpora. Proceedings of IEEE SMC 2001 (SMC2001). Tucson, Arizona, USA, Oct. 7-10, 2001, pp. 927-932.
[35] Sun Le, Zhang Junlin, Qu Weimin, Sun Yufang. Evaluation of an English-Chinese CLIR Experimental System Based on Bilingual Dictionary. International Conference on Chinese Computing (ICCC2001). Singapore, Nov. 2001.
[36] Zhang Yibo, Sun Le, Du Lin, Jin Youbing, Sun Yufang. ISCAS' Text Retrieval in NTCIR Workshop II. Proceedings of the Second NTCIR Workshop on Research in Chinese & Japanese Text Retrieval and Text Summarization (NTCIR2001). Tokyo, Japan, pp. 146-153, Mar. 7-9, 2001.

Book chapters:
[1] Sun Le, Sun Yufang. "Natural Language Processing from an Application Perspective", in Some Important Issues in Chinese Information Processing (《中文信息处理若干重要问题》), Science Press, 2003, pp. 281-290.
[2] Zhang Junlin, Sun Le, Qu Weimin, Du Lin, Sun Yufang. A Three Level Cache-based Adaptive Chinese Language Model. Lecture Notes in Computer Science (Springer), 2005, Volume 3248/2005, pp. 487-492.
[3] Sun Le. A User Adaptive Framework for Computer-aided Translation System. Chapter 9 in Computer-aided Translation: Theory and Practice, 2007.
[4] Wenbo Li, Le Sun, et al. Smoothing LDA Model for Text Categorization. Lecture Notes in Computer Science (Springer), Volume 4993, pp. 83-94.
[5] Li Jing, Le Sun. A Lexical Chain Approach for Query-focused Update-style Multi-document Summarization. Lecture Notes in Computer Science (Springer), 2008, Volume 4993, pp. 310-320.
[6] Ruihong Huang, Le Sun, Yuanyong Feng. Study of Kernel-based Methods for Chinese Relation Extraction. Lecture Notes in Computer Science (Springer), 2008, Volume 4993/2008, pp. 598-604.

Journals:
[1] Wang Yulin, Sun Le, Li Wenbo. Web Query Classification Based on VASE Feature Words. Journal of Chinese Information Processing, Vol. 23, No. 3, 2009.
[2] Huang Ruihong, Sun Le, Feng Yuanyong, Huang Yunping. Kernel-Based Chinese Entity Relation Extraction. Journal of Chinese Information Processing, 22(5), pp. 102-108, 2008.
[3] Li Wenbo, Sun Le, Zhang Dakun. A New Text Classification Algorithm Based on the Labeled-LDA Model. Chinese Journal of Computers, 31(4), pp. 620-627, 2008.
[4] Li Wenbo, Sun Le, Nuo Minghua, Wu Jian. Kernel-Based Sensitive Information Filtering. Journal on Communications, 29(4), pp. 57-62, 2008.
[5] Feng Yuanyong, Sun Le, Zhang Dakun, Li Wenbo. A Fast Algorithm for Chinese Named Entity Recognition Based on Single-Character Cue Features. Journal of Chinese Information Processing, 22(1), 2008.
[6] Feng Yuanyong, Sun Le, Zhang Dakun, Li Wenbo. Chinese Named Entity Recognition Based on Small-Scale Tail-Character Features. Acta Electronica Sinica, 9(36), 2008.
[7] Feng Yuanyong, Sun Le, Dong Jing, Li Wenbo. Chinese Coreference Resolution Based on Reranking by Classification Confidence. Journal of Chinese Information Processing, 21(6), pp. 22-28, 2007.
[8] Liu Qun, Wang Xiangdong, Liu Hong, Sun Le, Tang Sheng, Xiong Deyi, Hou Hongxu, Lv Yuanhua, Li Wenbo, Lin Shouxun, Qian Yueliang. Introduction to HTRDP Evaluations on Chinese Information Processing and Intelligent Human-Machine Interface. Frontiers of Computer Science in China, Vol. 1, No. 1, Feb. 2007.
[9] Dong Jing, Sun Le, Feng Yuanyong, Huang Ruihong. Feature Selection in Chinese Entity Relation Extraction. Journal of Chinese Information Processing, 21(4), pp. 80-85, 2007.
[10] Zhang Wei, Sun Le, Feng Yuanyong, Li Wenbo, Huang Ruihong. Applying Word Collocations and User Models to Pinyin Input Methods. Journal of Chinese Information Processing, 21(4), pp. 105-110, 2007.
[11] Zhang Dakun, Zhang Wei, Feng Yuanyong, Sun Le. A Statistical Translation Model Based on Non-Contiguous Phrases. Journal of Chinese Information Processing, 21(1), 2007.
[12] Zhang Junlin, Liu Yang, Sun Le, Liu Qun. Methodology and Implementation of the 2005 863 Information Retrieval Evaluation. Journal of Chinese Information Processing, 2006.
[13] Min Jinming, Sun Le, Zhang Junlin. Revisiting Cross-Language Information Retrieval. Journal of Chinese Information Processing, 20(4), 2006.
[14] Zhang Yongchen, Sun Le, et al. Data-Based Extraction of Domain-Specific Bilingual Dictionaries. Journal of Chinese Information Processing, 20(2), 2006.
[15] Zhang Junlin, Sun Le, Sun Yufang. An Improved Memory-Based Adaptive Chinese Language Model. Journal of Chinese Information Processing, 19(1), 2005.
[16] Qu Weimin, Zhang Junlin, Sun Le, Sun Yufang. Difx: Efficient XML Data Querying with a Dynamic Indexing Algorithm. Journal of Computer Research and Development, Vol. 42, No. 11, 2005.
[17] Qu Weimin, Sun Le, Sun Yufang. Cost Estimation Algorithms for Value-Matching Queries in XML Data Querying. Journal of Software, 16(4), April 2005.
[18] Zhang Junlin, Sun Le, Sun Yufang. A Chinese Information Retrieval System Based on a Topic Language Model. Journal of Chinese Information Processing, 19(3), 2005.
[19] Zhang Junlin, Qu Weimin, Sun Le, Sun Yufang. An Improved Language-Model-Based Chinese Retrieval System. Journal of Chinese Information Processing, 18(2), 2004.
[20] Qu Weimin, Sun Le, Sun Yufang. Relevance Ranking Algorithms for Query Results in Semi-Structured Chinese Information Retrieval. Journal of Chinese Information Processing, 18(4), 2004.
[21] Qu Weimin, Zhang Junlin, Sun Le. Topic-Based Chinese Language Models. Journal of Computer Research and Development, Vol. 40, No. 9, pp. 1368-1374, 2003.
[22] Qu Weimin, Zhang Junlin, Sun Le, Sun Yufang. Memory-Based Adaptive Chinese Language Models. Journal of Chinese Information Processing, Vol. 17(5), 2003.

Academic Service

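2011-12 to present, Chinese Information Processing Society of China, Vice President and Secretary-General
Journal of Chinese Information Processing, Associate Editor-in-Chief 2006-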