Text Data Mining (English Edition)

Text Data Mining offers a thorough and detailed introduction to the fundamental theories and methods of text data mining, from pre-processing (for both Chinese and English texts) through text representation and feature selection to text classification and text clustering. It also presents the main applications of text data mining, such as topic models, sentiment analysis and opinion mining, topic detection and tracking, information extraction, and automatic text summarization.

Text Data Mining (English Edition) addresses the practical needs of text mining tasks. It uses examples to explain, from first principles, the theory and algorithms behind the relevant techniques. The writing aims to be concise and accessible without dwelling on implementation details, so that readers can learn how to build application systems on the basis of a solid understanding of the underlying principles.

Preface

600 times. Jiajun Zhang joined our institute after he graduated from university in 2006 and studied in my group in pursuit of his Ph.D. degree. He mainly worked on machine translation research, but he also performed well in many other research topics, such as multilingual automatic summarization, information extraction, and human-computer dialogue systems. Since 2016, he has been teaching parts of the course on Natural Language Processing with me at the University of Chinese Academy of Sciences, including machine translation, automatic summarization, and text classification; this course is very popular with students. With the solid theoretical foundation of these two talents and their keen scientific insights, I am gratified that many cutting-edge technical methods and research results could be verified, put into practice, and included in this book.

Beijing, China
Chengqing Zong

Chengqing Zong is a professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences. He has served as a chair for many prestigious conferences, such as ACL-IJCNLP, IJCAI, IJCAI-ECAI, AAAI, and COLING, and as an associate editor for prestigious journals such as TALLIP and Machine Translation.
He is the President of the Asian Federation of Natural Language Processing and a member of the International Committee on Computational Linguistics.

1 Introduction
5.4 Deep Learning Methods
5.4.1 Multilayer Feed-Forward Neural Network
5.4.2 Convolutional Neural Network
5.4.3 Recurrent Neural Network
5.5 Evaluation of Text Classification
7.3.1 Model Hypothesis
7.3.2 Parameter Learning
7.4 Latent Dirichlet Allocation
7.4.1 Model Hypothesis
7.4.2 Joint Probability
7.4.3 Inference in LDA
7.4.4 Inference for New Documents
7.5 Further Reading
8.3 Methods for Document/Sentence-Level Sentiment Analysis
8.5.1 Aspect Term Extraction
8.5.2 Aspect-Level Sentiment Classification
8.5.3 Generative Modeling of Topics and Sentiments
8.6 Special Issues in Sentiment Analysis
8.6.1 Sentiment Polarity Shift
8.6.2 Domain Adaptation
8.7 Further Reading
9 Topic Detection and Tracking
9.1 History of Topic Detection and Tracking
9.2 Terminology and Task Definition
9.2.1 Terminology
9.2.2 Task
9.3 Story/Topic Representation and Similarity Computation
9.4 Topic Detection
9.4.1 Online Topic Detection
9.4.2 Retrospective Topic Detection
9.5 Topic Tracking
9.6 Evaluation
9.7 Social Media Topic Detection and Tracking
9.7.1 Social Media Topic Detection
9.7.2 Social Media Topic Tracking
9.8 Bursty Topic Detection
9.8.1 Burst State Detection
9.8.2 Document-Pivot Methods
9.8.3 Feature-Pivot Methods
9.9 Further Reading
10 Information Extraction
10.3.1 Clustering-Based Entity Disambiguation Method
10.3.2 Linking-Based Entity Disambiguation
10.3.3 Evaluation of Entity Disambiguation
10.4 Relation Extraction
10.4.1 Relation Classification Using Discrete Features
10.4.2 Relation Classification Using Distributed Features
10.4.3 Relation Classification Based on Distant Supervision
10.4.4 Evaluation of Relation Classification
10.5 Event Extraction
10.5.1 Event Description Template
10.5.2 Event Extraction Method
10.5.3 Evaluation of Event Extraction
10.6 Further Reading
11 Automatic Text Summarization
Encoder-Decoder Framework
11.5 Query-Based Automatic Summarization
11.5.1 Relevance Calculation Based on the Language Model
11.5.2 Relevance Calculation Based on Keyword Co-occurrence
11.5.3 Graph-Based Relevance Calculation Method
11.6 Crosslingual and Multilingual Automatic Summarization
11.6.1 Crosslingual Automatic Summarization
11.6.2 Multilingual Automatic Summarization
11.7 Summary Quality Evaluation and Evaluation Workshops
11.7.1 Summary Quality Evaluation Methods
11.7.2 Evaluation Workshops
11.8 Further Reading
References