AI News: A Complete Index of Important Papers & Resources in Natural Language Processing

Friday, October 13, 2017


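Natural language processing (NLP) is one of the most challenging branches of artificial intelligence research. With the introduction of deep learning and related techniques, the field is advancing at an unprecedented pace. But for a beginner, which research and resources in the field are essential reading? Recently, Kyubyong Park compiled a complete list.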

GitHub project: http://ift.tt/2yfj72h

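I have been working on natural language processing (NLP) tasks for a long time. One day it occurred to me that I needed an overview of the vast NLP field, and I knew I was surely not the first person who wanted to see the full picture of NLP tasks.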

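I have done my best to cover as many kinds of NLP tasks as possible, but given the limits of my own knowledge, I admit the list is far from exhaustive. For now, the references selected for this project lean toward recent deep learning research. I hope they give anyone who wants to dig into a particular NLP task a place to start. The project will be updated continuously, but I would much rather work on it with others. If you are interested, contributions are welcome.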

Anaphora Resolution

See "Coreference Resolution" (http://ift.tt/2xEDQrO)

Automated Essay Scoring

Paper: Automatic Text Scoring Using Neural Networks (http://ift.tt/2yn06Kp)
Paper: A Neural Approach to Automated Essay Scoring (http://ift.tt/2xFfmPm)
Challenge: Kaggle: The Hewlett Foundation: Automated Essay Scoring (http://ift.tt/2ynjS8y)
Project: Enhanced AI Scoring Engine (http://ift.tt/2xErpMN)

Automatic Speech Recognition

WIKI Speech recognition (http://ift.tt/2ymCJ3E)
Paper: Deep Speech 2: End-to-End Speech Recognition in English and Mandarin (http://ift.tt/2xEEjdF)
Paper: WaveNet: A Generative Model for Raw Audio (http://ift.tt/2ynlFL9)
Project: A TensorFlow implementation of Baidu's DeepSpeech architecture (http://ift.tt/2xFflLi)
Project: Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind's WaveNet (http://ift.tt/2ynanGK)
Challenge: The 5th CHiME Speech Separation and Recognition Challenge (http://ift.tt/2xERcEC)
Resource: The 5th CHiME Speech Separation and Recognition Challenge (http://ift.tt/2ynUsrD)
Resource: CSTR VCTK Corpus (http://ift.tt/2xEEkhJ)
Resource: LibriSpeech ASR corpus (http://ift.tt/2ynl4ZM)
Resource: Switchboard-1 Telephone Speech Corpus (http://ift.tt/2xEe7ju)
Resource: TED-LIUM Corpus (http://ift.tt/2ynapym)

Automatic Summarization

WIKI Automatic summarization (http://ift.tt/2xEMuqr)
Book: Automatic Text Summarization (http://ift.tt/2ynqUKQ)
Paper: Text Summarization Using Neural Networks (http://ift.tt/2xEy9KM)
Paper: Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization (http://ift.tt/2ynjU0a)
Resource: Text Analytics Conferences (TAC) (http://ift.tt/2xEMtmn)
Resource: Document Understanding Conferences (DUC) (http://ift.tt/2ynjSWt)

Coreference Resolution

INFO Coreference Resolution (http://ift.tt/2xEwiWd)
Paper: Deep Reinforcement Learning for Mention-Ranking Coreference Models (http://ift.tt/2ymMwa0)
Paper: Improving Coreference Resolution by Learning Entity-Level Distributed Representations (http://ift.tt/2xERdIG)
Challenge: CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes (http://ift.tt/2yn47OV)
Challenge: CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes (http://ift.tt/2xEJM42)

Entity Linking

See the "Named Entity Disambiguation" section

Grammatical Error Correction

Paper: Neural Network Translation Models for Grammatical Error Correction (http://ift.tt/2ynqT9K)
Challenge: CoNLL-2013 Shared Task: Grammatical Error Correction (http://ift.tt/2xErLCX)
Challenge: CoNLL-2014 Shared Task: Grammatical Error Correction (http://ift.tt/2yn5UDK)
Resource: NUS Non-commercial research/trial corpus license (http://ift.tt/2xEDvFT)
Resource: Lang-8 Learner Corpora (http://ift.tt/2ynj7fG)
Resource: Cornell Movie--Dialogs Corpus (http://ift.tt/2xEfvT4)
Project: Deep Text Corrector (http://ift.tt/2ynlG1F)
Product: deepgrammar (http://ift.tt/2xEwgxo)

Grapheme-to-Phoneme Conversion

Paper: Grapheme-to-Phoneme Models for (Almost) Any Language (http://ift.tt/2ymCJka)
Paper: Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning (http://ift.tt/2xEJLgu)
Paper: Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion (http://ift.tt/2ymRkfq)
Project: Sequence-to-Sequence G2P toolkit (http://ift.tt/2xEx9pX)
Resource: Multilingual Pronunciation Data (http://ift.tt/2ynfQNA)

Language Guessing

See the "Language Identification" section

Language Identification

WIKI Language identification (http://ift.tt/2xEfwX8)
Paper: AUTOMATIC LANGUAGE IDENTIFICATION USING DEEP NEURAL NETWORKS (http://ift.tt/2ynhO0r)
Challenge: 2015 Language Recognition Evaluation (http://ift.tt/2xEPfYB)

Language Modeling

WIKI Language model (http://ift.tt/2ymMxL6)
Toolkit: KenLM Language Model Toolkit (http://ift.tt/2xEe7A0)
Paper: Distributed Representations of Words and Phrases and their Compositionality (http://ift.tt/2ynkpaM)
Paper: Character-Aware Neural Language Models (http://ift.tt/2xEMtCT)
Resource: Penn Treebank (http://ift.tt/2ymCLbM)
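What the language-modeling resources above estimate is the probability of a word given the words before it. The following is a hedged, minimal sketch of that idea, not how KenLM or the listed papers work: a toy maximum-likelihood bigram model in Python with no smoothing. The function names and the three-sentence corpus are hypothetical; real toolkits such as KenLM apply smoothing so that unseen bigrams do not get zero probability.

```python
from collections import Counter
import math

def train_bigram_lm(sentences):
    """Count history and bigram frequencies, padding each sentence with <s> and </s>."""
    histories = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])  # every token that precedes another token
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams

def bigram_prob(histories, bigrams, prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if histories[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / histories[prev]

def sentence_logprob(histories, bigrams, sentence):
    """Sum of log P(w_i | w_{i-1}) over the padded sentence; -inf if any bigram is unseen."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p = bigram_prob(histories, bigrams, prev, word)
        if p == 0.0:
            return float("-inf")  # unsmoothed MLE: one unseen bigram zeroes the sentence
        total += math.log(p)
    return total

# Toy usage with a made-up corpus.
histories, bigrams = train_bigram_lm(["the cat sat", "the dog sat", "the cat ran"])
print(bigram_prob(histories, bigrams, "the", "cat"))
print(sentence_logprob(histories, bigrams, "the dog ran"))
```

The second call returns negative infinity because "dog ran" never occurs in the toy corpus, which is exactly the problem smoothing addresses.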

Language Recognition

See the "Language Identification" section

Lemmatization

WIKI Lemmatisation (http://ift.tt/2xEcjHg)
Paper: Joint Lemmatization and Morphological Tagging with LEMMING (http://ift.tt/2ynkqLS)
Toolkit: WordNet Lemmatizer (http://ift.tt/2xErrnT)
Resource: Treebank-3 (http://ift.tt/2ynl6AS)

Lip Reading

WIKI Lip reading (http://ift.tt/2xEQ0Rz)
Paper: Lip Reading Sentences in the Wild (http://ift.tt/2ynfLcK)
Paper: 3D Convolutional Neural Networks for Cross Audio-Visual Matching Recognition (http://ift.tt/2xEwjcJ)
Project: Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks (http://ift.tt/2ymTY4Z)
Resource: The GRID audiovisual sentence corpus (http://ift.tt/2xEPZNv)

Machine Translation

Paper: Neural Machine Translation by Jointly Learning to Align and Translate (http://ift.tt/2ynlHTh)
Paper: Neural Machine Translation in Linear Time (http://ift.tt/2xEEQfB)
Paper: Attention Is All You Need (http://ift.tt/2sUoTUo)
Challenge: ACL 2014 NINTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION (http://ift.tt/2ynaq5o)
Challenge: EMNLP 2017 SECOND CONFERENCE ON MACHINE TRANSLATION (WMT17) (http://ift.tt/2xEH52A)
Resource: OpenSubtitles2016 (http://ift.tt/2ynj9nO)
Resource: WIT3: Web Inventory of Transcribed and Translated Talks (https://wit3.fbk.eu/)
Resource: The QCRI Educational Domain (QED) Corpus (http://ift.tt/2xEsGU0)

Morphological Inflection Generation

WIKI Inflection (http://ift.tt/2ynj7wc)
Paper: Morphological Inflection Generation Using Character Sequence to Sequence Learning (http://ift.tt/2xEqKe4)
Challenge: SIGMORPHON 2016 Shared Task: Morphological Reinflection (http://ift.tt/2ynnmrR)
Resource: sigmorphon2016 (http://ift.tt/2xErqjP)

Named Entity Disambiguation

WIKI Entity linking (http://ift.tt/2ynbF4w)
Paper: Robust and Collective Entity Disambiguation through Semantic Embeddings (http://ift.tt/2xEwkgN)

Named Entity Recognition

WIKI Named-entity recognition (http://ift.tt/2ynlKhV)
Paper: Neural Architectures for Named Entity Recognition (http://ift.tt/2xEgKSr)
Project: OSU Twitter NLP Tools (http://ift.tt/2ynfR46)
Challenge: Named Entity Recognition in Twitter (http://ift.tt/2xErqAl)
Challenge: CoNLL 2002 Language-Independent Named Entity Recognition (http://ift.tt/2ymMy1C)
Challenge: Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition (http://ift.tt/2xEDPUM)
Resource: CoNLL-2002 NER corpus (http://ift.tt/2yn3JAf)
Resource: CoNLL-2003 NER corpus (http://ift.tt/2xEqLi8)
Resource: NUT Named Entity Recognition in Twitter Shared task (http://ift.tt/2ynksU0)

Paraphrase Detection

Paper: Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection (http://ift.tt/2xEaJ8f)
Project: Paralex: Paraphrase-Driven Learning for Open Question Answering (http://ift.tt/2yno0FT)
Resource: Microsoft Research Paraphrase Corpus (http://ift.tt/2xErSi6)
Resource: Microsoft Research Video Description Corpus (http://ift.tt/2yn0993)
Resource: Pascal Dataset (http://ift.tt/2xEcMsJ)
Resource: Flickr Dataset (http://ift.tt/2ymCLsi)
Resource: The SICK data set (http://ift.tt/2xErrEp)
Resource: PPDB: The Paraphrase Database (http://ift.tt/2ynl6Ro)
Resource: WikiAnswers Paraphrase Corpus (http://ift.tt/2xEEQw7)

Parsing

WIKI Parsing (http://ift.tt/2ymCNjU)
Toolkit: The Stanford Parser: A statistical parser (http://ift.tt/2xEH5j6)
Toolkit: spaCy parser (http://ift.tt/2ynjUxc)
Paper: A fast and accurate dependency parser using neural networks (http://ift.tt/2xEJLx0)
Challenge: CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (http://ift.tt/2ynjTtv)
Challenge: CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing (http://ift.tt/2xEsIeA)
Challenge: CoNLL 2015 Shared Task: Shallow Discourse Parsing (http://ift.tt/2ynnkjB)
Challenge: SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete! (http://ift.tt/2xEgK4T)

Part-of-Speech Tagging

WIKI Part-of-speech tagging (http://ift.tt/2yneGSd)
Paper: Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss (http://ift.tt/2xERdbE)
Paper: Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models (http://ift.tt/2yn5YU0)
Resource: Treebank-3 (http://ift.tt/2ynl6AS)
Toolkit: nltk.tag package (http://ift.tt/2xEwkxj)

Pinyin-to-Chinese Conversion

Paper: Neural Network Language Model for Chinese Pinyin Input Method Engine (http://ift.tt/2ynfSVI)
Project: Neural Chinese Transliterator (http://ift.tt/2xEgL8X)

Question Answering

WIKI Question answering (http://ift.tt/2ynovjc)
Paper: Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (http://ift.tt/2xEx9Gt)
Paper: Dynamic Memory Networks for Visual and Textual Question Answering (http://ift.tt/2ynbFl2)
Challenge: TREC Question Answering Task (http://ift.tt/2xEyahO)
Challenge: NTCIR-8: Advanced Cross-lingual Information Access (ACLIA) (http://ift.tt/2ynniIv)
Challenge: CLEF Question Answering Track (http://ift.tt/2xEqKuA)
Challenge: SemEval-2017 Task 3: Community Question Answering (http://ift.tt/2ynhQp5)
Resource: MS MARCO: Microsoft MAchine Reading COmprehension Dataset (http://ift.tt/2xErM9Z)
Resource: Maluuba NewsQA (http://ift.tt/2ynhRt9)
Resource: SQuAD: 100,000+ Questions for Machine Comprehension of Text (http://ift.tt/2xExaKx)
Resource: GraphQuestions: A Characteristic-rich Question Answering Dataset (http://ift.tt/2ynhS0b)
Resource: Story Cloze Test and ROCStories Corpora (http://ift.tt/2xEe9b6)
Resource: Microsoft Research WikiQA Corpus (http://ift.tt/2ynfTsK)
Resource: DeepMind Q&A Dataset (http://ift.tt/2xEckei)
Resource: QASent (http://ift.tt/2ynfTZM)

Relationship Extraction

WIKI Relationship extraction (http://ift.tt/2xEPgvD)
Paper: A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm (http://ift.tt/2ynhSxd)

Semantic Role Labeling

WIKI Semantic role labeling (http://ift.tt/2xEPhzH)
Book: Semantic Role Labeling (http://ift.tt/2ynhT4f)
Paper: End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks (http://ift.tt/2xEe872)
Paper: Neural Semantic Role Labeling with Dependency Path Embeddings (http://ift.tt/2ynfUwO)
Challenge: CoNLL-2005 Shared Task: Semantic Role Labeling (http://ift.tt/2xFfnCU)
Challenge: CoNLL-2004 Shared Task: Semantic Role Labeling (http://ift.tt/2ynhU8j)
Toolkit: Illinois Semantic Role Labeler (SRL) (http://ift.tt/2xEEl5h)
Resource: CoNLL-2005 Shared Task: Semantic Role Labeling (http://ift.tt/2ynfV3Q)

Sentence Boundary Disambiguation

WIKI Sentence boundary disambiguation (http://ift.tt/2xEDRfm)
Paper: A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain (http://ift.tt/2ynhUFl)
Toolkit: NLTK Tokenizers (http://ift.tt/2xEgLpt)
Resource: The British National Corpus (http://ift.tt/2ynfVAS)
Resource: Switchboard-1 Telephone Speech Corpus (http://ift.tt/2xEe7ju)
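Sentence boundary disambiguation is hard mainly because a period can end either a sentence or an abbreviation. A hedged toy sketch of the problem in Python: split after terminal punctuation unless the token is in a small hand-written abbreviation list. The list and the function name `split_sentences` are hypothetical, and this is not the method of the paper above; NLTK's Punkt tokenizer, for example, learns abbreviations from unlabeled text instead of using a fixed list.

```python
# Hypothetical, hand-picked abbreviation list; real systems curate or learn much larger ones.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc.", "u.s."}

def split_sentences(text):
    """Split after '.', '!' or '?' unless the token ending in '.' is a known abbreviation."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        # A token ending in terminal punctuation closes a sentence, except known abbreviations.
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without terminal punctuation
        sentences.append(" ".join(current))
    return sentences

print(split_sentences("Dr. Smith went to Washington. He met Mr. Jones there! Did it go well?"))
```

A sentence that genuinely ends in "etc." would not be split by this rule, which is exactly the ambiguity that learned models try to resolve.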

Sentiment Analysis

WIKI Sentiment analysis (http://ift.tt/2xEEPIz)
INFO Awesome Sentiment Analysis (http://ift.tt/2ynfW7U)
Challenge: Kaggle: UMICH SI650 - Sentiment Classification (http://ift.tt/2xEe9rC)
Challenge: SemEval-2017 Task 4: Sentiment Analysis in Twitter (http://ift.tt/2ynfWEW)
Challenge: SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News (http://ift.tt/2xErSP8)
Project: SenticNet (http://ift.tt/2ynhVcn)
Resource: Multi-Domain Sentiment Dataset (version 2.0) (http://ift.tt/2xEgKlp)
Resource: Stanford Sentiment Treebank (http://ift.tt/2ynhVJp)
Resource: Twitter Sentiment Corpus (http://ift.tt/2xEckuO)
Resource: Twitter Sentiment Analysis Training Corpus (http://ift.tt/2ynhWgr)
Resource: AFINN: List of English words rated for valence (http://ift.tt/2xEwjJL)

Source Separation

WIKI Source separation (http://ift.tt/2ynfXbY)
Paper: From Blind to Guided Audio Source Separation (http://ift.tt/2xEsHr2)
Paper: Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation (http://ift.tt/2ynl77U)
Challenge: Signal Separation Evaluation Campaign (SiSEC) (http://ift.tt/2xEcMZL)
Challenge: CHiME Speech Separation and Recognition Challenge (http://ift.tt/2xERcEC)

Speaker Authentication

See the "Speaker Recognition" section

Speaker Diarisation

WIKI Speaker diarisation (http://ift.tt/2ynfXJ0)
Paper: DNN-based speaker clustering for speaker diarisation (http://ift.tt/2xEwkNP)
Paper: Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach (http://ift.tt/2ynUv6N)
Paper: Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion (http://ift.tt/2xEyayk)
Challenge: Rich Transcription Evaluation (http://ift.tt/2yneH8J)

Speaker Recognition

WIKI Speaker recognition (http://ift.tt/2xEQ1F7)
Paper: A NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK (http://ift.tt/2ynfYN4)
Paper: DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION (http://ift.tt/2xEfxKG)
Challenge: NIST Speaker Recognition Evaluation (SRE) (http://ift.tt/2ynj9UQ)
INFO Are there any suggestions for free databases for speaker recognition? (http://ift.tt/2xEPgM9)

Speech Reading

See the "Lip Reading" section

Speech Recognition

See the "Automatic Speech Recognition" section

Speech Segmentation

WIKI Speech segmentation (http://ift.tt/2ynovzI)
Paper: Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics (http://ift.tt/2xFfnTq)
Paper: Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings (http://ift.tt/2ynkriU)
Paper: Unsupervised Lexicon Discovery from Acoustic Input (http://ift.tt/2xEJMRA)
Paper: Weakly supervised spoken term discovery using cross-lingual side information (http://ift.tt/2yn5XiU)
Resource: CALLHOME Spanish Speech (http://ift.tt/2xEclyS)

Speech Synthesis

WIKI Speech synthesis (http://ift.tt/2ynktaw)
Paper: WaveNet: A Generative Model for Raw Audio (http://ift.tt/2ynlFL9)
Paper: Tacotron: Towards End-to-End Speech Synthesis (http://ift.tt/2xEaKJl)
Paper: Deep Voice 2: Multi-Speaker Neural Text-to-Speech (http://ift.tt/2ynqXX2)
Resource: The World English Bible (http://ift.tt/2xEcO3P)
Resource: LJ Speech Dataset (http://ift.tt/2ynnl7h)
Resource: Lessac Data (http://ift.tt/2xEDRvS)
Challenge: Blizzard Challenge 2017 (http://ift.tt/2ymRmE4)
Product: Lyrebird (https://lyrebird.ai/)
Project: The Festvox project (http://ift.tt/2xEQ2Jb)
Toolkit: Merlin: The Neural Network (NN) based Speech Synthesis System (http://ift.tt/2ynbHcE)

Speech Enhancement

WIKI Speech enhancement (http://ift.tt/2xEDxxv)
Book: Speech enhancement: theory and practice (http://ift.tt/2ynjUNI)
Paper: An Experimental Study on Speech Enhancement Based on Deep Neural Network (http://ift.tt/2xEEllN)
Paper: A Regression Approach to Speech Enhancement Based on Deep Neural Networks (http://ift.tt/2ynasu2)
Paper: Speech Enhancement Based on Deep Denoising Autoencoder (http://ift.tt/2xEReMK)

Speech-to-Text

See the "Automatic Speech Recognition" section

Spoken Term Detection

See the "Speech Segmentation" section

Stemming

WIKI Stemming (http://ift.tt/2ymMyyE)
Paper: A BACKPROPAGATION NEURAL NETWORK TO IMPROVE ARABIC STEMMING (http://ift.tt/2xEckLk)
Toolkit: NLTK Stemmers (http://ift.tt/2ymCNAq)

Term Extraction

WIKI Terminology extraction (http://ift.tt/2xEsILC)
Paper: Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection (http://ift.tt/2ynUvnj)

Text Simplification

WIKI Text simplification (http://ift.tt/2xFfoXu)
Paper: Aligning Sentences from Standard Wikipedia to Simple Wikipedia (http://ift.tt/2ynqWlW)
Paper: Problems in Current Text Simplification Research: New Data Can Help (http://ift.tt/2xErNuz)
Resource: Newsela Data (http://ift.tt/2ynhYoz)

Text-to-Speech

See the "Speech Synthesis" section

Textual Entailment

WIKI Textual entailment (http://ift.tt/2xEwl4l)
Project: Textual Entailment with TensorFlow (http://ift.tt/2ynj8jK)
Paper: Textual Entailment with Structured Attentions and Composition (http://ift.tt/2xEES7d)
Challenge: SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment (http://ift.tt/2ynnkA7)
Challenge: SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge (http://ift.tt/2xEDSzW)

Voice Conversion

Paper: PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION WITHOUT PARALLEL DATA TRAINING (http://ift.tt/2ynqYdy)
Project: An implementation of voice conversion system utilizing phonetic posteriorgrams (http://ift.tt/2xExbhz)
Challenge: Voice Conversion Challenge 2016 (http://ift.tt/2ynjabm)
Challenge: Voice Conversion Challenge 2018 (http://ift.tt/2xEEmpR)
Resource: CMU_ARCTIC speech synthesis databases (http://ift.tt/2ymyPYv)
Resource: TIMIT Acoustic-Phonetic Continuous Speech Corpus (http://ift.tt/2xEPinf)

Voice Recognition

See the "Speaker Recognition" section

Word Embeddings

WIKI Word embedding (http://ift.tt/2ynkrzq)
Toolkit: Gensim: word2vec (http://ift.tt/2xEycWY)
Toolkit: fastText (http://ift.tt/2ynnXtH)
Toolkit: GloVe: Global Vectors for Word Representation (http://ift.tt/2xEwhBs)
INFO Where to get a pretrained model (http://ift.tt/2ynktr2)
Project: Pre-trained word vectors of 30+ languages (http://ift.tt/2xEMvL1)
Project: Polyglot: Distributed word representations for multilingual NLP (http://ift.tt/2ynbHta)

Word Prediction

INFO What is Word Prediction? (http://ift.tt/2xEeaMc)
Paper: The prediction of character based on recurrent neural network language model (http://ift.tt/2ynnYxL)
Paper: An Embedded Deep Learning based Word Prediction (http://ift.tt/2xEJN86)
Paper: Evaluating Word Prediction: Framing Keystroke Savings (http://ift.tt/2ynnZ4N)
Resource: An Embedded Deep Learning based Word Prediction (http://ift.tt/2xEH66E)
Project: Word Prediction using Convolutional Neural Networks—can you do better than iPhone™ Keyboard? (http://ift.tt/2ynl9g2)

Word Segmentation

WIKI Word segmentation (http://ift.tt/2xEebQg)
Paper: Neural Word Segmentation Learning for Chinese (http://ift.tt/2yn5Zr2)
Project: Convolutional neural network for Chinese word segmentation (http://ift.tt/2xEecDO)
Toolkit: Stanford Word Segmenter (http://ift.tt/2ymTX0V)
Toolkit: NLTK Tokenizers (http://ift.tt/2xEgLpt)
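For Chinese, word segmentation means inserting word boundaries into text written without spaces. A classic non-neural baseline, shown here only as a hedged sketch and not as the method of the papers or toolkits above, is forward maximum matching: scan left to right and always take the longest dictionary word. The function name `forward_max_match`, the `max_len` default and both dictionaries are made-up examples.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest dictionary
    word of at most max_len characters, falling back to a single character."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

print(forward_max_match("自然语言处理", {"自然", "语言", "自然语言", "处理", "语言处理"}))
print(forward_max_match("研究生命起源", {"研究", "研究生", "生命", "起源"}))
```

With the second dictionary, 研究生命起源 is segmented as 研究生 / 命 / 起源 instead of 研究 / 生命 / 起源, a typical greedy-matching error that statistical and neural segmenters aim to avoid.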

Word Sense Disambiguation

WIKI Word-sense disambiguation (http://ift.tt/2xEDxO1)
Paper: Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data (http://ift.tt/2ymTYC1)
Resource: Train-O-Matic Data (http://ift.tt/2xEElCj)
Resource: BabelNet (http://babelnet.org/)

Original article: http://ift.tt/2ynjUh3