

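  1. Dirichlet process: reference 1, Juejin
  2. Bayesian inference
  3. Bayesian networks: Uncertainty in Profit Scoring (Bayesian Deep Learning); Zhihu expert post: Bayesian Neural Networks
  4. Unbiased Monte Carlo gradients: reference
  5. Monte Carlo method: Reinforcement-Learning-Monte-Carlo
  6. Monte Carlo dropout: What is Monte Carlo dropout?
  7. Monte Carlo integration: Mathematical principles of Monte Carlo; Stochastic simulation: Monte Carlo integration and sampling (direct sampling, accept-reject sampling, and importance sampling in detail)
  8. Dropout variational inference
  9. reference: the two kinds of uncertainty in deep learning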
  10. What My Deep Model Doesn’t Know
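    • prediction uncertainty
  11. Variational Bayesian methods: cnblogs explanation
  12. Deep learning, regression tasks, Gaussian noise
  13. prediction uncertainty: in classification problems, predictive uncertainty can be approximated with Monte Carlo integration
  14. Scale mixture Gaussian prior
  15. Autoencoder
  16. Variational autoencoder
  17. Variational inference: two write-ups on variational inference; What papers on Bayesian neural networks are worth reading?
  18. Residuals
    • Ordinary residual:
    • Residual plot: checks whether the observed or predicted error (the residuals) is consistent with stochastic error
  19. Reducing the impact of wrong labels: PaperReading: Learning with Noisy Label (low-cost deployment of deep learning)
  20. Using uncertainty to weigh the loss functions in multi-task learning
  21. Gaussian processes: CS229, Gaussian processes; Cornell, Lecture 15: Gaussian Processes; Gaussian Process tutorial; krasserm blog (code contained)
    Gaussian Processes for Machine Learning (book)
  22. Deep Gaussian processes
  23. Point estimation, interval estimation
  24. Maximum a posteriori (MAP) estimation in Bayesian inference: Deep Learning book (the "flower book"), Section 19.3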
  25. Epsilon greedy search
  26. Confidence calibration; how well calibrated a model is:
    • The goal of calibration is to make the confidence scores reflect true probabilities.
    • A simple way to visualize calibration is plotting accuracy as a function of confidence (known as a reliability diagram).
    • “On Calibration of Modern Neural Networks”
  27. What is the meaning of the word logits in TensorFlow? logits What does Logits in machine learning mean?
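  28. Batch Normalization: Section 10, Batch Normalization
  29. Conjugate gradient method: A simple analysis of the conjugate gradient method
  30. Distant supervision: automatically builds large amounts of training data by aligning a knowledge base with unstructured text, which reduces the model's dependence on manually labeled data and improves its cross-domain adaptability. It suffers from a noise problem, however.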
  31. Knowledge Graph
    • Entity-Centric Knowledge Graph
    • Event-Centric Knowledge Graph
  32. Event Extraction
    Definition: Identify the relation between $\color{red}{\text{an event and an entity}}$
    Event definition: An event is defined as a specific occurrence involving participants
    The task is to find the event trigger, event type, event arguments, and argument roles
    An event is usually closely tied to its trigger (Event Identification (Trigger Words)), and the trigger is usually a verb
  33. Open Domain Event Extraction

    • Features Representation
      • Traditional Methods for Feature Representation
        • Human designed features
        • Rely too heavily on imprecise NLP tools for feature extraction
        • Limitations for low-resource languages
      • Dynamic CNN
      • Argument Attention(Event arguments)
        Identifying arguments helps Event Detection considerably
        If we consider the argument phrase “former protege” (Role=Position), we will have more confidence to predict it as an End-Position event
        • Find arguments from the information in contextual words and entities: context representation learning (CRL) learns representations (embeddings or otherwise) of the contextual words and entities, then takes their inner product with the corresponding attention $\alpha$
    • Training Data Generation
      • External Resources
        • Employing FrameNet (semantic role descriptions in FrameNet, VerbNet (Kipper et al., 2008) and Propbank (Palmer et al., 2005)): FrameNet & FrameNet Python API
          How to generate training data in FrameNet
          • Method 1 [Open Domain Event Extraction from Texts] (figure)
        • Employing Freebase: Google migrated Freebase data to Wikidata and officially shut Freebase down in 2016. Knowledge graph survey: Freebase
      • Generating Labeled Data from Structured KB
        • Distant (Weak) Supervision in Relation Extraction ($\color{red}{\text{doesn't}}$ work for Event Extraction)
          Introduction to Freebase in the paper Automatically Labeled Data Generation for Large Scale Event Extraction
        • Triggers are not given in existing knowledge bases, so an existing structured KB cannot be used directly
          Instead, label back from the key arguments in the structured KB and extract triggers under an assumption:
          1. Event Trigger Words Extraction
            Assumption: The sentences that mention all arguments denote such events
          2. Argument Extraction/Role Identification
            Based on trigger words and entities
            Linguistic regularity: Arguments for a specific event instance are usually mentioned in multiple sentences; only 0.02% of instances can find all argument mentions in one sentence
        • method
          1. (figure)
  34. Relation Extraction
    Definition: Identify the relation between $\color{red}{\text{two given entities}}$

  35. Event Extraction (EE)
    reference 1: chriszhangcx blog, and 2: Introduction of Event Extraction
    A Survey of Open Domain Event Extraction
    Classic models for Event Extraction
  36. POS tagging: Part of Speech (PoS) Tagging
  37. ACE Corpus: ACE2005: 529 Training, 33 Development, 40 Testing
  38. NLTK: Natural Language Toolkit 3.5
  39. [Attention Mechanism](https://blog.floydhub.com/attention-mechanism)
  41. burn-in (Gibbs sampling) Burn-In is Unnecessary
  42. LDA topic model: A detailed guide to the LDA topic model
  43. G-test
  44. Coreference and anaphora resolution: Coreference Resolution
  45. semantic role labeling representation(SRL) Semantic Role Labeling
    meaning representations: Abstract Meaning Representation (AMR), Stanford Typed Dependencies, FrameNet Meaning Representation and SRL: assuming there is some meaning
    Advanced Semantic Representation
    AMR Tutorial
    Abstract Meaning Representation (AMR) 1.2 Specification
  46. Survey | Event extraction and reasoning (Part 1)
  47. word sense: Word sense
  48. NLP tasks
  49. Word2Vec: Skip-Gram and CBOW
  50. Mapping tool from WordNet senses to OntoNotes senses
  51. Ontonotes Sense Groups
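  52. Distributional semantics (Distributional Semantic Representation), based on the distributional hypothesis: linguistic items with similar distributions have similar meanings.
  53. GRU: Dive into Deep Learning, GRU
  54. DAG (directed acyclic graph):
  55. syntactic parsing:
    • Phrase structure tree (corresponds to context-free grammar, CFG)
    • Dependency parse tree: intuitively, dependency parsing identifies grammatical components of a sentence, such as subject, predicate, and object, or attributive, adverbial, and complement, and analyzes the relations between them.
  56. Semantic Dependency Parsing (SDP): analyzes the semantic relations between the linguistic units of a sentence and presents them as a dependency structure
  57. Distributional Representation and Distributed Representation: On distributed representations of text, by Xipeng Qiu
  58. ACE2005:
    Event extraction as defined by ACE2005: (1) at the sentence level, identify the trigger words that appear in the sentence and their types; (2) for each trigger word, determine its arguments and their types. The figure below is an example of the ACE2005 task.
  59. RNNs, their variants, and BPTT: RNNs and their common architectures
  60. dependency parsing: notes 1, notes 2, notes 3
  61. bootstrapping (the bootstrap method)
  62. ELMo: ELMo, the best word vectors to use; Deep Contextualized Word Representations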
  63. Understanding Ranking Loss, Contrastive Loss, Margin Loss, Triplet Loss, Hinge Loss and all those confusing names
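  64. Multi-instance Learning (MIL)
    Prof. Zhi-Hua Zhou (Nanjing University): miVLAD and miFV
  65. Snorkel: a data labeling tool based on weakly supervised learning
    snorkel; Snorkel: building a machine learning project from scratch
    • Domain heuristics, e.g. common patterns and rules of thumb
    • Existing correctly labeled data that does not fully fit the current task but is still useful. This is traditionally called distant supervision
    • Unreliable non-expert annotators, e.g. crowdsourced labeling
    • Hard-coded heuristics: usually regular expressions
    • Syntactic structure: e.g. dependency structures obtained with spaCy
    • Distant supervision: e.g. external knowledge bases
    • Noisy manual labels: e.g. crowdsourced labeling
    • External models: other models that can provide useful labeling signals
      Once the labeling functions are written, Snorkel uses the conflicts between them to train a Label Model that estimates the accuracy of each labeling function. By observing how the labeling functions agree with one another, the label model learns the accuracy of each supervision source.
      sample_and_sgd function: (when computing an expectation under some distribution, approximate it with Monte Carlo integration; dropping the limit, it can be seen as the mean over the sample points)
  66. Gibbs sampling algorithm: How Gibbs sampling works
  67. PGM (probabilistic graphical models)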
    An Introduction to Factor Graphs
    Papers related to Snorkel, PGM and sampling, and SGD
  68. Factor Graphs and the Sum-Product Algorithm
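  69. Structure learning
    Hung-yi Lee, 4 episodes: Structured Learning 1: Introduction
  70. Semi-supervised learning: ML Lecture 12: Semi-supervised
  71. Using a generative model to synthesize labels from noisy label sources:
  72. Boltzmann machine, RBM
  73. Named entity recognition (NER): survey of NER papers
  74. Highway Networks: Highway Networks and HBilstm Network
  75. entity span detection: find all text spans that refer to the same entity, since people often refer to one entity in several ways, such as pronouns, elisions, and aliases. reference: a coreference resolution model based on span prediction
  76. How should an LSTM followed by a CRF be understood?
    Strengths and weaknesses of CRF and LSTM models for sequence labeling?
  77. Viterbi algorithm: HMM + Viterbi algorithm + shortest-path analysis
  78. Entity disambiguation on knowledge graphs: Some thoughts from a survey of the NER task
  79. Entity dictionary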
  80. What are Chunks?
    Chunks are made up of words, and the kinds of words are defined using the part-of-speech tags. One can even define a pattern of words that can't be part of a chunk; such words are known as chinks.
    What are IOB tags?
    It is a format for chunks. These tags are similar to part-of-speech tags but can denote the inside, outside, and beginning of a chunk. Not just noun phrases but multiple different chunk phrase types are allowed here.
    7. Extracting Information from Text—nltk
  81. Karush-Kuhn-Tucker (KKT) conditions
  82. An Introduction to Statistical Learning with Applications in R
  83. collective classification: jointly determine the correct label assignments of all the objects in the network.
  84. ontology alignment
  85. personalized medicine
  86. opinion diffusion
  87. trust in social networks
  88. graph summarization
  89. t-norm: a t-norm (triangular norm) is a binary algebraic operation on the interval [0, 1], used in fuzzy logic
  90. MPE inference: Bayesian networks and the Most Probable Explanation (MPE) problem
  91. consensus optimization
  92. knowledge distillation: Zhihu 1, Zhihu 2, paper reading list
  93. posterior regularization methods
  94. K-dimensional probability simplex
  95. max-over-time pooling layer, CNNs in NLP: reference 1 (cnblogs), Pooling vs Pooling-over-time
  96. Bidirectional LSTM-CNN (BLSTM-CNN) Training System
  97. projected gradient descent (PGD)
    Professor Bingsheng He: contraction and descent algorithms for convex optimization based on gradient projection
  98. Lipschitz condition
    Lipschitz condition - Berkeley Math
    The cornerstone of non-convex optimization: Lipschitz Condition - Zhihu
    Existence and Uniqueness 1 Lipschitz Conditions
  99. CAM and Grad-CAM
    Heat maps? Class Activation Mapping
  100. self-training
  101. ULMFiT: general training techniques for text classification
  102. Transformer
    Attention head: Self-Attention and Transformer
    multi-head attention
  103. warmup: Why the warmup strategy works in neural networks
  104. Zero Shot: Introduction to Zero-Shot Learning
  105. NLP subword algorithms: BPE, WordPiece, ULM
  106. PyTorch training speedups
  107. Categorical Cross-Entropy Loss: Understanding Categorical Cross-Entropy Loss, Binary Cross-Entropy Loss, Softmax Loss, Logistic Loss, Focal Loss and all those confusing names
  108. Student's t-distribution
  109. Probability simplex
  110. AdamW optimizer: AdamW and Super-convergence is now the fastest way to train neural nets (Chinese version)
  111. metric learning: An introduction to Metric Learning
    Deep Metric Learning
    What is Metric Learning?—scikit-learn
    General Pipeline:
    In general, DML has three parts: a feature extraction network that maps inputs to embeddings, a sampling strategy that groups the samples in a mini-batch into many subsets, and a loss function that is computed on each subset.
  112. Maximum Inner Product Search
  113. Locality Sensitive Hashing: Unlike space partitioning techniques, both the running time as well as the accuracy guarantee of LSH based NNS are in a way independent of the dimensionality of the data
  114. Tornado: Python web framework and asynchronous networking library; Tornado Web Server, Tornado 4.3 documentation
  115. How to understand the Pearson Correlation Coefficient?
  116. Summary of BatchNormalization, LayerNormalization, InstanceNorm, GroupNorm, and SwitchableNorm
  117. Lottery ticket hypothesis
  118. Deep SSL series 4: Mean Teacher
    Another short summary of semi-supervised deep learning: Consistency Regularization
  119. Semi-supervised VAT (virtual adversarial training) paper walkthrough
  120. What kind of project is Wikidata?
    Welcome to Wikidata (official site)
    Wikidata:SPARQL query service/queries/examples
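
Items 6 and 13 in the list above note that predictive uncertainty in classification can be approximated with Monte Carlo integration. The following is a minimal NumPy sketch of that idea, not code from any of the linked articles. The toy linear classifier, the dropout rate `drop_p`, and the names `mc_predict`, `x`, and `w` are assumptions made for illustration: the expectation of the softmax output is replaced by the mean over a finite number of stochastic forward passes.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_predict(x, w, n_samples=200, drop_p=0.5, seed=0):
    """Monte Carlo estimate of the predictive distribution.

    Each pass applies a fresh random dropout mask to the inputs,
    and the softmax outputs are averaged over all passes.
    """
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) >= drop_p   # keep each feature with prob 1 - drop_p
        h = x * mask / (1.0 - drop_p)          # inverted-dropout scaling
        probs.append(softmax(h @ w))
    mean_probs = np.mean(probs, axis=0)        # sample mean replaces the expectation
    # Predictive entropy of the averaged distribution, one value per input.
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=-1)
    return mean_probs, entropy

# Hypothetical toy data: 4 inputs with 8 features, 3 classes.
rng = np.random.default_rng(1)
x = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 3))
mean_probs, entropy = mc_predict(x, w)
print(mean_probs.round(3))
print(entropy.round(3))
```

A higher entropy marks inputs whose averaged prediction is spread across classes. In a real network the dropout masks would sit inside the model, not on the raw inputs.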


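
Item 26 above describes a reliability diagram as accuracy plotted against confidence. A small sketch of the binning behind such a plot is given here, under stated assumptions: the function name `reliability_bins`, the equal-width bins, and the toy inputs are all hypothetical. The expected calibration error (ECE) returned alongside the bins is an addition to the note, one common way to reduce the diagram to a single number.

```python
import numpy as np

def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns (rows, ece), where each row is
    (bin_low, bin_high, accuracy, mean_confidence, count)
    and ece is the count-weighted mean of |accuracy - confidence|.
    Bins are half-open (low, high], so a confidence of exactly 0 is ignored.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows, ece = [], 0.0
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > low) & (confidences <= high)
        if not in_bin.any():
            continue  # skip empty bins
        acc = correct[in_bin].mean()
        conf = confidences[in_bin].mean()
        ece += in_bin.mean() * abs(acc - conf)
        rows.append((low, high, acc, conf, int(in_bin.sum())))
    return rows, ece

# Hypothetical predictions: max softmax score and whether the label was right.
conf_scores = [0.95, 0.85, 0.62, 0.58]
was_correct = [1, 1, 1, 0]
rows, ece = reliability_bins(conf_scores, was_correct)
for low, high, acc, conf, count in rows:
    print(f"({low:.1f}, {high:.1f}]  acc={acc:.2f}  conf={conf:.2f}  n={count}")
print("ECE:", round(ece, 4))
```

Plotting `acc` against `conf` for each row gives the reliability diagram; a perfectly calibrated model lies on the diagonal.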
  1. Joint event extraction via recurrent neural networks: paper walkthrough
  2. [Paper notes] Graph Convolutional Networks with Argument-Aware Pooling for Event Detection; notes 2
  3. Jointly Extracting Event Triggers and Arguments by Dependency-Bridge RNN and Tensor-Based Argument Interaction: notes 1, notes 2
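
The event extraction notes above state the labeling assumption that sentences mentioning all key arguments of a knowledge-base event denote that event, and that triggers are usually verbs. A rough sketch of that heuristic follows. It is not the method of any paper listed here: the toy sentences, the hand-written POS tags, the substring matching, and the function names are all assumptions for illustration.

```python
from collections import Counter

def sentences_with_all_arguments(sentences, key_arguments):
    """Keep sentences whose text mentions every key argument (case-insensitive substring match)."""
    wanted = [arg.lower() for arg in key_arguments]
    return [s for s in sentences if all(arg in s["text"].lower() for arg in wanted)]

def candidate_triggers(sentences, key_arguments):
    """Count the verbs in the matching sentences as candidate trigger words."""
    counts = Counter()
    for s in sentences_with_all_arguments(sentences, key_arguments):
        # Penn Treebank verb tags all start with "VB".
        counts.update(tok.lower() for tok, pos in s["tagged"] if pos.startswith("VB"))
    return counts

# Hypothetical, hand-tagged toy sentences.
sentences = [
    {"text": "Alice married Bob in Paris.",
     "tagged": [("Alice", "NNP"), ("married", "VBD"), ("Bob", "NNP"),
                ("in", "IN"), ("Paris", "NNP")]},
    {"text": "Bob visited Paris last year.",
     "tagged": [("Bob", "NNP"), ("visited", "VBD"), ("Paris", "NNP"),
                ("last", "JJ"), ("year", "NN")]},
]

# Key arguments of a hypothetical knowledge-base marriage event.
triggers = candidate_triggers(sentences, ["Alice", "Bob"])
print(triggers)
```

Only the first sentence mentions both key arguments, so only its verb is counted. A real pipeline would need entity linking in place of substring matching and an automatic POS tagger.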