2018/07/28

Note: Deep contextualized word representations

Peters et al. - 2018 - Deep contextualized word representations

Introduction

They introduce a new type of deep contextualised word representation that models complex characteristics of word use (e.g., syntax and semantics) and how these uses vary across linguistic contexts (i.e., polysemy). Their word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus.

Bidirectional language models

Given a sequence of $N$ tokens $(t_1, t_2, \ldots, t_N)$, a forward language model computes the probability of the sequence by modelling the probability of each token $t_k$ given its history $(t_1, \ldots, t_{k-1})$:

$$p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_1, t_2, \ldots, t_{k-1})$$

A backward LM is similar to a forward LM, except that it runs over the sequence in reverse, predicting the previous token given the future context:

$$p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N)$$

The formulation jointly maximises the log likelihood of the forward and backward directions:

$$\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}; \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N; \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s) \Big)$$

The parameters of both the token representation ($\Theta_{x}$) and the softmax layer ($\Theta_{s}$) are shared between the forward and backward directions, while separate parameters are maintained for the LSTMs in each direction.

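To make the setup concrete, below is a minimal sketch (in PyTorch, not the authors' code; all class and variable names are hypothetical) of a biLM with a shared token embedding and a shared softmax layer but separate forward and backward LSTMs. Minimising the summed cross-entropy of the two directions corresponds to jointly maximising the two log likelihoods above.

```python
import torch
import torch.nn as nn

class BiLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)                  # shared token representation (Theta_x)
        self.fwd_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # forward-direction LSTM
        self.bwd_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # backward-direction LSTM
        self.softmax = nn.Linear(hidden_dim, vocab_size)                  # shared softmax layer (Theta_s)

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer ids
        x = self.embed(tokens)
        # Forward direction: hidden state at position k summarises t_1..t_k
        fwd_out, _ = self.fwd_lstm(x)
        # Backward direction: reverse the sequence, run the LSTM, reverse back
        bwd_out, _ = self.bwd_lstm(torch.flip(x, dims=[1]))
        bwd_out = torch.flip(bwd_out, dims=[1])
        return self.softmax(fwd_out), self.softmax(bwd_out)

def joint_log_likelihood_loss(model, tokens):
    """Summed cross-entropy of the forward and backward LMs."""
    fwd_logits, bwd_logits = model(tokens)
    loss_fn = nn.CrossEntropyLoss()
    # Forward LM: the state at position k predicts token k+1
    fwd_loss = loss_fn(fwd_logits[:, :-1].reshape(-1, fwd_logits.size(-1)),
                       tokens[:, 1:].reshape(-1))
    # Backward LM: the state at position k predicts token k-1
    bwd_loss = loss_fn(bwd_logits[:, 1:].reshape(-1, bwd_logits.size(-1)),
                       tokens[:, :-1].reshape(-1))
    return fwd_loss + bwd_loss

if __name__ == "__main__":
    model = BiLM(vocab_size=1000)
    batch = torch.randint(0, 1000, (4, 20))   # 4 sequences of 20 token ids
    loss = joint_log_likelihood_loss(model, batch)
    loss.backward()
```

Note that the paper's biLM additionally uses a character-level CNN token representation and multi-layer LSTMs with residual connections; the sketch only illustrates the direction-specific LSTMs and the shared input/output parameters.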