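In natural language processing, sentiment analysis generally means determining the emotional state expressed by a piece of text, where the text can be a sentence, a paragraph, or a document. The emotional states can form two classes, such as (positive, negative) or (happy, sad), or three classes, such as (positive, negative, neutral), and so on. Sentiment analysis has a very wide range of applications. For example, it can sort the reviews users post on shopping sites (Amazon, Tmall, Taobao, etc.), travel sites, and movie-review sites into positive and negative reviews, or it can be run over crawled user reviews of a product to gauge users' overall experience with it. Table 1 shows examples of sentiment analysis on movie reviews: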

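Movie review	Category
Of all of Feng Xiaogang's films in recent years, this one counts as the best	Positive
Really not good; it feels like a TV drama from a local station	Negative
The round and square frames show off technique the whole way through, and the colors and backdrops are beautiful enough, but the plot drags, the accents are an awkward mishmash, and despite trying throughout I could never get into it	Negative
Four stars for the plot. But the circular framing plus the scenery of Wuyuan gives the whole film the feel of a Chinese freehand landscape painting; it was a real pleasure to watch.	Positive

Table 1. Sentiment analysis of movie reviews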

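In NLP, sentiment analysis is a typical text classification problem: the text to be analyzed is assigned to the category it belongs to. Text classification involves two problems, text representation and the classification method. Before deep learning methods appeared, the mainstream text representations were the bag-of-words model (BOW), topic models, and the like, and the classification methods included SVM (support vector machine), LR (logistic regression), and others.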

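The BOW representation of a text ignores word order, grammar, and syntax and treats the text merely as a collection of words, so it cannot fully capture the text's meaning. For example, the sentences "this movie is terrible" and "a dull, hollow work with no depth" are semantically very similar for sentiment analysis, yet the similarity of their BOW representations is 0. Conversely, "a hollow work with no depth" and "a work that is not hollow and has depth" have high BOW similarity, although they mean very different things.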
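To make the point concrete, here is a minimal sketch that computes bag-of-words cosine similarity for the English versions of the four example sentences. The lowercase whitespace tokenizer and the choice of cosine similarity are assumptions made for illustration.

```python
import math
from collections import Counter

def bow(sentence):
    """Bag of words: lowercase, drop commas, count tokens. Word order is discarded."""
    return Counter(sentence.lower().replace(",", " ").split())

def cosine(a, b):
    # Counter returns 0 for missing words, so non-shared words add nothing
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

s1 = "this movie is terrible"
s2 = "a dull, hollow work with no depth"
s3 = "a hollow work with no depth"
s4 = "a work that is not hollow and has depth"

print(round(cosine(bow(s1), bow(s2)), 3))  # 0.0   (same sentiment, no shared words)
print(round(cosine(bow(s3), bow(s4)), 3))  # 0.544 (opposite meaning, 4 shared words)
```

The pair with the same sentiment scores 0 because it shares no tokens, while the pair with opposite meanings scores well above 0 because "a", "hollow", "work", and "depth" overlap.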

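The deep learning models introduced in this chapter overcome these shortcomings of BOW. Taking word order into account, they map text into a low-dimensional semantic space and perform text representation and classification end to end, with significantly better performance than traditional methods [1].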

Model Overview

The text representation models used in this chapter are convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their extensions. We introduce them in turn below.

A Brief Introduction to Text Convolutional Neural Networks (CNN)

We described how a convolutional neural network computes over text data in the Recommender System chapter; here is a brief recap.

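A text CNN first applies convolution to the input sequence of word vectors, producing a feature map. Max pooling over time on the feature map then gives the sentence-level feature for that convolution kernel. Finally, the features from all kernels are concatenated into a fixed-length vector representation of the text; for text classification, connecting this vector to a softmax layer completes the model. In practice a sentence is processed by many kernels, and kernels with the same window size are stacked into a matrix so the computation runs more efficiently. Kernels with different window sizes can also be used: Figure 3 in the Recommender System chapter illustrates four kernels, with different colors denoting kernels of different sizes.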
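The steps above can be sketched in plain Python: one kernel slides over the word vectors, the resulting feature map is max-pooled over time, and the per-kernel features are concatenated. The weights are hand-picked toy values, and the tanh activation and the absence of padding are assumptions, not necessarily what Paddle's sequence_conv_pool does.

```python
import math

def conv_max_pool(embs, kernel, bias=0.0):
    """Slide one kernel (a window of k weight vectors) over the sentence, then max-pool over time."""
    k = len(kernel)
    if len(embs) < k:
        raise ValueError("sentence is shorter than the convolution window")
    feature_map = []
    for t in range(len(embs) - k + 1):            # valid positions only, no padding
        s = bias
        for j in range(k):
            s += sum(w * x for w, x in zip(kernel[j], embs[t + j]))
        feature_map.append(math.tanh(s))           # tanh is an assumed activation
    return max(feature_map)                        # max pooling over time

def text_cnn_vector(embs, kernels):
    # one scalar per kernel; together they form a fixed-length text vector
    return [conv_max_pool(embs, kern) for kern in kernels]

# Toy example: 2-dim word vectors, one window-2 kernel and one window-3 kernel.
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
]
short = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
long_ = short + [[0.7, 0.8], [0.9, 1.0], [1.1, 1.2]]
print(len(text_cnn_vector(short, kernels)), len(text_cnn_vector(long_, kernels)))  # 2 2
```

Because each kernel contributes exactly one number after max pooling, sentences of different lengths map to vectors of the same length, which is what allows a fixed-size softmax layer on top.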

For typical short-text classification problems, the simple text CNN described above already achieves high accuracy [1]. To obtain more abstract, higher-level text features, one can build deep text CNNs [2,3].

Recurrent Neural Networks (RNN)

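An RNN is a powerful tool for accurately modeling sequential data. In fact, RNNs are Turing complete in theory [4]. Natural language is typical sequential data (a sequence of words), and in recent years RNNs and their variants (such as long short-term memory [5]) have performed very well, often achieving the best results to date, on many NLP tasks, including language modeling, syntactic parsing, semantic role labeling (and sequence labeling in general), semantic representation, image captioning, dialogue, and machine translation.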

Figure 1. A recurrent neural network unrolled in time (image: http://wiki.jikexueyuan.com/project/deep-learning/images/06-01.png)

Figure 1 shows an RNN unrolled in time. At time step $t$, the network reads the $t$-th input $x_t$ (a vector) and the previous hidden state $h_{t-1}$ (a vector; $h_0$ is usually initialized to the zero vector) and computes the current hidden state $h_t$. This step repeats until all inputs have been read. Denoting the function computed by the RNN as $f$:

$$h_t=f(x_t,h_{t-1})=\sigma(W_{xh}x_t+W_{hh}h_{t-1}+b_h)$$

where $W_{xh}$ is the input-to-hidden weight matrix, $W_{hh}$ is the hidden-to-hidden weight matrix, $b_h$ is the hidden-layer bias vector, and $\sigma$ is the sigmoid function.
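The recurrence can be written out directly. Below is a minimal pure-Python sketch of the forward pass; the weights and inputs are toy values chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Unroll h_t = sigmoid(W_xh x_t + W_hh h_{t-1} + b_h) over the input sequence."""
    h = [0.0] * len(b_h)                      # h_0 is the zero vector
    states = []
    for x_t in xs:                            # one time step per input
        pre = [a + r + b for a, r, b in zip(matvec(W_xh, x_t), matvec(W_hh, h), b_h)]
        h = [sigmoid(z) for z in pre]
        states.append(h)
    return states                             # [h_1, ..., h_T]

# Toy sizes: 2-dim inputs, 2-dim hidden state.
W_xh = [[1.0, 0.0], [0.0, 1.0]]
W_hh = [[0.5, 0.0], [0.0, 0.5]]
b_h = [0.0, 0.0]
states = rnn_forward([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], W_xh, W_hh, b_h)
print(len(states), states[-1])  # h_T could serve as a sentence representation
```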

When processing natural language, each word (in one-hot representation) is usually first mapped to its word embedding, which is then fed to the RNN as the input $x_t$ at each time step. Other layers can also be connected to the RNN's hidden layer as needed. For example, the hidden outputs of one RNN can serve as the inputs of another RNN to build a deep (stacked) RNN, or the hidden state at the last time step can be taken as the sentence representation and passed to a classifier.
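Mapping a one-hot word to its embedding is the same as multiplying the one-hot vector by an embedding matrix, which in practice is implemented as a simple row lookup. A toy sketch (the matrix values and word ids are made up):

```python
def one_hot(index, vocab_size):
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]

def embed_by_matmul(index, E):
    """One-hot row vector times the embedding matrix E (vocab_size x emb_dim)."""
    v = one_hot(index, len(E))
    return [sum(v[i] * E[i][j] for i in range(len(E))) for j in range(len(E[0]))]

def embed_by_lookup(index, E):
    # same result without building the one-hot vector: just take row `index`
    return list(E[index])

E = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]        # toy vocabulary of 4 words, 3-dim embeddings
word_ids = [2, 0, 3]         # a hypothetical 3-word sentence
xs = [embed_by_lookup(i, E) for i in word_ids]   # the RNN inputs x_1, x_2, x_3
print(xs[0])  # [0.7, 0.8, 0.9]
```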

Long Short-Term Memory (LSTM)

For long sequences, RNN training is prone to vanishing or exploding gradients [6]. To address this problem, Hochreiter and Schmidhuber (1997) proposed LSTM (long short-term memory) [5].

Compared with a simple RNN, LSTM adds a memory cell $c$, an input gate $i$, a forget gate $f$, and an output gate $o$. Together, these gates and the memory cell greatly improve the network's ability to handle long sequences. Denoting the function computed by an LSTM-based RNN as $F$:

$$ h_t=F(x_t,h_{t-1})$$

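$F$ is composed of the following equations [7]:

\begin{align}
i_t & = \sigma(W_{xi}x_t+W_{hi}h_{t-1}+W_{ci}c_{t-1}+b_i)\\
f_t & = \sigma(W_{xf}x_t+W_{hf}h_{t-1}+W_{cf}c_{t-1}+b_f)\\
c_t & = f_t\odot c_{t-1}+i_t\odot \tanh(W_{xc}x_t+W_{hc}h_{t-1}+b_c)\\
o_t & = \sigma(W_{xo}x_t+W_{ho}h_{t-1}+W_{co}c_{t}+b_o)\\
h_t & = o_t\odot \tanh(c_t)
\end{align}

Here $i_t$, $f_t$, $c_t$, and $o_t$ are the vectors of the input gate, forget gate, memory cell, and output gate, respectively; the subscripted $W$ and $b$ are model parameters; $\tanh$ is the hyperbolic tangent; and $\odot$ denotes element-wise multiplication. The input gate controls how strongly new input enters the memory cell $c$, the forget gate controls how much of its previous value the memory cell keeps, and the output gate controls how strongly the memory cell is passed to the output. The three gates are computed in the same way but have entirely separate parameters, and each controls the memory cell $c$ differently, as shown in Figure 2: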

Figure 2. An LSTM at time step $t$ [7] (image: http://wiki.jikexueyuan.com/project/deep-learning/images/06-02.png)
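The LSTM equations translate almost line for line into code. This sketch assumes, as in Graves [7], that the peephole weights $W_{ci}$, $W_{cf}$, $W_{co}$ are diagonal, so they are stored as vectors (w_ci, w_cf, w_co); the other parameter names mirror the equations. The example sets the gate biases by hand to show the forget gate preserving the memory cell.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vecs):
    # element-wise sum of several equal-length vectors
    return [sum(xs) for xs in zip(*vecs)]

def mul(a, b):
    # element-wise (Hadamard) product, the odot in the equations
    return [x * y for x, y in zip(a, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; returns (h_t, c_t)."""
    i = [sigmoid(z) for z in add(matvec(p["W_xi"], x), matvec(p["W_hi"], h_prev),
                                 mul(p["w_ci"], c_prev), p["b_i"])]
    f = [sigmoid(z) for z in add(matvec(p["W_xf"], x), matvec(p["W_hf"], h_prev),
                                 mul(p["w_cf"], c_prev), p["b_f"])]
    g = [math.tanh(z) for z in add(matvec(p["W_xc"], x), matvec(p["W_hc"], h_prev),
                                   p["b_c"])]
    c = add(mul(f, c_prev), mul(i, g))        # c_t = f_t*c_{t-1} + i_t*tanh(...)
    o = [sigmoid(z) for z in add(matvec(p["W_xo"], x), matvec(p["W_ho"], h_prev),
                                 mul(p["w_co"], c), p["b_o"])]  # uses c_t, not c_{t-1}
    h = mul(o, [math.tanh(z) for z in c])     # h_t = o_t * tanh(c_t)
    return h, c

def zero_params(n_in, n_hid):
    """All-zero parameters; keys follow the equation names ('c' = cell input)."""
    p = {}
    for gate in "ifco":
        p["W_x" + gate] = [[0.0] * n_in for _ in range(n_hid)]
        p["W_h" + gate] = [[0.0] * n_hid for _ in range(n_hid)]
        p["b_" + gate] = [0.0] * n_hid
    for gate in "ifo":
        p["w_c" + gate] = [0.0] * n_hid        # diagonal peepholes W_ci, W_cf, W_co
    return p

# Forget gate saturated open, input gate saturated shut: a strong new input
# (tanh(5) is close to 1) is blocked and the memory cell keeps its old value.
p = zero_params(n_in=1, n_hid=1)
p["W_xc"] = [[1.0]]
p["b_f"], p["b_i"] = [20.0], [-20.0]
h, c = lstm_step([5.0], [0.0], [0.8], p)
print(round(c[0], 4))  # 0.8
```

Swapping the two biases (input gate open, forget gate shut) makes the cell forget 0.8 and take on roughly tanh(5) instead, which is the mechanism that lets an LSTM decide what to carry across long distances.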

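By adding a memory cell and control gates to the simple RNN, LSTM strengthens its ability to handle long-range dependencies. The Gated Recurrent Unit (GRU) [8] is an improvement based on a similar principle, with a somewhat simpler design. Although these variants differ, at a high level they are described in the same way as the simple RNN (Figure 1): the hidden state is updated from the current input and the previous hidden state, and this repeats until all input has been processed: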

$$ h_t=Recurrent(x_t,h_{t-1})$$

where $Recurrent$ can be a simple RNN, a GRU, or an LSTM.

Stacked Bidirectional LSTM

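In an RNN that reads the input in normal order, $h_t$ contains information about the inputs before time $t$, that is, the preceding context. Likewise, to capture the following context, we can use an RNN that runs in the opposite direction (processing the input in reverse order). Combining this with the construction of deep RNNs (deep networks tend to learn more abstract, higher-level features), we can model sequential data with an even more powerful LSTM-based stacked bidirectional RNN [9].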

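As shown in Figure 3 (with three layers as an example), odd-numbered LSTM layers run forward and even-numbered layers run backward, and each higher LSTM layer takes as input the information from the LSTM layer below it and all earlier layers. Max pooling over time on the sequence output by the top LSTM layer gives a fixed-length vector representation of the text; this representation fully combines the preceding and following context and abstracts the text at a deep level. Finally, the text representation is connected to a softmax layer to build the classifier.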
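A toy sketch of the stacking and direction logic: a backward layer is a forward pass over the reversed sequence whose outputs are reversed back into time order, and the top layer's outputs are max-pooled over time. Here toy_step is a hypothetical stand-in for an LSTM step (a decaying running sum), and the element-wise sum stands in for the layer that mixes a layer's input and output.

```python
def run_layer(step, xs, h0, reverse=False):
    """Run a recurrent step over xs; outputs are returned aligned with the original time order."""
    seq = xs[::-1] if reverse else xs
    h, outs = h0, []
    for x in seq:
        h = step(x, h)
        outs.append(h)
    return outs[::-1] if reverse else outs

def max_pool_over_time(hs):
    # element-wise max across time steps -> one fixed-length vector
    return [max(col) for col in zip(*hs)]

def toy_step(x, h):
    # hypothetical stand-in for an LSTM step: decaying running sum
    return [0.5 * hi + xi for hi, xi in zip(h, x)]

def stacked_bidirectional(xs, num_layers, step=toy_step):
    assert num_layers % 2 == 1, "an odd number of layers keeps the top layer forward"
    inputs = xs
    for layer in range(1, num_layers + 1):
        outs = run_layer(step, inputs, [0.0] * len(inputs[0]), reverse=(layer % 2 == 0))
        # toy stand-in for a layer mixing the lower input and output: element-wise sum
        inputs = [[a + b for a, b in zip(x, o)] for x, o in zip(inputs, outs)]
    return max_pool_over_time(inputs)

xs = [[0.0], [0.0], [1.0]]             # only the last word carries signal
print(run_layer(toy_step, xs, [0.0]))                # [[0.0], [0.0], [1.0]]
print(run_layer(toy_step, xs, [0.0], reverse=True))  # [[0.25], [0.5], [1.0]]
print(stacked_bidirectional(xs, num_layers=3))
```

In the forward pass the first position knows nothing about the last word, whereas in the backward pass it already carries a trace of it (0.25); stacking both directions is what gives every position access to its full context.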

Figure 3. Stacked bidirectional LSTM for text classification (image: http://wiki.jikexueyuan.com/project/deep-learning/images/06-03.png)

Example Program

Dataset

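We use the IMDB sentiment analysis dataset as our example. The IMDB training and test sets each contain 25,000 labeled movie reviews. Negative reviews have a score of 4 or less and positive reviews a score of 7 or more, out of a maximum of 10.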

aclImdb
|- test
   |-- neg
   |-- pos
|- train
   |-- neg
   |-- pos
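The labeling rule can be written as a small function. Recovering the rating from the file name (`<id>_<rating>.txt`) is an assumption about the aclImdb archive layout and is not something the rest of this tutorial relies on.

```python
import os

def label_from_rating(rating):
    """Rule from the text: <= 4 is negative, >= 7 is positive (scores out of 10)."""
    if rating <= 4:
        return "neg"
    if rating >= 7:
        return "pos"
    return None  # ratings 5-6 are not included in the labeled train/test sets

def rating_from_filename(path):
    # Assumption: review files are named "<id>_<rating>.txt", e.g. "0_3.txt".
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem.split("_")[1])

path = "aclImdb/train/neg/0_3.txt"   # hypothetical file name
print(label_from_rating(rating_from_filename(path)))  # neg
```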

Paddle implements automatic downloading and reading of the IMDB dataset in dataset/imdb.py and provides APIs for reading the dictionary, the training data, the test data, and so on.
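Conceptually, the dictionary API maps words to integer ids and each reader yields (word ids, label) pairs. The sketch below is a hypothetical stand-in written with the standard library, not Paddle's dataset/imdb.py; the tokenizer, the frequency ordering, and giving '<unk>' the last id are assumptions. It uses label 0 for positive and 1 for negative, matching the label mapping in the inference code later in this chapter.

```python
import string
from collections import Counter

def tokenize(text):
    # assumed simple tokenizer: lowercase, drop ASCII punctuation, split on whitespace
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()

def build_word_dict(docs, cutoff=1):
    """Map words to ids by descending frequency (ties alphabetical); '<unk>' gets the last id."""
    freq = Counter(w for d in docs for w in tokenize(d))
    words = sorted((w for w, c in freq.items() if c >= cutoff), key=lambda w: (-freq[w], w))
    word_dict = {w: i for i, w in enumerate(words)}
    word_dict["<unk>"] = len(word_dict)
    return word_dict

def make_reader(samples, word_dict):
    """samples: (text, label) pairs. Returns a reader: a no-argument function returning a generator."""
    unk = word_dict["<unk>"]
    def reader():
        for text, label in samples:
            yield [word_dict.get(w, unk) for w in tokenize(text)], label
    return reader

word_dict = build_word_dict(["Great movie!", "Bad movie."])
reader = make_reader([("A great movie", 0), ("An awful movie", 1)], word_dict)
for ids, label in reader():
    print(ids, label)   # [3, 2, 0] 0, then [3, 3, 0] 1
```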

import sys
import paddle.v2 as paddle

Configuring the Model

In this example we implement two text classification algorithms: one based on the text convolutional neural network introduced in the Recommender System chapter, and one based on the stacked bidirectional LSTM described above.

Text Convolutional Neural Network

def convolution_net(input_dim,
                    class_dim=2,
                    emb_dim=128,
                    hid_dim=128,
                    is_predict=False):
    data = paddle.layer.data("word",
                             paddle.data_type.integer_value_sequence(input_dim))
    emb = paddle.layer.embedding(input=data, size=emb_dim)
    conv_3 = paddle.networks.sequence_conv_pool(
        input=emb, context_len=3, hidden_size=hid_dim)
    conv_4 = paddle.networks.sequence_conv_pool(
        input=emb, context_len=4, hidden_size=hid_dim)
    output = paddle.layer.fc(input=[conv_3, conv_4],
                             size=class_dim,
                             act=paddle.activation.Softmax())
    if not is_predict:
        lbl = paddle.layer.data("label", paddle.data_type.integer_value(class_dim))
        cost = paddle.layer.classification_cost(input=output, label=lbl)
        return cost
    else:
        return output

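The network input input_dim is the size of the word dictionary, and class_dim is the number of classes. We use the sequence_conv_pool API to implement convolution followed by pooling, once with a context window of 3 words (conv_3) and once with a window of 4 words (conv_4); both outputs feed the softmax fc layer.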

Stacked Bidirectional LSTM

def stacked_lstm_net(input_dim,
                     class_dim=2,
                     emb_dim=128,
                     hid_dim=512,
                     stacked_num=3,
                     is_predict=False):
    """
    A Wrapper for sentiment classification task.
    This network uses a bi-directional recurrent network
    consisting of stacked_num (3 by default) LSTM layers. This configuration
    follows the paper at the following URL, but uses fewer layers.
        http://www.aclweb.org/anthology/P15-1109

    input_dim: here is word dictionary dimension.
    class_dim: number of categories.
    emb_dim: dimension of word embedding.
    hid_dim: dimension of hidden layer.
    stacked_num: number of stacked LSTM layers; must be odd.
    """
    assert stacked_num % 2 == 1

    layer_attr = paddle.attr.Extra(drop_rate=0.5)
    fc_para_attr = paddle.attr.Param(learning_rate=1e-3)
    lstm_para_attr = paddle.attr.Param(initial_std=0., learning_rate=1.)
    para_attr = [fc_para_attr, lstm_para_attr]
    bias_attr = paddle.attr.Param(initial_std=0., l2_rate=0.)
    relu = paddle.activation.Relu()
    linear = paddle.activation.Linear()

    data = paddle.layer.data("word",
                             paddle.data_type.integer_value_sequence(input_dim))
    emb = paddle.layer.embedding(input=data, size=emb_dim)

    fc1 = paddle.layer.fc(input=emb,
                          size=hid_dim,
                          act=linear,
                          bias_attr=bias_attr)
    lstm1 = paddle.layer.lstmemory(
        input=fc1, act=relu, bias_attr=bias_attr, layer_attr=layer_attr)

    inputs = [fc1, lstm1]
    for i in range(2, stacked_num + 1):
        fc = paddle.layer.fc(input=inputs,
                             size=hid_dim,
                             act=linear,
                             param_attr=para_attr,
                             bias_attr=bias_attr)
        lstm = paddle.layer.lstmemory(
            input=fc,
            reverse=(i % 2) == 0,
            act=relu,
            bias_attr=bias_attr,
            layer_attr=layer_attr)
        inputs = [fc, lstm]

    fc_last = paddle.layer.pooling(input=inputs[0], pooling_type=paddle.pooling.Max())
    lstm_last = paddle.layer.pooling(input=inputs[1], pooling_type=paddle.pooling.Max())
    output = paddle.layer.fc(input=[fc_last, lstm_last],
                             size=class_dim,
                             act=paddle.activation.Softmax(),
                             bias_attr=bias_attr,
                             param_attr=para_attr)

    if not is_predict:
        lbl = paddle.layer.data("label", paddle.data_type.integer_value(class_dim))
        cost = paddle.layer.classification_cost(input=output, label=lbl)
        return cost
    else:
        return output

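The input stacked_num is the number of LSTM layers. It must be odd so that the top LSTM layer runs forward. In Paddle, an LSTM-based recurrent layer is built from an fc layer followed by an lstmemory layer: the fc layer computes the input-to-hidden terms of the LSTM equations, and lstmemory performs the recurrent computation. Because lstmemory expects these projections for the three gates and the cell input stacked together, its hidden size is one quarter of its input size, so with hid_dim=512 each LSTM layer has 128 units.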
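The direction of each layer follows from the loop in stacked_lstm_net. This small helper (a hypothetical name, not part of Paddle) reproduces that logic so you can check which layers run backward for a given stacked_num:

```python
def lstm_layer_directions(stacked_num):
    """Mirror of the loop in stacked_lstm_net: lstm1 runs forward, then reverse=(i % 2) == 0."""
    assert stacked_num % 2 == 1, "stacked_num must be odd"
    dirs = ["forward"]                  # lstm1 uses the default reverse=False
    for i in range(2, stacked_num + 1):
        dirs.append("backward" if (i % 2) == 0 else "forward")
    return dirs

print(lstm_layer_directions(3))  # ['forward', 'backward', 'forward']
print(lstm_layer_directions(5))  # ['forward', 'backward', 'forward', 'backward', 'forward']
```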

Training the Model

if __name__ == '__main__':
    # init
    paddle.init(use_gpu=False)

This initializes Paddle. use_gpu=False trains on the CPU; if your system supports a GPU, set it to True to train on the GPU.

Training Data

We read the training data with the APIs in Paddle's dataset.imdb module.

    print 'load dictionary...'
    word_dict = paddle.dataset.imdb.word_dict()
    dict_dim = len(word_dict)
    class_dim = 2

This loads the data dictionary; the word_dict() API builds it directly. class_dim is the number of sample classes; in this example there are only two, positive and negative.

    train_reader = paddle.batch(
        paddle.reader.shuffle(
            lambda: paddle.dataset.imdb.train(word_dict), buf_size=1000),
        batch_size=100)
    test_reader = paddle.batch(
        lambda: paddle.dataset.imdb.test(word_dict),
        batch_size=100)

Here, dataset.imdb.train() and dataset.imdb.test() are the dataset.imdb APIs for the training and test data, respectively. train_reader is used during training: it shuffles the training data as it is read (using a buffer of buf_size=1000 samples) and groups it into batches of 100. Likewise, test_reader is used during testing and groups the test data into batches.
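For intuition, here is a minimal sketch of what buffered shuffling and batching do to a reader (a no-argument function returning an iterable). It mimics the behavior described above and is not Paddle's implementation; the seed parameter is an addition for reproducibility, and this version keeps the final partial batch.

```python
import random

def shuffle(reader, buf_size, seed=None):
    """Buffered shuffle: collect buf_size samples, shuffle them, emit them, repeat."""
    def shuffled():
        rng = random.Random(seed)
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                yield from buf
                buf = []
        if buf:                        # flush whatever is left at the end
            rng.shuffle(buf)
            yield from buf
    return shuffled

def batch(reader, batch_size):
    """Group consecutive samples into lists of batch_size; the last batch may be smaller."""
    def batched():
        b = []
        for sample in reader():
            b.append(sample)
            if len(b) == batch_size:
                yield b
                b = []
        if b:
            yield b
    return batched

batches = list(batch(shuffle(lambda: range(10), buf_size=4, seed=0), batch_size=3)())
print([len(b) for b in batches])  # [3, 3, 3, 1]
```

Shuffling only within a bounded buffer keeps memory use fixed even for large datasets, at the cost of a less thorough shuffle than permuting the whole dataset.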

    feeding={'word': 0, 'label': 1}

feeding specifies how the data returned by train_reader and test_reader maps to the data layers in the model configuration. Here, column 0 of each sample returned by the reader feeds the word layer, and column 1 feeds the label layer.
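Conceptually, feeding turns each positional sample into named inputs. A minimal sketch; feed_sample is a hypothetical helper for illustration, not a Paddle API:

```python
def feed_sample(sample, feeding):
    """Map the columns of one reader sample onto data layer names."""
    return {name: sample[col] for name, col in feeding.items()}

feeding = {'word': 0, 'label': 1}
sample = ([12, 7, 431], 0)               # hypothetical (word ids, label) sample
print(feed_sample(sample, feeding))      # {'word': [12, 7, 431], 'label': 0}
```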

Building the Model

    # Please choose the way to build the network
    # by uncommenting the corresponding line.
    cost = convolution_net(dict_dim, class_dim=class_dim)
    # cost = stacked_lstm_net(dict_dim, class_dim=class_dim, stacked_num=3)

This example uses convolution_net by default. To use stacked_lstm_net instead, comment out the convolution_net line and uncomment the stacked_lstm_net line. cost is the network's optimization objective, and it also carries the topology of the entire network.

Network Parameters

    # create parameters
    parameters = paddle.parameters.create(cost)

This creates the network parameters from the network topology; parameters is the set of all parameters in the network.

Optimization Algorithm

    # create optimizer
    adam_optimizer = paddle.optimizer.Adam(
        learning_rate=2e-3,
        regularization=paddle.optimizer.L2Regularization(rate=8e-4),
        model_average=paddle.optimizer.ModelAverage(average_window=0.5))

Paddle provides APIs for a range of optimization algorithms. Here we use Adam, with L2 regularization (rate 8e-4) and model averaging.
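For reference, a single Adam update with L2 regularization folded into the gradient looks like the sketch below. The beta and epsilon values are the common defaults from the Adam paper, and adding `l2 * w` to the gradient is an assumed convention; Paddle's implementation details, and the model averaging configured above, may differ.

```python
import math

def adam_l2_step(w, g, m, v, t, lr=2e-3, l2=8e-4,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a list of weights; t is the 1-based step number.

    Returns the updated weights and moment estimates (w, m, v).
    """
    new_w, new_m, new_v = [], [], []
    for wi, gi, mi, vi in zip(w, g, m, v):
        gi = gi + l2 * wi                         # gradient of the L2 term 0.5 * l2 * w^2
        mi = beta1 * mi + (1 - beta1) * gi        # running mean of gradients
        vi = beta2 * vi + (1 - beta2) * gi * gi   # running mean of squared gradients
        m_hat = mi / (1 - beta1 ** t)             # bias correction for the zero init
        v_hat = vi / (1 - beta2 ** t)
        new_w.append(wi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v

# On the first step each weight moves by about lr, whatever the gradient's
# scale, because Adam normalizes the step per parameter.
w, m, v = adam_l2_step([1.0, 1.0], [0.5, -100.0], [0.0, 0.0], [0.0, 0.0], t=1, l2=0.0)
print([round(x, 6) for x in w])  # [0.998, 1.002]
```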

Training

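We construct a trainer with paddle.trainer.SGD and call trainer.train to train the model; despite its name, the trainer applies whichever update_equation it is given, here the Adam optimizer. We can also pass an event_handler to the train function to receive the state at the end of each batch and each pass.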

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print "\nPass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            result = trainer.test(reader=test_reader, feeding=feeding)
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

For example, the event_handler above prints the cost and error every 100 batches (and a dot for every other batch), and at the end of each pass calls trainer.test to evaluate the current model on the test set and report its error.

    from paddle.v2.plot import Ploter

    train_title = "Train cost"
    cost_ploter = Ploter(train_title)
    step = 0
    def event_handler_plot(event):
        global step
        if isinstance(event, paddle.event.EndIteration):
            cost_ploter.append(train_title, step, event.cost)
            cost_ploter.plot()
            step += 1

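Alternatively, the event_handler_plot defined above plots the training cost curve; to use it, pass event_handler=event_handler_plot to trainer.train.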

    # create trainer
    trainer = paddle.trainer.SGD(cost=cost,
                                 parameters=parameters,
                                 update_equation=adam_optimizer)

    trainer.train(
        reader=train_reader,
        event_handler=event_handler,
        feeding=feeding,
        num_passes=2)

The output of the program looks like the following (exact numbers vary from run to run):

Pass 0, Batch 0, Cost 0.693721, {'classification_error_evaluator': 0.5546875}
...................................................................................................
Pass 0, Batch 100, Cost 0.294321, {'classification_error_evaluator': 0.1015625}
...............................................................................................
Test with Pass 0, {'classification_error_evaluator': 0.11432000249624252}
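The first logged cost is a useful sanity check. Assuming classification_cost computes multi-class cross-entropy, an untrained two-class model that predicts roughly 50/50 should start with a cost near $\ln 2$, just as its initial error near 0.5 is chance level:

```python
import math

# Cross-entropy for a prediction that puts probability p on the true class is -ln(p).
# An untrained binary classifier outputting about (0.5, 0.5) therefore starts near:
initial_cost = -math.log(0.5)
print(round(initial_cost, 6))  # 0.693147, close to the logged 0.693721
```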

Applying the Model

The trained model can be used to classify movie reviews. The following program shows how to run inference with the paddle.infer interface.

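    import string
    import numpy as np

    # Movie Reviews, from imdb test
    reviews = [
        'Read the book, forget the movie!',
        'This is a great movie.'
    ]
    # Preprocess like the training data (assumed: lowercase, strip punctuation, split)
    # so that words such as 'Read' and 'book,' match dictionary entries instead of <unk>
    reviews = [''.join(ch for ch in c.lower() if ch not in string.punctuation).split()
               for c in reviews]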

    UNK = word_dict['<unk>']
    input = []
    for c in reviews:
        input.append([[word_dict.get(words, UNK) for words in c]])

    # 0 stands for positive sample, 1 stands for negative sample
    label = {0:'pos', 1:'neg'}
    # Use the network used by trainer
    out = convolution_net(dict_dim, class_dim=class_dim, is_predict=True)
    # out = stacked_lstm_net(dict_dim, class_dim=class_dim, stacked_num=3, is_predict=True)
    probs = paddle.infer(output_layer=out, parameters=parameters, input=input)

    labs = np.argsort(-probs)
    for idx, lab in enumerate(labs):
        print idx, "predicting probability is", probs[idx], "label is", label[lab[0]]
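The np.argsort(-probs) step simply picks, for each review, the class with the highest probability. An equivalent sketch without NumPy, using made-up probabilities (ties aside):

```python
def predict_labels(probs, label):
    """For each row of class probabilities, return the label of the most probable class."""
    return [label[max(range(len(row)), key=lambda k: row[k])] for row in probs]

label = {0: 'pos', 1: 'neg'}
probs = [[0.12, 0.88],    # hypothetical softmax outputs for two reviews
         [0.97, 0.03]]
print(predict_labels(probs, label))  # ['neg', 'pos']
```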

Summary

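Using sentiment analysis as an example, this chapter introduced end-to-end short-text classification with deep learning, and all related experiments were carried out with PaddlePaddle. We also briefly introduced two text processing models: convolutional neural networks and recurrent neural networks. Later chapters will show these two basic deep learning models applied to other tasks.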

References

[1] Kim Y. Convolutional neural networks for sentence classification[J]. arXiv preprint arXiv:1408.5882, 2014.
[2] Kalchbrenner N, Grefenstette E, Blunsom P. A convolutional neural network for modelling sentences[J]. arXiv preprint arXiv:1404.2188, 2014.
[3] Dauphin Y N, Fan A, Auli M, et al. Language modeling with gated convolutional networks[J]. arXiv preprint arXiv:1612.08083, 2016.
[4] Siegelmann H T, Sontag E D. On the computational power of neural nets[C]//Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, 1992: 440-449.
[5] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[6] Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult[J]. IEEE Transactions on Neural Networks, 1994, 5(2): 157-166.
[7] Graves A. Generating sequences with recurrent neural networks[J]. arXiv preprint arXiv:1308.0850, 2013.
[8] Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv:1406.1078, 2014.
[9] Zhou J, Xu W. End-to-end learning of semantic role labeling using recurrent neural networks[C]//Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2015.

This tutorial was created by PaddlePaddle and is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

