
Stemming Algorithms

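In natural language processing, you often meet two or more words that share a common root. For example, the three words agreed, agreeing, and agreeable share the root agree. A search involving any of these words should treat them as the same word, namely the root word. Linking every word to its root is therefore essential. The NLTK library provides methods that perform this linking and return the root word as output.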
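To make the idea of linking words to a shared root concrete, here is a minimal sketch that groups words by the stem NLTK's `PorterStemmer` returns for them. The helper name `group_by_stem` and the extra word `head` are illustrative assumptions, not part of the original tutorial.

```python
from collections import defaultdict

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def group_by_stem(words):
    """Map each stem to the list of original words that reduce to it.

    group_by_stem is a hypothetical helper written for this illustration.
    """
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


# The three example words from the text, plus an unrelated word for contrast.
groups = group_by_stem(["agreed", "agreeing", "agreeable", "head"])
for stem, originals in groups.items():
    print("%s <- %s" % (stem, ", ".join(originals)))
```

A search index could store the stem as the lookup key, so a query for one form also finds the other forms in its group. Stemmers are rule-based heuristics, so a stem is not always a dictionary word, and not every related form is guaranteed to land in the same group.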

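NLTK has three commonly used stemming algorithms: Porter, Lancaster, and Snowball. Their results differ slightly. The following example uses all three and shows the result of each.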

import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer

# word_tokenize needs NLTK's tokenizer data. Download it once if it is missing,
# e.g. nltk.download('punkt') (newer NLTK releases use 'punkt_tab').

porter_stemmer = PorterStemmer()
lanca_stemmer = LancasterStemmer()
sb_stemmer = SnowballStemmer("english")

word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"

# First, split the sentence into word tokens
nltk_tokens = nltk.word_tokenize(word_data)

# Next, find the stem of each token with each stemmer
print('***PorterStemmer****\n')
for w_port in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_port, porter_stemmer.stem(w_port)))

print('\n***LancasterStemmer****\n')
for w_lanca in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_lanca, lanca_stemmer.stem(w_lanca)))

print('\n***SnowballStemmer****\n')
for w_snow in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_snow, sb_stemmer.stem(w_snow)))

Running the program above produces the following output:

***PorterStemmer****

Actual: Aging  || Stem: age
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: famou
Actual: crime  || Stem: crime
Actual: family  || Stem: famili
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transfer
Actual: his  || Stem: hi
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: one
Actual: of  || Stem: of
Actual: his  || Stem: hi
Actual: subalterns  || Stem: subaltern

***LancasterStemmer****

Actual: Aging  || Stem: ag
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: fam
Actual: crime  || Stem: crim
Actual: family  || Stem: famy
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transf
Actual: his  || Stem: his
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: on
Actual: of  || Stem: of
Actual: his  || Stem: his
Actual: subalterns  || Stem: subaltern

***SnowballStemmer****

Actual: Aging  || Stem: age
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: famous
Actual: crime  || Stem: crime
Actual: family  || Stem: famili
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transfer
Actual: his  || Stem: his
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: one
Actual: of  || Stem: of
Actual: his  || Stem: his
Actual: subalterns  || Stem: subaltern
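The three listings above are easier to compare side by side. The sketch below, offered as one possible way to do that, runs the same sentence through all three stemmers and prints only the words on which they disagree. The names `stemmers`, `compare_stems`, `table`, and `differing` are illustrative assumptions, and plain whitespace splitting stands in for `nltk.word_tokenize` so that no tokenizer data is required.

```python
from nltk.stem import SnowballStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem.porter import PorterStemmer

# One instance of each stemmer, keyed by a display name.
stemmers = {
    "Porter": PorterStemmer(),
    "Lancaster": LancasterStemmer(),
    "Snowball": SnowballStemmer("english"),
}

sentence = "Aging head of famous crime family decides to transfer his position to one of his subalterns"


def compare_stems(words):
    """Return {word: {stemmer name: stem}} for each distinct word, in order."""
    return {
        word: {name: s.stem(word) for name, s in stemmers.items()}
        for word in dict.fromkeys(words)  # drops repeats, keeps first-seen order
    }


# Assumption: str.split() replaces nltk.word_tokenize for this simple sentence.
table = compare_stems(sentence.split())

# Keep only the words for which at least two stemmers return different stems.
differing = {w: stems for w, stems in table.items() if len(set(stems.values())) > 1}

for word, stems in differing.items():
    print("%-12s %s" % (word, "  ".join("%s=%s" % item for item in stems.items())))
```

Filtering to the disagreements makes the "slightly different results" visible at a glance. In the output listed above, Lancaster is the most aggressive of the three, for example reducing transfer to transf and one to on.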
