NLTK offline download files: nltk wordnet download, NLTK offline download resources - CSDN Library

608 files in total

xml: 263

txt: 88

pickle: 44


NLTK

nltk

Rating: 5 stars (higher than 95% of resources); 111 views; uploaded 2021-03-09 16:14:25; 533.68MB ZIP


NLTK offline download files (608 sub-files)

data.adj 3.01MB

index.adj 805KB

data.adv 505KB

index.adv 159KB

citation.bib 212B

commandtalk.cfg 2.65MB

atis.cfg 193KB

basque2.cfg 536B

spanish1.cfg 362B

spanish2.cfg 272B

basque1.cfg 219B

basque3.cfg 157B

toy.cfg 139B

spanish3.cfg 136B

pt-br-universal-train.conll 7.43MB

fr-universal-train.conll 5.94MB

sv-universal-train.conll 2.51MB

fr-universal-dev.conll 1.19MB

pt-br-universal-test.conll 938KB

sv-universal-test.conll 791KB

sv-universal-dev.conll 363KB

fr-universal-test.conll 218KB

glue_train.conll 572B

ic-bnc-resnik-add1.dat 1.48MB

ic-bnc-resnik.dat 1.48MB

ic-brown-resnik.dat 1.34MB

ic-brown-resnik-add1.dat 1.34MB

ic-shaks-resnik.dat 1.32MB

ic-treebank-resnik.dat 1.32MB

ic-shaks-resnink-add1.dat 1.32MB

ic-treebank-resnik-add1.dat 1.31MB

ic-semcorraw-resnik.dat 1.27MB

ic-semcorraw-resnik-add1.dat 1.27MB

ic-bnc-add1.dat 1.16MB

ic-bnc.dat 1.15MB

ic-shaks-add1.dat 1.09MB

ic-shaks.dat 1.09MB

ic-brown-add1.dat 1.06MB

ic-brown.dat 1.06MB

ic-treebank-add1.dat 1.06MB

ic-treebank.dat 1.06MB

ic-semcorraw-add1.dat 1.04MB

ic-semcorraw.dat 1.04MB

ic-semcor-add1.dat 1.03MB

ic-semcor.dat 1.03MB

Source.dat 2KB

TargetColl.dat 1KB

TargetInd.dat 916B

Vocab.dat 328B

Misc.dat 80B

dep_test2.dep 104B

.DS_Store 6KB

vn_class-3.dtd 2KB

en 2.38MB

en-basic 5KB

noun.exc 37KB

verb.exc 37KB

adj.exc 22KB

adv.exc 85B

alvey.fcfg 1.05MB

chat_pnames.fcfg 27KB

gluesemantics.fcfg 5KB

discourse.fcfg 5KB

drt.fcfg 4KB

chat80.fcfg 4KB

german.fcfg 3KB

event.fcfg 3KB

sem2.fcfg 2KB

storage.fcfg 2KB

simple-sem.fcfg 2KB

bindop.fcfg 2KB

spanish1.fcfg 2KB

feat0.fcfg 1KB

hole.fcfg 1KB

feat1.fcfg 1KB

sql1.fcfg 1KB

basque3.fcfg 1002B

basque2.fcfg 974B

basque1.fcfg 874B

sql0.fcfg 764B

sql.fcfg 678B

sem1.fcfg 552B

np.fcfg 413B

sem0.fcfg 370B

spanish2.fcfg 340B

features 62.89MB

background.fol 553B

background0.fol 409B

h.g 14.67MB

r.g 7.7MB

l.g 7.03MB

u.g 2.13MB

m.g 1.76MB

rm.g 323KB

lm.g 280KB

ru.g 50KB

tt.g 19KB

glue 281B


Guidelines for Universal Dependency Annotation

Joakim Nivre and Ryan McDonald

This document describes the annotation guidelines used in the Universal Dependency Treebank Project, Version 2.0. The aim of the project is to create dependency treebanks with cross-linguistically consistent annotation by adapting and harmonizing variants of the Stanford typed dependencies (de Marneffe et al., 2006; de Marneffe and Manning, 2008). This scheme was originally developed for English but has subsequently been adapted and applied to a number of other languages including Chinese (Chang et al., 2009), Finnish (Haverinen et al., 2013), Persian (Seraji et al., 2012), and Modern Hebrew (Tsarfaty, 2013). We first give an overview of the modifications to the original Stanford scheme and then provide a detailed description of each dependency relation and its relation to the original scheme(s). Besides a syntactic dependency annotation, the treebanks also contain part-of-speech annotation using the Google Universal Part-of-Speech Tags (Petrov et al., 2012).

1 Overview of the Annotation Scheme

We assume the Stanford basic dependencies (with punctuation included), where every dependency structure is a tree spanning all the input tokens, because this is the kind of representation that most available dependency parsers require.

A sample dependency tree from the French treebank is shown in Figure 1.

Alexandre réside avec sa famille à Tinqueux .

NOUN VERB ADP DET NOUN ADP NOUN P

Arc labels: NSUBJ, ADPMOD, ADPOBJ, POSS, ADPMOD, ADPOBJ

Figure 1: A sample French sentence.
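The tree requirement stated above (one structure spanning all input tokens) can be checked mechanically. The following is a minimal sketch, not part of the guidelines: it assumes a hypothetical encoding in which `heads[i]` holds the 1-based index of the head of token `i + 1` and `0` marks the root. The head indices given for the Figure 1 sentence are an assumption made for illustration, since the arcs of the figure are not recoverable from the text.

```python
from typing import List


def is_dependency_tree(heads: List[int]) -> bool:
    """Return True if `heads` encodes a single rooted tree over all tokens.

    heads[i] is the 1-based index of the head of token i + 1; 0 marks the root.
    """
    n = len(heads)
    # Exactly one token may be the root.
    if n == 0 or heads.count(0) != 1:
        return False
    # Every head index must point at an existing token (or be the root marker).
    if any(h < 0 or h > n for h in heads):
        return False
    # Following head links from any token must reach the root without a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


tokens = ["Alexandre", "réside", "avec", "sa", "famille", "à", "Tinqueux", "."]
# Assumed attachments (illustrative only): "réside" is the root.
heads = [2, 0, 2, 5, 3, 2, 6, 2]

print(is_dependency_tree(heads))
```

A structure with two roots, or with a cycle between two tokens, is rejected by the same function.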

The universal annotation scheme was created by harmonizing available treebanks in slightly different variants of Stanford dependencies, some developed through manual annotation, some produced through automatic conversion from other schemes. In the harmonization step, we have eliminated cases where the same label was used for different linguistic relations in different languages and, conversely, where one and the same relation was annotated with different labels, both of which could happen accidentally when the original Stanford scheme was adapted to specific languages. Secondly, we have avoided, as far as possible, labels that are only used in one or two languages.

In order to satisfy these requirements, a number of language-specific labels have been merged into more general labels. For example, in analogy with the nn label for (element of a) noun-noun compound, the German scheme had a label aa for compound adjectives, and the Korean scheme had a label vv for compound verbs. In the universal scheme, these three labels have been merged into a single label compmod for modifier in compound. For Korean, the annotation scheme distinguished four different subtypes of nominal subjects, which have all been merged to the single relation nsubj in the universal annotation.

In addition to harmonizing language-specific labels, we have also renamed relations where the name would be misleading in the universal context (although quite appropriate for English). For example, the label prep (for a modifier headed by a preposition) has been renamed to adpmod, to make clear the relation to other modifier labels and to allow postpositions as well as prepositions. Consequently, pobj and pcomp have been changed to adpobj and adpcomp. Similarly, npadvmod has been replaced by nmod (in analogy with amod and advmod). We have also eliminated a few distinctions in the original Stanford scheme that were not annotated consistently across languages, for example, merging complm with mark, number with num, and purpcl with advcl.

Footnote: In addition to the universal tags, we also provide language-specific tags when available.

Footnote: This is in contrast to the collapsed dependencies, where multiple heads are allowed and where some tokens may not correspond to nodes in the dependency structure.

Footnote: For a more detailed description of this process, see McDonald et al. (2013).
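The renamings and merges listed above can be summarized as a lookup table. This is a minimal sketch rather than the project's conversion tooling: the name `UNIVERSAL_LABEL` and the function `harmonize` are hypothetical, and the table covers only the labels named in the text. It also ignores the language of origin (aa is described for German and vv for Korean), which a real converter would presumably take into account.

```python
# Source label -> universal label, limited to the examples named in the text.
UNIVERSAL_LABEL = {
    # Compound labels merged into a single "modifier in compound" label.
    "nn": "compmod",
    "aa": "compmod",
    "vv": "compmod",
    # Preposition-specific names generalized to adpositions.
    "prep": "adpmod",
    "pobj": "adpobj",
    "pcomp": "adpcomp",
    # Renamed in analogy with amod and advmod.
    "npadvmod": "nmod",
    # Distinctions dropped because they were annotated inconsistently.
    "complm": "mark",
    "number": "num",
    "purpcl": "advcl",
}


def harmonize(label: str) -> str:
    """Map a label to its universal counterpart; unknown labels pass through."""
    return UNIVERSAL_LABEL.get(label, label)


print([harmonize(label) for label in ["prep", "pobj", "nn", "nsubj"]])
```

Labels that the text does not mention, such as nsubj, are returned unchanged.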

Although the ultimate goal is to arrive at a single universal annotation for all languages, there are still two types of constructions where the annotation may vary across languages. The Stanford basic dependencies in general favor content words over function words as syntactic heads, but make an exception for copula constructions (optionally) and adpositional phrases (always). In some of the language-specific adaptations, notably for Finnish (Haverinen et al., 2013), this has been changed to enforce the content-head principle also in these constructions, making both copulas and adpositions dependents of their complements in the dependency structure. For some languages, the annotation permits accurate conversion between these two representations, but for others it is difficult to perform the conversion without introducing too much noise.

In Version 2.0, we therefore maintain two versions of the annotation scheme: the standard version, which treats copulas and adpositions as heads of their complements, and the content-head version, which consistently treats content words as syntactic heads. Currently, English, Portuguese, and Indonesian are only available in the standard version, while Finnish is only available in the content-head version. For Japanese and Korean, where the syntactic annotation is at the chunk (bunsetsu) level, the distinction is neutralized, and for the remaining languages we provide both versions (although the content-head version should be regarded as tentative and experimental at this point). For illustration, Figure 2 shows a German sentence annotated in the standard version (left) and the content-head version (right).

Standard version:

Das Haarteil ist für mich schließlich eine Prothese .

DET NOUN VERB ADP PRON ADV DET NOUN P

Arc labels: DET, NSUBJ, ADPMOD, ADPOBJ, ADVMOD, DET, ATTR

Content-head version:

Das Haarteil ist für mich schließlich eine Prothese .

DET NOUN VERB ADP PRON ADV DET NOUN P

Arc labels: DET, NSUBJ, COP, ADP, NMOD, ADVMOD, DET

Figure 2: A sample German sentence with standard (left) and content-head (right) annotation.
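To make the difference between the two versions concrete, the sketch below converts adpositional phrases from the standard analysis to a content-head analysis. It is an illustration under stated assumptions, not the project's conversion procedure: the tuple encoding, the function name `promote_adposition_complements`, the `root` label, and the head indices of the toy fragment are all hypothetical. Only the relabeling (adpmod and adpobj becoming adp and nmod) follows the labels in Figure 2. Copulas and any further dependents of the adposition are left untouched.

```python
from typing import List, Tuple

# (form, 1-based head index or 0 for the root, relation label)
Token = Tuple[str, int, str]


def promote_adposition_complements(tokens: List[Token]) -> List[Token]:
    """Make each adposition a dependent of its complement.

    For every adpobj whose head is an adpmod adposition, the complement takes
    over the adposition's attachment and the adposition is re-attached to it.
    """
    result = list(tokens)
    for i, (form, head, rel) in enumerate(tokens, start=1):
        if rel != "adpobj" or head == 0:
            continue
        adp_form, adp_head, adp_rel = tokens[head - 1]
        if adp_rel != "adpmod":
            continue
        # The complement now attaches where the adposition used to attach.
        result[i - 1] = (form, adp_head, "nmod")
        # The adposition becomes a dependent of its former complement.
        result[head - 1] = (adp_form, i, "adp")
    return result


# Toy fragment with assumed attachments: "für mich" modifying "ist".
standard = [("ist", 0, "root"), ("für", 1, "adpmod"), ("mich", 2, "adpobj")]
content_head = promote_adposition_complements(standard)

for token in content_head:
    print(token)
```

The function reads from the input list and writes to a copy, so nested adpositional phrases are converted independently of the order in which they are visited.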

In addition to the two annotation versions, there are a few known inconsistencies across languages, notably in the annotation of multiword expressions and in particular multiword names. Most treebanks follow the practice from English to annotate name parts as components of (nominal) compounds (which is questionable in languages like German where real nominal compounds are normally realized as single orthographic words), while some treebanks instead annotate them as parts of multiword expressions. In the future, it might be desirable to instead add a new relation name for this type of expression. In addition to the inconsistency in name annotation, the internal structure of multiword expressions varies between treebanks, being sometimes head-initial, sometimes head-final, and sometimes with no consistent headedness direction.

2 Dependency Relations

Below we give a brief description of each dependency relation used in the universal annotation. For each relation, we also list the language-specific relation(s) that it replaces or subsumes. We talk about replacement when it is a simple renaming and about subsumption when a more specific relation is merged with a more general one.

Footnote: This relation already exists in the native version of the Finnish treebank, but has been eliminated in the cross-linguistic harmonization process.


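zhuyijun09

2021-09-28

The user did not leave a rating within a certain period after downloading, so the system recorded a positive review by default.
PlanckPhelps

2022-06-30

The user did not leave a rating within a certain period after downloading, so the system recorded a positive review by default.
weixin_58006135

2024-03-27

The resource has some reference value, matches its description, is practical, offers a good deal to draw on, and is worth downloading.
weixin_57704208

2021-10-01

The user did not leave a rating within a certain period after downloading, so the system recorded a positive review by default.
学不会深度学习倒立洗头

2022-04-09

The user did not leave a rating within a certain period after downloading, so the system recorded a positive review by default.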


Uploader: herosunly

Followers: 70k+
Resources: 170

