Package pyannotation :: Package elan :: Module data :: Class EafGlossCorpusReader

Class EafGlossCorpusReader

     object --+    
              |    
EafCorpusReader --+
                  |
                 EafGlossCorpusReader

The class EafGlossCorpusReader implements part of the corpus reader API described in the Natural Language Toolkit (NLTK). It reads all the .eaf files (the format of the linguistic annotation software Elan) in a given directory and makes their data accessible through several methods. The .eaf files must contain at least a tier with words. Access to the data is normally read-only.
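A minimal usage sketch follows. It assumes the class can be imported as pyannotation.elan.data.EafGlossCorpusReader (the path suggested by the package and module names on this page) and that corpus_dir is a hypothetical directory containing .eaf files. The import sits inside the function so the snippet loads even when the package is not installed. The helper names open_gloss_corpus and summarize are illustrative and not part of the library.

```python
def open_gloss_corpus(corpus_dir):
    """Open all .eaf files in corpus_dir (a hypothetical path) as a gloss corpus."""
    # Assumed import path, derived from the package/module names on this page.
    from pyannotation.elan.data import EafGlossCorpusReader
    return EafGlossCorpusReader(corpus_dir)


def summarize(reader):
    """Return basic counts using only methods listed on this page."""
    return {
        "words": len(reader.words()),
        "morphemes": len(reader.morphemes()),
        "sentences": len(reader.tagged_sents()),
    }
```

With the package installed, summarize(open_gloss_corpus("/path/to/corpus")) would be one way to check that the expected tiers were found.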

Instance Methods
 
__init__(self, root, files='*.eaf', locale=None, participant=None, utterancetierType=None, wordtierType=None, morphemetierType=None, glosstierType=None)
root: the directory where your .eaf files are stored.
 
morphemes(self)
Returns a list of morphemes from the corpus files.
 
tagged_morphemes(self)
Returns a list of (morpheme, list of glosses) tuples.
 
tagged_words(self)
Returns a list of (word, tag) tuples.
 
tagged_sents(self)
Returns a list of (list of (word, tag) tuples).
 
tagged_sents_with_translations(self)
Returns a list of (sentence, translation) tuples.

Inherited from EafCorpusReader: sents, sents_with_translations, words

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

__init__(self, root, files='*.eaf', locale=None, participant=None, utterancetierType=None, wordtierType=None, morphemetierType=None, glosstierType=None)
(Constructor)

root: the directory where your .eaf files are stored. Only the
    files directly in this directory are read; subdirectories
    are not searched at present. This parameter is required.
files: a pattern for the filenames to read. The default value
    is "*.eaf", which is a shell-style wildcard rather than a
    regular expression.
locale: restricts the corpus data to tiers with the given locale.
participant: restricts the corpus data to tiers with the given
    participant.
utterancetierType: the tier type you assigned to your
    "utterances" tier in Elan. The EafTrees have several default
    values for this tier type: [ "utterance", "utterances",
    "Äußerung", "Äußerungen" ]. If you used a different tier type
    in Elan, specify it here. The parameter may be either a
    string or a list of strings.
wordtierType: the tier type you assigned to your "words" tier
    in Elan. The EafTrees have several default values for this
    tier type: [ "words", "word", "Wort", "Worte", "Wörter" ].
    If you used a different tier type in Elan, specify it here.
    The parameter may be either a string or a list of strings.
morphemetierType: the tier type you assigned to your
    "morphemes" tier in Elan. The EafTrees have several default
    values for this tier type: [ "morpheme", "morphemes",
    "Morphem", "Morpheme" ]. If you used a different tier type
    in Elan, specify it here. The parameter may be either a
    string or a list of strings.
glosstierType: the tier type you assigned to your "glosses"
    tier in Elan. The EafTrees have several default values for
    this tier type: [ "glosses", "gloss", "Glossen", "Gloss",
    "Glosse" ]. If you used a different tier type in Elan,
    specify it here. The parameter may be either a string or a
    list of strings.

Overrides: object.__init__
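The root and files descriptions above can be illustrated with a short sketch. It assumes the files pattern is matched as a shell-style wildcard against the entries of root only. This is a guess at the behaviour based on the default "*.eaf", not a copy of the library's code, and list_eaf_files is a hypothetical helper name.

```python
import glob
import os


def list_eaf_files(root, files="*.eaf"):
    """Return matching files directly inside root, sorted; no subdirectories."""
    pattern = os.path.join(root, files)
    # glob.glob without recursive=True does not descend into subdirectories,
    # which mirrors the "no recursive reading" note in the root description.
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

Running such a helper on your corpus directory before constructing the reader is a quick way to see which files a given pattern would select under this assumption.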

tagged_words(self)

Returns a list of (word, tag) tuples. Each tag is a list of (morpheme, list of glosses) tuples.
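The nested return shape can be easier to read with a concrete value. The sketch below uses hand-written sample data in the shape described above (it is not output from a real corpus) and a hypothetical helper, gloss_line, that flattens one tag into a gloss string.

```python
# Hypothetical sample in the documented shape:
# [(word, [(morpheme, [gloss, ...]), ...]), ...]
sample_tagged_words = [
    ("casas", [("casa", ["house"]), ("s", ["PL"])]),
    ("blancas", [("blanc", ["white"]), ("a", ["F"]), ("s", ["PL"])]),
]


def gloss_line(tag):
    """Join a word's tag into a hyphen-separated gloss string."""
    # Several glosses on one morpheme are joined with "." (an assumed convention).
    return "-".join(".".join(glosses) for _morpheme, glosses in tag)


glossed = [(word, gloss_line(tag)) for word, tag in sample_tagged_words]
# glossed == [("casas", "house-PL"), ("blancas", "white-F-PL")]
```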

tagged_sents(self)

Returns a list of (list of (word, tag) tuples). Each tag is a list of (morpheme, list of glosses) tuples.
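As an illustration of walking this structure, the following sketch counts how often each gloss occurs across sentences. The sample data is invented to match the documented shape, and gloss_counts is a hypothetical helper rather than a method of the class.

```python
from collections import Counter

# Hypothetical sample: a list of sentences, each a list of (word, tag) tuples.
sample_tagged_sents = [
    [("casas", [("casa", ["house"]), ("s", ["PL"])])],
    [("gatos", [("gato", ["cat"]), ("s", ["PL"])]),
     ("negros", [("negr", ["black"]), ("o", ["M"]), ("s", ["PL"])])],
]


def gloss_counts(tagged_sents):
    """Count every gloss over all sentences, words, and morphemes."""
    counts = Counter()
    for sent in tagged_sents:
        for _word, tag in sent:
            for _morpheme, glosses in tag:
                counts.update(glosses)
    return counts


counts = gloss_counts(sample_tagged_sents)
```

With a real reader, the same function could be called as gloss_counts(reader.tagged_sents()).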

tagged_sents_with_translations(self)

Returns a list of (sentence, translation) tuples. Sentences are lists of (word, tag) tuples. Each tag is a list of (morpheme, list of glosses) tuples.
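One use of this return value is printing interlinear text. The sketch below assumes the documented shape and uses invented sample data. Whether a translation arrives as a single string or as a list of strings is not stated here, so the hypothetical helper interlinear accepts both.

```python
# Hypothetical sample: [(sentence, translation), ...]
sample_pairs = [
    ([("casas", [("casa", ["house"]), ("s", ["PL"])]),
      ("blancas", [("blanc", ["white"]), ("a", ["F"]), ("s", ["PL"])])],
     "white houses"),
]


def interlinear(sentence, translation):
    """Format one sentence as word, morpheme, gloss, and translation lines."""
    words = " ".join(word for word, _tag in sentence)
    morphemes = " ".join(
        "-".join(m for m, _glosses in tag) for _word, tag in sentence)
    glosses = " ".join(
        "-".join(".".join(g) for _m, g in tag) for _word, tag in sentence)
    # Accept either a single string or a list of strings (an assumption).
    if not isinstance(translation, str):
        translation = " / ".join(translation)
    return "\n".join([words, morphemes, glosses, "'%s'" % translation])


text = interlinear(*sample_pairs[0])
```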