case class TeiReader(twoColumns: String, delimiter: String = "#") extends Product with Serializable

Factory for Vectors of HmtToken instances.

Example

The TeiReader reads data in the OHCO2 model from sources such as delimited-text files or a Corpus object from the edu.holycross.ohco2 library. It produces a Vector of TokenAnalysis objects.

Example:

val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)

How it works

The TeiReader object maintains three mutable buffers: nodeText (a StringBuilder), and wrappedWordBuffer and tokenBuffer (both mutable ArrayBuffers).
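
The buffer arrangement described above can be sketched roughly as follows. This is an illustration only: BufferSketch, append and flush are hypothetical names, and plain Strings stand in for the HmtToken and Reading objects that the real buffers hold.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the three buffers; Strings replace HmtToken and Reading.
final class BufferSketch {
  val nodeText = new StringBuilder                  // text of the token being built
  val wrappedWordBuffer = ArrayBuffer.empty[String] // readings for the current token
  val tokenBuffer = ArrayBuffer.empty[String]       // finished tokens

  /** Append text to the token under construction. */
  def append(s: String): Unit = {
    nodeText.append(s)
    ()
  }

  /** Move the accumulated text into tokenBuffer and reset nodeText. */
  def flush(): Unit = {
    if (nodeText.nonEmpty) tokenBuffer += nodeText.toString
    nodeText.clear()
  }

  /** Reset all three buffers, for example before reading a new citable node. */
  def clear(): Unit = {
    nodeText.clear()
    wrappedWordBuffer.clear()
    tokenBuffer.clear()
  }
}
```

Because the buffers are shared mutable state, a reader built this way is not safe to use from several threads at once, and callers need to reset it between passages.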

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new TeiReader(twoColumns: String, delimiter: String = "#")

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def abbrExpanChoice(hmtToken: HmtToken, el: Elem): Unit

    Collect tokens from a TEI abbr-expan pair.

    Results are added to the TeiReader's tokenBuffer.

    hmtToken

    token reflecting reading values for parent context

    el

    TEI choice element with abbr-expan children

  5. def addTokensFromElement(el: Elem, tokenSettings: HmtToken): Unit

    Parse an XML element and add all tokens in it to tokenBuffer.

    el

    XML element to parse.

    tokenSettings

    Initial contextual setting for tokens.

  6. def addTokensFromText(s: String, tokenSettings: HmtToken): Unit

    Parse a string and add all tokens in it to tokenBuffer.

    s

    String to parse.

    tokenSettings

    Initial contextual setting for tokens.

  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. def choiceError(hmtToken: HmtToken, elemNames: Seq[String]): ArrayBuffer[HmtToken]
  9. def clear: Unit
  10. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
  11. def collectCited(currToken: HmtToken, citElem: Elem): Unit

    Collect tokens from a cited context.

    currToken

    token reflecting reading values for parent context

    citElem

    TEI cit element

  12. def collectRefString(currToken: HmtToken, rsElem: Elem): Unit

    Collect the appropriate type of token for each variety of TEI rs usage.

    currToken

    token reflecting reading values for parent context

    rsElem

    TEI rs element

  13. def collectTokens(currToken: HmtToken, n: Node): Unit

    Collect all tokens descended from a given XML node. Results are collected in tokenBuffer.

    currToken

    token reflecting reading values for parent context

    n

    XML node to collect content from
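
    The recursive, context-passing traversal described for collectTokens might look roughly like the sketch below. It uses a minimal hand-rolled tree instead of scala.xml so that it stands alone. The Token type, the status field and the special handling of an "unclear" element are assumptions made for illustration, not the library's actual rules.

```scala
import scala.collection.mutable.ArrayBuffer

object CollectSketch {
  // Minimal stand-ins for scala.xml.Node and HmtToken (hypothetical).
  sealed trait Node
  final case class Text(value: String) extends Node
  final case class Elem(label: String, children: Vector[Node]) extends Node
  final case class Token(text: String, status: String)

  val tokenBuffer = ArrayBuffer.empty[Token]

  /** Walk the tree depth-first; each element may adjust the context
    * that its descendants inherit. Results land in tokenBuffer. */
  def collectTokens(curr: Token, n: Node): Unit = n match {
    case Text(value) =>
      // Text nodes emit one token per whitespace-delimited word.
      value.split("\\s+").filter(_.nonEmpty).foreach { w =>
        tokenBuffer += curr.copy(text = w)
      }
    case Elem(label, children) =>
      // Assumption: an "unclear" element changes the editorial status.
      val ctx = if (label == "unclear") curr.copy(status = "unclear") else curr
      children.foreach(child => collectTokens(ctx, child))
  }
}
```

    Passing the parent's token down as a template is what lets a deeply nested word inherit settings from every enclosing element.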

  14. def collectWrappedWordReadings(editorialStatus: EditorialStatus, n: Node): Unit

    Recursively collect all Reading objects descended from a given node, and add a Vector of Readings to the TeiReader's wrappedWordBuffer.

    editorialStatus

    editorial status of surrounding context

    n

    node to descend from

  15. def ctsSafe(s: String): String

    URL-encode any colon characters in s so that s can be used as the extended citation string of a CtsUrn.

    s

    String to use as extended citation string of a CtsUrn.
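
    As a minimal sketch of the idea, assuming that colons are the only characters needing attention (the colon separates the components of a CTS URN, and its percent-encoded form is %3A), the behaviour could be approximated like this. CtsSafeSketch is a hypothetical wrapper; the library's own implementation may differ.

```scala
object CtsSafeSketch {
  /** Percent-encode colons so they cannot be mistaken for CTS URN separators. */
  def ctsSafe(s: String): String = s.replace(":", "%3A")
}
```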

  16. def deletedText(hmtToken: HmtToken, el: Elem): Unit
  17. val delimiter: String
  18. def disambiguateNamedEntity(currToken: HmtToken, el: Elem): Unit

    Collect tokens with appropriate disambiguation for varieties of named entities.

    currToken

    token reflecting reading values for parent context

    el

    a TEI element disambiguating a named entity; should be one of persName, placeName, or rs with type = ethnic

  19. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  20. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  21. def getAlternate(hmtToken: HmtToken, choiceElem: Elem): Any

    Get alternates as well as tokens from a TEI choice element.

    hmtToken

    token reflecting reading values for parent context

    choiceElem

    TEI choice element

  22. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  23. def indexSubstring(s: String, sub: String): Int

    Find the CTS subref index value of sub in s.

    The map in the hideously global tokenIndexCount is updated as a side effect of this.

    s

    string to index in

    sub

    substring to find in s
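
    One plausible reading of this method, offered as a guess rather than a description of the actual code: a CTS subreference such as @καὶ[2] identifies the second occurrence of a string within a passage, so a counter keyed by substring can supply the index as tokens are read in order. SubrefIndexSketch is hypothetical, and its tokenIndexCount only imitates the shared map mentioned above.

```scala
import scala.collection.mutable

object SubrefIndexSketch {
  // Occurrences seen so far, keyed by substring (stand-in for tokenIndexCount).
  val tokenIndexCount = mutable.Map.empty[String, Int]

  /** Return the 1-based occurrence number of `sub` for this call,
    * updating the shared counter as a side effect. */
  def indexSubstring(s: String, sub: String): Int = {
    require(sub.nonEmpty && s.contains(sub), "sub must occur in s")
    val next = tokenIndexCount.getOrElse(sub, 0) + 1
    tokenIndexCount(sub) = next
    next
  }
}
```

    Because the result depends on earlier calls, the counter has to be reset whenever a new passage begins.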

  24. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  25. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. var nodeText: StringBuilder

    Builder for recursively accumulated String value of a single token.

  27. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  28. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  29. def origRegChoice(hmtToken: HmtToken, el: Elem): Unit

    Collect tokens from a TEI orig-reg pair.

    Results are added to the TeiReader's tokenBuffer.

    hmtToken

    token reflecting reading values for parent context

    el

    TEI choice element with orig-reg children

  30. val punctuationSplitter: String

    Terrifying regular expression to split a string on HMT Greek punctuation characters while keeping the punctuation characters as individual tokens.
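
    The usual way to split on a character while keeping it as its own token is a pair of zero-width lookaround assertions. The sketch below shows that technique with an assumed punctuation set (period, comma, colon, semicolon and the Greek ano teleia, U+0387). The real HMT set and the real expression are not given here and may well differ; PunctuationSketch and tokenize are hypothetical names.

```scala
object PunctuationSketch {
  // Assumed punctuation set; the real HMT set may differ.
  // Splits at every position just after or just before a punctuation mark.
  val punctuationSplitter: String = "(?<=[.,:;\u0387])|(?=[.,:;\u0387])"

  /** Split on whitespace, then peel punctuation off as separate tokens. */
  def tokenize(s: String): Vector[String] =
    s.split("\\s+").toVector
      .flatMap(w => w.split(punctuationSplitter).toVector)
      .filter(_.nonEmpty)
}
```

    Because both alternatives match an empty string, no characters are consumed by the split, which is why the punctuation marks survive as tokens.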

  31. def sicCorrChoice(hmtToken: HmtToken, el: Elem): Unit

    Collect tokens from a TEI sic-corr pair.

    Results are added to the TeiReader's tokenBuffer.

    hmtToken

    token reflecting reading values for parent context

    el

    TEI choice element with sic-corr children

  32. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  33. def teiToTokens(u: CtsUrn, xmlStr: String, tokenCount: Int = 0): Vector[TokenAnalysis]

    Read an XML fragment following HMT conventions to represent a single citable node, and construct a Vector of (CtsUrn,HmtToken) tuples from it.

    u

    URN for the citable node.

    xmlStr

    XML text for the citable node.

    tokenCount

    Index of this token within the containing canonically citable passage of text.

  34. var tokenBuffer: ArrayBuffer[HmtToken]

    Buffer of recursively accumulated HmtTokens.

  35. def tokens: Vector[TokenAnalysis]

    Parse a String in two-column format into a vector of analyzed tokens.
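
    A rough sketch of reading the two-column format, assuming each line holds a URN, the delimiter, and then the XML text of that node. The column order, the rows helper and the TwoColumnSketch name are assumptions for illustration; only the default "#" delimiter comes from the constructor signature.

```scala
object TwoColumnSketch {
  /** Split each non-empty line into (urn, text) at the first delimiter. */
  def rows(twoColumns: String, delimiter: String = "#"): Vector[(String, String)] =
    twoColumns.split("\n").toVector
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { line =>
        val i = line.indexOf(delimiter)
        require(i >= 0, "no delimiter in line: " + line)
        // Everything after the first delimiter is kept as the text column.
        (line.substring(0, i), line.substring(i + delimiter.length))
      }
}
```

    Splitting at the first delimiter only means the text column may itself contain the delimiter character.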

  36. def tokensFromNodeVector(nodes: Vector[CitableNode], tokens: Vector[TokenAnalysis]): Vector[TokenAnalysis]

    Parse a Vector of CitableNode objects into a Vector of TokenAnalysis objects by recursively splitting each citable node into tokens and analyzing them.

    nodes

    Vector of CitableNode objects. Their text content must be XML conforming to HMT project conventions.

    tokens

    Accumulated Vector of analyzed tokens.
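
    The signature suggests an accumulator-style recursion: take the first node, analyze it, append the results, and recurse on the rest. A self-contained sketch of that pattern follows, with Strings standing in for CitableNode and TokenAnalysis and a whitespace split standing in for the real per-node analysis (presumably something like teiToTokens). All names here are hypothetical.

```scala
import scala.annotation.tailrec

object NodeVectorSketch {
  // Stand-in for the per-node analysis; here it only splits on whitespace.
  def analyze(node: String): Vector[String] =
    node.split("\\s+").toVector.filter(_.nonEmpty)

  /** Consume `nodes` from the front, appending each node's tokens to `acc`. */
  @tailrec
  def tokensFromNodes(nodes: Vector[String], acc: Vector[String] = Vector.empty): Vector[String] =
    if (nodes.isEmpty) acc
    else tokensFromNodes(nodes.tail, acc ++ analyze(nodes.head))
}
```

    Carrying the accumulator as a parameter keeps the recursion in tail position, so long node vectors do not grow the call stack.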

  37. val twoColumns: String
  38. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
  41. var wrappedWordBuffer: ArrayBuffer[Reading]

    Buffer of recursively accumulated Readings for a single token.
