package edmodel

Provides classes modelling HMT editions of texts.

Overview

The starting point is the factory object TeiReader, which reads data in the OHCO2 model from a two-column file or a Corpus object and produces a Vector of TokenAnalysis objects. Each TokenAnalysis pairs a CtsUrn for the citable text node with a fully analyzed HmtToken. Example:

val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)

The HmtToken captures everything known about a token from an HMT edition. See its documentation for more details.
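
As a rough illustration of what can be done with the resulting Vector, the sketch below groups tokens by the citable node they occur in. It does not call the library: SketchToken and SketchAnalysis are hypothetical stand-ins for HmtToken and TokenAnalysis, plain Strings replace CtsUrn values, and the sample URNs and words are illustrative only.

```scala
object OverviewSketch {
  // Hypothetical stand-ins: the real classes carry CtsUrn values and far more data.
  final case class SketchToken(editionUrn: String, text: String)
  final case class SketchAnalysis(textNode: String, analysis: SketchToken)

  // Group token texts by citable node, keeping token order within each node.
  def textByNode(pairs: Vector[SketchAnalysis]): Map[String, Vector[String]] =
    pairs.groupBy(_.textNode).map { case (node, tas) =>
      node -> tas.map(_.analysis.text)
    }

  private val work = "urn:cts:greekLit:tlg0012.tlg001.msA"

  // Illustrative sample data, not taken from an actual HMT edition.
  val sample: Vector[SketchAnalysis] = Vector(
    SketchAnalysis(s"$work:1.1", SketchToken(s"$work.tokens:1.1.1", "μῆνιν")),
    SketchAnalysis(s"$work:1.1", SketchToken(s"$work.tokens:1.1.2", "ἄειδε")),
    SketchAnalysis(s"$work:1.2", SketchToken(s"$work.tokens:1.2.1", "οὐλομένην"))
  )
}
```

With the real library, the same grouping would presumably apply to the tokenPairs value above, using each pair's textNode member as the key.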

Type Members

  1. sealed trait AlternateCategory extends AnyRef

    All possible categories for alternate readings are enumerated by case objects extending this trait

    Used by org.homermultitext.edmodel.AlternateReading and therefore also by org.homermultitext.edmodel.HmtToken and org.homermultitext.edmodel.TeiReader

  2. case class AlternateReading(alternateCategory: AlternateCategory, reading: Vector[Reading]) extends Product with Serializable

    an alternate reading for a token

    The name member must be implemented with an English description of the editorial status

    alternateCategory

    category of alternate reading

    reading

    all org.homermultitext.edmodel.Readings for this alternate reading

  3. sealed trait DiscourseCategory extends AnyRef

    All possible categories for discourse of a token are enumerated by case objects extending this trait

    The name member must be implemented with an English description of the discourse status

    Used by org.homermultitext.edmodel.HmtToken and therefore also by org.homermultitext.edmodel.TeiReader

  4. sealed trait EditorialStatus extends AnyRef

    All possible values for the editorial status of a token are enumerated by case objects extending this trait

    The name member must be implemented with an English description of the editorial status

    Used by org.homermultitext.edmodel.Reading and therefore also by org.homermultitext.edmodel.HmtToken and org.homermultitext.edmodel.TeiReader
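
The enumeration pattern described for these sealed traits can be sketched as follows. This is a stand-in written for illustration, not the library's source: the three case objects echo names documented under Value Members, but the name strings are assumptions.

```scala
object StatusSketch {
  // Stand-in mirroring the documented pattern: a sealed trait with a name member.
  sealed trait EditorialStatus { def name: String }

  // Each case object supplies an English description (wording assumed here).
  case object Clear extends EditorialStatus { val name = "clear" }
  case object Unclear extends EditorialStatus { val name = "unclear" }
  case object Missing extends EditorialStatus { val name = "missing" }

  // Because the trait is sealed, the compiler can warn when a match omits a case.
  def isLegible(status: EditorialStatus): Boolean = status match {
    case Clear   => true
    case Unclear => true
    case Missing => false
  }
}
```

Sealing the trait keeps the set of categories closed within one source file, which is what makes the "all possible values" wording above checkable by the compiler.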

  5. case class HmtOrcaToken(urn: CtsUrn, src: CtsUrn, textDeformation: String, hmtToken: HmtToken) extends Product with Serializable

    token in an ORCA analytical exemplar

    urn

    exemplar-level URN identifying this token in a specific reading of a HMT edition

    src

    URN of passage read or analyzed

    textDeformation

    string view of this token

    hmtToken

    full analysis of this token

  6. case class HmtReading(title: String, description: String, tokens: Vector[HmtOrcaToken]) extends Product with Serializable

    a complete reading of a text expressed as an analytical exemplar

    title

    labelling string or title of edition

    tokens

    sequence of org.homermultitext.edmodel.HmtOrcaTokens defining an analytical edition

  7. case class HmtToken(analysis: Cite2Urn, sourceUrn: CtsUrn, editionUrn: CtsUrn, lang: String = "grc", readings: Vector[Reading], lexicalCategory: LexicalCategory, lexicalDisambiguation: Cite2Urn = ..., alternateReading: Option[AlternateReading] = None, discourse: DiscourseCategory = DirectVoice, externalSource: Option[CtsUrn] = None, errors: ArrayBuffer[String] = ArrayBuffer.empty[String]) extends Product with Serializable

    A fully documented, semantically distinct token. The model of this token supports the ORCA model of aligned text analysis. The analysis member is a CITE2 URN representing this token as an ORCA analysis. The sourceUrn member is a CTS URN with subreference index identifying the specific string of text analyzed. The editionUrn member is a CTS URN for this token in an analytical exemplar. The other members of the HmtToken provide the analytical data for this token.

    analysis

    CITE URN for this token analysis.

    sourceUrn

    URN for this token in the analyzed text

    editionUrn

    URN for this token in an analytical exemplar when promoted to an edition

    lang

    3-letter ISO language code for the language of this token, or a descriptive string if no ISO code is defined for this language

    readings

    All org.homermultitext.edmodel.Readings belonging to this token

    lexicalCategory

    lexical category of this token

    lexicalDisambiguation

    URN for automated method to disambiguate tokens of a given type, or manually disambiguated URN for named entity values

    alternateReading

    optional org.homermultitext.edmodel.AlternateReadings belonging to this token

    discourse

    category of discourse of this token

    externalSource

    URN of source this token is quoted from

    errors

    list of error messages (hopefully empty)

  8. sealed trait LexicalCategory extends AnyRef

    All possible lexical categories for a token are enumerated by case objects extending this trait

    The name member must be implemented with an English description of the lexical category

    Used by org.homermultitext.edmodel.HmtToken and therefore also by org.homermultitext.edmodel.TeiReader

  9. case class Reading(reading: String, status: EditorialStatus) extends Product with Serializable

    A typed reading of a passage.

    reading

    string read with given status

    status

    status of the given string
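
A token may combine several Readings with different statuses. The sketch below shows one way such a Vector could be rendered as a single String. The label function and its output format are hypothetical; the Reading companion object provides the library's own formatting, which may differ. The types are redeclared locally so the example stands alone.

```scala
object ReadingSketch {
  // Local stand-ins so the example compiles on its own.
  sealed trait EditorialStatus { def name: String }
  case object Clear extends EditorialStatus { val name = "clear" }
  case object Unclear extends EditorialStatus { val name = "unclear" }

  final case class Reading(reading: String, status: EditorialStatus)

  // Hypothetical formatter: "text (status)" per reading, joined with " + ".
  def label(readings: Vector[Reading]): String =
    readings.map(r => s"${r.reading} (${r.status.name})").mkString(" + ")
}
```

For example, ReadingSketch.label applied to a clear "μῆ" followed by an unclear "νιν" yields "μῆ (clear) + νιν (unclear)".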

  10. case class ReadingConfig(title: String, description: String) extends Product with Serializable
  11. case class TeiReader(twoColumns: String, delimiter: String = "#") extends Product with Serializable

    Factory for Vectors of HmtToken instances.

    The TeiReader reads data in the OHCO2 model from sources such as delimited-text files or the Corpus object from the edu.holycross.ohco2 library. It produces a Vector of TokenAnalysis objects.

    Example:

    val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)

    How it works

    The TeiReader object maintains three mutable buffers, nodeText (a StringBuilder), wrappedWordBuffer and tokenBuffer (both mutable ArrayBuffers).
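
The two-column input taken by the TeiReader constructor (a URN, the delimiter, then the node's text) can be pictured with the sketch below. It is an assumption-laden illustration, not the library's parsing code: the parse function is hypothetical, and silently skipping malformed lines is a choice made here for brevity.

```scala
object TwoColumnSketch {
  // Split each non-empty line into (urn, text) at the first delimiter.
  // Lines lacking the delimiter are skipped; a real reader might report them instead.
  def parse(twoColumns: String, delimiter: String = "#"): Vector[(String, String)] =
    twoColumns.linesIterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap { line =>
        line.indexOf(delimiter) match {
          case -1 => None
          case i  => Some((line.take(i), line.drop(i + delimiter.length)))
        }
      }
      .toVector
}
```

Splitting at the first delimiter only means that a "#" occurring later in the text column is left untouched.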

  12. case class TextDeformation(text: String) extends Product with Serializable
  13. case class TokenAnalysis(textNode: CtsUrn, analysis: HmtToken) extends Product with Serializable

    An analysis of a single token.

    textNode

    CtsUrn of the citable node where this token occurs. Note that this is always equivalent to the version-level URN of the node containing the HmtToken's "edition URN", since the edition URN extends the work hierarchy with a "tokens" exemplar and extends the passage hierarchy by one further level. Expressed in code, for any TokenAnalysis ta, the following relation holds:

    ta.analysis.editionUrn.collapsePassageBy(1) == ta.textNode.addExemplar("tokens")

    analysis

    The analysis of this token as a full HmtToken object.
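
The relation between textNode and editionUrn can be traced on plain strings. The two helpers below are hypothetical string-level imitations of the CtsUrn methods named in that relation, written only to make the example self-contained; the real class should be used in practice, and the sample URNs are illustrative.

```scala
object UrnRelationSketch {
  // Drop the last `levels` dot-separated parts of the passage component
  // (the part after the final colon).
  def collapsePassageBy(urn: String, levels: Int): String = {
    val cut = urn.lastIndexOf(':')
    val passage = urn.substring(cut + 1).split('.').dropRight(levels).mkString(".")
    urn.substring(0, cut + 1) + passage
  }

  // Append an exemplar identifier to the work component
  // (the part before the final colon).
  def addExemplar(urn: String, exemplar: String): String = {
    val cut = urn.lastIndexOf(':')
    urn.substring(0, cut) + "." + exemplar + urn.substring(cut)
  }

  // Illustrative values: a citable node and the third token within it.
  val textNode   = "urn:cts:greekLit:tlg0012.tlg001.msA:1.1"
  val editionUrn = "urn:cts:greekLit:tlg0012.tlg001.msA.tokens:1.1.3"
}
```

Here both collapsePassageBy(editionUrn, 1) and addExemplar(textNode, "tokens") produce "urn:cts:greekLit:tlg0012.tlg001.msA.tokens:1.1".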

Value Members

  1. val analyticalCollections: Map[String, Cite2Urn]
  2. def codeptList(s: String, idx: Int = 0, codepoints: List[Int] = Nil): List[Int]

    Recursively get list of code points for a String.

    s

    String to get codepoints for.

    idx

    Index of codepoint to start from.

    codepoints

    List of codepoints seen so far.
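
A recursive function with this signature could look like the sketch below. It is one plausible reading of the documented parameters, not the library's implementation: in particular it assumes idx is an index into the String's UTF-16 chars rather than a count of code points.

```scala
import scala.annotation.tailrec

object CodepointSketch {
  // Walk the String from idx, accumulating code points in reverse order.
  @tailrec
  def codeptList(s: String, idx: Int = 0, codepoints: List[Int] = Nil): List[Int] =
    if (idx >= s.length) codepoints.reverse
    else {
      val cp = s.codePointAt(idx)
      // Supplementary characters occupy two UTF-16 chars, so advance by charCount.
      codeptList(s, idx + Character.charCount(cp), cp :: codepoints)
    }
}
```

Advancing by Character.charCount matters for text outside the Basic Multilingual Plane, where stepping one char at a time would split a surrogate pair.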

  3. def collectText(n: Node): String
  4. def collectText(n: Node, s: String): String

    Recursively collect contents of all text-node descendants of a given node.

    n

    Node to collect from.

    returns

    A single String with all text from n.
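
The recursion behind the two collectText overloads can be sketched on a miniature tree. MiniNode, TextNode and ElemNode are hypothetical types introduced so the example needs no XML library; the documented functions take a Node, presumably scala.xml.Node, instead.

```scala
object CollectTextSketch {
  // Hypothetical miniature tree standing in for XML nodes.
  sealed trait MiniNode
  final case class TextNode(text: String) extends MiniNode
  final case class ElemNode(label: String, children: Vector[MiniNode]) extends MiniNode

  // Accumulator version: append text nodes to s, descend into elements in document order.
  def collectText(n: MiniNode, s: String): String = n match {
    case TextNode(t)       => s + t
    case ElemNode(_, kids) => kids.foldLeft(s)((acc, kid) => collectText(kid, acc))
  }

  // Convenience overload starting from an empty accumulator.
  def collectText(n: MiniNode): String = collectText(n, "")
}
```

Element labels are ignored, so markup nested at any depth contributes only its text.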

  5. val collectionId: String
  6. val exemplarLabels: Map[String, ReadingConfig]
  7. def hmtNormalize(s: String): String
  8. val punctuation: Vector[String]
  9. val validElements: Vector[String]
  10. val versionId: String
  11. object AlternateReading extends Serializable

    string formatting function

  12. object Citation extends DiscourseCategory with Product with Serializable
  13. object Clear extends EditorialStatus with Product with Serializable

    Paleographically unambiguous reading.

  14. object Correction extends AlternateCategory with Product with Serializable

    scribal correction of text

  15. object Deletion extends AlternateCategory with Product with Serializable

    scribal deletion of text

  16. object DiplomaticEditionFactory

    Factory to build a diplomatic edition from a Vector of TokenAnalysis objects.

  17. object DirectVoice extends DiscourseCategory with Product with Serializable

    token in direct voice of text

  18. object HmtChars

    Definitions of allowed characters in HMT editions.

  19. object HmtOrcaToken extends Serializable
  20. object HmtReading extends Serializable
  21. object HmtToken extends Serializable

    Factory for labelling information about tokens.

  22. object InvalidToken extends EditorialStatus with Product with Serializable

    Reading cannot be determined because source XML does not comply with HMT project requirements.

  23. object LexicalToken extends LexicalCategory with Product with Serializable

    parseable lexical token

  24. object LiteralToken extends LexicalCategory with Product with Serializable

    quoted literal string not parseable as a lexical token

  25. object Missing extends EditorialStatus with Product with Serializable

    Lacuna.

  26. object Multiform extends AlternateCategory with Product with Serializable

    alternate reading offered by scribe

  27. object NumericToken extends LexicalCategory with Product with Serializable

    token in Milesian numeric notation

  28. object Punctuation extends LexicalCategory with Product with Serializable

    single punctuation character

  29. object QuotedLanguage extends DiscourseCategory with Product with Serializable

    quoted word in the natural language of text

  30. object QuotedLiteral extends DiscourseCategory with Product with Serializable

    quoted string of characters not forming a valid lexical entity

  31. object QuotedText extends DiscourseCategory with Product with Serializable

    token in quotation of another text

  32. object Reading extends Serializable

    Companion object for formatting Vectors of Readings as Strings.

  33. object Restoration extends AlternateCategory with Product with Serializable

    restored by modern editor

    This should only apply to editorial expansions of abbreviations.

  34. object Restored extends EditorialStatus with Product with Serializable

    Reading supplied by modern editor.

    Applies only to editorial expansion of abbreviations.

  35. object TeiReader extends Serializable
  36. object TextDeformation extends Serializable

    Factory for Vectors of org.homermultitext.edmodel.HmtOrcaToken instances.

  37. object Unclear extends EditorialStatus with Product with Serializable

    Paleographically ambiguous reading.

  38. object Unintelligible extends LexicalCategory with Product with Serializable

    token not parseable due to error in HMT edition
