Packages

package edmodel

Provides classes modelling HMT editions of texts.

Overview

The starting point is the factory object TeiReader, which reads data in the OHCO2 model from a two-column file or a Corpus object and produces a Vector of TokenAnalysis objects. A TokenAnalysis pairs the CtsUrn of a citable text node with a fully analyzed HmtToken. Example:

val tokenPairs = TeiReader.fromCorpus(CORPUS_OBJECT)

The HmtToken captures everything known about a token from an HMT edition. See its documentation for more details.
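
As a rough illustration of working with the resulting Vector, the sketch below reassembles the token text of each citable node in document order. It uses simplified stand-in types: Token and Analysis are hypothetical, with plain Strings in place of CtsUrn and HmtToken, and are not the library's classes.

```scala
object TokenPairsSketch {
  // Hypothetical stand-ins, NOT the edmodel classes: a String replaces
  // CtsUrn, and Token keeps only the token's text.
  final case class Token(text: String)
  final case class Analysis(textNode: String, analysis: Token)

  // Join the token texts of each citable node, keeping nodes in the
  // order in which they first appear.
  def textByNode(pairs: Vector[Analysis]): Vector[(String, String)] = {
    val nodes = pairs.map(_.textNode).distinct
    nodes.map { node =>
      val words = pairs.filter(_.textNode == node).map(_.analysis.text)
      node -> words.mkString(" ")
    }
  }
}
```

With the real library, the same grouping would presumably key on the textNode member of each TokenAnalysis.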

Type Members

  1. sealed trait AlternateCategory extends AnyRef

    All possible categories for alternate readings are enumerated by case objects extending this trait

    Used by org.homermultitext.edmodel.AlternateReading and therefore also by org.homermultitext.edmodel.HmtToken and org.homermultitext.edmodel.TeiReader

  2. case class AlternateReading(alternateCategory: AlternateCategory, readings: Vector[Reading]) extends Product with Serializable

    an alternate reading for a token

    The name member must be implemented with an English description of the editorial status

    alternateCategory

    category of alternate reading

    readings

    all org.homermultitext.edmodel.Readings for this alternate reading

  3. trait AttributeRequirement extends LogSupport

    A specification of attribute usage for a single TEI element.

  4. sealed trait DiscourseCategory extends AnyRef

    All possible categories for discourse of a token are enumerated by case objects extending this trait

    The name member must be implemented with an English description of the discourse status

    Used by org.homermultitext.edmodel.HmtToken and therefore also by org.homermultitext.edmodel.TeiReader

  5. sealed trait EditorialStatus extends AnyRef

    All possible values for the editorial status of a token are enumerated by case objects extending this trait

    The name member must be implemented with an English description of the editorial status

    Used by org.homermultitext.edmodel.Reading and therefore also by org.homermultitext.edmodel.HmtToken and org.homermultitext.edmodel.TeiReader

  6. trait HmtTeiTier extends AnyRef

    Trait defining HMT usage of TEI markup allowed at different tiers in a document.

  7. case class HmtToken(sourceUrn: CtsUrn, editionUrn: CtsUrn, lang: String = "grc", readings: Vector[Reading], lexicalCategory: LexicalCategory, lexicalDisambiguation: Cite2Urn = ..., alternateReading: Option[AlternateReading] = None, discourse: DiscourseCategory = DirectVoice, externalSource: Option[CtsUrn] = None, errors: Vector[String] = Vector.empty[String]) extends LogSupport with Product with Serializable

    A fully documented, semantically distinct token.

    sourceUrn

    URN with subreference for this token in the analyzed text.

    editionUrn

    URN for this token in a token edition with additional citation level in the passage hierarchy.

    lang

    3-letter language code for the language of this token, or a descriptive string if no ISO code is defined for this language.

    readings

    All org.homermultitext.edmodel.Readings belonging to this token.

    lexicalCategory

    Lexical category of this token.

    lexicalDisambiguation

    A URN disambiguating named entities.

    alternateReading

    Optional org.homermultitext.edmodel.AlternateReadings belonging to this token.

    discourse

    Category of discourse of this token.

    externalSource

    Optional URN of a text this token is quoted from.

    errors

    List of error messages (hopefully empty).

  8. sealed trait LexicalCategory extends AnyRef

    All possible lexical categories for a token are enumerated by case objects extending this trait

    The name member must be implemented with an English description of the lexical category

    Used by org.homermultitext.edmodel.HmtToken and therefore also by org.homermultitext.edmodel.TeiReader

  9. case class Reading(text: String, status: EditorialStatus) extends Product with Serializable

    A typed reading of a passage.

    status

    status of the given string

  10. case class ReadingConfig(title: String, description: String) extends Product with Serializable
  11. case class TeiReader(hmtEditionType: MidEditionType) extends MidMarkupReader with Product with Serializable

    An implementation of the MidMarkupReader trait for HMT project editions.

    hmtEditionType

    Type of edition to generate.

  12. case class TextDeformation(text: String) extends Product with Serializable
  13. case class TokenAnalysis(textNode: CtsUrn, analysis: HmtToken) extends Product with Serializable

    An analysis of a single token.

    textNode

    CtsUrn of the citable node where this token occurs. Note that this is always equivalent to the version-level URN of the node containing the HmtToken's "edition URN", since the edition URN extends the work hierarchy with a "tokens" exemplar and extends the passage hierarchy with one further level. Expressed in code, for any TokenAnalysis ta, the following relation holds:

    ta.analysis.editionUrn.collapsePassageBy(1) == ta.textNode.addExemplar("tokens")

    analysis

    The analysis of this token as a full HmtToken object.

  14. case class TokenSettings(contextUrn: CtsUrn, lexicalCategory: LexicalCategory = HmtLexicalToken, status: EditorialStatus = Clear, alternateCategory: Option[AlternateCategory] = None, lexicalDisambiguation: Cite2Urn = ..., discourse: DiscourseCategory = DirectVoice, externalSource: Option[CtsUrn] = None, errors: Vector[String] = Vector.empty[String], lang: String = "grc", treeDepth: Int = HmtTeiElements.tiers.size) extends Product with Serializable
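
The relation stated for TokenAnalysis can be traced on plain URN strings. The sketch below is a simplified stand-in, not the CtsUrn implementation: the object UrnRelationSketch, its helpers collapsePassageByOne and addExemplar, and the example URN values are assumptions made for illustration, and the helpers only handle well-formed passage URNs with five colon-delimited parts.

```scala
object UrnRelationSketch {
  // Stand-in for CtsUrn.collapsePassageBy(1): drop the last level of the
  // passage component (the fifth colon-delimited part).
  def collapsePassageByOne(urn: String): String = {
    val parts = urn.split(":")
    val passage = parts(4).split("\\.").dropRight(1).mkString(".")
    (parts.take(4) :+ passage).mkString(":")
  }

  // Stand-in for CtsUrn.addExemplar: append an exemplar identifier to the
  // work component (the fourth colon-delimited part).
  def addExemplar(urn: String, exemplar: String): String = {
    val parts = urn.split(":")
    val work = parts(3) + "." + exemplar
    (parts.take(3) ++ Array(work) ++ parts.drop(4)).mkString(":")
  }
}
```

Under these assumptions, an edition URN such as urn:cts:greekLit:tlg0012.tlg001.msA.tokens:1.1.1 collapses to the same string that addExemplar yields for the text node urn:cts:greekLit:tlg0012.tlg001.msA:1.1 with the exemplar "tokens".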

Value Members

  1. val analyticalCollections: Map[String, Cite2Urn]
  2. def codeptList(s: String, idx: Int = 0, codepoints: List[Int] = Nil): List[Int]

    Recursively get list of code points for a String.

    s

    String to get codepoints for.

    idx

    Index of codepoint to start from.

    codepoints

    List of codepoints seen so far.

  3. val collectionId: String
  4. val exemplarLabels: Map[String, ReadingConfig]
  5. def hmtNormalize(s: String): String

    Recursively collect contents of all text-node descendants of a given node.

    returns

    A single String with all text from n.

    def collectText(n: xml.Node, s: String): String = {
      var buff = StringBuilder.newBuilder
      buff.append(s)
      n match {
        case t: xml.Text => {
          buff.append(t.text)
        }
        case e: xml.Elem => {
          for (ch <- e.child) {
            buff = new StringBuilder(collectText(ch, buff.toString))
          }
        }
      }
      buff.toString
    }

    def collectText(n: xml.Node): String = {
      collectText(n, "")
    }

  6. val punctuation: Vector[String]
  7. val validElements: Vector[String]
  8. val versionId: String
  9. object AllScribalReading extends HmtTeiTier with Product with Serializable

    All elements belonging to the third tier of HMT markup.

  10. object AlternateReading extends Serializable

    string formatting function

  11. object Citation extends DiscourseCategory with Product with Serializable
  12. object Clear extends EditorialStatus with Product with Serializable

    Paleographically unambiguous reading.

  13. object Correction extends AlternateCategory with Product with Serializable

    Scribal correction of text indicated in HMT XML with sic/corr pair.

  14. object Deletion extends AlternateCategory with Product with Serializable

    Scribal deletion of text indicated by HMT del.

  15. object DiplomaticEditionFactory

    Factory to build a diplomatic edition from a Vector of TokenAnalysis objects.

  16. object DiplomaticReader extends MidMarkupReader with LogSupport
  17. object DirectVoice extends DiscourseCategory with Product with Serializable

    token in direct voice of text

  18. object DisambiguatingElements extends HmtTeiTier with Product with Serializable

    Fourth-lowest tier: elements disambiguating named entities.

  19. object DiscourseAnalysis extends HmtTeiTier with Product with Serializable

  20. object EditorReading extends HmtTeiTier with Product with Serializable

    Lowest tier: HMT editor's ability to read the text. Should contain no child elements, only text nodes. The default state is a "clear" reading, which requires no markup to indicate it.

  21. object EditoriallyNormalizedReader extends MidMarkupReader
  22. object Foreign extends AttributeRequirement with Product with Serializable

    Attribute requirements for TEI 'foreign'.

  23. object HmtChars

    Definitions of allowed characters in HMT editions.

  24. object HmtDiplomaticEdition extends MidEditionType with Product with Serializable

    case object HmtNamedEntityEdition extends MidEditionType {
      def label = "named entities"
      def description = "Tokenization all tokens to Option[Cite2Urn] of named entities"
      def versionId = "hmt_ne"
    }

  25. object HmtEditorsNormalizedEdition extends MidEditionType with Product with Serializable
  26. object HmtLacuna extends LexicalCategory with Product with Serializable

    Token no longer extant.

  27. object HmtLexicalToken extends LexicalCategory with Product with Serializable

    parseable lexical token

  28. object HmtLiteralToken extends LexicalCategory with Product with Serializable

    quoted literal string not parseable as a lexical token

  29. object HmtNumericToken extends LexicalCategory with Product with Serializable

    token in Milesian numeric notation

  30. object HmtPunctuationToken extends LexicalCategory with Product with Serializable

    single punctuation character

  31. object HmtScribalNormalizedEdition extends MidEditionType with Product with Serializable
  32. object HmtTeiAttributes extends LogSupport

    Singleton object for validating an XML element's compliance with HMT project requirements on usage of XML attributes.

  33. object HmtTeiChoice extends LogSupport
  34. object HmtTeiCit extends LogSupport
  35. object HmtTeiElements

    Static definition of TEI elements allowed in HMT editions, and specification of their permitted hierarchical relations.

  36. object HmtToken extends Serializable

    Factory for labelling information about tokens.

  37. object HmtUnintelligibleToken extends LexicalCategory with Product with Serializable

    token not parseable due to error in HMT edition

  38. object InvalidToken extends EditorialStatus with Product with Serializable

    Reading cannot be determined because source XML does not comply with HMT project requirements.

  39. object Missing extends EditorialStatus with Product with Serializable

    Lacuna.

  40. object Multiform extends AlternateCategory with Product with Serializable

    Alternate reading offered by scribe indicated in HMT XML with add.

  41. object Num extends AttributeRequirement with Product with Serializable

    Attribute requirements for TEI 'num'.

  42. object PairedScribalReading extends HmtTeiTier with Product with Serializable

    Third-lowest tier: elements that can be used within a grouping TEI choice element. These include further categories of scribal modifications, plus editorial expansion of abbreviations.

  43. object PersName extends AttributeRequirement with Product with Serializable

    Attribute requirements for TEI 'persName'.

  44. object PlaceName extends AttributeRequirement with Product with Serializable

    Attribute requirements for TEI 'placeName'.

  45. object QuotedLanguage extends DiscourseCategory with Product with Serializable

    quoted word in the natural language of text

  46. object QuotedLiteral extends DiscourseCategory with Product with Serializable

    quoted string of characters not forming a valid lexical entity

  47. object QuotedText extends DiscourseCategory with Product with Serializable

    token in quotation of another text

  48. object Reading extends Serializable

    Companion object for formatting Vectors of Readings as Strings.

  49. object Restoration extends AlternateCategory with Product with Serializable

    Restored by modern editor, indicated in HMT XML with expan/abbr pair.

  50. object Restored extends EditorialStatus with Product with Serializable

    Reading supplied by modern editor.

    Applies only to editorial expansion of abbreviations.

  51. object Rs extends AttributeRequirement with Product with Serializable

    Attribute requirements for TEI 'rs'.

  52. object ScholiaOrthography extends MidOrthography with LogSupport

  53. object ScribalReading extends HmtTeiTier with Product with Serializable

    Third-lowest tier: elements that can be used independently to identify scribal modifications.

  54. object ScriballyNormalizedReader extends MidMarkupReader
  55. object Sic extends EditorialStatus with Product with Serializable
  56. object TeiReader extends LogSupport

    Object for parsing TEI XML into the HMT project object model of an edition.

  57. object TextDeformation extends Serializable

  58. object Title extends AttributeRequirement with Product with Serializable

    Attribute requirements for TEI 'title'.

  59. object TokenizingElements extends HmtTeiTier with Product with Serializable

    Second-lowest tier: elements grouping contents into tokens of a particular type. The default token types are white-space delimited lexical tokens and punctuation tokens identified by individual character value. The TEI w element is used to group lexical tokens that are broken into separate nodes by other markup.

  60. object Unclear extends EditorialStatus with Product with Serializable

    Paleographically ambiguous reading.
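
The category traits in this package (AlternateCategory, DiscourseCategory, EditorialStatus, LexicalCategory) are sealed and enumerated by case objects such as Clear and Unclear. The sketch below shows that pattern in isolation with simplified stand-ins: the object StatusSketch, the string values of name, and the helpers isSecure and summarize are assumptions for illustration, not the library's definitions.

```scala
object StatusSketch {
  // Stand-in for a sealed category trait with a required `name` member.
  sealed trait Status { def name: String }
  case object Clear extends Status { val name = "clear" }
  case object Unclear extends Status { val name = "unclear" }
  case object Restored extends Status { val name = "restored" }

  // Stand-in for a reading typed by its status.
  final case class Reading(text: String, status: Status)

  // Because Status is sealed, the compiler can check that this match
  // covers every case object.
  def isSecure(status: Status): Boolean = status match {
    case Clear              => true
    case Unclear | Restored => false
  }

  // Label each reading with the English name of its status.
  def summarize(readings: Vector[Reading]): String =
    readings.map(r => s"${r.text} (${r.status.name})").mkString(" + ")
}
```

Sealing the trait keeps the set of categories closed to the defining file, which is what lets pattern matches over a category be checked for completeness.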

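
The recursive code point listing described for codeptList can be sketched with the standard String.codePointAt and Character.charCount methods. This is a minimal illustration that mirrors the documented parameters, not the library's source; the object name CodePointSketch and the method name codePoints are assumptions.

```scala
import scala.annotation.tailrec

object CodePointSketch {
  // Walk the string one code point at a time, accumulating in reverse
  // and restoring order at the end. Advancing by Character.charCount
  // keeps surrogate pairs together as a single code point.
  @tailrec
  def codePoints(s: String, idx: Int = 0, acc: List[Int] = Nil): List[Int] =
    if (idx >= s.length) acc.reverse
    else {
      val cp = s.codePointAt(idx)
      codePoints(s, idx + Character.charCount(cp), cp :: acc)
    }
}
```

Indexing by code point rather than by Char matters for text outside the Basic Multilingual Plane, where a single character occupies two Char values in a JVM String.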