The TextRiver Project - Development

Architecture

The components, shown below, are all open source. Tomcat is the web server. Hibernate provides access to a SQL database. Spring gives us a page framework. Cocoon is a powerful transformation-based data pipeline. Subversion stores files with multiple versions and logs activity. JavaScript provides client-side scripting for visual effects such as zooming graphics. The business logic lives in our application, TextRiver, which is written in Java.

[Figure: architecture]

Data

The data (whatever determines the final output and how it is displayed) is the content of the document plus external files. Content is the flow of text that people read, but pictures, icons, sound files, and other objects are embedded in it as well. Regions of text may be highlighted, and markers may be inserted for links, pop-up effects, and page breaks. How to represent this auxiliary information is the problem this application has to solve.

Some terminology:

Compositor: A web application for managing flows, resources, and markers.
Document: A complete package of content flows plus data.
Flow: A contiguous piece of narrative text, marked up in XML, with embedded markers.
Frame: An XML element embedded inside an XHTML file (see Page below) which defines the geometry of a text flow and references the next frame in the sequence.
Marker: A singleton XML element that denotes a specific location within a flow, and may include information to associate it with some external resource.
Page: An XHTML file that determines the layout of a rendered HTML page. It contains references to flows, which are expanded within frames.
Resource: A data file that is referenced by a marker. This can be binary, such as a graphic, or XML. In the latter case, there will be predefined types of XML resources that the app will know how to use, and they may reference other resources (e.g. an image description file that references the binary image file itself, or even several versions of the image).
Template: Describes how to transform a data resource file into displayable HTML.
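
To make the Flow and Marker terms a little more concrete, here is a minimal sketch of how a flow could be scanned for its markers in Java. The markup is not defined anywhere in these notes, so the `marker` element name, its `type` and `resource` attributes, and the `FlowMarkerScan` class are all assumptions made for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlowMarkerScan {
    // Assumed markup: <marker type="..." resource="..."/>. This is a
    // hypothetical schema, not TextRiver's actual one.
    static List<String> markerTypes(String flowXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(flowXml)));
            // getElementsByTagName returns matches in document order.
            NodeList markers = doc.getElementsByTagName("marker");
            List<String> types = new ArrayList<>();
            for (int i = 0; i < markers.getLength(); i++) {
                Element marker = (Element) markers.item(i);
                types.add(marker.getAttribute("type"));
            }
            return types;
        } catch (Exception e) {
            throw new IllegalArgumentException("flow is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String flow = "<flow><p>Read on<marker type=\"pagebreak\"/> for more"
                + "<marker type=\"link-icon\" resource=\"Images/arrow.xml\"/>.</p></flow>";
        System.out.println(markerTypes(flow));
    }
}
```

Because each marker is a single empty element, the narrative text around it stays intact and can still be read or edited as ordinary XML content.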

The data will be stored in files on the server, with directories separating them by function. The Subversion version control system will track changes and allow for logging and backtracking.

This is the structure for a typical document:

River Root
|
+--Groups
  |
  ...
|
+--People
  |
  +--Sarah Bellum
    |
    +--Newsletter
      |
      +--2008
        |
        +--Issue 1
          |
          +--Content
            |
            +--Main
            |
            +--Copyright
            |
            +--Article 1
            |
            +--Article 2
          |
          +--Metadata
            |
            +--Attributes.xml
            |
            +--Pages
              |
              +--Main
              |
              +--Feedback
            |
            +--Scripts
              |
              ...
            |
            +--Styles
              |
              +--Main.css
            |
            +--Templates
              |
              ...
          |
          +--Resources
            |
            +--Data
              |
              ...
            |
            +--Images
              |
              ...

This example shows the use of directories as an organizational convenience to the user. All content and data are stored under "River Root", which has two branches: "Groups" for shared spaces and "People" for individual users. This is not the final word on whether a project is shared, as there will be a separate workgroup management system; it's just a convenient place to store files initially and find them later.

Within a personal or group directory, users can define further hierarchical divisions as they like. Again, we should not think of this as the only way to organize and search for files. It is convenient to use the server's own file system, but we can easily add a layer that sorts by metadata keys.

It should be clear from this example that "Issue 1" is a document directory. Its folder hierarchy is controlled by the application: "Content" holds all the flow files, "Metadata" holds files that describe the document, and "Resources" holds the resources referenced by markers in flows.
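
Since the application controls the layout of a document directory, creating a new document amounts to creating a fixed set of subdirectories. The following is a rough sketch of that step using `java.nio.file`. The `DocumentSkeleton` class and its `create` method are hypothetical names, and the list of subdirectories is simply copied from the example tree above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class DocumentSkeleton {
    // Subdirectories of a document directory, taken from the example tree.
    static final List<String> SUBDIRS = List.of(
            "Content",
            "Metadata/Pages",
            "Metadata/Scripts",
            "Metadata/Styles",
            "Metadata/Templates",
            "Resources/Data",
            "Resources/Images");

    // Creates the document directory (and any missing parents) under 'parent'.
    static Path create(Path parent, String documentName) {
        Path doc = parent.resolve(documentName);
        try {
            for (String sub : SUBDIRS) {
                Files.createDirectories(doc.resolve(sub));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return doc;
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for "River Root" here.
        Path root = Files.createTempDirectory("river-root");
        Path issue = create(root.resolve("People/Sarah Bellum/Newsletter/2008"), "Issue 1");
        System.out.println(Files.isDirectory(issue.resolve("Metadata/Templates")));
    }
}
```

Everything above the document directory is left to the user, so the sketch takes the parent path as a parameter rather than assuming any particular hierarchy.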

The "Templates" directory under "Metadata" is for a special kind of resource that is user-defined. The template is actually a pair of files: one in XML format, which defines which data will be collected; and one in XSLT format, which defines how data will be output as HTML.
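
The XSLT half of a template can be tried out with the JDK's built-in `javax.xml.transform` API. This is only a sketch: the `TemplateRender` class, the `quote` data format, and the sample stylesheet are invented for illustration, and the XML half of the template (which defines the data to collect) is not covered.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateRender {
    // Applies an XSLT stylesheet to a data resource and returns the result as text.
    static String render(String dataXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(dataXml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("template could not be applied", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical data resource and stylesheet for a pop-up.
        String data = "<quote><text>Hello from a pop-up</text></quote>";
        String xslt =
                "<xsl:stylesheet version=\"1.0\""
              + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
              + "<xsl:output method=\"html\"/>"
              + "<xsl:template match=\"/quote\">"
              + "<div class=\"popup\"><xsl:value-of select=\"text\"/></div>"
              + "</xsl:template>"
              + "</xsl:stylesheet>";
        System.out.println(render(data, xslt));
    }
}
```

In the architecture described earlier, this kind of transformation would more likely run inside a Cocoon pipeline; the direct call is just the shortest way to show the idea.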

Marker Types:

Name        Type              Purpose*
Link Icon   Point             Link that jumps to another page when an icon is clicked.
Link Text   Range             Link that jumps to another page when text is clicked.
Pagebreak   Standalone Point  Denotes where a page should end.
Popup Text  Processed Point   A text window that displays user-defined data.
Script      Point             Trigger for an action in JavaScript.

"Standalone" means no resource is referenced. "Point" means the marker is in isolation (how sad), as opposed to "Range" which implies two markers (start and end). The "Processed" qualifier says that the resource file will be processed with a user-defined XSLT file.
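
One way the marker types and their qualifiers might be modeled in the Java application is as an enum. This is a sketch under assumptions: the names `MarkerTypes`, `MarkerType`, and `Extent` are invented here, and treating every non-standalone marker as one that references a resource is an inference from the definition of "Standalone", not something these notes state outright.

```java
class MarkerTypes {
    // Whether a marker stands alone in the flow or comes as a start/end pair.
    enum Extent { POINT, RANGE }

    enum MarkerType {
        LINK_ICON(Extent.POINT, false, false),
        LINK_TEXT(Extent.RANGE, false, false),
        PAGEBREAK(Extent.POINT, true, false),
        POPUP_TEXT(Extent.POINT, false, true),
        SCRIPT(Extent.POINT, false, false);

        final Extent extent;
        final boolean standalone; // no resource is referenced
        final boolean processed;  // resource goes through a user-defined XSLT file

        MarkerType(Extent extent, boolean standalone, boolean processed) {
            this.extent = extent;
            this.standalone = standalone;
            this.processed = processed;
        }

        // Assumption: any marker that is not standalone references a resource.
        boolean referencesResource() {
            return !standalone;
        }

        // A range is delimited by two markers (start and end); a point is one.
        int markersInFlow() {
            return extent == Extent.RANGE ? 2 : 1;
        }
    }

    public static void main(String[] args) {
        for (MarkerType type : MarkerType.values()) {
            System.out.println(type + " " + type.extent
                    + " markers=" + type.markersInFlow()
                    + " resource=" + type.referencesResource());
        }
    }
}
```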

User Interfaces

This is what we need:

Document Manager: Create, display, navigate, rename, and move directories. Create, copy, and rename documents. Edit metadata.
Flow Editor: Import text, edit text, and insert, edit, or move markers.
Page Layout: Create pages, create frames within pages, and attach flows to frames.
People/Workgroup Manager: Add and edit users, create groups, roster members, and grant privileges.
Resource Editor: Import data files and edit text.
Template Editor: Edit XSLT, and add, edit, or move fields.
© Copyright 2007 by Erik T. Ray.
$Id: dev.html,v 1.1 2007/08/30 23:28:36 eray Exp $