20–22 May 2015
Europe/Ljubljana timezone

Document System Architecture

Description

Document management systems were originally intended only for storing documents and providing rapid access to them. With the development of information technology, they evolved into a complex architecture of closely related subsystems. The standard architecture of a document system includes modules for:

• storing documents in optimized databases
• processing and analysis of text and content
• metadata tagging and linking
• querying, search and retrieval
• visualization and representation
• access security
• collaboration and workflow processing
• communication channels
• others

The core of the document management system architecture is a system for permanent storage of documents. Storage systems differ in how they store data, that is, in their structural organization and storage technology, and this affects the speed of storing and retrieving documents. The system can be built on one of the standard storage options, such as a file system, a relational database, a graph (network) database, a so-called NoSQL database, or a combination of different storages.

Relational database storage

Relational databases are designed on the principles of relational technology. Collections are made up of tables that are interconnected through relations. A relationship between tables can be one-to-one, one-to-many or many-to-many. On the basis of these relationships and constraints, a language for querying relational databases was built: SQL (Structured Query Language), which is very effective for querying structured data.

File system storage

A file system is a method for storing sets of related data. A stream of data is called a file, and each file carries a few basic metadata attributes, such as its size, creation date, file name and file extension. To make a large number of files easier to organize, directories are used. Directories are virtual folders into which files are sorted; since directories can be nested, files can also be organized hierarchically.
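A minimal sketch of the "combination of different storages" mentioned above, assuming Python's standard `sqlite3`, `pathlib` and `tempfile` modules: document bodies are written into a hierarchical directory tree, while basic file metadata (size, extension) and tags go into relational tables linked one-to-many and many-to-many. The schema, the directory layout and the function names `store_document` and `documents_with_tag` are hypothetical illustrations, not the authors' design.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: file metadata plus a many-to-many tag relation.
SCHEMA = """
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE,
    size INTEGER NOT NULL,
    extension TEXT NOT NULL
);
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- A document has many tags, and a tag labels many documents.
CREATE TABLE document_tags (
    document_id INTEGER REFERENCES documents(id),
    tag_id INTEGER REFERENCES tags(id),
    PRIMARY KEY (document_id, tag_id)
);
"""


def store_document(root: Path, db: sqlite3.Connection, rel_path: str,
                   content: bytes, tags: list[str]) -> int:
    """Write the body into the directory tree, then index its metadata."""
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)  # nested directories
    target.write_bytes(content)
    stat = target.stat()  # basic file-system metadata (size, times, ...)
    cur = db.execute(
        "INSERT INTO documents (path, size, extension) VALUES (?, ?, ?)",
        (rel_path, stat.st_size, target.suffix),
    )
    doc_id = cur.lastrowid
    for tag in tags:
        db.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag,))
        (tag_id,) = db.execute(
            "SELECT id FROM tags WHERE name = ?", (tag,)).fetchone()
        db.execute("INSERT INTO document_tags VALUES (?, ?)", (doc_id, tag_id))
    db.commit()
    return doc_id


def documents_with_tag(db: sqlite3.Connection, tag: str) -> list[str]:
    """Answer a structured query by joining across the relation tables."""
    rows = db.execute(
        """SELECT d.path FROM documents d
           JOIN document_tags dt ON dt.document_id = d.id
           JOIN tags t ON t.id = dt.tag_id
           WHERE t.name = ? ORDER BY d.path""",
        (tag,),
    )
    return [path for (path,) in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        db = sqlite3.connect(":memory:")
        db.executescript(SCHEMA)
        store_document(root, db, "2015/contracts/a.txt", b"Contract A",
                       ["contract", "2015"])
        store_document(root, db, "2015/reports/q1.csv", b"q,value\n1,10\n",
                       ["report", "2015"])
        # prints ['2015/contracts/a.txt', '2015/reports/q1.csv']
        print(documents_with_tag(db, "2015"))
        db.close()
```

Keeping bodies on disk and metadata in tables lets an SQL join answer a question such as "which documents carry this tag", which the directory tree alone cannot answer.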
File systems enable fast storing and reading of data, but by themselves they do not provide access to structured information. Each file type can have its own internal structure, and the type is typically identified by the extension of the file name.

Graph storage

Graph storages organize data as meshes (graphs), in which data items are joined by a network of connections. The organization of data is no longer two-dimensional, as in tables, but multidimensional. Graph systems support queries over graph structures, such as finding the shortest path between two nodes in a network or looking up groups in the network. Most real systems can be modeled as a network, including the links of a document to other documents and information.

NoSQL storage

NoSQL systems do not use strict relational logic; they are designed to store semi-structured data. Data structures can also be nested, which makes them flexible to use. Because the data organization is less strict, retrieval techniques are less flexible than in relational databases. NoSQL systems focus on simple re/organization of structure and fast data retrieval, and their indexing structures are usually simple indexes.

Other storage models

Other models include XML storage (direct storage of XML documents), distributed storage, object-relational storage (combined storage), triplestores (databases of subject-predicate-object triples), named graphs (graphs of RDF documents), ...

The demonstration will present different techniques of document storage and the advantages and disadvantages of each technique.
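The graph and NoSQL models described above can be contrasted in a small sketch, assuming plain Python dictionaries stand in for a real store. Each document is a nested, semi-structured record (fields differ between records, as in a NoSQL store), a flat tag index provides fast lookup, and the `links` field is read as a network in which breadth-first search finds the shortest path between two documents. The record layout, the `links` field and all function names are hypothetical; a graph database would run such path queries natively.

```python
from collections import deque

# Hypothetical NoSQL-style records: nested and semi-structured
# ("d4" has no "meta" field at all).
documents = {
    "d1": {"title": "Contract A", "meta": {"tags": ["contract"]}, "links": ["d2"]},
    "d2": {"title": "Annex to A", "meta": {"tags": ["annex"], "pages": 3}, "links": ["d3"]},
    "d3": {"title": "Invoice 17", "meta": {"tags": ["invoice"]}, "links": []},
    "d4": {"title": "Report Q1", "links": ["d3"]},
}


def build_tag_index(docs):
    """Flat inverted index from tag to document ids."""
    index = {}
    for doc_id, doc in docs.items():
        for tag in doc.get("meta", {}).get("tags", []):
            index.setdefault(tag, []).append(doc_id)
    return index


def build_graph(docs):
    """Graph view: documents are nodes, links are undirected edges."""
    graph = {doc_id: set() for doc_id in docs}
    for doc_id, doc in docs.items():
        for target in doc.get("links", []):
            graph[doc_id].add(target)
            graph[target].add(doc_id)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: fewest links between two nodes, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start node.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    print(build_tag_index(documents)["annex"])                # ['d2']
    print(shortest_path(build_graph(documents), "d1", "d4"))  # ['d1', 'd2', 'd3', 'd4']
```

Breadth-first search is used because the links are unweighted: the first time the goal is reached, the path to it has the fewest links.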

Primary authors

Mr Gregor Ibic (University of Primorska) Dr Iztok Savnik (University of Primorska)