Data modeling is about understanding and representing how things (in the real world or in a computer) relate to each other within a particular domain. We have already explored. Data models (and entities) may already be established for a particular domain or subdomain, or may need to be derived from scratch.
Think about the entities and their relationships in the life sciences. What are the entities? How do they relate to each other?
Modeling structured data
allows specification of
and how they relate to each other. Information can be modeled at three levels, each called a schema:
- Conceptual schema: entities and their relationships in a particular domain (e.g. biology)
- Logical schema: implements the conceptual schema with a particular computer representation (e.g. XML, relational database)
- Physical schema: representation at the hardware level (e.g. how bits and bytes are stored, devices, etc.)
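The difference between the first two levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Gene` and `Protein` entities, the `encoded_by` relationship, and the table layout are invented for this example and are not taken from any established life-science model. The dataclasses stand in for the conceptual level, the SQL statements for one possible logical level, and the physical level is left entirely to the SQLite engine.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: entities and their relationship, independent of storage.
# Gene and Protein are hypothetical example entities.
@dataclass
class Gene:
    symbol: str
    organism: str

@dataclass
class Protein:
    name: str
    encoded_by: str  # relationship: a Protein is encoded by a Gene (by symbol)

# Logical level: the same model expressed as relational tables.
LOGICAL_SCHEMA = """
CREATE TABLE gene (
    symbol   TEXT PRIMARY KEY,
    organism TEXT NOT NULL
);
CREATE TABLE protein (
    name       TEXT PRIMARY KEY,
    encoded_by TEXT NOT NULL REFERENCES gene(symbol)
);
"""

def build_database(genes, proteins):
    """Load conceptual-level objects into an in-memory relational database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(LOGICAL_SCHEMA)
    conn.executemany("INSERT INTO gene VALUES (?, ?)",
                     [(g.symbol, g.organism) for g in genes])
    conn.executemany("INSERT INTO protein VALUES (?, ?)",
                     [(p.name, p.encoded_by) for p in proteins])
    return conn

def proteins_with_organism(conn):
    """Follow the 'encoded by' relationship with a join."""
    query = """
        SELECT protein.name, gene.symbol, gene.organism
        FROM protein JOIN gene ON protein.encoded_by = gene.symbol
        ORDER BY protein.name
    """
    return conn.execute(query).fetchall()

conn = build_database(
    [Gene("INS", "Homo sapiens")],
    [Protein("insulin", "INS")],
)
rows = proteins_with_organism(conn)
```

The same conceptual model could just as well be implemented as XML or some other representation; only the logical level would change.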
Traditionally, one builds a conceptual (and subsequent logical) schema through the process of Systems Analysis, producing graphical models along the way, particularly:
Entity-Relationship (ER) Diagram
Data Flow Diagram (see Figure 9.1 and more details on this page)
These formal techniques tend to be used only in the world of databases. However, their kin are used in many other methods. For example,
emphasizes the understanding of the full picture of a work environment in a particular domain, and from observation sessions one can create:
Physical Model (see an example related to designing software for dentists)
Data Flow Model
At the logical schema level, the two most dominant forms are the relational database (Oracle, MySQL, etc.) and, increasingly, the semantic database (triple stores, RDF, OWL).
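To make the contrast concrete, here is a minimal sketch of the same facts held as a relational row and as subject-predicate-object triples, the shape of data that a triple store holds. It uses plain Python tuples rather than a real RDF library, and the predicate names (`encodedBy`, `foundIn`, `type`) and the `match` helper are made up for this example.

```python
# A relational row: the column positions carry the meaning.
row = ("insulin", "INS", "Homo sapiens")  # (protein, gene, organism)

# The same facts as subject-predicate-object triples.
# The predicate names are invented for this example.
triples = [
    ("insulin", "encodedBy", "INS"),
    ("INS", "foundIn", "Homo sapiens"),
    ("insulin", "type", "Protein"),
    ("INS", "type", "Gene"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

def organism_of_protein(triples, protein):
    """Chain two patterns: protein -> gene -> organism."""
    organisms = []
    for _, _, gene in match(triples, subject=protein, predicate="encodedBy"):
        for _, _, organism in match(triples, subject=gene, predicate="foundIn"):
            organisms.append(organism)
    return organisms

genes = match(triples, predicate="type", obj="Gene")
result = organism_of_protein(triples, "insulin")
```

In the relational form, adding a new kind of fact means changing the table definition; in the triple form it means adding another tuple with a new predicate. That flexibility is one commonly cited reason for interest in the semantic approach.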
When complex entities are represented in a computer program, they are called
Modeling unstructured data
Nowadays, we can think of the world of information from two perspectives: structured and unstructured. The structured world includes formal models, databases, XML, etc. It is the traditional world of information systems. The unstructured world includes tagging, folksonomies, Web 2.0, natural language processing, data mining, and so on. Until recently, creating structured data was a prerequisite for using information systems; indeed, much of database and systems analysis theory was about identifying and organizing structured data from an unstructured world. In summary:
For the most part, information systems have grown up around structured data and structured systems. The structured environment is made up of data that has fields, columns, tables, rows and indexes. It centers around transactions and has reports, audits and definitions of words. There is a high degree of predictability and order associated with the structured environment.
The unstructured environment is very different from the structured environment. The unstructured environment has no particular order to it. It consists of text found in medical reports, warranties, contracts, e-mail and spreadsheets. The text has no rules governing its creation or usage. With text, there are no keys, no indexes, no columns or attributes. Text is free-form and is as disorderly as structured data is orderly.
We need to be able to handle both kinds of data. Real problems are rarely simple enough to involve only one of them.
Documents, spreadsheets, etc
We'll take an example from this RSC journal article, marked up as part of
Text mining is done by our local company
We did some work
Interesting trend - small pieces of structured data derived from unstructured data - see e.g.
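As a rough illustration of that trend (not one of the linked examples), the sketch below pulls two small structured facts out of a free-text sentence using regular expressions. The sentence, the field names `yield_percent` and `melting_point_c`, and the patterns are all invented for this example; real text mining relies on far more robust techniques than two hand-written patterns.

```python
import re

# Hypothetical free-text sentence, invented for this example.
text = "The product was obtained in 82% yield and melted at 156 °C."

# Patterns that each capture one number from a fixed phrasing.
YIELD_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*yield")
MELTING_PATTERN = re.compile(r"melted at\s*(\d+(?:\.\d+)?)\s*°C")

def extract_facts(sentence):
    """Return a dict of whichever facts the patterns find in the sentence."""
    facts = {}
    yield_match = YIELD_PATTERN.search(sentence)
    if yield_match:
        facts["yield_percent"] = float(yield_match.group(1))
    melting_match = MELTING_PATTERN.search(sentence)
    if melting_match:
        facts["melting_point_c"] = float(melting_match.group(1))
    return facts

facts = extract_facts(text)
```

The resulting dictionary has keys and typed values, so it can be stored, indexed, and queried like any other structured record, even though it started life as prose.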