Whenever we wish to apply informatics to a particular domain, the first thing we have to think about is how to represent the entities of that domain (at a simple level this applies to dates, images, videos, and so on). Once we can represent the entities, we can develop techniques and algorithms to help solve problems using the representations. The whole package of representations and algorithms constitutes a particular application branch of informatics (or a set of branches).
But... Life is INCREDIBLY complex! If in doubt, watch this, and remember that this is just our current model of a very small piece of the puzzle...
So what do we do? Well, the systems of life are very, very complex (and still poorly understood), but life seems to be quite efficient in that the complex systems are made up of a small number of kinds of things, such as proteins, molecules, DNA and so on; and if we can work out how to represent these basic entities we can start to piece them together into more complex entities. But STILL we are in the infancy of this subject: for instance, a single cell is far too complicated for us to fully understand or represent, except incompletely or at high levels of abstraction. We are discovering new things all the time.
Yet, even with these basic representations, we can do some amazing science on computers that wouldn't be possible without them.
Some things to bear in mind:
Our understanding of the operation of living things has advanced immensely in the last few decades, but the more we find out the more we realize there is to discover in the future
Most medical research is wrong
When we are representing life science entities, we are generally representing a model of the thing, not the thing itself (e.g. a 3D chemical structure is a model of a "real" molecule)
Informatics domains are built on a set of representations, plus algorithms that can be applied to those representations (e.g. bioinformatics = proteins, DNA, RNA, etc.; cheminformatics = 2D and 3D chemical structures; genomics = genes)
Here are some things we can do:
Represent chemical structures (atoms, bonds), proteins (atoms, bonds, amino acids), and DNA (codons / base pairs)
Represent biological pathways involving these entities
Store and search all of the above in databases
Store and search (ish) scientific publications
Store and search biomedical information - epidemiology, Electronic Medical Records, etc
Make predictions - activities of chemical compounds, protein function, protein structure, disease-gene associations, etc, etc.
Increasingly - map these to individual people (personalized medicine)
So let's take a look at some of those entities
Chemical structures / small molecules
And let's look at how they are represented.
Remember from our first class, the 2D chemical structure for Aspirin?
Now look again... does the construct look familiar? What is this mathematically?
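One common way to hold a 2D structure like this in a program is as a graph: atoms are the nodes and bonds are the edges. The sketch below is a minimal illustration for aspirin, not a real cheminformatics toolkit. The atom numbering, the names `ATOMS`, `BONDS`, `neighbours`, `implicit_hydrogens` and `molecular_formula`, and the choice to write the ring with alternating single and double bonds are all assumptions made for this example.

```python
from collections import Counter

# Heavy (non-hydrogen) atoms of aspirin; the index in this list is the atom id.
# The numbering is arbitrary and chosen only for this example.
ATOMS = ["C", "C", "O", "O", "C", "C", "C", "C", "C", "C", "C", "O", "O"]

# Bonds as (atom_id, atom_id, bond_order). The benzene ring (atoms 4..9) is
# written with alternating double and single bonds.
BONDS = [
    (0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1),                    # acetyl ester
    (4, 5, 2), (5, 6, 1), (6, 7, 2), (7, 8, 1), (8, 9, 2), (9, 4, 1),  # ring
    (9, 10, 1), (10, 11, 2), (10, 12, 1),                          # carboxylic acid
]

# Usual valences, used to fill in the hydrogens a 2D drawing leaves implicit.
VALENCE = {"C": 4, "O": 2}


def neighbours(atom_id):
    """Return the sorted ids of atoms bonded to atom_id (graph adjacency)."""
    linked = set()
    for a, b, _order in BONDS:
        if a == atom_id:
            linked.add(b)
        elif b == atom_id:
            linked.add(a)
    return sorted(linked)


def implicit_hydrogens():
    """Return, per atom, how many hydrogens are needed to reach its valence."""
    bond_order_sum = [0] * len(ATOMS)
    for a, b, order in BONDS:
        bond_order_sum[a] += order
        bond_order_sum[b] += order
    return [VALENCE[el] - used for el, used in zip(ATOMS, bond_order_sum)]


def molecular_formula():
    """Build a formula string with C first, H second, then other elements."""
    counts = Counter(ATOMS)
    counts["H"] = sum(implicit_hydrogens())
    order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in order if counts[e])


print(molecular_formula())   # C9H8O4
print(neighbours(9))         # ring carbon that carries the acid group
```

Once the structure is a graph, questions such as "which atoms are attached to this one?" or "what is the formula?" become ordinary graph and counting operations.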
Now let's look at a 3D structure...
Proteins and polypeptides
Proteins are really just big molecules, but they are made up of repeating units (encoded by DNA) called amino acids (or residues). In human beings and animals, there are 20 standard amino acid units. This means that as well as thinking of proteins in terms of atoms and bonds, we can consider the order of amino acids along the chain (often called a "sequence").
Nature has been kind to us: the protein sequence is very computer-friendly, as it is really a "language" with 20 words, and a protein at this level is a string of these 20 words. We even already have a standard one-letter code for each amino acid. For example, Tat, a protein involved in HIV, has a primary sequence that can be written out as a single string of these letters.
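As a small illustration of treating a protein as a string over a 20-letter alphabet, the sketch below checks a sequence against the standard one-letter amino acid codes and expands it to three-letter names. Note that the short peptide used here is a made-up toy string, not the Tat sequence, and the function names are assumptions for this example.

```python
from collections import Counter

# Standard one-letter and three-letter codes for the 20 common amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence):
    """Expand a one-letter protein sequence, rejecting unknown letters."""
    unknown = sorted(set(sequence) - set(THREE_LETTER))
    if unknown:
        raise ValueError(f"not a standard amino acid code: {unknown}")
    return "-".join(THREE_LETTER[residue] for residue in sequence)


def composition(sequence):
    """Count how often each residue occurs in the sequence."""
    return Counter(sequence)


toy_peptide = "MKTAYK"  # hypothetical peptide, for illustration only
print(to_three_letter(toy_peptide))   # Met-Lys-Thr-Ala-Tyr-Lys
print(composition(toy_peptide)["K"])  # 2
```

Because the sequence is just text, everything a programming language offers for strings (slicing, counting, searching) is available straight away.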
DNA & RNA
DNA gets even easier: it's a language with 4 letters in a string (ACTG)
RNA uses (almost) the same encoding system, with U in place of T
So we can simply represent a DNA sequence as a string of text, or in small numbers of binary bits (how many do we need?)
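For reference, four symbols can be told apart with two bits each. The sketch below packs a DNA string into a single integer at two bits per base and unpacks it again, and also shows the T to U swap for the RNA alphabet. The particular bit assignment (A=00, C=01, G=10, T=11) and the function names are assumptions for this example; any fixed assignment would work, and the test string is an arbitrary toy sequence.

```python
# One possible 2-bit code per base; the assignment itself is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(dna):
    """Pack a DNA string into an integer, two bits per base.

    The length is returned as well, because leading 'A' bases are all-zero
    bits and would otherwise be lost.
    """
    value = 0
    for base in dna:
        value = (value << 2) | BASE_TO_BITS[base]
    return value, len(dna)


def unpack(value, length):
    """Recover the DNA string from the packed integer and its length."""
    bases = []
    for shift in range(2 * (length - 1), -1, -2):
        bases.append(BITS_TO_BASE[(value >> shift) & 0b11])
    return "".join(bases)


def to_rna_alphabet(dna):
    """Rewrite a DNA string in the RNA alphabet (U in place of T)."""
    return dna.replace("T", "U")


packed, n = pack("GATTACA")
print(bin(packed), n)
print(unpack(packed, n))
print(to_rna_alphabet("GATTACA"))
```

Seven bases fit in 14 bits here, compared with 56 bits as one-byte-per-letter text, which is why compact encodings like this are attractive for very long sequences.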
Applying algorithms to the representation
Once we have these representations of chemicals, proteins and DNA we can do things...
search on the
page. We can use a Swissprot ID e.g. A0MPN3
BLAST database format
Chemical structure search on PubChem
- a protein-ligand complex for HIV Protease with a bound inhibitor from the Protein Data Bank
for this complex
Chemical structures in the PubChem database projected into 3 dimensions and labelled with inferred disease relationships
Network relationships of PubChem compounds to diseases as visualized in Cytoscape
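To give a feel for what "applying an algorithm to the representation" can mean, here is a toy sequence search: it indexes every short substring (k-mer) of each database sequence and ranks database entries by how many k-mers they share with a query. This is only a sketch of the general idea of looking up short matching words; it is not the BLAST algorithm, and the database contents, names and the choice of k = 4 are all made up for illustration.

```python
from collections import Counter, defaultdict


def kmers(sequence, k):
    """Return all overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_index(database, k):
    """Map each k-mer to the set of sequence names that contain it."""
    index = defaultdict(set)
    for name, sequence in database.items():
        for kmer in kmers(sequence, k):
            index[kmer].add(name)
    return index


def search(query, index, k):
    """Rank database sequences by the number of distinct query k-mers they contain."""
    hits = Counter()
    for kmer in set(kmers(query, k)):
        for name in index.get(kmer, ()):
            hits[name] += 1
    return hits.most_common()


# Hypothetical toy database, for illustration only.
TOY_DATABASE = {
    "seq1": "ACGTACGTGA",
    "seq2": "TTTTGGGGCC",
    "seq3": "GGACGTTT",
}

index = build_index(TOY_DATABASE, 4)
print(search("ACGTAC", index, 4))   # [('seq1', 3), ('seq3', 1)]
```

The same approach works unchanged on protein strings, since both are just sequences of letters.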