Universal metric properties of the genetic code

Štambuk, Nikola (2000) Universal metric properties of the genetic code. Croatica Chemica Acta, 73 (4). pp. 1123-1139. ISSN 0011-1643

Official URL: http://public.carnet.hr/ccacaa/co004.html

Abstract

Universal metric properties of the genetic code (i.e. RNA, DNA and protein coding) are defined by means of the nucleotide base representation on the square with vertices U or T = 00, C = 01, G = 10 and A = 11. It is shown that this notation defines the Cantor set and Smale horseshoe map representation of the genetic code, the classic table arrangement and the Siemion one-step mutation ring of the code. Gray code solutions to the problem of defining codon positions on the [0, 1] interval, and an extension to the octal coding system, based on the linear block triple check code, are given. This result enables short block (word) decoding of the genetic code patterns. The block code is related to the minimization of errors during transcription and translation processes, which implies that the genetic code is error-correcting and not degenerate. Two algorithms for the representation of codons on the [0, 1] interval and the related binary trees are discussed. It is concluded that the ternary Cantor set algorithm is the method of choice for this type of analysis and coding. This procedure enables the analysis of the six-dimensional hypercube codon positions by means of a simple time series and/or 'logistic' difference equation. Finally, a unified concept of the genetic code linked to the Cantor set and horseshoe map is introduced in the form of a classic combinatorial four-colour necklace model with three horizontal frames consisting of 64 coloured pearls (bases) and vertically hanging decorations of triplets (codons). The three horizontal necklace frames define Crick's code without comma, and the vertical necklace decorations define the evolutional code. Thus, the type of the code depends on the level or direction of observation. The exact location of the mRNA and complementary DNA coding groups of triplets within a frame is determined. The latter enables decoding of long code block (language) patterns within the genetic code.
This method of genetic code analysis is named the Symbolic Cantor Algorithm (SCA). The validity of the method was confirmed by 94% accurate classification of 50 proteins of known secondary structure (25 alpha-helices and 25 beta-sheets) with the C5.0 machine learning system. Nucleotide strings of proteins transcribed by SCA were used for the analysis. Spectral Fourier analysis of Pro-opiomelanocortin and Bone Morphogenetic Protein 6 confirmed that the method might also be applied to the analysis of bioactive hormone and cytokine sequences.
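
The base notation described in the abstract can be illustrated with a minimal Python sketch. Only the base-to-bit assignment (U or T = 00, C = 01, G = 10, A = 11) is taken from the abstract. The function names are hypothetical, the bit order (first base to third base) is an assumption, and the ternary mapping shown here (binary digit 0 to ternary digit 0, binary digit 1 to ternary digit 2) is the textbook Cantor set construction, which may differ in detail from the SCA procedure defined in the article.

```python
from fractions import Fraction

# Base-to-bit assignment as stated in the abstract.
BASE_BITS = {"U": "00", "T": "00", "C": "01", "G": "10", "A": "11"}


def codon_to_bits(codon):
    """Return the 6-bit string of a codon, i.e. a vertex of the 6-D hypercube.

    Assumption: bits are concatenated in reading order (first to third base).
    """
    return "".join(BASE_BITS[base] for base in codon.upper())


def bits_to_binary_point(bits):
    """Read the bit string as a binary fraction 0.b1b2... in [0, 1)."""
    return sum(Fraction(int(b), 2 ** (i + 1)) for i, b in enumerate(bits))


def bits_to_cantor_point(bits):
    """Map the bit string to a point of the middle-thirds Cantor set.

    Assumption: textbook construction, each bit b becomes the ternary
    digit 2*b, so only the digits 0 and 2 occur.
    """
    return sum(Fraction(2 * int(b), 3 ** (i + 1)) for i, b in enumerate(bits))


def hamming_distance(codon_a, codon_b):
    """Count differing bits between two codons (distance on the hypercube)."""
    bits_a, bits_b = codon_to_bits(codon_a), codon_to_bits(codon_b)
    return sum(x != y for x, y in zip(bits_a, bits_b))


if __name__ == "__main__":
    for codon in ("UUU", "AUG", "AAA"):
        bits = codon_to_bits(codon)
        print(codon, bits, bits_to_binary_point(bits), bits_to_cantor_point(bits))
```

Exact fractions are used instead of floats so that codon positions on the interval can be compared without rounding error. A sequence of codons mapped this way yields a numeric series of the kind the abstract refers to, although the exact series used in the article is not reproduced here.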

Item Type:

Article

Uncontrolled Keywords:

Cantor set; symbolic dynamics; SCA; Gray code; genetic code; necklace; protein; secondary structure; C5.0; machine learning; spectral analysis; prediction

Subjects:

NATURAL SCIENCES > Chemistry

Divisions:

NMR Center

Projects: