how to calculate the evolution of a protein

scruffy

Mar 9, 2022
This issue apparently mystifies evolutionists and creationists alike. So I will explain it. Using math.

I will illustrate by example.

Let us say we have an alphabet of 20 letters (the standard amino acids), and we are trying to build a word (a protein) with the proper spelling (sequence) and the right meaning (function).

For this example we will use the word "abracadabra".

In mathematics this is a well-known problem, called the "word problem": deciding whether two words built from a set of generators represent the same element. Here is the Wiki about it:


We cannot simply multiply probabilities. What we have to do is calculate all the ways the word can be put together, from individual letters AND from subsequences (subsequences having their own probabilities). For example, if we have pre-existing subsequences "abra", "ca", and "dab", what is the probability that we can complete the word by splitting off the "ra" from the first sequence and combining it with the last?
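To make "all the ways the word can be put together" concrete, here is a minimal Python sketch that counts the ways to spell "abracadabra" from a set of available pieces. It only counts assemblies and does not assign probabilities. It also assumes, hypothetically, that every piece can be reused freely, which is how "ra" can appear again after "abra" is used. The function name `count_assemblies` is made up for illustration.

```python
from functools import lru_cache


def count_assemblies(word: str, pieces: frozenset) -> int:
    """Count the ordered ways to spell `word` by concatenating `pieces`.

    Assumes each piece can be reused as often as needed (a simplification
    for illustration; it ignores how many copies actually exist).
    """

    @lru_cache(maxsize=None)
    def ways_from(i: int) -> int:
        # Number of ways to build the suffix word[i:] from the pieces.
        if i == len(word):
            return 1
        return sum(
            ways_from(i + len(p))
            for p in pieces
            if p and word.startswith(p, i)  # skip empty pieces to avoid looping
        )

    return ways_from(0)


target = "abracadabra"
letters = frozenset(target)
subsequences = frozenset({"abra", "ca", "dab", "ra"})
print(count_assemblies(target, letters))                 # 1 (letter by letter)
print(count_assemblies(target, subsequences))            # 1 (abra|ca|dab|ra)
print(count_assemblies(target, letters | subsequences))  # 30
```

Letter by letter there is one way, and with only the four subsequences there is also one, but allowing both gives 30 distinct assemblies. That shows how quickly the number of routes grows once subsequences are in play.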

This is fundamentally an algebraic problem, and we can use the methods of group theory to address it.


If we're using a computer, we can use a "nested stack automaton" to do the work.


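But wait, not so fast. Group theory tells us that some cases are actually unsolvable: by the Novikov-Boone theorem, there are finitely presented groups whose word problem is undecidable (no algorithm can settle it). And even when we restrict to groups where it is solvable, we run up against the Boone-Rogers theorem, which states that there is no uniform (partial) algorithm that solves the word problem in all finitely presented groups with solvable word problem.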


Which groups are solvable, and which are not?

The answer has to do with isomorphic copies, which are strings that resemble themselves. Palindromes are an example. We can reference the work of Axel Thue, who was the first mathematician to study the problem in detail (around 1910). Thue studied "square-free words", which do not contain any adjacent repeated factor. (Example: "dining" is not square-free, because "in" occurs twice in a row: d-in-in-g.)
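A quick way to test the square-free property is to look for any factor x that appears twice in a row (xx). This Python sketch, with a hypothetical helper `find_square`, returns the shortest such repeated factor, or None if the word is square-free:

```python
from typing import Optional


def find_square(word: str) -> Optional[str]:
    """Return the shortest factor x such that xx occurs in `word`, or None.

    A word is square-free exactly when this returns None.
    """
    n = len(word)
    for half in range(1, n // 2 + 1):        # length of the repeated factor x
        for i in range(n - 2 * half + 1):     # start position of the candidate xx
            if word[i:i + half] == word[i + half:i + 2 * half]:
                return word[i:i + half]
    return None


for w in ["dining", "abracadabra"]:
    x = find_square(w)
    print(w, "square-free" if x is None else f"repeats {x!r}")
```

It reports the repeated "in" in "dining" and finds no square in "abracadabra".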

In this case, though, we have a set of unique, forward-looking sequences that are not adjacent. So the first question we ask is how many ways we can factor the word, which is simple combinatorics.
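The "simple combinatorics" here: a word of n letters has n - 1 gaps between letters, and each gap is either cut or not, so there are 2^(n-1) ways to factor it into contiguous pieces. A small Python sketch (the `factorizations` helper is hypothetical) that enumerates them and checks the count:

```python
from itertools import combinations
from typing import Iterator, List


def factorizations(word: str) -> Iterator[List[str]]:
    """Yield every way to cut a non-empty `word` into contiguous factors."""
    n = len(word)
    gaps = range(1, n)  # the n - 1 places between adjacent letters
    for k in range(n):  # k = number of cuts made
        for cuts in combinations(gaps, k):
            bounds = (0, *cuts, n)
            yield [word[a:b] for a, b in zip(bounds, bounds[1:])]


word = "abracadabra"
count = sum(1 for _ in factorizations(word))
print(count, 2 ** (len(word) - 1))  # 1024 1024
```

For "abracadabra" (11 letters) that is 2^10 = 1024 factorizations, before any question of which pieces actually exist.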

What our automaton does is assign symbols to the subsequences and operate on them using the (nested) stack. If you're familiar with the Forth programming language, this process is intuitive. (You can also use lambda calculus; it's a little harder that way, but it works.)
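For readers who don't know Forth, here is a toy postfix evaluator in Python. It uses a plain single stack, not a full nested stack automaton, and the symbol names A, C, D, R and the `cat` word are hypothetical: each symbol pushes its subsequence, and `cat` joins the top two entries.

```python
from typing import Dict, List


def run(program: str, symbols: Dict[str, str]) -> str:
    """Evaluate a Forth-like postfix program over string symbols.

    Tokens found in `symbols` push their subsequence; the word `cat`
    pops two items and pushes their concatenation (second-from-top first).
    """
    stack: List[str] = []
    for token in program.split():
        if token == "cat":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            stack.append(symbols[token])
    if len(stack) != 1:
        raise ValueError(f"program left {len(stack)} items on the stack")
    return stack[0]


symbols = {"A": "abra", "C": "ca", "D": "dab", "R": "ra"}
print(run("A C cat D cat R cat", symbols))  # abracadabra
print(run("A C D R cat cat cat", symbols))  # abracadabra
```

Two different programs produce the same word, which is why it matters to track every route rather than just one.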

Frank Ramsey also worked on a decision problem in logic around 1930, and in doing so proved his theorem that large enough graphs always contain certain unavoidable substructures. The analogue for words is the concept of "unavoidable patterns", which play into loops in the computational process. Another important issue is "necklaces", which are circular sequences. Baudot studied these when he invented the Baudot code.
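Counting necklaces is a classic exercise. If two circular sequences count as the same when one is a rotation of the other, Burnside's lemma gives the number of necklaces of length n over k letters as (1/n) Σ_{d|n} φ(d) k^(n/d), where φ is Euler's totient. A Python sketch that applies the formula and cross-checks it by brute force (the function names are made up for illustration):

```python
from itertools import product
from math import gcd


def totient(n: int) -> int:
    """Euler's totient: how many 1 <= k <= n are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def count_necklaces(n: int, k: int) -> int:
    """Necklaces of length n over k letters (rotations identified), via Burnside's lemma."""
    return sum(totient(d) * k ** (n // d) for d in range(1, n + 1) if n % d == 0) // n


def count_necklaces_brute(n: int, k: int) -> int:
    """Same count by listing every word and keeping its smallest rotation."""
    reps = set()
    for w in product(range(k), repeat=n):
        reps.add(min(w[i:] + w[:i] for i in range(n)))
    return len(reps)


print(count_necklaces(4, 2), count_necklaces_brute(4, 2))  # 6 6
```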

Our automaton builds a graph made of vertices and edges. We label each edge with a letter of our alphabet. We build a "path" through the graph, from the initial vertex to the final vertex, traversing our letters. The path is the word. Some paths do not exist (there is no edge with the needed label), which is why some words cannot be spelled.
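Here is a minimal sketch of the graph idea, using a small hypothetical graph (the vertices and edges are invented for illustration, not derived from any protein). Each edge carries one letter; a word is spelled when its letters trace a path from the start vertex to a final vertex, and it fails as soon as a needed edge is missing.

```python
from typing import Dict, Optional, Set

# Hypothetical edge-labelled graph: EDGES[v][letter] is the vertex reached from v.
Edges = Dict[int, Dict[str, int]]

EDGES: Edges = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"r": 3},
    3: {"a": 4},
    4: {"c": 5},
    5: {"a": 6},
    6: {"d": 7},
    7: {"a": 1},  # after "cad", this "a" re-enters the "abra" part of the path
}
START: int = 0
FINALS: Set[int] = {4}


def follow(edges: Edges, start: int, word: str) -> Optional[int]:
    """Walk the path spelled by `word`; return the end vertex, or None if an edge is missing."""
    v = start
    for letter in word:
        nxt = edges.get(v, {}).get(letter)
        if nxt is None:
            return None
        v = nxt
    return v


def spells(word: str, edges: Edges = EDGES, start: int = START, finals: Set[int] = FINALS) -> bool:
    """True if `word` traces a path from `start` that ends on a final vertex."""
    end = follow(edges, start, word)
    return end is not None and end in finals


for w in ["abra", "abracadabra", "abrar", "cab"]:
    print(w, spells(w))
```

"abra" and "abracadabra" trace complete paths, while "abrar" and "cab" hit a missing edge.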

Fortunately, in this case we have a METRIC, so we can measure the length of our path. It's called the word metric. We use the word metric on the discrete group generated by the symbols in our alphabet.
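For reference, these are the standard definitions, assuming a group G generated by a finite set S. The word norm of g is the length of the shortest word in the generators and their inverses that equals g, and the word metric between g and h is the norm of g^{-1}h. In the integers with S = {1}, the norm is the ordinary absolute value.

```latex
|g|_S = \min\{\, n \ge 0 \;:\; g = s_1 s_2 \cdots s_n,\ s_i \in S \cup S^{-1} \,\}

d_S(g, h) = |g^{-1} h|_S, \qquad d_S(kg, kh) = d_S(g, h), \qquad |gh|_S \le |g|_S + |h|_S

\text{Example: } G = (\mathbb{Z}, +),\ S = \{1\}:\quad |{-3}|_S = 3, \qquad d_S(g, h) = \lvert h - g \rvert
```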

To understand how this works, consider the integers under addition, generated by 1 and -1, and take the element -3; let's say it represents the tubulin alpha subunit. The most efficient way to write it (the globular folding sequence) is (-1) + (-1) + (-1), a word of length 3. But there are other ways to get to the same element (an equivalent globular protein), such as (-1) + (-1) + 1 + (-1) + (-1).
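The -3 example can be checked directly. This Python sketch (the helper names `word_length` and `word_metric` are hypothetical) runs a breadth-first search over the Cayley graph of the integers under addition, so the first time it reaches the target it has found the shortest word, i.e. the word norm. The generator sets are assumptions chosen for illustration.

```python
from collections import deque
from typing import Iterable


def word_length(target: int, generators: Iterable[int], limit: int = 1000) -> int:
    """Word norm of `target` in (Z, +): fewest generator steps from 0.

    Breadth-first search on the Cayley graph; `generators` should already
    include inverses, e.g. (1, -1). Vertices with |h| > limit are not explored.
    """
    gens = tuple(generators)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        g = queue.popleft()
        if g == target:
            return dist[g]
        for s in gens:
            h = g + s
            if h not in dist and abs(h) <= limit:
                dist[h] = dist[g] + 1
                queue.append(h)
    raise ValueError(f"{target} not reachable within |h| <= {limit}")


def word_metric(g: int, h: int, generators: Iterable[int]) -> int:
    """d(g, h) = |g^-1 h|, which in (Z, +) is the norm of h - g."""
    return word_length(h - g, generators)


print(word_length(-3, (1, -1)))         # 3: (-1) + (-1) + (-1)
print(word_length(-3, (1, -1, 2, -2)))  # 2: (-2) + (-1)
print(word_metric(2, -1, (1, -1)))      # 3
```

Adding 2 and -2 as generators shortens the route to -3, so the metric depends on which alphabet is chosen.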

This is how we get a METRIC for protein evolution, how we measure it. Once we have a metric, we can get a norm (in group theory, the word norm of an element is its distance from the identity: the length of the shortest word that gets the job done, that is to say, that gives us a homologous word).

This method is NECESSARY when studying biology, because there is more than one protein that can get the job done. We are interested in ALL the ways of getting the job done, "all possible paths", and evolution almost always takes more than one path. Of the 100,000 proteins in a human, LESS THAN 100 are highly conserved, and these are the cases where it is likely that there is "only one" folding sequence that will get the job done. Everything else has latitude; that is to say, it will accept some limited mutations and continue to function. In English, the words "yes", "yeah", "uh-huh", and "damn straight" all indicate the affirmative and all get the job done.
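In the toy group of integers, the "more than one path" point can be counted exactly. A word of length n over {+1, -1} lands on -3 only when it has (n - 3)/2 steps of +1, so the number of such words is a binomial coefficient. A Python sketch, with hypothetical helper names, that lists and counts them (an analogy only, not a model of real proteins):

```python
from itertools import product
from math import comb
from typing import List, Tuple


def words_reaching(target: int, length: int, generators=(1, -1)) -> List[Tuple[int, ...]]:
    """All words of the given length over `generators` whose sum is `target`."""
    return [w for w in product(generators, repeat=length) if sum(w) == target]


def count_reaching(target: int, length: int) -> int:
    """Closed form for generators {+1, -1}: choose which steps are +1."""
    # With p steps of +1 and m steps of -1: p + m = length and p - m = target.
    if abs(target) > length or (length + target) % 2:
        return 0
    return comb(length, (length + target) // 2)


for n in range(3, 8):
    print(n, len(words_reaching(-3, n)), count_reaching(-3, n))
```

There is exactly one shortest word (length 3), but 5 words of length 5 and 21 of length 7 reach the same element.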

This is the science around protein evolution.

The pHarmas have been synthesizing novel proteins for at least 40 years, and it is only in the last 5 years that they've begun using these methods. They wish to design molecules with the same shape that perform the needed function, but have different sequences that are easier to synthesize. This is how they do it.

This is a formal mathematical system that provides a metric and a norm, and IMPLIES a concept of angle via the graph embedding. This way scientists can deal with equations that predict in advance whether a protein will work in context.

The attractor for proteins is the FUNCTION, not the sequence. Some function is needed by the cell, and the purpose of evolution is to "find" such a function. Even if we know all the probabilities for point mutations and protein chain extensions, we still don't know if the result will perform the function. These methods give us the answer.

More information on algebraic groups can be found here:

 

Excellent read! Just think what Percy Julian could do if he were alive today! He'd think he died and went to heaven.
 
