  1. Installation
  2. First run
  3. Sampling mutants and structures
  4. Bounding the number of mutations
  5. Sequence constraints
  6. Structure constraints
  7. Parameterized sampling
  8. Prediction of deleterious mutations
  9. Designing RNAs

Installation

Unpack the tarball and change into the newly created RNAmutants directory. Then run './RNAmutants --help' in your terminal to check that the binary works on your computer. If you cannot find a distribution that runs on your system, please email RNAmutants@csail.mit.edu.

First run

First, you need to locate the directory containing the energy model parameters, named 'lib/'. By default it is located at the root of the 'RNAmutants' directory. Assume that we also have a toy sequence 'test.fasta' in FASTA format, for instance a poly-A with 10 nucleotides. Now run the basic command line './RNAmutants --library lib/ --input-file test.fasta'. The program outputs:

0 1.00000000000000e+00
1 3.00703570240298e+01
2 4.08573799132498e+02
3 3.31117158303557e+03
4 1.79081460806700e+04
5 6.92429931640950e+04
6 2.03482897080526e+05
7 4.55285233708172e+05
8 6.72467679437443e+05
9 5.13143041407800e+05
10 1.63418257402122e+05
>> Superoptimal structure(s):
> 0-superoptimal structure: 0.0
AAAAAAAAAA
..........
> 1-superoptimal structure: 0.0
AAAAAuAAAA
..........
> 2-superoptimal structure: 0.0
AAAAAAuAAu
..........
> 3-superoptimal structure: 0.0
AugAAAAAAc
..........
> 4-superoptimal structure: -1.500000
ucgAAAgAAA
((....))..
> 5-superoptimal structure: -2.900000
ucugAAAAgA
(((....)))
> 6-superoptimal structure: -4.400000
uccgAAAggA
(((....)))
> 7-superoptimal structure: -5.500000
gccgAAAggc
(((....)))
> 8-superoptimal structure: -5.500000
gccgAgAggc
(((....)))
> 9-superoptimal structure: -5.500000
gccgcgAggc
(((....)))
> 10-superoptimal structure: -4.800000
gccuucgggc
(((....)))
  

Each of the first eleven lines gives a number of mutations k, followed by the value of the partition function computed over all mutants with k mutations.
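As a quick sanity check on these values, a sequence of length n has C(n, k) * 3^k mutants with exactly k mutations. The sketch below computes that count for the 10-nucleotide toy sequence and prints it next to the first reported values. The function name `count_mutants` is ours, and the comparison assumes that an unfolded structure has energy 0 and therefore contributes a weight of 1 to the partition function, which is consistent with the 0.0 energies shown above but is not stated explicitly in this tutorial.

```python
from math import comb

def count_mutants(length: int, k: int) -> int:
    """Number of sequences differing from a reference sequence at exactly k positions."""
    # Choose the k mutated positions, then one of the 3 other nucleotides at each.
    return comb(length, k) * 3 ** k

# Partition function values copied from the first three lines of the output above.
reported = {0: 1.00000000000000e+00, 1: 3.00703570240298e+01, 2: 4.08573799132498e+02}

for k, z in reported.items():
    print(k, count_mutants(10, k), z)
```

For k = 0, 1 and 2 the counts are 1, 30 and 405, slightly below the reported values. Under the assumption above, this suggests that almost all of these mutants contribute little beyond their unfolded state.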

The following part of the output shows the superoptimal sequences and structures. For a given number of mutations k, the k-superoptimal sequence is the sequence with the minimum free energy among all sequences with k mutations. Consequently, the k-superoptimal structure is the secondary structure that realizes this minimum free energy on the k-superoptimal sequence. Each result is displayed on 3 lines. The first line, starting with the character '>', recalls the number of mutations k and ends with the value of the minimum free energy in kcal/mol. The second line shows the k-superoptimal sequence, where mutations are annotated with lower-case letters. The third line gives the k-superoptimal secondary structure.
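The three-line layout described above is easy to parse. The following is a minimal sketch, not an official parser: the function names and the regular expression are our own, and it assumes the output looks exactly like the listing in this section. Records reported as having no valid structure are skipped.

```python
import re

# Header such as "> 4-superoptimal structure: -1.500000"; the energy is optional
# because some runs print a header without any value.
HEADER = re.compile(r"^> (\d+)-superoptimal structure(?::\s*(-?\d+(?:\.\d+)?))?\s*$")

def parse_superoptimal(text):
    """Return a list of (k, energy, sequence, structure) tuples."""
    lines = text.splitlines()
    records = []
    i = 0
    while i < len(lines):
        match = HEADER.match(lines[i])
        if match and match.group(2) is not None and i + 2 < len(lines):
            k, energy = int(match.group(1)), float(match.group(2))
            records.append((k, energy, lines[i + 1].strip(), lines[i + 2].strip()))
            i += 3
        else:
            i += 1
    return records

def count_mutations(sequence):
    """Mutated positions are printed in lower case."""
    return sum(1 for nucleotide in sequence if nucleotide.islower())

sample = """\
>> Superoptimal structure(s):
> 0-superoptimal structure: 0.0
AAAAAAAAAA
..........
> 4-superoptimal structure: -1.500000
ucgAAAgAAA
((....))..
"""

for k, energy, sequence, structure in parse_superoptimal(sample):
    print(k, energy, sequence, structure, count_mutations(sequence))
```

Counting lower-case letters gives a cheap consistency check: for each record it should equal the k announced in the header.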

Sampling mutants and structures

RNAmutants can sample pairs of mutant sequences and secondary structures from the Boltzmann ensemble, which favors low free energies. In other words, mutants folded into a secondary structure with a low free energy are sampled more often. Unlike the k-superoptimal output, this procedure reveals the diversity of sequences and structures that are likely to be realized, and hence the competition between mutations and/or structures.
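To give a feel for how strongly a low free energy is favored, the sketch below computes Boltzmann weights exp(-E / RT) for two energies taken from the first run. It only illustrates relative weights and is not how RNAmutants samples internally. The temperature of 37 degrees Celsius is an assumption (this tutorial does not state it), and the function name is ours.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal / (mol * K)
TEMPERATURE = 310.15      # kelvin, i.e. 37 degrees Celsius (assumed, not stated in the tutorial)

def boltzmann_weight(energy_kcal_per_mol):
    """Unnormalized Boltzmann weight of a structure with the given free energy."""
    return math.exp(-energy_kcal_per_mol / (GAS_CONSTANT * TEMPERATURE))

unfolded = boltzmann_weight(0.0)   # e.g. the unfolded poly-A
stem = boltzmann_weight(-5.5)      # e.g. the 7-superoptimal stem from the first run

print(stem / unfolded)
```

Under these assumptions a structure at -5.5 kcal/mol outweighs an unfolded one by a factor of several thousand, which is why stable stems appear often among the samples even though they are rare among all possible mutants.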

We can produce samples using the option '--sample-number' followed by the number of desired samples. For example, we can sample 10 mutant sequences with their secondary structures using the extended command line: './RNAmutants --library lib/ --input-file test.fasta --sample-number 10'. The results are appended to the end of the output. A possible output is:

>> Sampling 10 RNAmutants:
> sampling 10 mutant(s) and secondary structure(s)
uAgcggAAgu
..........
ccuuuAgAAc
..........
uucAAgcAgg
.((.....))
ggcAuuugcu
(((....)))
cgcgAccgcc
.((....)).
cAgugAgucu
..........
gccuccgggc
(((....)))
ccAggggggu
((...))...
cgcAuuugcg
(((....)))
gggggccugu
((....))..
  

Each sample is displayed on two lines: the mutant sequence, then its secondary structure. In this specific case, it is easy to see that sampling gives a better view of the sequences and structures that are more likely to be realized.
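The two-line layout makes it straightforward to tally which structures dominate a sample set. This is a minimal sketch with a hypothetical helper `parse_samples`, assuming the sampling section has been extracted from the output and that every line starting with '>' is a header.

```python
from collections import Counter

def parse_samples(lines):
    """Pair up consecutive sequence/structure lines, skipping '>' header lines."""
    body = [line.strip() for line in lines if line.strip() and not line.startswith(">")]
    return list(zip(body[0::2], body[1::2]))

# A few samples copied from the output above.
output = """\
>> Sampling 10 RNAmutants:
> sampling 10 mutant(s) and secondary structure(s)
uAgcggAAgu
..........
ggcAuuugcu
(((....)))
gccuccgggc
(((....)))
""".splitlines()

pairs = parse_samples(output)
structure_counts = Counter(structure for _, structure in pairs)
print(structure_counts.most_common())
```

With a larger number of samples, the most common structures in this tally give an empirical picture of the structures most likely to be realized.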

Bounding the number of mutations

In some cases, it is preferable to restrict the number of mutations allowed to values smaller than the length of the input sequence, for instance because the input sequence is so long that exploring the full mutation landscape would take too much time, or simply because we do not need the predictions for large values of k.

This upper bound is set using the option '--mutation' followed by the maximum number of mutations allowed. Hence, the command line for limiting the number of mutations to 1 is './RNAmutants --library lib/ --input-file test.fasta --mutation 1', which outputs the truncated results:

0 1.00000000000000e+00
1 3.00703570240298e+01
>> Superoptimal structure(s):
> 0-superoptimal structure: 0.0
AAAAAAAAAA
..........
> 1-superoptimal structure: 0.0
AgAAAAAAAA
..........
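
When RNAmutants is driven from a script, the command lines shown so far can be assembled programmatically. The sketch below uses only options that appear in this tutorial. The helper `build_command` is hypothetical, and whether '--mutation' and '--sample-number' can be combined in one run is an assumption that this tutorial does not confirm.

```python
import shlex

def build_command(input_file, library="lib/", max_mutations=None, samples=None):
    """Assemble an RNAmutants argument list from options shown in this tutorial."""
    command = ["./RNAmutants", "--library", library, "--input-file", input_file]
    if max_mutations is not None:
        command += ["--mutation", str(max_mutations)]
    if samples is not None:
        # Combining this with --mutation is an assumption, not documented here.
        command += ["--sample-number", str(samples)]
    return command

command = build_command("test.fasta", max_mutations=1)
print(shlex.join(command))

# To actually run it (requires the RNAmutants binary in the current directory):
# import subprocess
# result = subprocess.run(command, capture_output=True, text=True, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.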
  

Sequence constraints

The first type of constraint indicates which nucleotides may or may not mutate. For this purpose, we store the indices of the positions that must not mutate in a file. Each line of this file starts with the character 's' followed by the index to freeze. The name of this file is then passed as an argument to the option '--sequence-constraint'.

In the following example we freeze the nucleotides at positions 1, 2 and 6. The sequence constraint file is named 'seq.cst', and the corresponding command line is './RNAmutants --library lib/ --input-file test.fasta --sequence-constraint seq.cst --mutation 7'. Note that the maximal number of mutations allowed has been set to 7, since the sequence has length 10 and 3 positions are frozen. The output is:

0 1.00000000000000e+00
1 3.00500300336935e+01
2 4.06849718118601e+02
3 3.26560578553631e+03
4 1.72279298797619e+04
5 6.22351793040745e+04
6 1.54920450795251e+05
7 2.62883867948783e+05
>> Superoptimal structure(s):
> 0-superoptimal structure: 0.0
AAAAAAAAAA
..........
> 1-superoptimal structure: 0.0
AAAAuAAAAA
..........
> 2-superoptimal structure: 0.0
AAAcAAAgAA
..........
> 3-superoptimal structure: 0.0
AAcgAAcAAA
..........
> 4-superoptimal structure: -1.500000
AAucgAAAgA
..((....))
> 5-superoptimal structure: -2.600000
AAgcgAAAgc
..((....))
> 6-superoptimal structure: -2.600000
AAgcgcAAgc
..((....))
> 7-superoptimal structure: -1.900000
AAgcuAcggc
..((....))
  

The same constraints also apply to the output of the sampling engine.
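
The constraint file and the matching mutation bound from the example above can be generated as follows. This is a sketch under a stated assumption: the tutorial only says that each line starts with 's' followed by the index, so the single space used as a separator here is a guess that should be checked against a working constraint file. The helper names are ours, and the file is written to a temporary directory so that the sketch has no side effects.

```python
import tempfile
from pathlib import Path

def format_sequence_constraints(frozen_positions):
    """One line per frozen position: 's' followed by the index (space separator assumed)."""
    return "".join(f"s {position}\n" for position in sorted(set(frozen_positions)))

def max_mutations(sequence_length, frozen_positions):
    """Largest number of mutations possible once the frozen positions are excluded."""
    return sequence_length - len(set(frozen_positions))

frozen = [1, 2, 6]
constraint_path = Path(tempfile.mkdtemp()) / "seq.cst"
constraint_path.write_text(format_sequence_constraints(frozen))

print(constraint_path.read_text(), end="")
print("--mutation", max_mutations(10, frozen))
```

Deriving the '--mutation' value from the list of frozen positions keeps the two in sync when the constraints change.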

Structure constraints

It is also possible to constrain the folding and force certain positions to base-pair or remain unpaired. These constraints are stored in bracket format. A matching pair of opening and closing parentheses forces two nucleotides to base-pair, while a dot ensures that the position remains unpaired. A question mark leaves the position unconstrained. We input the constraints through the option '--structure-constraint' followed by the name of the file storing the constraints in bracket format.

Here we force positions 2 and 8 to base-pair with each other, and position 7 to remain unpaired. In bracket format, these constraints are written as: '?(????.)??'. Assuming we stored this string in a file named 'struct.cst', the command line is './RNAmutants --library lib/ --input-file test.fasta --structure-constraint struct.cst'. Then the program outputs:

0 0.00000000000000e+00
1 6.90673726836192e-03
2 2.83930814773348e-01
3 5.65049304634979e+00
4 6.86603452675092e+01
5 5.19092792013716e+02
6 2.43161585503786e+03
7 6.95926720134349e+03
8 1.16415344467086e+04
9 1.01630588133321e+04
10 3.37155688343698e+03
>> Superoptimal structure(s):
> 0-superoptimal structure
None: no secondary structure satisfy the constraints.
> 1-superoptimal structure: 3.400000
AuAAAAAAAA
.(.....)..
> 2-superoptimal structure: 2.500000
uuAAAAAAAA
((.....)).
> 3-superoptimal structure: 0.700000
ucAAAAAgAA
((.....)).
> 4-superoptimal structure: -0.400000
gcAAAAAgcA
((.....)).
> 5-superoptimal structure: -1.100000
gcAAAAggcA
((.....)).
> 6-superoptimal structure: -1.100000
gcAAAAggcu
((.....)).
> 7-superoptimal structure: -1.100000
gcAgAAggcu
((.....)).
> 8-superoptimal structure: -1.100000
gcAAucggcu
((.....)).
> 9-superoptimal structure: -1.100000
gcAcccggcu
((.....)).
> 10-superoptimal structure: -0.900000
gcgugucgcu
((.....)).
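
Writing the constraint string by hand becomes error-prone for longer sequences. The sketch below builds it from 1-based positions and checks whether a reported structure honours it. All function names are hypothetical, and the checker assumes balanced, non-crossing brackets.

```python
def build_structure_constraint(length, pairs=(), unpaired=()):
    """Build a constraint string: '?' free, '.' unpaired, '(' and ')' paired (1-based positions)."""
    constraint = ["?"] * length
    for i, j in pairs:
        if not 1 <= i < j <= length:
            raise ValueError(f"invalid pair ({i}, {j})")
        constraint[i - 1], constraint[j - 1] = "(", ")"
    for position in unpaired:
        if constraint[position - 1] != "?":
            raise ValueError(f"position {position} is already constrained")
        constraint[position - 1] = "."
    return "".join(constraint)

def pair_table(brackets):
    """Map each paired 0-based index to its partner, matching brackets with a stack."""
    stack, partner = [], {}
    for index, char in enumerate(brackets):
        if char == "(":
            stack.append(index)
        elif char == ")":
            opening = stack.pop()
            partner[opening], partner[index] = index, opening
    return partner

def satisfies(structure, constraint):
    """True if the dot-bracket structure respects every constrained position."""
    found, wanted = pair_table(structure), pair_table(constraint)
    for index, char in enumerate(constraint):
        if char == "." and index in found:
            return False
        if char in "()" and found.get(index) != wanted[index]:
            return False
    return True

constraint = build_structure_constraint(10, pairs=[(2, 8)], unpaired=[7])
print(constraint)
print(satisfies(".(.....)..", constraint))
```

The same check can be applied to every structure in the output as a guard against a mistyped constraint file.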
  

Parameterized sampling

Users can force the sampling engine to sample sequences with a fixed number of mutations. The sampling command is stored in a two-column file, where the first column is an integer giving the number of mutations and the second column is the number of samples desired. The option '--sample-command' passes this command file to the program.

Here, we generate 10 samples each for 4, 5 and 6 mutations. The command is stored in a file named 'param.smp' and the command line is './RNAmutants --library lib/ --input-file test.fasta --sample-command param.smp'. A potential output is:

>> Sampling RNAmutants from file param.smp:
> sampling 10 sequence and secondary structure(s) with 4 mutations
AAAucAuuAA
..........
AAcAAggAAu
....(....)
guAAAgAAuA
..........
guAAAuAAcA
..........
gcAAgAAAAg
..........
AgAAAguAAg
..........
AAcuuAAgAA
..........
AgAccAAuAA
..........
AAAcgAgAAu
..........
AAucAucAAA
..........
> sampling 10 sequence and secondary structure(s) with 5 mutations
AAAAAcgggg
..........
AcuAgAguAA
..........
AAgAuAcugA
..........
gcAAuAcuAA
..........
AAccgAAAgg
..((....))
AuAAAugAgu
..........
ccAcAcAgAA
..........
gAguAAAgAu
..........
cuAgAAAuAg
(((....)))
ucugAAAAgA
(((....)))
> sampling 10 sequence and secondary structure(s) with 6 mutations
AAAuuuuccA
..........
uccgAAAggA
(((....)))
uAcgAAAggu
.((.....))
ccggAAAcAg
..........
uAAgAcuAgu
..........
cuAAcgAcAu
..........
uAAAcugcAc
..........
cAuugAgcAA
..........
ggAAAAuucc
(((....)))
gccAAAAggc
(((....)))
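
The command file for this example, and a way to split the resulting output by number of mutations, can be sketched as follows. The column separator (a single space) is an assumption, since the tutorial only specifies a two-column format. The helper names and the header pattern are ours and rely on the header wording shown in the listing above.

```python
import re

def format_sample_command(requests):
    """One line per request: number of mutations, then number of samples (space separator assumed)."""
    return "".join(f"{mutations} {samples}\n" for mutations, samples in requests)

HEADER = re.compile(r"^> sampling (\d+) sequence and secondary structure\(s\) with (\d+) mutations")

def group_samples(lines):
    """Return {number of mutations: [(sequence, structure), ...]} from the sampling output."""
    groups, current, pending = {}, None, None
    for line in lines:
        line = line.strip()
        match = HEADER.match(line)
        if match:
            current, pending = groups.setdefault(int(match.group(2)), []), None
        elif line and not line.startswith(">") and current is not None:
            if pending is None:
                pending = line          # sequence line
            else:
                current.append((pending, line))  # structure line completes the pair
                pending = None
    return groups

print(format_sample_command([(4, 10), (5, 10), (6, 10)]), end="")

output = """\
>> Sampling RNAmutants from file param.smp:
> sampling 10 sequence and secondary structure(s) with 4 mutations
AAAucAuuAA
..........
AAcAAggAAu
....(....)
> sampling 10 sequence and secondary structure(s) with 5 mutations
AAccgAAAgg
..((....))
""".splitlines()

groups = group_samples(output)
print({k: len(samples) for k, samples in groups.items()})
```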
  

Prediction of deleterious mutations

The method used in (Waldispühl et al., 2008) proposes to sample mutant sequences and sort them according to the base pair distance between the native structure and the secondary structure sampled together with the mutant. The mutations with the largest base pair distance are ranked first.

This method quickly identifies the mutations with the most deleterious effect. Moreover, since the search is not restricted to single mutations and includes any k-mutant, it can also detect groups of mutations that are deleterious together but not when they occur individually.
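
The ranking step can be sketched in a few lines. This is an illustration and not the script from the cited paper. It assumes the usual definition of base pair distance, namely the number of base pairs present in exactly one of the two structures. The native structure below is hypothetical, and the samples are copied from the sampling output earlier in this tutorial.

```python
def base_pairs(structure):
    """Set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for index, char in enumerate(structure):
        if char == "(":
            stack.append(index)
        elif char == ")":
            pairs.add((stack.pop(), index))
    return pairs

def base_pair_distance(first, second):
    """Number of base pairs found in exactly one of the two structures."""
    return len(base_pairs(first) ^ base_pairs(second))

def rank_by_disruption(native, samples):
    """Sort (sequence, structure) samples, most disruptive first."""
    return sorted(samples, key=lambda sample: base_pair_distance(native, sample[1]), reverse=True)

native = "(((....)))"  # hypothetical native structure, for illustration only
samples = [
    ("gccuccgggc", "(((....)))"),
    ("uucAAgcAgg", ".((.....))"),
    ("cAgugAgucu", ".........."),
]

for sequence, structure in rank_by_disruption(native, samples):
    print(base_pair_distance(native, structure), sequence, structure)
```

Note that a shifted stem can score higher than a fully unfolded structure, because it both loses the native pairs and gains non-native ones.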

A Python script that parses the output and runs this procedure will soon be available on this website for registered users.

Designing RNAs

Structure constraints can be used efficiently to design RNA sequences. For instance, here we aim to design sequences folding into a single stem with 3 base pairs: (((....))). Assuming that we have stored this structure in a file named 'design.cst', we design the RNA sequences by running the command './RNAmutants --library lib/ --input-file test.fasta --structure-constraint design.cst', and the output is:

>> Superoptimal structure(s):
> 0-superoptimal structure
None: no secondary structure satisfy the constraints.
> 1-superoptimal structure
None: no secondary structure satisfy the constraints.
> 2-superoptimal structure
None: no secondary structure satisfy the constraints.
> 3-superoptimal structure: 1.900000
uAuAAAAAuA
(((....)))
> 4-superoptimal structure: -0.900000
uAugAAAAuA
(((....)))
> 5-superoptimal structure: -2.900000
ucugAAAAgA
(((....)))
> 6-superoptimal structure: -4.400000
uccgAAAggA
(((....)))
> 7-superoptimal structure: -5.500000
gccgAAAggc
(((....)))
> 8-superoptimal structure: -5.500000
gccggAAggc
(((....)))
> 9-superoptimal structure: -5.500000
gccgcgAggc
(((....)))
> 10-superoptimal structure: -4.800000
gccuccgggc
(((....)))
  

We can use the superoptimal sequences to find a sequence folding into the desired shape. By looking at the superoptimal energies, it is also possible to adjust the thermodynamic stability of the structure.
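
Choosing a design by stability can be automated once the superoptimal records are available as tuples. The sketch below uses a few (k, energy, sequence) records copied from the output above. The two selection rules and their names are our own illustrations of "adjusting the stability" and are not features of RNAmutants.

```python
# (number of mutations, free energy in kcal/mol, sequence), copied from the output above.
designs = [
    (3, 1.9, "uAuAAAAAuA"),
    (4, -0.9, "uAugAAAAuA"),
    (5, -2.9, "ucugAAAAgA"),
    (6, -4.4, "uccgAAAggA"),
    (7, -5.5, "gccgAAAggc"),
]

def closest_to_target(records, target_energy):
    """Design whose free energy is nearest to the requested value."""
    return min(records, key=lambda record: abs(record[1] - target_energy))

def fewest_mutations_below(records, max_energy):
    """Design with the fewest mutations among those at least as stable as max_energy."""
    stable = [record for record in records if record[1] <= max_energy]
    return min(stable, key=lambda record: record[0]) if stable else None

print(closest_to_target(designs, -3.0))
print(fewest_mutations_below(designs, -4.0))
```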

If necessary, a larger collection of sequences can be generated using the sampling procedure. A parameterized sampling, as shown above, is then used. An example command line for this purpose is: './RNAmutants --library lib/ --input-file test.fasta --structure-constraint design.cst --sample-command param.smp'. The samples in the output can then be used as candidate sequences:

>> Sampling RNAmutants from file param.smp:
> sampling 10 sequence and secondary structure(s) with 4 mutations
uAuAAcAAuA
(((....)))
AAAgAAAuuu
(((....)))
AAugAAAAuu
(((....)))
ugAAAAAucA
(((....)))
AAugAAAAuu
(((....)))
AuAgAAAuAu
(((....)))
uAAgAAAuuA
(((....)))
uuugAAAAAA
(((....)))
uAAgAAAuuA
(((....)))
uAugAAAAuA
(((....)))
> sampling 10 sequence and secondary structure(s) with 5 mutations
AuAgcAAuAu
(((....)))
AuggAAAcAu
(((....)))
cuAuAAAuAg
(((....)))
gAcAAAAguc
(((....)))
uucgAAAgAA
(((....)))
uuggAAAcAA
(((....)))
AcugAAAAgu
(((....)))
AgugAAAAcu
(((....)))
ucugAAAAgA
(((....)))
uAggAAAcuA
(((....)))
> sampling 10 sequence and secondary structure(s) with 6 mutations
AgcAAAcgcu
(((....)))
ugcgAAAgcA
(((....)))
gcugAAAAgc
(((....)))
AgcgAAAgcu
(((....)))
ucgAAAucgA
(((....)))
ugAguAAucA
(((....)))
ucAgAgAugA
(((....)))
uAcgcAAguA
(((....)))
ugcgAAAgcA
(((....)))
uuggAgAcAA
(((....)))
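
Since the same sequence can be sampled more than once, it helps to collapse the samples into a list of distinct candidates. This is a minimal sketch with a hypothetical helper, using a few pairs copied from the 6-mutation samples above.

```python
from collections import Counter

def candidate_counts(samples, target_structure):
    """Count how often each sequence was sampled with exactly the target structure."""
    return Counter(sequence for sequence, structure in samples if structure == target_structure)

target = "(((....)))"
samples = [
    ("AgcAAAcgcu", "(((....)))"),
    ("ugcgAAAgcA", "(((....)))"),
    ("gcugAAAAgc", "(((....)))"),
    ("ugcgAAAgcA", "(((....)))"),
]

counts = candidate_counts(samples, target)
for sequence, occurrences in counts.most_common():
    print(sequence, occurrences)
```

Sequences that are sampled repeatedly carry more Boltzmann weight under the constraint, so the count is a rough first indicator when ranking candidates.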