
Macro AA_Parse_3Letter

Converts a string in a single cell into triplets of characters in consecutive cells, assuming the triplets are separated by exactly one character, e.g. amino acid sequences in 3-letter code (Met-Ala-Gly) or nucleotide triplets. Several cells in the same column can be selected to convert an entire sequence alignment in one run.
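The parsing rule can be illustrated with a stand-alone VBA function. This is a minimal sketch, not part of the original macro: the function name SplitTriplets is hypothetical, and it assumes every triplet is followed by exactly one separator character except possibly the last, so a sequence of t triplets is 4t - 1 characters long (4t with a trailing separator).

```vba
'Hypothetical helper (not part of AA_Parse_3Letter): splits a string such as
'"Met-Ala-Gly" into an array of triplets, assuming exactly one separator
'character between consecutive triplets.
Function SplitTriplets(ByVal seq As String) As Variant
    Dim nTriplets As Long, k As Long
    Dim result() As String

    't triplets with t - 1 separators occupy 4 * t - 1 characters,
    'so integer division of (length + 1) by 4 gives t
    nTriplets = (Len(seq) + 1) \ 4
    If nTriplets = 0 Then
        SplitTriplets = Array()                 'empty array for strings shorter than 3 characters
        Exit Function
    End If

    ReDim result(1 To nTriplets)
    For k = 1 To nTriplets
        result(k) = Mid$(seq, 4 * k - 3, 3)     'triplet k starts at character 4k - 3
    Next k
    SplitTriplets = result
End Function
```

For example, SplitTriplets("Met-Ala-Gly") returns an array holding "Met", "Ala" and "Gly" at indices 1 to 3.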

Usage:

  • Select the range of cells containing the sequences to be parsed (one column only)
  • Choose "Macro" from the "Tools" menu (in Excel 2007 and later: "Macros" on the "View" tab, or press Alt+F8)
  • Run macro AA_Parse_3Letter
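Instead of using the menu, the macro can also be started from other VBA code by selecting the cells first, since it works on the current selection. A minimal sketch: the sub name RunParseOnExample and the example sequences in A1:A2 are hypothetical, and it assumes AA_Parse_3Letter is in a module of the same workbook. Note that it overwrites rows 1 and 2 of the active sheet.

```vba
'Hypothetical example: writes two short sequences into A1:A2 of the active sheet
'(overwriting rows 1 and 2), selects them and runs AA_Parse_3Letter on the selection.
Sub RunParseOnExample()
    Range("A1").Value = "Met-Ala-Gly"
    Range("A2").Value = "Met-Lys"
    Range("A1:A2").Select                'AA_Parse_3Letter works on the current selection
    AA_Parse_3Letter                     'triplets replace the original strings, starting in column A
End Sub
```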
 
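Caution:
Cells to the right of the selected cells will be overwritten by the parsed sequence.
The original sequences will be deleted; the parsed triplets shift left into their place.
Sequences too long to fit into the columns to the right of the selection will be truncated (in Excel 2003 and earlier, which have 256 columns, after roughly 250 amino acids).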

Sub AA_Parse_3Letter()

    'Converts a string in a single cell into triplets of characters in consecutive cells,
    'assuming the triplets are separated by exactly one character,
    'e.g. amino acid sequences in 3-letter code or nucleotide triplets.
    'You can select a range of cells within the same column.
    'Do not select more than one column; cells to the right of the selection will be overwritten.

    Dim i As Long, i1 As Long, i2 As Long, j As Long, k As Long
    Dim nTriplets As Long                               'number of triplets in the current sequence
    Dim n As Long                                       'length (in triplets) of the longest sequence
    Dim maxTriplets As Long                             'number of cells available to the right of the selection
    Dim aaa As Variant

    If TypeName(Selection) <> "Range" Then              'Error, no cells selected
        MsgBox Prompt:="No cells selected, please select the sequences you wish to parse"
        Exit Sub
    End If

    If Selection.Columns.Count > 1 Then                 'Error, more than one column selected
        MsgBox Prompt:="More than one column selected, select only one column"
        Exit Sub
    End If

    i1 = Selection.Row
    i2 = i1 + Selection.Rows.Count - 1
    j = Selection.Column
    maxTriplets = ActiveSheet.Columns.Count - j         'cells to the right of column j
    n = 0

    For i = i1 To i2                                    'for all selected cells
        aaa = Cells(i, j).Value

        If TypeName(aaa) = "String" Then
            't triplets with single separators occupy 4 * t - 1 characters
            '(4 * t with a trailing separator), so (length + 1) \ 4 gives t
            nTriplets = (Len(aaa) + 1) \ 4

            If nTriplets > maxTriplets Then             'not enough cells for the entire sequence
                nTriplets = maxTriplets
                MsgBox Prompt:="The sequence in row " & i & " is too long, only the first " & _
                    nTriplets & " triplets are used"
            End If

            If nTriplets > n Then n = nTriplets         'determine length of longest sequence

            For k = 1 To nTriplets                      'parse the sequence
                Cells(i, j + k).Value = Mid(aaa, (4 * k) - 3, 3)
            Next k
        Else
            'not a character string (e.g. empty or numeric cell): skip it
        End If
    Next i

    Selection.Delete Shift:=xlToLeft                    'delete original strings, triplets shift left
    If n > 0 Then
        Range(Cells(i1, j), Cells(i2, j + n - 1)).Select    'select the sequence alignment
        Selection.ColumnWidth = 3                       'set column width to 3
    End If

End Sub
AAAAA Homepage | Zürich University, Dept. of Biochemistry | Plückthun Group | Annemarie Honegger

Last modified by A. Honegger, Wednesday, January 26, 2005