
Step 6: Gapping and Labelling

There is no macro for automated gapping and labelling (yet); the user has to supply this information to the program. For alignments containing only a few sequences, manual gapping by inserting cells at the appropriate positions may be fastest. Large alignments can be exported and processed by external programs (GCG Pileup, Clustal X, etc.). After gapping, the header has to be filled in to indicate the desired common residue nomenclature: the chain label in the first row, the residue labels in the second row and, if necessary, the insertion code in the third row. Formatting of these rows is ignored. Sequences start in the fourth row. Their order should not be changed.
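
The header layout can be illustrated with a small Python sketch that is independent of the EXCEL macros. It assumes that residue labels are written as a number plus an optional insertion letter (for example "27A"), that the chain label is repeated in every column, and that gaps are shown as "-". None of these conventions is prescribed by the macros, and the column holding the sequence names is left out. The names header_rows and build_sheet are hypothetical.

```python
import re

def header_rows(chain, labels):
    """Build the three header rows: chain label, residue number, insertion code.

    labels: strings such as "27" or "27A" (assumed format, for illustration only).
    """
    chain_row, number_row, insertion_row = [], [], []
    for label in labels:
        match = re.fullmatch(r"(\d+)([A-Za-z]?)", label)
        if match is None:
            raise ValueError(f"unrecognised residue label: {label!r}")
        chain_row.append(chain)
        number_row.append(match.group(1))
        insertion_row.append(match.group(2))  # empty string if no insertion code
    return [chain_row, number_row, insertion_row]

def build_sheet(chain, labels, sequences):
    """Header in rows 1 to 3, gapped sequences from row 4 on, order unchanged."""
    for seq in sequences:
        if len(seq) != len(labels):
            raise ValueError("every gapped sequence must be as long as the header")
    return header_rows(chain, labels) + [list(seq) for seq in sequences]

# Two short gapped sequences under a header containing one insertion position.
sheet = build_sheet("L", ["26", "27", "27A", "28"], ["QS-V", "QSLV"])
for row in sheet:
    print(row)
```

Each inner list corresponds to one worksheet row with one residue per cell, so the fourth list is the first sequence.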

The sequence alignment can be exported as space-delimited text after entering two "." in the first cell of the first row, which yields a ".pretty" file. The GCG Wisconsin package module "PRETTY" converts this file into individual sequences, "PILEUP" gaps them, and a second run of "PRETTY" produces a text file that can be reimported into EXCEL as described in the "HowTo"-File for the coloring macros.
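
For readers who prepare the export file outside EXCEL, the following Python sketch shows the general shape of such a text file: a first line containing only "..", followed by one line per sequence. The exact column layout that EXCEL writes and that GCG accepts is not specified here, so the padding of the names and the joining of the residues are assumptions, and export_lines is a hypothetical helper.

```python
def export_lines(named_sequences):
    """Return the lines of a space-delimited export.

    named_sequences: list of (name, residues) pairs, residues being a list of
    single-character cells. The layout (padded name, one space, residues) is
    an assumption made for this sketch.
    """
    lines = [".."]  # first line: empty except for the ".." marker
    width = max((len(name) for name, _ in named_sequences), default=0)
    for name, residues in named_sequences:
        lines.append(f"{name.ljust(width)} {''.join(residues)}")
    return lines

lines = export_lines([("VL1", list("QSV")), ("VL22", list("QAV"))])
print("\n".join(lines))
```

Writing the returned lines to a file such as seq.pretty gives a starting point for the procedure below.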

Procedure:

  • Save the alignment worksheet as space-delimited text (for example, named seq.pretty). The first line of this text file has to be empty except for "..", which indicates to the GCG programs that the following lines contain sequence information.
  • Transfer the file to the computer on which GCG is installed and invoke GCG
  • The command "pretty -ugly seq.pretty" converts the alignment file into individual sequence files
  • "pileup *.ugly" produces a gapped multiple alignment, returned as a .msf file, for example seq.msf.
  • "pretty -lin=100 -blo=100 seq.msf{*}" reformats the sequence in such a way that it can easily be reimported into EXCEL
  • Transfer the resulting file back to the computer on which you run EXCEL
  • Import the file, using the "fixed" option to separate the sequence names from the sequence, as described in the "HowTo"-File for the coloring macros.
  • Distribute the residues of the sequences into individual cells using "AA_Parse". You might have to cut-and-paste multiple sequence blocks generated by pretty into the appropriate places to have each sequence in a single row again.
  • Some manual fine-tuning of the sequence alignment may be needed. Macro "AA_CDRarrange" can be used to coalesce and center multiple gaps in the appropriate positions
  • Fill in the header rows to indicate the desired numbering scheme
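
The reimport steps (joining sequence blocks, distributing residues into cells, centering gaps) can be sketched in Python to make the intended result concrete. This is not the code of "AA_Parse" or "AA_CDRarrange". It assumes that every non-blank input line consists of a sequence name followed by a piece of sequence, that ruler or consensus lines have already been removed, and that gaps are written as "-". The names merge_blocks, to_cells and center_gaps are hypothetical.

```python
def merge_blocks(text):
    """Join the pieces of each sequence across blocks, keeping first-seen order."""
    merged = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and lines without sequence data
        name, chunks = parts[0], parts[1:]
        merged[name] = merged.get(name, "") + "".join(chunks)
    return merged

def to_cells(sequence):
    """One residue (or gap) per cell."""
    return list(sequence)

def center_gaps(cells, start, end, gap="-"):
    """Collect all gaps in cells[start:end] and place them in the middle of that region."""
    region = cells[start:end]
    residues = [c for c in region if c != gap]
    n_gaps = len(region) - len(residues)
    left = (len(residues) + 1) // 2  # residues kept to the left of the gap run
    centered = residues[:left] + [gap] * n_gaps + residues[left:]
    return cells[:start] + centered + cells[end:]

blocks = "seqA QSV\nseqB QAV\n\nseqA LT-\nseqB LTP\n"
merged = merge_blocks(blocks)
rows = {name: to_cells(seq) for name, seq in merged.items()}
arranged = "".join(center_gaps(list("AB-C-DEF"), 1, 7))
print(merged, arranged)
```

The region boundaries passed to center_gaps stand for the columns of one loop; choosing them remains a manual decision.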

Last Modified by A.Honegger Wednesday, January 26, 2005