
Renumbering PDB Files:

The renumbering macros take a set of PDB files in a directory (no more than 250, since an EXCEL workbook is limited to 255 worksheets), read the ATOM and HETATM records of each file into a separate EXCEL worksheet, and extract the sequences, chain labels and residue labels into separate worksheets. The user can then gap the sequence alignment manually, or export it, gap it automatically with a program such as the GCG module PILEUP, and reimport the gapped alignment into the EXCEL workbook. Further macros then change the chain and residue labels of the PDB files according to a common numbering scheme and reexport the PDB coordinate files. The numbering scheme is given in the header row of the alignment (for sequence alignments shorter than 250 amino acids) or in its header column (for sequence alignments longer than 250 amino acids).

Click the "HowTo" button for step-by-step instructions on how to renumber a set of PDB files according to a common residue numbering scheme.

HowToRenumber

Macros:

PDB_GetFiles reads the PDB coordinate files whose filenames are listed in the first column of the worksheet "Files" into separate worksheets of the active workbook

Problem: Under Windows, the xxx_GetFile utilities do not keep the file extensions of the imported files in the worksheet name and therefore do not recognize the individual worksheets properly. For the moment, these macros will only work correctly if you remove the .pdb filename extensions from the filenames. I will correct the problem as soon as possible.

PDB_CollectData collects the sequence information, residue labels and chain identifiers of all structures into three worksheets called "SEQ", "LABEL" and "CHAIN"

PDB_ExtractSeq converts the vertical sequence alignment (in three-letter code) in worksheet "SEQ" into a horizontal one in worksheet "ALIG".

The user can now gap the resulting sequence alignment manually so that each residue lines up with the desired residue label and chain identifier, which are entered in the first and second rows of the worksheet "ALIG".

PDB_GapFromAlig propagates the gapping and labelling information from the alignment to worksheets "SEQ", "LABEL" and "CHAIN"

PDB_GapFromLbl gaps worksheets "SEQ", "LABEL" and "CHAIN" as specified by the residue label. This gapping is retained as the sequence alignment is extracted by PDB_ExtractSeq.
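
The label-driven gapping done by PDB_GapFromLbl can be pictured with a minimal Python sketch. This is not the macro's code: the function name gap_from_labels, the list-based data layout and the "-" gap symbol are assumptions made for illustration. Each residue is placed in the alignment column whose header label matches its own residue label, and columns without a matching residue are left as gaps.

```python
def gap_from_labels(header_labels, residues, gap="-"):
    """Place the residues of one structure into alignment columns by label.

    header_labels: labels of the common numbering scheme, in column order.
    residues: (label, residue_name) pairs of one structure.
    The "-" gap symbol is an assumption, not taken from the macros.
    """
    by_label = {label: name for label, name in residues}
    # Columns whose label does not occur in the structure become gaps.
    return [by_label.get(label, gap) for label in header_labels]


header = ["1", "2", "3", "3A", "4"]
structure = [("1", "ALA"), ("2", "GLY"), ("4", "SER")]
print(gap_from_labels(header, structure))
# ['ALA', 'GLY', '-', '-', 'SER']
```

Residues whose label does not occur in the header are silently dropped in this sketch, so a real check should compare the number of placed residues with the length of the input.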

PDB_GapfromSeq has to be used for sequences longer than 250 residues. In this case, the manual gapping is applied to the "SEQ" worksheet and propagated to the other two worksheets by this macro.

PDB_reLbl relabels the residues according to the information contained in the "SEQ", "LABEL" and "CHAIN" worksheets
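
Outside EXCEL, the relabelling step can be approximated directly on the fixed-width text of an ATOM or HETATM record. The sketch below is an illustration under stated assumptions, not the PDB_reLbl macro itself: the function names split_label and relabel_line are hypothetical, and the column positions (chain identifier in column 22, residue number in columns 23-26, insertion code in column 27) follow the PDB coordinate record layout.

```python
def split_label(label):
    """Split a residue label such as "100A" into number and insertion code."""
    label = label.strip()
    if label and label[-1].isalpha():
        return label[:-1], label[-1]
    return label, " "


def relabel_line(line, chain, label):
    """Return an ATOM/HETATM line with a new chain ID and residue label.

    Assumes the numeric part of the label fits into four characters.
    The trailing newline, if any, is dropped.
    """
    number, icode = split_label(label)
    line = line.rstrip("\n").ljust(27)
    # Python slices are 0-based: [21] is column 22, [22:26] is columns 23-26,
    # and [26] is column 27 (insertion code).
    return line[:21] + chain + number.rjust(4) + icode + line[27:]


old = "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00"
new = relabel_line(old, "H", "100A")
print(new[:27])
# ATOM      1  N   ALA H 100A
```

Only the chain and residue label fields are overwritten, so coordinates, occupancy and B-factor pass through unchanged.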

PDB_PutFiles exports the relabelled PDB files as individual ASCII text files.

PDB_DeleteWater deletes water records (ATOM or HETATM records with residue labels "WAT", "HOH" or "H2O").
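
The water filter is simple enough to sketch in a few lines of Python. This is an illustrative equivalent working on the text lines of a PDB file, not the macro's code. The function name delete_water is hypothetical, and the residue name is assumed to sit in columns 18-20 of the record, as in the PDB coordinate record layout.

```python
WATER_NAMES = {"WAT", "HOH", "H2O"}


def delete_water(lines):
    """Return the lines with water ATOM/HETATM records removed."""
    kept = []
    for line in lines:
        is_coordinate = line.startswith(("ATOM  ", "HETATM"))
        # Columns 18-20 (0-based slice 17:20) hold the residue name.
        if is_coordinate and line[17:20] in WATER_NAMES:
            continue
        kept.append(line)
    return kept


example = [
    "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00",
    "HETATM    2  O   HOH A 101      10.000  10.000  10.000  1.00  0.00",
    "END",
]
print(len(delete_water(example)))
# 2
```

Records other than ATOM and HETATM are kept untouched, so header and END lines survive the filter.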

Limitations: "PDB_ExtractSeq" can only be used if the final, gapped length of the sequence alignment is less than 250 amino acids, since EXCEL worksheets cannot contain more than 255 columns. For longer alignments, the gapping and labelling information has to be entered directly into the worksheet "SEQ", in which the sequences are listed vertically. The sequences in this worksheet can be converted to one-letter code using the macro "AA_Convert_3to1". The residue labels and chain labels in this case have to be entered in columns 1 and 2 of worksheet "SEQ". The gapping information is propagated to the other two worksheets using the macro "PDB_GapfromSeq". After this, "PDB_reLbl" and "PDB_PutFiles" can be used as they are used for shorter alignments.
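
The three-letter to one-letter conversion mentioned above can be sketched as follows. This is not the code of "AA_Convert_3to1": the function name convert_3to1 is hypothetical, and writing gaps as "-" and unrecognized residue names as "X" are assumptions. The table covers only the 20 standard amino acids.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def convert_3to1(residues, gap="-", unknown="X"):
    """Convert a vertical list of three-letter codes into a one-letter string.

    Empty cells and the gap symbol are kept as gaps (assumption);
    names missing from the table become "X" (assumption).
    """
    letters = []
    for name in residues:
        name = name.strip().upper()
        if name in ("", gap):
            letters.append(gap)
        else:
            letters.append(THREE_TO_ONE.get(name, unknown))
    return "".join(letters)


print(convert_3to1(["GLU", "VAL", "-", "GLN", "LEU"]))
# EV-QL
```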

"PDB_CollectData" and "PDB_reLbl" recognize the begin of a new residue by the atom label "N". This fails if the atoms of a residue are not listed in the order specified in the PDB ATOM record definition or if multiple positions are indicated for the same residues. Alternative ways to recognize the begin of a new residue in PDB files which are not properly relabelled by the normal macros are indicated in the description of the macros.

Verification: To make sure the renumbering was correct, PDB files which have been numbered according to a purely numeric scheme can be imported with "PDB_GetFiles", the sequence collected with "PDB_CollectData", and the "SEQ", "LABEL" and "CHAIN" worksheets gapped with "PDB_GapFromLbl". "PDB_ExtractSeq" should then produce a properly gapped alignment.

AAAAA Homepage Zürich University Dept. of Biochemistry Plückthun Group Annemarie Honegger

Last Modified by A.Honegger Wednesday, January 26, 2005