PDB ATOM and HETATM Format Description

The renumbering macros interpret the ATOM and HETATM records of a PDB file as specified in the PDB Format Description Version 2.2. Other record types are deleted, since they could be corrupted by the EXCEL input parsing routine. Each field of the ATOM/HETATM record is read into a separate cell of the EXCEL worksheet, and each cell is formatted so that the correct PDB format is regenerated when the content of the worksheet is saved as "space delimited text".

ATOM

Overview

The ATOM records present the atomic coordinates for standard residues. They also present the occupancy and temperature factor for each atom. Heterogen coordinates use the HETATM record type. The element symbol is always present on each ATOM record; segment identifier and charge are optional.

Record Format

COLUMNS   DATA TYPE      FIELD        DEFINITION                                  EXCEL COLUMN   SIZE, TYPE AND FORMAT
-------------------------------------------------------------------------------------------------------------------------
 1 -  6   Record name    "ATOM  "                                                 A (1)          String (6)
 7 - 11   Integer        serial       Atom serial number.                         B (2)          Integer (5)
12                                                                                C (3)          not used (1)
13 - 16   Atom           name         Atom name.                                  D (4)          String (4, left justified)
17        Character      altLoc       Alternate location indicator.               E (5)          String (1)
18 - 20   Residue name   resName      Residue name.                               F (6)          String (3, left justified)
21                                                                                G (7)          not used (1)
22        Character      chainID      Chain identifier.                           H (8)          String (1)
23 - 26   Integer        resSeq       Residue sequence number.                    I (9)          Integer (4)
27        AChar          iCode        Code for insertion of residues.             J (10)         String (1)
28 - 30                                                                           K (11)         not used (3)
31 - 38   Real(8.3)      x            Orthogonal coordinates for X in Angstroms.  L (12)         Real (8, XXXX.XXX)
39 - 46   Real(8.3)      y            Orthogonal coordinates for Y in Angstroms.  M (13)         Real (8, XXXX.XXX)
47 - 54   Real(8.3)      z            Orthogonal coordinates for Z in Angstroms.  N (14)         Real (8, XXXX.XXX)
55 - 60   Real(6.2)      occupancy    Occupancy.                                  O (15)         Real (6, XXX.XX)
61 - 66   Real(6.2)      tempFactor   Temperature factor.                         P (16)         Real (6, XXX.XX)
67 - 72                                                                           Q (17)         not used (6)
73 - 76   LString(4)     segID        Segment identifier, left-justified.         R (18)         String (4, left justified)
77 - 78   LString(2)     element      Element symbol, right-justified.            S (19)         String (2, right justified)
79 - 80   LString(2)     charge       Charge on the atom.                         T (20)         String (2, left justified)

Details

* ATOM records for proteins are listed from amino to carboxyl terminus.
* Nucleic acid residues are listed from the 5' to the 3' terminus.
* No ordering is specified for polysaccharides.
* The list of ATOM records in a chain is terminated by a TER record.
* If more than one model is present in the entry, each model is delimited by MODEL and ENDMDL records.
* For more information on atom naming conventions, see Appendix 3, and for residue names, see Appendix 4 and the HET section of this document.
* If an atom is provided in more than one position, then a non-blank alternate location indicator must be used for each of the positions. Within a residue, all atoms that are associated with each other in a given conformation are assigned the same alternate location indicator.
* For atoms that are in alternate sites indicated by the alternate site indicator, sorting of atoms in the ATOM/HETATM list uses the following general rules:
* Addition of atoms to side chains of standard residues are handled as follows:
* Chemical modifications of standard residue side chains by addition of new atoms are handled as follows:
* The insertion code is commonly used in sequence numbering and is described here. In most cases, the amino acids that comprise a protein are numbered sequentially starting with 1. However, there are a number of situations that may give rise to different numbering schemes:
REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
---------------------------------------------------------------------
          59                                 59
          60                                 60
          61
          62                                 62

REFERENCE PROTEIN NUMBERING    HOMOLOGOUS PROTEIN NUMBERING
---------------------------------------------------------------------
          85                                 85
          86                                 86
                                             86A
                                             86B
          87                                 87
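The numbering tables above imply that a residue is identified by the pair (resSeq, iCode) rather than by the sequence number alone. The following is a minimal Python sketch of how such labels could be split and ordered. The function names `split_residue_id` and `sort_residue_ids` are hypothetical, and the rule that a residue without insertion code sorts before its lettered insertions is an assumption read from the second table, not a rule stated in the format description.

```python
def split_residue_id(label):
    """Split a label such as '86A' into (resSeq, iCode).

    iCode is a single space when the label carries no insertion code,
    which mirrors a blank column 27 in the ATOM record.
    """
    label = label.strip()
    if label and label[-1].isalpha():
        return int(label[:-1]), label[-1]
    return int(label), " "


def sort_residue_ids(labels):
    """Order labels by sequence number first, then by insertion code.

    A blank insertion code compares lower than any letter, so '86'
    precedes '86A' and '86B' (assumed ordering, see lead-in).
    """
    return sorted(labels, key=split_residue_id)


if __name__ == "__main__":
    print(sort_residue_ids(["87", "86B", "85", "86A", "86"]))
```

Keeping the insertion code as a separate value also matches the worksheet layout, where resSeq and iCode occupy separate cells (columns I and J).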
* If the depositor provides the data, then the isotropic B value is given for the temperature factor.
* If there is no isotropic B value from the depositor, but there is an ANISOU record with anisotropic temperature factors, then the B equivalent is stored in the tempFactor field, as calculated by:
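A minimal Python sketch of an equivalent-B calculation follows. It assumes the conventional definition B_eq = (8 * pi^2 / 3) * (U11 + U22 + U33), and it assumes that the ANISOU values are stored as integers scaled by 10^4, so that they must be divided by 10000 to obtain U in square Angstroms. The function name `b_equivalent` and the `scale` parameter are hypothetical.

```python
import math


def b_equivalent(u11, u22, u33, scale=1.0e4):
    """Return the equivalent isotropic B from the diagonal ANISOU terms.

    u11, u22, u33 are the diagonal elements as written in the ANISOU
    record (assumed to be scaled by `scale`). The result is in square
    Angstroms, the unit of the tempFactor field.
    """
    trace = (u11 + u22 + u33) / scale
    return 8.0 * math.pi ** 2 * trace / 3.0


if __name__ == "__main__":
    # Hypothetical ANISOU diagonal values, for illustration only.
    print(round(b_equivalent(2406, 1892, 1614), 2))
```

Only the diagonal terms enter this definition; the off-diagonal terms U12, U13 and U23 describe the orientation of the ellipsoid and do not change its trace.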
* If there are neither isotropic B values from the depositor, nor anisotropic temperature factors in ANISOU, then the default value of 0.0 is used for the temperature factor.
* In some entries, the occupancy and temperature factor fields are used for other quantities. In these cases, an explanation is provided in the remarks.
* Columns 73 - 76 identify specific segments of the molecule. The segment id is a string of up to four (4) alphanumeric characters, left-justified, and may include a space, e.g., CH86, A 1, NASE. The segment itself may consist of a complete chain or a portion of a chain. The importance of this new field can be appreciated if one considers an antibody structure having two molecules in the asymmetric unit. Since each chain must have a unique chain identifier, the two heavy chains and two light chains cannot currently be labeled to indicate their nature. Segment ids of CH, VH1, VH2, VH3, CL, and VL would clearly identify regions of the chains and the relationship between them. Users of X-PLOR will be familiar with SEGID as used in the refinement application of X-PLOR.
* Columns 77 - 78 contain the atom's element symbol (as given in the periodic table), right-justified. This is especially needed because in some cases it has not been possible to follow the convention that columns 13 - 14 of the atom name contain the element symbol. The most common cases are:
* Columns 79 - 80 indicate any charge on the atom, e.g., 2+, 1-. In most cases these are blank.

Verification/Validation/Value Authority Control

PDB checks ATOM/HETATM records for PDB format, sequence information, and packing. The PDB reserves the right to return deposited coordinates to the author for transformation into PDB format. PDB intends to verify the coordinates against the experimental structure factor data when available. Details on this will be forthcoming.

Relationships to Other Record Types

The ATOM records are compared to the corresponding sequence database. Residue discrepancies appear in the SEQADV record. Missing atoms are annotated in the remarks. HETATM records are formatted in the same way as ATOM records. The sequence implied by ATOM records must be identical to that given in SEQRES, with the exception that residues that have no coordinates, e.g., due to disorder, must appear in SEQRES. Remark 550 is used to describe the meaning assigned to any segment identifiers used.

Example

         1         2         3         4         5         6         7         8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM    145  N   VAL A  25      32.433  16.336  57.540  1.00 11.92      A1   N
ATOM    146  CA  VAL A  25      31.132  16.439  58.160  1.00 11.85      A1   C
ATOM    147  C   VAL A  25      30.447  15.105  58.363  1.00 12.34      A1   C
ATOM    148  O   VAL A  25      29.520  15.059  59.174  1.00 15.65      A1   O
ATOM    149  CB AVAL A  25      30.385  17.437  57.230  0.28 13.88      A1   C
ATOM    150  CB BVAL A  25      30.166  17.399  57.373  0.72 15.41      A1   C
ATOM    151  CG1AVAL A  25      28.870  17.401  57.336  0.28 12.64      A1   C
ATOM    152  CG1BVAL A  25      30.805  18.788  57.449  0.72 15.11      A1   C
ATOM    153  CG2AVAL A  25      30.835  18.826  57.661  0.28 13.58      A1   C
ATOM    154  CG2BVAL A  25      29.909  16.996  55.922  0.72 13.25      A1   C

Known Problems

Due to the ever-increasing size of protein structures in the PDB, the atom serial number field may soon need to be increased. An increase of one column will allow for cases where entries have more than 99,999 atoms.
Only 5 digits are available for the atom serial number, but some structures have already been received with more than 99,999 atoms. No distinction is made between ribo- and deoxyribonucleotides in the SEQRES records. These residues are identified with the same residue name (i.e., A, C, G, T, U).
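The record format table can be read as a list of fixed column slices, one per worksheet cell. The following is a minimal Python sketch of that idea, not the EXCEL macro itself. The names `FIELDS`, `SAMPLE` and `parse_atom_record` are hypothetical, and the sketch assumes that string fields may be stripped of padding, which discards the justification of the atom name within columns 13 - 16.

```python
# (field name, first column, last column, converter); columns are 1-based
# and inclusive, as in the record format table. Unused columns are skipped.
FIELDS = [
    ("record", 1, 6, str),
    ("serial", 7, 11, int),
    ("name", 13, 16, str),
    ("altLoc", 17, 17, str),
    ("resName", 18, 20, str),
    ("chainID", 22, 22, str),
    ("resSeq", 23, 26, int),
    ("iCode", 27, 27, str),
    ("x", 31, 38, float),
    ("y", 39, 46, float),
    ("z", 47, 54, float),
    ("occupancy", 55, 60, float),
    ("tempFactor", 61, 66, float),
    ("segID", 73, 76, str),
    ("element", 77, 78, str),
    ("charge", 79, 80, str),
]

# One record taken from the example in this document.
SAMPLE = ("ATOM    149  CB AVAL A  25      30.385  17.437  57.230"
          "  0.28 13.88      A1   C")


def parse_atom_record(line):
    """Parse one ATOM or HETATM line into a dict keyed by field name.

    Lines shorter than 80 characters are padded with blanks, so the
    optional trailing fields (segID, element, charge) come back empty.
    Blank numeric fields are returned as None.
    """
    line = line.rstrip("\r\n").ljust(80)
    if line[:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM or HETATM record")
    result = {}
    for name, first, last, convert in FIELDS:
        text = line[first - 1:last]
        if convert is str:
            result[name] = text.strip()
        else:
            result[name] = convert(text) if text.strip() else None
    return result


if __name__ == "__main__":
    for key, value in parse_atom_record(SAMPLE).items():
        print(key, repr(value))
```

Because the serial number occupies only columns 7 - 11, a parser of this kind cannot represent serial numbers above 99,999 without a change to the column layout.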
Last Modified by A. Honegger