A database of proteins with topological links

This database collects information about links: topologically non-trivial structures formed by single, double, or triple chains, or by complexes of chains (e.g. capsids). It represents their complexity using the minimal surface method. LinkProt assembles both deterministic links (with loops closed, e.g., by two disulfide bonds) and links formed stochastically. The first examples of proteins forming links were found in 1989 [1], and many more have been identified in recent years [2]. The first macrolink was found in 1991 [3] and deposited in the RCSB in 2002 [4]. Our database shows that links are much more common and more complex than these early examples suggest: they can exist in single chains and can be formed by three chains [5]. The database assembles around 350 protein chains with at least a 30% probability of forming links, identified from the set of 120,000 structures deposited in the RCSB.

Deterministic links are detected based on covalent loops, using the single loop detection method used in [6]. Links involving a few proteins are detected with the stochastic closure method introduced in [7]. The topological and geometrical complexity is represented by the minimal surface method used in [6], which enables a full classification of links in proteins, taking into account their chirality and the amino acids or domains that are threaded. The database also presents all the information about any detected link graphically, both in a three-dimensional view (with a minimal surface spanned on each loop) and in a sequential representation, and displays information about proteins drawn from other biological databases (PubMed, DOI, RCSB, CATH, Pfam, EC). This information can be downloaded in a format that enables the visualization of minimal surfaces in VMD or Mathematica. In the case of stochastic links, the total likelihood of appearance of a particular link type is presented in an interactive circle, supported by all the information available in the case of deterministic links.
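The idea of spanning a surface on one loop and checking how another chain threads it can be illustrated with a short sketch. This is not the LinkProt implementation: instead of a minimal surface it uses a simple triangle fan built from the loop's centroid, and all function names (`signed_piercing`, `linking_number`) and the example coordinates are hypothetical. The sketch relies on the fact that the signed number of times a closed curve pierces any surface spanned on a loop equals the linking number of the two curves. It assumes generic positions, i.e. the second curve never passes exactly through a fan edge or the loop itself.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def signed_piercing(p, q, a, b, c, eps=1e-12):
    """Return +1 or -1 if segment p->q pierces triangle (a, b, c), else 0.

    Moller-Trumbore style intersection test; the sign tells whether the
    segment runs along or against the triangle normal (b-a) x (c-a).
    """
    e1, e2 = sub(b, a), sub(c, a)
    d = sub(q, p)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:          # segment parallel to the triangle plane
        return 0
    s = sub(p, a)
    u = dot(s, h) / det
    if u < 0 or u > 1:
        return 0
    qv = cross(s, e1)
    v = dot(d, qv) / det
    if v < 0 or u + v > 1:
        return 0
    t = dot(e2, qv) / det
    if t < 0 or t >= 1:         # half-open so shared endpoints count once
        return 0
    return 1 if dot(d, cross(e1, e2)) > 0 else -1

def linking_number(loop_a, loop_b):
    """Signed count of piercings of a fan surface on loop_a by loop_b.

    Both arguments are lists of (x, y, z) points of closed polygons;
    the last point is implicitly joined back to the first.
    """
    n = len(loop_a)
    centroid = tuple(sum(pt[i] for pt in loop_a) / n for i in range(3))
    total = 0
    for i in range(n):
        a, b = loop_a[i], loop_a[(i + 1) % n]
        for j in range(len(loop_b)):
            p, q = loop_b[j], loop_b[(j + 1) % len(loop_b)]
            total += signed_piercing(p, q, centroid, a, b)
    return total

# Hypothetical example: two squares forming a Hopf link, and an unlinked pair.
square = [(1, 1, 0), (-1, 1, 0), (-1, -1, 0), (1, -1, 0)]
threading = [(0.3, 0.2, -1), (0.3, 0.2, 1), (3, 0.2, 1), (3, 0.2, -1)]
far_away = [(5, 0.2, -1), (5, 0.2, 1), (8, 0.2, 1), (8, 0.2, -1)]

print(linking_number(square, threading))  # 1
print(linking_number(square, far_away))   # 0
```

Reversing the orientation of one loop flips the sign of the result, which is one simple way to see why chirality enters the classification of links.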

To our knowledge, this is the first database of links formed by single, double, or triple protein chains or by a network of protein chains. The database is compatible with other databases and servers on non-trivial topology in proteins, such as KnotProt (http://knotprot.cent.uw.edu.pl/) and LassoProt (http://lassoprot.cent.uw.edu.pl/), which makes it easy to use in conjunction with them.

This database offers four main new tools in comparison with other databases analyzing non-trivial topology: 1) it contains information about non-trivial topology involving more than one chain; 2) it covers both ways in which such topology can be formed, either by disulfide bonds or by stochastic closure; 3) it shows the full likelihood of non-trivial topology with all corresponding topological data; and 4) the database (server) is able to detect links in single chains (closed by two covalent loops). In comparison with other databases such as Protein Knots (http://knots.mit.edu/) or pKNOT (http://pknot.life.nctu.edu.tw): 1) our database offers search options using molecular keywords, molecular tags, Pfam identifiers, and CATH topology (these options are unavailable in the other databases), and 2) our database detects broken protein chains and uses this information in the analysis.

There are many possible applications of the data assembled in the database: one can detect whether a given protein or complex (capsid) possesses the non-trivial topology of a link; one can characterize the probability of linking along various reaction coordinates (e.g. native contacts or RMSD); one can identify homologous proteins with the same or different topology; one can investigate the function of linking based on an analysis of all available biological or structural data attached to each deposited link; etc. We are convinced that the database will be of great benefit to many users and will prove useful in many more applications.

[1] B.N. Violand, M. Takano, D.F. Curran, L.A. Bentle. Journal of Protein Chemistry 8, 5 (1989).
[2] D.R. Boutz, D. Cascio, J. Whitelegge, L.J. Perry, T.O. Yeates. J. Mol. Biol. 368, 1332-1344 (2007).
[3] R.L. Duda. Cell 94, 55 (1998).
[4] W.R. Wikoff, L. Liljas, R.L. Duda, H. Tsuruta, R.W. Hendrix, J.E. Johnson. Science 289, 5487, 2129-2133 (2002).
[5] P. Dabrowski-Tumanski, J.I. Sulkowska. Are there links in proteins? Submitted.
[6] P. Dabrowski-Tumanski, W. Niemyska, P. Pasznik, J.I. Sulkowska. NAR, doi:10.1093/nar/gkw308.
[7] J.I. Sulkowska, E.J. Rawdon, K.C. Millett, J.N. Onuchic, A. Stasiak. Proc. Natl. Acad. Sci. U.S.A. 109, E1715–E1723 (2012).


This database has been created as a collaboration between Paweł Dąbrowski-Tumański* [1], Aleksandra I. Jarmolińska* [1], Wanda Niemyska* [2], Eric Rawdon [3], Ken Millett [4], and Joanna I. Sułkowska [1], with the help of Grzegorz Rajchel [1] and Joanna Macnar [1].

1. University of Warsaw; 2. University of Silesia; 3. University of St Thomas; 4. University of California, Santa Barbara


The research leading to the creation of this database has been supported by: National Science Center [grant agreement Sonata BIS 2012/07/E/NZ1/01900 to J.S. and Preludium 2016/21/N/NZ1/02848 to P.D.T.]; University of Warsaw, Faculty of Chemistry [120000-501/86-DSM-110200 to P.D.T.]; Foundation for Polish Science [grant agreement Inter 130/UD/SKILLS/2015 to W.N.]; National Science Foundation [Division of Mathematical Sciences, #1418869 to E.R.].

LinkProt | Interdisciplinary Laboratory of Biological Systems Modelling