Sequence input

Enter your protein sequence to predict pathogen-associated domains.


Domain database

Explore individual Pfam domains in terms of pathogen-association and other statistics.

Full dataset

Download link for the collated information on all the domain families.

(Visualizations and rankings of this data are available in the Domain database and Domain rankings sections, respectively.)

Domain rankings

Explore lists of top-scoring Pfam domains according to pathogen-association and other criteria.


PathFams is a database of pre-computed analyses of Pfam domain families. The abundance of all 17,929 families in Pfam v. 32.0 was examined in:

  • Bioinformatic databases
  • Environments (human gut, marine, soil)
  • Taxonomic lineages
  • Pathogens vs non-pathogens

Additionally, all Pfam families were analyzed to predict:

  • Co-occurring gene families across the bacterial tree of life using PhyloCorrelate [link]
  • Feasibility for structure determination

The results of the above analyses, along with other collected information about the domain families (not including PhyloCorrelate data), are available to browse in several ways:

Sequence input

The sequence input section performs a domain search on your protein sequence query, and displays pathogen-association statistics for each domain match. The displayed domain matches also link to their individual pages in the Domain database (see below).

Domain database

Each page in this section visualizes the data for a single domain family from the analyses listed above, and shows how its rankings and statistics in each category compare with those of the other domain families.

Domain rankings

This section offers eight lists, each produced by filtering and ranking the collected domain information. Each list displays only the pertinent columns, but the full dataset is available to download at the top of every list page and on the main page of the website (see below).
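
For readers who want to build a similar list from the downloaded dataset, the following is a minimal sketch of the filter-then-rank idea in Python. It is not the code behind the website. The column names (domain_id, pathogen_score, genome_count), the tab-separated layout, the helper rank_domains, and all sample values are hypothetical placeholders and should be replaced with the actual headers found in the downloaded file.

```python
import csv
import io

# Hypothetical excerpt: column names and values are made-up placeholders,
# not taken from the real PathFams dataset.
SAMPLE_TSV = """domain_id\tpathogen_score\tgenome_count
FAM_A\t0.91\t120
FAM_B\t0.15\t900
FAM_C\t0.78\t4
FAM_D\t0.85\t60
"""


def rank_domains(tsv_text, score_column, min_genomes=0, top_n=10):
    """Keep rows with at least min_genomes genomes, then sort by score, highest first."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [row for row in reader if int(row["genome_count"]) >= min_genomes]
    kept.sort(key=lambda row: float(row[score_column]), reverse=True)
    return [row["domain_id"] for row in kept[:top_n]]


# Top two placeholder families seen in at least 10 genomes.
top = rank_domains(SAMPLE_TSV, "pathogen_score", min_genomes=10, top_n=2)
print(top)
```

To use this with the real file, read its contents into a string and pass the appropriate score column; the minimum-count filter is only one example of a criterion that could precede the ranking.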

Full dataset

This section contains only a download link to the full dataset used for the Domain database and the Domain rankings. The dataset is a copy of Supplemental Data S3 from our manuscript.

For more information, please see our manuscript.


  • Lobb B, Tremblay BJ, Moreno-Hagelsieb G, Doxey AC. PathFams: statistical detection of pathogen-associated protein domains. Forthcoming.
  • Tremblay BJ, Lobb B, Doxey AC. PhyloCorrelate: inferring bacterial gene-gene functional associations through large-scale phylogenetic profiling. Bioinformatics. 2021;37(1):17-22.


  • Data analysis: Briallen Lobb
  • Developer: Benjamin Tremblay
  • PI: Andrew Doxey