Eff3ctidor

What is Effectidor's output?

Effectidor produces both tabular and graphical outputs. The tabular output includes:

On-screen tables

The positive set used to train the classifier. These samples are sorted according to their likelihood of encoding T3Es.
Top 10 predictions. These are the highest-scoring putative T3Es identified by Effectidor. Note that the scores of these predictions should be interpreted relative to the scores of the known T3Es given in the previous table.

Downloadable files

Predictions file (tutorial video). All the OGs in the genomes (excluding OGs that represent T3SS and flagella components) are sorted according to their score, which reflects their likelihood of encoding T3Es. The label of each OG is given, as well as the identity of its ORFs in each genome and the protein annotation (if available in the input data).
Feature importance file (tutorial video). This table lists the features used by the machine-learning classifier together with their relative importance, sorted by their contribution to the learning process.
Report on type 3 secretion system and flagella proteins found in the input genomes (tutorial video). These proteins are identified on the basis of sequence similarity to our T3SS datasets (see data). They are excluded from the ML process and therefore do not get a score, but are given as an extra output. For each subsystem found in the data, all the components and their loci in each genome are listed.
Raw features file (tutorial video). This file contains all the features and their values for each OG in the data. These values were the input to the ML pipeline.
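As a rough illustration of how the predictions file might be examined offline, the sketch below lists the unlabeled OGs that score at least as high as the lowest-scoring known T3E. The column names ("OG", "score", "is_effector"), the label values ("yes", "no", "?") and the sample rows are assumptions made up for this example, not the actual layout of the Effectidor file; adjust them to the header of the file you downloaded.

```python
import csv
import io

# Hypothetical layout: column names and label values are assumptions for
# illustration only; check the header of your downloaded predictions file.
SAMPLE = """OG,score,is_effector
OG1,0.98,yes
OG2,0.91,?
OG3,0.85,yes
OG4,0.40,?
OG5,0.02,no
"""

def candidates_above_known(handle, label_col="is_effector", score_col="score"):
    """Return unlabeled rows scoring at least as high as the lowest-scoring known T3E."""
    rows = list(csv.DictReader(handle))
    known_scores = [float(r[score_col]) for r in rows if r[label_col] == "yes"]
    if not known_scores:
        return []  # no known T3Es to compare against
    threshold = min(known_scores)
    hits = [
        r for r in rows
        if r[label_col] == "?" and float(r[score_col]) >= threshold
    ]
    # Highest-scoring candidates first
    return sorted(hits, key=lambda r: float(r[score_col]), reverse=True)

hits = candidates_above_known(io.StringIO(SAMPLE))
print([h["OG"] for h in hits])
```

Using the lowest known-T3E score as the cutoff is only one possible convention; a stricter cutoff, such as the median score of the known T3Es, may suit some datasets better.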

The graphical output includes an analysis of the features. The 10 features that contribute most are compared between the two classes (effectors and non-effectors) in violin plots.
Additional graphical output includes presence/absence maps of the T3Es and of the T3SS and flagella components found in the genomes.

You can see an example of the output in the running example here.

How do I interpret the results?

How is Effectidor evaluated and where can I see its prediction accuracy?

What input does Effectidor require and what is it used for?

Effectidor has one mandatory input: an ORFs file.
This is a FASTA file including all the genome ORFs. See instructions for downloading this file here and tutorial video for a single genome here, and for pan-genome here.
Some of the ORFs in this file (effectors and non-effectors) will be used to train the machine-learning algorithms, and the trained classifier will then produce the main output: a prediction for each ORF.
In addition, it is possible to supply a file of known effectors, with the records as they appear in the ORFs file. If this file is not supplied, a homology search against an internal effectors dataset will be performed to establish the "known effectors" ORFs for the learning process. You can see a tutorial video for creating this input here.
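Before uploading, it can help to confirm that every record in the known effectors file is also present in the ORFs file. The sketch below does this by comparing FASTA record IDs, taken as the first word of each header line. Matching on IDs is an assumption about what "as they appear in the ORFs file" means in practice, and the function names and sample records are hypothetical.

```python
import io

def fasta_ids(handle):
    """Yield the record ID (first word after '>') of every FASTA record."""
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                yield header.split()[0]

def missing_effectors(orfs_handle, effectors_handle):
    """Return effector IDs that have no record with the same ID in the ORFs file."""
    orf_ids = set(fasta_ids(orfs_handle))
    return [eid for eid in fasta_ids(effectors_handle) if eid not in orf_ids]

# Made-up records for illustration only
ORFS = ">orf1 hypothetical protein\nATGAAA\n>orf2\nATGCCC\n>orf3\nATGGGG\n"
EFFECTORS = ">orf2\nATGCCC\n>orf9\nATGTTT\n"

missing = missing_effectors(io.StringIO(ORFS), io.StringIO(EFFECTORS))
print(missing)
```

With real files, pass opened file handles, for example `open("ORFs.fasta")`, in place of the `io.StringIO` objects; an empty result means every known effector was found in the ORFs file.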

In the advanced options you can supply data that will provide additional features for the machine-learning classifier and improve the predictions. These data include:

Host proteome archive. Protein FASTA files with the proteome of a known host of the studied bacterium. Multiple files can be included. All these files should be compressed in a single zip archive.
This input will be used for homology searches. Because effectors interact with host proteins to carry out their function, we expect them to have eukaryotic domains, which will be detected in this homology search.
A tutorial video for this input is available here.
Archive of proteomes of closely related bacteria without T3SSs. This archive may contain several proteome records, each in a separate FASTA file. These FASTA files should be compressed in a single zip archive. A homology search will be performed against each of these proteomes. As these bacteria are closely related to the studied bacterium, the vast majority of the proteins in the studied bacterium are expected to have an ortholog in these proteomes. Nevertheless, since they do not encode a T3SS, effectors are not expected to have orthologs in these proteomes. Thus, these features are usually very informative for the machine-learning.
A tutorial video for this input is available here.
GFF3 file(s). These files will be used to compute genome organization features.
A tutorial video for this input is available here.
Full genome FASTA files. The full genome will be used to search for regulatory elements in the promoter region of each ORF. Specifically, the following motifs can be searched for: PIP-box (relevant for Xanthomonas, Ralstonia, and Acidovorax); hrp-box (relevant for Pseudomonas syringae and plant pathogens of the Enterobacteria family); mxiE-box (relevant for Shigella); exs-box (relevant for Pseudomonas aeruginosa); and tts-box (relevant for rhizobia).
A tutorial video for this input is available here.
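Both the host proteome input and the closely related proteomes input described above expect several FASTA files compressed into a single zip archive. One way to build such an archive is sketched below with Python's standard `zipfile` module. The file names are placeholders, and placing the files at the root of the archive (rather than inside a folder) is an assumption, not a documented requirement.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_fasta_files(fasta_paths, archive_path):
    """Write the given FASTA files into one zip archive and return its member names."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in fasta_paths:
            # arcname drops the directory part so each file sits at the archive root
            archive.write(path, arcname=Path(path).name)
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()

# Demonstration with two tiny placeholder proteome files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    files = []
    for name in ("host_a.faa", "host_b.faa"):
        fasta_path = tmp_dir / name
        fasta_path.write_text(">protein1\nMKT\n")
        files.append(fasta_path)
    members = zip_fasta_files(files, tmp_dir / "host_proteomes.zip")

print(members)
```

Reading the member list back after writing is a quick check that the archive holds one entry per proteome file.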

What is the expected running time of Effectidor?

For how long will my results be saved in the servers?

Why do you need my email?

Can I use Effectidor to run the analysis on several genomes simultaneously?

Research Site | Pupko Group