Effectidor produces both tabular and graphical outputs.
The tabular output includes on-screen tables and downloadable files.
The graphical output includes an analysis of the features: the 10 features that contribute most to the predictions are compared between the two classes (effectors and non-effectors) in violin plots.
Additional graphical outputs include presence/absence maps of the T3Es and of the T3SS and flagella components found in the genomes.
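A presence/absence map is essentially a binary genome-by-feature matrix. The sketch below is only an illustration of that idea, not Effectidor's code: it assumes the hits per genome are already available as Python sets, and the names `presence_absence`, `strain_A` and `OG_1` are hypothetical.

```python
def presence_absence(genome_hits: dict[str, set[str]]) -> tuple[list[str], list[str], list[list[int]]]:
    """Build a binary genome x feature matrix (1 = present, 0 = absent)."""
    genomes = sorted(genome_hits)
    # Union of every feature seen in any genome; set().union() with no args gives an empty set
    features = sorted(set().union(*genome_hits.values()))
    matrix = [[1 if f in genome_hits[g] else 0 for f in features] for g in genomes]
    return genomes, features, matrix


# Hypothetical example: OGs detected in two genomes
hits = {"strain_A": {"OG_1", "OG_2"}, "strain_B": {"OG_1"}}
genomes, features, matrix = presence_absence(hits)
for genome, row in zip(genomes, matrix):
    print(genome, row)
```

The resulting matrix can then be drawn as a heatmap, with genomes as rows and T3Es or T3SS/flagella components as columns.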
An example of the output is available in the running example here.
The main result, the predictions file, lists the OGs sorted by their likelihood of encoding T3Es.
To identify candidate novel T3Es, look for OGs with high scores in the top rows of the table.
If you supplied an Effectors file as input, look for samples that are not already known to be effectors. These samples are marked with a question mark (?) in the "is_effector" column.
If you did not supply an Effectors file, samples marked "yes" in the "is_effector" column are homologous to known T3Es, and highly scored samples marked with a question mark (?) are potentially novel T3Es.
For convenience, highly scored samples are highlighted in dark green, which becomes lighter as the score decreases.
For each OG, its annotation (if present in the input file) is provided, and its members in each genome are listed.
In general, an OG with a high score and an ambiguous annotation is a good candidate to be a novel T3E.
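If you prefer to filter the predictions file programmatically, a minimal sketch could look like the following. It assumes the file is a CSV with an "is_effector" column and a numeric score column; the column names `OG`, `score` and `annotation`, the 0.5 default cutoff and the function name `novel_candidates` are illustrative assumptions, so check them against the header of your actual file.

```python
import csv
import io


def novel_candidates(csv_text: str, min_score: float = 0.5, score_col: str = "score") -> list[dict]:
    """Return rows marked '?' in is_effector with score >= min_score, highest score first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    candidates = [
        r for r in rows
        if r["is_effector"].strip() == "?" and float(r[score_col]) >= min_score
    ]
    return sorted(candidates, key=lambda r: float(r[score_col]), reverse=True)


# Hypothetical predictions table (column names are assumptions)
example = """OG,score,is_effector,annotation
OG_1,0.97,yes,known effector
OG_2,0.91,?,hypothetical protein
OG_3,0.40,?,ABC transporter
OG_4,0.85,?,hypothetical protein
"""
for row in novel_candidates(example, min_score=0.8):
    print(row["OG"], row["score"], row["annotation"])
```

Here OG_2 and OG_4 are returned: both are highly scored, not yet known as effectors, and carry an ambiguous annotation.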
In the learning process, Effectidor randomly sets aside 20% of the labeled data (effectors and non-effectors) as a test set. Each classification algorithm is trained on the remaining 80% using cross-validation, and is then evaluated on the untouched 20% test set.
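The splitting scheme described above can be sketched as follows. This is an illustration only, not Effectidor's implementation: it uses a plain random split with a fixed seed and a simple 5-fold cross-validation on the training part; the number of folds, the seed and the function names are assumptions.

```python
import random


def split_labeled(labeled, test_fraction=0.2, seed=0):
    """Randomly set aside test_fraction of the labeled samples as an untouched test set."""
    rng = random.Random(seed)
    shuffled = list(labeled)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def k_folds(items, k=5):
    """Yield (train, validation) pairs for k-fold cross-validation on the training set."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


# Hypothetical labeled data: (sample id, 1 = effector, 0 = non-effector)
labeled = [(f"ORF_{i}", 1 if i < 10 else 0) for i in range(50)]
train, test = split_labeled(labeled)
print(len(train), len(test))  # 40 10

# Cross-validation uses only the training set; the test set is kept for the final evaluation
for fold_train, fold_val in k_folds(train):
    print(len(fold_train), len(fold_val))  # 32 8 for each of the 5 folds
```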
The metric used to evaluate Effectidor's performance is the area under the precision-recall curve (AUPRC). The closer it is to 1, the better the classifier performs. The AUPRC achieved on the test set is reported in the downloadable predictions file.
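One common way to estimate the AUPRC is average precision: rank the test samples by score and average the precision obtained at the rank of each true effector. The sketch below implements that estimate in plain Python; it is not necessarily the exact estimator Effectidor uses, and ties in the scores are simply broken by input order.

```python
def average_precision(labels: list[int], scores: list[float]) -> float:
    """Average precision: mean of the precision at the rank of each positive (label 1)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # sorted() is stable, so tied scores keep their input order
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Effectors ranked 1st and 3rd: (1/1 + 2/3) / 2, about 0.833
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Note that a random classifier's AUPRC is roughly the fraction of effectors in the test set, so a value is best read against that baseline.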
Effectidor has one mandatory input: an ORFs file.
This is a FASTA file containing all the ORFs of the genome. See instructions for downloading this file here.
Some of the ORFs in this file (effectors and non-effectors) are used to train the machine-learning algorithms, and the trained classifier is then used to produce the main output: a prediction for each ORF.
In addition, it is recommended to supply a file of known effectors, as they appear in the ORFs file. If no such file is supplied, a homology search against an internal effectors dataset is performed to define the "known effectors" ORFs for the learning process.
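Before submitting, it can help to check that every sequence in your known Effectors file appears in the ORFs file. The sketch below assumes both files are plain FASTA and that an effector "appears" in the ORFs file when its identifier (the first word of the header) and its sequence match exactly; `read_fasta` and `missing_effectors` are hypothetical helpers, not part of Effectidor.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {id: sequence}; the id is the first word after '>'."""
    records: dict[str, str] = {}
    current = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks).upper()
            parts = line[1:].split()
            current = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)  # sequences may span several lines
    if current is not None:
        records[current] = "".join(chunks).upper()
    return records


def missing_effectors(orfs_fasta: str, effectors_fasta: str) -> list[str]:
    """Return effector ids whose id or sequence has no exact match in the ORFs file."""
    orfs = read_fasta(orfs_fasta)
    effectors = read_fasta(effectors_fasta)
    return [eid for eid, seq in effectors.items() if orfs.get(eid) != seq]


# Hypothetical inputs: orf9 is not present in the ORFs file
orfs = ">orf1 hypothetical protein\nATGAAA\nTGA\n>orf2\nATGCCC\n"
effectors = ">orf1\nATGAAATGA\n>orf9\nATGGGG\n"
print(missing_effectors(orfs, effectors))
```

The check works the same way whether the ORFs are given as nucleotide or protein sequences, as long as both files use the same type.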
In the advanced options, you can supply data that yield additional features for the machine-learning algorithms and improve the predictions. These data include:
The running time depends on many factors, including the input you supply and the load on our servers.
It can take from a few minutes to several hours. In extreme cases, for example when analyzing many genomes simultaneously, it can take several days. We will email you a link to the results upon submission and again upon completion.
The results will be kept on our servers for 3 months, after which they will be permanently deleted.
Your email address will be used to send you a link to your results.
Yes! Effectidor V2 supports pan-genome analysis: training and prediction are performed on ortholog groups (OGs).