Performs a comprehensive sequence comparison analysis between groups. All patterns (subsequences of the lengths given by sub) are extracted from every sequence in each group, and the pattern frequencies are compared between the groups using a permutation test. The reported effect size is the difference between the observed test statistic (the sum of squared differences between the observed and expected counts) and the mean of the statistic over the permutation samples, divided by the product of the permutation standard deviation and the square root of the number of observations.
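The procedure described above can be sketched in base R for a single pattern length k. This is a minimal illustration, not the package's implementation, and it rests on assumptions the text does not confirm: patterns are taken to be runs of consecutive states joined with "->"; the expected count of a pattern in a group is its total count split in proportion to each group's total number of patterns; the permutation shuffles group labels across whole sequences; the effect size is read as (observed statistic minus permutation mean) / (permutation SD × sqrt(n)), with n taken to be the number of sequences; and the p-value uses the common (count + 1) / (iter + 1) form. All function names (extract_patterns, pattern_counts, pattern_stat, compare_patterns_sketch) are hypothetical.

```r
# Hypothetical helper: all runs of k consecutive states in one sequence,
# joined with "->"; windows that contain NA (e.g. padding) are dropped.
extract_patterns <- function(s, k) {
  n <- length(s)
  if (n < k) return(character(0))
  out <- vapply(seq_len(n - k + 1), function(i) {
    w <- s[i:(i + k - 1)]
    if (anyNA(w)) NA_character_ else paste(w, collapse = "->")
  }, character(1))
  out[!is.na(out)]
}

# Pattern-by-group count matrix; patterns not in `patterns` are ignored.
pattern_counts <- function(pat_list, group, patterns, groups) {
  counts <- matrix(0, nrow = length(patterns), ncol = length(groups),
                   dimnames = list(patterns, groups))
  for (g in groups) {
    pats_g <- factor(unlist(pat_list[group == g]), levels = patterns)
    counts[, g] <- as.vector(table(pats_g))
  }
  counts
}

# Assumed statistic: per pattern, the sum over groups of squared differences
# between observed counts and counts expected if the pattern were spread over
# the groups in proportion to each group's total number of patterns.
pattern_stat <- function(counts) {
  share <- colSums(counts) / sum(counts)
  expected <- outer(rowSums(counts), share)
  rowSums((counts - expected)^2)
}

compare_patterns_sketch <- function(data, group, k = 2L, min_freq = 5L,
                                    iter = 1000L) {
  group <- as.character(group)
  m <- as.matrix(data)  # one sequence per row (wide format)
  pat_list <- lapply(seq_len(nrow(m)), function(i) extract_patterns(m[i, ], k))
  groups <- sort(unique(group))
  patterns <- sort(unique(unlist(pat_list)))
  obs <- pattern_counts(pat_list, group, patterns, groups)
  # Keep only patterns seen at least min_freq times in every group.
  keep <- rowSums(obs >= min_freq) == length(groups)
  if (!any(keep)) stop("no pattern reaches min_freq in every group")
  patterns <- patterns[keep]
  obs <- obs[keep, , drop = FALSE]
  stat_obs <- pattern_stat(obs)
  # Permutation distribution: shuffle group labels across whole sequences.
  stat_perm <- matrix(
    replicate(iter, pattern_stat(
      pattern_counts(pat_list, sample(group), patterns, groups))),
    nrow = length(patterns)
  )
  perm_mean <- rowMeans(stat_perm)
  perm_sd <- apply(stat_perm, 1, stats::sd)
  data.frame(
    pattern = patterns,
    obs,  # one count column per group
    # Standardized difference, scaled down by sqrt(number of sequences).
    effect_size = (stat_obs - perm_mean) / (perm_sd * sqrt(nrow(m))),
    p_value = (rowSums(stat_perm >= stat_obs) + 1) / (iter + 1),
    row.names = NULL
  )
}

# Usage: rows 1-3 are "A B A B" (group g1), rows 4-6 are "B A A B" (group g2).
set.seed(42)
seqs <- data.frame(
  t1 = c("A", "A", "A", "B", "B", "B"),
  t2 = c("B", "B", "B", "A", "A", "A"),
  t3 = c("A", "A", "A", "A", "A", "A"),
  t4 = c("B", "B", "B", "B", "B", "B")
)
res <- compare_patterns_sketch(seqs, group = rep(c("g1", "g2"), each = 3),
                               k = 2L, min_freq = 1L, iter = 200L)
```

Permuting the labels of whole sequences, rather than of individual patterns, keeps all patterns of a sequence together, so the dependence between overlapping patterns from the same sequence is preserved under the null distribution.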
Usage
compare_sequences(x, ...)
# Default S3 method
compare_sequences(
x,
group,
sub,
min_freq = 5L,
test = TRUE,
iter = 1000L,
adjust = "bonferroni",
...
)
# S3 method for class 'group_tna'
compare_sequences(
x,
sub,
min_freq = 5L,
test = TRUE,
iter = 1000L,
adjust = "bonferroni",
...
)
Arguments
- x
A group_tna object or a data.frame containing sequence data in wide format.
- ...
Not used.
- group
A vector indicating the group assignment of each row of the data/sequence. Must have the same length as the number of rows/sequences of x. Alternatively, a single character string giving the column name of the data that defines the group when x is a wide format data.frame or a tna_data object.
- sub
An integer vector of pattern lengths to analyze. The default is 2:5.
- min_freq
An integer giving the minimum number of times that a specific pattern has to be observed in each group to be included in the analysis. The default is 5.
- test
A logical value indicating whether to test the differences of pattern counts between the groups using a permutation test. The default is TRUE.
- iter
An integer giving the number of iterations for the permutation test. The default is 1000.
- adjust
A character string naming the multiple comparison correction method (default: "bonferroni"). Any method supported by stats::p.adjust can be used: "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", or "none". The adjustment is carried out separately within patterns of the same length.
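The per-length adjustment described for adjust can be sketched in base R with stats::ave() and stats::p.adjust(). The helper adjust_by_length, the data frame, its column names, and the raw p-values below are hypothetical illustrations, not output of compare_sequences.

```r
# Hypothetical helper: adjust p-values separately within each pattern length.
# stats::ave() applies FUN to each length group and keeps the original order.
adjust_by_length <- function(p, len, method = "bonferroni") {
  stats::ave(p, len, FUN = function(x) stats::p.adjust(x, method = method))
}

# Illustrative raw p-values: two patterns of length 2, three of length 3.
res <- data.frame(
  pattern = c("A->B", "B->A", "A->B->A", "B->A->B", "A->A->B"),
  length  = c(2L, 2L, 3L, 3L, 3L),
  p_value = c(0.01, 0.04, 0.01, 0.02, 0.30)
)
# Bonferroni within length: length-2 p-values are multiplied by 2 and
# length-3 p-values by 3, giving 0.02, 0.08, 0.03, 0.06, 0.90.
res$p_adjusted <- adjust_by_length(res$p_value, res$length)
```

A single Bonferroni correction over all five p-values would instead multiply each one by 5, so adjusting within lengths is less conservative when several pattern lengths are analyzed.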
Value
A tna_sequence_comparison object, which is a data.frame with
columns giving the names of the patterns, pattern frequencies, pattern
proportions (within patterns of the same length), effect sizes,
and p-values of the tests.
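The proportions "within patterns of the same length" can be illustrated with a short base R sketch. It assumes (our reading, not confirmed by the text) that each pattern's frequency is divided by the total frequency of all patterns of the same length, separately for each group. The helper within_length_prop, the column names, and the counts are hypothetical.

```r
# Hypothetical helper: share of each pattern among all patterns of its length.
within_length_prop <- function(freq, len) {
  freq / stats::ave(freq, len, FUN = sum)
}

# Illustrative pattern frequencies for two groups (made-up numbers).
pats <- data.frame(
  pattern = c("A->B", "B->A", "A->A", "A->B->A", "B->A->B"),
  length  = c(2L, 2L, 2L, 3L, 3L),
  freq_g1 = c(6, 3, 1, 2, 8),
  freq_g2 = c(2, 5, 3, 5, 5)
)
# Proportions sum to 1 within each length, separately for each group.
pats$prop_g1 <- within_length_prop(pats$freq_g1, pats$length)
pats$prop_g2 <- within_length_prop(pats$freq_g2, pats$length)
```

Normalizing within each length keeps short patterns, which are always far more numerous, from dominating the proportions of longer ones.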
