1 Introduction
This is the companion tutorial to the main TNA tutorial, focusing on group comparison, permutation testing, and bootstrapping. We assume you have already worked through the main tutorial and understand the basics of building and analyzing a TNA model.
A single TNA model describes the average transition dynamics across all individuals. But averages can hide important differences. Do high and low achievers follow different regulatory strategies? Are there latent subgroups with distinct behavioral patterns? Answering these questions requires moving beyond the aggregate model to group-level analysis.
We cover three complementary approaches:
- Group comparison: split the data by a known variable and compare transition structures.
- Permutation testing: determine whether observed group differences are statistically significant or could have arisen by chance.
- Bootstrap validation: quantify the reliability of group-specific edges and centralities.
For data-driven clustering (discovering latent subgroups when no grouping variable exists), see the companion tutorial: TNA Clustering.
1.1 Installation
The tna package is the only package required. It provides all the functions needed for data preparation, model building, visualization, group comparison, permutation testing, bootstrapping, clustering, and sequence analysis.
Install from CRAN:
install.packages("tna")
Or install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("sonsoleslp/tna")
1.2 Setup
We use the same built-in group_regulation_long dataset introduced in the main tutorial. This dataset contains coded collaborative regulation behaviors from student groups, with each row recording an action performed by an actor at a specific time. Crucially, the dataset includes an Achiever column that classifies each actor as “High” or “Low” — this is the grouping variable we will use for between-group comparisons.
The prepare_data() function converts the long-format event log into sequences and automatically preserves the Achiever column as metadata, making it available for group_tna() later.
# Load the tna package and the built-in collaborative regulation dataset
library(tna)
data("group_regulation_long")
# Convert to sequences, preserving the Achiever metadata column
prepared_data <- prepare_data(
  group_regulation_long,
  action = "Action", # behavioral states (network nodes)
  actor = "Actor",   # participant IDs (one sequence per actor)
  time = "Time"      # timestamps (for ordering and session splitting)
)
# Build the aggregate TNA model (all sequences combined)
model <- tna(prepared_data)
prepare_data() Arguments Explained
Each argument controls a different aspect of how the raw data is converted into sequences.
1.2.1 action — what happened
The only required argument. The name of the column containing the events or states to model (e.g., “Plan”, “Monitor”, “Discuss”). These become the nodes in your network. Called with action alone, every row chains into one long sequence — the last event of student A transitions directly into the first event of student B. Fine for a single continuous observation stream, but almost always wrong for multi-participant data.
1.2.2 actor — who did it
The column identifying who performed the action (student ID, user ID, group ID). Creates one sequence per actor instead of one sequence for the entire dataset. This is the single most important argument after action. Without a time or order column, events within each actor keep their original row order, so data that is already sorted chronologically needs nothing more.
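The effect of adding actor can be seen in a minimal base-R sketch. The toy event log and the helper list_transitions() are hypothetical and written only for illustration; they are not part of tna and do not reproduce how prepare_data() works internally.

```r
# Hypothetical toy event log: two actors, three events each
events <- data.frame(
  Actor  = c("A", "A", "A", "B", "B", "B"),
  Action = c("plan", "monitor", "discuss", "plan", "discuss", "monitor"),
  stringsAsFactors = FALSE
)

# List consecutive pairs ("from->to") in a character vector of states
list_transitions <- function(states) {
  if (length(states) < 2) return(character(0))
  paste(head(states, -1), tail(states, -1), sep = "->")
}

# action only: all rows form one chain, so A's last event leads into B's first
chained <- list_transitions(events$Action)

# action + actor: one chain per actor, so no cross-actor pair is created
per_actor <- unlist(
  lapply(split(events$Action, events$Actor), list_transitions),
  use.names = FALSE
)

length(chained)              # 5
length(per_actor)            # 4
setdiff(chained, per_actor)  # "discuss->plan", the spurious A-to-B transition
```

The single extra pair is exactly the boundary between actor A's last event and actor B's first, which is why chaining rows is rarely appropriate for multi-participant data.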
1.2.3 time — when it happened
The column containing timestamps. Does two things: (1) sorts events chronologically within each actor, and (2) splits sequences at temporal gaps. If two consecutive events from the same actor are more than 15 minutes apart (the default), they become separate sequences. Change this with time_threshold (in seconds):
# 10-minute gap starts a new sequence
prepared <- prepare_data(df, action = "Action", actor = "Actor",
                         time = "Time", time_threshold = 10 * 60)
1.2.4 order — what came first
A numeric column for event ordering when timestamps are unavailable (step number, turn counter). If both time and order are provided, data is sorted by time first with ties broken by order.
Any columns not specified as action, actor, time, or order are automatically preserved as metadata. The Achiever column (High/Low) is preserved and available for group_tna().
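The sorting and gap-splitting rules described for time and order can be sketched in base R. The event log below is hypothetical (times are plain seconds, and the Session column is a name invented here), and the code only illustrates the logic; it is not the package's implementation.

```r
# Hypothetical event log for a single actor; Time is in seconds
ev <- data.frame(
  Action = c("discuss", "plan", "monitor", "plan", "consensus"),
  Time   = c(60, 0, 60, 1200, 1260),
  Order  = c(2, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Sort by time first; ties (the two events at Time = 60) are broken by Order
ev <- ev[order(ev$Time, ev$Order), ]

# A gap longer than the threshold starts a new sequence
time_threshold <- 15 * 60            # 900 seconds, the default stated above
gap <- c(0, diff(ev$Time))           # seconds since the previous event
ev$Session <- cumsum(gap > time_threshold) + 1

split(ev$Action, ev$Session)
# $`1`: "plan" "monitor" "discuss"
# $`2`: "plan" "consensus"
```

The 1140-second pause between the third and fourth events exceeds the threshold, so the actor contributes two sequences instead of one.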
This tutorial uses long-format event data via prepare_data(), but tna() accepts several other input formats directly. Choose whichever matches your data.
1.2.5 Wide Data Frame
Each row is one sequence; each column is a time point. Cell values are categorical state labels. NA values are permitted for variable-length sequences.
data("group_regulation")
model <- tna(group_regulation)
# If extra columns exist, select sequence columns with `cols`:
model <- tna(group_regulation, cols = T1:T26)
1.2.6 Pre-Computed Transition Matrix
A square numeric matrix where element [i, j] is the weight of the transition from state i to state j. Row and column names define the state labels.
mat <- matrix(
  c(0.1, 0.6, 0.3,
    0.4, 0.2, 0.4,
    0.3, 0.3, 0.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)
model <- tna(mat, inits = c(A = 0.5, B = 0.3, C = 0.2))
1.2.7 TraMineR Sequence Object (stslist)
Sequence objects created by TraMineR::seqdef() can be passed directly to tna().
data("engagement")
model <- tna(engagement)
1.2.8 One-Hot Encoded Data
Binary (0/1) data where each column is a feature. import_onehot() computes co-occurrence weights and returns a tna model directly.
# binary_data is a placeholder for your own data frame of 0/1 feature columns
model <- import_onehot(binary_data, feature1:feature6, window = "window_id")
1.2.9 Summary
| Input Format | Function | Description |
|---|---|---|
| Long event log | prepare_data() then tna() | Timestamped events with actors |
| Wide data frame | tna(df) | Rows = sequences, columns = time points |
| Pre-computed matrix | tna(mat) | Square weight matrix with named rows and columns |
| TraMineR sequence | tna(seqobj) | Object from TraMineR::seqdef() |
| One-hot binary data | import_onehot() | Co-occurrence model from binary feature data |
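To make the matrix format concrete, here is a minimal base-R sketch of how a row-normalized transition matrix can be derived from wide-format sequences. The data and state labels are hypothetical, and this only illustrates the idea of a transition weight; it is not how tna computes its models internally.

```r
# Hypothetical wide data: one row per sequence, one column per time point
wide <- data.frame(
  T1 = c("A", "B", "A"),
  T2 = c("B", "A", "B"),
  T3 = c("A", NA,  "C"),
  stringsAsFactors = FALSE
)
states <- c("A", "B", "C")

# Count transitions between consecutive time points
counts <- matrix(0, 3, 3, dimnames = list(states, states))
for (r in seq_len(nrow(wide))) {
  s <- unlist(wide[r, ], use.names = FALSE)
  s <- s[!is.na(s)]                  # assumes NAs only pad the end of a sequence
  for (k in seq_len(max(length(s) - 1, 0))) {
    counts[s[k], s[k + 1]] <- counts[s[k], s[k + 1]] + 1
  }
}

# Row-normalize: element [i, j] becomes the share of transitions leaving i
# that go to j; states with no outgoing transitions keep an all-zero row
row_totals <- rowSums(counts)
weights <- counts / ifelse(row_totals == 0, 1, row_totals)
round(weights, 2)
```

In this toy example every transition out of A goes to B, B splits two-to-one between A and C, and C never appears as a source, so its row stays at zero.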
2 Group Analysis with Known Variables
When a meaningful grouping variable is available (e.g., achievement level, course section, experimental condition), we can build separate TNA models for each group and directly compare their transition structures. Because prepare_data() preserved the Achiever column as metadata, we can use group_tna() to split the data into groups automatically:
# Split data by the Achiever variable and build one TNA model per group
gtna <- group_tna(prepared_data, group = "Achiever")
This creates two complete TNA models (one for “High”, one for “Low”), each with its own transition probability matrix, initial probabilities, and sequence data.
Each group is a standard tna object accessible with $:
# Transition probability matrix for the High achievers group
knitr::kable(round(gtna$High$weights, 3))
| | adapt | cohesion | consensus | coregulate | discuss | emotion | monitor | plan | synthesis |
|---|---|---|---|---|---|---|---|---|---|
| adapt | 0.00 | 0.26 | 0.52 | 0.00 | 0.04 | 0.14 | 0.03 | 0.01 | 0.00 |
| cohesion | 0.00 | 0.04 | 0.54 | 0.08 | 0.04 | 0.12 | 0.02 | 0.15 | 0.01 |
| consensus | 0.00 | 0.02 | 0.08 | 0.17 | 0.23 | 0.08 | 0.04 | 0.36 | 0.01 |
| coregulate | 0.02 | 0.04 | 0.11 | 0.01 | 0.23 | 0.20 | 0.10 | 0.27 | 0.02 |
| discuss | 0.02 | 0.06 | 0.42 | 0.07 | 0.17 | 0.11 | 0.02 | 0.01 | 0.11 |
| emotion | 0.00 | 0.33 | 0.34 | 0.02 | 0.12 | 0.06 | 0.03 | 0.09 | 0.00 |
| monitor | 0.01 | 0.05 | 0.16 | 0.05 | 0.37 | 0.10 | 0.02 | 0.23 | 0.02 |
| plan | 0.00 | 0.03 | 0.29 | 0.02 | 0.06 | 0.18 | 0.08 | 0.33 | 0.00 |
| synthesis | 0.14 | 0.03 | 0.58 | 0.01 | 0.03 | 0.06 | 0.00 | 0.14 | 0.00 |
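With per-group weight matrices in hand, a simple first look at group differences is the element-wise difference between them. The sketch below uses two small hypothetical matrices (the three-state subset and all values are invented for illustration); with the models above, the same arithmetic would presumably apply to gtna$High$weights and gtna$Low$weights.

```r
# Hypothetical transition matrices for two groups (rows sum to 1)
states <- c("plan", "monitor", "discuss")
high <- matrix(c(0.2, 0.6, 0.2,
                 0.1, 0.3, 0.6,
                 0.4, 0.4, 0.2),
               nrow = 3, byrow = TRUE, dimnames = list(states, states))
low  <- matrix(c(0.5, 0.2, 0.3,
                 0.3, 0.3, 0.4,
                 0.4, 0.3, 0.3),
               nrow = 3, byrow = TRUE, dimnames = list(states, states))

# Positive values: the transition is more likely in the "high" group
delta <- high - low

# Locate the transition with the largest absolute difference
pos  <- arrayInd(which.max(abs(delta)), dim(delta))
from <- rownames(delta)[pos[1]]
to   <- colnames(delta)[pos[2]]

round(delta, 2)
cat("Largest difference:", from, "->", to, "\n")  # plan -> monitor
```

A raw difference says nothing about whether the gap could have arisen by chance, which is the question permutation testing addresses.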
# Initial state probabilities for the Low achievers group
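knitr::kable(t(round(gtna$Low$inits, 3)))
| adapt | cohesion | consensus | coregulate | discuss | emotion | monitor | plan | synthesis |
|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.04 | 0.22 | 0.03 | 0.17 | 0.13 | 0.16 | 0.22 | 0.02 |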
# Summary statistics for both groups
summary(gtna)
2.1 Visualizing Group Networks
As with a standard TNA model, most analyses work the same way on a group model and need no extra arguments. Calling plot() on the group model is enough: tna recognizes that it is a group model and plots every group's network automatically. Viewing the networks side by side reveals structural differences at a glance. Thicker edges indicate higher transition probabilities; the minimum and cut arguments control which edges are displayed.
# Side-by-side network plots; minimum hides edges below 0.05, cut fades below 0.1
plot(gtna, minimum = 0.05, cut = 0.1)
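To get a feel for what these two thresholds do before plotting, the following base-R sketch classifies the edges of a small hypothetical weight matrix. It is an approximation for illustration only: the variable names are invented here, and the exact boundary handling (strict versus non-strict comparison) in the plotting code is an assumption.

```r
# Hypothetical transition weights for three states
w <- matrix(c(0.02, 0.58, 0.40,
              0.30, 0.04, 0.66,
              0.08, 0.47, 0.45),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

minimum_value <- 0.05   # edges below this are hidden
cut_value     <- 0.10   # visible edges below this are drawn faded

shown <- w >= minimum_value          # edges that would be drawn at all
faded <- shown & w < cut_value       # drawn, but de-emphasized

sum(shown)   # 7 of 9 edges remain visible
sum(faded)   # 1 visible edge (C -> A, 0.08) is faded
```

Raising the minimum declutters dense networks, at the cost of hiding weak but possibly meaningful transitions.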