Draft

Introduction
Motivating example
- Problem Solving in Technology-Rich Environments
Methods
Estimation
Applications
- cd_tally
- book_order
References

Introduction

%%% stolen from the neap proposal

what’s process data

%Advances in technology have expanded opportunities for educational measurement through changes to item design, item delivery and data collection. Some examples include simulation-, scenario-, and game-based assessment and learning environments.

The NAEP computerized testing format provides an interactive environment for students. Students can choose among a set of available actions and take one or more steps to finish a task. All student actions are automatically recorded in system log (Kerr, Chung, \& Iseli, 2011), which can be used immediately for providing instant feedback to students for diagnostic and scoring purposes (DiCerbo \& Behrens, 2014).

Benefits of process data

The availability of process data open new research opportunities including to better understand test-takers’ behavior patterns, …, and ….

Challenges

While the availability of rich response process data during problem solving comes the great challenge of building appropriate psychometric models to analyze these data. The raw process data are usually formatted as lines of coded and time-stamped strings. The vast amount of data on students’ potential trial-and-error process makes it less than straightforward to detect patterns in problem solving.

Exisitng methods and limitations

Several data analysis techniques and models have been explored to uncover problem-solving patterns. For example, researchers used methods such as cluster analysis <&bergner_visualization_2014> and editing distance <&zhu_using_2016>. Other researchers explored the method of combining Markov movesl and item response theory (IRT) framework. Process mining techniques such as Petri net were also used to study behavioral patterns <&howard_reflecting_2010> In addition, researchers used digraphs to visualize and analyze sequential process data collected from assessment. <&zhu_using_2016> used network visualization and analysis for understanding process data.

<&chen_statistical_2019> <&chen_continuous-time_2020> <&tang_latent_2019> <&he_identifying_2015> <&ulitzsch_combining_2021> <&qiao_data_2018> <&wang_subtask_2020>

RNN <&tang_exploratory_2019>

n-gram <&van_der_ark_identifying_2015>

Motivation and our Aim

Students’ response outcomes are a result of a sequence of actions that they take. The quality as well as quantity of actions vary across individuals as well as across items. Understanding the action sequence and its relation to response outcomes will help us better understand the nature of response process and individual differences in the process. Models that relate process data to process outcomes are rare in the current literature.

We propose to develop a new, network modeling framework for analyzing time-stamped sequences of actions taken by NAEP test takers. The innovative aspect of our proposal is that we view test takers’ sequences of actions collected in the computer-assisted NAEP assessment system as directed paths between actions in a network of possible actions. With our framework, researchers and policymakers can quantify and better understand how learners with disabilities process mathematics test items.

We have successfully collaborated to develop novel network-based modeling approaches for analyzing conventional assessment data on two papers (Jin \& Jeon, 2018; Jeon, Jin, Schweinberger, \& Baugh, 2020), with more papers in the pipeline. We will extend this model-based framework for analyzing NAEP process data. Since the number of possible actions is large and many test takers will choose a small subset of the possible actions, the data is sparse. To deal with the sparsity of the data, we use machine learning techniques. These machine learning techniques penalize models that are more complex than warranted by the data.

Advantages of the proposed method

Advantage I. An important advantage of our network-based approach is the introduction of a virtual, two-dimensional Euclidean map of the interplay between actions for different test takers. This interactive map could offer substantially enhanced insights into how and why learners with and without disabilities are different in their response behavior on the current NAEP mathematics assessment.

Advantage II. A second advantage of our network-based approach is that we can easily link the network of actions with test takers’ mathematics performance outcomes, their background information, as well as any technical accommodations they utilized during the test, which allows educators to identify which accommodations might be more effective than others in helping learners with disabilities to display their full ability within the digitalized NAEP assessment environment.

paper org.

We first develop xxx. We further develop xxx. The remainder of this article is organized as follows. In Section, we introduce . In Section , we present. Applications are given in Section, followed by conclusions given in Section. x

Motivating example

Problem Solving in Technology-Rich Environments

% stolen from <&chen_statistical_2019> We introduce a specific item, CLIMATE CONTROL (CC), to demonstrate the data structure and to motivate our research questions. It is part of a CPS unit in PISA 2012 that was designed under the “MicroDYN” framework (Greiff et al., 2012; Wüstenberg et al., 2012), a framework for the development of small dynamic systems of causal relationships for assessing CPS.

Interactive tasks as implemented in the problem solving in a technology-rich environment (PSTRE) domain in the Programme for the International Assessment of Adult Competencies (PIAAC, <&oecd_technical_2019>) and the problem solving domain in the Programme for International Student Assessment (PISA, OECD, 2014) aim at mirroring real-life problem-solving behavior (Goldhammer, Naumann, & Keßel, 2013). While correct responses to such tasks can be assumed to stem from examinees having the skill set and the motivation required to solve the task, incorrect responses can occur for a variety of different reasons, ranging from lack of different subskills and/or metacompetencies required to solve the task through misinterpreting instructions to examinees not exerting their best Fo effort and interacting quickly and superficially with the task at hand.

As a motivating example, we introduce problem solving in technology-rich environments (PSTRE) We introduce an example of pro

OECD Survey of Adult Skills (PIAAC) Log Data Downloaded from https://piaac-logdata.tba-hosting.de/ Problem Solving Items:

The Programme for the International Assessment of Adult Competencies (PIAAC) is a programme of assessment and analysis of adult skills. The major survey conducted as part of PIAAC is the Survey of Adult Skills. The Survey measures adults’ proficiency in key information-processing skills - literacy, numeracy and problem solving - and gathers information and data on how adults use their skills at home, at work and in the wider community.

This international survey is conducted in over 40 countries/economies and measures the key cognitive and workplace skills needed for individuals to participate in society and for economies to prosper.

The OECD Survey of Adult Skills (PIAAC) assesses the proficiency of adults in information processing skills. During the PIAAC assessement, user interactions were logged automatically. This means that most of the users’ actions within the assessment tool were recorded and stored with time stamps in separate files called log files.

This refers to the ability to use technology to solve problems and accomplish complex tasks. It is not a measurement of “computer literacy”, but rather of the cognitive skills required in the information age – an age in which the accessibility of boundless information has made it essential for people to be able to decide what information they need, to evaluate it critically, and to use it to solve problems. In this survey, higher-order skills are identified along with basic proficiency.

Questions we like to answer (from Dr. Jeon’s proposal)

which sequences or actions are effective? given the person’s ability and item difficulty
is the same sequence (strategy) effective for all items or not?
is the same sequence effective for all people?
if effective sequences are not the same across all items, can we extract some common features of effective sequences ?
which sequences or actions are more or less effective for students with disability?
any other person covariates? ability? that is, does the effectiveness of sequences depend on person abilities? (interaction between sequence and ability)
does the effect of the sequence change depending on how long it took? for instance, when it was taken in a shorter time, a sequence might have a positive effect, while it might have a negative effect when it was taken in a longer time.
instead of using the log time (continuous), it may be better or useful to use a categorical variable?
the effect of sequences on the success probability may be a function of item difficulty or other item features, for instance, item position, item types (e.g., multiple-choice vs. open-ended), item contents (algebra, geometry) ?

Illustrate a ticket example:

Figure 1: An example of PS-TRE items. In this simulated web environment, respondents can access information required for ticket reservation.

This item involves a scenario in which the respondent is asked to reserve all fooball game tickets that an entire group can attend. A group of friend provides thier availabilities via an online calendar. Respondents access and evaluate information from ticket-reservation web pages and online calendars in simulated web environment. Respondents are able to:

Click on tabs for ticket reservation web pages and online calendar;
Click on checkboxes to choose game dates;
Manipulate drop-down menus for events, locations, and number of tikcets; and
Click on menu items or navigation icons.

ID	Action	Time (sec)
4016	START	0.0
4016	COMBOBOX-default_menu1.index=7	47.3
4016	COMBOBOX-default_menu2.index=2	51.8
4016	BUTTON_search-default_txt23	65.0
4016	CHECKBOX-check2	93.2
4016	BUTTON_available-pg1_txt47	96.0
4016	BUTTON_available-pg7_txt47	108.2
4016	COMBOBOX-pg2_menu1.index=19	136.7
4016	COMBOBOX-pg2_menu6.index=19	144.5
4016	BUTTON_submit-pg2_txt33	146.1
4016	BUTTON_submit_ok-u21p2pu5_txt2	148.9
4016	NEXT_INQUIRY-REQUEST	155.8
4016	END	157.3

There are 172 unique observed actions. On average, repondents spend 182 (IQR: 107) seconds on this time, and take 23 (IQR: 10) actions.

address challenges in details

The process data consists of pairs of actions and time stamps of each respondents. Major challenges to establish a statistcal model taking the process data as an input are 1) unequal length of respondents’ actions sequences; 2) large number of distinct actions transitions; and 3) …

Thanks to the recent development of natual language processing

Methods

%% stolen from neap proposal We propose to develop a new modeling framework for analyzing time-stamped sequences of actions. The innovative aspect of our proposed model is that we view users’ sequences of actions as Markov processes of possible actions mapped to Euclidean space.

With our framework, researchers and policymakers can quantify and better understand learners’ problem solving processes.

notations

Let $S$ denotes a set of all possible actions. For each action $m \in S$, $A_{m}$ denotes a set of competing actions $\{l_1, \ldots, l_{n_m}\}$ that can be taken directly after $m$. Let $t_{k,n}$ denote entry time that the $k$-th respondent starts his/her $n$-th action. So, his/her sojourn time in the $n$-th action is denoted by $\dd t_{k,n} = t_{k,n+1} - t_{k,n}$ for $n < M_{k} - 1$. Respondents are assumed to begin problem solving processes at time $t=0$. Let $Y_k(t)$ denote an action being taken by the $k$-th respondent at time $t$. Then, a sequence of the $k$-th respondent’s actions is $S_{k} = \{y_{k}(t_{k,1}), y_{k}(t_{k,1}),y_{k}(t_{k,2}), \ldots, y_{k}(t_{k,M_{k}})\}$ whose length is $M_{k}$. We define $ \delta_{k,n,m} = 1 $ if respondent $k$‘s $n$-th action is $m$; $0$ otherwise. Thus, $ \delta_{k,n,m} \delta_{k,n+1,l} = 1 $ means respondent $k$‘s $n$-th transition ($n < M_{k}$) is from action $m$ to action $l$.

Action embedding

Instead of using the action symbol as an input in the model, …

A goal for action embedding is to substitute a symbolic representation with a vectoric representation of actions. Similar procedures in natual language processing context is well established, and we will adapot a skip-gram model for the action embedding purpose.

A skip-gram model which predict actions within a certain range before and after the current action in the same sentence. This model learns parameters that lead to a high-valued cosine similarity for embeddings of frequently co-occuring actions, where the cosine similarity between two vectors $u_{i}, v_i \in \mathbb{R}^{d}$ is calculated as \[ \frac{\sum_{i=1}^{d} u_{i} v_{i}}{\sqrt{\sum_{i=1}^{d} u_{i}^{2}} \cdot \sqrt{\sum_{i=1}^{d} v_{i}^{2}}}. \] Actions that tend to “behave similarly” end up close to one another in the embedding space. The notion of “behavior” could refer to syntactic categorization or semantic association.

The training objective of the skip-gram model is to maximize the probability of predicting neighboring actions given the target action. The objective can be written as the average log probability \[ \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \leq j \leq c, j \neq 0} \log p\left(m_{t+j} \mid m_{t}\right) \] where c is the window size of neighboring actions. The skip-gram formulation defines this probability using the softmax function.

\begin{equation} \label{eq:skip-gram} p\left(m_{j} \mid m_{0}, u, v\right)=\frac{\exp(u\left(m_{0}\right)’ v\left(m_{j}\right))}{\sum_{m \in S} \exp(u\left(m_{0}\right)’ v(m))} \end{equation}

where $u: S \rightarrow \mathbb{R}^{d} $ and $v: S \rightarrow \mathbb{R}^{d}$ are functions which map actions to a action embedding.

The negative sampling <&mikolov_context_2012> is a computational technique proposed by to resolve the intractable denominator in eq:skip-gram. Skip-gram modeling of the above form coupled with negative sampling is often referred to as a word2vec model <&mikolov_distributed_2013>.

The closer the cosine value to 1, the greater the similarity between actions. The closer the cosine value to -1, the greater the dis-similarity between actions.

Online visualization tool Embedding projector RegExp (regular expression) for metadata https://projector.tensorflow.org/

How to convert an action sequence to a sequence??

data structure

https://www.rdocumentation.org/packages/msm/versions/1.6.8/topics/msm2Surv Given a configured transition matrix, we use \texttt{msm} <&jackson_multi-state_2011> to transform data to a desired “long” format:

person	entry	exit	from	to	observed	cov1	cov2	time cov
1	0	10	1	2	1	D12	θ D12
1	0	10	1	3	0	D12	θ D12

Multistate model (general)

The intensity function $q_{ml}(t)$ represents the instantaneous rate of jumping from action $m$ to $l$ at time $t$:

\begin{align*} q_{ml}\left(t ; Y_{t}\right)= & \lim _{\delta t \rightarrow 0} \frac{P\left(Y(t+\delta t)=l \mid Y(t)=m, \mathcal{F}_{t}\right)}{\delta t}, \end{align*}

where $m \neq l$, $m, l \in S$, and $\mathcal{Y}_t$ denotes the process up to time $t$.

Action transition is assumed to follow Semi-Markovian, which means the intensity depends on the sojourn time ($t - t_{m}$ ; time spent on the current action). This is often called “clock reset” approach as opposed to “clock forward” approach. Let $dt_{m}$ denote the sojourn time.

Cox model

\begin{align*} q_{ml}\left(t ; \mathcal{F}_{t}\right) = & q_{ml} (t - t_{m}; \bm{\lambda}, \bm{\beta}, \mathbf{z}(t))\\
= & \lambda_{ml}(dt_{m}) e^{(\bm{\beta}’ \mathbf{z}(t) + \theta_{k}) D_{ml}}, \end{align*}

for person $k = 1,\ldots,N$, where $\mathbf{z}(t)$ is time-varying covariates, $\lambda_{kml}(t)$ is a baseline intensity function, $D_{ml} \in [-1,1]$ denotes the cosine similarity between actions $m$ and $l$. The cosine similarity is obtained using word2vec on action sequences of an item. The closer the cosine value to 1, the greater the similarity between actions. The closer the cosine value to -1, the greater the dis-similarity between actions. This mean there are $n_{m}$ corresponding intensity functions for state $m$, and overall $\sum_{m in S} n_m$ intensity functions.

We use the constant baseline hazard based on out-of-state transition speed and person’s transition speed: \[ \lambda_{ml}(dt) = \kappa_{m} \tau_{k} \text{ for } l \in A_{m}. \]

A running model has no coviarate terms: \[ q_{ml}\left(t ; \mathcal{F}_{t}\right) = q_{ml}(dt) = \kappa_{m} \tau_{k} e^{\theta_{k} D_{ml} }. \]

larger $\kappa_{m}$ shorter time staying on action $m$ (faster out-of-state transition)
larger $\tau_{k}$, faster transition speed
larger $\theta_{k}$, larger trasition rate towards a similar action. A person with large $\theta_{k}$ tends to choose more coherent actions
%% stolen from the neap proposal [multi-state survival model]

%We take one-partite network view. We take a network view on action sequences, where nodes are a set of predefined action and links represent action transitions. Given that item $k$ is chosen, the action network of student $i$ is represented by $L \times L$ adjacency matrix. Suppose student $i$ at item $k$ has chosen action $A_{i,k,l}$. The transition probability of moving from $A_{i,k,l}$ to some other action $A_{i,k,m}$ among $L$ actions is modeled with a multinomial logistic model

\begin{equation}\label{eq:action} \mathbb{P} ( A_{i,k,m} | A_{i,k,l} ) = \frac{ \exp( \alpha^{(A)}_m + \alpha^{(A)}_l + \alpha^{(A)}_{m,l} + \beta_{m,l}^{(A)} z_{i,k,l,m} ) }{ \exp( \alpha^{(A)}_m + \alpha^{(A)}_l + \alpha^{(A)}_{m,l}+ \beta_{m,l}^{(A)} z_{i,k,l,1} ) + \cdots + \exp( \alpha^{(A)}_m + \alpha^{(A)}_l + \alpha^{(A)}_{m,l}+ \beta_{m,l}^{(A)} z_{i,k,l,L} )}, \end{equation}

\noindent where $ α(A)_m$ and $\alpha^{(A)}_l$ are the main effects of the current and previous actions $m$ and $l$, and $\alpha^{(A)}_{m,l}$ is the interaction effect of the two actions. $\beta_{m,l}^{(A)}$ represents the effect of moving from action $A_{i,k,l}$ to $A_{i,k,m}$, while $ zi,k,l,m$ indicate observed or unobserved covariates that capture the movement. For example, $z_{i,k,l,m}$ can represent a distance between the two actions as in a latent space modeling approach (reference). Figure xxx illustrates the direct paths for the sequences of actions taken by two students, one represented with dashed paths and the other with solid paths.

\textcolor{red}{MJ: can we handle directions? choosing the same actions? }

\textcolor{cyan}{JY: incorporating action times in the transition probability…} We assume symmetric transition probabilities between actions. We define a function describing transition intensity (hazard) between actions $m$ and $l$ ($m \neq l$):

\begin{align*} h (t ; A_{i,k,l} \rightarrow A_{i,k,m} ) = & \lim_{\delta t \to 0} \frac{P(A_{i,k}(t + \delta t) = m | A_{i,k}(t) = l)}{\delta t} \\
= & \lambda_{k,l\rightarrow m}(t) \exp( \alpha^{(A)}_m + \alpha^{(A)}_l + \alpha^{(A)}_{m,l} + \beta_{m,l}^{(A)} z_{i,k,l,m} ), \end{align*}

where $\lambda_{k,l\rightarrow m}(t)$ is a baseline intensity function and $A_{i,k}(t)$ is an action taken by person $i$ at $t$ for item $k$. The non-transition intensity of action $m$ is \[ h (t ; A_{i,k,m} \rightarrow A_{i,k,m} ) = \lambda_{k,m\rightarrow m}(t) \exp( \alpha^{(A)}_m). \]

Then, the corresponding transition probability can be defined as

\begin{align*} \mathbb{P} (t ; A_{i,k,l} \rightarrow A_{i,k,m} ) = & \frac{h(t; A_{i,k,l} \rightarrow A_{i,k,m})}{\sum_{l=1}^{L} h(t; A_{i,k,l} \rightarrow A_{i,k,m})} \end{align*}

It is possible to include the outcome in this multi-state survival modeling framework. In such case, however, identifying meaningful ``subsequence of actions’’ would not be straightforward as appeared in \eqref{eq:no-response1}. Perhaps, we can use this model for parsing action sequence, and use the subsequence for \eqref{eq:no-response1}?

Multistate model (intercept only)

The intensity function $q_{ml}(t)$ represents the instantaneous rate of transition from action $m$ to $l$ at time $t$:

\begin{align*} q_{ml}\left(t ; \mathcal{F}_{t}\right)= & \lim _{\delta t \rightarrow 0} \frac{P\left(Y(t+\delta t)=l \mid Y(t)=m, \mathcal{F}_{t}\right)}{\delta t}, \end{align*}

where $m \neq l$, $m, l \in S$, and $\mathcal{F}_t$ denotes the process up to time $t$. Action transition is assumed to follow Semi-Markovian, which means the intensity depends olny on $\mathcal{F}_{t}$ through time since the current action is started. The intensity is assumed to follow a Cox model. We assume the exponential baseline hazard function of a product of out-of-state and respondent’s transition speed. The intensity can be written for all possible transitions as

\begin{equation} \label{eq:intensity} \begin{split} q_{ml}\left(t ; \mathcal{F}_{t}\right) = & q_{ml} (t - t_{Y_{(t)}})\\
= & \kappa_{m} \tau_{k} \exp(\theta_{k} D_{ml}), \end{split} \end{equation}

for all $m \in S$, $l \in A_{m}$, respondents $k = 1,\ldots,N$, and $D_{ml} \in [-1,1]$ denotes the cosine similarity between actions $m$ and $l$. There are $N_m = #\{A_{m}\}$ intensity functions for each action $m$, which leads to $\sum_{m in S} N_m$ intensity functions.

out-of-action speed $\kappa_{m}$ measures speed of action $m$. The parameter is related to the time to finish action $m$ and move forward to another action.
- The larger $\kappa_{m}$, The shorter time to finish the action $m \in S$.(faster out-of-state transition)
action transition speed $\tau_{k}$ measures respondents’ speed of problem solving after considering average durations of action s/he took.
- With large $\tau_{k}$, the respondent is likely to quickly finish each action.
$\theta_{k}$ measures respondents’ tendency to choose actions “behaving similarly”.
- With large $\theta_{k}$, the respondent is likely to choose the next action coherently.

Estimation

software

word2vec
multistate model + visualization

The action embedding algorithm is written using TensorFlow library in Python <10.5555/1593511>. The MCMC algorithm was written in R <r_core_team_r_2020> and C++14 <&ISO:2014:IIIb> with Stan math library <carpenter_stan_2015>. The code and documentations, along with example data sets, are found in \url{https://jonghyun-yun.github.io/procmod/}.

tidyLPA <&rosenberg_tidylpa_2018> mclust <&scrucca_mclust_2016>

likelihood

$\bm{\tau} = (\tau_{1},\ldots,\tau_{N})'$ $\bm{\theta} = (\theta_{1},\ldots,\theta_{N})'$ $\bm{\kappa} = (\kappa_{1},\ldots,\kappa_{M})'$

\begin{align*} q_{ml} (t ; \bm{\kappa, \theta, \tau}, \bm{\beta}, \mathbf{z}(t)) = & \lambda_{ml}(t) e^{(\bm{\beta}’ \mathbf{z}(t) + \theta_{k}) D_{ml}}\\
q_{ml}\left(t ; \mathcal{F}_{t}\right)= & \lim _{\delta t \rightarrow 0} \frac{P\left(Y(t+\delta t)=l \mid Y(t)=m, \mathcal{F}_{t}\right)}{\delta t}, m \neq l, m, l \in S \end{align*}

The survival function is \[ S_{ml}(dt) = e^{-\int_{0}^{dt_{m}} q_{ml}(x) \dd x}. \] Let $\nu_{mlk}(t) = 1$ if person $k$ jump from actions $m$ to $l$ at time $t$; 0 otherwise. \[ f_{ml}(t) = q_{ml}(t) S_{ml}(t) \] \[ likelihood =\prod_{k} f_{ml}(dt) \prod_{g \in A_{m}} S_{mg}(t), \] \[ f_{ml} = q_{ml}(t) S_{ml}(t), S_{ml}(t) = e^{-\int_{0}^{t^{stop} - t^{start}} q_{ml}(t)\dd t} \]

\[ S_{ml}(dt) = e^{-dt \kappa_{m} \omega_{l} \tau_{k} e^{(\theta_{k} + \beta) D_{ml} }} \]

$n = 1,\ldots,M_{k}$: n-th action of k-th person, $M_k$: sequence length

$ \delta_{k,n,m} = 1 $ if person k’s n-th action is m.

$ \delta_{k,n,m} \delta_{k,n+1,l} = 1 $ for $n < M_{k}$ if person k’s n-th transition is m to l.

time at starting state (one after START) is set to the first action (n=1), and the corresponding time is set to 0. We present a fully Bayesian approach for estimating the proposed model.

prior

For each $m$, $k$, we specify independent priors as follows:

\begin{align*} \pi\left(\kappa_{m}\right) & \sim \operatorname{Gamma}(a_{\kappa}, b_{\kappa}); \\
\pi\left(\tau_{k}\right) & \sim \operatorname{Gamma}(a_{\tau}, b_{\tau}); \\
\pi\left(\theta_{k} | \sigma^{2}\right) & \sim \operatorname{N}(\mu_{\theta}, \sigma^{2}); \\
\pi\left(\sigma^{2}\right) & \sim \operatorname{Inv-Gamma}(a_{\sigma}, b_{\sigma}),\\
\end{align*}

where $\mbox{Inv-Gamma}(\alpha,\beta)$ denotes the inverse gamma distribution with shape $\alpha >0$ and scale $\beta >0$.

The hyperparameters are chosen as \[a_{\kappa} = a_{\tau} = 0.1, b_{\kappa} = b_{\tau} = 0.1, a_{\sigma}=1.0, b_{\sigma}=1.0, \mu_{\theta}=0, \text { and } ….\] Based on our experience, the inference of $\mathbf{\Theta}$ is highly sensitive to its variance $\sigma^2$. Also, the configuration of latent embeddings highly depends on the scale parameter $\gamma$ of the latent space. Rather than choosing sub-optimal tuning parameters, we use a layer of hyper-priors to learn optimal values of these parameters from data. We choose hyperparameters such that priors are minimally informative to facilitate the flexible Bayesian learning.

update $\kappa_{m}$:

For each $m$, we draw $\kappa_m^{(t)}$ from $\mbox{Gamma}\left( a_{\tau} + \sum_{n=1}^{M_{k}} \sum_{k=1}^N \mbox{I}(\delta_{k,n,m} = 1) ,b_{\tau} + \sum_{n=1}^{M_{k}-1}\sum_{k=1}^{N} \sum_{ l \in A_m } dt_{k,n} \tau_{k}e^{(\theta_{k} + \beta) D_{ml}}\right)$

update $\tau_{k}$

For each $k$, we draw $\tau_k^{(t)}$ from $\mbox{Gamma}\left( a_{\tau} + M_k, b_{\tau} + \sum_{n=1}^{M_{k}} \sum_{m \in S, l \in A_m } dt_{k,n} \kappa_{m}e^{(\theta_{k} + \beta) D_{ml}}\right)$

update $\theta_{k}$

For each $k$, we draw $\theta_k^{* }$$ from a symmetric MH jumping distribution, and accept $\theta_{m}^{(l)} = \theta_{m}^{* }$ with probability $\min(1, r_{{\theta_{m}}^{* }})$ where

\begin{align*} \log r_{{\theta_{k}}^{* }} =& \sum_{m \in S} \sum_{n=1}^{M_{k}} \left[ \delta_{k,n,m} (\theta_{k}^{* } - \theta_{k}^{(l-1)})D_{ml} -\sum_{l \in A_m} dt_{k,n} \kappa_{m} \tau_{k} e^{ \beta D_{ml} }(e^{\theta_{k}^{* }D_{ml}} - e^{\theta_{k}^{(l-1)} D_{ml} }) + \log \frac{\pi(\theta_{k}^{* })}{\pi(\theta_{k}^{(l-1)})} \right].\\
\end{align*}

update $\sigma^{2}$

We draw $(\sigma^{2})^{(t)}$ from \[ p( \sigma^2|e.e.) \propto \mbox{Inv-Gamma}(\sigma^{2}|a,b) \prod N(\theta_{k} | \mu, \sigma^2) \] $\sigma^{2} \sim \mbox{Inv-Gamma}(a + 0.5 * N, b + 0.5 + \sum \theta_{k}^2)$ c.f) with flat prior: $\sigma^{2} \sim inv-gamma(0.5 * N, 0.5 + \sum \theta_{k}^2)$

$\mbox{Inv-Gamma}(\alpha,\beta)$ denotes the inverse gamma distribution with a density \[ \mbox{Inv-Gamma}(y|\alpha,\beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} y^{-(\alpha+1)} \exp \left(-\beta \frac{1}{y}\right). \]

Applications

We implement the prposed method to our motivating example.

ftime: time until the first action taken
time: total time of a person’s process

We introduce two rudimentary statistcs.

naction or #action: the action sequence length $M_{k}$. The total number of actions taken by respondent $k$.
fastness: #action divided by the total time elaped since $t_{k,1}$.

Multivariate Gaussian mixture for clustering

Varying variances and varying covariances (Model 6)
AIC and BIC ..

I need one more on cluster analysis:

I want to compare sub-sequences between clusters: longest common subsequence <&ulitzsch_combining_2021>
- common subsequences in LCS? NLP would be useful here: common phrase?
  - ans: n-grams like tri-grams
  - ans: most interesting collocations
map respondent to word embedding, mark speed / τ / θ
action frequencies cluster: top 5 actions?
how sentences are compared in NLP? Text Summarization

collcations: freq and time -> show what are these actions, explain semantic ass and time efficiency
map respondents marked by τ, θ: find neighboring actions?? not necessarily what they did
text summary: how it works? too ambiguous.. it seems to work by removing some sentences

cd_tally

Low accuracy rate, correct-answer behaviors are a few, correc-answer behavior pairs have small cosine similarities.

collocations (4 < 2 < 1 < 3): with τ and θ

Class 1

bigram	likelihood_ratio
(‘SUBMIT_complete-u03_default_txt14’, ‘NEXT_INQUIRY-REQUEST’)	1459.57552
(‘TOOLBAR_spreadApp-spreadApp’, ‘TOOLBAR_webApp-webApp’)	1189.05250
(‘TOOLBAR_webApp-webApp’, ‘COMBOBOX-menulist.index=9’)	455.87983
(‘COMBOBOX-menulist.index=9’, ‘SUBMIT_complete-u03_default_txt14’)	404.68816
(‘TOOLBAR_webApp-webApp’, ‘TOOLBAR_spreadApp-spreadApp’)	178.25025
(‘NEXT_INQUIRY-REQUEST’, ‘SUBMIT_complete-u03_default_txt14’)	84.10660
(‘SUBMIT_complete-u03_default_txt14’, ‘SUBMIT_complete-u03_default_txt14’)	77.40740
(‘SUBMIT_complete-u03_default_txt14’, ‘TOOLBAR_spreadApp-spreadApp’)	74.16434
(‘TOOLBAR_webApp-webApp’, ‘NEXT_INQUIRY-REQUEST’)	67.95933
(‘TOOLBAR_webApp-webApp’, ‘COMBOBOX-menulist.index=8’)	63.94217

Class 2

bigram	likelihood_ratio
(‘SUBMIT_complete-u03_default_txt14’, ‘NEXT_INQUIRY-REQUEST’)	2355.3848
(‘TOOLBAR_spreadApp-spreadApp’, ‘TOOLBAR_webApp-webApp’)	1392.4234
(‘COMBOBOX-menulist.index=9’, ‘SUBMIT_complete-u03_default_txt14’)	865.4099
(‘TOOLBAR_webApp-webApp’, ‘COMBOBOX-menulist.index=9’)	832.1501
(‘TOOLBAR_webApp-webApp’, ‘TOOLBAR_spreadApp-spreadApp’)	606.5532
(‘NEXT_INQUIRY-REQUEST’, ‘END_CANCEL-endtask_txt4’)	502.3374
(‘TOOLBAR_ss-sort-ss-sort’, ‘COMBOBOX-sortablecol1.index=3’)	485.4294
(‘MENUITEM_sort-key=sort’, ‘COMBOBOX-sortablecol1.index=3’)	453.2708
(‘MENU-ss-data-menu’, ‘MENUITEM_sort-key=sort’)	435.1334
(‘COMBOBOX-sortablecol1.index=3’, ‘RADIO_BTN-priority1asc’)	322.2008

Class 3

bigram	likelihood_ratio
(‘SUBMIT_complete-u03_default_txt14’, ‘NEXT_INQUIRY-REQUEST’)	340.964425
(‘COMBOBOX-menulist.index=4’, ‘SUBMIT_complete-u03_default_txt14’)	217.307581
(‘COMBOBOX-menulist.index=11’, ‘SUBMIT_complete-u03_default_txt14’)	26.081798
(‘KEYPRESS’, ‘BREAKOFF-REQUEST’)	14.724331
(‘COMBOBOX-menulist.index=26’, ‘SUBMIT_complete-u03_default_txt14’)	7.358132
(‘COMBOBOX-menulist.index=3’, ‘SUBMIT_complete-u03_default_txt14’)	7.358132
(‘COMBOBOX-menulist.index=7’, ‘SUBMIT_complete-u03_default_txt14’)	7.358132
(‘COMBOBOX-menulist.index=9’, ‘SUBMIT_complete-u03_default_txt14’)	7.358132
(‘GET_HELP-REQUEST’, ‘COMBOBOX-menulist.index=4’)	4.880417
(‘COMBOBOX-menulist.index=10’, ‘SUBMIT_complete-u03_default_txt14’)	3.669923

Class 4

bigram	likelihood_ratio
(‘SUBMIT_complete-u03_default_txt14’, ‘NEXT_INQUIRY-REQUEST’)	339.8976
(‘TOOLBAR_spreadApp-spreadApp’, ‘TOOLBAR_webApp-webApp’)	225.6964
(‘COMBOBOX-menulist.index=9’, ‘SUBMIT_complete-u03_default_txt14’)	206.3333
(‘TEXTBOX_ONFOCUS-searchtfield.value=’, ‘KEYPRESS’)	195.4571
(‘TOOLBAR_webApp-webApp’, ‘COMBOBOX-menulist.index=9’)	192.6315
(‘TOOLBAR_webApp-webApp’, ‘TOOLBAR_spreadApp-spreadApp’)	179.7811
(‘CD_FIND_ok-spread-search-btn’, ‘TOOLBAR_spread-search-next-spread-search-next-btn’)	177.6448
(‘NEXT_INQUIRY-REQUEST’, ‘END_CANCEL-endtask_txt4’)	129.0081
(‘TOOLBAR_ss-sort-ss-sort’, ‘COMBOBOX-sortablecol1.index=3’)	115.3623
(‘MENU-ss-data-menu’, ‘MENUITEM_sort-key=sort’)	104.0280

collocations (3 < 2 < 1): with NO τ and θ

Class 1

bigram	likelihood_ratio
(‘SUBMIT_complete-u03_default_txt14’, ‘NEXT_INQUIRY-REQUEST’)	1486.28982
(‘TOOLBAR_spreadApp-spreadApp’, ‘TOOLBAR_webApp-webApp’)	909.08500
(‘TOOLBAR_webApp-webApp’, ‘COMBOBOX-menulist.index=9’)	415.20157
(‘COMBOBOX-menulist.index=9’, ‘SUBMIT_complete-u03_default_txt14’)	308.73772
(‘COMBOBOX-menulist.index=4’, ‘SUBMIT_complete-u03_default_txt14’)	176.97870
(‘NEXT_INQUIRY-REQUEST’, ‘SUBMIT_complete-u03_default_txt14’)	117.60547
(‘TOOLBAR_webApp-webApp’, ‘TOOLBAR_spreadApp-spreadApp’)	103.84381
(‘SUBMIT_complete-u03_default_txt14’, ‘SUBMIT_complete-u03_default_txt14’)	96.02468
(‘NEXT_INQUIRY-REQUEST’, ‘TOOLBAR_spreadApp-spreadApp’)	70.42173
(‘TOOLBAR_webApp-webApp’, ‘COMBOBOX-menulist.index=8’)	64.24856

Class 2

bigram	likelihood_ratio
(‘SUBMIT_complete-u03_default_txt14’, ‘NEXT_INQUIRY-REQUEST’)	2593.5186
(‘TOOLBAR_spreadApp-spreadApp’, ‘TOOLBAR_webApp-webApp’)	1667.7831
(‘COMBOBOX-menulist.index=9’, ‘SUBMIT_complete-u03_default_txt14’)	929.1073
(‘TOOLBAR_webApp-webApp’, ‘COMBOBOX-menulist.index=9’)	901.7786
(‘TOOLBAR_webApp-webApp’, ‘TOOLBAR_spreadApp-spreadApp’)	687.2626
(‘TOOLBAR_ss-sort-ss-sort’, ‘COMBOBOX-sortablecol1.index=3’)	495.8573
(‘NEXT_INQUIRY-REQUEST’, ‘END_CANCEL-endtask_txt4’)	472.7580
(‘MENUITEM_sort-key=sort’, ‘COMBOBOX-sortablecol1.index=3’)	465.8816
(‘MENU-ss-data-menu’, ‘MENUITEM_sort-key=sort’)	440.6443
(‘COMBOBOX-sortablecol1.index=3’, ‘CD_SORT_ok-sortValidation’)	335.6071

Class 3

bigram	likelihood_ratio
(‘SUBMIT_complete-u03_default_txt14’, ‘NEXT_INQUIRY-REQUEST’)	412.4794
(‘TOOLBAR_spreadApp-spreadApp’, ‘TOOLBAR_webApp-webApp’)	253.2414
(‘COMBOBOX-menulist.index=9’, ‘SUBMIT_complete-u03_default_txt14’)	242.5732
(‘TEXTBOX_ONFOCUS-searchtfield.value=’, ‘KEYPRESS’)	204.3482
(‘TOOLBAR_webApp-webApp’, ‘TOOLBAR_spreadApp-spreadApp’)	201.1515
(‘TOOLBAR_webApp-webApp’, ‘COMBOBOX-menulist.index=9’)	198.8969
(‘CD_FIND_ok-spread-search-btn’, ‘TOOLBAR_spread-search-next-spread-search-next-btn’)	185.6252
(‘MENU-ss-data-menu’, ‘MENUITEM_sort-key=sort’)	130.1036
(‘TOOLBAR_ss-sort-ss-sort’, ‘COMBOBOX-sortablecol1.index=3’)	119.0180
(‘NEXT_INQUIRY-REQUEST’, ‘END_CANCEL-endtask_txt4’)	117.9283

visualise cluster

cluster boxplots

cluster violins

response

w/ tau and theta

tau	theta	naction	spd	res	n
-0.46 (0.82)	0.02 (1.00)	-0.45 (0.21)	-0.63 (0.34)	5.02 (2.83)	345.00 (0.00)
0.14 (0.94)	0.12 (1.08)	0.24 (0.48)	0.21 (0.65)	3.54 (2.97)	425.00 (0.00)
0.06 (0.40)	-0.37 (0.58)	-0.73 (0.08)	0.01 (1.02)	6.91 (0.72)	137.00 (0.00)
1.54 (1.39)	-0.13 (0.98)	2.60 (2.22)	2.12 (1.78)	2.53 (2.64)	59.00 (0.00)

w/o tau and theta

naction	spd	CPROB1	CPROB2	res	n
-0.54 (0.19)	-0.61 (0.36)	0.86 (0.13)	0.14 (0.13)	5.75 (2.44)	423.00 (0.00)
0.15 (0.50)	0.23 (0.66)	0.07 (0.13)	0.90 (0.13)	3.61 (2.98)	476.00 (0.00)
2.30 (2.27)	2.25 (1.66)	0.00 (0.00)	0.07 (0.12)	2.70 (2.72)	67.00 (0.00)

book_order

collocations (1 < 2 < 4 < 3): with τ and θ

Class 1

bigram	likelihood_ratio
(‘BUYNOW_pg4-pg4_txt11’, ‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’)	1517.3255
(‘CHECK_pg7-pg7_txt4’, ‘BUTTON_close-popup4_txt3’)	1478.8744
(‘TEXTLINK-default_txt9.href=unit7page3.target=self’, ‘TEXTLINK-pg3_txt5.href=unit7page8.target=self’)	1450.1274
(‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’, ‘BUTTON_close-popup1_txt2’)	1407.2608
(‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’, ‘NEXT_INQUIRY-REQUEST’)	1398.3322
(‘TEXTLINK-default_txt18.href=unit7page7.target=self’, ‘CHECK_pg7-pg7_txt4’)	1289.2778
(‘TEXTLINK-default_txt3.href=unit7page1.target=self’, ‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’)	1126.0653
(‘TOOLBAR_back-toolbar_back_btn’, ‘TEXTLINK-default_txt12.href=unit7page4.target=self’)	1109.0009
(‘TOOLBAR_back-toolbar_back_btn’, ‘TOOLBAR_back-toolbar_back_btn’)	1041.7560
(‘TEXTLINK-default_txt12.href=unit7page4.target=self’, ‘TEXTLINK-pg4_txt1.href=unit7page5.target=self’)	743.5791

Class 2

bigram	likelihood_ratio
(‘BUYNOW_pg4-pg4_txt11’, ‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’)	1462.3971
(‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’, ‘BUTTON_close-popup1_txt2’)	1366.2648
(‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’, ‘NEXT_INQUIRY-REQUEST’)	1229.1815
(‘TEXTLINK-default_txt9.href=unit7page3.target=self’, ‘TEXTLINK-pg3_txt5.href=unit7page8.target=self’)	1223.2732
(‘CHECK_pg7-pg7_txt4’, ‘BUTTON_close-popup4_txt3’)	1206.8436
(‘TEXTLINK-default_txt3.href=unit7page1.target=self’, ‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’)	1115.7798
(‘TOOLBAR_back-toolbar_back_btn’, ‘TEXTLINK-default_txt12.href=unit7page4.target=self’)	1038.0396
(‘TEXTLINK-default_txt18.href=unit7page7.target=self’, ‘CHECK_pg7-pg7_txt4’)	942.0183
(‘TOOLBAR_back-toolbar_back_btn’, ‘TOOLBAR_back-toolbar_back_btn’)	923.0133
(‘TEXTLINK-default_txt6.href=unit7page2.target=self’, ‘TOOLBAR_back-toolbar_back_btn’)	634.8878

Class 3

bigram	likelihood_ratio
(‘BUYNOW_pg7-pg7_txt19’, ‘BUYBOOK_pg7_ok-pg7_pu6_okbtn’)	220.56459
(‘TEXTLINK-default_txt18.href=unit7page7.target=self’, ‘BUYNOW_pg7-pg7_txt19’)	197.00413
(‘BUYNOW_pg3-pg3_txt4’, ‘BUYBOOK_pg3_ok-pg3_pu6_okbtn’)	194.11490
(‘BUYNOW_pg1-pg1_txt8’, ‘BUYBOOK_pg1_ok-pg1_pu6_okbtn’)	185.88803
(‘TEXTLINK-default_txt3.href=unit7page1.target=self’, ‘BUYNOW_pg1-pg1_txt8’)	160.02840
(‘TEXTLINK-default_txt9.href=unit7page3.target=self’, ‘BUYNOW_pg3-pg3_txt4’)	157.16978
(‘BUYNOW_pg4-pg4_txt11’, ‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’)	109.12486
(‘BUYBOOK_pg7_ok-pg7_pu6_okbtn’, ‘NEXT_INQUIRY-REQUEST’)	101.90101
(‘TEXTLINK-default_txt12.href=unit7page4.target=self’, ‘BUYNOW_pg4-pg4_txt11’)	91.06483
(‘BUYBOOK_pg1_ok-pg1_pu6_okbtn’, ‘NEXT_INQUIRY-REQUEST’)	85.38611

Class 4

bigram	likelihood_ratio
(‘CHECK_pg7-pg7_txt4’, ‘BUTTON_close-popup4_txt3’)	204.15766
(‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’, ‘BUTTON_close-popup1_txt2’)	203.05135
(‘TEXTLINK-default_txt18.href=unit7page7.target=self’, ‘CHECK_pg7-pg7_txt4’)	166.80637
(‘TEXTLINK-pg6_txt12.href=u07_pg6_popup3.target=popup’, ‘BUTTON_close-popup3_txt2’)	150.81368
(‘TEXTLINK-default_txt9.href=unit7page3.target=self’, ‘TEXTLINK-pg3_txt5.href=unit7page8.target=self’)	149.72934
(‘TEXTLINK-default_txt3.href=unit7page1.target=self’, ‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’)	129.50057
(‘TOOLBAR_back-toolbar_back_btn’, ‘TEXTLINK-default_txt12.href=unit7page4.target=self’)	126.61287
(‘BUYNOW_pg4-pg4_txt11’, ‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’)	118.99832
(‘TOOLBAR_back-toolbar_back_btn’, ‘TEXTLINK-default_txt9.href=unit7page3.target=self’)	96.55958
(‘TOOLBAR_back-toolbar_back_btn’, ‘TEXTLINK-default_txt15.href=unit7page6.target=self’)	95.65276

collocations (1 < 3 < 2): with NO τ and θ

Class 1

bigram	likelihood_ratio
(‘BUYNOW_pg4-pg4_txt11’, ‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’)	1517.3255
(‘CHECK_pg7-pg7_txt4’, ‘BUTTON_close-popup4_txt3’)	1478.8744
(‘TEXTLINK-default_txt9.href=unit7page3.target=self’, ‘TEXTLINK-pg3_txt5.href=unit7page8.target=self’)	1450.1274
(‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’, ‘BUTTON_close-popup1_txt2’)	1407.2608
(‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’, ‘NEXT_INQUIRY-REQUEST’)	1398.3322
(‘TEXTLINK-default_txt18.href=unit7page7.target=self’, ‘CHECK_pg7-pg7_txt4’)	1289.2778
(‘TEXTLINK-default_txt3.href=unit7page1.target=self’, ‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’)	1126.0653
(‘TOOLBAR_back-toolbar_back_btn’, ‘TEXTLINK-default_txt12.href=unit7page4.target=self’)	1109.0009
(‘TOOLBAR_back-toolbar_back_btn’, ‘TOOLBAR_back-toolbar_back_btn’)	1041.7560
(‘TEXTLINK-default_txt12.href=unit7page4.target=self’, ‘TEXTLINK-pg4_txt1.href=unit7page5.target=self’)	743.5791

Class 2

bigram	likelihood_ratio
(‘BUYNOW_pg4-pg4_txt11’, ‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’)	1462.3971
(‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’, ‘BUTTON_close-popup1_txt2’)	1366.2648
(‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’, ‘NEXT_INQUIRY-REQUEST’)	1229.1815
(‘TEXTLINK-default_txt9.href=unit7page3.target=self’, ‘TEXTLINK-pg3_txt5.href=unit7page8.target=self’)	1223.2732
(‘CHECK_pg7-pg7_txt4’, ‘BUTTON_close-popup4_txt3’)	1206.8436
(‘TEXTLINK-default_txt3.href=unit7page1.target=self’, ‘TEXTLINK-pg1_txt7.href=u07_pg1_popup1.target=popup’)	1115.7798
(‘TOOLBAR_back-toolbar_back_btn’, ‘TEXTLINK-default_txt12.href=unit7page4.target=self’)	1038.0396
(‘TEXTLINK-default_txt18.href=unit7page7.target=self’, ‘CHECK_pg7-pg7_txt4’)	942.0183
(‘TOOLBAR_back-toolbar_back_btn’, ‘TOOLBAR_back-toolbar_back_btn’)	923.0133
(‘TEXTLINK-default_txt6.href=unit7page2.target=self’, ‘TOOLBAR_back-toolbar_back_btn’)	634.8878

Class 3

bigram	likelihood_ratio
(‘BUYNOW_pg7-pg7_txt19’, ‘BUYBOOK_pg7_ok-pg7_pu6_okbtn’)	220.56459
(‘TEXTLINK-default_txt18.href=unit7page7.target=self’, ‘BUYNOW_pg7-pg7_txt19’)	197.00413
(‘BUYNOW_pg3-pg3_txt4’, ‘BUYBOOK_pg3_ok-pg3_pu6_okbtn’)	194.11490
(‘BUYNOW_pg1-pg1_txt8’, ‘BUYBOOK_pg1_ok-pg1_pu6_okbtn’)	185.88803
(‘TEXTLINK-default_txt3.href=unit7page1.target=self’, ‘BUYNOW_pg1-pg1_txt8’)	160.02840
(‘TEXTLINK-default_txt9.href=unit7page3.target=self’, ‘BUYNOW_pg3-pg3_txt4’)	157.16978
(‘BUYNOW_pg4-pg4_txt11’, ‘BUYBOOK_pg4_ok-pg4_pu6_okbtn’)	109.12486
(‘BUYBOOK_pg7_ok-pg7_pu6_okbtn’, ‘NEXT_INQUIRY-REQUEST’)	101.90101
(‘TEXTLINK-default_txt12.href=unit7page4.target=self’, ‘BUYNOW_pg4-pg4_txt11’)	91.06483
(‘BUYBOOK_pg1_ok-pg1_pu6_okbtn’, ‘NEXT_INQUIRY-REQUEST’)	85.38611