Model V2

- [pattern discovery](#pattern-discovery)
- [justification + what item to use](#justification-plus-what-item-to-use)

Let \(S\) denote the set of all possible actions. For each state \(m \in S\), let \(A_{m}\) denote the set of competing transitions \(\{l_1, \ldots, l_{n_m}\}\) that can be taken directly after \(m\). Let \(Y_k(t)\) denote the action of the \(k\)-th respondent at time \(t\). All respondents are assumed to begin the problem-solving process at time \(t=0\).

The intensity function \(q_{ml}(\cdot)\) represents the instantaneous risk of moving from action \(m\) to \(l\):

\begin{align*} q_{ml}\left(t ; \mathcal{F}_{t}\right)= & \lim _{\delta t \rightarrow 0} \frac{P\left(Y(t+\delta t)=l \mid Y(t)=m, \mathcal{F}_{t}\right)}{\delta t}, m \neq l, m, l \in S, \end{align*}

where \(\mathcal{F}_t\) denotes the history of the process up to time \(t\).

Action transitions are assumed to follow a semi-Markov process, which means the intensity depends on the sojourn time \(t - t_{m}\), the time spent on the current action \(m\) since it was entered at time \(t_{m}\). This is often called the “clock reset” approach, as opposed to the “clock forward” approach. Let \(dt_{m} = t - t_{m}\) denote the sojourn time.

Cox model

\begin{align} q_{ml}\left(t ; \mathcal{F}_{t}\right) = & q_{ml} (t - t_{m}; \boldsymbol{\lambda}, \boldsymbol{\beta}, \mathbf{z}(t))\\\
= & \lambda_{ml}(dt_{m}) e^{(\boldsymbol{\beta}^{\top} \mathbf{z}(t) + \theta_{k}) D_{ml}}, \end{align}

for person \(k = 1,\ldots,N\), where \(\mathbf{z}(t)\) is a vector of time-varying covariates, \(\lambda_{ml}(\cdot)\) is a baseline intensity function, and \(D_{ml} \in [-1,1]\) denotes the cosine similarity between actions \(m\) and \(l\). The cosine similarity is obtained by applying word2vec to the action sequences of an item. The closer the cosine value is to 1, the more similar the actions; the closer it is to -1, the more dissimilar. This means there are \(n_{m}\) intensity functions for state \(m\), and \(\sum_{m \in S} n_m\) intensity functions overall.
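As a minimal sketch of how the similarity matrix \(D\) could be computed once action embeddings are available, the R snippet below builds pairwise cosine similarities. The embedding values and action names (`click_a`, `click_b`, `click_c`) are made up for illustration; in practice the rows would come from a word2vec fit, which is not shown here.

```r
# Toy 3-dimensional embeddings for three actions (hypothetical values;
# real embeddings would come from word2vec on the action sequences).
emb <- rbind(
  click_a = c(1, 0, 1),
  click_b = c(2, 0, 2),
  click_c = c(-1, 0, -1)
)

# Cosine similarity: inner product divided by the product of vector norms.
norms <- sqrt(rowSums(emb^2))
D <- (emb %*% t(emb)) / (norms %o% norms)

# D[m, l] lies in [-1, 1]: click_a and click_b point the same way,
# click_a and click_c point in opposite directions.
D
```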

We use a constant baseline hazard, defined as the product of an action-specific out-of-state transition speed \(\kappa_{m}\) and a person-specific transition speed \(\tau_{k}\): \[ \lambda_{ml}(dt) = \kappa_{m} \tau_{k} \text{ for } l \in A_{m}. \]

The model currently being run has no covariate terms: \[ q_{ml}\left(t ; \mathcal{F}_{t}\right) = q_{ml}(dt) = \kappa_{m} \tau_{k} e^{\theta_{k} D_{ml} }. \]

- A larger \(\kappa_{m}\) means a shorter time spent on action \(m\) (faster out-of-state transition).
- A larger \(\tau_{k}\) means a faster transition speed for person \(k\).
- A larger \(\theta_{k}\) means a larger transition rate towards similar actions. A person with a large \(\theta_{k}\) tends to choose more coherent actions.
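To make these interpretations concrete, here is a small R sketch of the covariate-free intensity \(q_{ml} = \kappa_{m} \tau_{k} e^{\theta_{k} D_{ml}}\). The function name `transition_intensity` and all numeric values are illustrative assumptions. It relies on a standard property of competing constant hazards: the next action is \(l\) with probability \(q_{ml} / \sum_{g \in A_m} q_{mg}\), and the expected sojourn time is \(1 / \sum_{g \in A_m} q_{mg}\).

```r
# Intensity of the covariate-free model: kappa_m * tau_k * exp(theta_k * D_ml).
transition_intensity <- function(kappa_m, tau_k, theta_k, D_ml) {
  kappa_m * tau_k * exp(theta_k * D_ml)
}

# Made-up cosine similarities from the current action m to three candidates.
D_m <- c(similar = 0.8, neutral = 0.0, dissimilar = -0.8)

q_low  <- transition_intensity(kappa_m = 0.5, tau_k = 1, theta_k = 0, D_ml = D_m)
q_high <- transition_intensity(kappa_m = 0.5, tau_k = 1, theta_k = 2, D_ml = D_m)

# Probability that each candidate is the next action (competing constant hazards).
p_low  <- q_low / sum(q_low)    # theta_k = 0: all candidates equally likely
p_high <- q_high / sum(q_high)  # theta_k = 2: mass shifts to the similar action

# Expected time spent on action m before leaving.
expected_sojourn <- c(low = 1 / sum(q_low), high = 1 / sum(q_high))
```

Doubling \(\kappa_{m}\) or \(\tau_{k}\) in this sketch halves the expected sojourn time without changing the choice probabilities, which matches the reading of these two parameters as pure speed parameters.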

pattern discovery

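- divide \(\tau_{k}\) by 1) the number of actions
- use 2) the total time for visualization
- 1) divided by 2)
- find persons (small vs. large number of actions; right vs. wrong)
  - people who took a long time and answered correctly vs. people who took little time and answered correctly
  - people who answered correctly with few actions vs. people who answered incorrectly with many actions

Solving quickly does not by itself mean doing well; the focus is on discovering processes that reach the correct answer slowly or with a different number of actions.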

justification + what item to use

select a few items fulfilling the justification scheme!

https://cran.r-project.org/web/packages/tidyLPA/vignettes/Introduction%5Fto%5FtidyLPA.html OK if there are finer-grained groups (more groups).

- observed covariates + response group classification
- observed covariates + tau and theta + response group classification

Gender is not significant after controlling for age: there was a gender difference in EDA, but after controlling for age there is no difference.

https://data-edu.github.io/tidyLPA/reference/AHP.html

likelihood

\begin{align*} q_{ml} (t ; \boldsymbol{\lambda}, \boldsymbol{\beta}, \mathbf{z}(t)) = & \lambda_{ml}(t) e^{(\boldsymbol{\beta}^{\top} \mathbf{z}(t) + \theta_{k}) D_{ml}}\\\
q_{ml}\left(t ; \mathcal{F}_{t}\right)= & \lim _{\delta t \rightarrow 0} \frac{P\left(Y(t+\delta t)=l \mid Y(t)=m, \mathcal{F}_{t}\right)}{\delta t}, m \neq l, m, l \in S \end{align*}

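The survival function is \[ S_{ml}(dt_{m}) = e^{-\int_{0}^{dt_{m}} q_{ml}(x) \, dx}, \] where \(dt_{m} = t^{stop} - t^{start}\) is the sojourn time of the transition. Let \(\nu_{mlk}(t) = 1\) if person \(k\) jumps from action \(m\) to \(l\) at time \(t\), and 0 otherwise. The density of a transition from \(m\) to \(l\) is \[ f_{ml}(t) = q_{ml}(t) S_{ml}(t). \] Because the transitions in \(A_{m}\) compete, each observed transition also contributes the survival functions of the transitions that were not taken: \[ \text{likelihood} = \prod_{k} \prod_{m \rightarrow l} f_{ml}(dt_{m}) \prod_{g \in A_{m}, g \neq l} S_{mg}(dt_{m}), \] where the inner product runs over the observed transitions \(m \rightarrow l\) of person \(k\). Equivalently, each observed transition contributes \(q_{ml}(dt_{m}) \prod_{g \in A_{m}} S_{mg}(dt_{m})\).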

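With the constant baseline hazard \(\lambda_{ml}(dt) = \kappa_{m} \tau_{k}\), the integral has a closed form: \[ S_{ml}(dt) = e^{-dt \, \kappa_{m} \tau_{k} e^{(\theta_{k} + \beta) D_{ml} }} \]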
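The following R sketch evaluates the log-likelihood contribution of a single observed transition under the covariate-free model \(q_{mg} = \kappa_{m} \tau_{k} e^{\theta_{k} D_{mg}}\) (no \(\beta\) term). It uses the competing-risks form: the log intensity of the transition actually taken, minus the sojourn time multiplied by the summed intensities of all transitions in \(A_{m}\). The function name `loglik_transition`, the action labels, and the numbers are assumptions made for illustration.

```r
# Log-likelihood contribution of one observed transition m -> l with sojourn
# time dt, assuming constant hazards q_mg = kappa_m * tau_k * exp(theta_k * D_mg).
loglik_transition <- function(dt, taken, D_m, kappa_m, tau_k, theta_k) {
  q <- kappa_m * tau_k * exp(theta_k * D_m)  # intensity for every g in A_m
  # log q_ml for the transition taken, minus the integrated hazard of all
  # competing transitions (the log of the product of their survival functions).
  log(q[[taken]]) - dt * sum(q)
}

# Made-up similarities from action m to the two actions in A_m.
D_m <- c(help = 0.6, submit = -0.2)

# With theta_k = 0 every intensity equals 1, so this is log(1) - 2 * 2 = -4.
ll <- loglik_transition(dt = 2, taken = "submit", D_m = D_m,
                        kappa_m = 1, tau_k = 1, theta_k = 0)
ll
```

Summing this quantity over all transitions of all persons gives the full log-likelihood under these assumptions.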

Let \(n = 1,\ldots,M_{k}\) index the actions of the \(k\)-th person, where \(M_k\) is the sequence length.

\( \delta_{k,n,m} = 1 \) if person \(k\)’s \(n\)-th action is \(m\), and 0 otherwise.

\( \delta_{k,n,m} \delta_{k,n+1,l} = 1 \) for \(n < M_{k}\) if person \(k\)’s \(n\)-th transition is from \(m\) to \(l\).

The starting state (the one right after START) is treated as the first action (\(n=1\)), and its time is set to 0.
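A minimal R sketch of this indexing, assuming a hypothetical log with one row per action and a `START` marker: the `START` row is dropped, the clock is shifted so the first action has time 0, and each pair of consecutive actions becomes one transition with its sojourn time. The column names and values are made up.

```r
# Hypothetical raw log for one respondent (times in seconds, made up).
log_k <- data.frame(
  action = c("START", "open_menu", "select_item", "submit"),
  time   = c(0.0, 1.2, 3.7, 5.2),
  stringsAsFactors = FALSE
)

# The action right after START is n = 1 and its time is reset to 0.
seq_k <- log_k[log_k$action != "START", ]
seq_k$time <- seq_k$time - seq_k$time[1]
M_k <- nrow(seq_k)  # sequence length

# One row per transition n -> n + 1 (n < M_k), with sojourn time dt.
transitions <- data.frame(
  n    = seq_len(M_k - 1),
  from = seq_k$action[-M_k],
  to   = seq_k$action[-1],
  dt   = diff(seq_k$time),
  stringsAsFactors = FALSE
)
transitions
```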

prior

The proposed method uses a fully Bayesian approach to estimate the proposed latent space model via MCMC. Our prior specification is as follows:

\begin{align*} \pi\left(\kappa_{m}\right) & \sim \operatorname{Gamma}\left(a_{\kappa}, b_{\kappa}\right); \\\
\pi\left(\tau_{k}\right) & \sim \operatorname{Gamma}\left(a_{\tau}, b_{\tau}\right); \\\
\pi\left(\theta_{k} | \sigma^{2}\right) & \sim \mathrm{N}\left(0, \sigma^{2}\right); \\\
\pi\left(\sigma^{2}\right) & \sim \operatorname{Inv-Gamma}\left(a_{\sigma}, b_{\sigma}\right).
\end{align*}

Here the inverse-gamma density with shape \(\alpha\) and scale \(\beta\) is \[ p(\theta \mid \alpha, \beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{-(\alpha+1)} e^{-\beta / \theta}, \quad \theta>0. \]

The hyperparameters are chosen as \[a_{\sigma}=0.0001, b_{\sigma}=0.0001, \mu_{\theta}=0, \text { and } ….\]

In our experience, inference on \(\mathbf{\Theta}\) is highly sensitive to its variance \(\sigma^2\). Also, the configuration of the latent embeddings depends strongly on the scale parameter \(\gamma\) of the latent space. Rather than fixing sub-optimal tuning parameters, we use a layer of hyper-priors to learn these parameters from the data. We choose hyperparameters such that the priors are minimally informative, to facilitate flexible Bayesian learning.

pseudo code

update \(\kappa_{m}\)

- Data used: all persons \(k\) who took action \(m\), and all \(l \in A_m\) (all actions that can be reached directly from \(m\)).
- Sojourn times \(dt_{k,n}\) (transition stop time minus start time) for all \((k, n)\) with \(\delta_{k,n,m} = 1\).

For each \(m\), a symmetric MH jumping distribution \(J(\kappa_{m}^{(l-1)} \rightarrow \kappa_{m}^{* })\) is used to propose a new sample, where the superscript \((l-1)\) indexes the previous MCMC iteration.
We accept \(\kappa_{m}^{(l)} = \kappa_{m}^{* }\) with probability \(\min(1, r_{{\kappa_{m}}^{* }})\), where

\begin{align*} \log r_{{\kappa_{m}}^{* }} =& \sum \delta_{k,n,m} (\log \kappa_{m}^{* } - \log \kappa_{m}^{(l-1)})\\\
&-\sum dt (\kappa_{m}^{* } - \kappa_{m}^{(l-1)}) \tau_{k} e^{(\theta_{k} + \beta) D_{ml} }\\\
&+ \log \frac{\pi(\kappa_{m}^{* })}{\pi(\kappa_{m}^{(l-1)})}. \end{align*}
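The update for \(\kappa_{m}\) can be sketched in R as follows. This is an illustration under stated assumptions, not the authors' implementation: the Gamma prior is taken in the shape/rate parameterization, the proposal is a normal random walk, the \(\beta\) term is omitted, and the data are summarized by two hypothetical quantities, `n_trans` (the number of observed transitions out of \(m\)) and `exposure` (the sum over those transitions of \(dt \, \tau_{k} \sum_{g \in A_m} e^{\theta_{k} D_{mg}}\)). The default hyperparameters `a = 1`, `b = 1` and the step size are made up.

```r
# Log acceptance ratio for kappa_m: likelihood ratio plus prior ratio.
log_r_kappa <- function(kappa_new, kappa_old, n_trans, exposure, a, b) {
  n_trans * (log(kappa_new) - log(kappa_old)) -
    (kappa_new - kappa_old) * exposure +
    dgamma(kappa_new, shape = a, rate = b, log = TRUE) -
    dgamma(kappa_old, shape = a, rate = b, log = TRUE)
}

# One Metropolis step with a symmetric normal random-walk proposal.
mh_step_kappa <- function(kappa_old, n_trans, exposure,
                          a = 1, b = 1, step = 0.2) {
  kappa_new <- rnorm(1, mean = kappa_old, sd = step)
  if (kappa_new <= 0) return(kappa_old)  # outside the support: always rejected
  log_r <- log_r_kappa(kappa_new, kappa_old, n_trans, exposure, a, b)
  if (log(runif(1)) < log_r) kappa_new else kappa_old
}

# Short chain on made-up summaries: 5 transitions out of m, exposure of 3.
set.seed(1)
kappa <- 1
for (i in seq_len(1000)) {
  kappa <- mh_step_kappa(kappa, n_trans = 5, exposure = 3)
}
kappa
```

Under these particular assumptions the full conditional of \(\kappa_{m}\) is \(\operatorname{Gamma}(a + \text{n\_trans}, b + \text{exposure})\), so a direct Gibbs draw would also work; the Metropolis step is shown because it mirrors the update described in the text.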

update \(\tau_{k}\)

- Data used: all actions \(m\) taken by person \(k\), and all \(l \in A_m\).
- All transition start and stop times of the \(k\)-th person.

\begin{align*} \log r_{{\tau_{k}}^*} =& \sum \delta_{k,n,m} (\log \tau_{k}^* - \log \tau_{k}^{(l-1)})\\\
&-\sum dt \kappa_{m}e^{(\theta_{k} + \beta) D_{ml}} ( \tau_{k}^* - \tau_{k}^{(l-1)} )\\\
&+ \log \frac{\pi(\tau_{k}^*)}{\pi(\tau_{k}^{(l-1)})}. \end{align*}

update \(\theta_{k}\)

- Data used: all actions \(m\) taken by person \(k\), and all \(l \in A_m\).
- All transition start and stop times of the \(k\)-th person.

For each \(k\), a symmetric MH jumping distribution \(J(\theta_{k}^{(l-1)} \rightarrow \theta_{k}^{* })\) is used to propose a new sample.
We accept \(\theta_{k}^{(l)} = \theta_{k}^{* }\) with probability \(\min(1, r_{{\theta_{k}}^{* }})\), where

\begin{align*} \log r_{{\theta_{k}}^{* }} =& \sum \delta_{k,n,m} (\theta_{k}^{* } - \theta_{k}^{(l-1)})D_{ml}\\\
&-\sum dt \kappa_{m} \tau_{k} e^{ \beta D_{ml} }(e^{\theta_{k}^{* }D_{ml}} - e^{\theta_{k}^{(l-1)} D_{ml} })\\\
&+ \log \frac{\pi(\theta_{k}^{* })}{\pi(\theta_{k}^{(l-1)})}.
\end{align*}
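As a hedged illustration of the \(\theta_{k}\) ratio, the R function below evaluates the log acceptance ratio for one person under the covariate-free model (no \(\beta\) term) with the \(\mathrm{N}(0, \sigma^{2})\) prior. The data layout is an assumption made for this sketch: `trans` is a list with one element per observed transition, holding the similarity of the transition taken (`D_taken`), the sojourn time (`dt`), the \(\kappa_{m}\) of the origin action (`kappa_m`), and the similarities to every action in \(A_{m}\) (`D_all`).

```r
# Log acceptance ratio for theta_k given person k's observed transitions.
log_r_theta <- function(theta_new, theta_old, trans, tau_k, sigma2) {
  # The part of person k's log-likelihood that depends on theta.
  loglik <- function(theta) {
    sum(vapply(trans, function(tr) {
      theta * tr$D_taken -
        tr$dt * tr$kappa_m * tau_k * sum(exp(theta * tr$D_all))
    }, numeric(1)))
  }
  loglik(theta_new) - loglik(theta_old) +
    dnorm(theta_new, mean = 0, sd = sqrt(sigma2), log = TRUE) -
    dnorm(theta_old, mean = 0, sd = sqrt(sigma2), log = TRUE)
}

# One made-up transition: the action taken has D = 1 and A_m has two members.
trans_k <- list(list(D_taken = 1, dt = 1, kappa_m = 1, D_all = c(1, 0)))

# Likelihood part: (1 - (e + 1)) - (0 - 2) = 2 - e; prior part: -0.5.
log_r <- log_r_theta(theta_new = 1, theta_old = 0, trans = trans_k,
                     tau_k = 1, sigma2 = 1)
log_r
```

A proposal would then be accepted when `log(runif(1)) < log_r`, as in any Metropolis step with a symmetric proposal.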

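update \(\sigma^{2}\)

\[ p( \sigma^2 \mid \text{everything else}) \propto \operatorname{Inv-Gamma}(\sigma^{2} \mid a_{\sigma}, b_{\sigma}) \prod_{k} \mathrm{N}(\theta_{k} \mid \mu_{\theta}, \sigma^2), \] which, with \(\mu_{\theta} = 0\), gives the conjugate update \(\sigma^{2} \sim \operatorname{Inv-Gamma}(a_{\sigma} + 0.5 N, b_{\sigma} + 0.5 \sum_{k} \theta_{k}^2)\). With a vague prior (the limit \(a_{\sigma}, b_{\sigma} \rightarrow 0\)) this becomes \(\sigma^{2} \sim \operatorname{Inv-Gamma}(0.5 N, 0.5 \sum_{k} \theta_{k}^2)\).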
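A minimal R sketch of this Gibbs step, assuming \(\mu_{\theta} = 0\) and the standard conjugate form with shape \(a_{\sigma} + N/2\) and scale \(b_{\sigma} + \frac{1}{2}\sum_{k} \theta_{k}^{2}\). Base R has no inverse-gamma sampler, so the draw uses the fact that the reciprocal of a Gamma(shape, rate) variable is inverse-gamma with the same shape and with scale equal to that rate. The function name `draw_sigma2` and the example values of \(\theta_{k}\) are hypothetical.

```r
# One Gibbs draw of sigma^2 from its inverse-gamma full conditional.
draw_sigma2 <- function(theta, a = 1e-4, b = 1e-4) {
  shape <- a + 0.5 * length(theta)
  scale <- b + 0.5 * sum(theta^2)
  # 1 / Gamma(shape, rate = scale) is Inv-Gamma(shape, scale).
  1 / rgamma(1, shape = shape, rate = scale)
}

# Example with made-up person parameters theta_k.
set.seed(123)
theta <- rnorm(200, mean = 0, sd = 1.5)
sigma2_draw <- draw_sigma2(theta)
sigma2_draw
```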

data structure

msm R package

https://www.rdocumentation.org/packages/msm/versions/1.6.8/topics/msm2Surv

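library(msm)
# Toy multi-state data: one row per observation of a subject's state.
msmdat <- data.frame(
 subj = c(1, 1, 1, 1, 1, 2, 2, 2),
 days = c(0, 27, 75, 97, 1106, 0, 90, 1037),
 status = c(1, 2, 3, 4, 4, 1, 2, 2),
 age = c(66, 66, 66, 66, 69, 49, 49, 51),
 treat = c(1, 1, 1, 1, 1, 0, 0, 0)
)
# transitions only allowed to next state up or state 4
# (nonzero off-diagonal entries of Q mark the allowed transitions)
Q <- rbind(c(1, 1, 0, 1),
           c(0, 1, 1, 1),
           c(0, 0, 1, 1),
           c(0, 0, 0, 0))
# Convert to survival-style data with rows for observed and potential transitions.
dat <- msm2Surv(data=msmdat, subject="subj", time="days", state="status",
         Q=Q)
dat
# Matrix that numbers the allowed transitions.
attr(dat, "trans")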