Model V1

See jackson_flexsurv_2016 for available baseline functions. Proportional baseline: \[ \lambda_{ml}(dt_{k,n}) = \lambda_{m0}(t) \lambda_{l} \tau_{k} \text{ for } l \in S_{m} \text{ and } \lambda_{s_{m,1}}=1. \] \(dt_{k,n}\) denotes \(t_{k,n}^{\text{stop}} - t_{k,n}^{\text{start}}\).

Proportional hazard term: \[ e^{(\theta_{k} + \beta) D_{ml} } \]

  • add covariate later.

out-of-state, item, person parameters.

  • no intercept term in the proportional hazard if the baseline contains a constant at the same level.
  • action m leads to more/less coherent action
  • \(D_{ml}\) is bi-directional similarity mapping.
  • including \(\beta_m\) doesn’t make it directional.
  • is \(\beta_k\) meaningful for an item-specific action space? Certainly not! This opens up the question of how actions should be defined: loosely, without event_description, or not.

option1: similar items share the same action space

no event description should be used.

option2: each item has its own action space (item-specific action space)

  • use multi-state modeling framework to explain?
  • target journal:
  • grant application (check deadline)
  • meeting at 4pm (CST)
  • online learning platform: interaction with online resources, with instructors, with other people (communication length, contents) - team collaboration.
    • data will be available in August.
    • team science program (NIH)

The intensity function \(q_{ml}(\cdot)\) represents the instantaneous risk of moving from action \(m\) to \(l\).

\begin{align*} q_{ml} (t ; \boldsymbol{\lambda}, \boldsymbol{\beta}, \mathbf{z}(t)) = & \lambda_{ml}(t) e^{\beta_j + (\beta_m + \theta_{\beta}) D_{ml}}, \end{align*}

where \(\boldsymbol{\alpha}\) is a vector of intercepts, \(\boldsymbol{\beta}\) is the vector of coefficients associated with \(\mathbf{z}(t)\), and \(\lambda_{k,m\rightarrow l}(t)\) is a baseline intensity function. For each state \(l\), there are competing transitions \(m_1, \ldots, m_{n_l}\). This means there are \(n_{l}\) corresponding survival models for state \(l\), and \(K=\sum_l n_l\) models overall. Models with no shared parameters can be estimated separately. Common out-of-state transition: \(\beta_{ml}=\beta_{m}\).

Baseline hazard: \[ \lambda_{ml}(t) = \alpha_{m1}(t) \alpha_{l} + \theta_{\lambda} \text{ for } l \neq 1. \] Proportional hazard term: \[ e^{\beta_j + (\beta_m + \theta_{\beta}) D_{ml}} \]

  • \(D_{ml}\) is bi-directional similarity embedding between actions \(m\) and \(l\).

The piecewise-constant baseline hazard is used:

\begin{equation} \label{eq:1} \lambda(t) = \lambda_j \text{ if } s_{j-1} \le t < s_{j}, \end{equation}

for \(j = 1,\ldots,J\). \(\lambda_{j}\) could be a function of the similarity. This would be similar to having a piecewise-constant transition matrix (a time-inhomogeneous Markov chain), but much simpler because the constants follow a parametric model. The cosine similarity should be normalized before use.
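A minimal sketch of the piecewise-constant baseline in the equation above, written in base R. The cut points `cuts` (the \(s_j\)) and the rates `rates` (the \(\lambda_j\)) are hypothetical values chosen for illustration; the function names are not from any package.

```r
# Piecewise-constant hazard: lambda(t) = lambda_j for s_{j-1} <= t < s_j.
# cuts  = s_0 < s_1 < ... < s_J   (hypothetical values)
# rates = lambda_1, ..., lambda_J (hypothetical values)
pc_hazard <- function(t, cuts, rates) {
  j <- findInterval(t, cuts)              # j such that s_{j-1} <= t < s_j
  j[j < 1 | j > length(rates)] <- NA      # t outside [s_0, s_J): undefined
  rates[j]
}

# Cumulative hazard H(t) = integral of lambda(x) from s_0 to t.
pc_cumhaz <- function(t, cuts, rates) {
  lower <- head(cuts, -1)                 # s_0, ..., s_{J-1}
  upper <- tail(cuts, -1)                 # s_1, ..., s_J
  sapply(t, function(x) sum(rates * pmax(0, pmin(x, upper) - lower)))
}

cuts  <- c(0, 1, 3, 10)
rates <- c(0.5, 1.0, 0.2)

pc_hazard(c(0.5, 1, 2.9), cuts, rates)    # hazard in each interval
exp(-pc_cumhaz(2, cuts, rates))           # survival probability at t = 2
```

Because the hazard is a step function, the cumulative hazard is a sum of rectangle areas, so no numerical integration is needed.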

\begin{align*} q_{ml} (t ; \boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{z}(t)) = & \lambda_{ml}(t) \exp( \boldsymbol{\beta}_{m,l}^{\prime} \mathbf{z}_{i,m,l}(t) ), \end{align*}

\begin{align*} q_{ml} (t ; \boldsymbol{\alpha}, \boldsymbol{\beta}, \mathbf{z}(t)) = & \lambda_{k,m \rightarrow l}(t) \exp( \alpha_m + \alpha_l + \boldsymbol{\beta} d_{i,m,l} ), \end{align*}

where \(\boldsymbol{\alpha}\) is a vector of intercepts, \(\boldsymbol{\beta}\) is the vector of coefficients associated with \(\mathbf{z}(t)\), and \(\lambda_{k,m\rightarrow l}(t)\) is a baseline intensity function. For each state \(l\), there are competing transitions \(m_1, \ldots, m_{n_l}\). This means there are \(n_{l}\) corresponding survival models for state \(l\), and \(K=\sum_l n_l\) models overall. Models with no shared parameters can be estimated separately.

Model V2

Let \(S\) denote the set of all possible actions. For each state \(m \in S\), \(A_{m}\) denotes the set of competing transitions \(\{l_1, \ldots, l_{n_m}\}\) that can be taken directly after \(m\). Let \(Y_k(t)\) denote the action of the \(k\)-th respondent at time \(t\). All respondents are assumed to begin the problem-solving process at time \(t=0\).

The intensity function \(q_{ml}(\cdot)\) represents the instantaneous risk of moving from action \(m\) to \(l\):

\begin{align*} q_{ml}\left(t ; \mathcal{F}_{t}\right)= & \lim _{\delta t \rightarrow 0} \frac{P\left(Y(t+\delta t)=l \mid Y(t)=m, \mathcal{F}_{t}\right)}{\delta t}, m \neq l, m, l \in S, \end{align*}

where \(\mathcal{F}_t\) denotes the process up to time \(t\).

Action transitions are assumed to be semi-Markov, which means the intensity depends on the sojourn time \(t - t_{m}\), the time spent on the current action. This is often called the “clock reset” approach, as opposed to the “clock forward” approach. Let \(dt_{m}\) denote the sojourn time.
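As an illustration of the clock-reset convention, the sketch below turns a time-stamped action log into one row per transition with its sojourn time. The log, the action labels, and the column names are hypothetical.

```r
# Hypothetical action log for one respondent: each row is the time at which
# an action was taken (the first action is placed at time 0).
log_k <- data.frame(
  action = c("a", "b", "a", "c"),
  time   = c(0, 2.5, 4.0, 9.5)
)

# One row per observed transition m -> l.
trans <- data.frame(
  from    = head(log_k$action, -1),
  to      = tail(log_k$action, -1),
  t_start = head(log_k$time, -1),   # time at which action m was entered
  t_stop  = tail(log_k$time, -1)    # time at which the respondent moved to l
)

# Clock reset: time is measured from entry into the current action.
trans$dt <- trans$t_stop - trans$t_start

print(trans)
```

Under a clock-forward model the hazard would instead be evaluated at `t_stop`, the time since the start of the whole process.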

Cox model

\begin{align} q_{ml}\left(t ; \mathcal{F}_{t}\right) = & q_{ml} (t - t_{m}; \boldsymbol{\lambda}, \boldsymbol{\beta}, \mathbf{z}(t))\\
= & \lambda_{ml}(dt_{m}) e^{(\boldsymbol{\beta}^{\prime} \mathbf{z}(t) + \theta_{k}) D_{ml}}, \end{align}

for person \(k = 1,\ldots,N\), where \(\mathbf{z}(t)\) is a vector of time-varying covariates, \(\lambda_{ml}(dt_{m})\) is a baseline intensity function, and \(D_{ml} \in [-1,1]\) denotes the cosine similarity between actions \(m\) and \(l\). The cosine similarity is obtained by applying word2vec to the action sequences of an item. The closer the cosine value is to 1, the more similar the actions; the closer it is to -1, the more dissimilar. This means there are \(n_{m}\) corresponding intensity functions for state \(m\), and \(\sum_{m \in S} n_m\) intensity functions overall.
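A small sketch of how a similarity matrix \(D\) could be computed once action embeddings are available. The embedding step (word2vec) is not reproduced here; the matrix `emb` and its action names are hypothetical stand-ins for its output.

```r
# Hypothetical embeddings: one row per action, one column per dimension.
emb <- rbind(
  click_menu = c(1, 0, 1),
  open_mail  = c(2, 0, 2),    # same direction as click_menu
  submit     = c(-1, 0, -1),  # opposite direction
  help       = c(0, 1, 0)     # orthogonal
)

# Scale each row to unit length, then take all pairwise inner products.
unit <- emb / sqrt(rowSums(emb^2))
D <- tcrossprod(unit)          # D[m, l] = cosine similarity of actions m and l

round(D, 2)
```

The resulting matrix is symmetric with entries in \([-1, 1]\), which matches the bi-directional reading of \(D_{ml}\) in these notes.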

We use the constant baseline hazard based on out-of-state transition speed and person’s transition speed: \[ \lambda_{ml}(dt) = \kappa_{m} \tau_{k} \text{ for } l \in A_{m}. \]

The running model has no covariate terms: \[ q_{ml}\left(t ; \mathcal{F}_{t}\right) = q_{ml}(dt) = \kappa_{m} \tau_{k} e^{\theta_{k} D_{ml} }. \]

  • larger \(\kappa_{m}\): shorter time spent on action \(m\) (faster out-of-state transition)
  • larger \(\tau_{k}\): faster transition speed
  • larger \(\theta_{k}\): larger transition rate towards a similar action. A person with a large \(\theta_{k}\) tends to choose more coherent actions
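The interpretations above can be checked numerically. With constant competing intensities, the sojourn time on action \(m\) is exponential with rate \(\sum_{l \in A_m} q_{ml}\), and the next action is \(l\) with probability \(q_{ml} / \sum_{g \in A_m} q_{mg}\). The sketch below assumes the running model \(q_{ml} = \kappa_m \tau_k e^{\theta_k D_{ml}}\); the similarity values and the function name are hypothetical.

```r
transition_summary <- function(kappa_m, tau_k, theta_k, D_m) {
  q <- kappa_m * tau_k * exp(theta_k * D_m)   # q_ml for each l in A_m
  list(
    q            = q,
    mean_sojourn = 1 / sum(q),                # expected time spent on action m
    next_prob    = q / sum(q)                 # probability that the next action is l
  )
}

# Hypothetical similarities between the current action m and each l in A_m.
D_m <- c(similar = 0.8, neutral = 0.0, dissimilar = -0.6)

low  <- transition_summary(kappa_m = 1, tau_k = 1, theta_k = 0, D_m = D_m)
high <- transition_summary(kappa_m = 1, tau_k = 1, theta_k = 2, D_m = D_m)
fast <- transition_summary(kappa_m = 1, tau_k = 2, theta_k = 0, D_m = D_m)

low$next_prob    # theta = 0: all competing actions equally likely
high$next_prob   # theta = 2: mass shifts towards the similar action
fast$mean_sojourn / low$mean_sojourn   # doubling tau halves the sojourn time
```

Note that \(\theta_k\) also changes the total exit rate, not only the split among competing actions, which may matter when comparing \(\tau_k\) across persons.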

pattern discovery

  • divide tau by 1) number of actions
  • use 2) total time for visualization
    1. divided by 2)
  • find persons (small # of actions, large # of actions: right vs wrong)
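    • people who took a long time and answered correctly vs. people who took little time and answered correctly.
    • people who answered correctly with few actions vs. people who answered incorrectly with many actions.
  • Solving quickly does not necessarily mean doing well; the focus is on discovering processes that reach the correct answer slowly, or with differing numbers of actions.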

justification + what item to use

select a few items fulfilling the justification scheme!

https://cran.r-project.org/web/packages/tidyLPA/vignettes/Introduction%5Fto%5FtidyLPA.html OK if there are more finely divided groups (more groups)

observed covariates + response group classification; observed covariates + tau and theta + response group classification

Gender, controlling for age: a gender difference appeared in EDA, but it was not significant after controlling for age.

https://data-edu.github.io/tidyLPA/reference/AHP.html

likelihood

\begin{align*} q_{ml} (t ; \boldsymbol{\lambda}, \boldsymbol{\beta}, \mathbf{z}(t)) = & \lambda_{ml}(t) e^{(\boldsymbol{\beta}^{\prime} \mathbf{z}(t) + \theta_{k}) D_{ml}}\\
q_{ml}\left(t ; \mathcal{F}_{t}\right)= & \lim _{\delta t \rightarrow 0} \frac{P\left(Y(t+\delta t)=l \mid Y(t)=m, \mathcal{F}_{t}\right)}{\delta t}, m \neq l, m, l \in S \end{align*}

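The survival function is \[ S_{ml}(dt_{m}) = e^{-\int_{0}^{dt_{m}} q_{ml}(x) \, \mathrm{d}x}, \quad dt_{m} = t^{\text{stop}} - t^{\text{start}}. \] Let \(\nu_{mlk}(t) = 1\) if person \(k\) jumps from action \(m\) to \(l\) at time \(t\), and 0 otherwise. The density of an observed transition is \[ f_{ml}(dt_{m}) = q_{ml}(dt_{m}) S_{ml}(dt_{m}), \] and the likelihood is \[ \text{likelihood} =\prod_{k} f_{ml}(dt_{m}) \prod_{g \in A_{m}, g \neq l} S_{mg}(dt_{m}), \] where the product also runs over each person's observed transitions. Because \(f_{ml}\) already contains \(S_{ml}\), each transition contributes \(q_{ml}(dt_{m}) \prod_{g \in A_{m}} S_{mg}(dt_{m})\).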

\[ S_{ml}(dt) = e^{-dt \kappa_{m} \omega_{l} \tau_{k} e^{(\theta_{k} + \beta) D_{ml} }} \]

\(n = 1,\ldots,M_{k}\): n-th action of k-th person, \(M_k\): sequence length

\( \delta_{k,n,m} = 1 \) if person k’s n-th action is m.

\( \delta_{k,n,m} \delta_{k,n+1,l} = 1 \) for \(n < M_{k}\) if person k’s n-th transition is m to l.

time at starting state (one after START) is set to the first action (n=1), and the corresponding time is set to 0.
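A sketch of one person's log-likelihood under the constant-baseline running model \(q_{ml} = \kappa_m \tau_k e^{\theta_k D_{ml}}\) (no \(\beta\) or covariate terms). Each observed transition \(m \to l\) with sojourn time \(dt\) contributes \(\log q_{ml} - dt \sum_{g \in A_m} q_{mg}\). The function name, the actions, and all numeric values are hypothetical.

```r
loglik_person <- function(from, to, dt, kappa, tau_k, theta_k, D, A) {
  ll <- 0
  for (n in seq_along(from)) {
    m <- from[n]
    l <- to[n]
    # Intensities towards every action that can follow m (the set A_m).
    q_all <- kappa[[m]] * tau_k * exp(theta_k * D[m, A[[m]]])
    # Intensity of the transition that was actually observed.
    q_obs <- kappa[[m]] * tau_k * exp(theta_k * D[m, l])
    ll <- ll + log(q_obs) - dt[n] * sum(q_all)
  }
  ll
}

actions <- c("a", "b", "c")
D <- matrix(c( 1.0, 0.6, -0.4,
               0.6, 1.0,  0.1,
              -0.4, 0.1,  1.0),
            nrow = 3, byrow = TRUE, dimnames = list(actions, actions))
A     <- list(a = c("b", "c"), b = c("a", "c"), c = c("a", "b"))
kappa <- c(a = 1, b = 1, c = 1)

# Person with sequence a -> b -> c and sojourn times 1 and 0.5.
ll0 <- loglik_person(from = c("a", "b"), to = c("b", "c"), dt = c(1, 0.5),
                     kappa = kappa, tau_k = 1, theta_k = 0, D = D, A = A)
ll1 <- loglik_person(from = c("a", "b"), to = c("b", "c"), dt = c(1, 0.5),
                     kappa = kappa, tau_k = 1, theta_k = 1, D = D, A = A)
c(theta0 = ll0, theta1 = ll1)
```

With \(\theta_k = 0\) every intensity equals 1, so the two transitions contribute \(-2\) and \(-1\), which gives a quick hand check of the function.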

prior

The proposed method uses a fully Bayesian approach to estimate the proposed latent space model via MCMC. Our prior specification is as follows:

\begin{align*} \pi\left(\kappa_{m}\right) & \sim \operatorname{Gamma}\left(a_{\kappa}, b_{\kappa}\right); \\
\pi\left(\tau_{k}\right) & \sim \operatorname{Gamma}\left(a_{\kappa}, b_{\kappa}\right); \\
\pi\left(\theta_{k} | \sigma^{2}\right) & \sim \mathrm{N}\left(0, \sigma^{2}\right); \\
\pi\left(\sigma^{2}\right) & \sim \operatorname{Inv-Gamma}\left(a_{\sigma}, b_{\sigma}\right).
\end{align*}

inv-Gamma(θ|α,β) \[ p(\theta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta^{-(\alpha+1)} e^{-\beta / \theta}, \quad \theta>0 \]

where hyperparameters are chosen as \[a_{\sigma}=0.0001, b_{\sigma}=0.0001, \mu_{\theta}=0, \text { and } ….\]

Based on our experience, the inference of \(\mathbf{\Theta}\) is highly sensitive to its variance \(\sigma^2\). Also, the configuration of latent embeddings highly depends on the scale parameter \(\gamma\) of the latent space. Rather than choosing sub-optimal tuning parameters, we use a layer of hyper-priors to learn optimal values of these parameters from data. We choose hyperparameters such that priors are minimally informative to facilitate the flexible Bayesian learning.

pseudo code

update \(\kappa_{m}\)

  • all persons \(k\) having action \(m\), and all \(l \in A_m\) (all actions that can be reached from \(m\))
  • transition start and stop time \(dt_{k,n}\) for all \(\delta_{k,n,m} = 1\)
  • For each \(m\), a symmetric MH jumping \(J(\kappa_{m}^{(l-1)} \rightarrow \kappa_{m}^{*})\) is used to propose a new sample.
  • We accept \(\kappa_{m}^{(l)} = \kappa_{m}^{*}\) with probability \(\min(1, r_{\kappa_{m}^{*}})\) where

\begin{align*} \log r_{\kappa_{m}^{*}} =& \sum \delta_{k,n,m} (\log \kappa_{m}^{*} - \log \kappa_{m}^{(l-1)})\\
&-\sum dt (\kappa_{m}^{*} - \kappa_{m}^{(l-1)}) \tau_{k} e^{(\theta_{k} + \beta) D_{ml} }\\
&+ \log \frac{\pi(\kappa_{m}^{*})}{\pi(\kappa_{m}^{(l-1)})}. \end{align*}
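The \(\kappa_m\) step can be sketched as a random-walk Metropolis update. The sketch assumes the two sums in the log ratio have been precomputed: `n_exits` is \(\sum \delta_{k,n,m}\) (the number of observed transitions out of \(m\)) and `exposure` is \(\sum dt \, \tau_k e^{(\theta_k + \beta) D_{ml}}\) over all persons, all visits to \(m\), and all \(l \in A_m\). The function name, the normal proposal, the step size, and the numeric values are assumptions for illustration.

```r
update_kappa_m <- function(kappa_cur, n_exits, exposure,
                           a_kappa, b_kappa, step = 0.1) {
  kappa_new <- kappa_cur + rnorm(1, mean = 0, sd = step)  # symmetric proposal
  if (kappa_new <= 0) return(kappa_cur)   # outside the Gamma support: reject
  log_r <- n_exits * (log(kappa_new) - log(kappa_cur)) -
    exposure * (kappa_new - kappa_cur) +
    dgamma(kappa_new, shape = a_kappa, rate = b_kappa, log = TRUE) -
    dgamma(kappa_cur, shape = a_kappa, rate = b_kappa, log = TRUE)
  if (log(runif(1)) < log_r) kappa_new else kappa_cur
}

# Hypothetical sufficient statistics for one action m.
set.seed(1)
kappa <- 1
draws <- numeric(20000)
for (i in seq_along(draws)) {
  kappa <- update_kappa_m(kappa, n_exits = 50, exposure = 25,
                          a_kappa = 1, b_kappa = 1, step = 0.3)
  draws[i] <- kappa
}
mean(draws[-(1:2000)])   # compare with (a + n_exits) / (b + exposure)
```

With a Gamma prior, this likelihood makes the conditional for \(\kappa_m\) a \(\operatorname{Gamma}(a_\kappa + n_{\text{exits}}, b_\kappa + \text{exposure})\) distribution, so the chain average can be compared against its mean, or the step could be replaced by a direct Gibbs draw.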

update \(\tau_{k}\)

  • all of person \(k\)'s actions \(m\) and all \(l \in A_m\)
  • all of the \(k\)-th person's transition start and stop times

\begin{align*} \log r_{\tau_{k}^{*}} =& \sum \delta_{k,n,m} (\log \tau_{k}^{*} - \log \tau_{k}^{(l-1)})\\
&-\sum dt \kappa_{m}e^{(\theta_{k} + \beta) D_{ml}} ( \tau_{k}^{*} - \tau_{k}^{(l-1)} )\\
&+ \log \frac{\pi(\tau_{k}^{*})}{\pi(\tau_{k}^{(l-1)})}. \end{align*}

update \(\theta_{k}\)

  • all of person \(k\)'s actions \(m\) and all \(l \in A_m\)
  • all of the \(k\)-th person's transition start and stop times
  • For each \(k\), a symmetric MH jumping \(J(\theta_{k}^{(l-1)} \rightarrow \theta_{k}^{*})\) is used to propose a new sample.
  • We accept \(\theta_{k}^{(l)} = \theta_{k}^{*}\) with probability \(\min(1, r_{\theta_{k}^{*}})\) where

\begin{align*} \log r_{\theta_{k}^{*}} =& \sum \delta_{k,n,m} (\theta_{k}^{*} - \theta_{k}^{(l-1)})D_{ml}\\
&-\sum dt \kappa_{m} \tau_{k} e^{ \beta D_{ml} }(e^{\theta_{k}^{*}D_{ml}} - e^{\theta_{k}^{(l-1)} D_{ml} })\\
&+ \log \frac{\pi(\theta_{k}^{*})}{\pi(\theta_{k}^{(l-1)})}. \end{align*}
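For reference, the \(\theta_k\) ratio can be traced back to the log-likelihood. The following is a sketch, assuming the constant baseline \(\kappa_m \tau_k\) and the exponent \((\theta_k + \beta) D_{ml}\) used in the pseudo code; the notation \((m_n, l_n)\) for person \(k\)'s \(n\)-th transition is introduced here for convenience. Collecting the terms that involve \(\theta_k\):

\begin{align*} \ell_k(\theta_k) =& \sum_{n} \log q_{m_n l_n} - \sum_{n} dt_{k,n} \sum_{g \in A_{m_n}} q_{m_n g}\\
=& \theta_k \sum_{n} D_{m_n l_n} - \sum_{n} dt_{k,n} \kappa_{m_n} \tau_k \sum_{g \in A_{m_n}} e^{\beta D_{m_n g}} e^{\theta_k D_{m_n g}} + \text{const}. \end{align*}

Taking \(\ell_k(\theta_k^{*}) - \ell_k(\theta_k^{(l-1)})\) cancels the constant and leaves the first two lines of the log ratio; adding the log prior ratio gives the third. Since the proposal is symmetric, no proposal correction term appears.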

update \(\sigma\)

\[ p( \sigma^2 \mid \text{everything else}) \propto \text{Inv-Gamma}(\sigma^{2} \mid a,b) \prod_{k} N(\theta_{k} \mid \mu, \sigma^2) \] With \(\mu = 0\), \(\sigma^{2} \sim \text{Inv-Gamma}(a + 0.5 N, b + 0.5 \sum_{k} \theta_{k}^2)\); with the improper prior \(a = b = 0\): \(\sigma^{2} \sim \text{Inv-Gamma}(0.5 N, 0.5 \sum_{k} \theta_{k}^2)\)
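Because the inverse-gamma prior is conjugate to the normal variance, \(\sigma^2\) can be drawn directly. The sketch below uses the standard conjugate result for \(\theta_k \sim N(0, \sigma^2)\), namely shape \(a + N/2\) and rate \(b + \tfrac{1}{2} \sum_k \theta_k^2\), and samples via the reciprocal of a Gamma draw. The function name and the simulated \(\theta_k\) values are hypothetical.

```r
update_sigma2 <- function(theta, a_sigma, b_sigma) {
  shape <- a_sigma + 0.5 * length(theta)
  rate  <- b_sigma + 0.5 * sum(theta^2)
  # If X ~ Gamma(shape, rate), then 1 / X ~ Inv-Gamma(shape, rate).
  1 / rgamma(1, shape = shape, rate = rate)
}

set.seed(2)
theta <- rnorm(500, mean = 0, sd = 2)        # stand-in for current theta_k draws
draws <- replicate(5000, update_sigma2(theta, a_sigma = 1e-4, b_sigma = 1e-4))

shape <- 1e-4 + 0.5 * length(theta)
rate  <- 1e-4 + 0.5 * sum(theta^2)
c(simulated = mean(draws), analytic = rate / (shape - 1))
```

The analytic mean of an inverse-gamma distribution, rate / (shape - 1), gives a simple check that the sampler is parameterized as intended.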

data structure

msm R package

https://www.rdocumentation.org/packages/msm/versions/1.6.8/topics/msm2Surv

library(msm)
msmdat <- data.frame(
 subj = c(1, 1, 1, 1, 1, 2, 2, 2),
 days = c(0, 27, 75, 97, 1106, 0, 90, 1037),
 status = c(1, 2, 3, 4, 4, 1, 2, 2),
 age = c(66, 66, 66, 66, 69, 49, 49, 51),
 treat = c(1, 1, 1, 1, 1, 0, 0, 0)
)
# transitions only allowed to next state up or state 4
Q <- rbind(c(1, 1, 0, 1),
           c(0, 1, 1, 1),
           c(0, 0, 1, 1),
           c(0, 0, 0, 0))
dat <- msm2Surv(data=msmdat, subject="subj", time="days", state="status",
         Q=Q)
dat
attr(dat, "trans")