Bayesian factor zero-inflated Poisson model for multiple grouped count data

Genya Kobayashi1∗ and Yuta Yamauchi2

arXiv:2405.06335v1 [stat.ME] 10 May 2024

1 School of Commerce, Meiji University

2 Department of Economics, Nagoya University

Abstract

This paper proposes a computationally efficient Bayesian factor model for multiple grouped count
data. Adopting the link function approach, the proposed model can capture the association within and
between the at-risk probabilities and Poisson counts over multiple dimensions. The likelihood function
for the grouped count data consists of the differences of the cumulative distribution functions evaluated
at the endpoints of the groups, defining the probabilities of each data point falling in the groups.
The combination of the data augmentation of underlying counts, the Pólya-Gamma augmentation to
approximate the Poisson distribution, and parameter expansion for the factor components is used to
facilitate posterior computing. The efficacy of the proposed factor model is demonstrated using
simulated data and real data on the involvement of youths in nineteen illegal activities.

Key words: data augmentation; factor model; Markov chain Monte Carlo; multivariate count data;
parameter expansion; Pólya-Gamma augmentation

1 Introduction

Zero-inflation is a prevalent issue in the statistical analysis of count data in various applications,
such as epidemiology, health services research, and social studies. Several well-developed statistical
methods exist for analysing zero-inflated count data, with the zero-inflated model (Lambert, 1992)
being one of the commonly used approaches. See, for example, Neelon et al. (2016) for a review.
Another significant challenge in count data analysis arises from the occurrence of ‘grouped
counts’. Instead of actual counts, grouped count data provide frequencies of individuals for predefined
ordinal groups. Grouping occurs due to various factors, such as the sensitivity of the data topic and
the cognitive burden experienced by interviewees (Fu et al., 2018). For example, in our real data analysis,
the frequencies of involvement in illegal activities are reported in categories such as ‘never’, ‘once’,
‘twice’, ‘between three and five times’, ‘between six and ten times’, ‘between eleven and fifty times’
and ‘over fifty times’, instead of the exact frequencies.

∗ Corresponding author: gkobayashi@meiji.ac.jp

Although there exists a body of studies analysing grouped continuous data, especially in the context
of income data analysis (see, e.g., Kobayashi et al., 2022, 2023), the statistical analysis of grouped
count data has received much less attention, even though grouped count data arise frequently, especially in applied
social science. To our knowledge, McGinley et al. (2015) is the only study introducing a model for
grouped zero-inflated count data. McGinley et al. (2015) employ the likelihood function of an
ordinal response model where the likelihood contribution of each group is expressed by the difference
between the cumulative distribution function of a discrete probability distribution evaluated at the
endpoints of the group. These differences define the probabilities of the data points falling into the
groups. However, when zero-inflation is high, an analysis of zero-inflated grouped count data using
a univariate model can be distorted by the severe scarcity of information due to grouping and zero-
inflation. If the data include multiple count responses, leveraging shared information among them by
analysing them jointly considering a multivariate structure rather than treating them independently
would be beneficial.
In addition, there has also been a growing demand for the joint analysis of multiple count data
of which some or all dimensions are zero-inflated (see, e.g., Berry and West, 2020). However, unlike
continuous distributions such as the normal, developing and implementing a multivariate count model
is generally cumbersome, especially when the multivariate counts are zero-inflated, as in the recent
study of Liu and Tian (2015).
Factor analysis stands out as a common approach to analysing multivariate count data in a
parsimonious and computationally convenient manner. To introduce a factor structure into count data
analysis, a link function is commonly used to model a latent linear predictor incorporating latent
factors and covariates (Wedel et al., 2003). As an alternative approach, Larsson (2020) introduced a
distinct type of factor model for discrete data, differing from classical count factor models, which is
based on a dependent Poisson model (see, e.g. Karlis, 2003). Some previous research exists on the
factor models of zero-inflated count data, such as Neelon and Chung (2017) and Xu et al. (2021).
Neelon and Chung (2017) introduced the factor structure into the at-risk probability and the mean
count using the multiplicative function of the latent factor and regression components. Xu et al.
(2021) used the link-function approach to connect the zero-inflated count and latent linear predictor
with factors. The fundamental difference between our approach and the previous approaches lies in
the flexibility of the factor structure. Due to its multiplicative structure, the factor model studied in
Neelon and Chung (2017) permits only positive factors. Xu et al. (2021) employed the common latent
linear predictor for both the at-risk probability and the mean count, which results in a restrictive
correlation structure.
Based on the preceding, we propose the zero-inflated Poisson model with a flexible latent factor
structure for multiple grouped count data. For modelling grouped count in each dimension, we follow
McGinley et al. (2015) and introduce the likelihood function for an ordinal model described above. To
introduce the association within and between the at-risk and Poisson parts over different dimensions,
we introduce the individual-specific latent factors with the dimension-specific factor loadings for the
at-risk and Poisson parts. To facilitate posterior computation, we employ the Pólya-Gamma (PG)
mixture representation of Polson et al. (2013). Since our model is Poisson-based, following Hamura
et al. (2021), we approximate the Poisson model by the negative binomial model and apply PG data
augmentation. This augmentation enables us to carry out an efficient Gibbs sampling. Moreover, for
efficient sampling, we also borrow the idea of the parameter expansion technique of Ghosh and Dunson
(2009) for the factor components, but without the positive lower triangular constraints. The MCMC
draws of the unidentified working parameters are post-processed using the algorithm of Papastamoulis
and Ntzoufras (2022). While achieving a stable sampling of the factor components in the low layer of
the hierarchical model may seem challenging, our sampling method works well, as illustrated in the
real data analysis where the counts are highly zero-inflated and highly coarsened into groups.
The remainder of this paper is organised as follows. Section 2 introduces the proposed factor model
for zero-inflated grouped counts. Then, the MCMC algorithm for the posterior inference is provided by
applying the PG augmentation, data augmentation of the underlying counts, and parameter expansion.
We also describe the post-processing for producing identified MCMC draws. The efficacy of the joint
modelling through the latent factors is demonstrated using simulated data in Section 3 and real
data in Section 4. Specifically, Section 4 analyses the grouped count data from the National Longitudinal
Survey of Youth 1979 (NLSY79) on illegal activities by youths. Finally, Section 5 concludes with a
discussion.

2 Method

2.1 Model

Let $y_i = (y_{i1}, \dots, y_{iJ})'$ denote the $J$-dimensional vector of the response variables. Each element of $y_i$
consists of zero-inflated grouped count data. Let $y_i^* = (y_{i1}^*, \dots, y_{iJ}^*)'$ denote the vector of the latent
count data, and each element of $y_i^*$ is assumed to follow the zero-inflated Poisson (ZIP)
model expressed as

$$y_{ij}^* \sim (1 - \pi_{ij})\, I(z_{ij} = 0,\ y_{ij}^* = 0) + \pi_{ij}\, Po(\mu_{ij})\, I(z_{ij} = 1), \quad i = 1, \dots, N,\ j = 1, \dots, J,$$

where $Po(\mu)$ denotes the Poisson distribution with the mean parameter $\mu$, and $z_{ij}$ is the latent binary
indicator such that $z_{ij} = 1$ with probability $\pi_{ij}$ and $z_{ij} = 0$ otherwise. If $z_{ij} = 0$, the latent count
is a structural zero with probability one; otherwise ($z_{ij} = 1$, at-risk) it follows the Poisson distribution.
Given a known grouping mechanism $c$, $y_{ij}$ is observed as $c(y_{ij}^*)$. Generally, $c$ is in the form of

$$y_{ij} = g \quad \text{iff} \quad \kappa_g \le y_{ij}^* < \kappa_{g+1}, \qquad g = 0, \dots, G - 1, \tag{1}$$

where the $\kappa_g$'s define the thresholds of the ordinal groups (see, for example, McGinley et al., 2015).
Typically, $\kappa_0 = 0$ and $\kappa_G = \infty$. We utilise this data augmentation form for the posterior computation.
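
The grouping mechanism in (1) can be illustrated with a minimal Python sketch. The threshold list below mirrors the seven NLSY79 categories quoted in the Introduction; the names `KAPPA`, `group_count` and `group_interval` are hypothetical and chosen only for this illustration.

```python
from bisect import bisect_right

# Lower endpoints kappa_0 < ... < kappa_{G-1}; kappa_G = infinity is implicit.
# Assumed here: the categories never, once, twice, 3-5, 6-10, 11-50, over 50.
KAPPA = [0, 1, 2, 3, 6, 11, 51]

def group_count(y_star, kappa=KAPPA):
    """Return g such that kappa[g] <= y_star < kappa[g + 1], as in (1)."""
    if y_star < kappa[0]:
        raise ValueError("count lies below the first threshold")
    return bisect_right(kappa, y_star) - 1

def group_interval(g, kappa=KAPPA):
    """Return the half-open interval [kappa_g, kappa_{g+1}) as a (lo, hi) pair."""
    hi = kappa[g + 1] if g + 1 < len(kappa) else float("inf")
    return kappa[g], hi
```

For instance, `group_count(5)` falls in the group whose interval is returned by `group_interval(3)`, and any count of 51 or more is mapped to the last, open-ended group.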
The at-risk probability $\pi_{ij} = \Pr(z_{ij} = 1)$ is modelled using the logistic model given by

$$\pi_{ij} = \frac{\exp(\eta_{1ij})}{1 + \exp(\eta_{1ij})}, \quad i = 1, \dots, N,\ j = 1, \dots, J.$$

The Poisson mean is modelled through the log-link function $\mu_{ij} = \exp(\eta_{2ij})$.
In order to connect the $2 \times J$ responses, the common latent factor is introduced to the linear
predictor in such a way that

$$\eta_{hij} = x_{ij}' \beta_{hj} + u_i' \lambda_{hj}, \quad h = 1, 2,$$

where $x_{ij}$ is the $P \times 1$ vector of covariates with the associated coefficient $\beta_{hj}$, $u_i = (u_{i1}, \dots, u_{iK})'$ is
the $K \times 1$ common latent factor and $\lambda_{hj} = (\lambda_{hj1}, \dots, \lambda_{hjK})'$ is the corresponding factor loading for
$h = 1, 2$ and $j = 1, \dots, J$.
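
As a rough illustration of how the linear predictor and the two link functions fit together, the following Python sketch evaluates the predictor, the at-risk probability and the Poisson mean for one individual and one dimension. The function names are hypothetical, and plain lists stand in for the vectors.

```python
import math

def linear_predictor(x, beta, u, lam):
    """eta = x' beta + u' lambda for one (h, i, j) combination."""
    return (sum(xp * bp for xp, bp in zip(x, beta))
            + sum(uk * lk for uk, lk in zip(u, lam)))

def at_risk_probability(eta1):
    """Logistic link: pi = exp(eta1) / (1 + exp(eta1)), written stably."""
    if eta1 >= 0:
        return 1.0 / (1.0 + math.exp(-eta1))
    e = math.exp(eta1)
    return e / (1.0 + e)

def poisson_mean(eta2):
    """Log link: mu = exp(eta2)."""
    return math.exp(eta2)

# Example with P = 2 covariates and K = 1 factor (values are arbitrary).
eta1 = linear_predictor([1.0, 2.0], [0.5, 0.5], [1.0], [0.25])
pi = at_risk_probability(eta1)
```

Because the same factor `u` enters both predictors with different loadings, a single draw of `u` shifts the at-risk probability and the Poisson mean together, which is how the dependence across the 2J components arises.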
In order to facilitate the posterior computation and identification, we follow Hamura et al. (2021)
to approximate the Poisson model by the negative binomial model and apply the Pólya-Gamma (PG)
mixture of Polson et al. (2013). It is well known that the negative binomial distribution has the
following mixture representation:

$$Y \sim Po(\epsilon e^{\eta}), \qquad \epsilon \sim Ga(r, r),$$

where $Ga(a, b)$ denotes the gamma distribution with the mean $a/b$. The marginal probability function
of $Y$ is given by

$$p(y) = \frac{\Gamma(y + r)}{\Gamma(r)\, y!} \frac{(e^{\eta}/r)^{y}}{(1 + e^{\eta}/r)^{y + r}} = \frac{\Gamma(y + r)}{\Gamma(r)\, y!} \frac{(e^{\psi})^{y}}{(1 + e^{\psi})^{y + r}},$$

where $\psi = \eta - \log r$. The Poisson distribution is obtained in the limit of $r \to \infty$. Therefore, for a
sufficiently large $r$, we can apply the Pólya-Gamma (PG) mixture representation to this approximate
Poisson model:

$$\frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}} = 2^{-b} e^{\kappa \psi} \int_0^{\infty} e^{-\omega \psi^2 / 2}\, p(\omega \mid b, 0)\, d\omega,$$

where $a = y$, $b = y + r$, $\kappa = a - b/2$ and $\omega$ follows the PG distribution $PG(b, 0)$ with the density
$p(\omega \mid b, 0)$.
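
The quality of the negative binomial approximation to the Poisson model for large r can be checked numerically. The sketch below, with hypothetical function names, evaluates the negative binomial log probability function in the ψ parameterisation given above and compares it with the Poisson log probability function; it is an illustration only, not the authors' code.

```python
import math

def log1pexp(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def nb_logpmf(y, eta, r):
    """log p(y) of the negative binomial with psi = eta - log r."""
    psi = eta - math.log(r)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + y * psi - (y + r) * log1pexp(psi))

def poisson_logpmf(y, eta):
    """log p(y) of the Poisson distribution with mean exp(eta)."""
    return y * eta - math.exp(eta) - math.lgamma(y + 1)

# The gap between the two log probabilities shrinks as r grows.
gap_small_r = abs(nb_logpmf(3, 0.5, 10.0) - poisson_logpmf(3, 0.5))
gap_large_r = abs(nb_logpmf(3, 0.5, 1e6) - poisson_logpmf(3, 0.5))
```

In this example `gap_large_r` is several orders of magnitude smaller than `gap_small_r`, in line with the limiting argument; how large r must be in practice depends on the range of the counts.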
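Collecting the $2J$ terms of the PG mixture, the contribution of the $i$th individual to the augmented
likelihood function conditionally on $\omega_{1ij}$, $\omega_{2ij}$, $y_{ij}^*$ and $\prod_{j=1}^{J} z_{ij} = 1$ is proportional to

$$
\begin{aligned}
&\prod_{j=1}^{J} \exp\left\{ -\frac{\omega_{1ij}}{2} (x_i' \beta_{1j} + u_i' \lambda_{1j})^2 + \kappa_{1ij} (x_i' \beta_{1j} + u_i' \lambda_{1j}) \right\} \\
&\quad \times \exp\left\{ -\frac{\omega_{2ij}}{2} (x_i' \beta_{2j} + u_i' \lambda_{2j} - \log r)^2 + \kappa_{2ij} (x_i' \beta_{2j} + u_i' \lambda_{2j} - \log r) \right\} \\
&\propto \exp\left\{ -\frac{1}{2} (d_i - \beta x_i - \Lambda u_i + \mathbf{r})' \Omega_i (d_i - \beta x_i - \Lambda u_i + \mathbf{r}) \right\},
\end{aligned} \tag{2}
$$

where $\kappa_{1ij} = z_{ij} - 1/2$, $\kappa_{2ij} = (y_{ij}^* - r)/2$, $\mathbf{r} = (0, \dots, 0, \log r, \dots, \log r)'$ consists of $J$ zeros followed by
$J$ copies of $\log r$, $d_i = (d_{1i1}, \dots, d_{1iJ}, d_{2i1}, \dots, d_{2iJ})'$ is a $2J \times 1$ vector with $d_{1ij} = (z_{ij} - 1/2)/\omega_{1ij}$ and
$d_{2ij} = (y_{ij}^* - r)/(2\omega_{2ij})$, $\beta$ is the $2J \times P$ matrix such that $\beta' = (\beta_{11}, \dots, \beta_{1J}, \beta_{21}, \dots, \beta_{2J})$,
$\Omega_i = \mathrm{diag}(\omega_{1i1}, \dots, \omega_{1iJ}, \omega_{2i1}, \dots, \omega_{2iJ})$, and $\Lambda$ is the $2J \times K$ matrix
such that $\Lambda' = [\lambda_{11}, \dots, \lambda_{1J}, \lambda_{21}, \dots, \lambda_{2J}]$. Conditionally on $\omega_{hij}$ and $u_i$, the model for the $2J$-dimensional
transformed response $d_i$ is normal with the diagonal precision matrix $\Omega_i$ (that is, covariance matrix $\Omega_i^{-1}$). Therefore, in this
conditionally normal model, the factor component term $\Lambda u_i$ captures the association within and
between the at-risk and Poisson parts over $J$ dimensions. Based on this representation, the following
subsection introduces the parameter expansion for efficient sampling of the factor components.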
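
The passage from the product of exponentials to the quadratic form in (2) is a completing-the-square step applied term by term. The Python sketch below checks this for a single term, writing ψ for the linear predictor (with log r subtracted in the Poisson part): the PG exponent and the Gaussian exponent with pseudo-response d = κ/ω differ only by a constant that does not involve ψ. The function names are hypothetical.

```python
def pg_exponent(psi, kappa, omega):
    """Exponent of one PG-augmented term: -omega/2 * psi^2 + kappa * psi."""
    return -0.5 * omega * psi * psi + kappa * psi

def gaussian_exponent(psi, kappa, omega):
    """Exponent of a normal kernel for d = kappa/omega with precision omega."""
    d = kappa / omega
    return -0.5 * omega * (d - psi) ** 2

def transformed_responses(z, y_star, r, omega1, omega2):
    """Pseudo-responses d1 = (z - 1/2)/omega1 and d2 = (y* - r)/(2 omega2)."""
    kappa1 = z - 0.5
    kappa2 = (y_star - r) / 2.0
    return kappa1 / omega1, kappa2 / omega2

# The difference equals kappa^2 / (2 omega) whatever the value of psi.
kappa, omega = 1.5, 0.7
diffs = [pg_exponent(p, kappa, omega) - gaussian_exponent(p, kappa, omega)
         for p in (-2.0, 0.0, 1.5)]
```

Since the difference is free of ψ, it drops out under proportionality, which is why the augmented likelihood can be handled as a normal model for the transformed response with precision ω.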

2.2 Parameter expansion and prior distributions

For the regression parameters β hj , we assume the conditionally conjugate priors N (b0 , B0 ) for h = 1, 2
and j = 1, . . . , J. The standard prior distributions for the common factors and loadings would be
uik ∼ N (0, 1) and λhjk ∼ N (0, 1). Under this prior specification, however, the mixing of an MCMC
algorithm tends to be very slow.
The augmented model above is expanded for efficient posterior sampling, borrowing the idea of
Ghosh and Dunson (2009). Specifically, we introduce the working parameters λ∗hj = (λ∗hj1 , . . . , λ∗hjK )′
and u∗i = (u∗i1 , . . . , u∗iK )′ . The MCMC algorithm samples the working parameters from their posterior
distributions. The likelihood contribution of the ith individual in the expanded model is obtained
by simply replacing ui and λhj in (2) with u∗i and λ∗hj , respectively. The prior distributions for
the working parameters are given by λ∗hjk ∼ N (0, 1), h = 1, 2, j = 1, . . . , J, k = 1, . . . , K, and
u∗i ∼ N (0, Φ), i = 1, . . . , N where Φ = diag(ϕ1 , . . . , ϕK ). Further, it is assumed ϕk ∼ IG(ak , bk ), k =
1, . . . , K.
Our prior specification differs slightly from that in Ghosh and Dunson (2009). To correct for
the invariance of the factor loadings due to rotation and sign-switching, Ghosh and Dunson (2009)
employed the positive lower triangular (PLT) constraint where the diagonal elements of the factor
loading matrix are strictly positive, and the upper triangular elements are fixed to zero a priori. In our
model, it would have been λ∗1jk = 0, k = min(j, K) + 1, . . . , K and λ∗2jk = 0, k = min(J + j, K) +
1, . . . , K. However, PLT only partially solves the identification issues. For example, the identifiability
is lost when the loading for the first variable is close to zero. In this case, reordering the variables is
required. See Papastamoulis and Ntzoufras (2022) and references therein for the recent development
in the approaches to achieving identifiability of the factor model and their limitations.
Therefore, this paper employs the parameter expansion without constraining the factor loading
matrix. The MCMC draws of the unidentified parameters are post-processed to produce the posterior
draws of the identified parameters. See Section 2.4.

2.3 MCMC algorithm

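The parameters and latent variables are sampled using the Gibbs sampler described in the following.
In this section, $\eta_{hij}$ is expressed in terms of the working parameters, $\eta_{hij} = x_{ij}' \beta_{hj} + u_i^{*\prime} \lambda_{hj}^*$.
The joint distribution of the parameters and latent variables under the expanded model is proportional to

$$
\begin{aligned}
&\prod_{j=1}^{J} \prod_{i=1}^{N} \Bigg[ \exp\left\{ -\frac{\omega_{1ij}}{2} (x_i' \beta_{1j} + u_i^{*\prime} \lambda_{1j}^*)^2 + \kappa_{1ij} (x_i' \beta_{1j} + u_i^{*\prime} \lambda_{1j}^*) \right\} p(\omega_{1ij}) \\
&\quad \times \left( \exp\left\{ -\frac{\omega_{2ij}}{2} (x_i' \beta_{2j} + u_i^{*\prime} \lambda_{2j}^* - \log r)^2 + \kappa_{2ij} (x_i' \beta_{2j} + u_i^{*\prime} \lambda_{2j}^* - \log r) \right\} p(\omega_{2ij}) \right)^{I(z_{ij} = 1)\, I(y_{ij} = g,\ y_{ij}^* \in [\kappa_g, \kappa_{g+1}))} \Bigg] \\
&\quad \times \left[ \prod_{i=1}^{N} p(u_i^* \mid \Phi) \right] \left[ \prod_{h=1}^{2} \prod_{j=1}^{J} \prod_{k=1}^{K} p(\lambda_{hjk}^*) \right] \left[ \prod_{h=1}^{2} \prod_{j=1}^{J} p(\beta_{hj}) \right] p(\Phi),
\end{aligned} \tag{3}
$$

where $I(\cdot)$ is the indicator function, and $p(\lambda_{hjk}^*)$, $p(\beta_{hj})$ and $p(\Phi)$ denote the prior densities. Then,
the Gibbs sampler alternately samples $\{y_{ij}^*\}$, $\{\beta_{hj}\}$, $\{z_{ij}\}$, $\{\omega_{hij}\}$, $\{\lambda_{hj}^*\}$, $\{u_i^*\}$ and $\{\phi_k\}$ from their
respective full conditional distributions.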

1. Sampling $y_{ij}^*$, $i = 1, \dots, N$, $j = 1, \dots, J$: From (1) and conditionally on $z_{ij} = 1$, after integrating
out $\omega_{2ij}$, the full conditional distribution of $y_{ij}^*$ is proportional to

$$p(y_{ij}^* \mid z_{ij} = 1, \text{Rest}) \propto \frac{\Gamma(y_{ij}^* + r)}{\Gamma(r)\, y_{ij}^*!} \frac{(e^{\psi_{ij}})^{y_{ij}^*}}{(1 + e^{\psi_{ij}})^{y_{ij}^* + r}}\, I(y_{ij} = g,\ y_{ij}^* \in [\kappa_g, \kappa_{g+1})),$$

where $\psi_{ij} = \eta_{2ij} - \log r$. This full conditional distribution is the negative binomial distribution
truncated on the interval $[\kappa_g, \kappa_{g+1})$.

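2. The sampling steps of $z_{ij}$, $\beta_{hj}$, $\lambda_{hj}^*$ and $\omega_{hij}$ are similar to those provided in Neelon (2019).

• Sampling $z_{ij}$, $i = 1, \dots, N$, $j = 1, \dots, J$: The full conditional distribution of $z_{ij}$ is given by

$$\Pr(z_{ij} = 1 \mid y_{ij}^* = 0, \text{Rest}) = \frac{\pi_{ij} v_{ij}^{r}}{1 - \pi_{ij} (1 - v_{ij}^{r})},$$

where $v_{ij} = 1/(1 + e^{\psi_{ij}})$.

• Sampling $\omega_{hij}$, $h = 1, 2$, $i = 1, \dots, N$, $j = 1, \dots, J$: $\omega_{1ij}$ is sampled from $PG(1, \eta_{1ij})$.
Similarly, for $i$ and $j$ such that $z_{ij} = 1$, $\omega_{2ij}$ is sampled from $PG(r + y_{ij}^*, \eta_{2ij} - \log r)$.

• Sampling $\beta_{1j}$ and $\lambda_{1j}^*$, $j = 1, \dots, J$: We sample $\beta_{1j}$ and $\lambda_{1j}^*$ in one block. The full
conditional distribution of $(\beta_{1j}', \lambda_{1j}^{*\prime})'$ is given by $N(b_{1j}, B_{1j})$ where

$$B_{1j} = \left[ \sum_{i=1}^{N} \omega_{1ij} \tilde{x}_{ij} \tilde{x}_{ij}' + \tilde{B}_0^{-1} \right]^{-1}, \qquad b_{1j} = B_{1j} \left[ \sum_{i=1}^{N} \tilde{x}_{ij} \left( z_{ij} - \frac{1}{2} \right) + \tilde{B}_0^{-1} \tilde{b}_0 \right],$$

where $\tilde{x}_{ij} = (x_{ij}', u_i^{*\prime})'$, $\tilde{B}_0$ is the block diagonal matrix with $B_0$ and $I_K$ on the diagonal
blocks and $\tilde{b}_0 = (b_0', 0_K')'$.

• Sampling $\beta_{2j}$ and $\lambda_{2j}^*$, $j = 1, \dots, J$: Similarly, $\beta_{2j}$ and $\lambda_{2j}^*$ are sampled in one block. The
full conditional distribution of $(\beta_{2j}', \lambda_{2j}^{*\prime})'$ is given by $N(b_{2j}, B_{2j})$ where

$$B_{2j} = \left[ \sum_{i: z_{ij} = 1} \omega_{2ij} \tilde{x}_{ij} \tilde{x}_{ij}' + \tilde{B}_0^{-1} \right]^{-1}, \qquad b_{2j} = B_{2j} \left[ \sum_{i: z_{ij} = 1} \tilde{x}_{ij} \left( \frac{y_{ij}^* - r}{2} + \omega_{2ij} \log r \right) + \tilde{B}_0^{-1} \tilde{b}_0 \right].$$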

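3. Sampling $u_i^*$, $i = 1, \dots, N$: The full conditional distribution of $u_i^*$ is $N(m_i, V_i)$ where

$$V_i = \left[ \sum_{j=1}^{J} \omega_{1ij} \lambda_{1j}^* \lambda_{1j}^{*\prime} + \sum_{j: z_{ij} = 1} \omega_{2ij} \lambda_{2j}^* \lambda_{2j}^{*\prime} + \Phi^{-1} \right]^{-1},$$

$$m_i = V_i \left[ \sum_{j=1}^{J} (\kappa_{1ij} - \omega_{1ij} x_i' \beta_{1j}) \lambda_{1j}^* + \sum_{j: z_{ij} = 1} \left( \kappa_{2ij} - \omega_{2ij} (x_i' \beta_{2j} - \log r) \right) \lambda_{2j}^* \right].$$

4. Sampling $\phi_k$, $k = 1, \dots, K$: The full conditional distribution of $\phi_k$ is given by
$IG\left( a_k + N/2,\ b_k + \sum_{i=1}^{N} u_{ik}^{*2} / 2 \right)$.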
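
Three of the steps above need only scalar computations and can be sketched in plain Python: the truncated negative binomial draw of the latent count (step 1), the conditional at-risk probability for a zero count (step 2), and the inverse-gamma draw of ϕk (step 4). This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the open-ended last group is cut at `lo + cap` purely for convenience, and the PG and multivariate normal draws are omitted.

```python
import math
import random

def log1pexp(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def nb_logpmf(y, psi, r):
    """Log negative binomial kernel in the psi = eta2 - log r parameterisation."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + y * psi - (y + r) * log1pexp(psi))

def sample_truncated_nb(lo, hi, psi, r, rng, cap=10_000):
    """Step 1: draw y* from the NB kernel restricted to [lo, hi).

    An infinite upper endpoint is replaced by lo + cap (assumption of this sketch).
    """
    upper = int(hi) if math.isfinite(hi) else lo + cap
    support = range(lo, upper)
    logw = [nb_logpmf(y, psi, r) for y in support]
    peak = max(logw)
    weights = [math.exp(v - peak) for v in logw]  # rescaled to avoid underflow
    return rng.choices(support, weights=weights, k=1)[0]

def prob_at_risk_given_zero(eta1, psi, r):
    """Step 2: Pr(z = 1 | y* = 0) = pi v^r / (1 - pi (1 - v^r)), v = 1/(1 + e^psi)."""
    pi = math.exp(-log1pexp(-eta1))      # logistic(eta1)
    v_r = math.exp(-r * log1pexp(psi))   # v^r computed on the log scale
    return pi * v_r / (1.0 - pi * (1.0 - v_r))

def sample_phi(u_star_k, a_k, b_k, rng):
    """Step 4: draw phi_k ~ IG(a_k + N/2, b_k + sum_i u*_ik^2 / 2)."""
    shape = a_k + len(u_star_k) / 2.0
    rate = b_k + sum(u * u for u in u_star_k) / 2.0
    # If X ~ Gamma(shape, scale 1), then rate / X ~ IG(shape, rate).
    return rate / rng.gammavariate(shape, 1.0)

rng = random.Random(0)
draws = [sample_truncated_nb(3, 6, -0.5, 50.0, rng) for _ in range(200)]
```

Enumerating the support is feasible here because every closed group is a short interval; for the open-ended group a more careful tail treatment than the fixed cap would be advisable.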

2.4 Post-processing

The MCMC draws of the factor components are processed in the following two steps. First, the
sampled working parameters are not identified in terms of scale (Section 2.2). The original parameters
are recovered through

$$\lambda_{hjk} = \lambda_{hjk}^* \phi_k^{1/2}, \qquad u_{ik} = u_{ik}^* \phi_k^{-1/2}, \qquad j = 1, \dots, J,\ k = 1, \dots, K. \tag{4}$$

Second, these parameters are still subject to rotational and sign-switching invariance. We apply the
post-processing algorithm of Papastamoulis and Ntzoufras (2022) to the MCMC draws of λhjk . The
algorithm first applies the varimax rotation to each MCMC draw to resolve the rotational invariance.
Then, to resolve the sign-switching invariance, it applies signed permutations to the MCMC output
until the transformed loadings are sufficiently close to some reference value. Their algorithm is provided
in the R package factor.switching. See Papastamoulis and Ntzoufras (2022) for details.
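
The scale recovery in (4) can be sketched in a few lines of Python. The function name `recover_original` is hypothetical, and the sketch covers only the rescaling step, not the varimax rotation or signed permutations handled by factor.switching.

```python
import math

def recover_original(lam_star, u_star, phi):
    """Apply (4): lambda_k = lambda*_k * sqrt(phi_k), u_k = u*_k / sqrt(phi_k)."""
    lam = [l * math.sqrt(p) for l, p in zip(lam_star, phi)]
    u = [v / math.sqrt(p) for v, p in zip(u_star, phi)]
    return lam, u

# One draw with K = 2 factors (arbitrary illustrative values).
lam_star, u_star, phi = [0.4, -1.2], [2.0, 0.5], [4.0, 0.25]
lam, u = recover_original(lam_star, u_star, phi)

# The factor term entering the linear predictor is unchanged by the rescaling.
term_star = sum(l * v for l, v in zip(lam_star, u_star))
term = sum(l * v for l, v in zip(lam, u))
```

The equality of `term` and `term_star` shows why the working parameters are not identified in scale: the likelihood depends on them only through the product, so the rescaling can be deferred to post-processing.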

3 Simulation study

Here, the performance of the proposed model is investigated using simulated data. We set
$N = 1000$, $J = 10$, $K = 1$ and $P = 2$. The regression coefficients are given by $\beta_{1j}^{\text{true}} = (0.5, 0.5)$ and
$\beta_{2j}^{\text{true}} = (-0.5, -1)$ for $j = 1, \dots, J$. The covariate vector is $x_{ij} = (1, x_i)'$ for $i = 1, \dots, N$, $j = 1, \dots, J$,
and $x_i \sim N(0, 1)$. For the factor loadings, $\lambda_1^{\text{true}} = (0.89, 0, 0.25, 0, 0.8, 0, 0.5, 0, 0, 0)'$ and
$\lambda_2^{\text{true}} = (0, 0, 0.85, 0.8, 0, 0.75, 0.75, 0, 0.8, 0.8)'$. We consider the following two settings for the grouping
mechanism. Setting 1 uses the groups {0}, {1}, {2}, [3, 5], [6, 10], [11, 50], [51, ∞), which are the same
as those of the NLSY79 data in Section 4. Setting 2 considers a finer grouping mechanism such that the
grouped data contain more information: {0}, {1}, {2}, . . . , {10}, [11, 15], [16, 20], [21, 25], [26, 30],
[31, 40], [41, 50], [51, ∞). The data are replicated R = 100 times. The overall proportion of structural
zeros is approximately 0.6.
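
A data-generating step of this kind can be sketched in Python as follows. The coefficients, loadings and Setting 1 thresholds are taken from the text; the standard normal distribution for the single factor, the simple Poisson generator and all names are assumptions made for this illustration.

```python
import math
import random
from bisect import bisect_right

KAPPA_SETTING1 = [0, 1, 2, 3, 6, 11, 51]          # lower endpoints of the groups
BETA1, BETA2 = (0.5, 0.5), (-0.5, -1.0)           # at-risk and Poisson coefficients
LAMBDA1 = [0.89, 0, 0.25, 0, 0.8, 0, 0.5, 0, 0, 0]
LAMBDA2 = [0, 0, 0.85, 0.8, 0, 0.75, 0.75, 0, 0.8, 0.8]

def poisson_draw(mu, rng):
    """Multiplication method; adequate for the moderate means arising here."""
    limit, k, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate(n=1000, kappa=KAPPA_SETTING1, seed=0):
    """Return, per individual, a list of (z, y_star, y) triples over J = 10 dimensions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # covariate x_i
        u = rng.gauss(0.0, 1.0)   # K = 1 latent factor (assumed standard normal)
        row = []
        for l1, l2 in zip(LAMBDA1, LAMBDA2):
            eta1 = BETA1[0] + BETA1[1] * x + l1 * u
            eta2 = BETA2[0] + BETA2[1] * x + l2 * u
            z = int(rng.random() < 1.0 / (1.0 + math.exp(-eta1)))
            y_star = poisson_draw(math.exp(eta2), rng) if z else 0
            row.append((z, y_star, bisect_right(kappa, y_star) - 1))
        data.append(row)
    return data

data = simulate(n=50, seed=1)
```

Passing a finer threshold list as `kappa` gives the analogue of Setting 2 with the same latent counts structure.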
The proposed factor ZIP model for grouped data (GFZIP) is compared with the following three
models. Firstly, the ZIP model for grouped data (GZIP) is considered. Since this model does not
include factors that provide links among structural zeros and grouped counts, it is essentially a
univariate model and thus is estimated separately for each j. Secondly, the zero-inflated negative binomial
model for grouped data (GZINB; McGinley et al., 2015) is also considered. Finally, to assess the effect of the loss of information due to the
grouping mechanism, the factor ZIP (FZIP) model for the ungrouped count data is considered and
fitted to the underlying count data without the grouping mechanism.
For all models, we assume β hj ∼ N (0, 100I) for h = 1, 2 and j = 1, . . . , J. For each model, the
MCMC algorithm is run for 22,000 iterations, with the initial 2,000 draws discarded as the burn-in
period. The parameter estimation is based on the remaining 20,000 MCMC draws.

The performance of the models is assessed based on the bias

$$\text{Bias}(\beta_{hjp}) = \frac{1}{R} \sum_{r=1}^{R} \left( \hat{\beta}_{hjp}^{(r)} - \beta_{hjp}^{\text{true}} \right)$$

and the root mean squared error (RMSE)

$$\text{RMSE}(\beta_{hjp}) = \sqrt{ \frac{1}{R} \sum_{r=1}^{R} \left( \hat{\beta}_{hjp}^{(r)} - \beta_{hjp}^{\text{true}} \right)^2 }$$

for $h = 1, 2$, $j = 1, \dots, J$ and $p = 1, \dots, P$, where $\hat{\beta}_{hjp}^{(r)}$ is the posterior mean from the $r$th replication of the data. For
the factor loadings, we compute the bias and RMSE for $\text{vec}(\Lambda \Lambda')$, as the post-processed signs of the
loadings vary over the replications.
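
The two summary measures, and the reason for working with vec(ΛΛ′) rather than Λ itself, can be sketched in Python as follows. The function names are hypothetical and the numbers in the example are arbitrary.

```python
import math

def bias(estimates, truth):
    """Average of (estimate - truth) over the R replications."""
    return sum(e - truth for e in estimates) / len(estimates)

def rmse(estimates, truth):
    """Square root of the average squared error over the R replications."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

def loading_outer(lam):
    """vec(Lambda Lambda') for a loading matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(ra, rb)) for ra in lam for rb in lam]

# Posterior means from three hypothetical replications, true value 0.5.
b = bias([0.4, 0.6, 0.8], 0.5)
e = rmse([0.4, 0.6, 0.8], 0.5)

# Flipping the sign of a factor leaves Lambda Lambda' unchanged.
lam = [[0.89], [0.0], [0.25]]
flipped = [[-v for v in row] for row in lam]
```

Since `loading_outer(lam)` and `loading_outer(flipped)` coincide, errors computed on vec(ΛΛ′) are comparable across replications even when the post-processed signs differ.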
We also evaluate the true positive (TPR), true negative (TNR), false positive (FPR) and false
negative (FNR) rates for being at-risk conditionally on the zero response. The posterior probability of
the $i$th individual being at-risk in the $j$th dimension given the response $y_{ij} = 0$ is denoted by $\Pr(z_{ij} = 1 \mid y_{ij} = 0)$.
It is estimated by $\hat{\pi}_{ij} = \frac{1}{M} \sum_{m=1}^{M} z_{ij}^{(m)}$ for $i$ such that $y_{ij} = 0$ based on the $M$ draws of the MCMC
algorithm. The individual $i$ is deemed to be at-risk in the $j$th dimension if $\hat{\pi}_{ij} > 0.5$. Then, the TPR,
TNR, FPR, and FNR are calculated as

$$
\begin{aligned}
\text{TPR}_j &= \frac{\sum_{i=1}^{N} I(\hat{\pi}_{ij} > 0.5,\ y_{ij} = 0,\ z_{ij}^{\text{true}} = 1)}{\sum_{i=1}^{N} I(y_{ij} = 0,\ z_{ij}^{\text{true}} = 1)}, &
\text{TNR}_j &= \frac{\sum_{i=1}^{N} I(\hat{\pi}_{ij} \le 0.5,\ y_{ij} = 0,\ z_{ij}^{\text{true}} = 0)}{\sum_{i=1}^{N} I(y_{ij} = 0,\ z_{ij}^{\text{true}} = 0)}, \\
\text{FPR}_j &= \frac{\sum_{i=1}^{N} I(\hat{\pi}_{ij} > 0.5,\ y_{ij} = 0,\ z_{ij}^{\text{true}} = 0)}{\sum_{i=1}^{N} I(y_{ij} = 0,\ z_{ij}^{\text{true}} = 0)}, &
\text{FNR}_j &= \frac{\sum_{i=1}^{N} I(\hat{\pi}_{ij} \le 0.5,\ y_{ij} = 0,\ z_{ij}^{\text{true}} = 1)}{\sum_{i=1}^{N} I(y_{ij} = 0,\ z_{ij}^{\text{true}} = 1)},
\end{aligned}
$$

for $j = 1, \dots, J$, where $z_{ij}^{\text{true}}$ denotes the true value of the latent at-risk indicator $z_{ij}$.

As in the real data application on the youths’ involvement in illegal activities in Section 4, the
proportion of at-risk individuals among those whose responses are zero would be a quantity of interest.
The proportion of interest is defined by
$$\hat{R}_j \equiv \frac{\sum_{i=1}^{N} I(\hat{\pi}_{ij} > 0.5,\ y_{ij} = 0)}{\sum_{i=1}^{N} I(y_{ij} = 0)}, \quad j = 1, \dots, J. \tag{5}$$
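
For a single dimension j, the classification rates and the proportion in (5) can be computed from the MCMC draws of the at-risk indicators roughly as follows. The function names and the toy inputs are hypothetical; a rate whose denominator is zero is returned as NaN in this sketch.

```python
def posterior_at_risk(z_draws):
    """pi_hat_i = average of z_i over the M draws; z_draws[m][i] is draw m."""
    m = len(z_draws)
    return [sum(col) / m for col in zip(*z_draws)]

def _ratio(num, den):
    return num / den if den else float("nan")

def at_risk_summary(pi_hat, y, z_true, cut=0.5):
    """TPR, TNR, FPR, FNR and R_hat among individuals with y = 0."""
    zero = [(p > cut, zt) for p, yi, zt in zip(pi_hat, y, z_true) if yi == 0]
    pos = sum(1 for _, zt in zero if zt == 1)    # truly at-risk zeros
    neg = len(zero) - pos                        # structural zeros
    tp = sum(1 for flag, zt in zero if flag and zt == 1)
    fp = sum(1 for flag, zt in zero if flag and zt == 0)
    return {
        "TPR": _ratio(tp, pos), "FNR": _ratio(pos - tp, pos),
        "FPR": _ratio(fp, neg), "TNR": _ratio(neg - fp, neg),
        "R_hat": _ratio(tp + fp, len(zero)),
    }

pi_hat = posterior_at_risk([[1, 0], [1, 0], [0, 0], [1, 1]])
summary = at_risk_summary([0.9, 0.2, 0.7, 0.1, 0.8], [0, 0, 0, 0, 3], [1, 1, 0, 0, 1])
```

Only `R_hat` can be computed for real data, since the true indicators are unobserved there; the four rates are available in the simulation because the indicators are generated.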

Table 1 presents the biases and RMSEs for the coefficients β 1j and β 2j from 100 replications of the
data averaged over J dimensions. Since FZIP, which uses the true underlying counts before grouping,
is not affected by the grouping mechanism, it produces identical results under both simulation settings.
Therefore, the cells for FZIP in Setting 2 are left blank.
When comparing the proposed GFZIP with GZIP and GZINB, ignoring the factor structure among
the at-risk probabilities and Poisson parts leads to larger bias and RMSE. As expected, GFZIP
performed the best among the three models. GZINB resulted in large RMSE for the at-risk coefficients
β 1jp , especially in the case of Setting 1. This is due to the numerical instability from the coarsely grouped
data. Compared to FZIP, GFZIP resulted in increased bias and RMSE for the Poisson coefficients β 2j
due to the loss of information through the grouping mechanism. It is also seen that the performance
of GFZIP regarding the Poisson coefficients improves as the number of groups increases from Setting 1
to Setting 2, where the grouped data contain more information. This is also the case for GZIP and
GZINB, and this phenomenon was also observed in McGinley et al. (2015).
Figure 1 presents the boxplots of the bias and RMSE for vec(ΛΛ′ ) under GFZIP and FZIP. The
bias under GFZIP is larger than that under FZIP in both settings due to grouping. It is also seen
that the bias under GFZIP decreases as the finer grouping mechanism is used in Setting 2. A similar
pattern is observed for the RMSE.
Figures 2 and 3 present TPR, TNR, FPR, and FNR averaged over 100 replications under GFZIP,
GZIP, GZINB and FZIP in Settings 1 and 2, respectively. Firstly, we observe that the results under
GFZIP and FZIP become almost identical in Setting 2, while there are some discrepancies in Setting 1.
In both settings, GZIP resulted in TPR and FPR for some dimensions being close to zero. On the
contrary, TNR and FNR in those dimensions are close to one.
Figure 4 presents the estimated proportions of at-risk individuals given yij = 0, R̂j , averaged over
100 replications. The GFZIP and FZIP models seem to work well, with the estimates being close
to the truth. Their results become almost identical in Setting 2, similar to Figures 2 and 3. The
figure also shows that GZINB overestimates Rj in both settings. The results for GZIP are similar
to those in the previous figures. In some dimensions, the estimates of Rj under GZIP are close to
zero, implying the false negative rates are close to one. In the other dimensions, the estimates under
GZIP are similar to those of GFZIP. The behaviour of GZIP results from ignoring the association
between the dimensions. In the real data analysis in Section 4, where it would be natural to consider
the association between the youths’ illegal activities, we observe a similar result under GZIP.

Table 1: Bias and RMSE for the at-risk coefficients β1jp and the Poisson coefficients β2jp from 100
replications averaged over J = 10 dimensions. The results for FZIP, which are not affected by the
grouping mechanism, are the same for both settings.

                                 Bias                             RMSE
Parameter  Setting   GFZIP    GZIP   GZINB    FZIP    GFZIP   GZIP  GZINB   FZIP
β1j1       1        -0.093  -0.618   1.549   0.110    0.290  0.667  2.320  0.303
β1j2       1        -0.043  -0.329   0.389   0.067    0.205  0.382  0.797  0.218
β1j1       2         0.115  -0.428   1.044      --    0.302  0.565  1.680     --
β1j2       2         0.071  -0.210   0.330      --    0.217  0.315  0.633     --
β2j1       1         0.087   0.533   0.084  -0.023    0.153  0.555  0.194  0.111
β2j2       1        -0.036  -0.178  -0.029  -0.006    0.076  0.163  0.098  0.062
β2j1       2        -0.025   0.394   0.048      --    0.111  0.449  0.188     --
β2j2       2        -0.009  -0.076  -0.007      --    0.063  0.125  0.083     --

[Figure 1 image: two boxplot panels (bias and RMSE), each showing GFZIP in Setting 1, GFZIP in Setting 2, and FZIP.]

Figure 1: Boxplots of bias and RMSE for vec(ΛΛ′)

[Figure 2 image: four panels (TPR, TNR, FPR, FNR) plotting the rates, on a scale from 0 to 1, against the dimension j = 1, . . . , 10 for GFZIP, GZIP, GZINB and FZIP.]

Figure 2: True positive (TPR), true negative (TNR), false positive (FPR) and false negative rates
(FNR) for GFZIP, GZIP, GZINB, and FZIP in Setting 1

[Figure 3 image: four panels (TPR, TNR, FPR, FNR) plotting the rates, on a scale from 0 to 1, against the dimension j = 1, . . . , 10 for GFZIP, GZIP, GZINB and FZIP.]

Figure 3: True positive (TPR), true negative (TNR), false positive (FPR) and false negative rates
(FNR) for GFZIP, GZIP, GZINB, and FZIP in Setting 2

[Figure 4 image: two panels (Setting 1 and Setting 2) plotting Rj, on a scale from 0 to 1, against the dimension j = 1, . . . , 10 for the true value, GFZIP, GZIP, GZINB and FZIP.]

Figure 4: At-risk proportions among those with yij = 0, R̂j

4 Analysis of illegal activities of youth

4.1 Data and setting

We consider the number of times youths were involved in nineteen illegal activities (J = 19)
obtained from the 1980 round of the National Longitudinal Survey of Youth 1979 (NLSY79) data. In
NLSY79, the questionnaire was designed so that the respondents answered with an exact frequency or
an interval of frequencies of each illegal activity in the year prior to the interview. Then, the answers are
published as grouped count data. Although the data are old, they provide valuable information on
the problematic behaviour of youths, statistical analyses of which are still relevant today.
The choices are ‘never’ (g = 0: {0}), ‘once’ (g = 1: {1}), ‘twice’ (g = 2: {2}), ‘between three
and five times’ (g = 3: [3, 5]), ‘between six and ten times’ (g = 4: [6, 10]), ‘between eleven and fifty
times’ (g = 5: [11, 50]) and ‘over fifty times’ (g = 6: [51, ∞)). Table 2 describes the nineteen activities
considered in this analysis and associated labels used in the following figures and tables.
Figure 5 presents the histograms of the number of times 2865 youths were involved in the 19 illegal activities
in the previous year. The numbers in the panels indicate the fractions of zeros. A large
proportion of youths were not involved in each activity, resulting in many zeros. For example, the
observed proportions of zeros for the activities of a highly criminal nature, such as sell marijuana,
sell hard drugs and break in, are above 0.9. The proportion of zeros for
alcohol is 0.39. It is much lower than those for the other activities, as drinking is more common among youths,
though this value may still be relatively high in the context of zero-inflated count data.

The histograms only reveal the distribution of involvement in each activity separately and the
extent of the zero inflation. However, we are also interested in the association among the activities
because it would be natural to assume that involvement in one activity and its frequency may be
associated with those in another activity, such as the use of alcohol and marijuana. Figure 6 presents
the heatmaps of log frequencies for arbitrarily selected pairs of activities. One is added to the
frequencies before taking the log. Some observations from the figure are as follows. The frequencies of
non-involvement in both activities are the highest for all pairs of activities. The top left and middle panels
indicate that a certain fraction of youths had experience using marijuana or hard drugs while they
did not sell them. The top left panel also shows that the youths who sold marijuana frequently used
marijuana frequently, as indicated by the darker shades in the top right corner of the panel. A similar
pattern is seen in the pair of hard drugs and marijuana in the top right panel, where most youths
tended to use marijuana only, but the frequent users used both of them. The bottom left panel shows
that frequent drinking of alcohol is associated with frequent use of marijuana, indicating they may be
used together. Therefore, it would be more appropriate to analyse all activities jointly rather than
treat each separately. The proposed GFZIP model can take into account these data characteristics.
Since the information specific to each involvement in an activity is not available, only the individual
characteristics are used as covariates: xij = xi for i = 1, . . . , N . The covariate information includes the
constant, age, gender, race, grade, residence, poverty and mental status. Table 3 presents the summary
of the covariates. For the prior distributions for the coefficient vectors, we use β hj ∼ N (0, 100I) for
h = 1, 2 and j = 1, . . . , J.
As in the simulation study, we compare the proposed GFZIP model with GZIP and GZINB models.
For GFZIP, we consider the three cases for the number of factors: K = 1, 2, 3. For each model, the
MCMC algorithm is run for 60,000 iterations. The first 20,000 draws are discarded as the burn-in period
and the remaining 40,000 draws are retained for the posterior inference. The models are compared
based on a version of the posterior predictive loss (PPL) of Gelfand and Ghosh (1998), which is
similar to the one considered by Sugasawa et al. (2020):

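$$\text{PPL}(\mathcal{M}) = \frac{1}{N} \sum_{j=1}^{J} \sum_{g=0}^{G} V_{jg}^{\mathcal{M}} + \frac{1}{N + 1} \sum_{j=1}^{J} \sum_{g=0}^{G} \left( c_{jg} - E_{jg}^{\mathcal{M}} \right)^2,$$

where $c_{jg}$ is the number of individuals belonging to the $g$th group for the $j$th activity, and $E_{jg}^{\mathcal{M}}$ and
$V_{jg}^{\mathcal{M}}$, respectively, are the mean and variance of the posterior predictive distribution for $c_{jg}$ under
model $\mathcal{M}$.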
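
Given posterior predictive draws of the group counts, a criterion of this form can be approximated as in the Python sketch below. It reads the criterion as the sum of predictive variances divided by N plus the sum of squared deviations divided by N + 1; the function name, the data layout and the use of the population variance over the draws are assumptions of this illustration.

```python
def ppl(observed, predictive_draws):
    """Posterior predictive loss from replicated group counts.

    observed[j][g] is c_jg; predictive_draws[m][j][g] is the replicated count in draw m.
    """
    n = sum(observed[0])            # the groups of any one activity partition the N individuals
    m = len(predictive_draws)
    var_term = 0.0
    fit_term = 0.0
    for j, row in enumerate(observed):
        for g, c in enumerate(row):
            vals = [draw[j][g] for draw in predictive_draws]
            mean = sum(vals) / m                              # E_jg
            var_term += sum((v - mean) ** 2 for v in vals) / m  # V_jg
            fit_term += (c - mean) ** 2
    return var_term / n + fit_term / (n + 1)

# Toy example: one activity, two groups, two predictive draws.
value = ppl([[3, 1]], [[[2, 2]], [[4, 0]]])
```

Smaller values are preferred: the first term penalises predictive uncertainty and the second penalises the discrepancy between the observed and predicted group counts.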

Table 2: Illegal activities in NLSY79

Label            Description
alcohol          Drank beer, wine, or liquor without parents’ permission
run away         Run away from home
damage           Purposely damaged or destroyed property
fight            Got into a physical fight
shoplift         Taken something from a store without paying
steal lt $50     Stolen other’s belongings worth less than $50
steal ge $50     Stolen other’s belongings worth equal to or more than $50
extort           Used force to get money or things from a person
threaten         Hit or seriously threatened to hit someone
attack           Attacked someone with the idea of seriously hurting or killing
use marijuana    Smoked marijuana or hashish
use hard drugs   Used any drugs or chemicals except marijuana
sell marijuana   Sold marijuana or hashish
sell hard drugs  Sold hard drugs
con              Tried to get something by lying to a person
vehicle          Taken a vehicle without the owner’s permission
break in         Broken into a building or vehicle
sell stolen      Sold or held stolen goods
gambling         Helped in a gambling operation

[Figure 5 image: one histogram panel per activity over the groups 0, 1, 2, 3−5, 6−10, 11−50 and 51−,
with frequencies on the vertical axis. Fractions of zeros shown in the panels: alcohol 0.39,
run_away 0.917, damage 0.754, fight 0.628, shoplift 0.702, steal_lt_50 0.804, steal_ge_50 0.942,
extort 0.95, threaten 0.571, attack 0.891, use_marijuana 0.615, use_hard_drugs 0.857,
sell_marijuana 0.909, sell_hard_drugs 0.981, con 0.716, vehicle 0.894, break_in 0.928,
sell_stolen 0.871, gambling 0.978.]
Figure 5: Histograms of NLSY79 data on the illegal activities of 2865 youths. The numbers indicate
the fractions of zeros.
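The grouping used in Figure 5 can be illustrated with a short sketch that maps raw counts to the seven groups 0, 1, 2, 3−5, 6−10, 11−50 and 51−. The raw counts here are hypothetical, since the survey records only the group; the names `to_group_index` and `group_frequencies` are likewise illustrative.

```python
import numpy as np

GROUP_LABELS = ["0", "1", "2", "3-5", "6-10", "11-50", "51-"]
# Lower endpoints of groups 1..6; counts below 1 fall in group 0.
LOWER_EDGES = np.array([1, 2, 3, 6, 11, 51])


def to_group_index(counts):
    """Return the group index (0..6) for each non-negative count."""
    return np.digitize(np.asarray(counts), LOWER_EDGES)


def group_frequencies(counts):
    """Number of observations falling in each of the seven groups."""
    return np.bincount(to_group_index(counts), minlength=len(GROUP_LABELS))
```

Only these group frequencies are observed for each activity, which is why the likelihood is built from differences of the cumulative distribution function at the group endpoints.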

[Figure 6 image: six heatmap panels of log frequencies over the groups 0, 1, 2, 3−5, 6−10, 11−50 and
51−. Panels (horizontal axis vs. vertical axis): use_marijuana vs. use_hard_drugs; use_hard_drugs vs.
sell_hard_drugs; use_marijuana vs. sell_marijuana; alcohol vs. use_marijuana; alcohol vs. attack;
threaten vs. fight.]

Figure 6: Heatmaps of log frequencies for the arbitrarily selected pairs of activities

Table 3: Covariates
Label Description Mean s.d.
age Age of respondent 16.11 0.777
male Dummy variable for male 0.505 0.250
black Dummy variable for the respondent’s race (black) 0.251 0.434
hisp Dummy variable for respondent’s race (Hispanic) 0.175 0.380
grade Highest grade achieved 9.555 1.029
self Log score of self-esteem 3.059 0.188
urban Dummy variable for the respondent living in an urban area 0.761 0.428
pov Dummy variable for the respondent in poverty 0.218 0.413

4.2 Results

First, we compare the PPL values presented in Table 4. GFZIP with one factor resulted in the
smallest PPL (928.3), followed by GZIP (943.1). The proposed GFZIP model, which accounts for the
association among the decisions on involvement in the activities and the frequencies of involvement, is
more appropriate than GZIP, which treats each activity separately. The PPL increases as the number
of factors increases (1900.3 for K = 2 and 3188.0 for K = 3). This is a natural result, as the
information in our dataset is severely limited due to the coarse grouping mechanism. GZINB resulted
in the largest PPL (7360.4). This would be because GZINB suffers from computational instability
when the groups are coarsely defined, as observed in the simulation study.
Figure 7 presents the trace plots of the Gibbs sampler for the selected parameters under GFZIP.
For the factor loadings, the reordered series are shown. Although the model includes many latent
variables in a multi-level hierarchy, the Gibbs sampler appears to be working reasonably
well.
Table 5 presents the posterior means and 95% credible intervals for the factor loadings under
GFZIP. Except for the at-risk loadings for fight, break in and gambling, the 95% credible intervals
for all activities do not include zero. Among the at-risk loadings λ1 whose credible intervals exclude
zero, the three factor loadings with the largest
magnitudes in the posterior means are those for sell hard drugs (−1.075), use hard drugs (−0.900)
and sell marijuana (−0.594), which are all drug-related loadings. For all Poisson factor loadings,
λ2 , the 95% credible intervals do not include zero. The loadings with the largest magnitudes in the
posterior means are also the drug-related loadings such as sell marijuana (−3.374), use marijuana
(−2.991) and use hard drugs (−2.572). Therefore, the single common latent factor included in the
model is interpreted as the drug-related factor.
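A minimal sketch of how summaries of this kind could be computed from the retained draws of a single parameter is given below. It assumes equal-tailed credible intervals and a one-dimensional array holding the full chain; the function name `summarise_draws` and its arguments are hypothetical.

```python
import numpy as np


def summarise_draws(draws, burn_in=20_000, level=0.95):
    """Posterior mean, equal-tailed credible interval and a flag that is
    True when the interval excludes zero, after discarding burn-in."""
    kept = np.asarray(draws, dtype=float)[burn_in:]
    alpha = (1.0 - level) / 2.0
    lower, upper = (float(q) for q in np.quantile(kept, [alpha, 1.0 - alpha]))
    excludes_zero = not (lower <= 0.0 <= upper)
    return float(kept.mean()), (lower, upper), excludes_zero
```

With a chain of 60,000 iterations, the default `burn_in` leaves the 40,000 draws used for posterior inference.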
Figure 8 presents the heatmaps of the posterior means of λh λ′h , h = 1, 2 under GFZIP as indicators
of the association within the at-risk and Poisson parts. The activities are ordered in each panel based
on the hierarchical clustering for better visibility and interpretability. The darker the shades of the
block for λhj λhj ′ , the greater the association between the activities j and j ′ in part h. In the top
left corner of the left panel, there is a patch of noticeable dark shade. This part corresponds to the
association among sell hard drugs and use hard drugs, the two activities with the largest factor
loadings in the at-risk part. The figures show that the involvement in these activities is also associated
with the involvement in almost all the other activities except for fight, as indicated by the left and top
edges of the heatmap. The activities such as sell marijuana, alcohol and steal ge 50 are relatively
highly associated with sell hard drugs and use hard drugs. These four activities also exhibit a mild
degree of association among themselves. In the top left corner of the right panel, there is also a dark
patch indicating the association among use hard drugs, use marijuana and sell marijuana. Again,
these activities exhibit association with all the other activities, as indicated by the darker bands along
the left and top edges.
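The structure of these association matrices can be sketched with a plug-in approximation that takes the outer product of the posterior mean loadings in Table 5. This is an assumption made for illustration: the heatmaps show the posterior mean of the product, which generally differs from the product of posterior means, and the hierarchical-clustering order of the axes is not reproduced here. The helper names are hypothetical.

```python
import numpy as np

ACTIVITIES = [
    "alcohol", "run_away", "damage", "fight", "shoplift", "steal_lt_50",
    "steal_ge_50", "extort", "threaten", "attack", "use_marijuana",
    "use_hard_drugs", "sell_marijuana", "sell_hard_drugs", "con", "vehicle",
    "break_in", "sell_stolen", "gambling",
]
# Posterior means of the factor loadings from Table 5.
LAMBDA1_MEAN = np.array([
    -0.511, -0.372, -0.379, -0.056, -0.275, -0.243, -0.564, -0.598, -0.354,
    -0.448, -0.271, -0.900, -0.594, -1.075, -0.273, -0.465, -0.296, -0.329,
    -0.179,
])
LAMBDA2_MEAN = np.array([
    -1.670, -0.662, -1.166, -0.913, -1.423, -1.340, -1.328, -0.983, -1.198,
    -1.498, -2.991, -2.572, -3.374, -2.004, -1.141, -0.986, -1.949, -1.784,
    -1.777,
])


def plug_in_association(loadings):
    """Outer product of a loading vector with itself (plug-in approximation)."""
    return np.outer(loadings, loadings)


def strongest_pairs(matrix, names, top=3):
    """Distinct activity pairs with the largest absolute entries."""
    rows, cols = np.triu_indices(len(names), k=1)
    values = matrix[rows, cols]
    order = np.argsort(-np.abs(values))[:top]
    return [(names[rows[k]], names[cols[k]], float(values[k])) for k in order]


at_risk_pairs = strongest_pairs(plug_in_association(LAMBDA1_MEAN), ACTIVITIES)
poisson_pairs = strongest_pairs(plug_in_association(LAMBDA2_MEAN), ACTIVITIES)
```

Because all the loadings share the same sign, every entry of the plug-in matrix is positive, and activities with large loadings are associated with all the others.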
Figure 9 presents the heatmap of the posterior means of λ1 λ′2 representing the association between
the at-risk and Poisson parts. The activities are ordered based on the hierarchical clustering. Similarly
to Figure 8, a dark patch for the drug-related activities between the at-risk and Poisson parts is
recognisable. The figures show that being at-risk for use hard drugs and sell hard drugs is highly
associated with the Poisson counts for themselves and use marijuana. Being at-risk
for using and selling hard drugs is also associated with the Poisson counts for all the other activities,
and the Poisson counts for these three activities are associated with being at-risk for most
activities.
Figure 10 presents the posterior means of β h for h = 1, 2 under GFZIP. The circles in the figure
indicate the parameters for which the 95% credible intervals do not include zero. Overall, the
signs of the coefficients are the same for most activities. For example, the left panel shows age has
positive effects on the at-risk probabilities for sell marijuana, use hard drugs and use marijuana,
but has a negative effect on fight. male has positive effects on most activities other than run away,
use marijuana, and use hard drugs. On the other hand, self has negative effects on the at-risk
probabilities for most activities other than alcohol, run away and sell stolen. This is expected
because the higher the self-esteem score, the less likely youths are to engage in illegal activities.
In the right panel, urban and male positively affect the Poisson counts of most activities. An
urban environment would offer more opportunities for various types of illegal activities. Combined
with the results on the at-risk coefficient for male, male youths are more likely to be involved in illegal
activities, and their involvements are more frequent. self has a positive impact on the frequencies
of the activities such as attack, extort, steal ge 50 and con. Most of these activities typically
involve aggressive behaviour towards other individuals or audacity. Therefore, higher self-esteem
would increase the frequency of those activities. On the other hand, self has negative impacts on the
frequencies of sell stolen, break in, sell hard drugs, use hard drugs and run away. It is
intuitive that the frequency of these activities, especially drug-related activities and running away, is
associated with lower self-esteem.
Finally, we estimate the proportions of at-risk youths among those who answered ‘never’ for each
activity based on (5). These are the estimated fractions of youths who are involved in the activities
but whose reported frequency of involvement happened to be zero during the year before the interview.
Figure 11 presents R̂j for the 19 activities under GFZIP and GZIP. Under the proposed GFZIP, R̂j for
alcohol, fight, threaten and use marijuana are above 0.1. Among those activities, use marijuana resulted
in the largest R̂j of 0.492. The result implies that nearly half of the youths who responded ‘never’
are actually regular users who did not use it during the year before the interview. R̂j = 0.232
for alcohol is the second largest, followed by 0.127 for fight and 0.114 for threaten. These activities
might be more common among youths, as indicated by the non-zero proportions in Figure 5, compared to
the other activities of a more serious criminal nature, such as selling drugs and stealing vehicles. In
contrast, the R̂j ’s for the rest of the activities are zero or almost zero. The figure also shows that under
GZIP R̂j = 0 for all activities. The results for activities such as use marijuana and alcohol are
suspected to be false negatives, as observed in the simulation study. The result under the proposed
model is more reasonable and indicates the efficacy of leveraging shared information among activities
through the latent factors.
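Equation (5) is defined earlier in the paper and is not reproduced in this section. As a rough sketch only, assume that it averages, over the youths who answered ‘never’, the standard zero-inflated Poisson probability of being at-risk given a zero response, π e^{−μ} / (1 − π + π e^{−μ}), where π is the at-risk probability and μ is the Poisson mean. The function names and the plug-in inputs below are hypothetical.

```python
import numpy as np


def at_risk_given_zero(pi, mu):
    """P(at-risk | response is zero) under a zero-inflated Poisson model.

    pi: at-risk probability; mu: Poisson mean for at-risk individuals.
    """
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    at_risk_zero = pi * np.exp(-mu)  # at-risk and the Poisson count is zero
    return at_risk_zero / (1.0 - pi + at_risk_zero)


def proportion_at_risk_among_zeros(pi, mu, is_zero):
    """Average of the conditional at-risk probabilities over zero responders."""
    probs = at_risk_given_zero(pi, mu)
    return float(probs[np.asarray(is_zero, dtype=bool)].mean())
```

Under this assumption, a proportion near zero arises when the fitted Poisson means are large enough that an at-risk youth would rarely report zero, which is one way to read the contrast between the two models in Figure 11.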

Table 4: Posterior predictive loss for NLSY79 data

Model   GFZIP (K = 1)   GFZIP (K = 2)   GFZIP (K = 3)   GZIP    GZINB
PPL     928.3           1900.3          3188.0          943.1   7360.4

[Figure 7 image: trace plots against iteration (0 to 40,000) for λ1: use_marijuana; λ1: sell_hard_drugs;
λ2: extort; λ2: con; β1: sell_stolen, urban; β1: con, black; β2: run_away, black; β2: break_in, male.]

Figure 7: Trace plots of the Gibbs sampler for the selected parameters under GFZIP

Table 5: Posterior means and 95% credible intervals (CI) for the factor loadings under GFZIP

Activity   At-risk (λ1 ): Mean, 95% CI   Poisson (λ2 ): Mean, 95% CI

alcohol -0.511 (-0.702, -0.309) -1.670 (-1.744, -1.595)
run away -0.372 (-0.586, -0.149) -0.662 (-0.857, -0.493)
damage -0.379 (-0.554, -0.199) -1.166 (-1.262, -1.076)
fight -0.056 (-0.205, 0.100) -0.913 (-0.975, -0.853)
shoplift -0.275 (-0.448, -0.098) -1.423 (-1.510, -1.339)
steal lt $50 -0.243 (-0.429, -0.054) -1.340 (-1.443, -1.242)
steal ge $50 -0.564 (-0.821, -0.307) -1.328 (-1.495, -1.177)
extort -0.598 (-0.836, -0.362) -0.983 (-1.131, -0.848)
threaten -0.354 (-0.498, -0.207) -1.198 (-1.262, -1.132)
attack -0.448 (-0.655, -0.238) -1.498 (-1.638, -1.364)
use marijuana -0.271 (-0.496, -0.038) -2.991 (-3.146, -2.838)
use hard drugs -0.900 (-1.174, -0.624) -2.572 (-2.792, -2.375)
sell marijuana -0.594 (-0.927, -0.263) -3.374 (-3.648, -3.028)
sell hard drugs -1.075 (-1.483, -0.660) -2.004 (-2.391, -1.699)
con -0.273 (-0.411, -0.136) -1.141 (-1.217, -1.060)
vehicle -0.465 (-0.644, -0.287) -0.986 (-1.088, -0.885)
break in -0.296 (-0.587, 0.006) -1.949 (-2.129, -1.784)
sell stolen -0.329 (-0.564, -0.093) -1.784 (-1.928, -1.641)
gambling -0.179 (-0.602, 0.269) -1.777 (-2.175, -1.470)

[Figure 8 image: two heatmaps, at-risk (h = 1) and Poisson (h = 2), with the 19 activities on both axes.
Axis order in the at-risk panel: use_hard_drugs, sell_hard_drugs, fight, sell_stolen, threaten, run_away,
damage, gambling, steal_lt_50, break_in, use_marijuana, con, shoplift, alcohol, attack, vehicle,
steal_ge_50, sell_marijuana, extort. Axis order in the Poisson panel: use_hard_drugs, use_marijuana,
sell_marijuana, run_away, fight, extort, vehicle, threaten, con, damage, steal_ge_50, steal_lt_50,
shoplift, attack, break_in, sell_hard_drugs, alcohol, gambling, sell_stolen.]
Figure 8: Posterior means of λh λ′h under GFZIP. The activities are ordered based on the hierarchical
clustering.

[Figure 9 image: heatmap with the at-risk part (h = 1) on the vertical axis and the Poisson part (h = 2)
on the horizontal axis; the activities on each axis appear in the same order as in the corresponding
panel of Figure 8.]

Figure 9: Posterior means of λ1 λ′2 under GFZIP. The activities are ordered based on the hierarchical
clustering.

[Figure 10 image: two heatmaps, GFZIP at-risk (h = 1) and GFZIP Poisson (h = 2), with the activities on
the vertical axis and the covariates on the horizontal axis.]
Figure 10: Posterior means of β h under GFZIP. The circles indicate the parameters for which the 95%
credible intervals do not include zero.

[Figure 11 image: R̂j (vertical axis, 0.0 to 0.5) for each of the 19 activities under GFZIP and GZIP.]

Figure 11: Proportions of at-risk youths among those who answered ‘never’

5 Conclusion

We have proposed the factor zero-inflated Poisson model for multiple grouped count data, which
includes latent factors to account for association among the multiple count responses. Based on
the data augmentation, Pólya-Gamma augmentation and parameter expansion, we have developed
an efficient MCMC algorithm. The identification of the factor components is achieved through the
post-processing algorithm. We have demonstrated the efficacy of the proposed model through the
numerical examples. Notably, in the analysis of illegal activities of youths, we have found a single
common factor, which can be interpreted as the drug-related factor, producing a strong association
among the drug-related activities in both the at-risk and Poisson parts. The proposed model also
revealed the individuals at risk among those who reported zero in each activity, while treating each
activity separately completely failed to do so.

Acknowledgement

This work was supported by JSPS KAKENHI (#21K01421, #21H00699, #20H00080, #22K13376,
#24K00244).

References

Berry, L. R. and M. West (2020). Bayesian forecasting of many count-valued time series. Journal of
Business & Economic Statistics 38 (4), 872–887.

Fu, Q., X. Guo, and K. C. Land (2018). A Poisson-multinomial mixture approach to grouped and
right-censored counts. Communications in Statistics - Theory and Methods 47 (2), 427–447.

Gelfand, A. E. and S. K. Ghosh (1998). Model choice: A minimum posterior predictive loss approach.
Biometrika 85 (1), 1–11.

Ghosh, J. and D. B. Dunson (2009). Default prior distributions and efficient posterior computation
in Bayesian factor analysis. Journal of Computational and Graphical Statistics 18 (2), 306–320.

Hamura, Y., K. Irie, and S. Sugasawa (2021). Robust hierarchical modeling of counts under zero-
inflation and outliers. arXiv:2106.10503v1 .

Karlis, D. (2003). An EM algorithm for multivariate Poisson distribution and related models. Journal
of Applied Statistics 30 (1), 63–77.

Kobayashi, G., S. Sugasawa, and Y. Kawakubo (2023). Spatio-temporal smoothing, interpolation and
prediction of income distributions based on grouped data.

Kobayashi, G., Y. Yamauchi, K. Kakamu, Y. Kawakubo, and S. Sugasawa (2022). Bayesian approach
to Lorenz curve using time series grouped data. Journal of Business & Economic Statistics 40 (2),
897–912.

Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing.
Technometrics 34 (1), 1–14.

Larsson, R. (2020). Discrete factor analysis using a dependent Poisson model. Computational Statis-
tics 35 (3), 1133–1152.

Liu, Y. and G.-L. Tian (2015). Type I multivariate zero-inflated Poisson distribution with applications.
Computational Statistics & Data Analysis 83, 200–222.

McGinley, J. S., P. J. Curran, and D. Hedeker (2015). A novel modeling framework for ordinal data
defined by collapsed counts. Statistics in Medicine 34 (15), 2312–2324.

Neelon, B. (2019). Bayesian zero-inflated negative binomial regression based on Pólya-Gamma
mixtures. Bayesian Analysis 14 (3), 829–855.

Neelon, B. and D. Chung (2017). The LZIP: A Bayesian latent factor model for correlated zero-inflated
counts. Biometrics 73 (1), 185–196.

Neelon, B., A. J. O’Malley, and V. A. Smith (2016). Modeling zero-modified count and semicontinuous
data in health services research part 1: background and overview. Statistics in Medicine 35 (27),
5070–5093.

Papastamoulis, P. and I. Ntzoufras (2022). On the identifiability of Bayesian factor analytic models.
Statistics and Computing 32 (2), 23.

Polson, N. G., J. G. Scott, and J. Windle (2013). Bayesian inference for logistic models using
Pólya–Gamma latent variables. Journal of the American Statistical Association 108 (504), 1339–
1349.

Sugasawa, S., G. Kobayashi, and Y. Kawakubo (2020). Estimation and inference for area-wise spatial
income distributions from grouped data. Computational Statistics & Data Analysis 145, 106904.

Wedel, M., U. Böckenholt, and W. A. Kamakura (2003). Factor models for multivariate count data.
Journal of Multivariate Analysis 87 (2), 356–369.

Xu, T., R. T. Demmer, and G. Li (2021). Zero-inflated Poisson factor model with application to
microbiome read counts. Biometrics 77 (1), 91–101.
