3b: Generalized linear mixed models

Author

Douglas Bates and Claudia Solis-Lemus

Published

2022-07-30

Load the packages to be used

Code
using AlgebraOfGraphics
using CairoMakie
using DataFrameMacros
using DataFrames
using MixedModels
using MixedModelsMakie
using ProgressMeter

CairoMakie.activate!(; type = "svg");
ProgressMeter.ijulia_behavior(:clear);

Matrix notation for the sleepstudy model

sleepstudy = DataFrame(MixedModels.dataset(:sleepstudy))

180 rows × 3 columns

subjdaysreaction
StringInt8Float64
1S3080249.56
2S3081258.705
3S3082250.801
4S3083321.44
5S3084356.852
6S3085414.69
7S3086382.204
8S3087290.149
9S3088430.585
10S3089466.353
11S3090222.734
12S3091205.266
13S3092202.978
14S3093204.707
15S3094207.716
16S3095215.962
17S3096213.63
18S3097217.727
19S3098224.296
20S3099237.314
21S3100199.054
22S3101194.332
23S3102234.32
24S3103232.842
25S3104229.307
26S3105220.458
27S3106235.421
28S3107255.751
29S3108261.012
30S3109247.515
contrasts = Dict(:subj => Grouping())
m1 = let
  form = @formula(reaction ~ 1 + days + (1 + days | subj))
  fit(MixedModel, form, sleepstudy; contrasts)
end
println(m1)
Minimizing 58    Time: 0:00:00 ( 5.19 ms/it)
Linear mixed model fit by maximum likelihood
 reaction ~ 1 + days + (1 + days | subj)
   logLik   -2 logLik     AIC       AICc        BIC    
  -875.9697  1751.9393  1763.9393  1764.4249  1783.0971

Variance components:

            Column    Variance Std.Dev.   Corr.
subj     (Intercept)  565.51067 23.78047
         days          32.68212  5.71683 +0.08
Residual              654.94145 25.59182
 Number of obs: 180; levels of grouping factors: 18

  Fixed-effects parameters:
──────────────────────────────────────────────────
                Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────
(Intercept)  
251.405      6.63226  37.91    <1e-99
days          10.4673     1.50224   6.97    <1e-11
──────────────────────────────────────────────────

The response vector, y, has 180 elements. The fixed-effects coefficient vector, β, has 2 elements and the fixed-effects model matrix, X, is of size 180 × 2.

m1.y
180-element view(::Matrix{Float64}, :, 3) with eltype Float64:
 249.56
 258.7047
 250.8006
 321.4398
 356.8519
 414.6901
 382.2038
 290.1486
 430.5853
 466.3535
 222.7339
 205.2658
 202.9778
   ⋮
 350.7807
 369.4692
 269.4117
 273.474
 297.5968
 310.6316
 287.1726
 329.6076
 334.4818
 343.2199
 369.1417
 364.1236
m1.β
2-element Vector{Float64}:
 251.40510484848375
  10.467285959595792
m1.X
180×2 Matrix{Float64}:
 1.0  0.0
 1.0  1.0
 1.0  2.0
 1.0  3.0
 1.0  4.0
 1.0  5.0
 1.0  6.0
 1.0  7.0
 1.0  8.0
 1.0  9.0
 1.0  0.0
 1.0  1.0
 1.0  2.0
 ⋮    
 1.0  8.0
 1.0  9.0
 1.0  0.0
 1.0  1.0
 1.0  2.0
 1.0  3.0
 1.0  4.0
 1.0  5.0
 1.0  6.0
 1.0  7.0
 1.0  8.0
 1.0  9.0

The second column of X is just the days vector and the first column is all 1’s.

There are 36 random effects, 2 for each of the 18 levels of subj. The “estimates” (technically, the conditional means or conditional modes) are returned as a vector of matrices, one matrix for each grouping factor. In this case there is only one grouping factor for the random effects so there is one one matrix which contains 18 intercept random effects and 18 slope random effects.

m1.b
1-element Vector{Matrix{Float64}}:
 [2.8158191325913045 -40.04844170604461 … 0.7232620377478233 12.118907799956531; 9.07551171690184 -8.644079452057246 … -0.9710526342424729 1.31069806890679]
only(m1.b)   # only one grouping factor
2×18 Matrix{Float64}:
 2.81582  -40.0484   -38.4331  22.8321   …  -24.7101   0.723262  12.1189
 9.07551   -8.64408   -5.5134  -4.65872       4.6597  -0.971053   1.3107

There is a model matrix, \(\mathbf{Z}\), for the random effects. In general it has one chunk of columns for the first grouping factor, a chunk of columns for the second grouping factor, etc.

In this case there is only one grouping factor.

Int.(only(m1.reterms))
180×36 Matrix{Int64}:
 1  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 1  1  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 1  2  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 1  3  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 1  4  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 1  5  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 1  6  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 1  7  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 1  8  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 1  9  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  1  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  1  1  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  1  2  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 ⋮              ⋮              ⋮        ⋱     ⋮              ⋮              ⋮
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  1  8  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  1  9  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  1  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  1  1
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  1  2
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  1  3
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  1  4
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  1  5
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  1  6
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  1  7
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  1  8
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  1  9

The defining property of a linear model or linear mixed model is that the fitted values are linear combinations of the fixed-effects parameters and the random effects. We can write the fitted values as

m1.X * m1.β + only(m1.reterms) * vec(only(m1.b))
180-element Vector{Float64}:
 254.22092398107506
 273.7637216575727
 293.3065193340703
 312.849317010568
 332.3921146870656
 351.93491236356317
 371.47771004006086
 391.0205077165585
 410.5633053930561
 430.1061030695537
 211.35666314243912
 213.1798696499777
 215.0030761575162
   ⋮
 328.09823348905815
 337.5944668144114
 263.52401264844025
 275.3019966769429
 287.07998070544545
 298.85796473394805
 310.63594876245065
 322.4139327909532
 334.1919168194558
 345.9699008479584
 357.74788487646094
 369.5258689049635
fitted(m1)   # just to check that these are indeed the same as calculated above
180-element Vector{Float64}:
 254.22092398107506
 273.76372165757266
 293.3065193340703
 312.8493170105679
 332.3921146870656
 351.93491236356317
 371.47771004006086
 391.0205077165585
 410.5633053930561
 430.1061030695537
 211.35666314243912
 213.1798696499777
 215.0030761575162
   ⋮
 328.0982334890581
 337.5944668144114
 263.52401264844025
 275.3019966769429
 287.07998070544545
 298.857964733948
 310.63594876245065
 322.4139327909532
 334.19191681945574
 345.9699008479584
 357.74788487646094
 369.5258689049635

In symbols we would write the linear predictor expression as

\[ \boldsymbol{\eta} = \mathbf{X}\boldsymbol{\beta} +\mathbf{Z b} \]

where \(\boldsymbol{\eta}\) has 180 elements, \(\boldsymbol{\beta}\) has 2 elements, \(\bf b\) has 36 elements, \(\bf X\) is of size 180 × 2 and \(\bf Z\) is of size 180 × 36.

For a linear model or linear mixed model the linear predictor is the mean response, \(\boldsymbol\mu\). That is, we can write the probability model in terms of a 180-dimensional random variable, \(\mathcal Y\), for the response and a 36-dimensional random variable, \(\mathcal B\), for the random effects as

\[ \begin{aligned} (\mathcal{Y} | \mathcal{B}=\bf{b}) &\sim\mathcal{N}(\bf{ X\boldsymbol\beta + Z b},\sigma^2\bf{I})\\\\ \mathcal{B}&\sim\mathcal{N}(\bf{0},\boldsymbol{\Sigma}_{\boldsymbol\theta}) . \end{aligned} \]

where \(\boldsymbol{\Sigma}_\boldsymbol{\theta}\) is a 36 × 36 symmetric covariance matrix that has a special form - it consists of 18 diagonal blocks, each of size 2 × 2 and all the same.

This symmetric matrix can be constructed from the parameters \(\boldsymbol\theta\), which generate the lower triangular matrix \(\boldsymbol\lambda\), and the estimate \(\widehat{\sigma^2}\).

m1.θ
3-element Vector{Float64}:
 0.9292213186823386
 0.018168380890783722
 0.22264487369155986
λ = only(m1.λ)  # with multiple grouping factors there will be multiple λ's
2×2 LinearAlgebra.LowerTriangular{Float64, Matrix{Float64}}:
 0.929221    ⋅ 
 0.0181684  0.222645
Σ = varest(m1) ** λ')
2×2 Matrix{Float64}:
 565.511  11.057
  11.057  32.6821

Compare the diagonal elements to the Variance column of

VarCorr(m1)
Column Variance Std.Dev Corr.
subj (Intercept) 565.51067 23.78047
days 32.68212 5.71683 +0.08
Residual 654.94145 25.59182

Linear predictors in LMMs and GLMMs

Writing the model for \(\mathcal Y\) as

\[ (\mathcal{Y} | \mathcal{B}=\bf{b})\sim\mathcal{N}(\bf{ X\boldsymbol\beta + Z b},\sigma^2\bf{I}) \]

may seem like over-mathematization (or “overkill”, if you prefer) relative to expressions like

\[ y_i = \beta_1 x_{i,1} + \beta_2 x_{i,2}+ b_1 z_{i,1} +\dots+b_{36} z_{i,36}+\epsilon_i \]

but this more abstract form is necessary for generalizations.

The way that I read the first form is

The conditional distribution of the response vector, \(\mathcal Y\), given that the random effects vector, \(\mathcal B =\bf b\), is a multivariate normal (or Gaussian) distribution whose mean, \(\boldsymbol\mu\), is the linear predictor, \(\boldsymbol\eta=\bf{X\boldsymbol\beta+Zb}\), and whose covariance matrix is \(\sigma^2\bf I\). That is, conditional on \(\bf b\), the elements of \(\mathcal Y\) are independent normal random variables with constant variance, \(\sigma^2\), and means of the form \(\boldsymbol\mu = \boldsymbol\eta = \bf{X\boldsymbol\beta+Zb}\).

So the only things that differ in the distributions of the \(y_i\)’s are the means and they are determined by this linear predictor, \(\boldsymbol\eta = \bf{X\boldsymbol\beta+Zb}\).

Generalized Linear Mixed Models

Consider first a GLMM for a vector, \(\bf y\), of binary (i.e. yes/no) responses. The probability model for the conditional distribution \(\mathcal Y|\mathcal B=\bf b\) consists of independent Bernoulli distributions where the mean, \(\mu_i\), for the i’th response is again determined by the i’th element of a linear predictor, \(\boldsymbol\eta = \mathbf{X}\boldsymbol\beta+\mathbf{Z b}\).

However, in this case we will run into trouble if we try to make \(\boldsymbol\mu=\boldsymbol\eta\) because \(\mu_i\) is the probability of “success” for the i’th response and must be between 0 and 1. We can’t guarantee that the i’th component of \(\boldsymbol\eta\) will be between 0 and 1. To get around this problem we apply a transformation to take \(\eta_i\) to \(\mu_i\). For historical reasons this transformation is called the inverse link, written \(g^{-1}\), and the opposite transformation - from the probability scale to an unbounded scale - is called the link, g.

Each probability distribution in the exponential family (which is most of the important ones), has a canonical link which comes from the form of the distribution itself. The details aren’t as important as recognizing that the distribution itself determines a preferred link function.

For the Bernoulli distribution, the canonical link is the logit or log-odds function,

\[ \eta = g(\mu) = \log\left(\frac{\mu}{1-\mu}\right), \]

(it’s called log-odds because it is the logarithm of the odds ratio, \(p/(1-p)\)) and the canonical inverse link is the logistic

\[ \mu=g^{-1}(\eta)=\frac{1}{1+\exp(-\eta)}. \]

This is why fitting a binary response is sometimes called logistic regression.

For later use we define a Julia logistic function. See this presentation for more information than you could possible want to know on how Julia converts code like this to run on the processor.

increment(x) = x + one(x)
logistic(η) = inv(increment(exp(-η)))
logistic (generic function with 1 method)

To reiterate, the probability model for a Generalized Linear Mixed Model (GLMM) is

\[ \begin{aligned} (\mathcal{Y} | \mathcal{B}=\bf{b}) &\sim\mathcal{D}(\bf{g^{-1}(X\boldsymbol\beta + Z b)},\phi)\\\\ \mathcal{B}&\sim\mathcal{N}(\bf{0},\Sigma_{\boldsymbol\theta}) . \end{aligned} \]

where \(\mathcal{D}\) is the distribution family (such as Bernoulli or Poisson), \(g^{-1}\) is the inverse link and \(\phi\) is a scale parameter for \(\mathcal{D}\) if it has one. The important cases of the Bernoulli and Poisson distributions don’t have a scale parameter - once you know the mean you know everything you need to know about the distribution. (For those following the presentation, this poem by John Keats is the one with the couplet “Beauty is truth, truth beauty - that is all ye know on earth and all ye need to know.”)

An example of a Bernoulli GLMM

  • The contra dataset in the MixedModels package is from a survey on the use of artificial contraception by women in Bangladesh.
contra = DataFrame(MixedModels.dataset(:contra))

1,934 rows × 5 columns

disturbanlivchageuse
StringStringStringFloat64String
1D01Y3+18.44N
2D01Y0-5.56N
3D01Y21.44N
4D01Y3+8.44N
5D01Y0-13.56N
6D01Y0-11.56N
7D01Y3+18.44N
8D01Y3+-3.56N
9D01Y1-5.56N
10D01Y3+1.44N
11D01Y0-11.56Y
12D01Y0-2.56N
13D01Y1-4.56N
14D01Y3+5.44N
15D01Y3+-0.56N
16D01Y3+4.44Y
17D01Y0-5.56N
18D01Y3+-0.56Y
19D01Y1-6.56Y
20D01Y2-3.56N
21D01Y0-4.56N
22D01Y0-9.56N
23D01Y3+2.44N
24D01Y22.44Y
25D01Y1-4.56Y
26D01Y3+14.44N
27D01Y0-6.56Y
28D01Y1-3.56Y
29D01Y1-5.56Y
30D01Y1-1.56Y
combine(groupby(contra, :dist), nrow)

60 rows × 2 columns

distnrow
StringInt64
1D01117
2D0220
3D032
4D0430
5D0539
6D0665
7D0718
8D0837
9D0923
10D1013
11D1121
12D1229
13D1324
14D14118
15D1522
16D1620
17D1724
18D1847
19D1926
20D2015
21D2118
22D2220
23D2315
24D2414
25D2567
26D2613
27D2744
28D2849
29D2932
30D3061
  • The information recorded included woman’s age, the number of live children she has, whether she lives in an urban or rural setting, and the political district in which she lives.

  • The age was centered. Unfortunately, the version of the data to which I had access did not record what the centering value was.

  • A data plot, Figure 1, shows that the probability of contraception use is not linear in age - it is low for younger women, higher for women in the middle of the range (assumed to be women in late 20’s to early 30’s) and low again for older women (late 30’s to early 40’s in this survey).

  • If we fit a model with only the age term in the fixed effects, that term will not be significant. This doesn’t mean that there is no “age effect”, it only means that there is no significant linear effect for age.

Code
draw(
  data(
    @transform(
      contra,
      :numuse = Int(:use == "Y"),
      :urb = ifelse(:urban == "Y", "Urban", "Rural")
    )
  ) *
  mapping(
    :age => "Centered age (yr)",
    :numuse => "Frequency of contraception use";
    col = :urb,
    color = :livch,
  ) *
  smooth();
  figure = (; resolution = (800, 450)),
)

Figure 1: Smoothed relative frequency of contraception use versus centered age for women in the 1989 Bangladesh Fertility Survey

contrasts = Dict(
  :dist => Grouping(),
  :urban => HelmertCoding(),
  :children => HelmertCoding(),
)
nAGQ = 9
dist = Bernoulli()
gm1 = let
  form = @formula(use ~ 1 + age + abs2(age) + urban + livch + (1 | dist))
  fit(MixedModel, form, contra, dist; nAGQ, contrasts)
end
Minimizing 217   Time: 0:00:00 ( 1.61 ms/it)
Est. SE z p σ_dist
(Intercept) -0.6871 0.1686 -4.08 <1e-04 0.4786
age 0.0035 0.0092 0.38 0.7022
abs2(age) -0.0046 0.0007 -6.29 <1e-09
urban: Y 0.3484 0.0600 5.81 <1e-08
livch: 1 0.8152 0.1622 5.02 <1e-06
livch: 2 0.9165 0.1851 4.95 <1e-06
livch: 3+ 0.9154 0.1858 4.93 <1e-06
  • Notice that the linear term for age is not significant but the quadratic term for age is highly significant.

  • We usually retain the lower order term, even if it is not significant, if the higher order term is significant.

  • Notice also that the parameter estimates for the treatment contrasts for livch are similar. Thus the distinction of 1, 2, or 3+ childen is not as important as the contrast between having any children and not having any. Those women who already have children are more likely to use artificial contraception.

  • Furthermore, the women without children have a different probability vs age profile than the women with children. To allow for this we define a binary children factor and incorporate an age&children interaction.

VarCorr(gm1)
Column Variance Std.Dev
dist (Intercept) 0.229090 0.478634
  • Notice that there is no “residual” variance being estimated. This is because the Bernoulli distribution doesn’t have a scale parameter.

Convert livch to a binary factor

@transform!(contra, :children = :livch  "0")

1,934 rows × 6 columns

disturbanlivchageusechildren
StringStringStringFloat64StringBool
1D01Y3+18.44N1
2D01Y0-5.56N0
3D01Y21.44N1
4D01Y3+8.44N1
5D01Y0-13.56N0
6D01Y0-11.56N0
7D01Y3+18.44N1
8D01Y3+-3.56N1
9D01Y1-5.56N1
10D01Y3+1.44N1
11D01Y0-11.56Y0
12D01Y0-2.56N0
13D01Y1-4.56N1
14D01Y3+5.44N1
15D01Y3+-0.56N1
16D01Y3+4.44Y1
17D01Y0-5.56N0
18D01Y3+-0.56Y1
19D01Y1-6.56Y1
20D01Y2-3.56N1
21D01Y0-4.56N0
22D01Y0-9.56N0
23D01Y3+2.44N1
24D01Y22.44Y1
25D01Y1-4.56Y1
26D01Y3+14.44N1
27D01Y0-6.56Y0
28D01Y1-3.56Y1
29D01Y1-5.56Y1
30D01Y1-1.56Y1
gm2 = let
  form = @formula(use ~ 1 + age * children + abs2(age) + urban + (1 | dist))
  fit(MixedModel, form, contra, dist; nAGQ, contrasts)
end
Minimizing 133   Time: 0:00:00 ( 0.91 ms/it)
Est. SE z p σ_dist
(Intercept) -0.3614 0.1275 -2.83 0.0046 0.4756
age -0.0131 0.0110 -1.19 0.2352
children: true 0.6054 0.1035 5.85 <1e-08
abs2(age) -0.0058 0.0008 -6.89 <1e-11
urban: Y 0.3567 0.0602 5.93 <1e-08
age & children: true 0.0342 0.0127 2.69 0.0072
  • Various model-fit statistics
Code
let
  mods = [gm2, gm1]
  DataFrame(;
    model = [:gm2, :gm1],
    npar = dof.(mods),
    deviance = deviance.(mods),
    AIC = aic.(mods),
    BIC = bic.(mods),
    AICc = aicc.(mods),
  )
end

2 rows × 6 columns

modelnpardevianceAICBICAICc
SymbolInt64Float64Float64Float64Float64
1gm272364.922379.182418.152379.24
2gm182372.462388.732433.272388.81
  • Because these models are not nested, we cannot do a likelihood ratio test. Nevertheless we see that the deviance is much lower in the model with age & children even though the 3 levels of livch have been collapsed into a single level of children.

  • There is a substantial decrease in the deviance even though there are fewer parameters in model gm2 than in gm1. This decrease is because the flexibility of the model - its ability to model the behavior of the response - is being put to better use in gm2 than in gm1.

Using urban&dist as a grouping factor

  • It turns out that there can be more difference between urban and rural settings within the same political district than there is between districts. To model this difference we build a model with urban&dist as a grouping factor.
gm3 = let
  form = @formula(
    use ~ 1 + age * children + abs2(age) + urban + (1 | urban & dist)
  )
  fit(MixedModel, form, contra, dist; nAGQ, contrasts)
end
Minimizing 158   Time: 0:00:00 ( 0.92 ms/it)
Est. SE z p σ_urban & dist
(Intercept) -0.3415 0.1269 -2.69 0.0071 0.5761
age -0.0129 0.0112 -1.16 0.2471
children: true 0.6064 0.1045 5.80 <1e-08
abs2(age) -0.0056 0.0008 -6.66 <1e-10
urban: Y 0.3936 0.0859 4.58 <1e-05
age & children: true 0.0332 0.0128 2.59 0.0096
Code
let
  mods = [gm3, gm2, gm1]
  DataFrame(;
    model = [:gm3, :gm2, :gm1],
    npar = dof.(mods),
    deviance = deviance.(mods),
    AIC = aic.(mods),
    BIC = bic.(mods),
    AICc = aicc.(mods),
  )
end

3 rows × 6 columns

modelnpardevianceAICBICAICc
SymbolInt64Float64Float64Float64Float64
1gm372353.822368.482407.462368.54
2gm272364.922379.182418.152379.24
3gm182372.462388.732433.272388.81
  • Notice that the parameter count in gm3 is the same as that of gm2 - the thing that has changed is the number of levels of the grouping factor- resulting in a much lower deviance for gm3. A simple count of the number of parameters to be estimated does not always reflect the complexity of the model.
gm2
Est. SE z p σ_dist
(Intercept) -0.3614 0.1275 -2.83 0.0046 0.4756
age -0.0131 0.0110 -1.19 0.2352
children: true 0.6054 0.1035 5.85 <1e-08
abs2(age) -0.0058 0.0008 -6.89 <1e-11
urban: Y 0.3567 0.0602 5.93 <1e-08
age & children: true 0.0342 0.0127 2.69 0.0072
gm3
Est. SE z p σ_urban & dist
(Intercept) -0.3415 0.1269 -2.69 0.0071 0.5761
age -0.0129 0.0112 -1.16 0.2471
children: true 0.6064 0.1045 5.80 <1e-08
abs2(age) -0.0056 0.0008 -6.66 <1e-10
urban: Y 0.3936 0.0859 4.58 <1e-05
age & children: true 0.0332 0.0128 2.59 0.0096
  • The coefficient for age may be regarded as insignificant but we retain it for two reasons: we have a term of age² (written abs2(age)) in the model and we have a significant interaction age & children in the model.

Summarizing the results

  • From the data plot we can see a quadratic trend in the probability by age.
  • The patterns for women with children are similar and we do not need to distinguish between 1, 2, and 3+ children.
  • We do distinguish between those women who do not have children and those with children. This shows up in a signficant age & children interaction term.