Introduction to PHEindicatormethods

Georgina Anderson

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following core functions available within PHEindicatormethods:

Function Type Description
phe_proportion Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_rate Non-aggregate Performs a calculation on each row of data (unless data is grouped)
phe_mean Aggregate Performs a calculation on each grouping set
calculate_dsr Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate Aggregate, standardised Performs a calculation on each grouping set and requires additional reference inputs

Other functions are introduced in separate vignettes.

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  27 156
#> 2 Area2 2015  36 145
#> 3 Area3 2015  29 120
#> 4 Area4 2015  79 123
#> 5 Area1 2016  53 101
#> 6 Area2 2016  46 188
#> 7 Area3 2016  91 156
#> 8 Area4 2016  43 190

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value   lowercl   uppercl confidence       statistic
#> 1 Area1 2015  27 156 0.1730769 0.1217616 0.2401061        95% proportion of 1
#> 2 Area2 2015  36 145 0.2482759 0.1850655 0.3244797        95% proportion of 1
#> 3 Area3 2015  29 120 0.2416667 0.1738584 0.3255015        95% proportion of 1
#> 4 Area4 2015  79 123 0.6422764 0.5544397 0.7214953        95% proportion of 1
#> 5 Area1 2016  53 101 0.5247525 0.4282498 0.6194412        95% proportion of 1
#> 6 Area2 2016  46 188 0.2446809 0.1887455 0.3108413        95% proportion of 1
#> 7 Area3 2016  91 156 0.5833333 0.5048757 0.6577855        95% proportion of 1
#> 8 Area4 2016  43 190 0.2263158 0.1725838 0.2908953        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence = 99.8)
#>    area year obs pop     value    lowercl   uppercl confidence       statistic
#> 1 Area1 2015  27 156 0.1730769 0.09913737 0.2847328      99.8% proportion of 1
#> 2 Area2 2015  36 145 0.2482759 0.15532170 0.3723378      99.8% proportion of 1
#> 3 Area3 2015  29 120 0.2416667 0.14293143 0.3784872      99.8% proportion of 1
#> 4 Area4 2015  79 123 0.6422764 0.50296043 0.7610918      99.8% proportion of 1
#> 5 Area1 2016  53 101 0.5247525 0.37582448 0.6694041      99.8% proportion of 1
#> 6 Area2 2016  46 188 0.2446809 0.16170173 0.3523442      99.8% proportion of 1
#> 7 Area3 2016  91 156 0.5833333 0.46002112 0.6970316      99.8% proportion of 1
#> 8 Area4 2016  43 190 0.2263158 0.14694208 0.3318841      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify multiplier to output proportions as percentages
phe_proportion(df, obs, pop, multiplier = 100)
#>    area year obs pop    value  lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  27 156 17.30769 12.17616 24.01061        95% percentage Wilson
#> 2 Area2 2015  36 145 24.82759 18.50655 32.44797        95% percentage Wilson
#> 3 Area3 2015  29 120 24.16667 17.38584 32.55015        95% percentage Wilson
#> 4 Area4 2015  79 123 64.22764 55.44397 72.14953        95% percentage Wilson
#> 5 Area1 2016  53 101 52.47525 42.82498 61.94412        95% percentage Wilson
#> 6 Area2 2016  46 188 24.46809 18.87455 31.08413        95% percentage Wilson
#> 7 Area3 2016  91 156 58.33333 50.48757 65.77855        95% percentage Wilson
#> 8 Area4 2016  43 190 22.63158 17.25838 29.08953        95% percentage Wilson

# specify multiplier for proportion, confidence level and remove metadata columns
phe_proportion(df, obs, pop, confidence = 99.8, multiplier = 100, type = "standard")
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  27 156 17.30769  9.913737 28.47328
#> 2 Area2 2015  36 145 24.82759 15.532170 37.23378
#> 3 Area3 2015  29 120 24.16667 14.293143 37.84872
#> 4 Area4 2015  79 123 64.22764 50.296043 76.10918
#> 5 Area1 2016  53 101 52.47525 37.582448 66.94041
#> 6 Area2 2016  46 188 24.46809 16.170173 35.23442
#> 7 Area3 2016  91 156 58.33333 46.002112 69.70316
#> 8 Area4 2016  43 190 22.63158 14.694208 33.18841

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value  lowercl  uppercl confidence       statistic
#> 1 Area1 2015  27 156 17307.69 11403.18 25182.76        95% rate per 100000
#> 2 Area2 2015  36 145 24827.59 17386.52 34372.86        95% rate per 100000
#> 3 Area3 2015  29 120 24166.67 16181.45 34708.62        95% rate per 100000
#> 4 Area4 2015  79 123 64227.64 50847.87 80047.81        95% rate per 100000
#> 5 Area1 2016  53 101 52475.25 39304.83 68640.27        95% rate per 100000
#> 6 Area2 2016  46 188 24468.09 17912.11 32637.75        95% rate per 100000
#> 7 Area3 2016  91 156 58333.33 46965.08 71621.16        95% rate per 100000
#> 8 Area4 2016  43 190 22631.58 16376.93 30485.34        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify multiplier for rate and confidence level
phe_rate(df, obs, pop, confidence = 99.8, multiplier = 100)
#>    area year obs pop    value   lowercl  uppercl confidence    statistic method
#> 1 Area1 2015  27 156 17.30769  8.783552 30.29970      99.8% rate per 100  Byars
#> 2 Area2 2015  36 145 24.82759 13.952944 40.48648      99.8% rate per 100  Byars
#> 3 Area3 2015  29 120 24.16667 12.601525 41.53299      99.8% rate per 100  Byars
#> 4 Area4 2015  79 123 64.22764 44.173569 89.86280      99.8% rate per 100  Byars
#> 5 Area1 2016  53 101 52.47525 32.961126 78.81964      99.8% rate per 100  Byars
#> 6 Area2 2016  46 188 24.46809 14.799981 37.81279      99.8% rate per 100  Byars
#> 7 Area3 2016  91 156 58.33333 41.233844 79.82645      99.8% rate per 100  Byars
#> 8 Area4 2016  43 190 22.63158 13.429676 35.47490      99.8% rate per 100  Byars

# specify multiplier for rate, confidence level and remove metadata columns
phe_rate(df, obs, pop, type = "standard", confidence = 99.8, multiplier = 100)
#>    area year obs pop    value   lowercl  uppercl
#> 1 Area1 2015  27 156 17.30769  8.783552 30.29970
#> 2 Area2 2015  36 145 24.82759 13.952944 40.48648
#> 3 Area3 2015  29 120 24.16667 12.601525 41.53299
#> 4 Area4 2015  79 123 64.22764 44.173569 89.86280
#> 5 Area1 2016  53 101 52.47525 32.961126 78.81964
#> 6 Area2 2016  46 188 24.46809 14.799981 37.81279
#> 7 Area3 2016  91 156 58.33333 41.233844 79.82645
#> 8 Area4 2016  43 190 22.63158 13.429676 35.47490

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   171   544 0.314   0.277   0.355 95%        proportion of 1 Wilson
#> 2  2016   233   635 0.367   0.330   0.405 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   171   544 31434.  26899.  36515. 95%        rate per 100000 Byars 
#> 2  2016   233   635 36693.  32132.  41719. 95%        rate per 100000 Byars



Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev value  lowercl  uppercl confidence statistic
#> 1       404           8 23.17634  50.5 31.12409 69.87591        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence = 0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confidence statistic
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>    
#> 1  2015       171           4  24.5  42.8   -82.2    168. 99.8%      mean     
#> 2  2016       233           4  22.2  58.2   -55.3    172. 99.8%      mean     
#> # ℹ 1 more variable: method <chr>

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence = 0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       171           4  24.5  42.8   -82.2    168.
#> 2  2016       233           4  22.2  58.2   -55.3    172.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0 195 15786
#> 2 Area1 2006 Male       5 159 18310
#> 3 Area1 2006 Male      10 129 14997
#> 4 Area1 2006 Male      15 121 18247
#> 5 Area1 2006 Male      20  77 10446
#> 6 Area1 2006 Male      25  46 15595

Execute calculate_dsr

INPUT: The minimum input requirement for the calculate_dsr function is a single data frame with columns representing the numerators and denominators and standard populations for each standardisation category. The standard populations must be appended to the input data frame by the user prior to execution of the function. The 2013 European Standard Population is provided within the package in vector form (esp2013), which you can join to your dataset. Alternative standard populations can be used but must be provided by the user.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output. It is also possible to calculate CIs when we can’t assume events are independent - further details can be found in the DSR vignette.

Here are some example code chunks to demonstrate the calculate_dsr function and the arguments that can optionally be specified


# Append the standard populations to the data frame
# calculate separate dsrs for each area, year and sex
df_std %>%
    mutate(refpop = rep(esp2013, 40)) %>%
    group_by(area, year, sex) %>%
    calculate_dsr(obs,pop, stdpop = refpop)
#> # A tibble: 40 × 11
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1976    285485  712.    679.    746. 95%       
#>  2 Area1  2006 Male          2067    292506  713.    680.    748. 95%       
#>  3 Area1  2007 Female        2027    304681  670.    638.    702. 95%       
#>  4 Area1  2007 Male          2110    292002  761.    726.    796. 95%       
#>  5 Area1  2008 Female        1872    288328  706.    673.    740. 95%       
#>  6 Area1  2008 Male          2172    284452  807.    771.    844. 95%       
#>  7 Area1  2009 Female        1765    272181  651.    619.    684. 95%       
#>  8 Area1  2009 Male          1935    280477  760.    725.    797. 95%       
#>  9 Area1  2010 Female        2419    293242  924.    884.    965. 95%       
#> 10 Area1  2010 Male          2137    266901  878.    839.    918. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>


# Append the standard populations to the data frame
# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    mutate(refpop = rep(esp2013, 40)) %>%
    group_by(area, year, sex) %>%
    calculate_dsr(obs,pop, stdpop = refpop, type = "standard")
#> # A tibble: 40 × 8
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1976    285485  712.    679.    746.
#>  2 Area1  2006 Male          2067    292506  713.    680.    748.
#>  3 Area1  2007 Female        2027    304681  670.    638.    702.
#>  4 Area1  2007 Male          2110    292002  761.    726.    796.
#>  5 Area1  2008 Female        1872    288328  706.    673.    740.
#>  6 Area1  2008 Male          2172    284452  807.    771.    844.
#>  7 Area1  2009 Female        1765    272181  651.    619.    684.
#>  8 Area1  2009 Male          1935    280477  760.    725.    797.
#>  9 Area1  2010 Female        2419    293242  924.    884.    965.
#> 10 Area1  2010 Male          2137    266901  878.    839.    918.
#> # ℹ 30 more rows

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
  filter(ageband <= 70) %>%
  mutate(refpop = rep(esp2013[1:15], 40)) %>%
  group_by(area, year, sex) %>%
  calculate_dsr(obs, pop, stdpop = refpop)
#> # A tibble: 40 × 11
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1613    227232  716.    681.    752. 95%       
#>  2 Area1  2006 Male          1541    222402  722.    685.    760. 95%       
#>  3 Area1  2007 Female        1484    235090  667.    632.    702. 95%       
#>  4 Area1  2007 Male          1688    226660  772.    735.    811. 95%       
#>  5 Area1  2008 Female        1594    231582  714.    679.    751. 95%       
#>  6 Area1  2008 Male          1797    227822  825.    786.    865. 95%       
#>  7 Area1  2009 Female        1361    223702  635.    601.    670. 95%       
#>  8 Area1  2009 Male          1722    216161  806.    767.    846. 95%       
#>  9 Area1  2010 Female        2020    215807  978.    935.   1023. 95%       
#> 10 Area1  2010 Male          1748    201405  891.    849.    934. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

Execute calculate_ISRatio and calculate_ISRate

INPUT: These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0   970 113564
#> 2       5   971 103121
#> 3      10   831 112802
#> 4      15   865 128519
#> 5      20   952 111298
#> 6      25   400 129930

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female     1976    1921. 1.03    0.984    1.08 95%       
#>  2 Area1  2006 Male       2067    1971. 1.05    1.00     1.09 95%       
#>  3 Area1  2007 Female     2027    2100. 0.965   0.924    1.01 95%       
#>  4 Area1  2007 Male       2110    1984. 1.06    1.02     1.11 95%       
#>  5 Area1  2008 Female     1872    1957. 0.957   0.914    1.00 95%       
#>  6 Area1  2008 Male       2172    1958. 1.11    1.06     1.16 95%       
#>  7 Area1  2009 Female     1765    1848. 0.955   0.911    1.00 95%       
#>  8 Area1  2009 Male       1935    1869. 1.04    0.990    1.08 95%       
#>  9 Area1  2010 Female     2419    2013. 1.20    1.15     1.25 95%       
#> 10 Area1  2010 Male       2137    1799. 1.19    1.14     1.24 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype = "field",
                      type = "standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1976    1921. 1.03    0.984    1.08
#>  2 Area1  2006 Male       2067    1971. 1.05    1.00     1.09
#>  3 Area1  2007 Female     2027    2100. 0.965   0.924    1.01
#>  4 Area1  2007 Male       2110    1984. 1.06    1.02     1.11
#>  5 Area1  2008 Female     1872    1957. 0.957   0.914    1.00
#>  6 Area1  2008 Male       2172    1958. 1.11    1.06     1.16
#>  7 Area1  2009 Female     1765    1848. 0.955   0.911    1.00
#>  8 Area1  2009 Male       1935    1869. 1.04    0.990    1.08
#>  9 Area1  2010 Female     2419    2013. 1.20    1.15     1.25
#> 10 Area1  2010 Male       2137    1799. 1.19    1.14     1.24
#> # ℹ 30 more rows

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema…     1976    1921.     671.  690.    660.    721. 95%       
#>  2 Area1  2006 Male      2067    1971.     671.  703.    673.    734. 95%       
#>  3 Area1  2007 Fema…     2027    2100.     671.  647.    619.    676. 95%       
#>  4 Area1  2007 Male      2110    1984.     671.  713.    683.    744. 95%       
#>  5 Area1  2008 Fema…     1872    1957.     671.  641.    613.    671. 95%       
#>  6 Area1  2008 Male      2172    1958.     671.  744.    713.    776. 95%       
#>  7 Area1  2009 Fema…     1765    1848.     671.  640.    611.    671. 95%       
#>  8 Area1  2009 Male      1935    1869.     671.  694.    664.    726. 95%       
#>  9 Area1  2010 Fema…     2419    2013.     671.  806.    774.    839. 95%       
#> 10 Area1  2010 Male      2137    1799.     671.  797.    763.    831. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype = "field",
                     type = "standard")
#> # A tibble: 40 × 9
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1976    1921.     671.  690.    660.    721.
#>  2 Area1  2006 Male       2067    1971.     671.  703.    673.    734.
#>  3 Area1  2007 Female     2027    2100.     671.  647.    619.    676.
#>  4 Area1  2007 Male       2110    1984.     671.  713.    683.    744.
#>  5 Area1  2008 Female     1872    1957.     671.  641.    613.    671.
#>  6 Area1  2008 Male       2172    1958.     671.  744.    713.    776.
#>  7 Area1  2009 Female     1765    1848.     671.  640.    611.    671.
#>  8 Area1  2009 Male       1935    1869.     671.  694.    664.    726.
#>  9 Area1  2010 Female     2419    2013.     671.  806.    774.    839.
#> 10 Area1  2010 Male       2137    1799.     671.  797.    763.    831.
#> # ℹ 30 more rows