;+
; NAME:
;    gev_fit
;
; PURPOSE:
;    This function estimates the parameters of the Generalised Extreme Value 
;    distribution for a set of block maxima.
;
; CATEGORY:
;    Statistics
;
; CALLING SEQUENCE:
;    result = gev_fit( blockmax )
;
; INPUTS:
;    BLOCKMAX:  A required float vector of length N_BLOCK containing the maxima 
;        of the N_BLOCK blocks.
;    BOOTSTRAP_SEED, CHARSIZE, CI_COLOR, COLOR, COVARIATE_DATA, 
;      COVARIATE_PARAM_ID, N_BOOTSTRAP, FONT, N_BOOTSTRAP, P_VALUE, 
;      PLOT_PROBABILITY, PLOT_QUANTILE, THICK, TITLE, XTHICK, XTITLE, YTHICK, 
;      YTITLE
;
; KEYWORD PARAMETERS:
;    BOOTSTRAP_CDF:  If CDF is set and N_BOOTSTRAP is input, then this returns 
;        the cumulative distribution function values for the bootstrap GEV 
;        distributions at the BLOCKMAX locations.
;    BOOTSTRAP_SEED:  An optional scalar integer containing the seed for the 
;        random number generator used for the bootstrap sampling.  This is 
;        useful for reproducibility.
;    CHARSIZE:  The optional CHARSIZE keyword parameter for the plot function.
;    CDF:  If set, then this returns a N_BLOCK float vector containing the 
;        cumulative distribution function values of the GEV distribution at the 
;        BLOCKMAX locations.
;    CI_COLOR:  An optional scalar integer specifying the color index for 
;        plotting confidence intervals.
;    COLOR:  The optional COLOR keyword parameter for the plot function.
;    COVARIATE_DATA:  An optional float array containing values of covariate 
;        functions at the location of each of the N_BLOCK blocks.  Of size 
;        N_BLOCK,N_COVARIATE, where N_COVARIATE is the number of covariates.  
;        If input then COVARIATE_PARAM_ID must also be input.
;    COVARIATE_PARAM_ID:  An optional integer array if length N_COVARIATE 
;        specifying the GEV parameter incorporating each of the N_COVARIATE 
;        covariates in COVARIATE_DATA.  0 specifies the location parameter, 1 
;        the scale parameter, and 2 the shape parameter.  Required if 
;        COVARIATE_DATA is input.
;    FONT:  The optional FONT keyword parameter for the plot function.
;    N_BOOTSTRAP:  An optional scalar integer defining the number of bootstrap 
;        samples to use in estimating confidence intervals on the parameters 
;        using a bootstrap approach.
;    P_VALUE:  An optional scalar float containing the p-value for any 
;        confidence interval estimates.  The default is 0.10.
;    PLOT_PROBABILITY:  If set then the function plots a probability plot.
;    PLOT_QUANTILE:  If set then the function plots a quantile-quantile plot.  
;        This is not possible if there are covariates defined.
;    PARAMS_CI:  Returns a 2,(3+N_COVARIATE) float array containing the 
;        estimated 1-P_VALUE confidence intervals on the GEV model parameters.  
;        The first dimension returns the lower and upper bounds respectively, 
;        while the order of parameters in the second dimension corresponds to 
;        the order in RESULT.  If N_BOOTSTRAP is input, these confidence 
;        intervals are estimated using a bootstrap approach;  no other method 
;        is implemented yet.
;    THICK:  The optional THICK keyword parameter for the plot function.
;    [X,Y]THICK:  The optional XTHICK and YTHICK keyword parameters for the 
;        plot function.
;    TITLE:  The optional TITLE keyword parameter for the plot function.
;    [X,Y]TITLE:  The optional XTITLE and YTITLE keyword parameters for the 
;        plot function.
;
; OUTPUTS:
;    RESULT:  A float vector containing the estimated values of the GEV model 
;        parameters.  The first element contains the value for the location 
;        parameter, the second element contains the value for the scale 
;        parameter, the third element contains the value for the shape 
;        parameter, and any further elements contain the values for the 
;        regression parameters on the N_COVARIATE covariate functions.
;    BOOTSTRAP_CDF, CDF, PARAMS_CI
;
; USES:
;    gev_cdf.pro
;    gev_pdf.pro
;    quantile_threshold.pro
;    shuffle.pro
;    ;gev_fit_eqn.pro (included in this file)
;
; PROCEDURE:
;    This function estimates the parameters of the GEV model using the maximum 
;    log likelihood method.  If covariates are included, the total log 
;    likelihood function is the sum of the likelihood function at each of the 
;    specific N_BLOCK locations.  The log likelihood function is contained 
;    within the gev_fit_eqn.pro subfunction.
;
; EXAMPLE:
;    ; Generate 100 years of daily data following a Gaussian distribution
;    n_day = 365
;    n_year = 100
;    data = randomn( 1, n_day, n_year )
;    ; Add a trend covariate to the location
;    covariate_data = -( n_year - 1. ) / 2. + findgen( n_year )
;    for i = 0, n_year - 1 do data[*,i] = data[*,i] + 0.01 * covariate_data[i]
;    ; Take the annual maxima
;    blockmax = max( data, dimension=1 )
;    ; Calculate the parameters of the GEV model with a linear covariate on the 
;    ; location parameter, and plot the probability plot
;    params = gev_fit( blockmax, covariate_data=covariate_data, $
;        covariate_param_id=0, n_bootstrap=1000, plot_probability=1 )
;
; LICENSE:
;    This code was written as part of the Extreme Weather Event Real-Time 
;    Attribution Machine (EWERAM) project supported by the New Zealand Ministry 
;    of Business, Innovation and Employment.  The code is free for use under 
;    the terms of the Creative Commons License v2.0 
;    (http://creativecommons.org/licenses/by-nc-sa/2.0/).
;
; MODIFICATION HISTORY:
;    Written by:  Daithi A. Stone (dastone@runbox.com), 2019-11-06
;-

;***********************************************************************

FUNCTION GEV_FIT, $
    BLOCKMAX, $
    BOOTSTRAP_SEED=bootstrap_seed, $
    CHARSIZE=charsize, FONT=font, $
    COLOR=color, CI_COLOR=ci_color, $
    COVARIATE_DATA=covariate_data, COVARIATE_PARAM_ID=covariate_param_id, $
    N_BOOTSTRAP=n_bootstrap, $
    P_VALUE=p_value, $
    THICK=thick, XTHICK=xthick, YTHICK=ythick, $
    TITLE=title, XTITLE=xtitle, YTITLE=ytitle, $
    PLOT_PROBABILITY=plot_probability_opt, PLOT_QUANTILE=plot_quantile_opt, $
    BOOTSTRAP_CDF=bootstrap_cdf, $
    CDF=cdf, $
    PARAMS_CI=params_ci

;***********************************************************************
; Constants and options

; Create common block for sharing with gev_fit_eqn.pro
common common_gev_fit_eqn, gev_blockmax, gev_covariate_data, $
    gev_covariate_param_id

; The tolerance value for the amoeba function
amoeba_ftol = 1.0e-5

; The default p-value
if not( keyword_set( p_value ) ) then p_value = 0.1

; The number of input blocks
n_block = n_elements( blockmax )
if n_block le 3 then stop

; Check covariate inputs
n_covariate = n_elements( covariate_param_id )
if n_covariate gt 0 then begin
  ; Confirm everything is there
  if not( keyword_set( covariate_data ) ) then stop
  if n_elements( covariate_data[*,0] ) ne n_block then stop
  if n_elements( covariate_data[0,*] ) ne n_covariate then stop
  ; Calculate covariate ranges (for use in setting up amoeba solver)
  covariate_range = max( covariate_data, dimension=1 ) $
      - min( covariate_data, dimension=1 )
  ; Confirm covariates vary
  if min( covariate_range ) eq 0 then stop
endif

; Copy data required by minimisation function to common block
; (Fitting is unreliable without double precision)
gev_blockmax = double( blockmax )
if n_covariate gt 0 then begin
  gev_covariate_data = double( covariate_data )
  gev_covariate_param_id = covariate_param_id
; Otherwise clear unneeded variables
endif else begin
  if n_elements( gev_covariate_data ) gt 0 then begin
    temp = temporary( gev_covariate_data )
  endif
  if n_elements( gev_covariate_param_id ) gt 0 then begin
    temp = temporary( gev_covariate_param_id )
  endif
endelse

; Ensure the CDF is calculated if plotting is requested
if keyword_set( plot_probability_opt ) or keyword_set( plot_quantile_opt ) $
    then begin
  cdf = 1
endif

;***********************************************************************
; Estimate parameters

; Make a first guess of parameters (assuming near-Gumbel)
params_0 = dblarr( 3 )
params_0[2] = 1.e-8
params_0[1] = sqrt( 6. ) * stddev( gev_blockmax ) / !pi
params_0[0] = mean( gev_blockmax ) - 0.57722 * params_0[1]
; Define scale for first iteration of solver's search
params_scale = [ 1., 0.5 * params_0[1], 0.1 ]
; Add any covariates to the first guess
if n_covariate gt 0 then begin
  ; Assume zero scaling on covariates
  params_0 = [ params_0, fltarr( n_covariate ) ]
  ; Define scale for first iteration of solver's search
  params_scale = [ params_scale, fltarr( n_covariate ) ]
  for i_covariate = 0, n_covariate - 1 do begin
    params_scale[3+i_covariate] = params_0[covariate_param_id] * 0.5 $
        / covariate_range[i_covariate]
  endfor
endif

; Fit parameters
params = amoeba( amoeba_ftol, function_name='gev_fit_eqn', p0=params_0, $
    scale=params_scale )

; Estimate confidence intervals using bootstrap
if keyword_set( n_bootstrap ) then begin
  ; Initialise array containing bootstrap parameter estimates
  boot_params = fltarr( 3 + n_covariate, n_bootstrap )
  ; Iterate through bootstrap samples
  i_boot = 0
  n_boot_fail = 0
  while i_boot ne n_bootstrap do begin
    ; Select random sample of block maxima
    gev_blockmax = shuffle( double( blockmax ), seed=bootstrap_seed, $
        replace=1, index=boot_index )
    if n_covariate gt 0 then begin
      gev_covariate_data = double( covariate_data[boot_index,*] )
    endif
    ; Fit parameters
    boot_params[*,i_boot] = amoeba( 1.0e-5, function_name='gev_fit_eqn', $
        p0=params_0, scale=params_scale )
    ; Go to the next bootstrap sample only if a solution was found, otherwise 
    ; repeat
    if boot_params[1,i_boot] gt -0.1 then begin
      i_boot = i_boot + 1
    endif else begin
      n_boot_fail = n_boot_fail + 1
    endelse
  endwhile
  ; Print if there were any failures
  if n_boot_fail gt 0 then begin
    temp_str = 'gev_fit.pro:  Warning, there were ' $
        + strtrim( string( n_boot_fail ), 2 ) + ' failed bootstrap attempts (' $
        + strtrim( string( 100. * n_boot_fail / n_bootstrap ), 2 ) + '%)'
    print, temp_str
  endif
  ; Determine confidence interval
  params_ci = fltarr( 2, 3 + n_covariate )
  for i_param = 0, 3 + n_covariate - 1 do begin
    temp = quantile_threshold( reform( boot_params[i_param,*] ), $
        [ p_value / 2., 1. - p_value / 2. ] )
    params_ci[1,i_param] = params[i_param] + ( params[i_param] - temp[0] )
    params_ci[0,i_param] = params[i_param] - ( temp[1] - params[i_param] )
  endfor
endif

; Calculate the CDF for the PARAMS parameter values
if keyword_set( cdf ) then begin
  ; Assemble parameters
  mi = params[0]
  sigma = params[1]
  xi = params[2]
  if n_covariate gt 0 then begin
    id = where( covariate_param_id eq 0, n_id )
    if n_id gt 0 then mi = mi + params[3+id] ## covariate_data[*,id]
    id = where( covariate_param_id eq 1, n_id )
    if n_id gt 0 then sigma = sigma + params[3+id] ## covariate_data[*,id]
    id = where( covariate_param_id eq 2, n_id )
    if n_id gt 0 then xi = xi + params[3+id] ## covariate_data[*,id]
  endif
  ; Calculate CDF
  cdf = gev_cdf( blockmax, location=mi, scale=sigma, shape=xi )
  ; If we have bootstrap samples
  if keyword_set( n_bootstrap ) then begin
    ; Initialise bootstrap CDF array
    bootstrap_cdf = fltarr( n_block, n_bootstrap )
    ; Iterate through bootstrap samples
    for i_boot = 0, n_bootstrap - 1 do begin
      ; Assemble parameters
      mi = boot_params[0,i_boot]
      sigma = boot_params[1,i_boot]
      xi = boot_params[2,i_boot]
      if n_covariate gt 0 then begin
        id = where( covariate_param_id eq 0, n_id )
        if n_id gt 0 then begin
          mi = mi + boot_params[3+id,i_boot] ## covariate_data[*,id]
        endif
        id = where( covariate_param_id eq 1, n_id )
        if n_id gt 0 then begin
          sigma = sigma + boot_params[3+id,i_boot] ## covariate_data[*,id]
        endif
        id = where( covariate_param_id eq 2, n_id )
        if n_id gt 0 then begin
          xi = xi + boot_params[3+id,i_boot] ## covariate_data[*,id]
        endif
      endif
      ; Calculate the CDF
      bootstrap_cdf[*,i_boot] = gev_cdf( blockmax, location=mi, scale=sigma, $
          shape=xi )
    endfor
  endif
endif

;***********************************************************************
; Plot output

; Produce a probability plot
if keyword_set( plot_probability_opt ) then begin
  ; Default axis titles
  if not( keyword_set( xtitle ) ) then xtitle = 'Empirical'
  if not( keyword_set( ytitle ) ) then ytitle = 'Model'
  ; Determine the empirical quantiles
  quantiles = ( findgen( n_block ) + 0.5 ) / n_block
  ; Set up plotting window, including diagonal
  plot, [0,1], [0,1], isotropic=1, linestyle=1, xtitle=xtitle, ytitle=ytitle, $
      xthick=xthick, ythick=ythick, thick=thick, charsize=charsize, font=font, $
      title=title
  ; If we have bootstrap samples
  if keyword_set( n_bootstrap ) then begin
    ; Sort bootstrap samples
    for i_boot = 0, n_bootstrap - 1 do begin
      id = sort( bootstrap_cdf[*,i_boot] )
      bootstrap_cdf[*,i_boot] = bootstrap_cdf[id,i_boot]
    endfor
    ; Plot confidence range at each quantile
    for i_block = 0, n_block - 1 do begin
      temp = quantile_threshold( bootstrap_cdf[i_block,*], $
          [ p_value / 2., 1. - p_value / 2. ] )
      oplot, quantiles[i_block]+[0,0], temp, thick=thick, color=ci_color
    endfor
  endif
  ; Plot the points
  id = sort( cdf )
  oplot, quantiles, cdf[id], psym=4, thick=thick, color=color
endif

; Produce a quantile-quantile plot
if keyword_set( plot_quantile_opt ) then begin
  ; I do not think this is possible if there are covariates
  if n_covariate gt 0 then stop
  ; Default axis titles
  if not( keyword_set( xtitle ) ) then xtitle = 'Data'
  if not( keyword_set( ytitle ) ) then ytitle = 'Model'
  ; Determine the empirical locations for the quantiles
  id_sort = sort( blockmax )
  temp_blockmax = blockmax[id_sort]
  ; Determine the GEV model locations for the empirical quantiles
  quantiles = ( findgen( n_block ) + 0.5 ) / n_block
  inv_quantiles = params[0] + params[1] / params[2] $
        * ( ( ( -alog( quantiles ) ) ^ ( -params[2] ) ) - 1. )
  ; Determine plotting range
  xyrange = [ min( [ inv_quantiles, blockmax ], max=temp ), temp ]
  xyrange = ( xyrange - mean( xyrange ) ) * 1.05 + mean( xyrange )
  ; Set up plotting window
  plot, xyrange, xyrange, xstyle=1, ystyle=1, isotropic=1, linestyle=1, $
      xtitle=xtitle, ytitle=ytitle, xthick=xthick, ythick=ythick, thick=thick, $
      charsize=charsize, font=font, title=title
  ; If we have bootstrap samples
  if keyword_set( n_bootstrap ) then begin
    ; Calculate the bootstrap GEV model locations
    boot_inv_quantiles = fltarr( n_block, n_bootstrap )
    for i_boot = 0, n_bootstrap - 1 do begin
      boot_inv_quantiles[*,i_boot] = boot_params[0,i_boot] $
          + boot_params[1,i_boot] / boot_params[2,i_boot] $
          * ( ( ( -alog( quantiles ) ) ^ ( -boot_params[2,i_boot] ) ) - 1. )
    endfor
    ; Sort bootstrap samples
    for i_boot = 0, n_bootstrap - 1 do begin
      id = sort( boot_inv_quantiles[*,i_boot] )
      boot_inv_quantiles[*,i_boot] = boot_inv_quantiles[id,i_boot]
    endfor
    ; Plot confidence range at each quantile
    for i_block = 0, n_block - 1 do begin
      temp = quantile_threshold( boot_inv_quantiles[i_block,*], $
          [ p_value / 2., 1. - p_value / 2. ] )
      oplot, temp_blockmax[[i_block,i_block]], temp, thick=thick, $
          color=ci_color
    endfor
  endif
  ; Plot quantile-quantile plot
  oplot, temp_blockmax, inv_quantiles, psym=4, thick=thick, color=color
endif

;***********************************************************************
; The end

return, params
END


;***********************************************************************
; NAME:
;    gev_fit
; PURPOSE:
;    This function contains the GEV likelihood function to be minimised by 
;    gev_fit.pro.
; INPUTS:
;    PARAMS:  A required float vector containing proposed values of the GEV 
;        model parameters.  The length and order are the same as for RESULT 
;        from gev_fit.pro.
; COMMON BLOCK common_gev_fit_eqn inputs:
;    GEV_BLOCKMAX:  Same as BLOCKMAX input for gev_fit.pro.  Required.
;    GEV_COVARIATE_DATA:  Same as COVARIATE_DATA keyword input for 
;        gev_fit.pro.  Required if covariates are to be included.
;    GEV_COVARIATE_PARAM_ID:  Same as COVARIATE_PARAM_ID keyword input for 
;        gev_fit.pro.  Required if covariates are to be included.
; USES:
;    gev_pdf.pro
; OUTPUTS:
;    RESULT:  The negative of the log likelihood function for the GEV model 
;        given the input set of parameters.  The negative is returned because 
;        IDL's amoeba function performs a minimisation, while we want a 
;        maximumisation of the log likelihood.
;***********************************************************************

FUNCTION GEV_FIT_EQN, $
    PARAMS

;***********************************************************************
; Constants and inputs

; Get input values
common common_gev_fit_eqn
; Determine the data length
n_block = n_elements( gev_blockmax )
; Determine the number of covariates
n_covariate = n_elements( gev_covariate_param_id )
; Define parameters for use in calculations (including covariates)
mi = params[0]
sigma = params[1]
xi = params[2]
if n_covariate gt 0 then begin
  id = where( gev_covariate_param_id eq 0, n_id )
  if n_id gt 0 then begin
    mi = mi + params[3+id] ## gev_covariate_data[*,id]
  endif else begin
    mi = mi + fltarr( n_block )
  endelse
  id = where( gev_covariate_param_id eq 1, n_id )
  if n_id gt 0 then begin
    sigma = sigma + params[3+id] ## gev_covariate_data[*,id]
  endif else begin
    sigma = sigma + fltarr( n_block )
  endelse
  id = where( gev_covariate_param_id eq 2, n_id )
  if n_id gt 0 then begin
    xi = xi + params[3+id] ## gev_covariate_data[*,id]
  endif else begin
    xi = xi + fltarr( n_block )
  endelse
endif

;***********************************************************************
; Calculate log likelihood

; Abort if illegal scale parameter
if min( sigma ) lt 0 then return, double( n_block ) * 100

; Ensure legal parameter combination
temp = 1. + xi * ( gev_blockmax - mi ) / sigma
if min( temp ) lt 0 then return, double( n_block ) * 100

; Calculate the log likelihood
result = gev_pdf( gev_blockmax, location=mi, scale=sigma, shape=xi, log=1 )
result = total( result )

; Take negative (so amoeba is effectively maximising rather than minimising)
result = -result

;***********************************************************************
; The end

return, result
END
