### INTRODUCTION

### MATERIALS AND METHODS

### Study area

### Data sources

### Statistical analysis

#### Raw standardized incidence ratio

$O_i$, $E_i$, and $\theta_i$ are the observed number of cases, the expected number of cases, and the relative risk for neighborhood $i$, respectively. The number of expected events is calculated as follows:

$$E_i = n_i \frac{\sum_{j} y_j}{\sum_{j} n_j}$$

where $n_i$ is the number of women aged 15 and over in neighborhood $i$ and $y_i$ is the observed number of events in neighborhood $i$. The SIR can be calculated as the observed-to-expected ratio, $O_i / E_i$.
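A minimal sketch of the raw SIR calculation may help make the definition concrete. It assumes that the expected count of each neighborhood is its population at risk multiplied by the overall rate across all neighborhoods (internal standardization); the function names and the three-neighborhood data are hypothetical and only serve as an illustration.

```python
def expected_counts(populations, observed):
    """Expected cases per neighborhood: population at risk times the overall rate."""
    overall_rate = sum(observed) / sum(populations)
    return [n * overall_rate for n in populations]


def raw_sir(populations, observed):
    """Raw SIR per neighborhood: observed-to-expected ratio."""
    expected = expected_counts(populations, observed)
    return [o / e for o, e in zip(observed, expected)]


# Hypothetical data: women aged 15 and over, and observed cases, in three neighborhoods.
populations = [1000, 2000, 1000]
observed = [2, 8, 2]

expected = expected_counts(populations, observed)
sirs = raw_sir(populations, observed)
```

Under this assumption the expected counts always sum to the total number of observed cases, so a SIR above 1 marks a neighborhood with more cases than its share of the population at risk would suggest.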

### Besag, York, and Mollie model

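The BYM model can be written as:

$$\log(\theta_i) = \alpha + v_i + u_i$$

where $\alpha$ is the baseline log-relative risk, and $v_i$ and $u_i$ are random-effects components for spatial and non-spatial factors, respectively.

The spatially structured component ($v_i$) is induced by the conditional autoregressive (CAR) model. The CAR model represents risk factors with spatial structure, so that the specific risk estimate of a given area will tend to shrink toward a local mean. The CAR model within the BYM model is as follows:

$$v_i \mid v_{j \neq i} \sim N\left(\frac{\sum_{j} w_{ij} v_j}{\sum_{j} w_{ij}}, \frac{1}{\tau_v \sum_{j} w_{ij}}\right)$$

where $w_{ij}$ is the weight for the pair of areas $i$ and $j$: if $i$ and $j$ are neighbors of each other, the weight is equal to 1, and otherwise the weight is 0.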
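The shrinkage toward a local mean can be illustrated with a short sketch. It assumes binary neighbor weights (1 for neighbors, 0 otherwise) and treats the spatial hyperparameter as a precision, so the conditional variance falls as the number of neighbors grows; the function name, the adjacency lists, and the numeric values are hypothetical.

```python
def car_conditional(i, values, neighbors, tau):
    """Conditional mean and variance of the spatial effect of area i given its neighbors.

    With binary weights, the weighted mean reduces to the plain average
    of the neighboring values, and the weight sum is the neighbor count.
    """
    nbrs = neighbors[i]
    n_neighbors = len(nbrs)
    mean = sum(values[j] for j in nbrs) / n_neighbors
    variance = 1.0 / (tau * n_neighbors)  # tau is assumed to be a precision
    return mean, variance


# Hypothetical example: area 0 borders areas 1 and 2.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
values = [0.0, 0.2, 0.4]

local_mean, local_variance = car_conditional(0, values, neighbors, tau=2.0)
```

In this example the spatial effect of area 0 is centered on the average of its two neighbors, which is the sense in which estimates are pulled toward a local mean.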

The non-spatially structured component ($u_i$) represents risk factors without spatial structure, such that the specific risk estimate of a given area will tend to shrink toward the global mean of the study area. This component in the BYM model is as follows:

$$u_i \sim N\left(0, \frac{1}{\tau_u}\right)$$

The prior distribution for $\tau_v$ and $\tau_u$ is the gamma distribution $G(a, b)$ with expected value $a/b$, where $a_v = 0.5$ and $b_v = 0.005$ for spatially structured random effects and $a_u = 0.5$ and $b_u = 0.5$ for non-spatially structured random effects.
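To compare the two hyperpriors, the following sketch computes the mean and variance they imply. It assumes that G(a, b) is parameterized by shape a and rate b, which is an assumption not confirmed by the text; the function name is hypothetical.

```python
def gamma_mean_variance(shape, rate):
    """Mean and variance of a gamma distribution in the shape/rate form."""
    return shape / rate, shape / rate ** 2


# Hyperparameters quoted in the text, read as (shape, rate).
spatial_mean, spatial_var = gamma_mean_variance(0.5, 0.005)
nonspatial_mean, nonspatial_var = gamma_mean_variance(0.5, 0.5)
```

Under the shape/rate reading, the prior for the spatially structured effects has mean 100 and variance 20000, whereas the prior for the non-spatially structured effects has mean 1 and variance 2.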

### Spatial empirical Bayesian methods

In the empirical Bayesian approach, the prior distribution of $\theta_i$ depends on the data $O_j$ and $E_j$ from the other regions ($j \neq i$). In other words, the parameters of the prior distribution are not fixed, and are estimated empirically based on all available data. Smoothing of raw SIRs with empirical Bayesian methods was done using second-order queen weights in GeoDa.
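The idea of estimating the prior from neighboring data can be sketched as follows. This is one common method-of-moments formulation of local empirical Bayes smoothing and is not claimed to reproduce GeoDa's exact algorithm. The neighbor lists are taken as given (in the study they would come from second-order queen contiguity), and the function name and data are hypothetical.

```python
def spatial_eb_smooth(observed, expected, neighbors):
    """Shrink each raw SIR toward a prior estimated from its own spatial window."""
    smoothed = []
    for i in range(len(observed)):
        window = [i] + list(neighbors[i])  # the area itself plus its neighbors
        o_sum = sum(observed[j] for j in window)
        e_sum = sum(expected[j] for j in window)
        prior_mean = o_sum / e_sum
        e_bar = e_sum / len(window)
        # Method-of-moments prior variance, truncated at zero.
        weighted_ss = sum(
            expected[j] * (observed[j] / expected[j] - prior_mean) ** 2
            for j in window
        )
        prior_var = max(weighted_ss / e_sum - prior_mean / e_bar, 0.0)
        raw = observed[i] / expected[i]
        denom = prior_var + prior_mean / expected[i]
        shrink = prior_var / denom if denom > 0 else 0.0
        smoothed.append(prior_mean + shrink * (raw - prior_mean))
    return smoothed


# Hypothetical data: three areas in a row, each bordering the next.
observed = [2, 8, 2]
expected = [3.0, 6.0, 3.0]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

smoothed = spatial_eb_smooth(observed, expected, neighbors)
```

The shrinkage factor lies between 0 and 1, so each smoothed value falls between the raw SIR and the local prior mean. Areas with small expected counts are pulled more strongly toward their neighbors.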

### Detection and identification of breast cancer clusters

The cluster analysis compares disease cases inside a given area ($\theta_{in}$) with disease cases outside that area ($\theta_{out}$). Since the results of this analysis can be sensitive to model parameters, particularly window size, the maximum spatial cluster size is defined using the Gini coefficient [22]. It has been argued that the Gini coefficient is a very intuitive and systematic way to identify the best collection of non-overlapping clusters [22].

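The likelihood ratio statistic (LRS) for the hypothesis test ($H_0$: $\theta_{in} = \theta_{out}$; $H_a$: $\theta_{in} \neq \theta_{out}$) for a specific window is proportional to:

$$\left(\frac{c}{E[c]}\right)^{c} \left(\frac{C - c}{C - E[c]}\right)^{C - c}$$

where $C$ is the total number of BC cases, $c$ is the observed number of BC cases within a window, $E[c]$ is the crude expected number of cases within the window under the null hypothesis, and $C - E[c]$ is the expected number of cases outside the window.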
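The quantities defined above can be combined in a short sketch. It assumes the Poisson-type likelihood ratio commonly used in spatial scan statistics, built from the observed and expected counts inside and outside the window, and it works on the log scale to avoid overflow with large counts. The function name and the numbers are hypothetical.

```python
import math


def log_likelihood_ratio(c, expected_c, total_cases):
    """Log of (c / E[c])^c * ((C - c) / (C - E[c]))^(C - c) for one window."""
    outside = total_cases - c
    expected_outside = total_cases - expected_c
    llr = 0.0
    if c > 0:  # a zero count contributes nothing (0 * log 0 is taken as 0)
        llr += c * math.log(c / expected_c)
    if outside > 0:
        llr += outside * math.log(outside / expected_outside)
    return llr


# Hypothetical window: 100 cases in total, 10 expected inside the window.
llr_excess = log_likelihood_ratio(20, 10.0, 100)   # more cases than expected
llr_null = log_likelihood_ratio(10, 10.0, 100)     # exactly as expected
llr_deficit = log_likelihood_ratio(5, 10.0, 100)   # fewer cases than expected
```

The statistic is zero when the observed count equals the expected count and grows as the window departs from the null in either direction, which matches the two-sided alternative stated above.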

The p-value of a window is obtained by Monte Carlo hypothesis testing as $p = (R_{beat} + 1)/(R + 1)$, where $R_{beat}$ is the number of random datasets with an LRS higher than the LRS in the real dataset and $R$ is the total number of random datasets. A window shows statistical significance at $\alpha = 0.05$ when its LRS is higher than approximately 95% of the LRS values of the random datasets. The windows with the most statistically significant likelihood ratios were defined as the most likely, secondary, and tertiary clusters, respectively. P-values of <0.05 based on 999 permutations were considered to indicate statistical significance for the Moran index and the spatial clusters. Sufficient statistical power was ensured by the use of 999 replications in the Monte Carlo simulation. All cartographic manipulations and displays were performed in ArcGIS version 10.3 (Esri, Redlands, CA, USA).
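The Monte Carlo significance test described above can be sketched as follows. The sketch assumes the usual rank-based p-value, in which the real dataset is counted together with the random datasets; the function name and the simulated LRS values are hypothetical.

```python
def monte_carlo_p_value(observed_lrs, simulated_lrs):
    """Rank-based p-value: (R_beat + 1) / (R + 1)."""
    # R_beat: random datasets whose LRS is higher than the real one.
    r_beat = sum(1 for s in simulated_lrs if s > observed_lrs)
    return (r_beat + 1) / (len(simulated_lrs) + 1)


# Hypothetical LRS values from 999 random datasets.
simulated = [float(k) for k in range(999)]

p_extreme = monte_carlo_p_value(2000.0, simulated)  # no random dataset is higher
p_border = monte_carlo_p_value(949.0, simulated)    # 49 random datasets are higher
```

With 999 replications the smallest attainable p-value under this formula is 1/1000, and a window needs at most 48 random datasets with a higher LRS to fall below 0.05.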