The aim of this study was to determine whether different methods of binary seasonal classification agree when applied to time series derived from diagnosis codes in observational data. We used databases of different sizes, types and origins to rule out the possibility that disagreement was caused by database choice alone. The results, shown in Fig. 1, indicate that the methods are generally inconsistent, with disagreement observed in 60 to 80% of the time series across the 10 databases. As Tables 3, 4 and 5 reveal, the methods vary substantially within each database, even when considering only the portion of the time series classified as seasonal. Because this variation is present across all databases and significance levels, its source is not the data but the methods themselves.

### Sources of Disagreement

Ultimately, the disagreement stems from the different ways the methods assess seasonality. While there are similarities, each method focuses on a different aspect of a time series (Table 2). For example, half of the methods (ET, AA, AR, ED) fit a hypothesized model to a time series and test the model for seasonality, while the other half (FR, KW, WE, QS) test different aspects of a time series directly, without an assumed model. To take the discussion further and generalize where we can, we distinguish between types of concordance and types of peaks. With regard to concordance, we define “positive concordance” as unanimous agreement between the methods that a time series is seasonal, and “negative concordance” as unanimous agreement that a time series is non-seasonal. The methods are therefore discordant for a given time series when there is neither positive nor negative concordance. With regard to peaks, we say that peaks are “persistent” if they occur year after year, and “consistent” if they occur in the same month. We make this distinction because peaks relate to aspects of time series analysis that are relevant to seasonality, specifically variation and autocorrelation. Peaks can come in different sizes, of course. Time series with large peaks suggest greater variation than those with small peaks. Persistent peaks (small or large) indicate the possibility of underlying cyclical behavior in the time series. Consistent peaks, if they are also persistent, indicate autocorrelation in the time series. We use Figs. 2 and 3 to navigate the rest of the discussion.
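
The concordance definitions above can be expressed as a small classification rule. The following is a minimal sketch only: the function name `concordance`, the series names, and the per-method flags are hypothetical and made up for illustration; they are not results from the study.

```python
from collections import Counter


def concordance(flags):
    """Classify one time series from per-method flags (True = seasonal)."""
    if all(flags):
        return "positive"    # every method says seasonal
    if not any(flags):
        return "negative"    # every method says non-seasonal
    return "discordant"      # at least one method disagrees


# Hypothetical flags for three time series across the eight methods,
# ordered (ET, AA, AR, ED, FR, KW, WE, QS). Values are illustrative only.
results = {
    "ts_a": [True] * 8,
    "ts_b": [False] * 8,
    "ts_c": [True, False, False, True, True, True, False, True],
}

labels = {name: concordance(flags) for name, flags in results.items()}
counts = Counter(labels.values())

# Share of series on which all methods agree (positive or negative).
concordant_share = (counts["positive"] + counts["negative"]) / len(results)
print(labels)
print(round(concordant_share, 3))
```

Under this rule a single dissenting method is enough to make a time series discordant, which is why unanimous agreement becomes harder to reach as more methods are compared.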

From Fig3.ts1 (*N* = 2809) and Fig3.ts9 (*N* = 1498), we learn that the methods show concordance only 4307/11,137 = 38.7% of the time. Fig. 2 provides valuable insight into the degree of disagreement between the methods. Of the 40 unique combinations, some occur more often than others, and this is due to similarities in the test procedures (Table 2). For example, methods that group time series data by month and test for differences between groups assess seasonality differently than methods that fit a hypothesized model and then determine seasonality by minimizing forecast error. Recognizing the differences in how the methods assess seasonality is important not only for understanding the amount of disagreement observed, but also for recognizing that these differences reflect a disagreement about how seasonality is defined. Indeed, if the methods were highly concordant despite their contrasting approaches, we would have to admit that the contrasting approaches are ultimately just different ways of expressing the same aspect of a time series. This can be seen more clearly in Fig. 3. In Fig3.ts1, …, Fig3.ts4 we observe time series that to the human eye appear seasonal and very similar. Identifying such time series as seasonal is a very old idea in time series analysis, with Beveridge [24] and Yule [25] using harmonic functions to model time series with cyclic behavior. However, despite a clear cyclical pattern and visual similarities, Fig3.ts2, Fig3.ts3 and Fig3.ts4 all show discordance. This is because, except for the ED method, the methods do not test for seasonality by fitting harmonic functions to the data. Thus, the different methods of assessing seasonality ultimately result in different definitions of seasonality.
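
To make the idea of modelling cyclic behavior with harmonic functions concrete, here is a minimal sketch that fits a single 12-month cosine wave to a monthly series by least squares. It is an illustration only, not the ED method's actual procedure, and it includes no significance test. The function name `harmonic_fit` and the example series are assumptions, and the closed-form coefficients rely on the series covering a whole number of years.

```python
import math


def harmonic_fit(y, period=12):
    """Least-squares fit of y[t] ~ mean + b*cos(w*t) + c*sin(w*t).

    The closed form below is valid only when len(y) is a whole number of
    periods, because the regressors are then mutually orthogonal.
    """
    n = len(y)
    if n % period != 0:
        raise ValueError("closed form assumes a whole number of periods")
    w = 2 * math.pi / period
    mean = sum(y) / n
    b = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
    c = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
    amplitude = math.hypot(b, c)                    # peak-to-mean height
    peak_offset = (math.atan2(c, b) / w) % period   # position of the peak
    return mean, amplitude, peak_offset


# Illustrative series: four years of monthly values peaking at offset 1.
w = 2 * math.pi / 12
wave = [10 + 3 * math.cos(w * (t - 1)) for t in range(48)]
flat = [10.0] * 48

wave_mean, wave_amp, wave_peak = harmonic_fit(wave)
flat_mean, flat_amp, _ = harmonic_fit(flat)
print(wave_mean, wave_amp, wave_peak)
print(flat_amp)
```

A harmonic test in this spirit would ask whether the fitted amplitude is large relative to the noise in the data. The sketch stops at estimating the amplitude and the position of the peak.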

As we mentioned before, the behavior of peaks plays an important role in concordance. We use Fig. 3 further to explore the relationship between peaks, variation, and disagreement, and provide general principles about when a method is more likely to classify a time series as seasonal rather than non-seasonal.

### Positive Agreement

Since each method assesses seasonality differently, positive concordance is reached only when multiple conditions are present at the same time. Persistent and consistent peaks are most important for ED, AA, AR and ET. Peaks will result in a seasonal classification by ED as long as there is enough difference between the peaks and troughs in the data. However, even with persistent and consistent peaks, variation over time (especially between peaks) can lead to a non-seasonal classification by AA, AR or ET (Fig3.ts2, Fig3.ts3 and Fig3.ts4). Indeed, we confirmed experimentally that positive concordance can be achieved for the time series in Fig3.ts2, Fig3.ts3 and Fig3.ts4 by removing the data before 2016. Since time series with persistent and consistent peaks have a high correlation between seasonal lags, they are classified as seasonal by QS. Variation is key for FR, KW and WE. Even in the absence of the prominent peaks we see in Fig3.ts1, …, Fig3.ts4, sufficient variation in the time series data can lead FR, KW and WE to a seasonal classification (Fig3.ts6). Therefore, with regard to positive concordance, we see tension between the methods, in that variation can lead some methods to classify apparently seasonal time series as non-seasonal (Fig3.ts2, Fig3.ts3 and Fig3.ts4) and apparently non-seasonal time series as seasonal (Fig3.ts5, …, Fig3.ts8).
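
The sketch below illustrates how a test that groups observations by calendar month responds to between-month variation. It uses a Kruskal-Wallis rank statistic as a stand-in, on the assumption that KW refers to a test of this type; this section does not spell out the abbreviation, and the study's implementation may differ. The helper name, the example series, and the omission of a tie correction are all simplifications made for illustration.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted(v for g in groups for v in g)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    scaled = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * scaled - 3 * (n + 1)


def by_month(series):
    """Split a monthly series into 12 groups, one per calendar month."""
    return [series[m::12] for m in range(12)]


# Upper 5% point of the chi-square distribution with 11 degrees of freedom.
CRITICAL_5PCT = 19.675

# Illustrative ten-year series: one with a monthly pattern, and one whose
# values differ by year but not by month.
seasonal = [10 + 3 * math.cos(2 * math.pi * t / 12) for t in range(120)]
year_only = [float(t // 12) for t in range(120)]

h_seasonal = kruskal_wallis_h(by_month(seasonal))
h_year_only = kruskal_wallis_h(by_month(year_only))
print(h_seasonal > CRITICAL_5PCT, h_year_only > CRITICAL_5PCT)
```

The statistic depends only on how the ranks differ between month groups. This is consistent with the observation above that sufficient between-month variation can produce a seasonal classification even without prominent peaks.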

### Negative Agreement

The relationship between negative concordance and variation is simpler. The time series in Fig3.ts5, …, Fig3.ts9 are similar in that one cannot determine the results of the methods by visual inspection alone (recall that any linear trend in each of the original series was removed before the methods were applied). Given the similarity of the time series in Fig3.ts5, …, Fig3.ts9, it is reasonable to wonder why not all of them show negative concordance. Ultimately, time series that are constant, or stationary around a constant mean with minimal variation, will result in negative concordance between the methods. However, a time series with both large peaks and variation will also show negative concordance if there is no monthly or annual autocorrelation (for example, a time series generated from N(μ, σ²)). As noted in the Results section, the 1498 time series for which the methods exhibit negative concordance have a mean variance of 0 to four decimal places.
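
The point about Gaussian noise can be checked numerically. The sketch below compares the lag-12 sample autocorrelation of simulated N(μ, σ²) noise with that of a clean annual cycle. The seed, the series length, and the values of μ and σ are arbitrary assumptions chosen for illustration, and the helper `autocorr` is not taken from the study.

```python
import math
import random


def autocorr(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    num = sum((y[t] - mean) * (y[t + lag] - mean) for t in range(n - lag))
    return num / denom


months = 600              # 50 years of monthly values (arbitrary length)
rng = random.Random(0)    # fixed seed so the run is repeatable

# Independent draws from N(10, 3^2): large variation, no seasonal structure.
noise = [rng.gauss(10.0, 3.0) for _ in range(months)]

# A clean annual cycle with the same mean, for comparison.
seasonal = [10.0 + 3.0 * math.cos(2 * math.pi * t / 12) for t in range(months)]

noise_r12 = autocorr(noise, 12)
seasonal_r12 = autocorr(seasonal, 12)
print(round(noise_r12, 3), round(seasonal_r12, 3))
```

Both series vary widely, but only the cyclic one carries correlation at the annual lag. The noise series has peaks that are neither persistent nor consistent.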

### Generalization and Limitations

We have explained common scenarios where we can expect negative and positive concordance, but further generalization is more difficult. As Fig. 3 reveals, thousands of time series fall under different combinations of discordance (M = 2168, …, 1267), making it difficult to predict which particular combination of disagreements to expect from visual inspection of a time series alone. However, a direct consequence of this study is that researchers using different methods implicitly define seasonality differently. Given the discrepancy between the methods, researchers relying on different methods are likely to obtain different results, leading to conflicting understandings of the seasonality of a time series.

Finally, we note that the study and evaluation of methods was limited to 10 observational databases and eight methods of binary seasonal classification. Different results might have been observed had one or more of these design choices been changed. As explained in the Discussion section, aspects of a time series that influence the classification of seasonality include variance, autocorrelation, peak persistence, and peak consistency. Time series constructed to alter one or more of those aspects can affect concordance. We chose 10 observational databases; adding tens or hundreds of other databases might reveal different levels of agreement between the methods. Likewise, we chose eight binary seasonal classification methods; a different group of methods might have resulted in different levels of agreement.
