
Drought Forecasting through Machine Learning of Satellite and Long-Range Forecast Data

Author
Dr. 이진영
 
Date
2016.03.14

The impact of droughts can be reduced through sustainable drought management and proactive measures against drought disasters, for which the accurate and timely provision of drought information is essential. This study developed a drought forecasting model that provides high-resolution drought information, based on drought indicators, that is useful to decision-makers. The purposes of the study are (1) to identify the needs of decision-makers regarding drought information through a two-part survey; (2) to develop a drought forecasting model using remote sensing and long-range forecast data based on machine learning, and then assess its performance; and (3) to estimate how much drought forecasting capability could improve if the long-range forecast were enhanced.

 

A two-part survey, in the form of a questionnaire, was conducted. Fifty-one government officials from 17 municipal governments were asked to participate. Their main duties include addressing droughts, heatwaves, disasters, wildfires, and urban greening. The first round had a return rate of 53% (27 participants), and the second round had a return rate of 19% (5 participants). Although the return rates were low, each response was considered seriously and applied to the design of the study, because the survey was conducted in place of one-on-one interviews.

If an institution trusted by the public were to provide drought information tailored to different sectors, the use of that information would improve. Most respondents preferred a weekly temporal resolution, while only some preferred daily or monthly information. Most preferred spatial information aggregated by administrative district, while some preferred spatially distributed information (with spatial resolution as high as 1 × 1 km). The need for drought forecasts with lead times of 1 month to 1 year was also identified. Judging by these needs, a drought forecasting model with a weekly temporal resolution and very fine spatial resolution appears necessary. This study takes a first step toward that goal by developing a drought forecasting model with a monthly temporal resolution, matching the resolution of the long-range forecast, and a 0.5 × 0.5° spatial resolution, matching the resolution of the remote sensing-based data. Drought forecasting models with finer temporal and spatial resolutions will be developed in subsequent studies.

 

The survey responses also called for drought information at gauge locations, as well as spatially distributed information based on remote sensing data. Sixty percent of the respondents to the second survey intend to use remote sensing-based drought indicators, even if their accuracy is lower than that of gauge-based indicators. Drought risk assessment was performed for general drought impact categories, but no significant results were obtained because of the small number of respondents. Even so, it is meaningful that the respondents gave high priority to environmental and social drought impacts, which have long been neglected in comparison with other categories. This calls for a more detailed drought risk assessment for each drought impact category.

Several drought forecasting models were developed to predict three drought indices: the 6-month Standardized Precipitation Index (SPI6), the 6-month Standardized Precipitation Evapotranspiration Index (SPEI6), and the Normalized Difference Water Index (NDWI). The Multiquadric Spline Interpolation method and three machine learning models, Decision Tree (DT model), Random Forest (RF model), and Extremely Randomized Trees (ET model), were tested. Interpolation methods have traditionally been used to derive information for ungauged locations from neighboring gauges and their distances. Machine learning models were tested to improve the provision of initial drought conditions based on remote sensing data, as initial conditions are among the most important factors in drought forecasting.

 

Classification of drought categories and regression of drought indicator values were both performed. The input variables of the machine learning models predicting SPI6 or SPEI6 are 6-month accumulated precipitation (PRCP), 6-month accumulated potential evapotranspiration (PET), Normalized Difference Vegetation Index (NDVI), Normalized Difference Water Index (NDWI), Daytime Land Surface Temperature (LST_DAY), Nighttime Land Surface Temperature (LST_NIGHT), Multivariate ENSO Index (MEI), Arctic Oscillation Index (AOI), and month (MONTH). The target variable is SPI6 or SPEI6. For the models predicting NDWI, PRCP and PET are accumulated over a period matching the lead time instead of 6 months.
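
As a rough illustration of the classification target, the sketch below maps an index value to a drought category. The thresholds are the conventional SPI cut-offs (moderate: -1.0 to -1.49, severe: -1.5 to -1.99, extreme: -2.0 or less); the abstract does not state the cut-offs used in the study, so they are an assumption, as are the function name `drought_category` and the "ND" label.

```python
def drought_category(index_value: float) -> str:
    """Map an SPI6 or SPEI6 value to a drought category label.

    Thresholds are the conventional SPI cut-offs, assumed here rather
    than taken from the study.
    """
    if index_value <= -2.0:
        return "ED"  # Extreme Drought
    if index_value <= -1.5:
        return "SD"  # Severe Drought
    if index_value <= -1.0:
        return "MD"  # Moderate Drought
    return "ND"      # no drought (hypothetical label)


labels = [drought_category(v) for v in (-2.3, -1.6, -1.0, 0.4)]
```

A regression model predicts the index value itself, while a classification model predicts the label directly.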

 

Sensitivity analyses were performed to determine the parameters of the three machine learning models. To avoid overfitting, Leave-One-Year-Out cross-validation was applied: each year of the 2003-2015 input data was excluded from training in turn and used for testing. The performance measures are the accuracy of classifying drought categories, such as Extreme Drought (ED), Severe Drought (SD), and Moderate Drought (MD), and the Mean Absolute Error (MAE) for regression. They were evaluated at 61 Automated Synoptic Observing System (ASOS) gauge locations.
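
The validation scheme and the two performance measures can be sketched as follows. This is a minimal illustration, not the study's code: the toy data are made up, and the "model" is a stand-in that predicts the mean of the training targets.

```python
from statistics import mean
from typing import Iterator, List, Sequence, Tuple


def leave_one_year_out(
    years: Sequence[int],
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Error used for the regression task."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def accuracy(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of samples whose drought category is predicted correctly."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


# Toy data: two samples per year for 2003-2015 (values are made up).
years = [y for y in range(2003, 2016) for _ in range(2)]
targets = [((i % 5) - 2) * 0.5 for i in range(len(years))]

fold_mae = {}
for held_out, train_idx, test_idx in leave_one_year_out(years):
    # Stand-in "model": predict the mean of the training targets.
    prediction = mean(targets[i] for i in train_idx)
    observed = [targets[i] for i in test_idx]
    fold_mae[held_out] = mean_absolute_error(observed, [prediction] * len(observed))
```

Splitting by year rather than by random sample keeps all months of the held-out year out of training, so the test year is never seen by the model.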

 

Evaluating the skill of the long-range forecast itself is outside the scope of this study, although it is the most important factor in drought forecasting. Instead, long-range forecast data were evaluated against climatological data (the baseline) as a means of filling the future period covered by the lead time. A perfect forecast was also simulated to estimate the range of potential improvement. For ungauged areas, the interpolation method was compared with the machine learning models.
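
One way to read "filling the future period of the lead time" is sketched below: the 6-month accumulation ending at the target month combines the observed months up to the issue date with fill values for the lead-time months, taken from climatology, the long-range forecast, or later observations (the perfect forecast). This reading, the function name `accumulate_with_fill`, and the precipitation values are assumptions, not details from the study.

```python
from typing import Sequence


def accumulate_with_fill(
    observed_recent: Sequence[float],
    future_fill: Sequence[float],
    window: int = 6,
) -> float:
    """Return the `window`-month accumulation ending at the target month.

    observed_recent: monthly totals up to the issue month, oldest first.
    future_fill: monthly totals for the lead-time months (climatology,
        long-range forecast, or later observations for a perfect forecast).
    """
    lead = len(future_fill)
    if not 1 <= lead <= window:
        raise ValueError("lead time must be between 1 and `window` months")
    n_obs = window - lead
    if len(observed_recent) < n_obs:
        raise ValueError("not enough observed months for this lead time")
    # Guard n_obs == 0 explicitly: a slice of [-0:] would return everything.
    past = list(observed_recent[-n_obs:]) if n_obs > 0 else []
    return sum(past) + sum(future_fill)


# Made-up monthly precipitation totals in mm.
observed = [30, 45, 120, 200, 80, 60]
climatology_next_two = [40, 35]
prcp6_lead2 = accumulate_with_fill(observed, climatology_next_two)
```

Under this reading, the share of the window that depends on the fill grows with lead time, which is consistent with skill degrading at longer leads.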

 

When the Long-Range Forecast Method (F method) was compared with the Climatology Method (C method) for gauges, the C method performed better for both classification and regression. Classification accuracy decreased rapidly and regression MAE increased with longer lead times.

 

The Climatology-Interpolation Method (C-I method), the Long-Range Forecast-Interpolation Method (F-I method), the Climatology-Machine Learning Method (C-ML method), and the Long-Range Forecast-Machine Learning Method (F-ML method) were compared for drought forecasting in ungauged areas. Machine learning-based methods performed better than interpolation methods for both classification and regression, and the methods using climatology data outperformed the methods using long-range forecast. In most cases, the C-ML method performed the best in drought forecasting.

 

A perfect forecast was simulated, and the Perfect Forecast-Interpolation Method (PF-I method) and the Perfect Forecast-Machine Learning Method (PF-ML method) were compared with the F-I and F-ML methods, respectively. Classification accuracy of the PF-I method was in the range of 0.34-0.38 for SPI6 and SPEI6, indicating that the accuracy of the interpolation approach would remain below 0.4 even with a perfect forecast. The PF-ML method produced higher classification accuracy, ranging from 0.39 to 0.56. The ET model performed the best for SPI6 and SPEI6 forecasts. Regression MAE of the PF-I method was in the range of 0.47-0.53, and the methods using machine learning models produced an even lower MAE range of 0.35-0.42. The RF and ET models performed very well. The performance of the F-ML method is expected to surpass that of the C-ML method as the skill of the long-range forecast improves.

 

The Vegetation-Climatology-Machine Learning Method (V-C-ML method) and the Vegetation-Long-Range Forecast-Machine Learning Method (V-F-ML method) were compared for NDWI forecasting. The classification accuracy of the V-C-ML method is 0.52-0.53 for 1- to 3-month lead times, similar to the performance of the PF-ML method (ET model) for SPEI6 forecasts. The V-F-ML method (ET model) performed the best for 4- to 6-month lead times, with a classification accuracy of 0.51-0.53. The increase of MAE with lead time was not steep. With perfect forecast simulations, the performance either decreased or improved only slightly.

 

The best-performing model overall was the one based on climatological data and machine learning. Although the contribution of the long-range forecast to drought forecasting was not large, machine learning modeling based on remote sensing data enhanced drought forecasting skill. Drought forecasting based on the long-range forecast is expected to outperform forecasting based on climatological data as the skill of the long-range forecast improves.

 

In conclusion, it is recommended to forecast SPI6 or SPEI6 values using machine learning with climatological data, in order to provide spatially distributed drought information at the 0.05 × 0.05° spatial resolution used in this study. For ungauged areas, the classification accuracy is expected to be 0.47-0.52 at a 1-month lead time, decreasing to 0.21-0.35 at a 6-month lead time. The regression MAE is expected to be 0.41-0.47 at a 1-month lead time and 0.56-0.59 at a 6-month lead time. The long-range forecast will become more useful as its skill improves: with a perfect forecast, the classification accuracy would reach 0.50-0.56 and the regression MAE would reach 0.35-0.40 for ungauged areas.

 

The models predicting NDWI were less sensitive to the performance of the long-range forecast than the models predicting SPI6 or SPEI6, and showed higher classification accuracy as well as lower regression MAE. NDWI can be useful as an indicator related to environmental drought impacts. It is recommended to: (1) improve the performance of the long-range forecast in order to provide drought information to decision-makers using drought indicators familiar to them; (2) use models predicting NDWI; and (3) produce and disseminate drought information related to environmental drought impacts using NDWI. The classification accuracy of the V-C-ML method ranges from 0.51 to 0.53, and the regression MAE ranges from 0.45 to 0.50. Although this study used NDWI as a drought indicator, remote sensing-based drought indicators combining several variables, such as the Scaled Drought Condition Index (SDCI), can also be used.

 

The performance results of the models tested in this study can serve as baselines for future drought forecasting studies in the study area. To reduce the uncertainty of drought forecasting, the developed model can be combined with the existing drought information system as an ensemble.