Taming the Haystack of Data to Find the Needles

Next-generation research tooling is built on the aggregation of massive, diverse data.

From gigabytes to petabytes and databases to data lakes, information throughout society is growing in scale, diversity, and complexity. Until very recently, it’s also been growing beyond technology’s ability to gather it, process it, and make sense of it all. In science and complex research with myriad variables, the data glut is growing with the addition of new types of data sets from mobile, the Internet of Things, cloud services, and sensor-enabled tools.

But taming massive haystacks of data to find the hidden needles of actionable insight is now possible. Next-generation data management platforms can ingest or federate any data source, on-premises or in the cloud, into a harmonized view securely and in real time, and make it easily accessible in a unified decision and innovation layer.

An example of how these features can benefit scientific research will soon be deployed 22,000 miles above the Earth.

Getting the Jump on Solar ‘Objects of Interest’

Satellite components that handle communications, electrical power, GPS, and other functions can be severely damaged by certain types of solar activity, such as solar flares and solar wind. Satellite engineers call these events ‘objects of interest’. They can damage satellite components by generating ionizing radiation, high-energy atomic nuclei, extreme heat, and other effects. Damage may include knocking out a satellite’s power and communications and even altering its orbit.

A major satellite vendor wanted to enhance the ability of researchers to detect and classify solar objects of interest. The goal was to detect the warning signs of impending solar events so that proactive steps could be taken to protect satellites. Protective measures include raising satellite shields and turning off sensors so they do not sustain damage.

A Unified Data Platform

SAP was enlisted to help design a proof-of-concept (PoC) data platform that could generate a unified model for classifying solar objects of interest. The data platform would be used by a deep neural network algorithm to automate the classification process, identify classes for different events, and predict when the next event is likely to occur. Currently, this process is handled by different algorithms for different classes of solar events, and the data is dispersed instead of integrated. A unified platform that could quickly aggregate all types of data could provide better information, and provide it in real time.

The SAP team downloaded the relevant history of solar images and stored it in Hadoop, then configured the SAP HANA Data Management Suite to process and classify the images using Python libraries and the TensorFlow framework for dataflow programming. Orchestration and automation of the necessary processes were also enabled.

With this data platform in place, the neural network can sort through the information, with parameters that restrict its focus to data on the two specific types of solar events chosen for the proof of concept.
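As a rough illustration of what restricting the focus to two event types can mean in practice, the sketch below filters a labeled image catalogue down to two classes and maps each label to an integer id, a common preparation step before training a classifier. It is a minimal sketch, not the SAP team's code: the article does not name the two event types, so the class names, the function `select_poc_samples`, and the catalogue entries are all hypothetical.

```python
from collections import Counter

# Hypothetical class names: the article does not say which two
# event types the proof of concept actually used.
POC_CLASSES = ("solar_flare", "solar_wind")


def select_poc_samples(samples, classes=POC_CLASSES):
    """Keep only samples labeled with one of `classes`.

    Returns the kept (path, label_id) pairs and the label-to-id mapping.
    """
    label_to_id = {name: i for i, name in enumerate(classes)}
    selected = [
        (path, label_to_id[label])
        for path, label in samples
        if label in label_to_id
    ]
    return selected, label_to_id


# Illustrative catalogue of (image path, label) pairs; all values are made up.
catalogue = [
    ("img/0001.png", "solar_flare"),
    ("img/0002.png", "quiet_sun"),
    ("img/0003.png", "solar_wind"),
    ("img/0004.png", "solar_flare"),
    ("img/0005.png", "prominence"),
]

selected, label_to_id = select_poc_samples(catalogue)

# Per-class counts help spot class imbalance before training.
class_counts = Counter(label_id for _, label_id in selected)
```

Keeping the class list in one place means a later phase could widen the scope to more event types by changing a single tuple.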

Training the Neural Network

A large amount of historical data was used to train the neural network. That data included years of solar activity, impacts to satellites, damage to satellite infrastructure, and patterns that help identify the warning signs of solar events (such as increases in magnetism, sunspots, and solar cycles). The SAP platform was also used to aggregate data from many other sources, including telescopes around the world and other satellites. The process is fully automated: it aggregates all of the data into a single data set that SAP HANA can process in-memory, at tremendous speed and scale.
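The aggregation step described above can be pictured as a set of small adapters, one per source, that convert each feed into a shared record shape before everything is merged into one time-ordered data set. The following is a minimal sketch under stated assumptions, using only the Python standard library. It does not use or represent any SAP HANA API, and the feed layouts, field names (`time`, `spots`, `epoch`, `b_nt`), and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """Shared record shape for every source (hypothetical schema)."""
    timestamp: datetime  # always normalized to UTC
    source: str
    metric: str
    value: float


def from_telescope(row):
    # Assumed feed layout: ISO 8601 timestamp with offset, sunspot count.
    return Observation(
        timestamp=datetime.fromisoformat(row["time"]).astimezone(timezone.utc),
        source="telescope",
        metric="sunspot_count",
        value=float(row["spots"]),
    )


def from_satellite(row):
    # Assumed feed layout: Unix epoch seconds, magnetic field in nanotesla.
    return Observation(
        timestamp=datetime.fromtimestamp(row["epoch"], tz=timezone.utc),
        source="satellite",
        metric="magnetic_field_nt",
        value=float(row["b_nt"]),
    )


def aggregate(feeds):
    """Merge (adapter, rows) pairs into one list sorted by timestamp."""
    merged = [adapter(row) for adapter, rows in feeds for row in rows]
    return sorted(merged, key=lambda obs: obs.timestamp)


# Made-up sample rows, for illustration only.
telescope_rows = [
    {"time": "2019-01-01T00:00:10+00:00", "spots": 42},
    {"time": "2019-01-01T01:00:00+01:00", "spots": 40},
]
satellite_rows = [{"epoch": 1546300805, "b_nt": 5.2}]

unified = aggregate([
    (from_telescope, telescope_rows),
    (from_satellite, satellite_rows),
])
```

Normalizing timestamps to UTC inside each adapter is the design choice that makes the final sort meaningful, since sources may report local time or epoch seconds.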

Scientific research around the world and in the farthest reaches of navigable space will be greatly enhanced by this type of unified data platform, which can aggregate diverse types of data from myriad sources in real time. Combined with artificial intelligence and automation, these next-generation research tools will play a prominent role in cutting-edge science in the coming years.

To eliminate information fragmentation within your company and become a more intelligent enterprise, learn more about the SAP HANA Data Management Suite.
