Client: IT MNC
Application: Big Data Analytics Platform
Scope: Data Ingestion, ETL and Reporting
Tools/Language(s): HDFS, Hive, Sqoop, Spark, Kafka, Java, Scala, Cognos Analytics
Monitoring millions of host machines and servers across geographies was never easy for this IT infrastructure MNC. Different IT controls were enforced through different tools for application updates, OS patch installation, antivirus updates, encryption and data loss prevention. However, the compliance data for these controls was stored in separate databases, so there was no single view for identifying non-compliant machines, areas of non-compliance or improvement trends in IT controls compliance. Here is how a Hadoop-based data lake solution was leveraged to build that single view.
Infrastructure managers struggled with dozens of daily reports on IT control parameters arriving from different sources; data in one report often contradicted another, and network logs were analysed in isolation from the other IT control parameters. It was finally decided to create a single data lake on a Hadoop cluster, ingesting, processing and combining structured data from 8 different databases and semi-structured data from network logs into a single Hive data warehouse.
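To illustrate the log-ingestion side, the sketch below shows the kind of normalization that semi-structured network logs go through before landing in a Hive table. In the actual pipeline this logic would run inside a Spark job fed by Kafka; the sshd log format, the `SshLogParser` class and the field order here are assumptions made for the example, not the client's real schema.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Standalone sketch of normalizing a semi-structured sshd log line into
// fixed columns. Illustrative only: the real pipeline ran this kind of
// parsing inside Spark, and the log format is an assumed syslog shape.
public class SshLogParser {

    // Matches lines like:
    // "Oct 12 09:14:02 host42 sshd[2211]: Failed password for root from 10.0.0.5 port 4321 ssh2"
    private static final Pattern FAILED_LOGIN = Pattern.compile(
        "(\\w{3}\\s+\\d+\\s[\\d:]+)\\s(\\S+)\\ssshd\\[\\d+\\]:\\s"
        + "Failed password for (\\S+) from (\\S+) port (\\d+)");

    // Extracts timestamp, host, user, source IP and port from a
    // failed-login line; returns null when the line does not match.
    public static String[] parseFailedLogin(String line) {
        Matcher m = FAILED_LOGIN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)
        };
    }

    public static void main(String[] args) {
        String line = "Oct 12 09:14:02 host42 sshd[2211]: "
            + "Failed password for root from 10.0.0.5 port 4321 ssh2";
        // Prints: Oct 12 09:14:02,host42,root,10.0.0.5,4321
        System.out.println(String.join(",", parseFailedLogin(line)));
    }
}
```

Once flattened into columns like these, failed-login events can be stored alongside the structured compliance tables and queried together in Hive.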
A single dashboard solution with drill-down and drill-through reports was developed to provide one view of millions of machines and servers by machine type, OS type, location, domain and compliance status, along with periodic compliance trends across the organization. Dashboards were also provided for monitoring network events such as SSH connections, Deny/Drop actions on specific hosts and ports, and failed login attempts. This helped the Infrastructure BU focus on areas of non-compliance and plug those loopholes.
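The compliant-vs-non-compliant rollup behind such dashboards can be sketched in plain Java. In production this would be a Hive or Spark aggregation over the data lake feeding Cognos; the `MachineStatus` record and the location-level compliance-rate metric below are illustrative assumptions, not the client's actual model.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Standalone sketch of a compliance rollup by location. Illustrative only:
// the real system computed equivalent aggregates in Hive/Spark over the
// data lake, with further drill-downs by machine type, OS type and domain.
public class ComplianceRollup {

    // Assumed record shape: one machine's compliance status at a location.
    public record MachineStatus(String hostId, String location, boolean compliant) {}

    // Returns, per location, the fraction of machines that are compliant
    // (1.0 = fully compliant fleet, 0.0 = fully non-compliant).
    public static Map<String, Double> complianceRateByLocation(List<MachineStatus> machines) {
        return machines.stream().collect(Collectors.groupingBy(
            MachineStatus::location,
            Collectors.averagingDouble(m -> m.compliant() ? 1.0 : 0.0)));
    }

    public static void main(String[] args) {
        List<MachineStatus> fleet = List.of(
            new MachineStatus("h1", "APAC", true),
            new MachineStatus("h2", "APAC", false),
            new MachineStatus("h3", "EMEA", true));
        // Map order is unspecified; APAC maps to 0.5 and EMEA to 1.0.
        System.out.println(complianceRateByLocation(fleet));
    }
}
```

Computing the same rate per day or per week gives the periodic compliance trend the dashboards displayed.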