Monday, June 28, 2021
Types of Graph Analysis
Monday, June 21, 2021
Fintech Use Case for Graph Analytics
In my previous blog, I wrote about high-level use cases for Graph Analytics. In today's blog, let's dive deeper and look at how Fintech companies can use graph analytics to allocate credit to customers & manage risks.
Today, banks and other Fintech companies have access to tons of information - but their current databases and IT solutions cannot make the best use of it. Customer information such as KYC data and other demographic data is often stored in a traditional RDBMS database, transactional data is stored in a separate database, customer interaction data from web/mobile apps is stored in Big Data HDFS stores, and data about customers from social networks and other networks is often not used at all.
This is where graph databases such as Neo4j or Oracle Autonomous Database come into play.
A graph database can connect the dots between different sources of information and one can build a really cool, intelligent AI solution to make predictions on future purchases, credit needs, and risks. Prediction data can then be validated with actual transactional data to iterate and build better models.
Graph databases are built for high scalability and performance. There are several open-source algorithms and libraries that can detect components and make connections within the data; you can evaluate these connections and start making predictions, which will only get better over time.
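Below is a minimal sketch of the idea using the open-source networkx library in Python. The accounts and transfer amounts are hypothetical illustration data, not a production risk model.

```python
# A minimal sketch of graph analytics for fintech risk, using the
# open-source networkx library. The accounts and transfers below are
# hypothetical illustration data.
import networkx as nx

# Build a directed graph of money transfers between accounts.
G = nx.DiGraph()
transfers = [
    ("acct_A", "acct_B", 500.0),
    ("acct_B", "acct_C", 480.0),
    ("acct_C", "acct_A", 450.0),   # a suspicious cycle of funds
    ("acct_D", "acct_E", 120.0),
]
for src, dst, amount in transfers:
    G.add_edge(src, dst, amount=amount)

# Connected components group accounts that transact with each other;
# unusually dense or circular components are candidates for review.
for component in nx.weakly_connected_components(G):
    print("component:", component)

# Simple cycles of transfers can indicate money being round-tripped.
print("cycles:", list(nx.simple_cycles(G)))

# PageRank highlights accounts central to the flow of funds.
print("centrality:", nx.pagerank(G))
```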
Thursday, August 30, 2018
Interesting Careers in Big Data
Big Data & Data analytics has opened a wide range of new & interesting career opportunities. There is an urgent need for Big Data professionals in organizations.
Not all these careers are new; many of them are remappings or enhancements of older job functions. For example, statisticians were formerly deployed mostly in government organizations or in sales/manufacturing for sales forecasting and financial analysis; today, statisticians have become central to business operations. Similarly, business analysts have become key to data analytics, as they play the critical role of understanding business processes and identifying solutions.
Here are 12 interesting & fast-growing careers in Big Data.
1. Big Data Engineer
Big Data Engineers architect, build & maintain IT systems for storing & analyzing big data - for example, designing a Hadoop cluster used for data analytics. These engineers need a good understanding of computer architectures and develop the complex IT systems needed to run analytics.
2. Data Engineer
Data engineers understand the source, volume and destination of data, and have to build solutions to handle this volume of data. This could include setting up databases for handling structured data, setting up data lakes for unstructured data, securing all the data, and managing data throughout its lifecycle.
3. Data Scientist
Data Scientist is a relatively new role. They are primarily mathematicians who can build complex models from which one can extract meaningful insights.
4. Statistician
Statisticians are masters in crunching structured numerical data & developing models that can test business assumptions, enhance business decisions and make predictions.
5. Business Analyst
Business analysts are the conduits between big data team and businesses. They understand business processes, understand business requirements, and identify solutions to help businesses. Business analysts work with data scientists, analytics solution architects and businesses to create a common understanding of the problem and the proposed solution.
6. AI/ML Scientist
This is a relatively new role in data analytics. Historically, this work was part of large government R&D programs; today, AI/ML scientists are becoming the rock stars of data analytics.
7. Analytics Solution Architects
Solution architects are the programmers who develop software solutions - which lead to automation and reports for faster/better decisions.
8. BI Specialist
BI Specialists understand data warehouses and structured data, and create reporting solutions. They also work with businesses to evangelize BI solutions within organizations.
9. Data Visualization Specialist
This is a relatively new career. Big data presents a big challenge: how to make sense of such vast data. Data visualization specialists have the skills to convert large amounts of data into simple charts & diagrams that visualize various aspects of a business. This helps business leaders understand what's happening in real time and make better/faster decisions.
10. AI/ML Engineer
These are elite programmers who can build AI/ML software - based on algorithms developed by AI/ML scientists. In addition, AI/ML engineers need to monitor AI solutions for the output & decisions made by AI systems and take corrective action when needed.
11. BI Engineer
BI Engineers build, deploy, & maintain data warehouse solutions, manage structured data through its lifecycle and develop BI reporting solutions as needed.
12. Analytics Manager
This is a relatively new role, created to help business leaders understand and use data analytics and AI/ML solutions. Analytics Managers work with business leaders to smooth solution deployment and act as the liaison between business and the analytics team throughout the solution lifecycle.
Tuesday, August 21, 2018
Fundamentals of Data Management in the Age of Big Data
Data management, data privacy & security risks pose a great management challenge. To address these challenges, companies need to put proper data management policies in place. Here are eight fundamental policies of data management that need to be adhered to by all companies.
Friday, August 17, 2018
4 Types of Data Analytics
Data analytics can be classified into 4 types based on complexity & value. In general, the most valuable analytics are also the most complex.
1. Descriptive analytics
Descriptive analytics answers the question: What is happening now?
For example, in IT management, it tells how many applications are running at that instant and how well those applications are working. Tools such as Cisco AppDynamics, SolarWinds NPM, etc., collect huge volumes of data, analyze it, and present it in an easy-to-read & understandable format.
Descriptive analytics compiles raw data from multiple data sources to give valuable insights into what is happening now & what happened in the past. However, it does not tell what is going wrong or explain why; it helps trained managers and engineers understand the current situation.
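As a toy illustration (with made-up application data), a few lines of pandas can compile raw records into a "what is happening now" summary:

```python
# A minimal sketch of descriptive analytics with pandas: compile raw
# records and summarize "what is happening now". The application names
# and response times are hypothetical sample data.
import pandas as pd

raw = pd.DataFrame({
    "app":        ["billing", "billing", "crm", "crm", "portal"],
    "status":     ["up", "up", "up", "down", "up"],
    "latency_ms": [120, 135, 80, None, 210],
})

# How many application instances are running right now?
print(raw.groupby("app")["status"].value_counts())

# How well are those applications working, on average?
print(raw.groupby("app")["latency_ms"].mean())
```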
2. Diagnostic analytics
Diagnostic analytics uses real-time and historical data to automatically deduce what has gone wrong and why. Typically, diagnostic analytics is used for root cause analysis - to understand why things have gone wrong.
Large amounts of data are used to find dependencies and relationships, and to identify patterns that give deep insight into a particular problem. For example, the Dell EMC Service Assurance Suite can provide fully automated root cause analysis of IT infrastructure. This helps IT organizations rapidly troubleshoot issues & minimize downtime.
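To illustrate the core idea (not any vendor's actual algorithm), here is a toy root-cause walk over a hypothetical dependency graph:

```python
# A minimal sketch of automated root-cause analysis: walk a (hypothetical)
# dependency graph upstream from alarming components to find the deepest
# failing dependency, which is the likely root cause.
depends_on = {
    "web_app":       ["app_server"],
    "app_server":    ["database", "cache"],
    "database":      ["storage_array"],
    "cache":         [],
    "storage_array": [],
}
failing = {"web_app", "app_server", "database", "storage_array"}

def root_causes(node, seen=None):
    """Return failing nodes with no failing dependencies below them."""
    seen = seen or set()
    deps = [d for d in depends_on.get(node, []) if d in failing]
    if not deps:
        return {node}
    causes = set()
    for d in deps:
        if d not in seen:
            seen.add(d)
            causes |= root_causes(d, seen)
    return causes

print(root_causes("web_app"))   # -> {'storage_array'}
```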
3. Predictive analytics
Predictive analytics tells what is likely to happen next.
It uses all the historical data to identify definite patterns of events and predict what will happen next. Descriptive and diagnostic analytics are used to detect tendencies, clusters and exceptions, and predictive analytics is built on top to predict future trends.
Advanced algorithms such as forecasting models are used to make predictions. It is essential to understand that a forecast is just an estimate, whose accuracy depends highly on data quality and the stability of the situation, so it requires careful treatment and continuous optimization.
For example, HPE InfoSight can predict what can happen to IT systems, based on current & historical data. This helps IT organizations manage their IT infrastructure to prevent future disruptions.
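As a toy illustration of the idea (not HPE InfoSight's actual method), here is a linear-trend forecast over hypothetical disk-usage data with numpy:

```python
# A minimal forecasting sketch: fit a linear trend to (hypothetical)
# historical disk-usage readings with numpy and project it forward.
# Real predictive analytics would use richer models and validation.
import numpy as np

days = np.arange(10)                       # historical time axis
usage = np.array([51, 53, 54, 57, 58, 60, 63, 64, 66, 69])  # % disk used

slope, intercept = np.polyfit(days, usage, deg=1)

# Predict usage 30 days out; flag if the trend crosses capacity.
future_day = 40
forecast = slope * future_day + intercept
print(f"forecast for day {future_day}: {forecast:.1f}% used")
if forecast >= 90:
    print("warning: disk likely to fill -- plan capacity now")
```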
4. Prescriptive analytics
Prescriptive analytics is used to literally prescribe what action to take when a problem occurs.
It uses vast data sets and intelligence to analyze the outcomes of possible actions and then select the best option. This state-of-the-art type of data analytics requires not only historical data, but also external information from human experts (also called expert systems) in its algorithms to choose the best possible decision.
Prescriptive analytics uses sophisticated tools and technologies, like machine learning, business rules and algorithms, which makes it complex to implement and manage.
For example, IBM Runbook Automation tools help IT Operations teams simplify and automate repetitive tasks. Runbooks are typically created by technical writers working for top-tier managed service providers. They include procedures for every anticipated scenario, and generally use step-by-step decision trees to determine the effective response to a particular scenario.
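To make the idea concrete, here is a miniature, hypothetical runbook encoded as rules; real products are far more sophisticated:

```python
# A minimal sketch of prescriptive logic: encode a (hypothetical) runbook
# as business rules that prescribe an action for an observed problem.
# Real prescriptive analytics layers ML and expert knowledge on top.
RUNBOOK = [
    # (condition, prescribed action)
    (lambda e: e["type"] == "disk_full" and e["severity"] == "high",
     "fail over to standby volume, then purge old logs"),
    (lambda e: e["type"] == "disk_full",
     "purge old logs and compress archives"),
    (lambda e: e["type"] == "service_down",
     "restart service; escalate if restart fails twice"),
]

def prescribe(event):
    for condition, action in RUNBOOK:
        if condition(event):
            return action
    return "no rule matched -- escalate to on-call engineer"

print(prescribe({"type": "disk_full", "severity": "high"}))
```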
Thursday, August 16, 2018
Successful IoT deployment Requires Continuous Monitoring
Growth of the IoT has created new challenges for business. The massive volume of IoT devices and the deluge of data they create become a challenge - particularly when IoT is a key part of business operations. These challenges can be mitigated with real-time monitoring tools tied to ITIL workflows for rapid diagnostics and remediation.
Failure to monitor IoT devices leads to a failed IoT deployment.
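As a minimal sketch (with hypothetical device IDs and timeouts), a heartbeat check like the one below is the seed of such monitoring:

```python
# A minimal sketch of IoT device monitoring: flag devices whose last
# heartbeat is stale so they can be fed into an ITIL incident workflow.
# Device IDs, timestamps, and the timeout are hypothetical.
import time

HEARTBEAT_TIMEOUT_S = 300  # consider a device unhealthy after 5 minutes

last_heartbeat = {           # device_id -> unix time of last heartbeat
    "sensor-001": time.time() - 30,
    "sensor-002": time.time() - 900,   # stale
}

def unhealthy_devices(now=None):
    now = now or time.time()
    return [dev for dev, ts in last_heartbeat.items()
            if now - ts > HEARTBEAT_TIMEOUT_S]

for device in unhealthy_devices():
    # In a real deployment this would open an incident ticket.
    print(f"ALERT: {device} missed heartbeats -- opening incident")
```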
Wednesday, July 25, 2018
Why Edge Computing is Critical for IoT Success
Edge computing is the practice of processing data near the edge of your network, where the data is being generated, instead of in a centralised data-processing warehouse.
Edge computing is a distributed, open IT architecture that features decentralised processing power, enabling mobile computing and Internet of Things (IoT) technologies. In edge computing, data is processed by the device itself or by a local computer or server, rather than being transmitted to a data centre.
Edge computing enables data-stream acceleration, including real-time data processing without latency. It allows smart applications and devices to respond to data almost instantaneously, as it is being created, eliminating lag time. This is critical for technologies such as self-driving cars, and has equally important benefits for business.
Edge computing allows for efficient data processing in that large amounts of data can be processed near the source, reducing Internet bandwidth usage. This both reduces costs and ensures that applications can be used effectively in remote locations. In addition, the ability to process data without ever putting it into a public cloud adds a useful layer of security for sensitive data.
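Here is a minimal sketch of the pattern, with hypothetical sensor readings and a stand-in for the cloud uplink:

```python
# A minimal sketch of edge processing: aggregate raw sensor readings
# locally and ship only compact summaries upstream, cutting bandwidth.
# The readings and the send_to_cloud() stand-in are hypothetical.
from statistics import mean

def send_to_cloud(summary):          # stand-in for a real uplink call
    print("uploading summary:", summary)

buffer = []
for reading in [21.1, 21.3, 21.2, 45.9, 21.2, 21.4]:  # raw edge data
    buffer.append(reading)
    if reading > 40:                 # urgent anomaly: act locally, now
        print("local alert: anomalous reading", reading)
    if len(buffer) == 3:             # periodic summary instead of raw stream
        send_to_cloud({"count": len(buffer),
                       "mean": round(mean(buffer), 2),
                       "max": max(buffer)})
        buffer.clear()
```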
Thursday, July 19, 2018
5 Pillars of Data Management for Data Analytics
Data is the lifeblood of big data analytics and of all the AI/ML solutions built on top of it.
Here are 5 basic data management principles that must never be broken.
1. Secure Data at Rest
- Most data lives in storage systems, which must be secured.
- All data in storage must be encrypted (see the sketch after this list).
2. Fast & Secure Data Access
- Fast access to data from databases and storage systems. This implies using fast storage servers and FC SAN networks.
- Strong access control & authentication is essential
3. Manage Networks for Data in Transit
- This involves building fast networks - 40Gb Ethernet for compute clusters and 100Gb FC SAN networks.
- Fast SD-WAN technologies ensure that globally distributed data can be used for data analytics.
4. Secure IoT Data Stream
- IoT endpoints are often in remote locations and have to be secured.
- Corrupt data from IoT will break Analytics.
- Having an intelligent edge helps preprocess IoT data - for data quality & security.
5. Rock Solid Data backup and recovery
- Accidents & disasters do happen. Protect against data loss & data unavailability with rock-solid data backup solutions.
- Robust disaster recovery solutions can bring RTO/RPO close to zero.
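To illustrate pillar 1, here is a minimal sketch of encrypting data at rest using the open-source cryptography library; the file name is hypothetical, and a real deployment would keep keys in a key management system:

```python
# A minimal sketch of encrypting data at rest (pillar 1), using the
# open-source "cryptography" library. The file name is hypothetical,
# and real deployments would keep the key in a key management system.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # store this in a KMS, never on disk
fernet = Fernet(key)

plaintext = b"customer KYC record: ..."
ciphertext = fernet.encrypt(plaintext)

with open("kyc_record.enc", "wb") as f:   # only ciphertext touches storage
    f.write(ciphertext)

# Reading it back requires the key, enforcing access control.
with open("kyc_record.enc", "rb") as f:
    restored = fernet.decrypt(f.read())
assert restored == plaintext
```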
Wednesday, July 18, 2018
Business Success with Data Analytics
Data and advanced analytics have arrived. Data is becoming ubiquitous, but several organizations are struggling to use data analytics in everyday business processes. Companies that adopt data analytics at the truest, deepest levels will have a significant competitive advantage; those who fall behind risk becoming irrelevant. Analytics has the potential to upend the prevailing business models in many industries, and CEOs are struggling to understand how analytics can help.
Here are 10 key points that must be followed to succeed.
- Understand how Analytics can disrupt your industry
- Define ways in which Analytics can create value & new opportunities
- Top managers should learn to love metrics and measurements
- Change Organizational structures to enable analytics based decision making
- Experiment with data driven, test-n-learn decision making processes
- Data Ownership must be well defined & Data Access must be made easier
- Invest in data management, data Security & analytics tools
- Invest in training & hiring people to drive analytics
- Establish Organizational Benchmarks for data analytics
- Layout a long term road map for business success with Analytics
Wednesday, July 04, 2018
Top Challenges Facing AI Projects in Legacy Companies
Companies reluctantly start a few AI projects - only to abandon them.
Here are the top 7 challenges AI projects face in legacy companies:
1. Management Reluctance
- Fear of exacerbating the asymmetrical power of AI
- Need to protect their domains
- Pressure to maintain the status quo
2. Ensuring Corporate Accountability
- Internal fissures
- Legacy processes hinder accountability for AI systems
3. Copyrights & Legal Compliance
- Inability to agree on data copyrights
- Legacy Processes hinder compliance when new AI systems are implemented
4. Lack of Strategic Vision
- Top management lacks a strategic vision for AI
- Leaders are unaware of AI's potential
- AI projects are not fully funded
5. Data Authenticity
- Lack of tools to verify data Authenticity
- Multiple data sources
- Duplicate Data
- Incomplete Data
6. Understanding Unstructured Data
- Lack of tools to analyze Unstructured data
- Middle management does not understand value of information in unstructured data
- Incomplete data for AI tools
7. Data Availability
- Lack of tools to consolidate data
- Lack of knowledge on sources of data
- Legacy systems that prevent data sharing
Monday, July 02, 2018
Big Data Analytics for Digital Banking
Big Data has a huge impact on banking, especially in the era of digital banking.
Here are six main benefits of data analytics for banks.
1. Customer Insights
Banks can follow customers' social media & gain valuable insights into customer behavior patterns.
Social media analysis gives more accurate insights than traditional customer surveys.
Social media analysis can be near real time, helping banks understand customer needs better.
2. Customer Service
Big data analysis based on a customer's historical data and current web data can be used to identify customer issues proactively and resolve them even before the customer complains.
E.g., analyzing customers' geographical data can help banks optimize ATM locations.
3. Customer Experience
Banks can use big data analytics to customize websites in real time - to enhance customer experience.
Banks can use analytics to send real-time messages/communications regarding account status, etc.
With big data analytics, banks can be proactive in enhancing customer service.
4. Boosting Sales
Social media analysis gives more accurate insights into customers' needs and helps promote the right banking products to them. For example, customers looking at housing advertisements and discussing housing finance on social media are most likely in need of a housing loan.
Data analytics can accurately assess customers' needs, and banks can promote the right types of solutions.
5. Fraud Detection
Big data analysis can detect fraud in real time and prevent it; a minimal sketch follows below.
Data from third parties and banking networks holds valuable information about customer interactions.
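Here is the sketch, with hypothetical transaction amounts; production fraud systems use far richer features and models:

```python
# A minimal sketch of real-time fraud screening: flag transactions that
# deviate sharply from a customer's historical spending (z-score test).
# The amounts are hypothetical illustration data.
from statistics import mean, stdev

history = [42.0, 55.5, 38.0, 61.0, 47.5, 52.0]   # past transaction amounts
mu, sigma = mean(history), stdev(history)

def is_suspicious(amount, threshold=3.0):
    """Flag a transaction more than `threshold` std devs from the mean."""
    return abs(amount - mu) / sigma > threshold

for amount in [49.0, 850.0]:
    print(amount, "-> suspicious" if is_suspicious(amount) else "-> ok")
```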
6. New Product Introduction
Big data analysis can identify new needs and help develop products that meet those needs.
E.g., mobile payment services, open bank APIs, ERP integration gateways, international currency exchange services, etc., are all based on data analytics.
Wednesday, June 20, 2018
Data Life Cycle Management in the Age of Big Data
Organizations are eager to harness the power of big data. Big data creates tremendous opportunities and challenges.
The data lifecycle stretches through multiple phases as data is created, used, shared, updated, stored and eventually archived or defensively disposed. Data lifecycle management plays an especially key role in three of these phases of data’s existence:
1. Disclose Data
2. Manipulate Data
3. Consume Data
Organizations can benefit from data only if they manage the entire data lifecycle, focus on good governance, and use, share and monetize data.
Tuesday, June 19, 2018
How Machine Learning Aids New Software Product Development
Developing new software products has always been a challenge. The traditional product management process for developing new products takes a lot of time and resources and cannot meet the needs of all users. With new machine learning tools and technologies, one can augment traditional product management with data analysis and automated learning systems and tests.
Traditional New Product Development process can be broken into 5 main steps:
1. Understand
2. Define
3. Ideate
4. Prototype
5. Test
In each of the five steps, one can use data analysis & ML techniques to accelerate the process and improve the outcomes. With Machine Learning, the new 5-step program becomes:
- Understand – Analyze: Understand user requirements and analyze user needs from user data. In the case of web apps, one can collect huge amounts of user data from social networks, digital surveys, email campaigns, etc.
- Define – Synthesize: Defining user needs & user personas can be enhanced by synthesizing user's behavioral models based on data analysis.
- Ideate – Prioritize: Developing product ideas and prioritizing them becomes lot faster and more accurate with data analysis on customer preferences.
- Prototype – Tuning: Prototypes demonstrate basic functionality, and these prototypes can be rapidly and automatically tuned to meet each customer's needs. This aids in meeting the needs of multiple customer segments. Machine-learning-based auto-tuning of software allows for rapid experimentation, and data collected in this phase helps the next stage (a bandit-style sketch follows after this list).
- Test – Validate: Prototypes are tested for user feedback. ML systems can receive feedback and analyze results for product validation and model validation. In addition, ML systems can auto-tune and auto-configure products to better fit customer needs and re-test the prototypes.
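As a sketch of ML-driven experimentation across prototype variants, here is a simple epsilon-greedy bandit; the variant names and conversion rates are hypothetical:

```python
# A minimal sketch of ML-assisted prototype tuning: an epsilon-greedy
# bandit that routes users to the prototype variant performing best,
# while still exploring. Variant names and reward logic are hypothetical.
import random

variants = {"layout_a": [], "layout_b": [], "layout_c": []}

def choose_variant(epsilon=0.1):
    tried = {v: r for v, r in variants.items() if r}
    if not tried or random.random() < epsilon:
        return random.choice(list(variants))          # explore
    return max(tried, key=lambda v: sum(tried[v]) / len(tried[v]))  # exploit

def record_feedback(variant, converted):
    variants[variant].append(1.0 if converted else 0.0)

# Simulated test phase: layout_b secretly converts best.
true_rates = {"layout_a": 0.05, "layout_b": 0.15, "layout_c": 0.08}
for _ in range(1000):
    v = choose_variant()
    record_feedback(v, random.random() < true_rates[v])

for v, rewards in variants.items():
    print(v, "conversion:", round(sum(rewards) / max(len(rewards), 1), 3))
```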
Closing Thoughts
For a long time, product managers had to rely on their understanding of user needs. Real user data was difficult to collect and product managers had to rely on surveys and market analysis and other secondary sources for data. But in the digital world, one can collect vast volumes of data, and use data analysis tools and Machine learning to accelerate new software product development process and also improve success rates.
Thursday, May 17, 2018
How to select use cases for AI automation
AI is rapidly growing and companies are actively looking at how to use AI in their organization and automate things to improve profitability.
Approaching the problem from a business management perspective, the ideal areas to automate are around the periphery of business operations, where jobs are usually routine and repetitive but need little human intelligence - like warehouse operators or metro train drivers. These jobs follow a set pattern, and even if there is a mistake - whether by a human operator or by a robot - the costs are very low.
Business operations tend to employ large numbers of people with minimal skills and use lots of safety systems to minimize the cost of errors. These areas are usually the low-hanging fruit for automation with AI & robotics.
Developing an AI application is a lot more complex, but all apps have 4 basic steps:
1. Identify area for automation: Areas where automation solves a business problem & saves money
2. Identify data sources: Automation needs tons of data, so one needs to identify all possible sources of data and start collecting & organizing it.
3. Develop the AI application: Once data is collected, AI applications can be developed. Today, there are several AI libraries and tools for developing new applications. My next blog talks about the popular AI application development tools.
4. Deploy, monitor & improve: Once an AI tool to automate a business process is developed, it has to be deployed, monitored and checked for additional improvements - which should be part of the regular business improvement program.
Monday, May 14, 2018
Popular Programming Languages for Data Analytics
I have listed down the most popular programming languages for data analysis.
Thursday, May 03, 2018
Data Analytics for Competitive Advantage
Data Analytics is touted as 'THE' tool for competitive advantage.
In this article, I break down data analytics into its three main components and list the various activities that are done in each category.
Three Main Components of Data Analytics
1. Data Management
2. Standard Analytics
3. Advanced Analytics
Data Management
Data management forms the foundation of data analytics. About 80% of effort & cost is incurred in data management functions. The world of data management is vast and complex; it consists of several activities that need to be done:
1. Data Architecture
2. Data Governance
3. Data Development
4. Data Security
5. Master Data Management
6. Metadata Management
7. Data Quality Management (see the sketch after this list)
8. Document & Content Management
9. Database & Data Warehousing Operations
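As a small illustration of data quality management (item 7), here is a sketch of basic quality checks with pandas on hypothetical customer records:

```python
# A minimal sketch of data quality management checks with pandas:
# detect nulls, duplicates, and out-of-range values before analytics.
# The customer records are hypothetical sample data.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "age":         [34, None, 29, 220],     # a null and an implausible age
    "email":       ["a@x.com", "b@x.com", "b@x.com", "d@x.com"],
})

report = {
    "null_ages":        int(df["age"].isna().sum()),
    "duplicate_ids":    int(df["customer_id"].duplicated().sum()),
    "age_out_of_range": int(((df["age"] < 0) | (df["age"] > 120)).sum()),
}
print(report)
```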
Standard Analytics
Advanced Analytics
Friday, April 13, 2018
Better Way to Manage Big Data with Containers
Storing data in software-defined storage systems (using HDFS over an object store) allows for shared storage and also allows for in-place analytics. This is more efficient than copying all the files over to local disks - saving time & delivering faster time to insights.
Also, running on containers allows IT to offer a BigData-as-a-Service model, as the entire infrastructure can be managed with existing cloud management tools; this simplifies & reduces the time for new deployments.
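Here is a minimal sketch of in-place analytics with PySpark reading directly from shared object storage; the bucket path is hypothetical, and the S3A (hadoop-aws) connector is assumed to be available on the cluster:

```python
# A minimal sketch of in-place analytics on shared object storage with
# PySpark, instead of copying files to local disks first. The bucket
# path and credentials setup are hypothetical, and the cluster is
# assumed to have the S3A (hadoop-aws) connector on its classpath.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("in-place-analytics")
         .getOrCreate())

# Read the data where it lives -- no copy to local disk.
events = spark.read.json("s3a://shared-data-lake/events/2018/")

# Run the analysis directly against shared storage.
events.groupBy("event_type").count().show()

spark.stop()
```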