Friday, October 21, 2016

Hiring Analytics

Recently my wife asked me about hiring analytics. Though I am not into HR analytics, I had read enough IBM Watson Analytics, Hadoop, and Spark use cases, and I had studied neural networks during my MS at Texas A&M - so I had a wide range of material to draw on for a discussion of hiring analytics.

So I could hold my own in the discussion, and I decided to write up what I said in this blog (at my wife's suggestion).

Why use Analytics for Hiring?

In today's fast-paced economy, companies tend to hire people with relevant skills from outside. For example, a bank is willing to hire a business analyst from the manufacturing sector rather than train a banker in analytics.

In short, hiring is critical to building capabilities quickly. It therefore becomes important to hire employees who can meet the company's requirements and fit into its corporate culture. Therein lies the challenge: how to hire someone from outside who has the relevant domain knowledge (say, in banking) and who will also fit in with the existing corporate culture.

This challenge can be solved by using data analytics during the selection process.

Today, every individual creates tonnes of digital data and leaves a wide digital trail behind. By looking at this trail, along with other digital data, one can build fairly sophisticated analytics tools for hiring.

The two most popular tools in Hiring Analytics are:

  1. LinkedIn Talent Solutions 
  2. IBM Watson for Hiring 

The benefits of hiring the right candidates are well known: they are more likely to perform better and stay longer.

What Goes into Hiring Analytics

Hiring analytics works best when we use standard, verifiable biographical data - things that can be checked independently:

  • Number of jobs held to date
  • Tenure in each of those jobs
  • Number of promotions received
  • Level of education
  • Photos of the candidate on the Internet
  • Industry-relevant skills
  • Public records
  • Frequency of continuous learning and development
  • Affiliations to various organizations - cultural, political, etc.
  • Cultural background

In addition to the basic resume, data from social media - Facebook, Twitter, and LinkedIn - is used for this analysis. The analysis can provide insights into the candidate's sentiment towards various organizational factors such as productivity, business growth, or other objectives.

For a particular role, say a system architect, a company can set out a basic set of requirements that defines the talent pool. Once these requirements are collected, automated tools from Monster, LinkedIn, etc. can scour millions of profiles. For each matching profile, the relevant data points can be collected and then modeled or numerically analyzed with various analytics tools: sentiment analysis using MapReduce, skill-level analysis using recursive neural networks, cultural-fit analysis using RNTN, and so on.
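
To make this concrete, here is a deliberately simplified, rule-based sketch of the kind of scoring such a pipeline produces. The field names and weights are my own assumptions for illustration; a real system would use distributed tools (MapReduce, Spark) and trained models rather than hand-set weights.

```python
# Toy illustration only: a simplified scoring pass over candidate profiles for a
# system-architect search. All field names and weights are hypothetical.

REQUIRED_SKILLS = {"system design", "cloud architecture", "microservices"}

def score_profile(profile: dict) -> float:
    """Return a rough relevance score between 0 and 1."""
    skills = {s.lower() for s in profile.get("skills", [])}
    skill_match = len(skills & REQUIRED_SKILLS) / len(REQUIRED_SKILLS)
    tenure = min(profile.get("avg_tenure_years", 0) / 4.0, 1.0)      # reward stability
    learning = min(len(profile.get("certifications", [])) / 3.0, 1.0) # continuous learning
    return 0.5 * skill_match + 0.3 * tenure + 0.2 * learning

profiles = [
    {"name": "A", "skills": ["System Design", "Java"], "avg_tenure_years": 3, "certifications": ["AWS"]},
    {"name": "B", "skills": ["Cloud Architecture", "Microservices"], "avg_tenure_years": 5, "certifications": []},
]

for p in sorted(profiles, key=score_profile, reverse=True):
    print(p["name"], round(score_profile(p), 2))
```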

The results of this analysis can then be used to shortlist a small set of candidates from a large pool.

During the interview process, further information about the candidate can be captured using a basic survey, psychometric tests, or other testing tools. Data from these tools can then be used to identify and rank candidates based on company-specific parameters such as the following (a simple ranking sketch follows the list):

  • Willingness to join
  • Time to join
  • Onboarding process
  • Skills & training development needs
  • Retention schemes
  • Cost to company (level of salary expected by the candidate)
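
As a rough illustration of this ranking step (not any particular vendor's method), the sketch below scores interviewed candidates on a few of the parameters above. All field names, weights, and scales are assumptions.

```python
# Illustrative only: rank candidates using assumed weights over survey-derived
# parameters. A real system would calibrate these from historical hiring data.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    willingness: float        # 0..1 from survey
    weeks_to_join: float      # notice period / availability
    expected_ctc_lakhs: float # salary expectation

def rank_key(c: Candidate, max_weeks: float = 12, max_ctc: float = 40) -> float:
    # Higher willingness is better; shorter joining time and lower cost are better.
    joining = 1 - min(c.weeks_to_join / max_weeks, 1.0)
    cost = 1 - min(c.expected_ctc_lakhs / max_ctc, 1.0)
    return 0.5 * c.willingness + 0.3 * joining + 0.2 * cost

candidates = [
    Candidate("X", willingness=0.9, weeks_to_join=8, expected_ctc_lakhs=30),
    Candidate("Y", willingness=0.7, weeks_to_join=2, expected_ctc_lakhs=22),
]
for c in sorted(candidates, key=rank_key, reverse=True):
    print(c.name, round(rank_key(c), 2))
```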

The main objective of hiring analytics is to automate the hiring process as much as possible and provide hiring managers with the necessary information about each candidate, so that they can make the right hiring decision.

Apart from actual hiring, hiring analytics can also be applied to existing or newly hired staff to estimate retention ratios and plan future hiring.

Closing Thoughts

Information technology is rapidly transforming the hiring process. The industry has come a long way from the days of placing recruitment advertisements in newspapers. Today companies can rapidly analyze millions of profiles and shortlist potential candidates, not just based on keyword search, but based on candidates' digital data trails, which give a bigger picture than the standard resume alone.

Hiring analytics is still in its infancy, and companies are still taking baby steps. The power of analytics, however, is enormous. Companies are adopting predictive analytics to drive hiring decisions and HR strategies, and the same data analytics can be used for other HR functions such as attrition-risk management, employee sentiment analysis, skill-training plan development, etc.

Someday in the future, analytics tools may themselves find and hire the right candidate, with no human involvement.

Recall the movie "Gattaca", where companies use DNA analysis to hire candidates!
While the technology for DNA analysis exists today, it is not used in standard hiring - though I guess agencies like NASA, the NSA, or the CIA may be using it in secret!

Wednesday, October 19, 2016

Future of Business IT : APIs

There was a time when one had to walk into a secure facility to access a computer - it was called the mainframe era. Today, all that computing power is available on mobile devices over the Internet via APIs!

Mobile apps and web apps rely on APIs to connect and communicate programmatically over the Internet. Google's recent purchase of Apigee underscores the need for application programming interfaces in today's connected economy. In this new world of mobile apps, APIs are used to connect different systems, and the constant exchange of data between those systems powers the app economy.

For example, when you open the Flipkart mobile app and browse a particular product supplied by a vendor, the app on your phone interacts with the retailer's catalog via APIs, and the retailer's site in turn gathers pricing and shipping data from the vendor's side via other APIs to give you a complete picture.
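
A minimal sketch of this pattern, assuming hypothetical REST endpoints (these are not Flipkart's actual APIs):

```python
# Hypothetical sketch: a client composes a catalog API and a vendor API to
# build one product view. Endpoints, parameters and fields are invented.

import requests

CATALOG_API = "https://api.example-retailer.com/v1/products"
VENDOR_API = "https://api.example-vendor.com/v1/offers"

def build_product_view(product_id: str) -> dict:
    product = requests.get(f"{CATALOG_API}/{product_id}", timeout=5).json()
    offer = requests.get(f"{VENDOR_API}/{product_id}", timeout=5).json()
    # Merge catalog data with the vendor's pricing and shipping data.
    return {
        "title": product.get("title"),
        "price": offer.get("price"),
        "shipping_days": offer.get("shipping_days"),
    }

if __name__ == "__main__":
    print(build_product_view("SKU-12345"))
```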

In short, Data is the new currency in the App world! The data is made available over API.

The Importance of APIs

As apps proliferate, the life cycle of an app can be as short as a few weeks!

With such a short life cycle, it makes more sense to use APIs to collect, collate, and present information to users, and to transport information back to the main IT systems. In short, APIs become more than just an interface; they are the central hub of the application.

APIs make business processes simpler and smoother by connecting firms' customer-facing apps to different back-end IT systems. For example, one can launch a website that uses a facial recognition API from Clarifai to authenticate users, presents stock data from Bloomberg over another API, and enables trades over yet another API.
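
As a sketch of how such a composition might look, the snippet below chains three placeholder endpoints: face verification, a market-data quote, and an order submission. The URLs and response fields are invented for illustration and do not reflect the real Clarifai, Bloomberg, or brokerage APIs.

```python
# Placeholder composition of three services behind one user action.
# None of these endpoints are real; each provider's actual SDK would differ.

import requests

AUTH_URL = "https://auth.example.com/verify-face"     # stand-in face-recognition API
QUOTES_URL = "https://marketdata.example.com/quote"   # stand-in market-data API
TRADE_URL = "https://broker.example.com/orders"       # stand-in trading API

def place_trade(face_image: bytes, symbol: str, quantity: int) -> dict:
    auth = requests.post(AUTH_URL, files={"image": face_image}, timeout=5).json()
    if not auth.get("verified"):
        raise PermissionError("User not recognised")

    quote = requests.get(QUOTES_URL, params={"symbol": symbol}, timeout=5).json()
    order = {"symbol": symbol, "quantity": quantity, "limit": quote["last_price"]}
    return requests.post(TRADE_URL, json=order, timeout=5).json()
```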

The ability to integrate multiple APIs into a seamless, user-friendly service is the need of the day. APIs allow firms to expand into markets that were not reachable before. For example, a bank in Vietnam can now offer global investment services via APIs - something that was not possible a few years ago.

APIs enable interactions/transactions between a business and its customers across multiple devices, apps, social networks, business networks, and cloud services.

A Growing Market for APIs

Today, it is estimated that there are 15K-20K APIs. This number could grow into the millions by 2020, and the market for paid APIs could exceed $3 billion in 2020.

The size of the market implies that APIs are no longer a novelty. APIs are becoming the preferred way to interact with IT systems and exchange information, and thus to build valuable new products in the app economy.

It's no wonder that companies like IBM are offering open access to Watson, their AI platform, via APIs. This allows IBM to attract new customers globally and develop an unprecedented range of new services.

Closing Thoughts

APIs are not just a technical concept; they are the preferred way to offer new, valuable services. Soon APIs will be the only way customers interact, and the way companies interconnect and interoperate.

Tuesday, October 18, 2016

Fintech Needs High-Performance Computing

Newer Fintech companies are planning to disrupt financial markets. According to Accenture, these newer Fintech companies are targeting ever-faster settlement times.

To compete, current incumbents will have to match the turnaround times of the newer Fintech companies. To achieve such fast turnaround times with existing workloads, companies will need High-Performance Computing (HPC).

Historically, financial companies have been early adopters of advanced computing technologies - mainframes in the 1960s, Unix servers in the 1990s. Today, activities such as high-frequency trading, complex simulations, and real-time analytics run in dedicated data centers filled with a diverse set of HPC systems.

These HPC systems are used to gather, parse, analyze, and act on huge amounts of data - often several petabytes per day. Greater computing power translates into greater competitive advantage in the market.

Let us see how HPC aids in building competitive advantages.

The only way for financial companies to address these challenges is to use HPC solutions.

Now, let's look at what constitutes an HPC system.

From a hardware perspective, an HPC system has four components:

Market Outlook

It is clear that HPC provides competitive advantages to financial companies. According to IDC, total global revenue for the HPC market (including servers, storage, software and services) will increase from $21 billion in 2014 to $31.3 billion by 2019!

Tuesday, August 16, 2016

Five core attributes of a streaming data platform

Currently, I am planning a system to handle streaming data. Today, any data-driven organization has several new streaming data sources such as mobile apps, IoT devices, sensors, IP cameras, websites, point-of-sale devices, etc.

Before designing a system, first we need to understand the attributes that are necessary to implement an integrated streaming data platform.

For a data-driven organization, the first core consideration is to understand what it takes to acquire all the streaming data: the variety, velocity, and volume of the data. Once these three parameters are known, it is time to plan and design an integrated streaming platform that allows for both the acquisition of streaming data and the analytics that make up streaming applications.

For the system design, there are five core attributes that need to be considered:

1. System Latency
2. System Scalability
3. System Diversity
4. Durability
5. Centralized Data Management

System Latency

A streaming data platform needs to keep pace with the incoming data from the various sources that make up the stream. One of the keys to a streaming data platform is the ability to match the speed of data acquisition with the speed of data ingestion into the data lake.

This implies designing the data pipelines required to transfer data into the data lake and running the necessary processes to parse and prepare data for analytics on Hadoop clusters. Data quality is a key parameter here: it takes a definite amount of time and compute resources to sanitize data and avoid "garbage in, garbage out" situations.

Data security and authenticity have to be established before data gets ingested into the data lake. As data is collected from disparate sources, a basic level of authentication and validation must be carried out so that the core data lake is not corrupted. For example, in the case of real-time traffic analysis for Bangalore, one needs to check that the data is indeed coming from sensors in Bangalore; otherwise the analysis will not be accurate.
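
A minimal validation sketch for this point, assuming a hypothetical record layout and sensor registry:

```python
# Illustration only: validate a traffic-sensor record before it is ingested
# into the data lake. The sensor registry and field names are assumptions.

REGISTERED_SENSORS = {"BLR-SENSOR-001", "BLR-SENSOR-002"}   # hypothetical registry
BANGALORE_BOUNDS = {"lat": (12.8, 13.2), "lon": (77.4, 77.8)}

def is_valid(record: dict) -> bool:
    if record.get("sensor_id") not in REGISTERED_SENSORS:
        return False                       # unknown source: reject before ingestion
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or lon is None:
        return False
    in_lat = BANGALORE_BOUNDS["lat"][0] <= lat <= BANGALORE_BOUNDS["lat"][1]
    in_lon = BANGALORE_BOUNDS["lon"][0] <= lon <= BANGALORE_BOUNDS["lon"][1]
    return in_lat and in_lon

record = {"sensor_id": "BLR-SENSOR-001", "lat": 12.97, "lon": 77.59, "speed_kmph": 24}
print(is_valid(record))    # True -> safe to ingest
```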

Data security is one of the most critical components to consider when designing a data lake. The subject of security can be a lengthy one and has to be customized for each use case. There are many open source tools that address data governance and data security: Apache Atlas (the Data Governance Initiative by Hortonworks), Apache Falcon (automates data pipelines and data replication for recovery and other emergencies), Apache Knox Gateway (provides edge protection), and Apache Ranger (an authentication and authorization platform for Hadoop) in the Hortonworks Data Platform, and Cloudera Navigator in the Cloudera enterprise edition.

Depending on the data producers, data ingestion can be "pushed" or "pulled". The choice of push or pull defines the choice of tools and strategies for ingestion. If there is a need for real-time analytics, the streaming data has to be fed into the compute farm without delay, which implies planning out network capacity, memory sizing, and compute capacity.
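
As one possible "push" ingestion path (an assumption on my part; the post does not prescribe a tool), the sketch below uses Apache Kafka with the kafka-python client to push sensor events onto a topic from which downstream consumers pull them:

```python
# Sketch of push-style ingestion with Apache Kafka (kafka-python client).
# Broker address, topic name, and event layout are hypothetical.

import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                       # hypothetical broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {"sensor_id": "BLR-SENSOR-001", "speed_kmph": 24, "ts": "2016-08-16T10:00:00Z"}
producer.send("traffic-events", value=event)                  # push to the stream
producer.flush()                                              # block until delivered
```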

In the case of on-demand or near-real-time analytics, there will be additional latency while the data lands in the data warehouse or gets ingested into the data lake, and additional systems are needed to feed the ingested data to a BI system or a Hadoop cluster for analysis. If there are location-based dependencies, one needs to build distributed Hadoop clusters.

System Scalability

The size of the data streaming from devices is not constant. As more data-collection devices are added, or when existing devices are upgraded, the size of the incoming data stream increases. For example, in the case of IP cameras, the stream size will increase when the number of cameras increases, when the cameras are upgraded to capture higher-resolution images, or when more data is collected - infrared images, audio, etc.

Streaming data platforms need to match the projected growth in data size. This means they must be able to stream data from a larger number of sources and/or handle bigger data sizes. As the data size changes, all the connected infrastructure must be capable of scaling up (or scaling out) to meet the new demand.
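
A back-of-envelope sizing sketch for the camera example, with illustrative (assumed) bitrates:

```python
# Estimate aggregate daily ingest volume as cameras are added or upgraded.
# Bitrates below are illustrative assumptions, not measurements.

def daily_volume_tb(num_cameras: int, mbps_per_camera: float) -> float:
    total_mbps = num_cameras * mbps_per_camera
    bytes_per_day = total_mbps / 8 * 1e6 * 86_400    # Mbps -> bytes per day
    return bytes_per_day / 1e12                      # -> terabytes per day

print(daily_volume_tb(200, 2.0))   # 200 cameras at ~2 Mbps  -> about 4.3 TB/day
print(daily_volume_tb(500, 8.0))   # 500 upgraded cameras at ~8 Mbps -> about 43 TB/day
```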

System Diversity

The system must be designed to handle a diverse set of data sources. Streaming data platforms need to support not just "new era" data sources such as mobile devices, cloud sources, and the Internet of Things, but also "legacy" platforms such as relational databases, data warehouses, and operational applications like ERP, CRM, and SCM. These legacy platforms hold the information needed to place streaming devices, mobile apps, and browser clicks into context and provide value-added insights.

System Durability 

Once the data is captured and registered in the data lake, the value of the historical data depends on the value of historical analysis. Many of the source data sets may have changed, so data needs to be constantly refreshed for meaningful analysis (or purged when it is no longer useful). This refresh must be policy- or rule-based.
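
A minimal sketch of such policy-based refresh, with hypothetical dataset names and retention windows:

```python
# Illustration only: each dataset carries a refresh/purge policy, and a
# scheduled job decides what to do with it. Policies here are assumptions.

from datetime import datetime, timedelta

POLICIES = {
    "traffic_events": {"refresh_after": timedelta(hours=1), "purge_after": timedelta(days=90)},
    "vendor_catalog": {"refresh_after": timedelta(days=1),  "purge_after": timedelta(days=365)},
}

def decide(dataset: str, last_updated: datetime, now: datetime) -> str:
    policy = POLICIES[dataset]
    age = now - last_updated
    if age > policy["purge_after"]:
        return "purge"
    if age > policy["refresh_after"]:
        return "refresh"
    return "keep"

print(decide("traffic_events",
             datetime(2016, 8, 16, 8, 0),
             now=datetime(2016, 8, 16, 10, 0)))   # -> "refresh"
```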

Centralized Data Management 

One of the core tenets of a streaming data platform is to make the entire streaming system easy to monitor and manage. This makes the system easier to maintain and sustain. With a centralized architecture, a streaming data platform can not only reduce the number of potential connections between streaming data sources and destinations, but also provide a centralized repository of technical and business metadata to enable common data formats and transformations.

A data streaming system can easily contain hundreds of nodes. This makes it important to use tools that monitor, control, and manage the data lake. Currently, there are two main options: Cloudera Manager and Hortonworks' open-source tool Ambari. With such tools, you can easily increase or decrease the size of your data lake while finding and addressing issues such as bad hard drives, node failures, and stopped services. They also make it easy to add new services, ensure compatible tool versions, and upgrade the data lake.
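
For illustration only, here is a tiny health poll against a hypothetical cluster-manager REST endpoint; in practice Ambari or Cloudera Manager provide this monitoring out of the box, and their real APIs differ from this invented one:

```python
# Hypothetical sketch: poll a made-up cluster-manager endpoint and list
# unhealthy nodes. URL and response shape are invented for the example.

import requests

MANAGER_URL = "https://cluster-manager.example.com/api/health"   # hypothetical

def unhealthy_nodes() -> list:
    nodes = requests.get(MANAGER_URL, timeout=10).json()["nodes"]
    return [n["host"] for n in nodes if n["status"] != "HEALTHY"]

if __name__ == "__main__":
    for host in unhealthy_nodes():
        print("needs attention:", host)
```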

Closing Thoughts 

Designing a streaming data platform for data analysis is not easy. But with these five core attributes as the foundation of the design, one can ensure that the platform is robust and that a complete solution can be built on it to meet the needs of the data-driven organization.

Understanding the Business Benefits of Colocation

Digital transformation and the move towards private cloud are shaking up the design and implementation of data centers. As companies start their journey to the cloud, they realize that having a set of dedicated servers for each application will not serve them well and that they need to change their data centers.

Historically, companies started out with a small server room hosting a few servers that ran their business applications. The server room was located in their office space and was a small setup. As business became more compute-centric, the small server room became unviable. This led to data centers - which were often still located in their offices.

These office buildings were not designed to host data centers and had to be modified to bring more air conditioning, networking, and power into the data center. The limitations of existing buildings constrained efficient cooling and power management.

Now that companies are planning to move to private cloud, they are seeing huge benefits in having a dedicated data center - a purpose-built facility for hosting large numbers of computers, switches, storage, and power systems. These purpose-built data centers come with better power supply solutions and better, more efficient liquid cooling, and, more importantly, offer a wide range of network connectivity and network services.

As a result, these dedicated data centers can save money on IT operations while providing greater reliability and resilience.

But not all companies need a data center large enough to benefit from those economies of scale. Very few enterprises really need such large, dedicated, purpose-built data centers. Hence a newer solution - data center colocation.

For CIOs, colocation provides the perfect win-win scenario: cost savings combined with state-of-the-art infrastructure. When comparing the capabilities of a standard server room to a colocated data center solution, the savings on power bills alone are often enough to justify the project.

These dedicated data centers are built at large scale in industrial zones with dedicated power lines and backup power systems, so the power cost is much lower than before. Moreover, they can employ newer and more efficient cooling systems that reduce the overall power consumption of the data center.
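
A rough, illustrative calculation of the power argument, with assumed PUE values, tariffs, and IT load (not actual figures):

```python
# Back-of-envelope comparison of annual energy bills: in-office server room
# versus a colocated facility. All inputs are assumptions for the sketch.

def annual_power_cost(it_load_kw: float, pue: float, tariff_per_kwh: float) -> float:
    return it_load_kw * pue * 24 * 365 * tariff_per_kwh

office_room = annual_power_cost(it_load_kw=50, pue=2.0, tariff_per_kwh=0.12)  # inefficient cooling, office tariff
colocation = annual_power_cost(it_load_kw=50, pue=1.4, tariff_per_kwh=0.08)   # efficient site, cheaper power

print(f"office room : ${office_room:,.0f}/year")
print(f"colocation  : ${colocation:,.0f}/year")
print(f"saving      : ${office_room - colocation:,.0f}/year")
```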

Business Benefits of Colocation

Apart from reductions in operational expenditure, colocation brings several other benefits. Having a dedicated team available 24/7/365 to monitor and manage the IT infrastructure is a huge advantage.

  1. Cost Savings on Power & Taxes

    Dedicated data centers are built in locations that offer cheap power. Companies can also negotiate tax breaks for building in remote or industrial areas. In addition to a lower price for power, these data centers are designed with diverse power feeds and efficient distribution paths. They have dual generator systems that can be refueled while in operation, on-site fuel reserves, and multiple levels of UPS support.

    In addition to lower power costs, dedicated data centers have engineers and technicians who monitor power and battery levels 24/7 so that the center achieves 100% uptime.

    Additionally, data centers have the time, resources, and impetus to continually invest in and research green technologies. This means that businesses can reduce the carbon footprint of their office locations and benefit from continual efficiency-saving research. Companies that move their servers out of in-house server rooms reportedly save up to 90 percent on their own carbon emissions.

  2. Network Connected Globally, Securely and Quickly

    Today, high-speed network connectivity is key to business, and it is much more difficult to get big, fat pipes of network connectivity into central office locations than into a centralized data center. A dedicated data center will have many network service providers offering connectivity, often at a lower price than at an office location.

    Dedicated data centers also provide resilient connectivity at a fairly low price - delivering 100 Mbps of bandwidth might be hard at an office location, and trying to create a redundant solution there is often financially unviable. Data centers are connected to multiple transit providers and have large bandwidth pipes, which means businesses often get better service for less cost.

    Colocation enables organizations to benefit from faster networking and cheaper network connections.

  3. Monitoring IT Infrastructure

    A dedicated data center makes it easier to monitor the health of the IT infrastructure. The economies of scale that come from colocation help build a robust monitoring solution that covers the entire IT infrastructure and ensures SLAs are being met.

  4. Better Security

    A dedicated, colocated data center will have better physical security than a data center in an office location. Its physical isolation enables the service provider to offer stronger security measures, including biometric scanners, closed-circuit cameras, on-site security teams, coded access, alarm systems, ISO 27001-accredited processes, and more. With colocation, all these service costs are shared - bringing down costs while improving the level of security.

  5. Scalability

    Platform 3 paradigms such as digital transformation, IoT, Big Data, etc. are driving up the scale of IT infrastructure. As the demand for computing grows over time, the data center must be able to cope with it. With colocation, scale-up requirements can be negotiated ahead of time, and with just one call to the colocation provider the scale of the IT infrastructure can be increased as needed.

    Data centers and colocation providers can have businesses up and running within hours and provide the flexibility to grow alongside your organization. Colocation space, power, bandwidth, and connection speeds can all be increased when required.

    The complexity of rack-space management, power management, etc. is outsourced to the colocation service provider.

  6. Environment Friendly & Green IT

    A large-scale data center has more incentive to run greener IT operations, as doing so lowers energy costs. These data centers are often located in industrial areas where better cooling technologies can be safely deployed, which improves overall operational efficiency. A colocated data center typically adheres to global green standards such as ASHRAE 90.1-2013, ASHRAE 62.1, LEED certifications, etc.

    Bigger data centers also enable better IT e-waste management and recycling. Old computers, UPS batteries, and other equipment can be safely and securely disposed of.

  7. Additional Services from Colocation Partners

    Colocation service providers may also offer other cloud services such as:

    a. Elastic scalability to ramp up IT resources when there is seasonal demand and scale down when demand falls.

    b. Data backups and data archiving to tape, stored in a secure location.

    c. Disaster recovery across multiple data centers and data redundancy to protect against data loss in case of natural disasters.

    d. Network security and monitoring against malicious network attacks - usually in the form of a "Critical Incident Center," which is like a Network Operations Center but focused on data security and active network security.

Closing Thoughts

In conclusion, a dedicated data center offers tremendous cost advantages. If a company's IT scale does not warrant its own dedicated data center, the best option is to move to a colocated one. Colocation providers are able to meet business requirements at a lower cost than keeping the service in-house.

A colocation solution offers companies a variety of benefits: exceptional SLAs, data secured off-site, added levels of risk management, and the opportunity to invest in better equipment and state-of-the-art servers.