Saturday, February 28, 2015

Data Lake: Storage for Hadoop & Big Data Analytics

Companies have been tackling data challenges at scale for many years now. The vast majority of data produced within the enterprise comes from ERP, CRM, & other systems supporting a given enterprise function. This data was stored in an Enterprise Data Warehouse (EDW) for Business Intelligence analytics. Naturally, many organizations tried to use their existing data warehouses as the model for Big Data analytics.

As companies adopt big data analytics in a big way, the existing enterprise data warehouse (EDW) systems built on Network File System (NFS) storage are not scaling up.

Big data analytics on primary storage was good enough for proofs of concept, but for production workloads, traditional EDW systems on NFS arrays are too expensive and inadequate.

Trying to use an EDW for big data makes no sense!

Storing Big Data on a Data Lake

Apache Hadoop includes a distributed file system (HDFS) with built-in availability, replication, and protection mechanisms for storing huge amounts of data safely and, above all, very inexpensively. HDFS storage can be created by simply adding disks to cluster nodes, and all the analytics & data management tools run on the same servers.

Despite its advantage of being local to a cluster, HDFS has its challenges. One has to move large chunks of data from various systems into the cluster. Data in an HDFS cluster also poses a security risk, and there is an inherent risk of data loss with HDFS. In addition, companies are bound by various regulations and rules to protect and retain data. All this increases the total cost of acquisition and the total cost of big data analytics.

An enterprise data lake - such as EMC Data Lake Foundation - complements the existing EDW and provides additional core benefits:

New efficiencies

EMC Data Lake Foundation increases the efficiency of the data architecture. ECS provides storage at a lower cost than a traditional Hadoop cluster, while Isilon optimizes data processing workloads such as data transformation and integration.

New opportunities 

With EMC Data Lake Foundation, all data for analytics can be accessed without having to copy it over to a Hadoop cluster. This allows flexible 'schema-on-read' access to all enterprise data, and multi-use, multi-workload data processing on the same sets of data, from batch to real-time analytics.
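To make 'schema-on-read' a little more concrete, here is a minimal Python sketch. It is a generic illustration and not an EMC or Hadoop API: the sample events, the `read_with_schema` helper and the two schemas are all hypothetical. The point it tries to show is that raw records are stored untouched, and each consumer applies its own schema only when reading.

```python
import json

# Raw events are stored as-is; no schema is enforced at write time.
# (Hypothetical sample data for illustration only.)
RAW_EVENTS = [
    '{"user": "alice", "amount": "19.99", "ts": "2015-02-27T10:00:00"}',
    '{"user": "bob", "amount": "5.00", "ts": "2015-02-28T09:30:00", "coupon": "X1"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) while reading raw lines."""
    for line in lines:
        record = json.loads(line)
        # A field missing from a raw record becomes None instead of an error.
        yield {
            name: (convert(record[name]) if name in record else None)
            for name, convert in schema.items()
        }


# Two consumers read the same raw data through different schemas.
billing_schema = {"user": str, "amount": float}
marketing_schema = {"user": str, "coupon": str}

billing_rows = list(read_with_schema(RAW_EVENTS, billing_schema))
marketing_rows = list(read_with_schema(RAW_EVENTS, marketing_schema))
```

Because the raw data is never rewritten, a new consumer can add a third schema later without any change to what is already stored.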

EMC Data Lake Foundation is designed to support Apache Hadoop: HDFS for data storage & Hadoop YARN.

With ECS, HDFS data is stored on commodity drives, providing scalable and reliable data storage that is designed to span multiple data centers and even continents. As a result, companies can build stable, reliable, & highly scalable data lakes that span multiple data centers.

Apache Hadoop YARN. YARN provides a pluggable architecture and resource management for data processing engines to interact with data stored in HDFS.

Hadoop YARN allows multi-use, multi-workload data processing. By supporting multiple access methods (batch, real-time, streaming, in-memory, etc.) to a common data set, Hadoop enables analysts to transform and view data in multiple ways (across various schemas) and obtain closed-loop analytics, bringing time-to-insight closer to real time than ever before.

Enterprise Scale Data Lake

EMC Data Lake Foundation is designed to provide enterprise-class data management capability while keeping costs low and without disrupting existing EDW workflows. ECS is designed to use commodity drives, which allows for a dramatically lower overall cost of storage, particularly when compared to standard enterprise NAS systems. The scale-out commodity storage with ECS provides a compelling alternative to Hadoop storage in commodity servers. ECS allows users to scale out their storage as and when their data needs grow, and completely decouples the growth of compute from storage. This cost dynamic makes it possible to store, process, analyze, and access more data than ever before.

The cost of EMC Data Lake Foundation is designed to be lower than that of a traditional Hadoop cluster.

EMC Data Lake Foundation would augment existing ETL systems with Hadoop. For example, with traditional EDW-BI applications, companies would store only one year of raw data and keep the BI reports on NAS. With Hadoop, it is possible to store 10 years of raw data plus all the BI ETL results. This lets companies keep all source data and ETL results for future analytics, resulting in much richer applications with far greater historical context.

Benefits of Unified Enterprise Data with EMC Data Lake Foundation

  • Store & Process all corporate data
  • Access all data simultaneously in multiple ways: Batch, Interactive, Real-Time
  • Automate all data management based on Policy
  • Provide Enterprise grade security for all data: Access control, Authentication, Data protection.
  • Use existing data management & security tools to manage Hadoop data
  • Enable both existing & new analytics applications to provide value to the organization
  • Provide a geo-spread scale-out data lake, but with a single plane of management.
  • Provide a choice of data storage systems including traditional SAN array, NAS array, scale-out NAS, Object Storage and Hadoop
  • Efficient Data management Operations: Provision, manage, monitor & operate big data at a global scale.

In a nutshell, EMC Data Lake Foundation allows companies to:

  1. Collect everything: Store all data, both raw and results for extended periods of time.
  2. Dive in anywhere: Enable users across multiple business units to refine, explore and enrich data on their terms.
  3. Flexible access: Access data in multiple ways across a shared infrastructure: batch, interactive, online, search, in-memory and other processing engines.

The result: EMC Data Lake Foundation delivers maximum scale and insight with the lowest possible friction and cost.

Friday, February 27, 2015

Product Management: Manage expectations & Create value

What do successful companies know about creating new products?

When you study successful products, one thing is common: successful products create value for customers.

Value can be delivered in many ways. Henry Ford did not invent the car, nor did he invent the assembly line. But he combined them to drive down the cost of a new car and passed those savings on to customers - and that revolutionized an industry.

In every industry, there are several ways to create value for customers. Take phones, for example: Alexander Graham Bell invented the phone, but Motorola added value to it by making it mobile. Nokia further changed the customer value by making cell phones fashionable. BlackBerry turned phones into secure office communication devices. Apple made the cell phone personal and interactive. Note that all through the cell phone evolution, the cost of a cell phone did not really go down - but customers found value in different ways. Providing value to customers is not just about adding new features or reducing prices.

To customers, value is the difference between what they perceive they are getting and what they pay for the product. In the world of technology - be it phones, computers or software - customer value does not come from lowering prices or reducing costs; it comes from adding features that make customers feel they are getting more with each version of the product.

Too often, I see companies focus so much on costs that they neglect the most important goal: determining why someone would want to buy their product. For example, Nokia fought a long & losing battle by releasing cheaper touch screen phones - but totally forgot the user experience. By the time Nokia released a really good touch screen phone, it had ignored the apps.

Today, building a new product that is merely cheaper simply doesn't cut it. Microsoft Hyper-V is free with Windows Server, but customers still prefer VMware!

To develop successful products, you have to concentrate on growing value for customers. Leaders in successful product companies know their customers' economic expectations and have the skills to deliver on them.

Understanding Hyperconverged Infrastructure

Computer technology undergoes a massive shift every so often as new models emerge to meet changing business needs. Explosive growth of mobile apps & Big Data has spurred uncontrolled demand on IT & has put more strain on existing resources. Existing data centers were built around purpose-built infrastructure that just cannot scale up to the new needs.

Discrete servers, network switches/routers, and storage arrays (SAN, NAS), which dominated the data center, are being replaced by converged infrastructure such as Vblock or FlexPod.

At the basic level, converged infrastructure simply brings together existing individual storage, compute, and network switching products into a pre-tested, pre-validated package sold as a single solution.

This converged infrastructure was still built out of discrete servers (Cisco UCS), discrete switches (Cisco Nexus) & discrete storage arrays (EMC VMAX/VNX). VCE, the vendor of the converged infrastructure, would integrate all the discrete components together and pre-configure the setup in the factory before shipping it off to customers. This simplified the purchase and upgrade cycle.

Converged Infrastructure systems did offer a few benefits:

  1. Single point of contact for their infrastructure, from purchase to end of life.
  2. These systems are always tested and almost always arrive at the customer site fully racked & cabled, so they're ready to go.

While converged infrastructure saved customers time and money by standardizing IT infrastructure and shortening time to deployment, it still did not solve some of the niggling issues with their IT infrastructure.

Virtualization of compute with VMware ESX solved the server utilization problem. But for network & storage, utilization, planning, configuration, & change management were still a big headache. Different tools were needed to manage the underlying components: servers were managed with UCS Manager, the network with Nexus Manager, storage with Unisphere, and VMs with vCenter. A single unified tool was sorely missing.

Converged infrastructure fails to address the ongoing operational challenges introduced with the advent of virtualization. Network LANs and storage LUNs were still created the old way, WAN optimizers still had to be acquired and configured, and third-party backup and replication products had to be purchased and maintained separately.

There was another big disadvantage. Once the existing converged infrastructure was fully utilized - on compute, network or storage - customers had to buy another BIG chunk of infrastructure. For example, if a customer wanted ten additional servers, they would get storage and network bundled with them - which led to poor utilization of the other resources.

As a result, there were islands of storage & network with poor utilization. Customers could not use existing legacy storage with converged infrastructure. Converged infrastructure also did not address the performance issues of legacy applications. And system management was not really unified: customers still needed to run individual element managers underneath a unified global management tool.

As time went by, IT vendors learned from the limitations of converged infrastructure and developed a solution: hyperconverged infrastructure.

Hyperconverged infrastructure is the culmination and conglomeration of a number of innovations, all of which provide value to IT infrastructure.

What is hyperconvergence? 

A hyperconverged infrastructure node is a server with a large amount of data storage capacity and built-in IP networking - mainly an Ethernet switch with a Layer-2/3 overlay SDN - to connect to other hyperconverged boxes.

These boxes are preconfigured and can be stacked up to create bigger capacities, so that compute and storage can be pooled & shared across multiple boxes. Hyperconvergence is a scalable building-block approach that allows IT to expand by adding units, just like in a LEGO set.

Hyperconvergence is a way to enable cloud-like functionality & scale without compromising availability, performance, & reliability. This is achieved by total virtualization of compute, storage (SDS), and network (SDN). This allows the entire hyperconverged infrastructure to be treated as one big pool of virtual resources that can be managed completely by software: all provisioning, configuration, performance, security, etc. is handled through common software.

Virtualization of the entire data center will fundamentally and permanently change how IT services are delivered from the data center. This enables IT to take a "virtualized first" approach to new application and service deployment - i.e., a completely virtual environment is used for running all new applications.

Using the entire infrastructure as a resource pool, organizations can gain efficiency, flexibility and scalability. Hyperconverged infrastructure provides significant benefits:

  • Data efficiency: Hyperconverged infrastructure reduces storage, bandwidth, and IOPS requirements through one-time data de-duplication, compression & optimization.

  • Elasticity: Hyperconvergence makes it easy to scale resources out or in as business demands require. Because capacity grows by adding building-block units, the data center environment scales easily and linearly.
  • VM-centricity: A focus on the virtual machine (VM) or workload as the cornerstone of enterprise IT, with all supporting constructs revolving around individual VMs. Today, most services run inside virtual environments, so administrators consider the virtual environment first for new applications rather than building a new physical environment.

  • Data protection: Ensuring that data can be restored in the event of loss or corruption is a key IT requirement, made far easier by hyperconverged infrastructure.

  • VM mobility: Hyperconvergence enables greater application/workload mobility. Homogeneous resource pools also make it easier to move applications from one virtual resource to another.

  • High availability: Hyperconvergence enables higher levels of availability than are possible in legacy systems. Homogeneous resource pools also make it easier to afford spare components that provide increased redundancy. At the same time, simplified administration leaves less room for human error and thereby increases overall uptime.

  • Cost efficiency: Virtualized resources can be dynamically provisioned to match workloads, which avoids overprovisioning. Hyperconverged infrastructure brings to IT a sustainable step-based economic model that eliminates waste: lower CAPEX as a result of lower upfront prices for infrastructure, lower OPEX through reductions in operational expenses and personnel, and faster time-to-value for new business needs.
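The data efficiency point in the list above relies on de-duplication. As a rough illustration of the idea, here is a small Python sketch of block-level de-duplication using content hashes. It is only a sketch under stated assumptions: the `DedupStore` class, its tiny 4-byte block size and the fixed-size chunking are hypothetical choices made for readability, and real hyperconverged products may work quite differently.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps only one copy of each distinct block."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # content digest -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Identical blocks share one digest, so they are stored only once.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # Rebuild the file from its ordered list of block references.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())
```

In this sketch, writing two files that share blocks consumes space only for the distinct blocks, while each file can still be read back in full.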

A side benefit: The hyperconverged infrastructure provides a single vendor approach to procurement, implementation, and operation. There's no more vendor blame game, and there's just one number to call when a data center problem arises.

Closing Thoughts

Hyperconverged infrastructure (also known as hyperconvergence) is a data center architecture that embraces cloud principles and economics. Based on software, hyperconverged infrastructure consolidates server compute, storage, network switch, hypervisor, data protection, data efficiency, global management, and other enterprise functionality on commodity x86 building blocks to simplify IT, increase efficiency, enable seamless scalability, improve agility, and reduce costs.

Thursday, February 26, 2015

Product Management - Design For Reliability

The role of quality and reliability in a product's success cannot be disputed. Product failures in the field inevitably lead to losses in the form of repair costs, product recalls, lost sales, warranty claims, customer dissatisfaction, and, in extreme cases, loss of life. Thus, quality and reliability play a critical role in product development.

Quality and reliability have become standard in products such as airplanes, medical devices, cars, robotics, and industrial automation. Yet when it comes to software products, reliability and quality seem to be sadly lacking.

Often during the product development cycle, reliability and quality testing is compromised in favor of faster time to market. The general attitude is: "If customers find a bug, we will fix it in a patch release."

Added to this is the practice of agile product development and a rapid release cadence: a new release every quarter or month and, in the case of extreme programming, daily updates!

The idea that all known defects, security failures, etc. can be fixed quickly has led to products that have poor reliability.

Today, customers typically wait 2-3 quarters after a product release before putting that product into production. Large enterprise customers have to test new software products before putting them into production. But with shrinking product life cycles, companies are being forced to build products that have specific design features for reliability.

As a result, new enterprise products are now being designed for reliability. From a product design concept, reliability is about an application's ability to operate failure free.

This includes ensuring that accurate data is coming into the system and that data transformation is error free, error-free state management, and non-corrupting recovery when failure conditions are detected.

Creating a high-reliability application starts early in the development life cycle - right at the product specifications - and is built right into architecture, design, coding, testing, deployment, and operational maintenance.

Reliability cannot be bolted onto an application at the deployment stage, though attempting to do so is quite common. It has to be built in from early design specification, through building and testing, to deployment and ongoing operational maintenance.

Common steps for building reliability into a product are:

  1. Product Reliability requirements are defined in product specification.
  2. Product architecture includes reliability, e.g., a distributed vApp architecture.
  3. Application management information is built into the application.
  4. Use redundancy for reliability.
  5. Use quality development tools.
  6. Use built-in application health checks
  7. Use consistent error handling
  8. Build error recovery mechanism into the product
  9. Incorporate Design for Debug functionality - for easy debug.
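As one way to picture the built-in health checks and consistent error handling from the list above, here is a minimal Python sketch. The `HealthRegistry` class and its status strings are hypothetical names chosen for this illustration; the idea is simply that every check is run the same way and a failing or crashing check is reported rather than allowed to take down the caller.

```python
class HealthRegistry:
    """Collects named health checks and reports their status consistently."""

    def __init__(self):
        self._checks = {}

    def register(self, name, check):
        # `check` is any zero-argument callable returning a truthy value when healthy.
        self._checks[name] = check

    def run(self):
        report = {}
        for name, check in self._checks.items():
            try:
                report[name] = "ok" if check() else "failed"
            except Exception as exc:
                # A crashing check is reported in the same format, never re-raised.
                report[name] = f"failed: {type(exc).__name__}"
        return report

    def healthy(self):
        return all(status == "ok" for status in self.run().values())
```

A monitoring endpoint or a startup script could call `run()` and expose the resulting report, so that operators see the same status format for every component.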

Many of the reliability design ideas also overlap with high availability, where system resilience is built into the software. In high-availability systems, two or more instances of the software run separately but synchronously. New software systems are designed for geo-distributed deployment, where customers can continue to use the product even if a data center goes down.

There is a very close relationship between reliability and availability. While reliability is about how long an application runs between failures, availability is about an application's capacity to immediately begin handling all service requests, and especially — if a failure occurs — to recover quickly and thereby minimize the time when the application is not available. Obviously, when an application's components and services are highly reliable, they cause fewer failures from which to recover and thereby help increase availability.
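A common way to put numbers on this relationship is the steady-state formula availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures (reliability) and MTTR is the mean time to recover. The formula is standard, but the function names and the sample figures in this small Python sketch are just assumptions for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability from mean time between failures and mean time to recover."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail, hours_per_year=8760):
    """Expected unavailable hours in a (non-leap) year for a given availability."""
    return (1 - avail) * hours_per_year


# Illustrative figures only: one failure roughly every 999 hours, one hour to recover.
example = availability(mtbf_hours=999, mttr_hours=1)
example_downtime = downtime_hours_per_year(example)
```

The formula shows both levers at once: availability improves either by failing less often (higher MTBF) or by recovering faster (lower MTTR).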

Improving Software Reliability

Software and system reliability can be improved by giving attention to the following factors:

  1. Focus strongly and systematically on requirements development, validation, and traceability, with particular emphasis on software usage and software management aspects. Full requirements development also requires specifying what the system must do and what it must not do (e.g., heat-seeking missiles should not boomerang and return to the installation that fired them).
  2. Formally capture a "lessons learned" database and use it to avoid past issues with reliability and thus mitigate potential failures during the design process. Think defensively. Examine how the code handles off-normal program inputs. Design to mitigate these conditions.
  3. Beta software releases are most helpful in clarifying the software's requirements. The user can see what the software will do and what it will not do. This  will help to clarify the user's needs and the developer's understanding of the user's requirements. Beta releases help the user and the developer gather experience and promote better operational and functional definition of the product. Beta releases also help clarify the user environmental and system exception conditions that the code must handle.
  4. Build diagnostic capability into the product.  When software systems fail, the software must collect all required information needed to debug the case automatically.
  5. Carry out a potential failure modes and effects analysis to harden the system against abnormal conditions.
  6. Software failures at customer sites should always be analyzed down to their underlying root cause for repair and to prevent recurrence. To be most proactive, the system software should be examined to see if other instances exist where the same type of failure could occur.
  7. Every common failure must be treated as critical, traced to its root cause, and remedied.
  8. Capture and document the most significant failures - understand what caused the failure and develop designs to prevent such failures in future.
  9. Fault injection testing must be part of system testing.
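To show what fault injection testing from the list above can look like in practice, here is a minimal Python sketch. Everything in it is hypothetical: `FaultInjector` is an illustrative wrapper, and `save_with_retry` stands in for whatever product code is being tested. The wrapper makes a dependency fail on chosen calls so that the recovery path can be exercised on purpose.

```python
class FaultInjector:
    """Wraps a callable and raises an injected IOError on selected call numbers."""

    def __init__(self, target, fail_on_calls):
        self.target = target
        self.fail_on_calls = set(fail_on_calls)
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise IOError(f"injected fault on call {self.calls}")
        return self.target(*args, **kwargs)


def save_with_retry(write, record, max_attempts=3):
    """Hypothetical code under test: retries a failing write a bounded number of times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write(record)
        except IOError:
            if attempt == max_attempts:
                raise  # give up and surface the error after the last attempt
```

A system test can then wrap the real write function, for example `FaultInjector(write, fail_on_calls={1, 2})`, and check both that the record is eventually saved and that a permanently failing dependency produces a clean error instead of corrupted state.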

Benefits of Design for Reliability

The concept of design for reliability (DFR) in software has become a standard in recent years and will continue to develop and evolve in years to come. Design for reliability shifts the focus from a "test-analyze-fix" philosophy to designing reliability into products and processes using the best available technologies.

DFR also changes test engineering from product testing for defect detection to testing for system stability and system resilience.

As DFR standards evolve, product companies are setting up reliability engineering teams as an enterprise-wide activity. These teams give guidance on how to design for reliability, provide risk assessments, provide templates for reliability analysis, and develop quantitative models to derive the probability of failure for products.

DFR impacts the entire product lifecycle: reducing life-cycle risks and minimizing the combined cost of design, manufacturing, quality, warranty, and service. Advances in system diagnostics/prognostics and system health management are helping the development of new models and algorithms that can predict the future reliability of a product by assessing the extent of degradation from its expected operating conditions.

DFR principles and methods aim to proactively prevent faults, failures, and product malfunctions, which results in cheaper, faster, and better products. Product reliability is best used as a tool to gain customer loyalty and customer trust. For example, a lot of customers still use Sun/Oracle computers, IBM Z series systems, and Unix operating systems for their reliability.

Tuesday, February 24, 2015

EMC's DataLake Foundation

On February 23rd 2015, EMC announced the "Data Lake Foundation" - a suite of EMC products and solutions for building a rock-solid Data Lake - which is the foundation that supports all big data analytics.

The rise of big data and the demand for real-time information is putting more pressure than ever on enterprise storage.

Big data analytics needs & creates massive volumes of data. This unprecedented data growth can quickly overwhelm existing storage systems. Over the last year, EMC has been building storage systems to address the specific needs of big data.

In 2015, EMC announced its Data Lake Foundation strategy, which is based on products like EMC Isilon and EMC ECS (Elastic Cloud Storage). These storage systems are designed to work with HDFS (Hadoop Distributed File System) and integrate easily with the Pivotal, Cloudera & Hortonworks data analytics stacks - thus making it simple to store and analyze massive volumes of data.

EMC has certified DataLake Foundation to work with the rich analytics tools that Pivotal, Cloudera and Hortonworks provide. Pivotal and EMC have worked together to test, benchmark and size the Data Lake Apache Hadoop solution.

Isilon OneFS 7.2 OS will support newer and more current versions of Hadoop protocols, including HDFS 2.3 and HDFS 2.4, delivering faster time to insights. It will also support OpenStack Swift, covering both file and object - the unstructured data types that are growing the fastest.

EMC's DataLake Foundation makes it easy for enterprises to run their analytics tools. It helps eliminate storage silos and provides simpler ways to store and manage data, so enterprises can focus their efforts on gaining insights and value from their data.

Here's what the DataLake Foundation brings to the enterprise:

  1. Efficient Storage: Eliminates storage silos, simplifies management, and improves utilization.
  2. Massive Scalability: Built from scale-out architectures that are massively scalable and simple to manage.
  3. Increased Operational Flexibility: Multi-protocol and next-generation access capabilities support traditional and emerging applications.
  4. Enterprise Attributes: Protects data with efficient and resilient backup, disaster recovery and security options, providing enterprise-class data protection that maximizes availability and meets business requirements.
  5. In-Place Big Data Analytics: Leverages shared storage and support for protocols such as HDFS to deliver cost-efficient, in-place analytics with faster time to results.

Two products from EMC portfolio that form the Data Lake foundation are EMC Isilon and EMC Elastic Cloud Storage (ECS).

EMC Isilon provides an enterprise-scale, file-based Data Lake Foundation with the ability to run traditional and next-gen workloads. Starting at 2PB and scaling up to 50PB per cluster, Isilon provides a great balance of performance and capacity for analytics workloads.

EMC ECS is the scalable Object storage for next generation of modern applications. ECS delivers geo-distributed high availability, nearly infinite capacity for big data analytics - provided on commodity storage.

With ECS and the new Isilon platform and features, customers have everything they need to store, protect, secure, manage and analyze all unstructured data now, on a foundation built to scale out for all future needs.

Business Benefits

EMC DataLake Foundation replicates the Vblock strategy: all the components needed for big data analytics are pre-configured, and the package comes with Pivotal HAWQ subscriptions and Pivotal HD.

This simplifies deployment of all big data analytics programs, while providing EMC's enterprise-grade support, nearly infinite scalability, and data security.


Monday, February 23, 2015

Role of Customer in New Product Development

In my previous article, First Steps in Developing New Software Products, I referred to the stage of defining product features. There are many ways to identify product features and functionality. One of them is to involve a potential customer in the early stages of new product development.

The key advantage of involving customers at an early stage is that it minimizes the risk of developing features & functions that are of no value to customers. It also allows active customer feedback to be taken at the time of product definition, which helps in a BIG way to minimize the risk of product failure.

Improve the Odds by Working with the Customer

I have had the good fortune to work at a large technology company in Santa Clara, at a startup, and at a very large technology company. Throughout, I have been involved in product development, on both the technology side and the business side. Based on that experience, I have seen the value of customer involvement at an early stage of product definition.

Normally, all new product development goes through cycles of "IDEA" -> "Build Product" -> "Measure Customer Response" -> "Learn from Customer Data" -> "New Idea".

Involving customers at an early stage of product development has several benefits.

  1. It helps avoid mistakes and allows the developer to explore and iterate during the cheapest phase of development - before any code is written, when the product is still in the mockup stage. Customers can give valuable input and validate the initial assumptions.
  2. It also gives a clearer picture of customer needs and competitive alternatives. Talking to customers gives a much deeper insight into actual customer usage models and needs - these are invaluable for defining new product features/functions. In my past experience, we had a case where a customer got so impressed with the product idea that they were willing to invest in and co-develop the product. The customer, being a Fortune 500 company, thus ensured the product was an overnight success.
  3. It also helps uncover new opportunities for differentiation from the competition, supports clear market positioning, and helps develop the product launch and product marketing plans.
  4. It reduces or eliminates unnecessary features, which reduces the amount of product that needs to be built and speeds up time to market!
  5. It is always better to be first in the market - even with a minimum viable product. This reduces the cost of development and time to market.

Customers are eager to help 

It is surprising to learn that most customers are eager to help and talk about their needs - even to companies that don't even have a product yet.

During my interactions with customers, I have noticed that customers often tend to request features that are far more ambitious than their current needs and usage, but are willing to accept a product that meets their minimum needs. As a result, we were able to release a new product in months, without many of the features that were initially planned.

Customers are also willing to wait for additional features, which helps in defining the product road map. Just asking customers to prioritize features helps identify the time scales for subsequent product releases and also helps shape the future direction of the product.

Customer Involvement is actually an opportunity to build stronger relationships with some of our customers. We'll choose those most likely to be receptive, and we'll set expectations appropriately.

However, customer involvement does not mean building a custom product that meets the needs of only one customer. It's a process for gathering information, and it requires a skilled product manager to prioritize that information and figure out what to respond to and how. Done well, it helps product management do their jobs more effectively.

Customer involvement gives information on how individual customers behave and buy. This type of insight cannot be captured from market research or usability testing. Market research and usability testing are still very important, and customer involvement does not eliminate them.

Customer involvement is the best way to validate assumptions on who the customer is, what he needs and what he'll buy.

Sunday, February 22, 2015

Creativity & Innovation in Indian Social Life

Innovation and Creativity are a prerequisite for all aspects of life - irrespective of the field. In a hyper-competitive world, creativity and innovation have become a key factor for success. Even in politics - leaders like Narendra Modi or Arvind Kejriwal are getting elected because they ran a very creative campaign.

Focusing only on operational issues - planning, organization and optimizing processes - is not enough to succeed.

Today in Indian Society, Innovation and Creativity have taken root in every aspect of social life: Education (eLearning, peer-learning), Work habits (Mobile-enabled-Uber workforce, work-from-home), Parenting (Focused parenting workshops, quality time) and even sex (alternate lifestyles, Fifty Shades of Grey), and virtually every field of human endeavor.

As a result, we are seeing rapid social innovations in all fields. The established norms of social life of previous generations are being questioned and changed to meet newer social challenges.
Consensus decision making & emphasis on conformity are being replaced by creative & competitive decision making.

Individuals are learning to break existing norms, break from existing institutions and create new norms and rules of their own in order to work in the ways most appropriate to their hyper-competitive, idiosyncratic lifestyles.

In a knowledge driven economy, people are being encouraged to be creative right from childhood. Little children in schools and preschools are being encouraged to "think on their own" and do things creatively. As this generation grows up, they are innovating and challenging established norms and making the society accept new and creative lifestyles.

In the last decade alone, we have seen more changes to the social norms than in several centuries. For example, nuclear families were the first major change in social norms. Now gay lifestyles, single & swinging lifestyles, or childless marriages are being accepted. Legalization of marijuana is another example of such acceptance. Society is now accepting and adapting to this rapid change.

Creative & Innovative changes to Social life are no longer treated as products of "outsiders". Creative individuals no longer come only from rich backgrounds with access to large resources at their disposal.

Creativity and Innovation have become pervasive in all aspects of life, and all social institutions are adapting to them. In the process, we are seeing a rapid increase in the rate at which progress is made. Society and social institutions are now ready to facilitate the creative process and embrace change.

Today, society at large is ready to pursue ideas that go against consensus opinion - regardless of the field. People are making informed decisions and taking risks. Risk of failure does not seem to hinder them. People are seeking new means to succeed and are investing in knowledge and information in the form of the Internet, instant communications & social networks.

The stigma and risks of failure are also being minimized by the rapidly growing economy. People who failed are now getting a second chance and new opportunities to try again.

The Risk-reward system is building tolerance for disagreement and people are harnessing it into a goad to action:  "So you think I'm wrong.  Well, let me show you...." , "You say, it can't be.  Prove it to me...."

Indian society today is not making arguments that are for-or-against the idea, but rather what can be done with it!

Saturday, February 21, 2015

First steps in Developing New Software Products

Software Product development is much more challenging than software services. Product development needs a much higher level of domain expertise, technological expertise and long term commitment to develop the product. In addition to technical competence, it takes a very high level of marketing competence to succeed.

The initial team that is drafted to develop a new product will have to address various marketing and technical issues first - before the actual product development (a.k.a. coding) even starts.

These steps are very important in new product development and must not be compromised upon.

Step-1: Product Definition & Product Positioning
Step-2: Product Features & Quality Imperatives
Step-3: Product Road map & Release Cadence
Step-4: Product Technology & Architecture
Step-5: Project Resource Plan
Step-6: Product Deployment Options
Step-7: Product Marketing Plan
Step-8: Financial Project Plan

Note that product development is a completely cross-functional effort. Software engineering, Technology, Marketing, Finance & HR management roles have to work together to build a successful product.

Step-1: Product Definition and Positioning 

The first step in software product development is the clear definition of the product:
What is the product?
What problems will it solve?
Who are the users?
What is the competitive advantage?
What are the distinguishing features?

Answering these five questions will give a clear definition of the product itself - which is an essential first step for new product development. Answering these questions requires deep domain knowledge and a deep understanding of market needs. If product definition is not done properly, then the resulting product will be like a solution searching for a problem (which is bound to fail in the market)!

Step-2: Product Features & Quality Imperatives

Once a product is defined, the next step is to define the features and functions the product must provide. The product features and functions must be defined in terms of the intended users. For an enterprise product, this requires deep domain knowledge of (potential) customers and customer work flows. It is not uncommon to involve people from potential customer industries for this. For a consumer product, focus must be on good user experience while meeting the user needs.

In addition to user requirement features, there are other features that must be considered: government regulatory requirements & industry standards requirements. For example, HIPAA, SOX, PCI, DCI, ISI, ITAR, FCC, etc., are government and industry standards that the product must meet - else it cannot be sold!

In an ideal world, one can develop a product which has all the required features and the best quality, in time. However, the real world imposes its own limitations. Not all features can be developed in time, nor can the product meet all the quality requirements. This implies quality vs. features tradeoffs.

As part of product features definition, the quality requirements must also be identified. One needs to answer: Is it more like a proof-of-concept? Or is it an enterprise class product? Or is it a consumer grade product? For each class of product, there are different quality imperatives that must be addressed while defining the product features.

Step-3: Product Road map & Release Cadence

Once product features are defined, it will become clear that not all features can be developed in one release, especially in a complex enterprise product. Even in the case of a simple consumer product, the market conditions and user requirements will change with time - and the product needs to adapt to meet those changes.

The features that can be developed with reasonable quality/stability in a reasonable time frame form the first version of the product.

Other features & functions will have to be developed and released in a phased manner. This is called a product roadmap: a plan of how all the intended features will be developed and when they will be released.

Release cadence has to be defined in this plan - so that it gives an indication as to when a particular feature will be released: is it coming in the next 6 months, one year, 18 months etc. Having planned releases helps in several ways: one, it helps in fixing bugs found at customer sites; two, it helps in modifying features to better match user requirements; three, it helps add new features incrementally, which helps in project management and also helps avoid scope creep in a single release.

Planning the product roadmap & release cadence helps in the next steps.

Step-4: Product Technology & Architecture

In software product development, choosing the right product technology and platforms is very important and can make or break a product in the market. In large organizations, a separate CTO (Chief Technology Office) is often established, which determines the right technology choices. For example, when developing an enterprise application, the choice of database, middleware, OS platforms etc. could make or break a product.

Software Technology also refers to development tools such as compilers, version control tools, testing tools, performance testing tools etc.

Software Technology also refers to the choice of using open-source code or licensed utilities (JBOSS, Jinfonet etc.)

Product Architecture forms the foundation of the product. Depending on the intended market and time to market, proper architectural choices have to be made. Once the product architecture is finalized and product development starts, it becomes very expensive to change the product architecture. Product architecture has to be very carefully planned and designed.

When it comes to making choices of product technology and architecture, it is always good to look ahead and choose newer/scalable/flexible technology & architecture, so that, when needed,  changes or upgrades & addition of new features can be easily done on the product to suit market needs.

Product Technology and Architecture also determine the resources needed for new product development.

Step-5: Project Resource Plan  

Once the roadmap and features are identified, it is easier to plan for the resources needed to develop the product. In software product development, the biggest resources are engineers and development infrastructure: servers/computers & software tools.

Identifying the scale & timing of the resources needed helps later in the Financial Project Plan (Step-8).

Step-6: Product Deployment Options  

Software products often have several deployment options.  Typically, a consumer grade software product should work right out of the box. Customers should be able to install it themselves and start using it.

In the world of cloud & mobile apps, customers expect ease of deployment as a basic requirement. For such products, the product distribution platforms (App Stores, Web Site or Retail distribution) have to be decided.

In case of enterprise products - there is a whole lot of configuration & customization that needs to be done for each customer. In such a case, Product plan must also determine which of the features should be offered out-of-the-box and which features are to be customized at customer site. Ideally, all the common components should be out of the box and all customizable components must be done through config files - i.e., no compilation of custom code at customer site.
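As a minimal sketch of the config-file approach described above, the Python snippet below merges a per-customer JSON file over the defaults shipped with the product, so that nothing is compiled at the customer site. The setting names (theme, approval_levels, currency) and the function load_settings are hypothetical and chosen only for illustration.

```python
import json

# Out-of-the-box defaults shipped with the product (hypothetical settings).
DEFAULTS = {"theme": "standard", "approval_levels": 2, "currency": "USD"}

def load_settings(customer_config_text):
    """Merge a customer's JSON overrides on top of the shipped defaults."""
    overrides = json.loads(customer_config_text)
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Reject keys the product does not know about instead of guessing.
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}

# A customer site changes two settings and inherits the rest.
settings = load_settings('{"approval_levels": 3, "currency": "INR"}')
print(settings)
```

Rejecting unknown keys is one possible design choice: it keeps customization limited to what the product explicitly supports.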

Identifying the product deployment options helps in the Product Marketing Plan.

Step-7: Product Marketing Plan

Products have to be pushed to the customer or Customers can be induced to pull in the product from the distribution channels. The choice of Push or Pull determines the marketing strategy.

Some products can be sold by bundling with other products. For example, Microsoft can bundle Lync with its Office suite, SAP can bundle Hana with its R/3 Suite, and Google Maps gets bundled with Android.

In some cases, the product can be sold via partnerships. A partner company may choose to sell the software along with its suite of products. For example Cisco sells HP Opsware along with its suite of network management tools.

In the case of a pull strategy, customers have to be induced to buy the product. For customers to buy a product, they must first be made aware of the product - via advertisements or marketing communications - and if the customers like what they hear about the product, they may opt to buy it. Examples are Apple Pages, WhatsApp, WeChat, Intuit or TurboTax. These products are heavily marketed via different channels and customers buy them from retail channels.

Selecting the right marketing plan has a big impact on the financial plan.

Step-8: Financial Project Plan

It takes money to develop new products. The total amount of money and the timing of expenses have to be carefully worked out and planned. In large companies, funding may not be a problem - provided the product business plan is complete & agreed upon by relevant stakeholders. In the case of small/medium size companies, finance will be a major constraint - which impacts every aspect of the new product development plan.

Closing Thoughts

In this article, I have put the financial project plan as the last step. In many cases, the financial plan becomes the first necessary step - which then determines the scale & scope of new product development projects.

For the sake of simplicity and ease of understanding, I have presented the software product development plan as an 8-step waterfall process, which is an ideal situation. Developing new software products is never such a clean & smooth process. It requires lots of deliberation and analysis, so in reality the steps happen in parallel and iteratively - each planning stage has to be revisited all through the product development life cycle.

Friday, February 20, 2015

Policy Changes to Encourage Software Product Development in India

Skilled human capital remains the most crucial factor for software innovation

As software becomes the heart and lifeblood of the modern economy, it is becoming the driver, enabler, and diffuser of innovation across all sectors and industries. For example, look at Uber or Flipkart or BigBasket - which are transforming taxi services and retail in India.  Software delivered via mobile devices can transform all aspects of the economy - and this is an ABSOLUTE TRUTH!

India has established itself as a leader in software services. With millions of engineers, India can emerge as a world leader in software technology.

Unfortunately, India has been a nonexistent player in developing any new software technologies. In areas of core technology development, India has fallen short in a BIG way!

Other countries such as Israel, Taiwan, Korea & China have emerged as product leaders, while India - despite its enormous talent pool - has failed to develop world class products and has been playing a secondary role of software services.

Unlike most countries, India does not suffer from scarcity of skilled human capital. India produces lots (quantity) of engineers with adequate skills (quality). Indian engineers have emigrated and have created several product companies abroad. So the problem is not with talent or skills.

The main problem is with the government policy towards product development.

Over the years, Indian government policy has tilted heavily towards software services by offering liberal tax rebates and exemption from rigid labor laws, while product developers are penalized with higher taxes, bureaucratic red tape and crippling labor laws.

As a result, Indian software companies have largely stayed away from product development. Even industry majors that set ambitious targets for product revenues have given up their targets over time.

For example, Infosys Technologies had about 4% of its revenues from software products in 2001-02, though at one time it hoped to achieve a target of 40% of revenues from products by 2000.

Major hindrances in developing products from policy perspective are:

1. Lack of involvement of Indian Private sector in developing key technologies. 

Government-run organizations such as DRDO, BARC, CDAC and ISRO work in total isolation during development, and only after development do government agencies plan to license new technologies to the private sector.

In the USA & Israel, defense development and private sector involvement go hand-in-hand. For example Google got its initial funding from DARPA for a web indexing project.

2. Weak protection of Intellectual Property Rights and rampant piracy. 

When a fledgling startup develops a new product and wants to sell to potential customers - mostly large organizations - it is at risk of piracy. Startups cannot afford to sue bigger companies for piracy as they lack the financial resources to fight a long-drawn legal battle. Laws protecting IPR in India are really strong - but the legal process and bureaucracy are the hindrance.

3. High cost of Internet & Electricity

Internet access in India is slow and expensive. The government licensing policy towards allowing the private sector to create Internet connectivity from India to abroad has led to an acute paucity of bandwidth for international connectivity. In addition, laying cable inside the country is regulated by various state & city government laws - which has made Internet connectivity very expensive.

High cost of 3G & 4G spectrum has also not helped - which has resulted in one of the highest Internet costs among BRIC countries.

In addition, chronic power shortage has forced companies to set up data centers outside India - thus making everyone choke on the slow-speed Internet. For example, it is a lot cheaper to set up data centers in Washington state or in Iceland or Carolina - than in Karnataka or Maharashtra.

For example, if I were to send an email to my wife - both of us residing in the same location - the email is routed via the US!

Lack of Internet speed has put software product developers at a BIG disadvantage. It has prevented Indian entrepreneurs from setting up data centers or building cloud/Web-based or gaming products in India.

4. Lack of Tax incentives for Software Product Companies

The Indian Software Services sector has had several incentives. For a long time, software services were exempt from several taxes! Companies paid no income tax, no import duty, no excise duty etc. However, product companies have to pay their taxes from day one!

This skew in tax regime has led to Indian entrepreneurs opting to invest in software services, which minimizes risks and maximizes profits. With services, entrepreneurs can expect to break even or even make profits in the first year of operations.

Providing Tax incentives to software product companies will help India become a product powerhouse.

5. Lack of incubation centers and investment in early stage startups

Early stage startups in India have to overcome several challenges: lack of office space and lack of investment for product development. In recent times, Venture Capital firms have been active in India, but they do not provide funds for early stage startups. There is a need to create 'Angel Funds' - which must be provided by a Government agency. In the US, the Government releases funds to early stage startups via DARPA, DoD and other organizations.


Prime Minister Modi's "Make-in-India" pep-talk must be backed by policy changes to help Indian companies and entrepreneurs develop software technology products in India - for the world.

Changing the existing Tax regime and Internet policies will help in a BIG way to develop software products in India!

Thursday, February 19, 2015

Facebook's 6-Pack Punch

When a billion+ users log on to an Internet service, the underlying fundamental principles of a datacenter have to change. The traditional datacenter was designed to serve thousands of users and can scale up to a million users - but not to billions!

Facebook engineers have learnt what it takes to run datacenters that serve billions of users and are kind enough to share that knowledge with the rest of the world - via the Open Compute Foundation.

As part of Open Compute Foundation, Facebook recently revealed "6-pack": the first open hardware modular switch. It's a scalable solution - with a modular design.

6-Pack is a top-of-the-rack switch, commonly used in "Leaf-Spine" networks for large datacenters. In a Leaf-Spine network, each rack full of servers is connected to every other rack via the Leaf-Spine fabric.
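To make the Leaf-Spine idea concrete, here is a small Python sketch. It assumes the common arrangement in which every leaf (rack) switch has a link to every spine switch; the switch counts and names are illustrative and are not taken from Facebook's design.

```python
from itertools import product

def build_leaf_spine(num_leaves, num_spines):
    """Return leaves, spines and the set of (leaf, spine) links in a full fabric."""
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    spines = [f"spine{j}" for j in range(num_spines)]
    return leaves, spines, set(product(leaves, spines))

def paths_between(leaf_a, leaf_b, spines, links):
    """List the two-hop paths (leaf -> spine -> leaf) between two racks."""
    return [(leaf_a, s, leaf_b) for s in spines
            if (leaf_a, s) in links and (leaf_b, s) in links]

leaves, spines, links = build_leaf_spine(num_leaves=4, num_spines=2)
print(len(links))  # 4 leaves x 2 spines = 8 links
print(paths_between("leaf0", "leaf3", spines, links))
```

Under this assumption, any two racks are exactly two hops apart, and adding a spine adds one more parallel path between every pair of racks.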

6-Pack is also an "open design": Facebook will allow other companies to manufacture and sell these devices. The switch is built from off-the-shelf components & not custom-built ASICs - thus bringing down the cost of networking in a big way.

6-Pack can scale from 32 ports of 40Gig Ethernet (1.28 Tbps of port bandwidth) to 128 ports of 40Gig Ethernet. It supports two fabric cards and eight linecards (called Wedge). Each line card has 16 40GigE ports.
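The port arithmetic is easy to check. The sketch below takes the figures quoted above at face value (40 Gbps per port, 16 ports per line card, eight line cards) and counts front-panel port bandwidth in one direction only; it is not a measured throughput figure.

```python
PORT_SPEED_GBPS = 40      # each port is 40 Gigabit Ethernet
PORTS_PER_LINECARD = 16   # per the text
MAX_LINECARDS = 8         # per the text

def aggregate_tbps(ports):
    """Aggregate one-direction port bandwidth in Tbps."""
    return ports * PORT_SPEED_GBPS / 1000

max_ports = PORTS_PER_LINECARD * MAX_LINECARDS
print(max_ports)                  # 128
print(aggregate_tbps(32))         # 1.28
print(aggregate_tbps(max_ports))  # 5.12
```

By this arithmetic, 1.28 Tbps corresponds to the 32-port configuration, while a fully populated 128-port system works out to 5.12 Tbps of port bandwidth.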

It packs enough punch to change the bedrock of the Internet - the datacenter switches.

For full technical details see: Introducing "6-pack"


From a pure switching technology perspective, the 6-pack is not that innovative. The modular & scalable design, the 40GigE ports and fabric cards have been around for years. Cisco Nexus series, Juniper EX series & Arista are established leaders in this space.

In terms of performance, reading from the specs of the device, (I have not seen any real performance numbers) Facebook's 6-pack is on par with the top-of-the-line offerings from Cisco or Arista or Juniper.

The main innovation comes in the internal design. Each line card has a micro-server and a server inside. A Linux derivative Operating System (code-named "FBOSS") drives each of the modules.

Each element runs its own FBOSS on the local server and is completely independent.

Herein lies the major technical innovation. The server inside each linecard and fabric card runs all the network control functions, and the switch can be managed as a server - i.e., provisioned & configured using existing server management tools.

Note: No separate/special network management tools are needed. 6-Pack/FBOSS does not use SDN technology like virtual network overlays; it's a hybrid SDN - where all the network switching intelligence is built into the server.

Running each module independently allows network engineers to modify any part of the system, software or hardware, with no system-level impact. It has a unique dual backplane solution that enabled Facebook to create a non-blocking topology.

Business Disrupter

The other major Innovation with 6-pack is its "Open-design", thus allowing any 'grey-box' manufacturer to build and sell these high performance switches. The open design, Open OS, and off-the-shelf components all come together to reduce the costs drastically.

6-Pack is a great value proposition for Cloud service providers who, like Facebook, run massive datacenters that serve millions and millions of users. Unlike Enterprise class switches from Cisco or Arista - which cost $600/port (approx) - Facebook's 6-pack will cost a small fraction - $90 or even less (my estimate) and will reduce further with time.

Since the networking intelligence is built into the software running on a server, the expensive proprietary protocols and custom built switches can be replaced by 6-pack!
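A quick back-of-the-envelope calculation shows what that per-port difference means for one fully populated switch. The sketch uses the approximate $600/port figure and the $90/port estimate from above; the 128-port count is the maximum configuration quoted earlier, and none of these are vendor list prices.

```python
PORTS = 128                 # fully populated 6-pack, per the port counts quoted earlier
ENTERPRISE_PER_PORT = 600   # USD, approximate figure from the text
SIX_PACK_PER_PORT = 90      # USD, the author's own estimate

enterprise_cost = PORTS * ENTERPRISE_PER_PORT
six_pack_cost = PORTS * SIX_PACK_PER_PORT
savings = enterprise_cost - six_pack_cost

print(enterprise_cost, six_pack_cost, savings)     # 76800 11520 65280
print(f"{savings / enterprise_cost:.0%} cheaper")  # 85% cheaper
```

At these assumed prices the saving is about $65,000 per switch, which adds up quickly across a datacenter with hundreds of such switches.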


Note that Facebook built 6-Pack to meet its own needs and not enterprise needs. This means that 6-pack is designed to work with web based service workloads and not the traditional enterprise workloads. Even for large enterprises, the savings from lower cost/port will be offset by the higher cost of integration with existing tools/frameworks, and the lack of support for Open Compute.

This implies that enterprises that do not run hundreds of thousands of identical servers will not require the scale and flexibility of 6-Pack. Such customers are better off with switches from Cisco or Juniper or Arista or others. But for service providers: AWS, AT&T, BT, Colt, VMWare AirCloud, CSC, etc., Facebook's 6-Pack is a game changer.

Wednesday, February 18, 2015

Data is Data - Unified Data Lake is the game changer

For many companies today, Big Data analytics is a top priority. Leaders at these companies hope to derive new insights from all available data to improve decision making that enhances customer experience, improves productivity, lowers operational costs, and creates new business opportunities.

In short, companies want the moon!

Unfortunately, companies too often start their Big Data efforts in silos. Marketing has its own initiative, production has its own projects, finance works in another silo and so on. The discrete silo based approach will fail in the world of big data.

Many companies rush the implementation of big data projects and confuse Big Data technologies with tools like Hadoop, Pig, Python, and others without looking at the bigger picture. Many of these new projects fail to include existing data discovery and analytics tools.

The siloed approach leads to quick initial success, but it opens up a series of much bigger challenges which eventually lead to re-architecting all the Big Data projects while costs go out of control. The main challenge is data growth. Success of Big Data analytics relies on massive volumes of data. The rapid data growth will quickly overwhelm the existing IT infrastructure in respective silos, forcing departments to replicate Big Data infrastructure.

Data used by businesses for decision making is exploding. Typically, data volumes double every 12 to 18 months. Use of unstructured data from emails, logs, call logs, documents, files, images/videos and social media streams (Twitter/Facebook etc.) is becoming mainstream in data analysis.  Moreover, regulatory rules insist that data must be retained for a long period of time. This results in a huge volume of data that must be stored, protected and managed - which gets more expensive.
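To see what a 12 to 18 month doubling period implies, here is a short Python sketch of the compound growth. The 36-month planning horizon is an assumption picked for illustration; only the doubling periods come from the text.

```python
def growth_factor(months, doubling_months):
    """How many times larger the data is after `months`, given a doubling period."""
    return 2 ** (months / doubling_months)

# Compare the fast (12-month) and slow (18-month) ends of the range over 3 years.
for doubling in (12, 18):
    print(doubling, growth_factor(36, doubling))
```

Over three years, storage needs grow 8x if data doubles every 12 months and 4x if it doubles every 18 months, which is why infrastructure sized for a single silo gets overwhelmed so quickly.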

Many Big Data projects involve the use of vastly different data types: unstructured data, files, metadata, blob data, log files etc. For example, one project may search a database for pattern matches, another may run word searches across different types of files, while another may use real time data from social media and match it with historical trending data sets. Each of these projects has different requirements for data capture, extraction, and storage. Having a dedicated or separate data storage/management system for each project is not viable.

Business analytics projects tend to focus on spotting trends by looking at the historical data and matching it with the current events. How well did this product sell in that store? How many customers who got this email clicked on an embedded link and completed a purchase or transaction? Which supplier's products contributed to the highest failure rates in a product line? These types of questions are best answered by analyzing data that has been collected in an orderly way and stored in a structured database or data warehouse, and then enhanced with real time data.

In a dynamic business environment, Big Data analytics can help answer new questions rapidly. For example: Can information from social media tell me how customer sentiments are changing in response to the new advertisements? Can real-time customer interactions in the store help in inventory management? Can real-time transactional data help in fraud detection?

The answer to such questions can be quickly found out if the analytics tools have access to the required data, which has to come from both traditional sources and new sources.

A traditional silo based approach to Big Data analytics projects tends to get more expensive, and management does not get the full value of the new insights it desired.

The best way to approach Big Data analytics is to take an approach that accommodates both rapid growth of data and uses existing analytics and tools, while leveraging existing infrastructure and then adding newer tools and analytics to the mix. Existing business analytics should continue to work along with new Big Data analytics. Existing corporate and business managers do not have the time or expertise to learn how to use all of the needed techniques and tools and will rely on existing work flows and will use new analytics as needed. In short, implementing Big Data analytics must not be disruptive.

Unified Data Lake is the game Changer

Creating a unified Data Lake which can ingest and hold both traditional data in existing data warehouses and newer data types should be the first step while embarking on a big data journey.

A unified Data Lake gives companies a choice of data extraction and analytics tools and does not lock workers into using old existing solutions. Older workflows can be easily integrated with newer Big Data analytics workflows.

A unified Data Lake allows new Big Data analytics solutions to use new technologies like Hadoop, Hive, Rails etc. to capture, store, and refine data, while providing enterprise class data security, protection and access control in a centralized, integrated way so that data is accessible and easily managed.

A Unified Data Lake offers several benefits including:

  • Agility: Eliminating much of the strain on IT that was common with traditional silo approaches.
  • Simplicity: Allow consumption of data in any format, thus saving time & reducing errors
  • Flexibility: Allow for the use of different analytics techniques, mix of both old and new, which helps organizations see the data differently to ask new questions and derive new insights
  • Accessibility: Provide users with easy and secure access to all their data.


Tuesday, February 17, 2015

Software Innovation Ecosystem

Innovation may be broadly defined as the successful commercial introduction of a new product, service or process. In general terms, Innovation refers to the implementation of new products (or processes) that bring significant technological improvements to the outputs of the products (or processes).

Often, software innovation involves both technological innovation and process innovation.   Technological innovation involves a series of scientific and technological improvements, while process innovation involves a series of organizational, financial and commercial activities.

It is important in this context to note that software R&D must look at both aspects of innovation.

In software, incremental innovation is most common. Incremental innovation is often seen as development of a new feature or new functional aspects or new application of an existing software product or improvement to the previous generation of the software product.

Innovation Ecosystems

Today, software innovation relies on several players:

1. Companies that develop commercial software.
2. Organizations that develop free/open-source software.
3. Companies that develop Compute & Communication hardware.
4. Companies that develop Information communication technologies & hardware.
5. Companies that develop hardware-software integration.
6. Companies that develop IT services.
7. Companies that provide services based on Information Technologies.
8. Companies that finance purchases of software or hardware of compute & communication technologies.
9. Companies that own & license patents.
10. Government & regulatory agencies.

Today, innovation in the software sector is often characterized by the participation of the wide range of companies listed above. The complexity of the interactions between different players adds to the challenge of innovation.  Many companies in the ecosystem operate beyond the norms of software development and often complement software products. Understanding the ecosystem forms the foundation of software process innovation.

The technological convergence between hardware, software and telecommunication technologies leads to product innovation - where technology synergy and interdependence across different technologies drive product innovation.

In short, software innovation today needs a greater level of collaboration across multiple players in different segments of the ecosystem.

As a result of this ecosystem, software innovation is getting concentrated in areas/cities that have a conducive environment for collaboration.