WhereScape launched its new WhereScape 3D product, with Michael Whitehead (CEO), Jason Laws (Product Marketing), Raphael Klebanov (Consulting) & Scott Humphrey (Fisherman). WhereScape previously briefed the BBBT in November 2009; see the earlier blog post for background on the company.
WhereScape has two products: WhereScape 3D, a data warehouse planning tool, and WhereScape RED, an integrated development environment for building, deploying, managing and renovating data warehouses. Note that 3D stands for "Data Driven Design"! Its major partners are Microsoft and Teradata.
WhereScape 3D was launched at our BBBT session, complete with T-shirts and cake! I downloaded the beta version (0.9.0), a 75MB EXE. It requires the Java Runtime and includes many JDBC/ODBC drivers, plus the Teradata JDBC driver, which requires an extra license.
The install process creates a local metadata repository using Apache Derby. After installing, select Help > Tutorial and choose one of the three tutorials. Follow the instructions closely! The tutorials are concise and well written.
I tried the first tutorial, which explores the use case of designing a star-schema data warehouse. I ran my first discovery and generated the ER diagram shown at the right. This was so cool! Otherwise it would have taken hours!
Also, the properties for a certain column in the CUSTOMERS table are shown at the left.
My brief hands-on experience was very positive. It reminded me of the first time I played with Adobe Photoshop. You really only need to know DW design concepts and techniques, which is a tribute to the WhereScape team that designed this product.
Today, Friday the 13th of May, 2011, the Boulder BI Brain Trust heard from Larry Hill [find @lkhill1 on Twitter] and Rohit Amarnath [find @ramarnat on Twitter] of Full360 [find @full360 on Twitter] about the company's elasticBI™ offering.
Serving up business intelligence in the Cloud has gone through the same hype cycles as all other software applications, from early application service providers (ASP), through the software as a service (SaaS) pitches, to the current Cloud hype, including infrastructure and platform as a service (IaaS and PaaS). All the early efforts have failed. To my mind, there have been three reasons for these failures:
Security concerns on the part of customers
Logistics difficulties in bringing large amounts of data into the cloud
Operational problems in scaling single-tenant instances of the BI stack to a large number of customers
Full360, a 15-year-old system integrator & consultancy whose clientele ranges from startups to the top ten global financial institutions, has come up with a compelling Cloud BI story in elasticBI™. It uses a combination of open source and proprietary software to build a full BI stack, from ETL [Talend Open Studio, as available through Jaspersoft] to the data mart/warehouse [Vertica] to BI reporting, dashboards and data mining [Jaspersoft, partnered with Revolution Analytics], all available through Amazon Web Services (AWS). Full360 is building on its success as Jaspersoft's primary cloud partner and on its involvement in the RightScale Cloud Management stack, which won a 2010 SIIA CODiE award with essentially the same stack as elasticBI.
Full360 has an excellent price point for medium-sized businesses, or for departments within larger organizations. Initial deployment, covering setup, engineering time and the first month's subscription, costs less than a proof of concept might cost for a single piece of their stack. The entry-level monthly subscription, extended over one year, is far less than the annual subscription or licensing costs of similar software, once you account for hardware depreciation and the cost of personnel to maintain the system. Because the monthly fee also includes operations management and a small amount of consulting time, this is a great deal for medium-sized businesses.
The stack being offered is full-featured. Jaspersoft has, arguably, the best open source reporting tool available. Talend Open Studio is a very competitive data integration tool, with options for master data management, data quality and even an enterprise service bus for complete data integration from internal and external data sources and web services. Vertica is a very robust and high-performance column-store Analytic Database Management System (ADBMS) with "big data" capabilities that was recently purchased by HP.
All of this is wonderful, but none of it is really new, nor is it a differentiator from the failed BI services of the past or from today's ongoing competition. Where Full360 may win, however, is in how it answers the three challenges that caused those past efforts to fail.
Security
Full360's elasticBI™ answers the security question by relying on AWS security. More importantly, they recognize the security concerns: one of their presentation sections today was titled "Hurdles for Cloud BI," listing cloud security, data security and application security, all three of which are handled by AWS standard security practices. Whether this is sufficient, especially in the eyes of customers, is uncertain.
Operations
Operations and maintenance is one area where Full360 is taking great advantage of the evolution of current Cloud services best practices and "devops," using Opscode Chef recipes for deployment, maintenance, ELT and upgrades. However, whether this level of automation will be sufficient to counter the lack of a multi-tenant architecture remains to be seen. Some argue that the differentiators of true Cloud, or even of the older SaaS, including the ability to scale profitably at their price points, depend on multi-tenancy, which keeps all customers on the same version of the stack. The heart of multi-tenancy is in the database, and this is the point where most SaaS vendors, other than salesforce-dot-com (SFDC), fail. Jaspersoft, however, does claim support for a multi-tenant architecture. It may be that Full360 will be able to maintain the balance between security/privacy and scalability through its use of devops, without creating a new multi-tenant architecture. Also, the point of Cloud services isn't the cloud at all: the fact that the hardware, software, platform or what-have-you sits in a remote or distributed data center isn't the point. The point is elastic self-provisioning, that is, the ability of customers to add resources on their own and be charged accordingly.
Data Volume
The entry-level data volume for elasticBI™ is the size of a departmental data mart today. But even today, loading that much data into the Cloud in a nightly ETL run simply isn't feasible. Full360 is leveraging Aspera's technology for high-speed data transfer, and AWS supports a form of good ol' fashioned "sneaker net," allowing customers to mail in hard drives. In addition, current customers with larger data volumes draw that data from sources already in the cloud, either in AWS or in SFDC. This problem will remain an "arms race" into the future, with data volumes, source location and bandwidth locked in a three-way pile-up.
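The nightly-load problem comes down to simple arithmetic: transfer time is data volume divided by effective bandwidth. The helper below is a back-of-the-envelope sketch; the function name, the efficiency factor and the example figures that follow are assumptions for illustration, not numbers from Full360, Aspera or AWS.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_gb gigabytes (10**9 bytes) over a link_mbps Mbit/s link.

    efficiency (assumed) is the fraction of nominal bandwidth actually achieved,
    covering protocol overhead and contention.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_gb * 1e9 * 8                      # bytes -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600
```

For example, a hypothetical 1 TB (1000 GB) load over a fully utilized 100 Mbit/s link gives `transfer_hours(1000, 100)` of about 22.2 hours, far longer than an overnight batch window; at 50% efficiency it roughly doubles.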
In conclusion, Full360 has developed an excellent BI service to supplement its professional services offerings. Larger organizations are still wary of letting their data out of their control, or may fear the target that web services present to hackers, as exemplified by the recent break-ins at Sony and at the email spammers, er, marketers used by banks and retailers. Smaller companies, which might find the price attractive enough to offset security concerns, haven't seen the need for BI. So the question remains whether the market is interested in BI in the Cloud.
Actuate Corporation is presenting with Nobby Akiha, SVP Marketing, and Mark Gamble, Director Technical Marketing. Nobby started with an overview and status of the company. Founded in 1993, the company generated $134.7 million in revenue in FY2010 (23% operating margin) with 570 employees.
Actuate has strong OEM customers who use BIRT as a tool for delivering information to their own customers and partners. Because BIRT is distributed as open source, Actuate's business model focuses on low sales costs and short sales cycles that accelerate revenue from a global audience. The motivation to purchase the for-fee enterprise version of BIRT is its additional features beyond simple reports. The open-source license grants full permission to embed and redistribute BIRT.
Mark continued by explaining the open-source ("the free stuff") version of BIRT. BIRT stands for "Business Intelligence and Reporting Tools" and is a top-level Eclipse project. The open-source code was developed primarily by Actuate and contributed to the Eclipse Foundation. Mark gave a demo of BIRT, starting with the Eclipse IDE, a data source and a data set. He then created a grid, placed a chart within one cell of the grid, connected it to the data set, and so on.
The architecture diagram (at the right) shows that the BIRT Designer generates an XML design file, which the BIRT Engine interprets to produce the final document. BIRT is highly extensible in its data sources, application logic, visualization formats and rendering output, with open APIs between components.
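The designer/engine split described above (a declarative XML design that a separate engine interprets against live data) can be illustrated with a toy. This sketch uses Python's standard xml.etree.ElementTree; the design format, element names and `render` function are hypothetical and are not BIRT's .rptdesign schema or BIRT's engine API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Toy design file. This is NOT BIRT's .rptdesign schema: the element and
# attribute names below are hypothetical and chosen only for illustration.
DESIGN = """
<report title="Sales by Region">
  <table dataset="sales">
    <column field="region" label="Region"/>
    <column field="amount" label="Amount"/>
  </table>
</report>
"""

def render(design_xml: str, datasets: Dict[str, List[dict]]) -> str:
    """Toy 'engine': interpret a design file against data and emit a text document."""
    root = ET.fromstring(design_xml.strip())
    lines = [root.get("title", "")]
    for table in root.iter("table"):
        rows = datasets[table.get("dataset")]           # bind the table to its data set
        cols = table.findall("column")
        lines.append(" | ".join(c.get("label") for c in cols))  # header row
        for row in rows:
            lines.append(" | ".join(str(row[c.get("field")]) for c in cols))
    return "\n".join(lines)
```

Keeping the design declarative means the same file can be rendered against different data, or by a different output renderer, without touching the design itself.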
Actuate's enterprise BIRT version extends features in development, user interaction, deployment and scalability. Finally, check out the hub for BIRT developers at BIRT-Exchange.
My Take...
BIRT has an active global open-source community within the Eclipse Foundation. The challenge for Actuate is to translate that activity into revenue from licensing the enterprise version. For developers already working with Java and Eclipse, BIRT is easy to adopt for generating simple web-based reports. As business requirements grow in scale and functionality, BIRT's extensibility and Actuate's value-add provide a roadmap for future BI requirements.
Last week five other VCs and I were invited by IBM’s Venture Capital Group to a private viewing of Jeopardy’s final round where Watson was competing. Following the show, and Watson’s win, we engaged in a very interesting conversation with Eric Brown, Watson’s technical lead, and Anant Jhingran, CTO of IBM’s Information Management Division, about the potential applications of the technologies that made Watson’s win possible.
Let’s first talk about the overall system. Watson is a question-answering system that represents 100 person-years of effort by a team of 25 IBM scientists. Watson runs on a cluster of 90 IBM Power 750 servers running Linux, with 2880 cores and 15TB of RAM, providing 80 teraflops of computing power. Its software architecture is based on UIMA and integrates a variety of machine learning algorithms that were used to develop approximately 3000 predictive models. These models run in parallel every time Watson tries to answer a question. The predictive models were tuned over several months using Watson’s results from a series of 134 “sparring matches” with past Jeopardy winners. Watson’s “knowledge base” was seeded with 70GB of curated (i.e., noise-free) full-text documents and eventually grew to 500GB, consisting of additional curated documents and information derived from the seed database. The final knowledge base represented approximately 200M pages of textual content. The text data was preprocessed using Hadoop; however, Hadoop was not used while Watson was competing in Jeopardy. Watson’s knowledge base also included various sets of handcrafted rules, including rules that provide clues on what to look for and others that describe the strategies players use when selecting clues within a category in order to find Jeopardy’s Daily Double.
Watson operates through a cycle of hypothesizing answers, gathering evidence that supports each hypothesized answer, evaluating the probabilities associated with the collected evidence, and proposing the final answer. As audiences discovered over 3 days of play, all this is done at “warp” speed.
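The hypothesize, gather evidence, score, answer cycle can be sketched in a few lines. This is a deliberately tiny illustration, not IBM's DeepQA code: the toy `keyword_overlap` scorer, the logistic combination of scores, the weights and the answer threshold are all assumptions. In Watson, the roughly 3000 trained models mentioned above would play the role of the scorer list.

```python
import math
from typing import Callable, List, Optional, Tuple

# An evidence scorer maps (clue, candidate answer) to a score in [0, 1].
Scorer = Callable[[str, str], float]

def keyword_overlap(clue: str, candidate: str) -> float:
    """Toy scorer (hypothetical): share of candidate words that appear in the clue."""
    clue_words = set(clue.lower().split())
    cand_words = set(candidate.lower().split())
    return len(clue_words & cand_words) / len(cand_words) if cand_words else 0.0

def confidence(scores: List[float], weights: List[float], bias: float) -> float:
    """Combine evidence scores into a probability with a logistic function."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))

def answer(clue: str, candidates: List[str], scorers: List[Scorer],
           weights: List[float], bias: float,
           threshold: float = 0.5) -> Tuple[Optional[str], float]:
    """Hypothesize, gather evidence, estimate confidence, then answer or abstain."""
    best, best_conf = None, 0.0
    for cand in candidates:                      # each candidate is a hypothesis
        evidence = [score(clue, cand) for score in scorers]
        conf = confidence(evidence, weights, bias)
        if conf > best_conf:
            best, best_conf = cand, conf
    # Only "buzz in" when the best hypothesis is confident enough.
    return (best, best_conf) if best_conf >= threshold else (None, best_conf)
```

The abstain branch mirrors the point made later in the post: what matters is not just an answer but an answer delivered with a calibrated confidence.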
There are several reasons why we should care about Watson as a Big Data analytics system.
Interaction with Watson through spoken language will be particularly important for the broader use of analytic applications by business users. The use of such applications is often limited because business users are intimidated by their complex interfaces. Watson demonstrated convincingly that natural spoken-language interaction with computers is no longer science fiction or the product of Hollywood special-effects teams. As I wrote in prior posts, Watson exhibited superior natural-language-understanding skills, and this is a really big deal. It was able to handle the inherent ambiguity of natural language and understand word meaning and context. Jeopardy’s clues are much harder problem descriptions than the queries we typically pose to search engines: they use puns, double meanings and misspellings to convey meaning. While this may be second nature for a human, it is a very hard task for a computer.
Watson makes decisions and responds to questions using a knowledge base of textual, unstructured information, rather than a knowledge base of well-structured concepts and rules that have previously been digested and represented in a knowledge representation language, such as the one used by CYC. In this respect Watson is significantly different from other question-answering systems such as Wolfram|Alpha. Stephen Wolfram wrote an interesting post about the different approaches taken by Alpha and Watson.
While the data sets Watson was analyzing during Jeopardy were relatively small by Big Data standards, being able to quickly and effectively analyze unstructured data is representative of many big data analytics situations, where you don’t always know what data you will need to analyze, where it will come from, how large each data set will be, how clean it will be and how long you will have to provide an answer.
Watson concurrently uses a large number of predictive models to analyze big data and come up with answers in real time. This is significant because it provides another important approach to analyzing big data: rather than parallelizing a single analysis algorithm and then using MapReduce to apply it to a big data set, as is typically done in Hadoop/MapReduce implementations, Watson applies several different predictive and scoring algorithms concurrently. Some of these algorithms may themselves be parallelizable and thus able to take advantage of MapReduce. Applying these algorithms to text data is particularly important to IBM, since most of its customers possess a lot of such data. Admittedly, the data Watson analyzed was curated and clean, whereas, as mentioned above, most big data is not of such quality. IBM will need to test the system’s performance with noisier data, since corporate data is rather noisy, as well as with voice and video data. Moreover, incorporating online data will introduce additional noise, which will undoubtedly affect Watson’s performance. Finally, Watson cannot handle incremental additions to its knowledge base, whether they take the form of text documents or rules; such additions require re-tuning the predictive models.
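The contrast drawn above, several different models applied concurrently to the same input versus one algorithm applied in parallel across many inputs, can be shown with Python's standard `concurrent.futures`. The model functions passed in are hypothetical placeholders; this illustrates the two patterns, not Watson's actual scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Model = Callable[[str], float]   # hypothetical: a model scores one input

def score_with_many_models(item: str, models: Dict[str, Model]) -> Dict[str, float]:
    """Watson-style fan-out: many different models score the same input concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(model, item) for name, model in models.items()}
        return {name: fut.result() for name, fut in futures.items()}

def map_one_model(items: List[str], model: Model, workers: int = 4) -> List[float]:
    """MapReduce-style: a single model is applied in parallel across many inputs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(model, items))
```

Threads keep the example short. For CPU-bound Python models, a `ProcessPoolExecutor` (with picklable, module-level functions) or a cluster framework would be needed for a real parallel speedup; the two patterns can also be combined by mapping each of several models over partitioned data.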
According to IBM, the majority of the software used in Watson is open source. That we can build such a sophisticated system from open source components is a feat in itself.
Of course, IBM is very interested in business applications of the technologies that made Watson so successful. Suitable application areas must involve complex language, including in the data to be analyzed; the need to reach high-precision responses, decisions or actions from ambiguous and noisy data; and the need to provide responses with confidence in real time. Some initial application ideas we discussed following the show included:
Medical diagnosis, including telemedicine. The body of medical knowledge is becoming large and growing very fast. Being able to work directly from text rather than having to first represent medical knowledge in some intermediate language, as was the case with expert systems in the past, could represent a big breakthrough.
Technical support. Help desks use mostly text data to provide answers to product issues. Increasing the accuracy of these responses while improving the overall user experience is important.
Insurance claims analysis. This is another area where the captured data is in text form and a system like Watson can be used to analyze it and provide a better user experience when consumers interact with their insurance provider.
Other application areas we discussed included online advertising, air traffic control and financial portfolio creation where Watson’s real-time analytics can be the central component of an overall solution.
The business model under which to offer Watson-like technology is another issue IBM is thinking about. Should such a system be offered as a service or as on-premise software? Of course it will depend on the application. One of the ideas we discussed was to run Watson as a service and charge by the difficulty of the question asked or the importance of the answer provided. This may be something that IBM can do for medical applications, for example, in collaboration with its traditional partners Mayo Clinic and Cleveland Clinic.
The Watson team deserves all the kudos they have received, not only for their win in Jeopardy! but, more importantly, for their technical accomplishments; for the water-cooler discussions the day after each broadcast, which will hopefully get more people (particularly young people) interested in technology and engineering; and for the technical conversations they sparked among engineers and scientists about what is possible in big data analytics and human/machine interaction.
Last week I attended Strata, a conference organized by O’Reilly and devoted to big data. It was a large conference (790 attendees) whose content included both technical talks and tutorials about the new generation of big data tools (e.g., Hadoop, Cassandra, visualization) as well as presentations on big data business applications. The diversity and size of the audience and the reported business successes were a strong indication of how important and popular big data has become.
Big data is pervasive in many of the companies Trident has funded over the last few years. We have invested in companies that generate and/or process big data, e.g., eXelate, Extole, HomeAway, Sojern, Turn, Xata, as well as companies that provide platforms for storing, managing and analyzing big data, e.g., Acteea, Host Analytics, Pivotlink. We recognize that many of the companies we invest in in the future will need to have competence in big data.
There is a big difference between big data and data warehousing, stemming primarily from the nature of the data. Data warehousing was all about analyzing transactional data captured from enterprise applications such as an ERP or POS system. In addition to the actual transactions, big data is about capturing, storing, managing and analyzing data about the behavior surrounding transactions, i.e., what happens before and after a transaction. This has several implications. First, it means that the captured data is less structured: it is easier to analyze a collection of purchasing transactions to identify a pattern than to analyze a series of selections made across a set of web pages to establish a pattern of behavior. Second, it implies that meaning must be extracted from events, e.g., the browsing activity prior to buying an item. To be effective in this more open-ended, exploratory data analysis, one has to break through the data silos typically found in enterprises and bring all available data to bear. It also means collecting all available data rather than trying to decide a priori which data to collect and keep.
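Extracting meaning from the events around a transaction usually starts by reassembling each user's activity into ordered sequences. The sketch below groups a hypothetical clickstream by user and returns the pages viewed before each purchase; the `Event` fields and the `view`/`purchase` event kinds are assumptions, not a real schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Event:
    user: str
    ts: int      # timestamp, e.g. seconds since epoch (assumed)
    kind: str    # "view" or "purchase" (hypothetical event types)
    item: str

def pre_purchase_paths(events: List[Event]) -> Dict[str, List[List[str]]]:
    """For each user, the items viewed since the previous purchase, ending at each purchase."""
    by_user: Dict[str, List[Event]] = defaultdict(list)
    for ev in sorted(events, key=lambda ev: (ev.user, ev.ts)):
        by_user[ev.user].append(ev)
    paths: Dict[str, List[List[str]]] = defaultdict(list)
    for user, evs in by_user.items():
        current: List[str] = []
        for ev in evs:
            if ev.kind == "view":
                current.append(ev.item)
            elif ev.kind == "purchase":
                paths[user].append(current + [f"BUY:{ev.item}"])
                current = []          # start a new path after each purchase
    return dict(paths)                # browsing with no purchase yet is not reported
```

A transactional warehouse would record only the `BUY` rows; the paths leading up to them are the behavioral data that has to be captured and kept to answer "what happened before the purchase?"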
Data science is becoming a field. Big data is eliminating the segregation between the people who manage the data, the people who analyze the data, and the people who present/visualize the data. A good data scientist must be able to do all three, though, as I wrote last week, translating business requirements into a data problem, and the resulting insights into business actions and value, remain largely missing skills among data scientists. Good data scientists are in high demand, as indicated by the jobs advertised at the conference and as reported there by LinkedIn. They are expected to play a significant role in how their companies evolve. That’s not something we were used to hearing about data analysts, who were always considered fixtures of the back office. I know, because I started my career in data analysis.
Corporations have a lot to learn about big data from consumer-oriented companies that generate, manage and analyze big data, e.g., Amazon, eBay, Facebook, Twitter, and LinkedIn, to name a few. This is a reversal of sorts. In the mid-90s, when I was with IBM, I ran an organization devoted to building data warehouses and providing analytical tools and services to Global 1000 companies. At that time various companies, including many of the then-nascent Internet companies, were trying to learn from the data warehousing and business intelligence practices of Walmart, Citibank, and First Data. Today, traditional companies would do well to understand and apply the big data techniques being developed by many internet and social media companies. One big difference is how these companies approach data stores. Traditional businesses see the enterprise data warehouse as storing the “single version of truth” about the data. Big data stores, by contrast, are viewed as containing multiple perspectives; their contents must be analyzed with the right set of tools in order to gain a perspective on the problem at hand.
Talking to the conference’s attendees I got the impression that more companies than ever before are starting to view data as an invaluable asset and a potential key to their success. They are no longer intimidated by data volumes and are using the new generation of big data management and analysis tools to bring more data under their control.
Strata was a great conference that brought the leaders in big data thinking and doing under one roof. It also showed that, though increasingly important, this is still a small community, and in many respects its overall size has not changed since the time I was one of the analysts. We all need to find ways to accelerate the education of new data scientists and their entry into the market. The ability of many companies to continuously innovate, become leaders, and remain in that position could largely depend on their ability to recruit data scientists who can effectively exploit their big data assets.
Last week my partners and I hosted a meeting of our IT Advisory Board. This board consists of senior IT executives, including CIOs and CTOs, from Global 2000 companies. I will write about the topics discussed in this meeting in a few days, once I have had the opportunity to clean up the several pages of notes I took. Today I want to relate one of the conversations I had with a couple of the board’s members during one of the meeting’s breaks. We started talking about the effective use of business analytics by companies. Both executives commented that their companies are increasing their use of analytics to understand their consumer customers. In fact, one of them stated that during the last 3 years their analytics group grew from 4 analysts to 20. Moreover, they stated that senior business management in their companies is more sensitized to the importance of analyzing corporate data to gain any type of competitive advantage. When I asked them about the analytic tools and solutions their companies currently employ, and how these have changed over time to deal with increasing volumes of data, they both stopped and said that their biggest obstacle to the broader use of business analytics was not technology but the proper and effective application of the information they extract from their analyses.
Both of these companies have always been early adopters of analytic and associated data management technologies. Both executives indicated that they feel particularly good about the caliber of their corporate data analyst groups. However, by their own admission, both of their companies today lack the people who can provide a “two-way translation,” i.e., first to properly translate a business problem into a data analysis problem (which can then be tackled by the quants), and second to formulate (or re-translate) the analysis results in a way that business executives will understand, appreciate, and be able to act on. Companies that provide business intelligence (BI), analytics and even data warehousing solutions talk about how they target “business analysts.” The business analyst has become a mythical position in corporations. “Business analyst” is a generic description for an individual who works for a business unit, as opposed to an IT organization, and uses such analytic solutions in the course of business. However, our advisory board members said that the business analysts in their companies use such solutions to address well-understood problems and activities, not novel situations that may call for a data-driven analytic approach. These business analysts may use a query and reporting solution like QlikTech’s or a multidimensional analysis solution such as MicroStrategy’s, but they are not able to provide the two-way “translation” I described above.
In their opinion, the right translator/analyst must have: the business understanding and experience to grasp complex business problems in their entirety; the articulateness to describe those problems appropriately to data analysts; enough data knowledge to broadly identify the types of data the quants will need to use to provide insights and information; the ability to take these insights and relate them back to the original problem, providing actionable solutions; and, finally, the executive gravitas to present these solutions to the business executive(s) who will act on them.
There exist independent consultants who play this two-way analytics translator role very effectively but their extensive and continuous use by corporations is not feasible, mostly for financial reasons; they are too expensive and in too high demand. In discussions I’ve had with some members of IBM’s 7000-strong analytics and optimization consulting unit I heard that those of their consultants who can provide such services are in the highest demand. In fact IBM can’t find enough of them to hire and deploy with corporate clients around the world.
So while we celebrate the development of new analytic tools or solutions that can deal with even larger and more complex quantities of data that must be processed faster than ever before, we must not lose sight of the fact that we must address our inability to identify enough “translators” who can help us analyze the right problems and effectively use the insights we discover.
Happy New Year to all! Like every year I am writing about the technology areas I will be following and focusing on during 2011. These areas build upon those my partners and I followed during 2010. During the holidays I wrote about online advertising, mobile and social web as areas Trident will continue to target.
Tablets and smartphones. In a couple of days I’ll be heading to CES where I expect that several vendors will be introducing new tablets and smartphones targeting different customer segments. My interest around these devices centers on the platforms they support, e.g., HTML5, novel features they will incorporate, e.g., NFC, the new types of applications these features will enable, e.g., mobile wallets, and the types of data they will be generating. Tablets and smartphones are sensor platforms.
Cloud computing, SaaS, and virtualization. Cloud computing was one of the biggest technology trends for 2010 and corporations continued to virtualize their data centers (see my comments from the Goldman conference). Cloud management and management of virtualized environments are two important areas we are targeting. Cloud management in particular is becoming a hot space. We just lost a deal in this area after significant competition with two other venture firms. I am also following closely the evolution of PaaS platforms and the SaaS applications they will be enabling, particularly now that enterprises have started aggressively adopting SaaS applications and developing their own cloud-based applications. We will continue looking for context-aware, social (see below) and vertical SaaS applications. In 2010 we invested in Acclaris (healthcare IT) but passed on several others.
App stores and application models. I am watching how the app store is developing as a general-purpose application distribution mechanism. App stores are moving beyond smartphones (see what Apple is doing with the Mac App Store) into other consumer electronics devices, e.g., TVs and cars (another area I’ll be watching at CES), and finally into the enterprise. One area of interest is application discovery within app stores. As the number of applications offered by an app store increases, identifying those with the functionality appropriate for a particular task or specific business process will become very important. Finally, between the proliferation of app stores and the more extensive use of PaaS for application development, we see a new model emerging for enterprise application delivery and licensing. Enterprise application functionality will be developed in much smaller chunks and priced accordingly, much as is happening today with smartphones.
Social computing for the enterprise. We are focusing on three areas within social computing for the enterprise: customer service, where I think there is an opportunity for significant innovation in business settings; marketing, where word-of-mouth and friend-referral programs are proving very effective for B2C and B2B businesses; and Facebook ecommerce, because so many companies are now setting up their stores within Facebook. We are rethinking workflows and business processes as we try to better understand how social computing can be used effectively in the enterprise.
Big data and analytics. We will be moving from just collecting and managing/organizing big data (web site data, social data, mobile data, data from the Internet of Things) to thinking about how to analyze it effectively. In-memory analytics, Hadoop and Google’s Percolator are technologies we follow. Privacy and security are important data-related issues that started coming to the fore during 2010 and will remain so during 2011. While I don’t expect to see technology-driven solutions to these issues, I anticipate that during 2011 we will need to engage in healthy dialogs about what data privacy really means in today’s environment.
Long gone are the days when Dreamforce was a smallish conference devoted to SaaS; the first conference, 10 years ago, had fewer than 1000 attendees. This year's conference had over 30k attendees (business users, IT users and vendors), almost 70% more than last year's. The lines in and around the Moscone Center, the hotel rates, and the jammed restaurants, bars and parking lots around the venue were ample proof of the high attendance. This was an event of high importance to Salesforce and even to SaaS in general. My impressions:
Based on the attendee affiliations (small and large companies, business and IT users, foreign and domestic delegates), the event provided additional proof that SaaS and cloud computing have penetrated the enterprise for good, as several of us have been predicting. Sarah Friar of Goldman Sachs calls it the "unstoppable SaaS wave." Heroku is a very significant acquisition for Salesforce. In addition to the development environment it provides, the 1m Ruby application developers in Heroku's community, including developers of mobile applications, can be channeled toward the platform Salesforce is putting in place.
The introduction of database.com, along with Heroku's Ruby-based development environment, now positions Salesforce among the premier PaaS providers, along with Microsoft, VMware, and maybe even Red Hat through its acquisition of Makara. This is a significant development, since Salesforce's force.com platform and APEX language alone were not adequate to provide a general-purpose, world-class PaaS (in a previous post I wrote some initial thoughts on force.com). In addition, because of its applications heritage, Salesforce has a wealth of application know-how that it can bring to its PaaS, whereas companies like Microsoft and VMware must rely on their third-party application developers to acquire the corresponding know-how. Salesforce needs to work quickly to integrate all its pieces (Chatter, Jigsaw, force.com, database.com, Heroku tools, etc.), in the process defining and exposing the right APIs. In this way developers will be able to create applications for a variety of tasks and levels of complexity, not just CRM-related applications as was the case with force.com. It has already been announced that objects and services (application and platform) will be exposed through SOAP and REST APIs. Developers will not be restricted to programming in Ruby but will be able to use other languages such as Java, C# and PHP. They will also be able to create their own data models. Moreover, by opening up its PaaS, Salesforce will allow developers to use applications developed on other similar platforms such as Azure.
The announcements of additional "clouds," such as the one for web site development, prove that Salesforce continues to have a strong vision for where SaaS and cloud computing can go.
As we've seen in previously published surveys, security is no longer the top concern for SaaS adoption. Data and application integration have claimed that spot, indicating that we are moving to a phase of trying to make on-premise systems work well with cloud-based ones. The presence of several of the major Indian and Chinese IT outsourcing companies, all of which had big booths at the show, indicates that they now see a significant opportunity around systems integration involving SaaS applications.
As investors we are particularly excited about the PaaS announcements. The emergence of another strong PaaS and the competition it is bound to generate among Salesforce, VMware, Microsoft, Red Hat, and potentially Google will be beneficial on two fronts. First, the competition will result in further PaaS innovations. This is obviously good for SaaS application developers, who will more seriously consider a PaaS as a viable foundation on which to develop a new SaaS application. The improved capabilities of PaaS platforms will also accelerate application development, resulting in the creation of new and, most likely, innovative packaged SaaS applications: the type we as investors like funding. Second, competition among PaaS providers will be good not only for the continuing penetration of SaaS applications but also for lowering the operating costs of deploying and supporting SaaS applications, thus improving application vendors' margins. While the PaaS pricing Salesforce announced is on the high side, particularly for smaller ISVs, I expect that competition will lead to lower prices. My only concern about Salesforce's PaaS-related announcements is whether the company, which at heart is still an applications company, can develop the right DNA and evolve into an infrastructure company capable of implementing the world-class PaaS it announced.
In 1996, while I was running IBM’s BI solutions organization, one of my groups developed the Surfaid web analytics solution. Surfaid, one of the first such solutions, was later acquired by Coremetrics (which was in turn recently acquired by IBM and made part of its marketing automation solution). Later, Omniture dominated the high-end web analytics market by figuring out the right ingredients of a web analytics solution (quick setup, effective data collection and management, informative reports for the emarketer) and the right model for delivering it (SaaS). Google also entered the market and dominated the low end with a free offering. The evolution of this market over the past 10 years has taught us that web analytics will remain a relatively small component of the overall analytics and BI market.
Ecommerce marketers and merchandisers were some of the earliest adopters of web analytics. However, the continued growth of ecommerce, combined with the increasing complexity of the decisions these business users must make, is causing retailers (both pureplay and multi-channel) to look for more sophisticated analytic solutions than the web analytics vendors or the ecommerce platform vendors, e.g., Demandware and ATG (recently acquired by Oracle), can offer. While web analytics solutions can be used to determine which landing pages encourage customers to make a purchase, or which pay-per-click ad campaigns are most effective, ecommerce marketers and merchandisers now want to understand customer loyalty, the impact of their customer retention strategies (discounts, coupons, extra services), and the customer segments with the greatest Lifetime Value (LTV). Today’s web analytics solutions cannot address these needs.
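To make the LTV question concrete, here is a minimal sketch of one common back-of-the-envelope estimate: margin per order times purchase frequency times expected years of retention, applied per customer segment and ranked. This simple undiscounted formula is an assumption chosen for illustration, not a method described here or used by any vendor mentioned; the `Segment` fields and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical per-segment inputs for a simple LTV estimate."""
    name: str
    avg_order_value: float  # average revenue per order, in dollars
    orders_per_year: float  # average purchase frequency
    gross_margin: float     # fraction of revenue retained as margin (0-1)
    expected_years: float   # expected length of the customer relationship


def simple_ltv(segment: Segment) -> float:
    """Undiscounted LTV: margin per order x orders per year x years retained."""
    return (segment.avg_order_value * segment.gross_margin
            * segment.orders_per_year * segment.expected_years)


def rank_segments_by_ltv(segments: list[Segment]) -> list[tuple[str, float]]:
    """Return (segment name, LTV) pairs, highest LTV first."""
    ranked = sorted(segments, key=simple_ltv, reverse=True)
    return [(s.name, simple_ltv(s)) for s in ranked]


# Illustrative numbers only; not real customer data.
segments = [
    Segment("one-time coupon buyers", 50.0, 1.0, 0.2, 1.0),
    Segment("repeat web buyers", 80.0, 4.0, 0.4, 3.0),
    Segment("cross-channel buyers", 120.0, 3.0, 0.35, 4.0),
]

for name, ltv in rank_segments_by_ltv(segments):
    print(f"{name}: ${ltv:,.2f}")
```

A real analytic application would discount future margin and estimate retention from order history rather than take it as an input; the sketch only shows why segment-level LTV requires order, margin and retention data that web analytics tools do not collect.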
During the summer of 2009, Infopia, one of my portfolio companies and at the time an ecommerce platform company, asked the largest of its 300 etailers about their analytics needs. The company found that while all of its customers were using a web analytics solution, none were able to address their particular ecommerce decision needs through these solutions. The market demand for a sophisticated ecommerce analytic application was so strong that the company’s board decided to direct significant resources toward developing such an application. A year later the company sold its ecommerce platform business to Versata, changed its name to Acteea, pivoted, and is now a SaaS ecommerce analytic application company. Acteea’s SaaS analytic application enables merchandisers and marketers to track customer LTV and define winning product and customer segment strategies.
Acteea’s analytic application integrates a wide range of data into a cloud-based data mart: pricing data, marketing campaign effectiveness data, adword data (words bid and words purchased), customer order data, inventory data, web site activity data (what is typically fed to web analytics solutions), customer activity data from other channels (e.g., catalog), promotions data, and competitor pricing data. As a result, data integration and cleaning is a far more complex process than the data integration that web analytics solutions must address. Extole, another of my portfolio companies, has developed a SaaS platform for social marketing that is quickly being adopted by etailers. The data produced by the platform’s applications (three to date) will undoubtedly become another source for Acteea’s analytics, as it can help etailers with decisions around social commerce.
In a recent board meeting we reviewed some of the early successes Acteea’s customers are having with the company’s analytic application. For example, one customer analyzed keyword, adword, web analytics, pricing and product catalog data, refined its adword bidding approach, and identified “driver” products that drew existing target customers into making add-on purchases. Another customer began measuring total return on marketing investment and cart value by customer segment, which led it to revamp its customer segmentation based on channel loyalty and cross-channel behavior, improving quarter-over-quarter sales. Finally, a third customer analyzed product sales and gross margin return on inventory, quickly identified the lowest-performing products, and eliminated them from the appropriate channels, including the web site.
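The third example rests on gross margin return on inventory (GMROI), conventionally computed as gross margin divided by average inventory at cost. The minimal sketch below computes it per product and flags products below a cutoff. The `ProductStats` fields, the SKUs, the figures, and the 1.0 threshold are all hypothetical choices for illustration, not the customer's actual data or rule.

```python
from dataclasses import dataclass


@dataclass
class ProductStats:
    """Hypothetical per-product figures for one reporting period."""
    sku: str
    revenue: float             # net sales
    cost_of_goods_sold: float  # cost of the units sold
    avg_inventory_cost: float  # average inventory on hand, valued at cost


def gmroi(p: ProductStats) -> float:
    """Gross margin return on inventory = gross margin / average inventory at cost."""
    if p.avg_inventory_cost <= 0:
        raise ValueError(f"{p.sku}: average inventory cost must be positive")
    return (p.revenue - p.cost_of_goods_sold) / p.avg_inventory_cost


def lowest_performers(products: list[ProductStats], threshold: float) -> list[str]:
    """SKUs whose GMROI falls below the threshold, worst first."""
    scored = sorted(products, key=gmroi)
    return [p.sku for p in scored if gmroi(p) < threshold]


# Illustrative numbers only.
products = [
    ProductStats("SKU-A", 10_000.0, 6_000.0, 2_000.0),  # GMROI 2.0
    ProductStats("SKU-B", 5_000.0, 4_500.0, 1_000.0),   # GMROI 0.5
    ProductStats("SKU-C", 8_000.0, 7_600.0, 1_000.0),   # GMROI 0.4
]
print(lowest_performers(products, threshold=1.0))
```

In practice the cutoff would come from category benchmarks, and the computation would run per channel so that a product can be pulled from the web site while staying in other channels.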
It is still too early to tell whether Acteea’s pivot will be successful, though the initial results are encouraging. Regardless, the company’s work is demonstrating the market’s need for complex ecommerce analytic solutions that are distinct from the web analytics toolkits available today.
Over the past couple of years I have met with several startups that offer analytic solutions for mobile data. I have not invested in any of them. I felt that the data captured from feature phones and early generations of smartphones was not rich enough to lead to interesting and distinct analytics. For example, while data captured from a mobile web browser, such as sites visited, pageviews and time spent browsing, could be analyzed, we didn’t need a new company to do that; Omniture could do it just fine. However, the new smartphones capture more interesting data, and these data sets could drive the creation of new and interesting analytics. As a result, I am becoming interested in mobile data analytics and have been actively looking for investment opportunities in this sector.
The new smartphones are becoming sensor platforms as well as computing platforms. In addition to photo and video cameras, touch screens, GPS and accelerometers, new types of sensors are being connected to smartphones. For example, Bling Nation has introduced a sensor that adheres to a smartphone and is linked to the user’s PayPal account. Our own portfolio company Zeo has announced that it will connect its sensor to the iPhone in order to capture sleep-related data. Some of the data sets generated by all these sensors that I find interesting include:
The time-series of GPS and accelerometer data for each subscriber. By analyzing these time-series one can predict where and when the subscriber will be next and offer relevant services at the predicted location, e.g., parking availability with offers from parking garages.
Data generated from the use of augmented reality (AR) applications. This data can create new advertising opportunities, as well as opportunities to serve up relevant content the user had not thought to ask for.
Configuration data on the complete software stack running on each phone (from firmware to operating system to application software). This data could then be used, for example, by an app store to recommend newly available applications that would augment the user’s productivity. Today such configuration databases exist only in corporate IT settings.
Mobile payments data combined with geolocation data. Analyzing this data can lead to predictions about customer brand or product loyalty.
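The first data set above, GPS and accelerometer time-series used to predict where and when a subscriber will be next, can be illustrated with a minimal sketch. It uses a first-order Markov model over place labels, one simple technique among many and not one described here. The `visits` data and place names are hypothetical, and the sketch assumes raw GPS fixes have already been clustered into named places (that step is not shown).

```python
from collections import Counter, defaultdict

# A visit is (place, arrival_hour). Places are assumed to be labels produced
# by clustering raw GPS/accelerometer fixes beforehand (not shown here).
Visit = tuple[str, int]


def build_model(visits: list[Visit]):
    """Count place-to-place transitions and record arrival hours for each."""
    transitions = defaultdict(Counter)  # current place -> Counter of next places
    arrival_hours = defaultdict(list)   # (current, next) -> arrival hours seen
    for (current, _), (nxt, hour) in zip(visits, visits[1:]):
        transitions[current][nxt] += 1
        arrival_hours[(current, nxt)].append(hour)
    return transitions, arrival_hours


def predict_next(transitions, arrival_hours, current: str):
    """Most frequent next place and its mean arrival hour, or None if unseen.

    The plain mean ignores wrap-around at midnight; a fuller model would use
    circular statistics and condition on day of week.
    """
    if current not in transitions:  # membership test does not create a key
        return None
    nxt, _count = transitions[current].most_common(1)[0]
    hours = arrival_hours[(current, nxt)]
    return nxt, sum(hours) / len(hours)


# Hypothetical visit history for one subscriber.
visits = [
    ("home", 7), ("office", 9), ("garage", 18), ("home", 19),
    ("office", 9), ("gym", 18), ("home", 20),
    ("office", 8), ("garage", 18), ("home", 19),
]
transitions, arrival_hours = build_model(visits)
print(predict_next(transitions, arrival_hours, "office"))
```

With the predicted place and hour in hand, a service could, for example, surface parking availability near the predicted destination shortly before the expected arrival time.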
Entertainment-related applications, e.g., gaming, and health care applications, e.g., prescription dispensing, will also benefit from the analysis of this type of data. I am not certain whether new data management systems will be necessary for such data sets, though I imagine that the data will be big and complex, particularly as various time-series are captured, and will be stored in the cloud.
The wireless carriers may not be in the best position to collect this data, not only because of their lack of experience with diverse data types but also because they are regulated businesses. Google and Apple are in a much better position because they already collect much of this data through their Android and iOS platforms, respectively. While these companies may also be best able to mine the data, they won’t enter this business in the near term. Instead, startups will be the first to experiment with creating interesting data sets out of the collected data and analyzing them. My assumption, which also drives my interest in the sector, is that companies like Google will wait to see how these “experiments” go and then acquire the more interesting analytics startups.
Users will need to give their permission for this rich data to be collected and combined. Vendors, including wireless carriers, will get the users’ permission by offering free services (something for which consumers have shown interest and affinity), better experience (optimized bandwidth, improved application performance, more accurate recommendations around applications, products, services, social connections, etc), and more accurate targeting of ads in ad-supported services.
The mobile space remains highly fragmented and the talent to create and analyze these data sets may be hard to find. The new smartphone platforms present opportunities for collecting valuable data sets that will lead to the development of unique analytics which will in turn drive important and novel decisions. Startups can lead the way to create these analytics and the enterprise platforms that manage them.