Checklist for advanced analytics

We already know that the buzzword is Analytics (whether in the form of machine learning, artificial intelligence or deep learning). Companies that are not building these capabilities will cease to be competitive (everyone says so: Gartner, IDC, Harvard Business School, MIT, ...). We can argue (or not) about whether we believe this, or whether we should believe it with some nuance, but that is not today's discussion.

Today I am in a retrospective mood.

Back in 2009 (!), TDWI put together a list (a checklist, they called it) for advanced analytics that, according to them, anyone interested in the subject should take a look at (as you can imagine, the link no longer exists, or at least I have not been able to find it). I think it is worth revisiting the topic. Let's look at the list:

  1. Use advanced analytics to discover relationships and anticipate the future.
  2. Scale data integration to broaden the range of data volumes to be analyzed.
  3. Recognize that reporting and analytics have different goals and needs.
  4. Distinguish between data warehouses, data marts and analytical databases.
  5. Design a data warehouse architecture that fits the analysis.
  6. Prepare the data to meet the needs of the chosen analysis method.
  7. Preserve the richness of the data, since the patterns being sought are hidden in it.
  8. Enrich the data after working with it, not before. That is, feed the results back into the data.
  9. Apply the analysis to BI and the DW.

Some will find this list obvious (common sense, others will say). What is interesting about it is that, with a few slight modifications, it can be adapted to the current context:

  1. Use machine learning and deep learning to discover relationships and anticipate the future.
  2. Scale data ingestion to broaden the range of complex data (big data) to be analyzed.
  3. Recognize that reporting and analytics have different goals and needs.
  4. Distinguish between big data, data lakes, data warehouses, data marts and analytical databases.
  5. Design a data warehouse and/or big data architecture that fits the analysis.
  6. Prepare the data to meet the needs of the chosen analysis method.
  7. Preserve the richness of the data, since the patterns being sought are hidden in it.
  8. Enrich the data after working with it, not before. That is, feed the results back into the data.
  9. Apply the analysis to BI, the DW and Big Data.

Clearly, behind this very simple list there are many complicated details. For example, point 4 can become really interesting. It raises questions such as: do we need to deploy an ad hoc architecture, is an out-of-the-box one (which we will evolve over time) enough, and/or should we make use of cloud computing resources?

In any case, it is interesting to see that this list remains quite relevant.

Pentaho Business Analytics 1st Edition Review

Just a few weeks ago, Pentaho updated its products (both CE and commercial). Business as usual. The latest version, currently 5.1, is accompanied by the latest developer tools, as usual. You can get them from SourceForge. In summary, the new version includes better integration with MongoDB, R, Weka and YARN.

Packt Publishing has given me the opportunity to review the new book on Pentaho, titled “Pentaho Business Analytics”, which is also the new name for the platform. Let’s start with the good news: “Pentaho Business Analytics” by Sergio Ramazzina provides the most up-to-date overview of the Pentaho platform through examples. It’s a quick walkthrough (625 pages!) that covers the main aspects a developer must know about the platform.

Now the bad news: this is not a book about Business Analytics, so don’t confuse the marketing name of the Pentaho platform with the data strategy. You will not find any Business Analytics recipes in this book. Another drawback is that the newest capabilities and features are omitted, but this is understandable, as covering them is not the objective of the book. They might make a nice chapter to add in the second edition.

The book starts with the Pentaho User Console (chapter 1) and BA Server instance configuration (chapter 2). After so many books explaining time after time how to install the Pentaho server, it is good to find a fresh approach. At least the author considers that there is enough public information available for readers to carry out these initial steps on their own.

These first two chapters include the most common tasks that a developer or administrator needs to perform to manage a Pentaho server instance. Taking into consideration the huge changes from version 4 to 5, this knowledge will be quite welcome to the reader.

Then it continues with: how to deal with data sources (chapter 3), working with Pentaho Metadata Editor (chapter 4), creating my first report with Pentaho Interactive Reporting (chapter 5), creating my first Mondrian model and OLAP analysis (chapter 6), creating my first report with Pentaho Reporting (chapter 7), creating my first dashboard (chapter 8), how to use scheduling (chapter 9), how to set up and use Pentaho Mobile (chapter 10) and server customization (chapter 11). Many of these topics should already be known to an average developer; if not, this is the right time to start.

As you can imagine, if you want to master one of the developer tools, this is not your book. There are other books, such as “Pentaho Reporting 5.0 by Example Beginner’s Guide”, that provide a detailed overview of using Pentaho Report Designer through examples.

Why might this book be interesting for you?

If you are new to Pentaho, you will appreciate the walkthrough, which introduces you to the complete platform. If you want to upgrade from version 4 to 5, chapters 1, 2, 9 and 11 will help you become familiar with the new environment and learn how to perform again all the main tasks needed to manage your platform. However, if you are already working with version 5 and have already mastered the platform, this book will provide you with limited value. Moreover, there is a strong focus on the professional flavour of the platform, with only some sections here and there for the community version. It would be nice to add a comparison at the beginning of each chapter.

In summary, this is a more-than-welcome book about the latest version of the Pentaho BI platform. It tries to provide a complete overview (based on FoodMart data), going through metadata, reporting and OLAP. If you are struggling with users, permissions, folders,… this is your book!

You can buy the book here if you are interested.