Talk:Observatório de dados/BI/Capacidades
Other classifications
- Data: file, online dataset, SQL connection, select lines, select rows, etc., plus metadata management or visualization controls tied to the data source.
- Visualize
- Model
- Evaluate
- Unsupervised
- Data Fusion: relationships
- Educational
- Text
- Network: including network file, explorer, and network analysis
- Bioinformatics
- Associate: regular expressions
- Image Analytics
Internal architecture of Orange:
- Data model (data): Data Storage (storage); Data Table (table); SQL table (data.sql); Domain description (domain); Variable Descriptors (variable); Values (value); Data Instance (instance); Data Filters (filter); Loading and saving data (io)
- Data Preprocessing (preprocess): Impute; Discretization; Continuization; Normalization; Randomization; Remove; Feature selection; Preprocessors
- Classification (classification): Logistic Regression; Random Forest; Simple Random Forest; Softmax Regression; k-Nearest Neighbors; Naive Bayes; Support Vector Machines; Linear Support Vector Machines; Nu-Support Vector Machines; One Class Support Vector Machines; Classification Tree; Simple Tree; Majority Classifier; Elliptic Envelope; Neural Network; CN2 Rule Induction
- Regression (regression): Linear Regression; Polynomial; Mean; Random Forest; Simple Random Forest; Regression Tree; Neural Network
- Clustering (clustering): Hierarchical (hierarchical)
- Distance (distance): Handling discrete and missing data; Supported distances
- Evaluation (evaluation): Sampling procedures for testing models (testing); Scoring methods (scoring)
- Projection (projection): PCA; FreeViz; LDA
- Comparison
- Change over time (dynamic analysis)
- Ranking (ordering)
- Spatial analysis (Turf-style geoprocessing)
- Flow visualization
- Part-to-whole (relationships and joins)
- Distribution analysis
- Correlation analysis
- Single visualization (viewing standalone data)
- Filter
- Narrative (telling a story with the data)
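A few of the analysis types listed above (ranking, part-to-whole, filter and correlation) can be made concrete with a minimal sketch in plain Python. The sample rows, column meanings and function names below are hypothetical and exist only for illustration; they are not taken from Orange or any other tool mentioned on this page.

```python
from math import sqrt

# Hypothetical sample: (region, population in millions, budget in millions).
rows = [
    ("North", 2.0, 10.0),
    ("South", 4.0, 18.0),
    ("East", 1.0, 6.0),
    ("West", 3.0, 14.0),
]


def ranking(rows, key_index):
    """Ranking: order records by one measure, largest first."""
    ordered = sorted(rows, key=lambda r: r[key_index], reverse=True)
    return [r[0] for r in ordered]


def part_to_whole(rows, key_index):
    """Part-to-whole: share of each record in the total of one measure."""
    total = sum(r[key_index] for r in rows)
    return {r[0]: r[key_index] / total for r in rows}


def filter_rows(rows, predicate):
    """Filter: keep only the records that satisfy a condition."""
    return [r for r in rows if predicate(r)]


def pearson(xs, ys):
    """Correlation analysis: Pearson coefficient between two measures."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


by_population = ranking(rows, 1)
shares = part_to_whole(rows, 1)
large_regions = filter_rows(rows, lambda r: r[1] >= 3.0)
correlation = pearson([r[1] for r in rows], [r[2] for r in rows])
```

Each function maps to one category of the list, which suggests the categories describe questions asked of the data rather than specific chart types.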
Gartner's Critical Capabilities Definition
Although it is the most cited, it is as diffuse as the others, and it is widely criticized for disregarding other classifications and classic products (such as SQL and classic reporting and statistics tools), and for having been "made to order" for the tools promoted by the big vendors (IBM, Tableau and MS), Microsoft in particular.
Even so, part of its definitions and terminology can be reused for open data. The main ones appear to be:
- (Infrastructure for) Cloud BI
- (Infrastructure for) Data Source Connectivity and Ingestion
- Metadata Management
- Self-Contained ETL (Extraction, Transformation and Loading) and Data Storage
- Data Preparation
- Analytic Dashboards
- Interactive Visual Exploration
- Publish, Share and Collaborate
Below is a reproduction of the 2018 Gartner report.
"Magic Quadrant for Analytics and Business Intelligence Platforms", published on 26 February 2018, ID G00326555. http://resources.mynewsdesk.com/image/upload/t_attachment/xk56jpklxthehxes8fr2.pdf.
Infrastructure
1. BI Platform Administration, Security and Architecture. Capabilities that enable platform security, administering users, auditing platform access and utilization, and ensuring high availability and disaster recovery.
2. Cloud BI. Platform-as-a-service and analytic-application-as-a-service capabilities for building, deploying and managing analytics and analytic applications in the cloud, based on data both in the cloud and on-premises.
3. Data Source Connectivity and Ingestion. Capabilities that allow users to connect to structured and unstructured data contained within various types of storage platforms (relational and nonrelational), both on-premises and in the cloud.
Data Management
4. Metadata Management. Tools for enabling users to leverage a common semantic model and metadata. These should provide a robust and centralized way for administrators to search, capture, store, reuse and publish metadata objects such as dimensions, hierarchies, measures, performance metrics/key performance indicators (KPIs), also report layout objects, parameters and so on. Administrators should have the ability to promote a business-user-defined data mashup and metadata to the SOR metadata.
5. Self-Contained Extraction, Transformation and Loading (ETL) and Data Storage. Platform capabilities for accessing, integrating, transforming and loading data into a self-contained performance engine, with the ability to index data and manage data loads and refresh scheduling.
6. Self-Service Data Preparation. "Drag and drop" user-driven data combination of different sources, and the creation of analytic models such as user-defined measures, sets, groups and hierarchies. Advanced capabilities include machine-learning-enabled semantic autodiscovery, intelligent joins, intelligent profiling, hierarchy generation, data lineage and data blending on varied data sources, including multistructured data.
7. Scalability and Data Model Complexity. The degree to which the in-memory engine or in-database architecture handles high volumes of data, complex data models, performance optimization and large user deployments.
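To make item 5 (Self-Contained ETL and Data Storage) more tangible, here is a minimal sketch of the three stages using only the Python standard library, with an in-memory SQLite database standing in for the "self-contained performance engine". The CSV content, table name and column names are hypothetical, and a real BI platform would add load management and refresh scheduling on top of this.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real source would be a file or a remote dataset.
RAW_CSV = """city,year,spending
Recife,2017, 1200.50
Recife,2018,1350.00
Natal,2018,
Natal,2017,980.25
"""


def extract(text):
    """Extraction: parse CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transformation: trim whitespace, cast types, drop rows with no value."""
    cleaned = []
    for rec in records:
        value = rec["spending"].strip()
        if not value:
            continue  # skip rows whose measure is missing
        cleaned.append((rec["city"].strip(), int(rec["year"]), float(value)))
    return cleaned


def load(rows):
    """Loading: store the rows in an in-memory engine and index them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spending (city TEXT, year INTEGER, amount REAL)")
    conn.executemany("INSERT INTO spending VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_spending_city ON spending (city)")
    conn.commit()
    return conn


conn = load(transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT city, SUM(amount) FROM spending GROUP BY city ORDER BY city"
).fetchall()
```

Once loaded, the data can be queried with ordinary SQL, which is the kind of classic tooling the critique above says the Gartner classification tends to overlook.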
Analysis and Content Creation
8. Advanced Analytics for Citizen Data Scientist. Enables users to easily access advanced analytics capabilities that are self-contained within the platform itself through menu-driven options or through the import and integration of externally developed models.
9. Analytic Dashboards. The ability to create highly interactive dashboards and content with visual exploration and embedded advanced and geospatial analytics to be consumed by others.
10. Interactive Visual Exploration. Enables the exploration of data via an array of visualization options that go beyond those of basic pie, bar and line charts to include heat and tree maps, geographic maps, scatter plots and other special-purpose visuals. These tools enable users to analyze and manipulate the data by interacting directly with a visual representation of it to display as percentages, bins and groups.
11. Augmented Data Discovery. Automatically finds, visualizes and narrates important findings such as correlations, exceptions, clusters, links and predictions in data that are relevant to users without requiring them to build models or write algorithms. Users explore data via visualizations, natural-language-generated narration, search and natural-language query (NLQ) technologies.
12. Mobile Exploration and Authoring. Enables organizations to develop and deliver content to mobile devices in a publishing and/or interactive mode, and takes advantage of mobile devices' native capabilities, such as touchscreen, camera and location awareness.
Sharing of Findings
13. Embedding Analytic Content. Capabilities including a software developer's kit with APIs and support for open standards for creating and modifying analytic content, visualizations and applications, embedding them into a business process and/or an application or portal. These capabilities can reside outside the application, reusing the analytic infrastructure, but must be easily and seamlessly accessible from inside the application without forcing users to switch between systems. The capabilities for integrating analytics and BI with the application architecture will enable users to choose where in the business process the analytics should be embedded.
14. Publish, Share and Collaborate on Analytic Content. Capabilities that allow users to publish, deploy and operationalize analytic content through various output types and distribution methods, with support for content search, scheduling and alerts. These capabilities enable users to share, discuss and track information, analysis, analytic content and decisions via discussion threads, chat and annotations.
Overall platform capabilities were also assessed
15. Ease of Use, Visual Appeal and Workflow Integration. Ease of use to administer and deploy the platform, create content, consume and interact with content, as well as the degree to which the product is visually appealing. This capability also considers the degree to which capabilities are offered in a single, seamless product and workflow, or across multiple products with little integration.