DBMS functions are the responsibility of and performed by the DBMS 'for all applications'. There are several categories thereof, but here we will focus on the one including the three functions defined by a data model such as the RDM: structure, integrity and manipulation. A true RDBMS has a relational 'data sub-language' with which (1) relations and (2) integrity constraints are defined and data can be retrieved by (3) manipulating relations mathematically as sets -- i.e., applying the set operations of the relational algebra on them. Restricting the sub-language to the three data management functions makes first order predicate logic (FOPL) sufficient, with the following advantages:
- The language is declarative, decidable, and simpler;
- Logical validity and semantic correctness of query results are 'system-guaranteed';
- Physical and logical independence are supported.
[Note: Other DBMS functions are storage management, performance optimization, concurrency control, transaction management. They are not the purview of the data model, are performed by the DBMS, but not via the data sub-language, because they require higher logic than FOPL (see next).]
Application-specific functions are the responsibility of and performed by each application. They include (1) result preparation and (2) presentation and (3) communication with the DBMS. They are not limited to data management, so they require higher logic than FOPL and, thus, 'computationally complete' languages (CCL). A CCL (e.g., programming language), is expressively more powerful than a relational data sub-language, but:
- Is imperative (procedural), prone to undecidability and significantly more complex;
- Logical validity and semantic correctness are not system-guaranteed;
- Physical and logical independence are not supported.
'Proper hosting' of relational data sub-languages in CCLs is the optimal division of labor between the DBMS and applications: relational advantages are preserved for data management functions performed by the DBMS and only application-specific functions are relegated to applications. Violating this arrangement by extending either (a) the data sub-language with application functions, or (b) the host CCL with data management functions destroys optimality and causes a plethora of costly complications
Consider now the above two "solutions" for analytics. SQL was intended to be a relational data sub-language and, as such, should have expressed strictly the three data management DBMS functions. So, first, forcing SQL to perform analytics functions is a type (a) violation. Second, analytic functions are application-specific and, therefore, tools that purport to "perform analytics without SQL" is a trivial promise that should be the case 'by definition.' The correct perspective for 'databased analytics' is, of course, that applications use CCLs that (1) host SQL for data management DBMS functions -- data retrieval -- and (2) perform analytics functions that do not require, or rely on SQL.
This should not be taken as a SQL recommendation for data management functions. It is neither truly and fully relational, nor a well designed language, nor a strictly data sub-language, as it haphazardly includes some application-specific functions. But unfortunately, SQL is the industry's standard database language that has no superior alternative. This is in itself due to its authors lacking a good grasp of data fundamentals and the RDM and contributes to the persistent failure to understand and appreciate the optimal division of labor and to provide tools that enforce it.
Therefore, to avoid the traps that industry practices and products impose, it behooves the analyst to acquire foundation knowledge.