Update several documentation files

frbattid committed Mar 1, 2017
1 parent 2e38bae commit c7545c1

Showing 5 changed files with 55 additions and 31 deletions.

39 changes: 33 additions & 6 deletions README.md

#<a name="top"></a>Cosmos
[![License Badge](https://img.shields.io/badge/license-AGPL-blue.svg)](https://opensource.org/licenses/AGPL-3.0)
[![Documentation Status](https://readthedocs.org/projects/fiware-cosmos/badge/?version=latest)](http://fiware-cosmos.readthedocs.org/en/latest/?badge=latest)

This project is part of [FIWARE](http://fiware.org).

[Cosmos](http://catalogue.fiware.org/enablers/bigdata-analysis-cosmos) is the code name for the Reference Implementation of the BigData Generic Enabler of FIWARE, a set of tools and developments that help in the task of enabling a Hadoop as a Service (HaaS) deployment:

* A set of administration tools such as HDFS data copiers and much more, under the [cosmos-admin](./cosmos-admin) folder.
* An OAuth2 token generator, under the [cosmos-auth](./cosmos-auth) folder.
* A web portal for user and account management, for running MapReduce jobs and for doing big data I/O, under the [cosmos-gui](./cosmos-gui) folder.
* A custom authentication provider for Hive, under [cosmos-hive-auth-provider](./cosmos-hive-auth-provider).
* A REST API for running MapReduce jobs in a shared Hadoop cluster, under [cosmos-tidoop-api](./cosmos-tidoop-api).
* A specific OAuth2-based proxy for HTTP/REST operations, under [cosmos-proxy](./cosmos-proxy).

[Top](#top)

##If you want to use the Cosmos Global Instance in FIWARE Lab
If you are looking for information regarding the specific deployment of the Cosmos Global Instance in FIWARE Lab, a ready-to-use HaaS, please check this documentation:

* [Quick Start Guide](./doc/manuals/quick_start_guide_new.md) for Cosmos users.
* Details on using [OAuth2 tokens](./doc/manuals/user_and_programer_manual/using_oauth2.md) as the authentication and authorization mechanism.
* Details on using the [WebHDFS](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html) REST API for data I/O (you can also check [this](./doc/manuals/user_and_programer_manual/data_management_and_io.md) link, and the sketch right after this list).
* Details on using the [Tidoop](./doc/manuals/user_and_programer_manual/using_tidoop.md) REST API for MapReduce job submission.
* Details on developing [MapReduce jobs and Hive clients](./doc/manuals/user_and_programer_manual/using_hadoop_and_ecosystem.md) (already developed Hive clients can also be found [here](./resources/hiveclients/)).
* In general, you may be interested in the [User and Programming Guide](./doc/manuals/user_and_programer_manual), also available at [readthedocs](http://fiware-cosmos.readthedocs.io/en/latest/).
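
As an illustration of the WebHDFS item above, data I/O against the Global Instance boils down to standard WebHDFS calls carrying an OAuth2 token. The following is only a sketch, assuming the Global Instance endpoint and the `X-Auth-Token` header handled by cosmos-proxy; the user, file and token values are placeholders, and the exact details are in the guides above:

    # list your HDFS user space
    $ curl -X GET "http://cosmos.lab.fiware.org:14000/webhdfs/v1/user/myuser?op=LISTSTATUS&user.name=myuser" -H "X-Auth-Token: <my_oauth2_token>"
    # download a file from it
    $ curl -L -X GET "http://cosmos.lab.fiware.org:14000/webhdfs/v1/user/myuser/mydata.txt?op=OPEN&user.name=myuser" -H "X-Auth-Token: <my_oauth2_token>"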

[Top](#top)

##If you want to deploy and use your own private Hadoop instance
This is the case when you do not rely on the Global Instance of Cosmos in FIWARE Lab. Here, you will have to install, configure and manage your own private Hadoop instance. The Internet is full of documentation that will help you.

##<a name="contact"></a>Reporting issues and contact information
[Top](#top)

##If you want to deploy your own public Cosmos instance
In the (extremely rare) case you are not interested in using the Global Instance of Cosmos nor a private instance of Hadoop, but want to become a Big Data service provider based on the Cosmos software, you may be interested in the following links:

* [Deployment details](doc/deployment_examples/cosmos/fiware_lab.md) for administrators trying to replicate the Cosmos Global Instance in FIWARE Lab.
* In general, you may be interested in the [Installation and Administration Guide](./doc/manuals/installation_and_administration_manual), also available at [readthedocs](http://fiware-cosmos.readthedocs.io/en/latest/).

[Top](#top)

##Reporting issues and contact information
There are several channels suited for reporting issues and asking questions in general. The choice depends on the nature of the question:

* Use [stackoverflow.com](http://stackoverflow.com) for specific questions about this software. Typically, these will be related to installation problems, errors and bugs. Development questions when forking the code are welcome as well. Use the `fiware-cosmos` tag.
* [[email protected]]([email protected]) **[Contributor]**

**NOTE**: Please try to avoid personally emailing the contributors unless they ask for it. In fact, if you send a private email you will probably receive an automatic response urging you to use [stackoverflow.com](http://stackoverflow.com) or [ask.fiware.org](https://ask.fiware.org/questions/). This is because using the mentioned channels creates a public knowledge database that can be useful for future users; a private email is just private and cannot be shared.

[Top](#top)
4 changes: 2 additions & 2 deletions cosmos-tidoop-api/README.md

#Tidoop REST API
cosmos-tidoop-api exposes a RESTful API for running MapReduce jobs in a shared Hadoop environment.

Please observe we emphasize <i>a shared Hadoop environment</i>. This is because shared Hadoops require special management of the data and of the analysis processes being run (storage and computation). There are tools such as [Oozie](https://oozie.apache.org/) that also run MapReduce jobs through an API, but they do not take into account that access to the submitted jobs, their status, their results, etc. must be controlled. In other words, with Oozie any user may kill a job just by knowing its ID; with cosmos-tidoop-api only the owner of the job is able to.

The key point is to relate all the MapReduce operations (run, kill, retrieve status, etc.) to the user space in HDFS. This way, simple but effective authorization policies can be established per user space (in the most basic approach, allowing a user to access only their own user space). This can be easily combined with authentication mechanisms such as [OAuth2](http://oauth.net/2/).

Finally, it is important to remark that cosmos-tidoop-api is designed to run in a computing cluster in charge of analyzing the data within a storage cluster. Of course, sometimes the storage and the computing cluster may be the same; even in that case the software is ready.
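
As an example of the per-user approach, running a job in your own user space could look like the sketch below. The host, port, resource path and JSON fields are illustrative placeholders, not the definitive API; check the documentation linked below for the real contract:

    # submit a MapReduce job to your own user space
    $ curl -X POST "http://tidoop.example.com:12000/tidoop/v1/user/myuser/jobs" \
        -H "Content-Type: application/json" -H "X-Auth-Token: <my_oauth2_token>" \
        -d '{"jar": "myjob.jar", "class_name": "MyJob", "input": "mydata", "output": "myoutput"}'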

Further information can be found in the documentation at [fiware-cosmos.readthedocs.io](http://fiware-cosmos.readthedocs.io/en/latest/).

Content:

* [Administration](#section4)

##<a name="section1"></a>Installation

###<a name="section1.1"></a>Prerequisites
This PEP proxy makes no sense if an Identity Manager (the Keyrock implementation can be found [here](http://catalogue.fiware.org/enablers/identity-management-keyrock)) is not installed. The same applies to [Cosmos](http://catalogue.fiware.org/enablers/bigdata-analysis-cosmos).

cosmos-proxy is a Node.js application; therefore, install Node.js from the official [download](https://nodejs.org/download/) page. An advanced alternative is to install [Node Version Manager](https://github.com/creationix/nvm) (nvm) by creationix/Tim Caswell, which allows you to have several versions of Node.js and to switch among them.
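
For illustration, once nvm is installed, installing and selecting a Node.js version goes as follows (the version number is just an example):

    $ nvm install 0.10
    $ nvm use 0.10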

Of course, common tools such as `git` and `curl` may be needed.

[Top](#top)

###<a name="section1.2"></a>Installation
This software is written in JavaScript, specifically suited for [Node.js](https://nodejs.org) (<i>JavaScript on the server side</i>). JavaScript is an interpreted programming language, thus it is not necessary to compile it nor to build any package; having the source code downloaded somewhere in your machine is enough.

Start by creating, if not yet created, a Unix user named `cosmos-proxy`; it is needed for installing and running the application. You can only do this as root, or as another sudoer user:

$ sudo useradd cosmos-proxy
cosmos-proxy is configured through a JSON file. These are the available parameters:
* **idm**:
    * **host**: FQDN or IP address where the Identity Manager runs. Do not write it in URL form!
    * **port**: port where the Identity Manager listens for requests. Typically 443.
* **public\_paths\_list**: paths that can be reached by all users.
* **superuser**: superuser authorized to access all the HDFS paths.
* **log**:
    * **file_name**: path of the file where the log traces will be saved on a daily rotation basis. This file must be within the logging folder owned by the user `cosmos-proxy`.
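
Putting the parameters above together, a minimal configuration sketch could be written as follows. Every value, as well as the file location, is illustrative, and the real file accepts further parameters not shown in this excerpt:

    $ cat > conf/cosmos-proxy.json <<EOF
    {
        "idm": {
            "host": "account.lab.fiware.org",
            "port": 443
        },
        "public_paths_list": ["/tidoop/v1/version"],
        "superuser": "hdfs",
        "log": {
            "file_name": "/var/log/cosmos/cosmos-proxy/cosmos-proxy.log"
        }
    }
    EOF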

20 changes: 9 additions & 11 deletions doc/manuals/installation_and_administration_manual/tidoop.md

#<a name="top"></a>Tidoop REST API
Content:

* [Introduction](#section1)
* [Submitted jobs](#section5.2)

##<a name="section1"></a>Introduction
cosmos-tidoop-api exposes a RESTful API for running MapReduce jobs in a shared Hadoop environment.

Please observe we emphasize <i>a shared Hadoop environment</i>. This is because shared Hadoops require special management of the data and of the analysis processes being run (storage and computation). There are tools such as [Oozie](https://oozie.apache.org/) that also run MapReduce jobs through an API, but they do not take into account that access to the submitted jobs, their status, their results, etc. must be controlled. In other words, with Oozie any user may kill a job just by knowing its ID; with cosmos-tidoop-api only the owner of the job is able to.

The key point is to relate all the MapReduce operations (run, kill, retrieve status, etc.) to the user space in HDFS. This way, simple but effective authorization policies can be established per user space (in the most basic approach, allowing a user to access only their own user space). This can be easily combined with authentication mechanisms such as [OAuth2](http://oauth.net/2/).

Finally, it is important to remark that cosmos-tidoop-api is designed to run in a computing cluster in charge of analyzing the data within a storage cluster. Of course, sometimes the storage and the computing cluster may be the same; even in that case the software is ready.
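
Continuing with the per-user idea, only the owner of a job can query or kill it. Again, the host, port, paths and job identifier below are illustrative placeholders rather than the definitive API:

    # retrieve the status of one of your jobs
    $ curl -X GET "http://tidoop.example.com:12000/tidoop/v1/user/myuser/jobs/job_1417158917515_0002" -H "X-Auth-Token: <my_oauth2_token>"
    # kill it
    $ curl -X DELETE "http://tidoop.example.com:12000/tidoop/v1/user/myuser/jobs/job_1417158917515_0002" -H "X-Auth-Token: <my_oauth2_token>"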

[Top](#top)

##<a name="section2"></a>Installation

###<a name="section2.1"></a>Prerequisites
This REST API only makes sense together with a [Hadoop](http://hadoop.apache.org/) cluster administered by the reader.

cosmos-tidoop-api is a Node.js application; therefore, install Node.js from the official [download](https://nodejs.org/download/) page. An advanced alternative is to install [Node Version Manager](https://github.com/creationix/nvm) (nvm) by creationix/Tim Caswell, which allows you to have several versions of Node.js and to switch among them.

Of course, common tools such as `git` and `curl` are needed.

[Top](#top)

###<a name="section2.2"></a>API installation
This software is written in JavaScript, specifically suited for [Node.js](https://nodejs.org) (<i>JavaScript on the server side</i>). JavaScript is an interpreted programming language, thus it is not necessary to compile it nor to build any package; having the source code downloaded somewhere in your machine is enough.

Start by creating, if not yet created, a Unix user named `cosmos-tidoop`; it is needed for installing and running the application. You can only do this as root, or as another sudoer user:

$ sudo useradd cosmos-tidoop
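
The remaining steps are not shown in this excerpt; a typical continuation, assuming the code is taken from this very repository and that the dependencies are listed in its `package.json`, would be:

    $ git clone https://github.com/telefonicaid/fiware-cosmos.git
    $ cd fiware-cosmos/cosmos-tidoop-api
    $ npm install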

15 changes: 7 additions & 8 deletions resources/hiveclients/python/hiveserver2-client.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Copyright 2015 Telefonica Investigación y Desarrollo, S.A.U
#
# This file is part of fiware-cygnus (FI-WARE project).
#
# fiware-cygnus is free software: you can redistribute it and/or modify it under the terms of the GNU Affero
# General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your
# option) any later version.
# fiware-cygnus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the
# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License
# for more details.
#
# You should have received a copy of the GNU Affero General Public License along with fiware-cygnus. If not, see
# http://www.gnu.org/licenses/.
#
# For those usages not covered by the GNU Affero General Public License please contact with iot_support at tid dot es
hadoopUser = sys.argv[4]
hadoopPassword = sys.argv[5]

# do the connection
with pyhs2.connect(host=hiveHost,
                   port=hivePort,
                   authMechanism="PLAIN",
                   user=hadoopUser,
                   password=hadoopPassword,
                   database=dbName) as conn:
    # get a client
    with conn.cursor() as client:
        # create a loop attending HiveQL queries
        while (1):

            except Pyhs2Exception, ex:
                print ex.errorMessage
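
The script is invoked with the Hive endpoint, the database and the Hadoop credentials as arguments. A usage sketch, where the order of the first three arguments is inferred from the variable names above and the [pyhs2](https://pypi.python.org/pypi/pyhs2) module is assumed to be installed:

    $ pip install pyhs2
    $ python hiveserver2-client.py <hive_host> <hive_port> <db_name> <hadoop_user> <hadoop_password>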
