Port to Atomese API #8

Open
linas opened this issue May 17, 2022 · 87 comments

Comments

@linas
Member

linas commented May 17, 2022

Change request, per discussions with @edyirdaw in pull req #6

Get rid of the RESTful API and port to Atomese. If you don't understand why, let me know, I can explain. If you don't understand how, I can explain that too.

There are three ways to port to Atomese.

  • One is to open a connection to the cogserver, and just send whatever scheme you want, and parse the replies. This allows you to do pretty much anything at all. If you open the connection in "quiet" mode, then parsing the results will be easier, as you won't get the extra markup to make it human-readable. This requires you to write a brand-new parser, in js, for Atomese. This would be all-new code, no one has done this before.

  • A second option is to do the above, but limit yourself to the StorageNode API. This is almost the same as above, except that it's faster, tailored for good network performance and ... prints no error messages, making it harder to debug. See https://wiki.opencog.org/w/StorageNode half-way down the page.

  • Either of the above would be written in js, and, because it's generic, should probably be a brand-new js project, so that others can use it (and not just the explorer). I can create a git repo for you, to hold this new code.

  • Write a brand-new cogserver plugin to return json. This would replace the RESTful thing. The returned json would need to be designed to fix all the mistakes in the RESTful API. This might be easier to do (??) than the first two options above. The brand-new code would be in C++. I don't really recommend this option, but ... it wouldn't be that bad. The API would be the same StorageNode API, except sending and receiving json instead of s-expressions.

So basically, if you want to write js, pick the first few options; if you want to write C++, pick the last option. I can help with any of these.
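
To give a feel for the "brand-new parser" that options 1 and 2 call for, here is a minimal sketch of an s-expression reader. It is written in Python purely for illustration (the real thing would be in js), the function name `parse_sexpr` is made up, and it assumes replies look like `(ListLink (ConceptNode "a") (ConceptNode "b"))`. It is not a complete Atomese parser.

```python
import re

# Tokens: parentheses, double-quoted strings (with backslash escapes),
# or bare symbols such as type names and numbers.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse_sexpr(text):
    """Parse one s-expression into nested Python lists.

    Quoted strings lose their quotes, so after parsing a name and a
    symbol are both plain str; a fuller parser would keep them apart.
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while True:
                if pos >= len(tokens):
                    raise ValueError("missing ')'")
                if tokens[pos] == ")":
                    pos += 1
                    return items
                items.append(read())
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok.startswith('"'):
            # Only the \" escape is undone here; this is a simplification.
            return tok[1:-1].replace('\\"', '"')
        return tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree
```

Calling `parse_sexpr('(ListLink (ConceptNode "a") (ConceptNode "b"))')` yields a nested list whose first element is the type name and whose remaining elements are the children, which is roughly the shape a visualizer would walk.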

@chris-altamimi

chris-altamimi commented May 25, 2022

Hello Linas!

We're very interested in helping with this. Our extensive experience in JavaScript / TypeScript, with both RESTful APIs and React-based UIs, should allow us to knock this out easily using option 1 or 2.

@Mforrest20 and I will likely get started on this later this week or early next week.

Edit:

What do you suppose is the better of the two options? I can recognize some of the benefits that one brings over the other such as option 2 having better maintainability and ease-of-development.

However, I would love to know what you would choose if you had the resources to accomplish any of these options?

@mjsduncan
Collaborator

i'm not linas but i'm eager to use a good graphical interface to the atomspace! one possibility to consider is porting the metagraph rendering code here to this visualizer, which has a good interface (using option 1 i think) but only visualizes the scheme s-expressions.

there's also a guile module to parse json for options 1 & 2

@linas
Member Author

linas commented May 27, 2022

Hi Chris!

That would be great! I'm not sure what I need to say to get things moving. Do you want to have a long conversation here in github comments? Or maybe chat ... there's an opencog server in discord, I sort-of-ish glance at it every now and then. The task is "easy" in that I know exactly what to do; it's hard only in that I would have to go over the details, so that you don't end up wasting time & breaking your head on dead-ends.

@linas
Member Author

linas commented May 27, 2022

Oh, to answer your question:

I would love to know what you would choose if you had the resources to accomplish any of these options?

I'm neutral. Here are some pros-n-cons:

  • Option 3 is to write a cogserver plugin (in C++) to send-receive json. This allows "anyone" interested in json to use it. I could even volunteer to write the code myself: a day or two to get the basics working, a day or two to polish it up, make it pretty, a day or two to write the docs. My only complaint is that this is about 5 days longer than I can easily spare...
  • Option 1 & 2 are "almost identical", require no new C++ code, no effort from me to make anything work. Mostly, I'd have to make sure that you clearly understand what the AtomSpace actually is, so that you know what to do with the returned data. Well, option 3 requires this as well: if you don't understand what the AtomSpace is, then it's ... tough going.

The key concepts are "what is an Atom" "what is a Value" "what is the incoming set" and "what is the outgoing set". Wrap your mind around those, I think you'll be OK ...
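
As a rough mental model of those four concepts, here is a toy sketch in Python. The class names `Node`, `Link` and `ToyAtomSpace` are invented for illustration and are not the AtomSpace API; the real system differs in many ways (for example, this toy uses plain strings as Value keys).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """An Atom identified by a type and a name."""
    type: str
    name: str


@dataclass(frozen=True)
class Link:
    """An Atom identified by a type and its outgoing set (ordered members)."""
    type: str
    outgoing: tuple


class ToyAtomSpace:
    """Toy container: tracks incoming sets and per-atom Values."""

    def __init__(self):
        self.incoming = {}  # atom -> list of Links that contain it
        self.values = {}    # (atom, key) -> value attached to that atom

    def add(self, atom):
        self.incoming.setdefault(atom, [])
        if isinstance(atom, Link):
            for member in atom.outgoing:
                self.add(member)
                # The incoming set is the inverse of the outgoing set.
                if atom not in self.incoming[member]:
                    self.incoming[member].append(atom)
        return atom

    def set_value(self, atom, key, value):
        """Attach a mutable annotation (e.g. a list of floats) to an atom."""
        self.add(atom)
        self.values[(atom, key)] = value
```

In this picture, a Link's outgoing set says "what do I point at", an Atom's incoming set says "which Links point at me", and Values are annotations hung on an Atom under a key.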

@Ontopic

Ontopic commented Jun 6, 2022

Would love to have a go at exposing interacting with Atomspace through different web-based interfaces. A basic C++ implementation (Docker would be extra nice) for a JSON send-receive would I feel open up the possibilities of exploring the OpenCog universe to quite a few interested developers to build cool stuff on, not just me.

@chris-altamimi

chris-altamimi commented Jun 7, 2022

I am a fan of the C++ implementation only because we can rely natively on the C++ api rather than having to develop a completely separate parser for the atomspace. I would expect this to be relatively more cumbersome and require extra work to maintain if the atomspace is extended or edited.

However, at the same time, I feel that the JS approach would open things up to UI focused developers since they tend to favor JS/TS ( React, Angular, Vue etc etc ), which I feel could very likely make a much more elegant and user friendly visualizer if they have a javascript API to work with.

On top of this, I think there would be great benefit to having an Atomese parser that lives outside the visualizer, for uses beyond visualization such as analytics.

@Ontopic are you saying you could likely pull something together in C++ fairly quickly? If this is the case then I don't want to step on any toes, please feel free to take over. As we've just now started to piece together a js approach.

Ultimately, I'd rather approach with the best solution for the community.

@chris-altamimi

chris-altamimi commented Jun 7, 2022

Also,

@linas We will join the discord server, however I think a conversation at this point might be premature and result in wasting your time, which I am keen to avoid, given your noted lack of time to spare.

I truly would like to hear everyone's opinions regarding the best approach.

While I feel I have the expertise to tackle this issue ( heavy experience in both C++ and JS ), I think the experience using OpenCog is invaluable, and thus I am all ears to those with more of it than I.

As soon as we land on a choice, then I feel we would be ready to have a discussion on your requirements, constraints and anything else you wish to share about the matter at hand.

@linas
Member Author

linas commented Jun 7, 2022

I could knock out a basic cogserver plugin (written in c++) in about a day or two, if you're serious about using it and ask me nicely. You would use it by opening a socket to the cogserver, sending any one of about a dozen rather simple JSON snippets, and then getting back a (usually large) answer, in JSON.
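
The send-a-snippet, get-JSON-back pattern described above might look roughly like the sketch below, in Python for illustration only. Everything specific in it is an assumption: the helper name `request`, the idea that one command goes out per line and one JSON reply comes back per line, and the example command string. The actual framing and commands need to be checked against the plugin's documentation.

```python
import json
import socket


def frame_command(command: str) -> bytes:
    """Encode one command as a single newline-terminated line (assumed framing)."""
    return (command.rstrip("\n") + "\n").encode("utf-8")


def request(sock: socket.socket, command: str):
    """Send one command and decode a single-line JSON reply.

    Assumes the server answers with exactly one line of JSON; a real
    client would need whatever delimiter the server actually uses.
    """
    sock.sendall(frame_command(command))
    reader = sock.makefile("r", encoding="utf-8")
    line = reader.readline()
    if not line:
        raise ConnectionError("server closed the connection")
    return json.loads(line)


# Hypothetical usage; host, port and command text are placeholders:
#   with socket.create_connection(("localhost", 17001)) as sock:
#       atoms = request(sock, 'AtomSpace.getAtoms("Node", true)')
```

Keeping the framing in one small function means that if the server turns out to delimit replies differently, only `request` needs to change.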

I'm offering to do this mostly because it's a cut-n-paste of an existing plugin, with changes for handling json instead of s-expressions. Of course, you could do this cut-n-paste job too. It's not hard.

I've failed to look at discord for a few weeks. Ooops! Looking now.

@Ontopic

Ontopic commented Jun 9, 2022

My C++ skills are sadly limited, especially in an unfamiliar codebase. I wholeheartedly love all OpenCog is doing and would love to use it in an interactive way. I'm sure there are a lot more people who would be able to grasp the concepts of Atomspace when given a proper in-browser visualizer / editor (repl) in a modern UI.

If this JSON interface would be here, I'd be building an editor and Atomspace playgrounds right now

@Ontopic

Ontopic commented Jun 9, 2022

So, kind sir linas, if you might find yourself in a position with some time to throw at this, and in the mood to do so, I hope you do! Thanks for all your efforts anyhow. Was following the issue in the REST repo, I feel it was a little unfair how you were treated, still, with this out of the way, next time you can say "you can implement the REST yourself, based on the JSON interface" ;)

@linas
Member Author

linas commented Jun 10, 2022

Huh. Well, looking around, it seems that a bare-bones json interface already exists. See the README here: https://github.com/opencog/atomspace/tree/master/opencog/persist/json It's maybe enough to get you started. I will poke around a bit today, and try to add some of the missing features.

More generally, this is all kind of useless, unless you have some specific dataset you want to work with. I can quickly and easily offer two: one is a gene+protein dataset, another is a language dataset. You can run them locally, or I can set up a remote server running these.

@linas
Member Author

linas commented Jun 11, 2022

Added more stuff w/ pull req opencog/atomspace#2952

Notes:

  • The API is pseudo-JSON -- if it looks weird or isn't pure-json-enough or is otherwise broken, open a bug report. (or send a pull req)
  • There might be bugs. I hand-tested case by case but was too lazy to write unit tests. Yes, skipping unit tests is bad coding practice. Today, I made an exception.

@linas
Member Author

linas commented Jun 11, 2022

I'll give write permission to https://github.com/opencog/atomspace-js to anyone who wants to create javascript wrappers for this thing. ... or anything else.

@Ontopic

Ontopic commented Jun 12, 2022

Thank you! Will see what I can do about those wrappers.

Apart from exposing the JSON to some JS apps (node, deno, vanilla, React Typescript, Vue et cetera), I've been meaning to do a WASM (AssemblyScript) / Neon (Rust for Node, https://neon-bindings.com/) implementation, that might (although for now inefficient with the JSON interface) perhaps be a good starting point later on to create "real bindings" to JS, that can truly integrate with the backend (at which point I am gonna need help again), but at least the "exposed" functionality will be clear. So hopefully it will be easier for someone with knowledge of the OpenCog codebase to integrate smoothly. Perhaps looking into running the Python bindings in PyScript (yes, Python in your browser is also very real) might also be interesting. Switching the storage to sqlite is probably one of the first things I'll look into once I get closer to the "core" ;)

@Ontopic

Ontopic commented Jun 12, 2022

I might have overlooked it during my quick scan, but did you get a chance to put any of the datasets up somewhere? To me the language dataset would be the most interesting.

@Ontopic

Ontopic commented Jun 12, 2022

While I'm at it anyway; what is currently the recommended method of hosting the server?
I saw some instructions on https://github.com/opencog/atomspace/blob/master/opencog/persist/json/README.md#network-api
But the amount of different (outdated) Docker repos and install instructions with a certain order is starting to get a little confusing. What would be the quickest way to get a backend with JSON api rolling?

I feel like such a spoiled webdeveloper right now 😅 Once the setup is clear to me I'll contribute a docker-compose to start things together with the JSON API

@Ontopic

Ontopic commented Jun 12, 2022

And about those unit tests; the "frontend" will test against the JSON API, so that should also be a good point where we could connect the "backend" and "frontend" in the future, making not writing tests yourself in this case an efficient exception. What working functionality I can extract from your backend, I'll put in tests; functionalities I wish I'd had would be broken tests that hopefully someone could assist me with at some point ;)

I consider this a good starting point for bringing Atomspace closer to the browser and an interactive experience, of course far from perfect for now. Really hope I can assist with my type of knowledge to get things there, just need some help getting things clear on the required backend for now. Thanks, truly, again for the effort that was put in already.

@Ontopic

Ontopic commented Jun 12, 2022

TLDR;

  1. Was there a dataset made available for testing?
    Language one would have my preference
  2. Is there a good "starting" setup for an OpenCog server right now, that would work correctly with the JSON-api and expose all of its available integrations?
    Preferably Docker-based, but I will bring things together in a docker-compose setup myself anyway, including storage [Postgres vs RocksDB?] and other services needed to quickly get a JSON-api exposed, so I might as well write clean Dockerfiles myself with the instructions from the (right) set of repos that are needed
  3. Thank you and the OpenCog contributors

@linas
Member Author

linas commented Jun 12, 2022

But the amount of different (outdated) Docker repos and install instructions with a certain order is starting to get a little confusing

I only know of just one docker repo, and yes, it is outdated and unmaintained.

Where are you finding install instructions (that seem confusing or outdated)?

The process shouldn't be that hard: install the atomspace, and then install the cogserver, and I think that's it. Then run the json instructions.

I'm trying to figure out where the datasets are now.

@linas
Member Author

linas commented Jun 12, 2022

There's a language dataset at r13-all-in-one.tar.bz2

@Ontopic

Ontopic commented Jun 12, 2022

Alright, the docker repo seems so outdated and geared toward specific (abandoned) projects that I'm not sure anymore what is really required, and what was once useful and what is now worth to perhaps add. Then the "direct install" assumes knowledge on things as e.g. guile setup. F.e. a question for me, would it be worthwhile to include Moses into a docker-compose when it comes to the JSON-api? I'm sure I'll figure these things out along the way, and your confirmation on "atomspace, then cogserver" is already helping me a lot on feeling I'm on the right track, yet, as said, it's far from a quick-start. So any other "pro-tips" as a rough guidance on where to look and what to keep in mind are very much appreciated. All I wanted to convey ;)

Hoping to contribute my efforts into making that quick-start for other newbies though, so why I'm somewhat allowing myself to ask stupid questions to hopefully get things right. I'll post the docker-compose with JSON-interface integration with basic frontend integrations here for early review once I have something. Is there a preferred platform / channel to discuss any issues / demos along the way?

@Ontopic

Ontopic commented Jun 12, 2022

Perhaps just archiving repos that are no longer in use and mentioning DEPRECATED on their wiki pages could already help others get started a lot quicker? Currently it is unclear to me which pages / repos are essential to the foundation of OpenCog and which are forgotten experiments. Perhaps a docker-core repo? To get a quick repl and some sample commands to load in the all-in-one dataset quickly. The try-at-home and move-to-production-later starter.

@linas
Member Author

linas commented Jun 12, 2022

There is a learning curve. This isn't a chatbot, where you open a window and chat to it.

Docker is a pain in the neck, LXC is a zillion times better.

MOSES is NOT needed.

For general use, you'll want to have the RocksDB backend atomspace-rocks

For language, you'll also need lg-atomese

For bio data, agi-bio

There's several steps to load the datasets (I have some automation scripts for this) and then there's even more to explain what the heck is in them. I can provide private access to a running server, with everything preloaded, if you email me.

@Ontopic

Ontopic commented Jun 12, 2022

I know it isn't a chatbot 😅 I know it's a superior system for modelling knowledge. Why I mentioned "I feel like such a webdeveloper right now". Implying; I expect an npm install and npm run all-the-things with 100's of examples of different implementations.

In the consumer market "knowledge-bases" are a hot topic though. I feel OpenCog is in the right position, with some extra tooling, to simplify entry into being able to contribute "knowledge" into a solid structure [with the contribution of other "AI's"], like OpenCog offers, while working on their personal knowledge base and a shared one.

@Ontopic

Ontopic commented Jun 12, 2022

Oh, and thanks again! I'll e-mail you for now, to see where you'd like me to post in the future.

@linas
Member Author

linas commented Jun 12, 2022

Perhaps just archiving repos that are no longer in use and mentioning DEPRECATED on their wiki pages could already help others get started a lot quicker?

This is currently the case. There are several dozen repos in "active" use, depending on the faction who is interested in and using them. Despite this, not all of these are maintained.

Currently it is unclear to me which pages / repos are essential to the foundation of OpenCog and which are forgotten experiments.

Only one is central: the atomspace. However, you need another half-dozen before you get something useful. Which half-dozen depends on the specific project. There are/have been projects in biology, robotics, perception, game-playing, 3D virtual worlds, language learning, chatbots, visualizers and more. I can't maintain all of them. I only maintain the stuff that I currently need and use.

Perhaps a docker-core repo?

Someone would need to volunteer to do that.

To get a quick repl and some sample commands to load in the all-in-one dataset quickly.

No. Go through the tutorials first. After that, we can have meaningful discussions about what's in those datasets. The data is complex and structured. For the language data, you have to be interested in linguistics, and grasp some of the basis of grammar, before any of it will make sense. That's a much bigger task, than trying to create a "visualizer"

The try-at-home and move-to-production-later starter.

These are science data dumps. There's no particular "try-before-you-buy" in large dataset dumps. I can explain what's in there, but ... (a) it's complex and (b) it makes most people yawn. Like all large datasets, it's all just ones and zeros and is unrewarding without putting in elbow grease.

@Ontopic

Ontopic commented Jun 12, 2022

I'm playing the devil's advocate here. I believe I understand that the complexity OpenCog brings is part of its intrinsic value. Yet there's a lot to be exposed to onboard a larger crowd, and in that regard I feel OpenCog may be falling short of what it has to offer, compared to inferior systems that focused on presentation more than on content. I feel it's worth a shot trying to balance things out!

@linas
Member Author

linas commented Jun 12, 2022

I expect an npm install

Well, there was a debian maintainer who had built all the packages, and so you could say apt-get and everything would install. But these haven't been maintained...

npm run all-the-things with 100's of examples

Have you ever played with a database? You download it, and install it, and ... then what? Where's the data? Well, databases don't come with any, and that makes them rather wonky: you have to know why you need one and what you plan to do with it. The AtomSpace is a kind-of database.

In the consumer market "knowledge-bases" are a hot topic though.

They're "hot" mostly because there are dozens of small and medium corporations with marketing departments trying to sell you services that can only be solved with whatever knowledge-base they're peddling. Cough up enough $$$ and it will solve your problems.

The core issue here is the AtomSpace doesn't have a startup, a corporation with a marketing department, doing outreach to cloud developers. So there's no "buzz" -- no one is talking about it over lunch, and saying things like "I use it at work too, my boss said that ..." There are demos, but there's no "visual bling" -- without a visualizer, without some snazzy app that runs on your cellphone, it's all very ... underwhelming.

And the stuff that is well-maintained and polished, it's got a steep learning curve that scares everyone off after a week or two.

@mjsduncan
Collaborator

mjsduncan commented Jun 15, 2022 via email

@edyirdaw
Collaborator

edyirdaw commented Jun 15, 2022

To make explicit the minimum requirements needed to make the initial launch of atomspace-as-a-service,

  1. Adding user account functionality (which has been duly noted by ontopic already). This helps to keep track of which atomspace belongs to which user.
  2. Have a UI component that enables users to populate/work on the atomspace. It would be something similar to what is found in https://github.com/opencog/cogprotolab, specifically the interactive window on the left-hand side.

Needless to say, these suggestions are open for discussion.

@mjsduncan
Collaborator

mjsduncan commented Jun 15, 2022 via email

@linas
Member Author

linas commented Jun 15, 2022

I just scrubbed all the docker containers, fixing all blebs and failures and misconfigs. Use git pull to get the latest (from https://github.com/opencog/docker). You can then build a fully working container holding language data in it. This should be enough to do bona-fide javascript development to interface to the atomspace. Build this container as follows:

cd opencog
./docker-build.sh -l -u

Above will take hours and will build a half-dozen containers. Then cd lang-model and read Dockerfile for further instructions.

Replies to various comments:

300gb ram

A running docker container holding empty AtomSpace seems to be taking about 90 megabytes, on my system. Atoms take about 1KByte to 2KByte RAM each. Think of an Atom as a graph vertex or edge, plus pointers to nearest neighbors, plus any names, tags and annotations on it (lists of floats, bag of key-value pairs etc.) Usually, there are a few huge atoms, but most are just tiny. Some are mid-sized. On average 1KB to 2KB each, so a million Atoms works out to 1-2 GBytes RAM. The tutorials never create more than a few dozen Atoms. You have to be doing some pretty serious data-mangling to get up to a million. (The docker file above has a little over 9 million Atoms in it, and is using a little more than 9GB RAM)
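
The back-of-the-envelope arithmetic above can be written out as a small helper. This is only a sketch of the estimate as stated (1 KB to 2 KB per Atom, decimal units); the function name is made up, and it ignores the roughly 90 MB baseline of an empty container.

```python
def estimate_ram_gb(atom_count, low_bytes_per_atom=1_000, high_bytes_per_atom=2_000):
    """Return a (low, high) RAM estimate in gigabytes for a given Atom count.

    Uses the rule of thumb of 1 KB to 2 KB per Atom; real usage depends
    on how many names, values and neighbors each Atom carries.
    """
    low = atom_count * low_bytes_per_atom / 1e9
    high = atom_count * high_bytes_per_atom / 1e9
    return low, high


# One million Atoms: about 1 to 2 GB.
print(estimate_ram_gb(1_000_000))
# Nine million Atoms: about 9 to 18 GB; the observed figure of a little
# over 9 GB sits at the low end of that range.
print(estimate_ram_gb(9_000_000))
```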

Exploring running parts in the browser (through WASM, AssemblyScript, PyScript et cetera)

I have no clue what this means. Can't see it would be relevant.

My original idea ... exposing a "repl" (guile) through the JSON-API and perhaps a TTL port open as well, to interface with over a websocket/rtc

Not needed/already done. You can run a guile REPL in most of the containers, as you wish. Or a python REPL. (Python might be bit-rotted.) The cogserver provides telnet ports: after connecting to the cogserver, multiple users can all safely plink on the same atomspace, at the same time, using any or all of scheme, python or the pseudo-json API. When you're connected to the cogserver, the stats and the top command tell you what those users are doing, and other geeky stuff.

The cogserver does not (currently) support websockets/rpc mostly because we've dinked with that in the past, and it didn't work well, and was a performance de-accelerator. A bottleneck. (There's a benchmark suite in one of the git repos). Maybe someone real smart and with a lot of time to performance tune can do better, but I'm not holding my breath.

The pseudo-JSON API that I added to the cogserver is never going to be as efficient as the sexpr API, mostly because it is a lot more verbose. JSON requires 2x or 4x more bytes on the wire than the sexpr API. So, it's very usable for visualization, but I would strongly discourage it for bulk data transfer.
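
A quick way to see the verbosity gap is to encode the same small graph both ways and compare byte counts. The sketch below is in Python for illustration; the JSON shape it uses (`type`, `name`, `outgoing` keys) is an assumption about what the pseudo-JSON looks like, and the helper names are made up. The exact ratio depends on the data and on JSON whitespace.

```python
import json


def to_sexpr(atom):
    """Render a dict-encoded atom as an s-expression string.

    Names containing quotes are not escaped; this is a simplification.
    """
    if "outgoing" in atom:
        inner = " ".join(to_sexpr(member) for member in atom["outgoing"])
        return f"({atom['type']} {inner})"
    return f'({atom["type"]} "{atom["name"]}")'


def wire_sizes(atom):
    """Return (sexpr_bytes, json_bytes) for one atom."""
    sexpr_bytes = len(to_sexpr(atom).encode("utf-8"))
    json_bytes = len(json.dumps(atom).encode("utf-8"))
    return sexpr_bytes, json_bytes


example = {
    "type": "ListLink",
    "outgoing": [
        {"type": "ConceptNode", "name": "a"},
        {"type": "ConceptNode", "name": "b"},
    ],
}

sexpr_bytes, json_bytes = wire_sizes(example)
print(f"sexpr: {sexpr_bytes} bytes, json: {json_bytes} bytes, "
      f"ratio: {json_bytes / sexpr_bytes:.1f}x")
```

The repeated key names are what cost the extra bytes: every atom pays for `"type"` and `"name"` or `"outgoing"` again, where the s-expression relies on position.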

The pseudo-JSON API is also cough "experimental". I kind of expect you guys to try it out, and then come back and point out all the broken and ugly bits that need fixing. I don't use JSON, so this is just a crude attempt to make it work.

@linas
Member Author

linas commented Jun 15, 2022

Some more replies:

edyirdaw on multiple users

Just launch a brand-new docker container for each user. They can then do whatever, however, without fear of blowing anything up. Just limit each container to some max allowed disk and RAM use, some max allowed CPU time. I assume that some popular cloud tools, maybe kubernetes, do this stuff automatically. So, in short: no modifications to opencog needed for this stuff.
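
As a sketch of the one-container-per-user idea, the helper below builds a `docker run` command line with RAM and CPU caps. The function name, the image name, the container naming scheme and the published port (17001, assumed here to be the cogserver port inside the container) are all placeholders. Disk limits are left out because how to set them depends on the storage driver, and `--cpus` caps the CPU share rather than total CPU time.

```python
def docker_run_args(user, image, memory="2g", cpus=1.0, host_port=17001):
    """Build an argv list that starts one resource-capped container for a user.

    The image name and the container port are assumptions; adjust them to
    whatever the built container actually exposes.
    """
    return [
        "docker", "run",
        "--detach",                       # run in the background
        "--name", f"atomspace-{user}",    # one container per user
        "--memory", memory,               # hard RAM cap
        "--cpus", str(cpus),              # CPU share cap
        "--publish", f"{host_port}:17001",
        image,
    ]


# Hypothetical usage: two users, each with their own host port.
for offset, user in enumerate(["alice", "bob"]):
    args = docker_run_args(user, "example/atomspace-image", host_port=18000 + offset)
    print(" ".join(args))
```

The list could be handed to `subprocess.run(args, check=True)` on a machine with docker installed; an orchestrator such as kubernetes would express the same limits declaratively instead.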

However, an old, old requirement for agi-bio was to allow multiple users to share one huge giant dataset, but then each one can do small, private deltas on the dataset, without corrupting anyone else. In the years since this requirement was first stated, I've gotten older and wiser, but so has the code base, so we can talk about this again, just not here, some other thread. Maybe some agi-bio or annotation-scheme bug report.

queries in parallel

Anything using the cogserver can run anything in parallel.

minimum requirements needed to make the initial launch of atomspace-as-a-service,
Adding user account functionality

I assume kubernetes or some other cloud whiz-bang thingy does this automatically. We should not need any changes to opencog to enable this, right?

We do need someone who knows cloud tech to provide detailed HOWTO instructions. I mean, I have no clue, but there are thousands of websites offering stuff-as-a-service, so presumably, it's "well known". Just need a HOWTO for it.

Have a UI component that enables users to populate/work on the atomspace

Well, yeah. That is what the code in this repo, right here is supposed to be doing.

https://github.com/opencog/cogprotolab

If you aim that thing at the docker container I give above, it should "just work". FYI @contrast-zone

@contrast-zone

contrast-zone commented Jun 15, 2022

Hi all, I'm cogprotolab maintainer.

I just skimmed through the lengthy conversation.

Here, I wanted to say that anyone can contact me to adjust cogprotolab to meet any requirements needed. I'm interested in more people using it.

[Edit]
I may also contribute to your projects with that left-pane of interest, if it is all you'd need. Currently the pane bases its functionality on PHP, but it may be ported to node.js if necessary. It's just about telnet interface to cogserver.

@chris-altamimi

chris-altamimi commented Jun 15, 2022

I assume kubernetes or some other cloud whiz-bang thingy does this automatically. We should not need any changes to opencog to enable this, right?

This is absolutely correct. I just started running the docker builds you updated @linas, and I can confirm when it is complete and I can run them, but as long as we have docker containers then tools such as Kube will allow us to spring up a service and all of its required components in seconds without having to do much work.

The only concern remaining there would be the UI and the Role based access control. Ideally we want users to be able to do things but not EVERYTHING. Otherwise we'd be opening ourselves to some serious vulnerabilities. But again this can totally be done with Kube's native features and tools.

We do need someone who knows cloud tech to provide detailed HOWTO instructions. I mean, I have no clue, but there are thousands of websites offering stuff-as-a-service, so presumably, its "well known". Just need a HOWTO for it.

I think I'm the someone for this. I run a cloud consultancy, so this is pretty much exactly what we do on a daily basis! However here I'm very much willing to donate my time. ( Not trying to sell anyone anything! )

Are we looking for a HOWTO regarding the whole paradigm as in running everything on the cloud from infrastructure to UI? Or are we looking for a HOWTO that focuses more so on turning opencog into Software-as-a-Service in terms of software arch? Or both? The answer here would allow me to put together a HOWTO much more quickly and ensure I cover all of the right topics / provide the right amount of detail.

Provided I have relatively clear answers, I can get us a doc put together by the end of the week.

@linas
Member Author

linas commented Jun 15, 2022

UI and the Role based access control.

I don't really understand. For beginners, mostly all that one needs is a sandbox with limited RAM, disk, cpu, but otherwise they can do anything in that sandbox. There are two modes of operation, here:

  • Users get a cogserver port, only, nothing more. If they do something to crash the cogserver, the network port is gone, they've lost their connection, and the docker container has a crashed network server. End of story.
  • Users get a command-line into the docker container. Yes, this is more dangerous, since maybe a good hacker can break out of the docker container. Right now, I don't see any particular need to grant this to anyone.

HOWTO regarding the whole paradigm as in running everything

Good question. Basically, if/when you disappear, the next person who is interested in this should be able to follow the instructions, and have something working. Maybe the instructions start with "get an AWS account and install xyz". Assume little prior experience...

turning opencog into Software-as-a-Service

It's early to discuss this, but we can start. Right now, we have three or four use-cases.

  • The agi-bio guys have some kind of web-browser-based system that allows microbiology scientists to log in and do something with some data. All of their UI and infrastructure is cobbled together, and not designed to be useful outside of the genetics world. Their plumbing is tied into the singularity.net service infrastructure. (and I don't know how any of that works)

  • I'm doing "data science" with a bunch of scripts and code, and I like my setup and don't plan to move. It's publicly available, but I have no datasets that are ready that can be published to the world. If I had such datasets, then things could get interesting.

  • The Rocca guys. (specifically @ngeiswei ) They've got a minecraft agent. I have no clue if they would need or want a cloud instance to do... whatever it is they do.

  • general singularity-net activities. They have an AI-as-a-service marketplace, with service contracts and payment processing done on block-chain. I know nothing more about it. Supposedly it works, with some suitable definition of "works".

  • There are random projects that bubble in and out of existence, with various users, but nothing sticks out.

What might the future hold? Basically it becomes interesting when large datasets show up, and people want access to those datasets. Except for bio, we're not there yet. We meaning me.

@Ontopic

Ontopic commented Jun 15, 2022

A lot to read! I'm slowly and carefully reading every message and continuing to build on my setup.

To not delay others; if there's anyone that feels like they could do something useful to assist all here in getting a setup for a public OpenCog server sooner (like a bare-metal setup); let me know. We can work out SSH access.

@edyirdaw
Collaborator

edyirdaw commented Jun 16, 2022

UI and the Role based access control.

I don't really understand. For beginners, mostly all that one needs is a sandbox with limited RAM, disk, cpu, but otherwise they can do anything in that sandbox. There are two modes of operation, here:

Users get a cogserver port, only, nothing more.

Linas, perhaps I see your point of view in not seeing the need for a UI, access control, etc. here. It seems you are focusing on making the cogserver available for experimentation. I was assuming that we make the atomspace explorer available for people to access an instance of an atomspace (newly created or previously stored). Then, via a UI (similar to the one found in cogprotolab), users can get access to the cogserver indirectly (or actually to the python restapi endpoint which the current version of the atomspace explorer relies on). I know you don't like the restapi interface much in its current version, but to get things going and launch this service online soon enough, I think it's better to use it now. It can always be improved or substituted later. User account functionality is also necessary to keep track of past atomspaces the user has worked with. I don't see any other way this can be done. It would be cool to let people work with multiple atomspaces attached to their accounts (but this seems to be for the future; we can start work with just one atomspace).

minimum requirements needed to make the launch of atomspace-as-a-service,
Adding user account functionality

I assume kubernetes or some other cloud whiz-bang thingy does this automatically. We should not need any changes to opencog to enable this, right?

I don't think there is an automatic solution that I know of; others can correct me. Even if there is, we still need to write glue code for the atomspace explorer UI. Yes, we don't need to change anything in opencog to make this work, except on the UI side of the atomspace explorer, to add user account functionality, plus the corresponding server-side code that handles account-based access; the server-side code launches and serves a python restapi atomspace service. An example of such a service might be https://github.com/opencog/atomspace-restful/blob/master/examples/restapi/start_restapi.py. If we are to use this, we can send scheme commands to this endpoint; I have tested it and it works. The scheme command would be sent from the user via the web UI. If part of the server code tracks which atomspace to use, then this can do the job of serving multiple users.

Just launch a brand-new docker container for each user. They can then do whatever, however, without fear of blowing anything up. Just limit each container to some max allowed disk and RAM use, some max allowed CPU time. I assume that some popular cloud tools, maybe kubernetes, do this stuff automatically. So, in short: no modifications to opencog needed for this stuff.

I still think launching a new docker container for each user is inefficient with respect to resource usage, especially if the service becomes popular. A web app with a UI frontend that incorporates the atomspace explorer would perhaps do the job with a single docker container. But I leave this implementation decision to others.

In any case, no modifications would need to be made to opencog, except the UI part of the atomspace explorer.

@linas
Member Author

linas commented Jun 16, 2022

Hi @edyirdaw -- You raise many points, and it will be difficult to respond to them. I'll do as best as I can.

Linas, perhaps I see your point of view in not seeing the need for a UI, access control, etc. here. It seems you are focusing on making the cogserver available for experimentation. I was assuming that we make the atomspace explorer available for people to access an instance of an atomspace (newly created or previously stored). Then, via a UI (similar to the one found in cogprotolab), users can get access to the cogserver indirectly

Yes, this is correct. Potentially, users can also be given access to the Docker bash prompt, but this is outside of the domain of opencog. This is a decision determined by non-opencog factors.

I know you don't like the restapi interface

It's not a matter of "not liking". It's broken. It doesn't work. No amount of blaming and shaming me, and trying to make me feel miserable or sad is going to cause me to attempt to fix this p.o.s.

User account functionality

If you or anyone else wants to build "user account functionality", that is fine, but it is outside of the core atomspace.

let people work with multiple atomspaces

Multiple atomspaces work great. All unit tests pass. If you or anyone doesn't like some aspect of that, I'll accept bug reports, complaints and suggestions.

If by "user accounts", you mean you want to change the read-only flag so it is 100% secure and hacker-proof, that would be extremely hard. It would require systems programming, and we have no systems programmers around. By "systems programming", I mean someone experienced with capability resources. So, for example, to mac the atomspace, you could shmat the read-only part of the atomspace, set the cgroup on that segment to make it read-only, thunk the pointers, and hand over the compsec of managing the cgroup to the authentication subsystem. There's no one involved with opencog that knows how to shmat or mac or cgroup anything. mac==mandatory access control. Adding mac to the atomspace would be a major technical challenge.

I don't think there is an automatic solution that I know of

Holy cow. There are hundreds of thousands of websites that allow you to create a login and a password. This is very highly automated stuff. There must be dozens of open-source systems that automate user-account management.

launching a new docker container for each user is inefficient wrt resource usage,

Not true. The cost of starting a docker container is maybe 0.1% of the cost of running a cogserver. It's almost nearly free. The whole point of docker (or other containers) is that it provides mac (mandatory access control). Containers were designed by systems programmers who know about compsec. It took them about ten years to get it right. Torvalds was personally involved for much of the mac design. (I once worked on lomac) Do not underestimate the sophistication of this code. It is arguably the most important advance in computing in the 21st century.
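For anyone wanting to try the container-per-user idea, here is a minimal sketch of how the resource caps could be expressed. It only builds the `docker run` command line, it does not execute it. The image name, the container naming scheme and the limit values are made up for illustration; the cogserver is assumed to listen on port 17001 inside the container. Disk quotas are left out, since how to cap disk depends on the storage driver in use.

```python
import shlex


def docker_run_command(user_id, image="opencog/lang-model", memory="2g",
                       cpus=1.0, pids_limit=256, host_port=17001):
    """Build (but do not run) a `docker run` command for one user's sandbox.

    The image name and all limit values are illustrative assumptions.
    """
    return [
        "docker", "run", "--detach", "--rm",
        "--name", f"cogserver-{user_id}",     # hypothetical naming scheme
        "--memory", memory,                   # hard cap on RAM
        "--cpus", str(cpus),                  # cap on CPU share
        "--pids-limit", str(pids_limit),      # cap on process count
        "--publish", f"{host_port}:17001",    # assumed cogserver port
        image,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it by hand.
    print(shlex.join(docker_run_command("alice", host_port=18001)))
```

An orchestrator such as kubernetes would express the same limits declaratively; the point here is only that the limits live outside of opencog.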

no modifications would need to be made to opencog

That's correct.

except the UI part of the atomspace explorer.

Yes.

@edyirdaw
Collaborator

If by "user accounts", you mean you want to change the read-only flag so it is 100% secure and hacker-proof, that would be extremely hard.

I didn't mean that, just web-UI-facing user accounts, which you have already clarified.

@Ontopic

Ontopic commented Jun 20, 2022

@edyirdaw (and linas of course, but I get a sense edyirdaw is more "frontend" oriented), are you up for seeing what we can achieve with the wise words of linas and a 300 GB RAM, 92-core server? Perhaps there's someplace you idle where we could have a chat when the stars align?

@Ontopic

Ontopic commented Jun 20, 2022

The pseudo-JSON API that I added to the cogserver is never going to be as efficient as the sexpr API, mostly because it is a lot more verbose. JSON requires 2x or 4x more bytes on the wire than the sexpr API. So, it's very usable for visualization, but I would strongly discourage it for bulk data transfer.

The pseudo-JSON API is also cough "experimental". I kind of expect you guys to try it out, and then come back and point out all the broken and ugly bits that need fixing. I don't use JSON, so this is just a crude attempt to make it work.

This is why I suggested looking into partially moving computations to the client side (if at all possible) through existing bindings and WASM (Python, Golang, Rust, SQLite with Postgres links; kids are going crazy these days). AtomSpace lends itself quite well to a notebook setup (Jupyter, MyST, ObservableHQ) for exploring things. Having a local (in-browser) "WASM-server" and an (open) server with Docker (whether with a container per user or a shared one is to be determined) could perhaps mitigate some of the latency? Or is this not worth spending any more time on, in your opinion, @linas?
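To get a feel for the verbosity gap mentioned in the quote above, here is a rough sketch that renders the same small atom both ways and compares byte counts. The JSON shape used here (`type`, `name`, `outgoing` keys) is a guess made up for the comparison and is not necessarily what the cogserver's pseudo-JSON API emits, so the ratio it prints is only indicative.

```python
import json


def atom_to_sexpr(atom):
    """Render a nested (type, payload) tuple as an s-expression string.

    No escaping of quotes inside names is attempted in this sketch.
    """
    atom_type, payload = atom
    if isinstance(payload, str):              # a Node carries a name
        return f'({atom_type} "{payload}")'
    inner = " ".join(atom_to_sexpr(a) for a in payload)  # a Link carries atoms
    return f"({atom_type} {inner})"


def atom_to_json(atom):
    """Render the same atom as a dict, using a hypothetical JSON layout."""
    atom_type, payload = atom
    if isinstance(payload, str):
        return {"type": atom_type, "name": payload}
    return {"type": atom_type, "outgoing": [atom_to_json(a) for a in payload]}


atom = ("ListLink", [("ConceptNode", "cat"), ("ConceptNode", "animal")])
sexpr_bytes = len(atom_to_sexpr(atom).encode("utf-8"))
json_bytes = len(json.dumps(atom_to_json(atom)).encode("utf-8"))
print(sexpr_bytes, json_bytes, round(json_bytes / sexpr_bytes, 2))
```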

@Ontopic

Ontopic commented Jun 20, 2022

I'm also thinking: would an "eval" run on a Docker container with AtomSpace be so bad? So basically a TCP connection proxied through a websocket, to write sexprs. With a secured container per user, of course.
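A sketch of the TCP half of that idea, i.e. what a websocket proxy might do on the server side for each message it receives from the browser. Everything specific here is an assumption: the shell-selection line (`sexpr`), the host and port, and especially the way the end of the reply is detected (the function simply reads until the server closes the connection, which a real client would likely need to replace with something smarter).

```python
import socket


def frame_request(shell_command, sexpr):
    """Encode a shell-selection line followed by one s-expression line."""
    return f"{shell_command}\n{sexpr}\n".encode("utf-8")


def send_sexpr(host, port, sexpr, shell_command="sexpr", timeout=5.0):
    """Send one s-expression over plain TCP and return the raw reply text.

    `shell_command` is an assumed way of picking a quiet, machine-readable
    shell on the server; adjust it to whatever the server actually expects.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(frame_request(shell_command, sexpr))
        conn.shutdown(socket.SHUT_WR)         # tell the server we are done
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:                      # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only show the bytes that would go on the wire; no server is contacted.
    print(frame_request("sexpr", '(Concept "foo")'))
```

With one container per user, the proxy would only ever connect to that user's own container, which is where the security boundary would sit.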

@Ontopic

Ontopic commented Jun 20, 2022

I don't see firing up a container per user as much of an issue during the exploration phase of this, given that linas said it should be possible to keep the RAM usage around 100 MB for a basic setup. For me, storage is more of a concern: offering a user a container as a playground is one thing; offering them the ability to store their data, and taking responsibility for that, is another. In the end, sqlite is by now quite a solid storage mechanism that can be used in basically all environments, and could allow users to store their "work" locally. That would make the whole thing a lot easier to "grasp": we won't even really need user accounts anymore if users store their own data in whichever way they like, just basic checks to prevent abuse of the server (which, let's be honest, is not going to be a major threat in the beginning). So as long as it's just running containers that are made to fail, a quick setup should be possible, right?

Any advice on how to start up / recycle containers for possible gains (with different datasets) would be appreciated! Having the user wait a while to get a container running is something that should definitely be addressed at some point, but for now, to win some souls for AtomSpace, it shouldn't be a deal-breaker.

@linas
Member Author

linas commented Jun 20, 2022

get a working setup

There's a fully working container with natural language data, here: https://github.com/opencog/docker/tree/master/opencog/lang-model It contains about 8GB of simple language data.

A small sample (under 1GB) of the bio data is here: https://github.com/opencog/benchmark/tree/master/query-loop

allow users to store their "work" locally.

This can already be done, using the "StorageNode" mechanism. The tutorials show how. More on this below.

moving computations to the client side

Depends on what your computations are. If they don't need much CPU and also fit into the RAM that's available on a cellphone or tablet, sure. Some computations are CPU intensive and require lots of RAM.

WASM

I get the feeling you're still tripping up on the general idea. Here's some ASCII art:

 +-----------+
 |           |  <---internet--> My AtomSpace
 |  Server A |                      ^  ^
 |           |        +-------------+  |
 +-----------+        v                v
                 +----------+   +-----------+
                 |          |   |           |
                 | Server B |   |  Server C |
                 |          |   |           |
                 +----------+   +-----------+

See the part that says My AtomSpace? That could be running on your cellphone or tablet. The box with Server A in it could be a docker container with data. Server B could be something your girlfriend's BFF runs. And Server C could be the flash drive on the cellphone. To save data locally, to the cell phone, you put AtomSpace Atoms into that (using the RocksStorageNode; see the tutorials). It survives power-off and power-on.

Everything above already works, today! The code is written, unit tested.

So if you want to do computations on the cell phone, well, yes, do that: My AtomSpace is running on the cell phone, it's a full complete AtomSpace, it's got all the features in it. Do whatever with it. And save locally to the flash drive. No network connection needed.

We do NOT have any WASM bindings to the AtomSpace. Someone would need to create that.

We do not have any instant automatic "install on cellphone" scripts. Several people have done this by hand, in the past. One person tried it recently, they got stuck on a bug, not sure what's going on.

@Ontopic

Ontopic commented Jun 20, 2022

I think I get more than you presume; I'm just exploring the edge cases of what's possible. Not presuming I understand anything is the best way to get a clear explanation like the one I'm getting right now ;). But before I truly claim that I understand, I will study everything a few more times, a bit more carefully. For now, I just had to say: your explanations, besides being very thorough and clear, are really cracking me up with, let's call it, dry humor. Once we get to the point of running things in a way where we can write some notebooks / interactive guides, I hope you will assist in writing!

@Ontopic

Ontopic commented Jun 20, 2022

I noticed the activity in https://github.com/opencog/atomspace-cog, is it better to move further discussions there or fine to keep it here?

@edyirdaw
Collaborator

You up for seeing what we can achieve with the

Yea, I can help in any way I can, as I find the time hopefully :)

This is why I suggested looking into partially moving computations to the client side (if at all possible)

I say let's first try it out and see if there is latency. Then we can figure out a quick/cheap way to solve it, if that's the issue in the first place.

We need to have clear requirements. Do we make users write sexpr and see them rendered in the atomspace explorer? Or, you just want them to play with the cogserver via json rest api?

I suggest all user data be stored on the server, for the simplicity of getting started quickly; I would like to stress that it's just my suggestion. For that, we need to identify users with some user-account functionality.

@Ontopic

Ontopic commented Jun 20, 2022

So, I understand the different ways to get Atomspace running, but how to supply that as a service to others and how everything truly interacts together behind the scenes is what I'm trying to get clearer. Hopefully bringing it to a setup that allows others to supply open-servers as well in a responsible manner.

@Ontopic

Ontopic commented Jun 20, 2022

Yea, I can help in any way I can, as I find the time hopefully :)

All I could hope for. Is there a place you idle or somewhere I could reach you, to reply when suits you?

@Ontopic

Ontopic commented Jun 20, 2022

Perhaps a solution like https://github.com/osohq/oso could work for user-roles? It should allow for migrating to different storage systems for whatever reason later on as well. I've a few other options, but again, not truly sure where to have discussions about things like this 😅

@edyirdaw
Collaborator

edyirdaw commented Jun 20, 2022

So, I understand the different ways to get Atomspace running, but how to supply that as a service to others and how everything truly interacts together behind the scenes is what I'm trying to get clearer.

Ontopic based on what you wrote above, I say, let our initial requirements be as follows

  1. We have user accounts functionality that identifies users and keeps track of the things they run on the server. For now, no storage locally on the user side. We could also let them run stuff using cookies or something similar to keep track of their session, as some web apps do, but things would be lost when the session expires, unless we refresh it intentionally. So, perhaps having user accounts functionality is best here.

  2. Give them a web UI to interact with the cogserver, perhaps.

  3. Visualize the results with the atomspace explorer and/or with the cogprotolab explorer. We need to figure out how to synchronize features 2 and 3. cogprotolab already has this feature in its left-side pane. We might need to do something similar for the atomspace explorer.

For (1) we obviously need some initial work. In general, to be able to launch this service, there might be a need for some code development and some experimentation. So, it would definitely be some work, IMO.

All I could hope for. Is there a place you idle or somewhere I could reach you, to reply when suits you?

Do you have any suggestions? I think you are referring to private chats?

@Ontopic

Ontopic commented Jun 20, 2022

Check on the three steps! I think I have more than enough to achieve most of them.

Do you have any suggestions? I think you are referring to private chats?

Yes. In the end having this discussion in the "open" is probably better, to have something to refer to when someone starts asking the same annoying noob-questions in the future ;) But for getting things running, sharing SSH access with you would probably help a lot, so for that I think we need some way to contact each other privately.

@linas
Member Author

linas commented Jun 20, 2022

Is there a place you idle

Here's a discord invite. It seems to be the place where people hang out these days. https://discord.gg/yhzuufvu

Do we make users write sexpr and see them rendered in the atomspace explorer? Or, you just want them to play with the cogserver via json rest api?

I take this to be a question about the browser GUI interface. Yes, there should be a way for users to get a panel holding a console prompt, where they can type in arbitrary sexp's. No, they should NOT be able to get a panel to type in arbitrary json api stuff. (for many reasons...)

However, there should also be menus and buttons that allow many basic things to be done: e.g. get the incoming set of the selected atom, and draw that. (since, normally, it wouldn't be drawn..)
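As a rough illustration of the menu idea above, a button could simply map to a canned scheme command that the GUI sends on the user's behalf. The sketch below only builds the command string; the menu label and the helper names are made up, and it assumes the scheme function `cog-incoming-set` is what the server side would evaluate.

```python
def node_sexpr(node_type, name):
    """Render a node as an s-expression, escaping backslashes and quotes."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'({node_type} "{escaped}")'


def incoming_set_command(node_type, name):
    """Scheme command asking for the incoming set of the selected node."""
    return f"(cog-incoming-set {node_sexpr(node_type, name)})"


# Hypothetical menu table: label shown in the GUI -> command builder.
MENU_ACTIONS = {"Show incoming set": incoming_set_command}

if __name__ == "__main__":
    print(MENU_ACTIONS["Show incoming set"]("Concept", "cat"))
```

The reply would then be parsed and handed to the drawing code, so the user never has to type the command themselves.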

@Ontopic

Ontopic commented Jun 20, 2022

Here's a discord invite. It seems to be the place where people hang out these days. https://discord.gg/yhzuufvu

Just noticed that when I was rereading from the top! Hope that @chris-altamimi and @edyirdaw can meet up there too, I'm on my way at least. Perhaps I'll request a separate channel, to avoid spoiling true Atomspace discussions though ;)

I dream of integrating it into a notebook like environment, such as https://curvenote.dev/, https://observablehq.com/@observablehq/introducing-visual-dataflow or https://myst-parser.readthedocs.io/en/latest/, but we'll see what works!

@Ontopic

Ontopic commented Jun 20, 2022

The notebooks would technically allow for writing your interactions with AtomSpace in whatever language, be it chained JSON calls or carefully constructed sexprs. The only true downside would be degraded performance and maybe not having access to all the functionality sexprs would bring, correct? In a weird way I feel that to on-board non-developers, sexprs are gonna work wonders, but to on-board web-developers, we better give them things like React and proxy objects that can be chained. With these latest developments in notebooks, we could slowly work towards offering it all, though, in the same notebook!

Separate cells to collect the atoms to work with, and to visualise them. Work with global contexts, or just the local cell. Again, that's also part of the reason why I wanted to put the idea up; python and sqlite run just fine in the browser [yes, I read linas's reply, and I understand the limitations, but for dreaming about a bright future; what could be done with that ;)]

@linas
Member Author

linas commented Jun 20, 2022

Notebooks would be neat. Currently, I use LyX (TeX) as my word-processor, gnuplot to make graphs, and assorted hand-crafted scripts to do data processing. It works for me.

Unrelated to this are two other things. First, some people have the burning desire to draw graphs of the contents of the AtomSpace. That's what the code in this repo does. Take a look at the repo README again, and the screenshot there. The JSON channel exists for one and only one reason: to get the data from the AtomSpace, to the code in this repo, to draw that picture. There is no other reason for the JSON channel.

Then there is a different thing that @edyirdaw and @mjsduncan are interested in, and that is an interface for biologists and geneticists to use. These are people who don't have any interest whatsoever in the AtomSpace (or json or sexprs or scheme ...). They're scientists, not coders, they want to do cool things with gene and protein data. So the goal is to give them the dashboard to do those cool things. The existing code for this can be found at https://github.com/MOZI-AI

Now, to make ends meet: whenever a biologist wants something new that the UI does not yet do, someone has to write the scripts & stuff to do that. For MOZI-AI, it's written mostly in scheme, and partly in GNU R (because R is popular in bio work). For my language stuff, I write mostly in scheme. The neural-net/deep learning people used to do all their stuff in python, before they wandered off to a different project.

So, in principle, the Web UI would need to be able to trigger scheme or python or R or bash scripts. Some of these can run instantly, in a tenth of a second. Some can take hours or days to run, and must run on a server somewhere.

The long-running jobs need a job-control dashboard, to start, stop and schedule jobs, monitor them, and see how they're going. I use byobu for that, but clearly that is not for everyone.
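To make the job-control idea a bit more concrete, here is a minimal sketch of what the backend of such a dashboard might wrap: start a script as a child process, poll it, and stop it. The class and method names are invented for this example; scheduling, log capture and surviving a restart of the web server are all left out.

```python
import subprocess
import sys


class JobRegistry:
    """Track long-running scripts by name (illustrative, in-memory only)."""

    def __init__(self):
        self._jobs = {}

    def start(self, name, argv):
        """Launch `argv` as a child process and remember it under `name`."""
        if name in self._jobs and self._jobs[name].poll() is None:
            raise ValueError(f"job {name!r} is already running")
        self._jobs[name] = subprocess.Popen(argv)
        return self._jobs[name].pid

    def status(self, name):
        """Return 'running' or 'exited(<code>)' for a known job."""
        code = self._jobs[name].poll()
        return "running" if code is None else f"exited({code})"

    def stop(self, name, grace_seconds=5.0):
        """Ask the job to terminate, and kill it if it does not comply."""
        proc = self._jobs[name]
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        return self.status(name)


if __name__ == "__main__":
    jobs = JobRegistry()
    # Stand-in for a scheme, python, R or bash script that runs for a while.
    jobs.start("demo", [sys.executable, "-c", "import time; time.sleep(60)"])
    print(jobs.status("demo"))
    print(jobs.stop("demo"))
```

A Web UI would call `start`, `status` and `stop` from its request handlers; jobs that run for days would more likely be handed to a proper scheduler than kept as children of the web process.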

At any rate, I don't expect anyone to ever type in json by hand into some WebUI. That would be a fail, if that happened. Same for sexprs.

For complete beginners wanting to learn AtomSpace stuff, it would be nice to have the WebUI expose the scheme CogServer prompt; that would allow them to type in the example code and run them, and see what happens in the visualizer.

So that's like 4 or 5 different WebUI tasks. We should concentrate on one at a time. Perhaps getting the code in this repo running again would not be a bad idea. It's a start.

@Ontopic

Ontopic commented Aug 5, 2022

Sorry for the lack of updates, I've been looking into a lot of different ways to achieve an integration of AtomSpace and modern web technologies. All a lot of material, so hope you can forgive me for putting the actual integration with OpenCog's software on hold for a bit.

Your last post is a very nice overview nonetheless, and MOZI-AI brings me quite a bit of "code-inspiration".

Since you're very good at making clear what's nonsensical to try and what might work with a little bit of optimism, I hope to still pick your brain. Also, any post you feel like writing on this matter is very much appreciated.

I think sqlite would be a very nice way to offer a flexible storage solution for AtomSpace, one that lets users store their data on a server, in a repository, or locally, but I'm not sure if it is feasible. I don't think giving datasette a scan would be a waste of your time. See the posts on running Python in the browser as well, for maybe getting some inspiration.

I've also been trying to find ways to onboard people more quickly, by giving them a fast way to build up a starting point for their own AtomSpace. Through scrapers and different tools, e.g. GraphBrain, an initial graph could be produced by simply supplying a few documents. Converting that into AtomSpace links is of course far from trivial, but I'm thinking it's possible.

Now, this is so far off-topic right now that you should feel free to disregard all of it, but whenever you're in the mood, know that I appreciate all your ideas, feedback and criticism!
