Re: notes on basic software

Coincidentally, I've been knocking Friedman this morning on T/S and HuffPo again; his latest column on China is inane even in and of itself and especially so in the context of his prior, silly predictions regarding that country. Which makes me think that we might want to implement some sort of advanced means of compiling dirt on pundits; Stein has also been thinking of working on some search/notation tools, so perhaps that might be something to think about as well.

On Sat, Jan 16, 2010 at 2:20 PM, Charles Johnson <charles@littlegreenfootballs.com> wrote:

Very interesting stuff.

By the way, I'm going to be making a long drive to Phoenix AZ to visit my mom today, and probably come back either late tomorrow or Monday, so if I'm a bit out of touch for a few days that's why.

CJ

On Jan 16, 2010, at 10:03 AM, Barrett Brown wrote:

Here are some general notes that Stein has composed thus far regarding what could serve as the core software behind our media project. He's going to provide some additional commentary on this a bit later today, I believe. We may start collaborating via Google Wave just for kicks. Incidentally, even if we're unable to get funding, this is something that Stein could do on the side fairly easily. Let me know if you have any thoughts thus far. I've cc'd Stein on this in case you have any questions for him in the meantime.

---------- Forwarded message ----------
From: Andrew Stein <steinlink@gmail.com>
Date: Sat, Jan 16, 2010 at 12:57 PM
Subject: Re: info fgt
To: Barrett Brown <barriticus@gmail.com>

So, these are the notes I took for the design document - you can skip the arch section; the abstract needs a lot more detail in some areas and the unimportant stuff cut out. At the end is the 'workload' estimation.

Diagrams won't make much sense unless you copy this into a monospaced font ....

                                Ego

         Distributed Social Networking for John Q. Developer

ABSTRACT
______________________________________________________________________

Ego is a Content Management System built on an XMPP-based adhoc social
network. The product itself consists of an XMPP server written in
Clojure, and a selection of HTML/JQuery widgets that implement simple,
embeddable, stylable functionality without templating - contact lists,
messaging, profile browsing, etc.

Adhoc

Ego is a social network in the truest sense - there is no hub, no root
server and no central authority. Like email (and XMPP, on which the
federation protocol is implemented), users on the Ego network are
identified by a username and domain, not simply username - Andrew is
now Andrew@ego.fm. By abstracting identity to the domain level,
identity itself becomes domain independent, and common Facebook-style
social networking functionality becomes "adhoc" - Andrew@ego.fm and
Josh@analgoatsex.com can freely communicate as if they were on the
same site, despite Josh's apparent obsession with deviant sexual
fetishism.
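
A rough sketch of what username-plus-domain addressing might look like
in Clojure. The names parse-address and local? are made up for
illustration and are not part of Ego, and real XMPP address handling
has more rules than this:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: split "user@domain" into its two parts.
;; Returns nil for anything that is not exactly one user and one domain.
(defn parse-address
  [address]
  (let [[user domain & extra] (str/split address #"@")]
    (when (and (seq user) (seq domain) (empty? extra))
      ;; Simplification: only the domain is case-folded here.
      {:user user :domain (str/lower-case domain)})))

;; Hypothetical helper: is this addressee hosted on the given node?
(defn local?
  [node-domain address]
  (= (str/lower-case node-domain)
     (:domain (parse-address address))))
```

With this, (local? "ego.fm" "Andrew@ego.fm") is true, while an address
on any other domain would be handed off to federation.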

Whitelabel

Ego comes unstyled and without any preconceived notions of the
developer's content schema - it simply attaches common social
networking functionality onto an existing application or design,
through the adhoc network. Brands, groups and content providers can
build their own site on the Ego platform and leverage this
functionality across the adhoc network.

Simple

Ego is designed to be as flexible as possible with regard to design -
the backend is totally configurable, the front end is untemplated, and
core extensions can be directly integrated in Java, Scala, Clojure,
JRuby, Jython or Groovy - plus, there is nothing preventing developers
from simply building around core's functionality in any other language
...

INSTALLATION
______________________________________________________________________

Ego is built on Clojure, JDK 6+ and your choice of backend (Redis and
PostgreSQL currently come built in). Step one should be to configure
your application. Ego's configuration is written in pure Clojure
(log4j aside ... TODO), and is contained in the config.clj and
sql/{backend-of-choice}.clj files. By default, Ego will try to use a
PostgreSQL instance with username "postgresql" and password
"password", and will serve a self-signed SSL certificate passworded
with "password" - you probably don't want these things configured as
such, unless you are just hacking around.

Once you have everything configured as you want it, install Leiningen
(http://www.github.com/technomancy/leiningen) and get started:

git clone http://www.github.com/texodus/ego
cd ego
lein deps
lein compile
lein-repl

... should bring up a REPL. From there, you'll find the following
commands useful:

;; Start the server
(org.ego/-main)

;; Initialize a blank schema
(org.ego.core.accounts/setup)

;; Create a new user
(org.ego.core.accounts/create-user "username" "password")

Configuration settings can be found in config.clj, and database
settings can be found in sql/{backend type}.clj

ARCHITECTURE
______________________________________________________________________

From a simplistic standpoint, the Ego platform can be viewed as an
XMPP server with a few custom XEPs, and a custom, modular web client
that is aware of these XEPs. It should thus come as no surprise that
the Ego Network infrastructure is identical to the XMPP federation
protocol architecture (because that's what it is):

User <--JSON--> Node <------XMPP------> Node <--JSON--> User
                 ^                        ^
                 |                        |
                 ---XMPP--> Node <--XMPP---
                             ^
                             |
                             --JSON--> User

On the User side, the user requests the widget HTML and runs the
embedded JQuery - from there, the widget maintains an HTTP connection
with the server and passes data back and forth in JSON format. The
current implementation is via HTTP long polling - this should probably
be refactored to BOSH, though this will substantially increase the
complexity of the widgets (as they will have to implement a
substantial subset of the XMPP client protocol, as well as share a
good deal of message routing logic where they are currently
independent).

On the server side, each node opens and maintains a (timeout and
max-connections restricted) XMPP channel with each additional node as
it queues messages for those nodes, OR it accepts incoming connections
from other nodes and replies to requests about its local content and
user "state."

Each node consists of these two transports (JSON and XMPP), a core
module responsible for routing and user "state," and a datastore:

**********************
* Node    --------------> DB
*         |          *
*         v          *
*   ---> Core <---   *
*   |            |   *
*   v            v   *
*  JSON        XMPP  *
*** ^ ********** ^ ***
    |            |
    -> Internet <-

Core is responsible for maintaining user and (internal) content state.
The JSON and XMPP modules do not make routing decisions on their own,
nor do they request content directly from the datastore - instead,
these actions are encoded as messages to Core, which may then request
data from the datastore, request new client channels from XMPP or
simply reinject the content into the proper channel.

The datastore itself needs to be as flexible as possible to allow
arbitrary development against potentially pre-existing (and
potentially nontrivially complex) application stacks. JDBC allows
us to write CRUD functionality using a simple query flatfile.

       *****************
       * Core          *
   -----------   -----------
   |   *     |   |     *   |
  JSON *     v   v     * XMPP
   ^   *     Queue     *   ^
   |   *       ^       *   |
   |   *       |       *   |
   |   *       v       *   |
   -------- Routing --------
       *       ^       *
       ********|********
               |
               --> JDBC <----> DB
                    ^
                    |
                    ---- (load-file "some-sql.clj")
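
To illustrate the query flatfile idea: a file like some-sql.clj could
simply evaluate to a map of named SQL strings, which is what load-file
would then return. Everything below (the keys, the table and column
names, and the query-for helper) is an assumption for the sake of
example, not Ego's actual schema:

```clojure
;; Hypothetical contents of a per-backend query file. If this map were
;; the last form in the file, (load-file "some-sql.clj") would return it.
(def example-queries
  {:create-user "INSERT INTO users (username, password_hash) VALUES (?, ?)"
   :find-user   "SELECT username FROM users WHERE username = ?"
   :delete-user "DELETE FROM users WHERE username = ?"})

;; Hypothetical lookup that fails loudly on a misspelled query name,
;; rather than passing nil on to JDBC.
(defn query-for
  [queries op]
  (or (get queries op)
      (throw (IllegalArgumentException. (str "No query named " op)))))
```

Swapping backends would then mean loading a different map with the same
keys, leaving the calling code untouched.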

The XMPP module actually consists of two separate stacks: a server
stack and a client stack, both of which end in the Jabber namespace,
where the actual XMPP processing logic is implemented. This
architecture mirrors Netty's Pipeline hierarchy, and is nonblocking.

Each upstream request from either a client or server Channel is pushed
up the Netty Pipeline stack, and results in a Message being queued in
Core for routing. Core then emits Messages to either the client or
server write function wrappers, which results in the Messages being
pushed downstream on the appropriate channel.

    ------------    -> Core <-    ------
    |          |    |        |    |    |
    | *********v****|********|****v*** |
    | * Jabber X    X        X    X  * |
    | *********|****^********^****|*** |
    |          |    |        |    |    |
    | *********v****|********|****v*** |     X = Processing Logic
    | * Stanza X    X        X    X  * |
    | *********|****^********^****|*** |
    |          |    |        |    |    |
    | *********v****|********|****v*** |
    | * XML    X    X        X    X  * |
    | *********|****^********^****|*** |
    |          |    |        |    |    |
  **|**********v****|**    **|****v****|*********
  * X   Client X    X *    * X    X    X Server *
  **^**********|****^**    **^****|****^*********
    |          |    |        |    |    |
    |          ------Internet------    |
    |                                  |
    ------------------ Core ------------

The JSON module is a good deal simpler, as it is based on
pre-existing abstractions provided by the Java platform. The current
architecture relies on Compojure as a servlet abstraction, with one
servlet dedicated to serving JSON messages for each widget, with
servlet URLs unique to each widget. Once a JSON message has been
received, it is processed in the servlet and queued into Core - when
Core later emits a Message for a JSON client, the message is queued in
the servlet until the next client poll.
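
The "queued until the next client poll" behaviour could be sketched
roughly like this. It assumes a plain in-memory map keyed by some client
id; the names outboxes, enqueue! and drain! are invented here, and a
real long-polling servlet would also have to hold the request open
while the queue is empty:

```clojure
;; Hypothetical in-memory outbox: client id -> vector of pending messages.
(def outboxes (atom {}))

;; Called when Core emits a message addressed to a JSON client.
(defn enqueue!
  [client-id message]
  (swap! outboxes update-in [client-id] (fnil conj []) message)
  nil)

;; Called when that client polls: returns the pending messages in
;; arrival order and clears them in the same atomic step, so a message
;; is never delivered twice or lost between read and clear.
(defn drain!
  [client-id]
  (loop []
    (let [current @outboxes
          pending (get current client-id [])]
      (if (compare-and-set! outboxes current (dissoc current client-id))
        pending
        (recur)))))
```

A poll handler would call drain! and serialize whatever comes back as
the JSON response body.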

TODO
______________________________________________________________________

Needs

Contact List widget (HTML/JQuery/Clojure) 8 hours
Should display a list of the user's contacts, and allow the user to
initiate the following actions with each contact:

    * Initiate a conversation (opens in Messaging widget)
    * View a contact's profile

Messaging widget (HTML/JQuery/Clojure) 16 hours
Display an open conversation - no server state, the widget itself
should simply keep a log of received messages (and perhaps have a
function for receiving a log from the server). Each Messaging
widget should represent a conversation on *one* channel;
tabbing/windowing should be handled in a messaging container.

Messaging Container widget (HTML/JQuery/Clojure) 8 hours
Holds a number of conversation windows open - no server state.

Login/Status widget (HTML/JQuery/Clojure) 4 hours
Allow the user to login if unauthenticated, or show current status
otherwise.

Profile widget (HTML/JQuery/Clojure) 16 hours?
Not sure how this should be implemented just yet - a simple
implementation would be to overload vCard, but this will not play
nicely with existing XMPP clients like Pidgin. Other options
include OpenSocial or some custom protocol or XEP. Regardless, this
widget should allow the user to upload photo(s), set personal
details or whatever - it may be prudent to simply allow the user to
upload an arbitrary static mini-site that serves as a profile. With
this option, developers can choose a templating scheme for their own
profiling engine.

Core 'Routing' (Clojure) 16 hours
Routing goes after Jabber in the Netty server pipeline, utilizing
the data abstraction:

     (deftype Message [to header args])

The lifecycle of this layer should look like this:

    1 Queue message
    2 Look up addressee in Core - determine state and locality
    3 If necessary, request a new client connection for the message
      (this will require the message to be queued again for this
      connection)
    4 Call channelWrite on the appropriate connection.
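
The lifecycle above might reduce to a pure decision function along
these lines. This is only a sketch: the shape of the state map, the
action keywords and the route function are all assumptions, it uses
defrecord instead of deftype so fields can be read as keywords, and the
notes don't say what happens to a message for an offline local user, so
:store-offline is just a placeholder:

```clojure
(require '[clojure.string :as str])

;; Same fields as the Message type above, as a record for keyword access.
(defrecord Message [to header args])

;; Hypothetical state map:
;;   :domain       this node's own domain
;;   :online       set of local usernames with a live session
;;   :connections  set of remote domains with an open client connection
;; Returns [action message]; the caller performs the actual I/O.
(defn route
  [state message]
  (let [[user domain] (str/split (:to message) #"@")]
    (cond
      ;; Step 2: addressee is local, so check their state.
      (= domain (:domain state))
      (if (contains? (:online state) user)
        [:write-local message]
        [:store-offline message])

      ;; Step 4: remote addressee and a connection already exists.
      (contains? (:connections state) domain)
      [:write-remote message]

      ;; Step 3: no connection yet, so request one and queue again.
      :else
      [:connect-and-requeue message])))
```

Keeping the decision separate from channelWrite means the routing rules
can be exercised at the REPL without opening any sockets.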

Client stack (Clojure) 32 hours
Client stack is the socket <-> Message implementation of the XMPP
federation protocol. This is going to be a pain in the fucking ass
- you've been warned.


Wants

Grizzly refactor (Clojure) 32 hours
Current application uses Netty for XMPP, Jetty for JSON - wasteful.
Compojure has Grizzly bindings, and Grizzly also supports HTTP,
Servlets and Comet as potential transports - but XMPP would have to
be rewritten entirely. This is not an entirely trivial task, the
main issue being Grizzly's apparent lack of midstream TLS
negotiation ("starttls") support. We can require SSL-only XMPP
connections to mitigate this, which doesn't seem unreasonable to me -
interoperability with legacy systems being the only real loss here.

Key/Value store backend support (Clojure/Linux) 16 hours
Mainly needs discussion as to cleanest way to support key/value as a
data backend.

SMTP (Clojure/Java) 16 hours
Should be a simple matter of integrating SMTP into the messaging
infrastructure - Java has excellent SMTP implementations available.

Sample App (HTML/CSS/JQuery/Clojure) 16 hours
Just a basic, styled social network with some content that can be
used to essentially 'self host' the project. Content thus consists
of wiki, git, buglist, this document, some pictures of kittens
fucking or whatever lagniappe.

OpenID (Clojure/Java) 8 hours
Interop is the name of the game - this one is obvious and dead
simple. Fits snugly with Ego's concept of adhoc identity.

OpenSocial (Clojure/Java) 32 hours?
Google has a Java OpenSocial library - this is probably not worth
implementing at start, it is costly and the benefits are meager
given its limited adoption. Plus, we are essentially recreating
much of this functionality in XMPP anyway.

Plugins (Clojure/Java/?) 16 hours
Plugins in Clojure should be fairly simple, but we need an
extensible, common protocol for extending core - must include
permissions and namespaces. Might also be nice to have Spring
integration ...

BOSH (JQuery/HTML/XMPP) A Long Fucking Time (tm)
Remove JSON long-polling functionality in clients, replace with BOSH
enabled widgets. There is some possibility of using an existing
JQuery or JavaScript BOSH client if the license allows it - needs
research.

PubSub (Clojure/XMPP) ALFT (see above, also, tm)
See PubSub XEP - this item is actually fairly necessary for
federated services on the developer side.

"Known Nodes" (Clojure/XMPP) ???
Implement a feature to maintain a list of known Ego Nodes, such that
metadata about potential connections and whatnot can be
transparently shared. Needs lottsa discussion.

Replace XML Layer with StAX? (Clojure/StAX) ???
Requires research/discussion as to what would be an appropriate
(nonblocking!) replacement for the hand-rolled stream pull parser.

On Sat, Jan 16, 2010 at 7:25 AM, Barrett Brown <barriticus@gmail.com> wrote:
Please to be sending me the pertinent info, CEO.

--
Regards,

Andrew Stein
steinlink@gmail.com
(512) 796-4375 (cell)