[uf-discuss] microformat proposal: dependancy graphs (for software)

Wed Jan 31 00:42:17 PST 2007

Hey Derrick,

I think you are on the right track with regard to process here.  I
especially liked the in-depth treatment of the problem statement, with
specific examples that came in addition to (instead of solely) your
own frustrations.  It also seems like you've looked over how to gather
examples, so I'd like to encourage you to continue with this effort.

There are a few things I'm a little concerned about.  For example, who
would consume this microformat?  Many of the existing microformats
have centered around end-users as potential consumers.  However, I'm
not sure the "average" person has an interest in consuming software
information.  I'm not sure what impact this might have on the eventual
creation and subsequent adoption of your proposal.  Documenting and
codifying existing publishing practices is a worthwhile effort, so it
may not matter.

You seemed to hint at how this technique might be adapted to other
areas in some of our hallway conversations.  Are there other practical
problems that might be solved on a lower level, or would this
technique map directly into other problem domains?

I also liked how you mentioned how your proposed microformat would
work in conjunction with other microformats to compose a larger
system.  I'd like to hear more about this vision because it sounds
interesting, and may inform developments in other microformats, or
inspire would-be extension authors.

Would you mind starting a wiki page, as described in the process?
<http://microformats.org/wiki/process>.  What do you think the name
should be?  Something like versioning-examples?
softwaredependency-examples? version-resolution-examples?  What would
the most appropriate name be?

http://kernel.org/
* uses <link> to provide RSS feed
* <div id="versions"> uses word "versions"
* uses links to deliverable with version string as anchor text
* uses a kind of "product/software id" (my made up term) <table
class="kver"> to identify the thing being described
* includes a description of the software
* includes the date published
* uses keywords as anchor text to perform operations or view
additional features.  for example, V for view diff, changelog to see
the changelog, etc...

http://libpng.org/pub/png/libpng.html
* <LINK REV="made" HREF="http://pobox.com/~newt/greg_contact.html">
Interesting: "made by this guy" :-).  hCard would seem to be a
perfect fit here.
* <B>libpng</B>  name of software in an element
* <A HREF=
"http://libpng.sourceforge.net/">http://libpng.sourceforge.net/</A>
link to homepage
* requires <B>zlib 1.0.4</B>
or later (<B>1.2.3</B> or <B>1.1.4</B> ...)  lossy markup of requirements
* The current public release, <B>libpng 1.2.15</B> again, lossy markup
* includes a description of the software
* <B>libpng 1.2.12</B> another mention of the software in its own element
* the site contains links to test suites, documentation, and download links
* also includes a description of how to verify the contents
* lots of content about the software, but very little semantic markup.
Good example: easy to see how some semantic techniques would help.
Would marking this up using hAtom help at all?
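
To show what I mean by "lossy", here is a rough Python sketch (standard
library only) that tries to recover name/version pairs from <B>
elements.  The regular expression, the function name and the sample
string are my own assumptions, loosely modelled on the lines quoted
above; nothing here is defined by the libpng page itself.

```python
import re
from html.parser import HTMLParser

# Assumed pattern: a name, whitespace, then a dotted version number.
NAME_VERSION = re.compile(
    r"^(?P<name>[A-Za-z][\w+-]*)\s+(?P<version>\d+(?:\.\d+)+)$"
)


class BoldCollector(HTMLParser):
    """Collect the text content of every <b> element."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self._buf = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <B> arrives as "b".
        if tag == "b":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append("".join(self._buf).strip())
                self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def guess_name_versions(html):
    """Return (name, version) pairs for <b> text that looks like 'name x.y'."""
    parser = BoldCollector()
    parser.feed(html)
    parser.close()
    found = []
    for text in parser.texts:
        match = NAME_VERSION.match(text)
        if match:
            found.append((match.group("name"), match.group("version")))
    return found


# Made-up sample in the style of the page, not a copy of it.
sample = (
    "requires <B>zlib 1.0.4</B> or later; "
    "current release <B>libpng 1.2.15</B>; see also <B>1.2.3</B>"
)
print(guess_name_versions(sample))
```

Note that the bare <B>1.2.3</B> is dropped because nothing in the
markup says which package it belongs to, and nothing distinguishes
"the thing described" from "a thing it requires".  That is exactly the
information a consumer would need.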

http://freshmeat.net/projects/libvc/
* uses <link> to an rss feed
* links to other project areas... issue tracking, forums etc...
* branch info published
* date added, created, modified all published
* description
* author
* "trove" categories
* might list dependencies, but there are none for this particular example
* stats listed: vitality, popularity, downloads, graphs...
* hits, subscribers
* other projects depending on this one are listed
* license published
* download links provided
* no semantic markup present.

http://raa.ruby-lang.org/project/fcgi/
* <link> and <meta> used to convey authorship, "made", author, and
some other attributes: search, index, home, glossary
* interesting, there is some semantic html...
<p class="caption">fcgi / 0.8.7</p>
<table class="entry">
* key value pairs (in a table) for: short description, category,
status, created, last update, owner, homepage, download, source
viewing, license, dependency, versions as link text to the
deliverable with date published outside the link
* uses address to list contact for the document, and other uses of
semantic html such as class names "footer", "header", and "caption".
Shows a receptivity to semantic techniques as well as confirming the
list of properties published by software vendors.
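
Because the page already uses a class name on the table, a consumer
can get at the key/value pairs without much guessing.  A minimal
sketch in Python, assuming each row holds exactly two cells (<th> or
<td>); I have not checked that against every row of the real page, and
the sample values below are placeholders, not data from RAA.

```python
from html.parser import HTMLParser


class EntryTableParser(HTMLParser):
    """Read two-cell rows of <table class="entry"> into a dict."""

    def __init__(self):
        super().__init__()
        self.in_entry = False
        self.cell = None  # text buffer for the current cell, or None
        self.row = []
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "entry" in classes:
            self.in_entry = True
        elif self.in_entry and tag == "tr":
            self.row = []
        elif self.in_entry and tag in ("th", "td"):
            self.cell = []

    def handle_endtag(self, tag):
        if not self.in_entry:
            return
        if tag in ("th", "td") and self.cell is not None:
            # Collapse internal whitespace in the cell text.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr":
            if len(self.row) == 2:
                key, value = self.row
                self.pairs[key.lower().rstrip(":")] = value
            self.row = []
        elif tag == "table":
            self.in_entry = False

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def read_entry_table(html):
    parser = EntryTableParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Placeholder markup and values, for illustration only.
sample = """
<p class="caption">fcgi / 0.8.7</p>
<table class="entry">
<tr><th>Category:</th><td>example-category</td></tr>
<tr><th>License:</th><td>example-license</td></tr>
</table>
"""
print(read_entry_table(sample))
```

The keys come straight from the publisher's own labels, which is the
kind of evidence the examples page should be collecting.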

http://www.gentoo-portage.com/dev-lang/erlang/Dep#ptabs
* <link> to rss feed
* <body id="gentoo-portage"> intended consumer published in markup
* links to other project areas
* <h2 id="packageid">dev-lang/erlang </h2> product name
* <h5> used for description
* <div id="website_list"><ul>...</ul></div> used to list project websites
* <div id="ebuild_list"><ul>... used for consumer-specific parameters
* "view" and "download" links.
* <h3>Runtime Dependencies</h3> dependencies published using <div
class="depbox"> with links.
* >=<a href="/dev-lang/perl">dev-lang/perl</a>-5.6.1
* the previous behaviour is used for each version of the published software
Lots of semantic html.  Could be clues for possible property names.
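
The dependency lines are regular enough to parse once the tags are
stripped.  Here is a hedged Python sketch that splits text such as
">=dev-lang/perl-5.6.1" into operator, package and version, then checks
an installed version against it.  The pattern is my simplification: it
only handles plain dotted numeric versions, not suffixes or revisions,
and the function names are made up for this example.

```python
import operator
import re

# Simplified pattern: operator, category/name, then a dotted numeric version.
ATOM = re.compile(
    r"^(?P<op>>=|<=|>|<|=)"
    r"(?P<package>[\w+.-]+/[\w+.-]+?)"
    r"-(?P<version>\d+(?:\.\d+)*)$"
)

OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}


def parse_atom(text):
    """Split a dependency string into (operator, package, version tuple)."""
    match = ATOM.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised dependency: {text!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("op"), match.group("package"), version


def satisfies(installed, op, required):
    """Compare version tuples, e.g. (5, 8, 8) >= (5, 6, 1)."""
    return OPS[op](installed, required)


op, package, required = parse_atom(">=dev-lang/perl-5.6.1")
print(op, package, required)
print(satisfies((5, 8, 8), op, required))
```

Comparing versions as integer tuples avoids the string-comparison
trap where "5.10" sorts before "5.6".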

There are a couple of things that jump out at me.  There are varying
amounts of information published.  The most common is the name of the
software, the current version, the link to download, and some other
attributes.  It would be a good idea to capture all of this on the
examples page.  I would encourage others to also take a look at the
source code, note what's being published along with any semantic
techniques present in order to ensure that I haven't missed or
misrepresented anything.
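
One way to keep that comparison honest is to tally the properties per
site.  The Python sketch below is my rough transcription of the notes
above; the property labels are placeholders I picked for the tally, not
proposed property names, and the sets should be corrected as people
re-check the sources.

```python
from collections import Counter

# Rough transcription of the notes above; labels are placeholders.
# freshmeat's "dependency" is included because the site has a slot for
# dependencies even though this particular project lists none.
observed = {
    "kernel.org": {
        "feed", "name", "version", "download", "description", "date",
    },
    "libpng.org": {
        "author", "name", "version", "homepage", "dependency",
        "description", "download",
    },
    "freshmeat.net": {
        "feed", "date", "description", "author", "category",
        "dependency", "license", "download",
    },
    "raa.ruby-lang.org": {
        "author", "name", "version", "description", "category", "date",
        "homepage", "download", "license", "dependency",
    },
    "gentoo-portage.com": {
        "feed", "name", "description", "homepage", "download",
        "dependency",
    },
}


def tally(observations):
    """Count how many sites publish each property."""
    counts = Counter()
    for properties in observations.values():
        counts.update(properties)
    return counts


counts = tally(observed)
# Most widely published properties first, ties broken alphabetically.
for prop, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{n}/{len(observed)}  {prop}")
```

Adding a new example is then one more entry in the table, and the
ranking shows which properties are common enough to be worth
discussing.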

It would also be beneficial for anyone interested in this activity to
continue finding more examples, and applying the same kind of analysis
to them.

It's interesting how some of the other microformats might be applied
to this problem.  I noticed a lot of <link>'s to RSS feeds.  If they
are publishing this as RSS, would publishing it with hatom make sense?
 What would be missing?

I'm vaguely aware of a format called DOAP, which freshmeat apparently
uses.  The python cheeseshop also uses it, and I wonder if sourceforge
or tigris uses it.  DOAP lives at <http://usefulinc.com/doap/>.  I
wonder if there are lessons that can be learned from the DOAP effort,
and if any of that work can be re-used.  If there are conflicts in the
plurality of publishing behaviours and the DOAP model, I suspect the
publishing behaviour should take precedence, but that's my opinion.

Thanks,
Ben West

On 1/27/07, Derrick Lyndon Pallas <derrick at pallas.us> wrote:
> I'm interested in feedback on the following idea. (Right now, I'm in the
> process of developing a corpus of live examples but that's not what I'm
> asking for.)
>
> Essentially: software is everywhere. Because it is complex, it is
> modularized. This has the effect that not every developer will be on the
> same page. Changes can happen in a kernel that bubble up through the
> standard library into a maze of support libraries. Hopefully, separation
> of concerns and good interface design have dampened negative effects but
> there can be subtle, negative consequences.
>
> For example, a change in the standard library can break a bad practice
> in libfoo (ignoring some return value) which causes an array bounds
> problem in libbar. Since your application uses libbar and there is no
> direct connect to libfoo, how do you know that you need to upgrade to a
> newer version of libfoo? This is especially a problem if libbar doesn't
> need to be recompiled (it retains binary compatibility with libfoo) or
> if libfoo or libbar are optional.
>
> The same problem from the other direction is directed, acyclic
> dependency graphs. If I am building a new application from scratch, how
> do I know what libraries are possible? I could go to the web page for
> ImageMagick only to learn that I need a new version of libpng. By the
> same token, libpng needs an updated version of libz. And libz doesn't
> like my old crufty compiler. Add to that the complexity that I really
> just want to install RMagick (a Ruby interface to *Magick) which can use
> either ImageMagick or GraphicsMagick (though the interface changes in
> subtle ways depending on which library you choose) and these choices
> interact with my version of Ruby, all of the modules I use in Ruby, any
> code they link in, and the compilers for that code, ad infinitum.
> Suddenly I've got 40 browser tabs open and still no graphics library.
>
> The preceding paragraph is a (sad but) true story. Part of the problem
> had to do with the fact that I didn't own the system and the system had
> "stable" versions of packages on it, as defined by the Fedora Core team.
> Right now, there are ways to do these builds; whether you're using
> binaries (apt-get, yum, rpm) or source (emerge, srpm, *-src), you have
> to go through a clearing house. Someone took the time to compile
> binaries or repackage source trees and write down what needed what.
>
> But the fact is all of this information is already on the homepage for
> most software. Current aggregators rely on the author(s) submitting this
> information manually. Furthermore, commercial packages don't normally
> submit product information to sites like FreshMeat, SourceForge, or any
> language-specific repository (CPAN, PEAR).
>
> Because the information (version, dependency, package URLs, bug alerts)
> is already there (see below) it should be fairly straight-forward to
> figure out what people already do and "semantic it up."
>
> Take a look at:
>  * http://kernel.org/
>  * http://libpng.org/pub/png/libpng.html
>  * http://freshmeat.net/projects/libvc/
>  * http://raa.ruby-lang.org/project/fcgi/
>  * http://www.gentoo-portage.com/dev-lang/erlang/Dep#ptabs
>
> Mix all of this with hCard, for authors; hReview, to help you decide
> between optionals and to detect bit-rot; and rel-license, for software
> rights issues. Suddenly we have a very powerful, automatic, user-driven
> system for keeping software up to date.
>
> ~D