On 1/8/2013 9:24 AM, Johnny Billquist wrote:
Data mining is difficult, since there are different systems, with different ways
of extracting the data, and in different formats.
Sampsa's goal here is mapping HECnet. My goal is to write a data mining service that
just happens to provide the data that a mapper would use.
To that end, I'll be stashing all this data in a database of some sort (details
unknown as yet, see below).
A centralized repository of data is nice in many ways, but it is a headache to manage.
Absolutely it is, but if people are already putting INFO.TXT files out there, they are
doing 99% of the work already; we just need to get the data in a single place.
That said, I could be convinced to set up something semi-automatic. A reasonable way
would be for people to give me machines to poll, and then I'd set up an automated
process to poll those machines for files in a specific format. I can then create a
database out of that, and make it available through the web, as well as over DECnet, and
also as a summarized file. Anything would be pretty easy if we just have the data
collected.
I think this would be fantastic.
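To make the idea concrete, here is a rough sketch of the "collect, store, summarize" part in Python. Everything specific in it is an assumption for illustration: the file format has not been agreed on, so the "Key: value" layout, the table name node_info, and the function names are all hypothetical. Fetching the file from a node is left out entirely; the sketch starts from text that has already been retrieved.

```python
import sqlite3


def parse_info(text):
    """Parse 'Key: value' lines into a dict.

    The 'Key: value' layout is an assumed format, not an agreed one.
    Lines without a colon are ignored.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip().lower()
        if key:
            fields[key] = value.strip()
    return fields


def store(conn, node, fields):
    """Insert or update one node's fields in a hypothetical node_info table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node_info ("
        "node TEXT, field TEXT, value TEXT, "
        "PRIMARY KEY (node, field))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO node_info VALUES (?, ?, ?)",
        [(node, key, value) for key, value in fields.items()],
    )
    conn.commit()


def summarize(conn):
    """Return the whole table as a plain-text summary, one field per line."""
    rows = conn.execute(
        "SELECT node, field, value FROM node_info ORDER BY node, field"
    )
    return "\n".join(f"{node} {field}={value}" for node, field, value in rows)


if __name__ == "__main__":
    # Made-up sample text and node name, purely for demonstration.
    sample = "Owner: Example Person\nOS: ExampleOS 1.0\n"
    conn = sqlite3.connect(":memory:")
    store(conn, "EXAMPL", parse_info(sample))
    print(summarize(conn))
```

Storing one row per (node, field) pair means new fields can be added later without changing the schema, which seems useful while the set of fields is still undecided.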
I already have something of a start for this in the form of my database of nodes in
HECnet. I'd need to extend it with more fields, but that would be pretty easy.
It's all in Datatrieve today, and that should be accessible over DECnet right now
(even though I seem to remember that VMS hosts had some problems with that).
I'll have to learn how to access that db.
I'm already extracting information from that database for the hecnet web-page on MIM
(accessible as Madame).
So, if we can just decide on what we want, and how to make the information available,
I'll sit down and write the code to fix it.
Do you want to also store my data or should I do that myself? I might do it myself at
least for now until I know what exactly I need/want to save.
-brian