"my own AMAZON S3" - or: "let's draw a low-cost distributed database concept"




Posted by joonas, 08-27-2010, 08:53 PM
Hey! I need your ideas for a first draft of the ultimate distributed database concept (software) that is theoretically possible today. What I want is nothing special: just suggest which software would be a good choice to make parts of the dream work, for example "OpenSolaris because ZFS..." or "Apache Hive because all the big websites use it...".

SCENARIO ONE: Imagine you rent 10 identical dedicated machines, each with ten 2 TB hard drives, from one cheap hoster. The machines probably stand near each other but are connected with only 100 Mbit or 1 Gbit. Now you want to run a website with a database that is 75 TB in size.

QUESTION ONE: Which existing software do you choose to make those machines work together and keep the 75 TB database stable and secure in case of any disk failures?

SCENARIO TWO: Imagine you rent 200 machines, each with ten 2 TB hard drives, from several different hosters all around the world. Now you want to use those machines to power a website with a database that is 500 TB in size.

QUESTION TWO: Which existing software do you choose to keep the database stable and secure in case of any disk failures, as well as across the varying ping times and connection speeds between the different server locations?
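Before picking software, it may help to check how much redundancy the raw disk space in the two scenarios above even allows. The following is only a rough sketch under assumptions that are not in the post: decimal units (1 TB = 10**12 bytes), redundancy by keeping whole copies of the data, no filesystem or OS overhead, and a network link that can be fully used for rebuilding. The function names are made up for illustration.

```python
# Back-of-the-envelope capacity check for the two scenarios.
# Assumptions (not from the post): 1 TB = 10**12 bytes, redundancy by
# whole-copy replication, no filesystem/OS overhead, link fully usable.

def raw_capacity_tb(machines, disks_per_machine, tb_per_disk):
    """Total raw disk space across all machines, in TB."""
    return machines * disks_per_machine * tb_per_disk

def max_full_copies(raw_tb, dataset_tb):
    """How many complete copies of the dataset fit into the raw space."""
    return int(raw_tb // dataset_tb)

def rebuild_hours(disk_tb, link_mbit_per_s):
    """Hours needed to copy one full disk over a link of the given speed."""
    bits = disk_tb * 10**12 * 8
    seconds = bits / (link_mbit_per_s * 10**6)
    return seconds / 3600

scenario_one_raw = raw_capacity_tb(10, 10, 2)    # 10 machines, 75 TB database
scenario_two_raw = raw_capacity_tb(200, 10, 2)   # 200 machines, 500 TB database

print("Scenario one:", scenario_one_raw, "TB raw,",
      max_full_copies(scenario_one_raw, 75), "full copies of 75 TB")
print("Scenario two:", scenario_two_raw, "TB raw,",
      max_full_copies(scenario_two_raw, 500), "full copies of 500 TB")
print("Re-copying one 2 TB disk: %.1f h at 100 Mbit, %.1f h at 1 Gbit"
      % (rebuild_hours(2, 100), rebuild_hours(2, 1000)))
```

Under these assumptions, scenario one has 200 TB raw, which fits two full copies of the 75 TB database but not three (225 TB), so a system that keeps three replicas of everything would not fit. Scenario two has 4000 TB raw, enough for several copies of 500 TB, and its constraint is more likely the network between sites than the disk space.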

Posted by joonas, 08-27-2010, 11:04 PM
Anyone? I know it's a hard question, but it's equally interesting. I in no way expect to receive a proper solution, just some ideas... Thanks...

Posted by plumsauce, 08-27-2010, 11:18 PM
Distributed databases aren't really new. One of the Johnny-come-latelies to the show is Hadoop; maybe you could start there.

Posted by lockbull, 08-28-2010, 01:00 AM
Hadoop isn't a distributed database, it's a distributed file system. Though you can run something like HBase or Hypertable, both of which are "inspired" to say the least by Google's BigTable, on top of Hadoop.

Anyways, you're putting the cart way before the horse here talking about servers, disks, networking, etc. "Distributed database" can mean a lot of things, but for starters, there are clear dividing lines between the approaches for SQL-based relational systems and so-called "NoSQL" systems, which use non-relational data stores and eschew fixed schemas. Each has its advantages and disadvantages, and which approach you use is going to heavily affect your application design and architecture.

Honestly, this probably isn't the best place to discuss a topic such as this. I'd highly recommend checking out the High Scalability website for some great real-world use cases of highly scalable architectures, as well as in-depth discussions of various scalability topics such as distributed databases.
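To make the "fixed schema" versus "no fixed schema" distinction concrete, here is a tiny single-machine illustration. It is only a sketch: SQLite stands in for a relational system and a plain Python dict stands in for a schema-less store. Neither is one of the systems named in this thread, and the table, keys, and names are hypothetical.

```python
import sqlite3

# Relational side: the columns are declared up front (fixed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))

def try_insert_with_extra_column():
    """Try to store a field the schema does not know about."""
    try:
        conn.execute(
            "INSERT INTO users (id, name, country) VALUES (?, ?, ?)",
            (2, "bob", "FI"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement: the table has no such column.
        return False

# Schema-less side: each record is just a bag of fields under a key,
# so two records may carry different fields.
store = {}
store["user:1"] = {"name": "alice"}
store["user:2"] = {"name": "bob", "country": "FI"}

print("relational insert with extra column accepted:",
      try_insert_with_extra_column())
print("schema-less record fields:", sorted(store["user:2"]))
```

The relational table refuses the unknown column until the schema is changed, while the dict accepts any fields. The application then has to cope with records that look different, which is one way the choice affects application design.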


