Wed May 15 11:53:09 UTC 2013: Hello, we have a routing issue between the frontend and the backend database of Indefero. The database is safe, there is nothing to worry about; the frontend simply cannot connect to the database server, as if a firewall had suddenly cut the connection. We are investigating.
Wed May 15 13:54:26 CEST 2013: Our provider is having an issue, we keep an eye on it.
Wed May 15 13:55:39 CEST 2013: Ping between the machines went from 5 ms back to under 1 ms, it looks like our provider is doing something.
Wed May 15 14:02:45 CEST 2013: Our provider changed some configuration of their routers and they are now observing the situation, Indefero is back online.
Reminder: The Indefero hosting will stop June 30 this year, yes, 2013. You were informed about a year ago, so I hope you had time to migrate.
If you are a Subversion user, Assembla can import the dumps of your repositories. The daily Subversion dumps are available in your account area. They also offer Git hosting. If you are using Git only, you have thousands of offers.
Of course you can install your own version of Indefero and go ahead with a self-hosted solution, as Indefero is free software.
Thu Apr 11 07:27:29 UTC 2013: There is a pretty serious update of PostgreSQL to be done for security reasons. Indefero will be down for a couple of minutes while the server is updated. I am sorry this is an unplanned update.
Thu Apr 11 07:30:54 UTC 2013: The slave has been updated and is correctly picking up the updates from master, now going to update the master. From the slave update sequence you can expect about 2 minutes downtime of Indefero.
Thu Apr 11 07:33:36 UTC 2013: The master has been updated, the effective downtime was less than 20 seconds.
Wed Mar 6 16:21:45 UTC 2013: We have an electrical issue. For the moment Indefero is down; it will come up as soon as the electrical issue is resolved. More details will soon be available from our provider.
Mon Jan 21 14:22:30 UTC 2013 A large array of servers is down in our datacenter. Sadly, the main database server of Indefero is down too. The backup server is running fine and is up to date. We are expecting feedback from OVH to assess whether we need to switch over to the slave or whether we can expect the main server to come back fast enough.
Mon Jan 21 14:24:25 UTC 2013 It was fast, power supply issue for RBX4. Still waiting for more information.
Mon Jan 21 14:35:20 UTC 2013 Everything is up again, sorry for the inconvenience.
Our provider has some issues. Not so fun: the servers are down and I hope they will come up again as soon as possible. Basically, we have a split on the internal network: some servers can talk to each other, some cannot. This is very annoying because it creates a split-brain situation. All the data are safe, but we have to wait until this is solved (and they are doing it manually).
Update: Everything is up again after a downtime of approximately 30 minutes. I hope it was at the time of your coffee break.
Very good news: after a lot of discussion (because I want to be sure of the quality of the offer), the Indefero hosting offer will continue, as it will be taken over by a small company already used to hosting data and user information under strict security rules. The transition will be totally transparent for the current customers and users. I will keep you informed.
Front matter: This is the email I sent to all the current users of the hosting offer.
Dear Indefero Users and Customers,
today I am sending you an email that is not easy to write: it announces the wind-down of the Indefero hosting platform. The Indefero hosting will be stopped on the following dates:
To help you with the changes, the transition period is as long as possible. Now that you know the meat of the subject, let me provide you with the whys, the hows and the details. But first: stopping a service is not an easy task. It is especially hard because you, as customers, trusted me to provide a long-term, high-quality service, and by stopping the service I am breaking this trust. For me, it is also hard because it means that I failed to correctly predict the future.
Thanks a lot for the trust you placed in Indefero, and please accept my sincere apologies for not providing you continued service for another couple of years.
If you paid a renewal or a new forge in the past 45 days, I can issue you a full refund. In this case I will ask you to migrate out before the end of the year. 45 days is the limit of the banking system.
Simple, log in here:
and download the backups (down the page). You get everything related to your forge and the data are compatible with Indefero, that is, you can install Indefero on your server and import the data.
Log in to your account:
Click on the "configure your forge" link:
and update the personal domain to use a domain you fully control. If you are working for the foobar.com company, put something like code.foobar.com and get a CNAME record in your DNS pointing code.foobar.com to yourforge.indefero.net. Then, start asking people to use code.foobar.com to access your forge. After a while, nobody will use the indefero.net address and you will have full control over your forge.
The next step is to set up your own Indefero instance and import the data from the hosted forge, then switch the DNS to point to your own Indefero instance.
The end result is a migration without downtime and without disturbing your end users.
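The two DNS states of this migration can be sketched as zone-file records. This is only an illustration: the helper name `cname_record`, the TTL of 3600 seconds and the host name `forge.foobar.com` for your own instance are assumptions, not something provided by Indefero or by your DNS provider.

```python
def fqdn(name: str) -> str:
    """Return the name with the trailing dot used for absolute names in zone files."""
    return name if name.endswith(".") else name + "."


def cname_record(alias: str, target: str, ttl: int = 3600) -> str:
    """Build a zone-file style CNAME line pointing alias at target.

    The TTL default is an assumption; a short TTL makes the final switch
    propagate faster.
    """
    return f"{fqdn(alias)} {ttl} IN CNAME {fqdn(target)}"


# Step 1: the domain you control points at the hosted forge.
hosted = cname_record("code.foobar.com", "yourforge.indefero.net")

# Step 2: once your own instance holds the imported data, repoint the alias.
# "forge.foobar.com" is a hypothetical name for your self-hosted server.
self_hosted = cname_record("code.foobar.com", "forge.foobar.com")

print(hosted)
print(self_hosted)
```

Because your users only ever see code.foobar.com, the switch from the first record to the second is invisible to them.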
In October, you will get ready-to-use Amazon EC2 images which will allow you to do a nearly "One Click" migration of your data from the hosted platform to your own Indefero instance. With an EC2 micro instance, it will cost you about $15 per month to run your own Indefero instance. I will also work with the current providers of Debian packages to be sure you can easily set up and import your Indefero forge on a fresh Debian system.
Because of focus. When I started Céondo Ltd, I did not really have a clear picture of where I wanted to go and how. Now I know, and the key is "Science": I will fully focus on scientific software and consulting. In the last months, I was able to secure 2 to 3 years of consulting pipeline in science. This is a clear indication that this is the way to go: a specialization in an extremely technical area where the barrier to entry is very high.
Surprisingly, I expect it to be positively affected. Over the last year I have been slowing down my involvement in the software, because changing the software would also mean, for me, applying the changes to 3000+ forges on a system not designed in the first place to accommodate so many forges. I was afraid of the consequences of a bad upgrade at such a scale.
The time used to manage the hosting will in part be redirected to improve the software and the migration tools will also be used in parallel to allow us to perform automated testing of Indefero. We will be able to start an EC2 micro instance, test and stop.
The Indefero community is very active, with an increasing number of users and packages for nearly all the current Linux distributions. The current goal is to have the Indefero packages distributed officially by all the main Linux distributions to ensure long-term support. The distribution-specific packaging scripts will soon be part of the source code of Indefero. This is critical for the long-term support of Indefero and will help testing on a larger scale.
The code hosting space is crowded, so crowded that it is hard to recommend someone. First, the real question is:
how critical is your code? Can you accept to have it hosted by a third party? Of course you currently have it hosted by a third party right now, but it may be a good time to rethink this question. I think it is critical enough to have full control over it, this is why I gave full control with CNAME, backup and OSS dump compatibility to you when providing Indefero hosting. I could not provide something I could not use personally.
then of course, you need to define what you want in terms of functionalities, version management software (Git, Subversion, Mercurial, etc.) and the contractual constraints (hosting location within/outside the US, price, owned by a big/small company independent/partially owned by venture capitalists). This is not simple, I have seen an increase of the number of forge creations since GitHub took venture capital money on board. So, it looks like some of you do not like to be dependent on venture capitalist controlled companies. You have the time to think about it.
If I had only one service to recommend, I would recommend Pikacode under the lead of Benjamin Jordan. They have been hosting code repositories for a long time and have been active contributors to Indefero. I trust them and they are real system administrators, used to managing some of the biggest websites in France, saturating Gbps of bandwidth during big events. They know their job very well.
Here is the "inline" advertising from Benjamin for you:
Pikacode.com offers Git and Mercurial repository hosting. Formerly known as Intuxication, it has seen thousands of repositories created by our users since 2008. Pikacode's goals are simple: easy, sleek and fast code hosting. We offer you 90 days of free trial for unlimited private repositories and collaborators with the following voucher: HELLO-PIKA.
If some of you can recommend code hosting companies, just let me know. Note that Pikacode also offers free public repositories.
I will set up a migration website in September/October:
this will be your portal to have everything you need to get a successful migration without disturbing your users and losing your data. It will be updated with the latest information, tutorials, possible alternative offers — basically, everything to help you.
In the end, I can only thank you for your trust and the bit of travel around the Sun we did together. I am proud of what was achieved with Indefero and I am honored you trusted me. I am also sure you will find a good way forward.
Best regards, loïc
The [Indefero|http://indefero.net] SSL certificate expires today. It will be renewed during the afternoon; normally the update is performed without downtime. If there is any downtime, it should be a couple of seconds, the time needed to restart the web server.
Update 2012.07.30 11:30 UTC: The renewal procedure is under way. It should be finished in a couple of hours, and the actual update at the server level will be done just afterwards.
Update 2012.07.31 07:40 UTC: The certificates are now updated and valid for another year.
Please accept our sincere apologies, but the creation of new Indefero forges will be suspended this week while the website is updated. The website will be updated by the end of the week (30th of June 2012), but the exact date of the update is not yet known. If you want to create a forge, please try doing it by Wednesday the 27th, as the update will most likely take place Thursday/Friday.
I am sorry for the unscheduled downtime this morning, Friday 25th of May, around 8 am UTC. A scheduled kernel upgrade of the server did not go as expected. The kernel upgrade went correctly on the slave, including reboot and resync, but the master failed to come up again. For data safety reasons, we performed a backup of the slave before promoting it to master and switching the application to use the new master. This backup took a bit more time than expected and resulted in the long downtime.
What is next?
I will keep you informed. Sorry for the annoyance, sometimes issues happen and this one took me by surprise.
Update: Reviewing the logs, the combination of a VM + Hardware node restart, including KVM upgrade is most likely the culprit.
Elveos.org is a crowdfunding website for open source software. You are a free software developer? Elveos gives you a way to get paid for your work. You are a free software user? Elveos lets you fund the features you need.
It reminds me of Kickstarter, but more open. It is very nice to see offers targeting the OSS community.
If you noticed a slow down in the past minutes, one of the routers of our provider had some issues. This slowed down the services for a short period of time. As you can see on the following graph, suddenly our GET requests to monitor the response time of the services went bong. 20 second response time, this is the equivalent of dead...
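For readers curious about what such a probe can look like, here is a minimal sketch of a GET-based response time check. It is not the monitoring code actually used for Indefero: the function names, the 30 second timeout and the 20 second "equivalent of dead" threshold (taken from the incident above) are all assumptions.

```python
import time
import urllib.request


def measure_response_time(url, opener=urllib.request.urlopen, timeout=30.0):
    """Return the seconds needed to GET url and read the whole body.

    `opener` can be replaced in tests; by default it is urllib's urlopen.
    """
    start = time.monotonic()
    with opener(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def is_effectively_down(seconds, threshold=20.0):
    """Treat a response slower than the threshold as a dead service.

    The 20 second default is an assumption based on the spike described above.
    """
    return seconds >= threshold


# Example usage (performs a real network request, so it is left commented out):
# elapsed = measure_response_time("http://indefero.net")
# print(elapsed, is_effectively_down(elapsed))
```

Feeding the measured value into a time series, one point per probe, is what produces a graph where such a spike stands out.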
For your information, we are in the process of migrating Indefero's main backup server to our new infrastructure. The new infrastructure has been running for a while and we are satisfied with the stability.
We are going to do a server upgrade at the same time, moving away from Ubuntu and back to the roots, that is, Debian. Once the backup server is up and running smoothly, the main server will follow.
Update: Got a bit of instability at the same time... upgrading here and there an old server is difficult. Time to get the migration to a better system completed!
A few days ago Cheméo's laboratories went live. The labs run software experiments in the field of chemical and physical properties. They are a kind of sandbox where ideas can be tried without disturbing the main Cheméo website.
The labs are running on top of Céondo's private Platform as a Service (PaaS). This platform will soon host all the services we deliver, from our products Cheméo and Indefero to simpler websites like ceondo.com. Just in case, a status website will be kept independently, using another technology with a different provider. I will soon write a bit more about this private PaaS.
These are exciting times, the best to close 2011 and start 2012.
Improving the speed of Indefero is challenging as it requires managing a lot of moving parts, from the git/subversion backends to the database. This week, I have been working on setting up Graphite for the infrastructure. This is working pretty well and provides graphs like the following one.
This graph is extracted from a special Nginx log format which includes the time needed for Nginx to send the response back to the client. The only thing missing is that when I see a spike, I need a way to directly access the corresponding logs to figure out why. At the moment, there is no integration between these metrics and the logs.
To improve a system, one needs to know its current state. Graphite is a bit hard to set up, but afterwards it is really easy to push data in. A really nice tool.
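To give an idea of how easy pushing data is, here is a minimal sketch using Graphite's plaintext protocol, where each data point is one line of the form "path value timestamp" sent over TCP (port 2003 by default). The metric name, the host and the assumption that the custom Nginx log format ends with the request time in seconds are all hypothetical; this is not the actual Indefero setup.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one data point for Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def request_time_from_log(line):
    """Extract the request time from an access log line.

    Assumes the (hypothetical) log format puts Nginx's $request_time,
    in seconds, as the last whitespace-separated field.
    """
    return float(line.rsplit(None, 1)[-1])


def send_to_graphite(lines, host="localhost", port=2003):
    """Send already formatted lines to a Carbon plaintext listener."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Example: turn one log line into a data point (nothing is sent here).
log_line = '1.2.3.4 - - [09/Nov/2011:14:05:00 +0000] "GET /p/demo/ HTTP/1.1" 200 512 0.245'
point = graphite_line("indefero.nginx.request_time",
                      request_time_from_log(log_line), 1320847500)
print(point, end="")
```

Calling `send_to_graphite([point])` would deliver the point to a Carbon daemon listening on the given host and port.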
To run a service like Indefero, you need to log a long list of metrics to follow the load on the system, find the bottlenecks and predict the capacity needed in the future. A very powerful system for that is Graphite; the only issue is that it only stores and graphs numerical values. Of course, it cannot do otherwise, but the problem is: correlation.
Basically: once I see that every now and then a component is not performing well, how can I drill down into my data to find the reason?
Graphite tells you: this day from 14:05 to 14:07, the rendering of a git tree view was slow. Good to know, the following question is of course: why? If you store more metrics, you can maybe find that I/O was slow on the server X, you can graph together many metrics and visually correlate them. But then, why was I/O slow?
At this point, you need to go one level deeper and take a look at the logs coming from server X from 14:05 to 14:07. This can bring you up to the application level, where you figure out that a client repeatedly accessed a page which triggered a git command with a large output, thus loading the server. But to do that you need to access the logs too.
So, Graphite is wonderful, but what I need, after identifying the subsystem and time range where we have an issue, is to be able to simply scan through all the corresponding logs in that time range. This would be a kind of integration between Graphite and Graylog2.
My problem now is that Graylog2 is overkill. That is, it tries to provide full-text search on the logs, and the result is that it requires very big machinery where I just need aggregation of the logs and the equivalent of a time-based range search with filtering by component, for example
This annoys me, I do not want to build a system by myself.
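To make the requirement concrete, here is a minimal sketch of a time range search with filtering by component over aggregated logs. It is only an illustration of the idea, not an existing tool: the line layout "ISO-timestamp component message", the sample entries and the function names are all assumptions.

```python
from datetime import datetime


def parse_entry(line):
    """Split 'ISO-timestamp component message' into its three parts."""
    stamp, component, message = line.rstrip("\n").split(" ", 2)
    return datetime.fromisoformat(stamp), component, message


def search(lines, start, end, component=None):
    """Yield entries with start <= time < end, optionally for one component."""
    for line in lines:
        when, comp, message = parse_entry(line)
        if start <= when < end and (component is None or comp == component):
            yield when, comp, message


# Hypothetical aggregated log lines from several components.
LOGS = [
    "2011-11-20T14:04:59 web GET /p/demo/ 200",
    "2011-11-20T14:05:12 git rendering tree view took 18.2s",
    "2011-11-20T14:06:40 db slow query 950ms",
    "2011-11-20T14:06:55 git rendering tree view took 21.7s",
    "2011-11-20T14:07:30 git rendering tree view took 0.3s",
]

# Everything the git component logged between 14:05 and 14:07.
window_start = datetime(2011, 11, 20, 14, 5)
window_end = datetime(2011, 11, 20, 14, 7)
for when, comp, message in search(LOGS, window_start, window_end, component="git"):
    print(when.isoformat(), comp, message)
```

A linear scan like this is fine for a sketch; with real log volumes the lines would need to be stored sorted by time so that the range can be located without reading everything.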
Around 11:14 UTC today one switch of the private network went down and required a reboot. The problem has been solved but this resulted in a downtime of about 8 minutes.
Note that this is the issue with the new database server: if the link between the application server and the database server goes down, then the service is down. I will contact the support staff of OVH as, to my understanding, they have a kind of redundant system so as not to rely on a single router.
Hello, just to let you know that today November 11, the database migration is starting. You can check this blog post for updates. Here are the steps I will be performing:
This is a bit of cascading, but it will always keep several versions of the database running and it will always be possible to revert to the original DB server in case of a problem.
Last update: The system is now insanely more responsive, pleasure to use is back! If you notice anything unusual, please let me know as soon as possible.
Decane is a simple molecule but also the name of our new database server. It is a 24GB RAM/240GB SSD server with a lot of power to provide blazing fast data processing. In the next few days it will go through the standard Ganeti setup and the Indefero PostgreSQL database will be migrated over. Depending on the performance, we may migrate more database VMs to it.
This is the first time I am putting a server with SSD drives in production. I have been a heavy user of SSD drives for my desktop/laptop systems in the past two years and I must say, I will never go back to traditional drives. But of course, the amount of data stored is not the same for a desktop and for a server.
So, yes, performance increase of Indefero is on the way!
Update: Decane just joined the Ganeti cluster:
# gnt-node list
Node              DTotal  DFree MTotal MNode MFree Pinst Sinst
node1.ceondo.net    2.7T   2.2T  11.8G  4.0G  8.2G     4     0
node2.ceondo.net    2.7T   2.4T  11.8G  2.9G  9.2G     5     0
node3.ceondo.net  194.3G 194.3G  23.6G  147M 23.4G     0     0
Update 2011.11.09: The base backup of the PostgreSQL database is under way. This is a huge rsync job and it is of course slowing down the system. Please be patient... thank you.
Update 2011.11.09 20:00 UTC: Ok, the new server is now acting as a warm standby for the database, this will allow fast "failover" to the new database server after the testing period.
Update 2011.11.10 10:28 UTC: The main application server will be unavailable for a short time while it is connected to its virtual LAN, so that it can communicate directly with the warm standby over a private network. Done.
Update 2011.11.10 11:46 UTC: Now that the connection at the switch level supports the VLAN, it needs to be configured at the host and VM level. This will again trigger short downtimes here and there.
Update 2011.11.10 12:50 UTC: Ok, now setting up a second warm standby, which will take over from the current one (on the new SSD-powered server) once the latter starts acting as master. Done.
Update 2011.11.10 13:34 UTC: Ok, things are running as expected, around 21:00 UTC today, Indefero will be stopped for about 15 minutes, this will ensure that we have the warm standby with the latest version. The warm standby will be brought online as master and then the web app will connect to the new master on the new server. Immediately, I will start to populate the new warm standby. Basically a bit of cascading.
Update 2011.11.10 19:20 UTC: Too tired to do the cascading, it is never good to do so when not really fresh. I will perform it tomorrow, ok it will be during the day, but it will be only about 15 minutes of downtime. So expect a downtime of maximum 30 minutes between 09:00 and 12:00 UTC on November 11.