This week, the main work will be a sprint to ship some new functionality for Cheméo. A sprint week is a week focused entirely on a single project. I have found this approach to be the best way to move my projects forward.
The focus will be on:
- better indexing of the data. Some queries run within 10 ms while others take 250 ms. I need to get all queries below 50 ms;
- a personal interface to manipulate your list of components and run analyses on their properties.
The challenge with Cheméo is that it ingests a large quantity of data which needs to be curated, deduplicated and merged. I am really happy to see the work of IUPAC on the InChI gaining more and more traction, because the current approach of relying most of the time on the CAS number is a nightmare: you cannot derive the CAS number from the molecular structure. Basically, you are always a "typo away" from a bad identifier for your components, and you need to deal with all the copyright nonsense from CAS (if you are living in the US). In Europe, you do not have copyright on the data in databases, and the copyright on numbers that CAS tries to claim has never been tested in court.
Anyway, we are manipulating too many molecules, too fast, in too many different fields; the CAS registry cannot scale by manually giving each structure an arbitrary number. This is good: it means that the CAS monopoly will fail on its own.
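To illustrate why a structure-derived identifier helps with the deduplication and merging mentioned above, here is a minimal sketch in Python. It is not the code used in Cheméo: the record layout, the `merge_by_inchi` function and the property values are all made up for the example. The only point is that two records carrying different CAS numbers (one of them with a typo) still end up merged when they are keyed on the InChI.

```python
# Hypothetical records coming from two data sources. The field names and
# property values are illustrative assumptions, not the Cheméo schema.
records = [
    {"source": "A", "inchi": "InChI=1S/H2O/h1H2", "cas": "7732-18-5",
     "name": "water", "tb_K": 373.15},
    # Same molecule, but the CAS number has a typo in the last digit.
    {"source": "B", "inchi": "InChI=1S/H2O/h1H2", "cas": "7732-18-6",
     "name": "Water", "mw": 18.015},
    {"source": "B", "inchi": "InChI=1S/CH4/h1H4", "cas": "74-82-8",
     "name": "methane", "mw": 16.043},
]


def merge_by_inchi(records):
    """Group records by InChI and merge their properties.

    All CAS numbers seen for a structure are kept so that conflicts can be
    reviewed later. For properties, the first value encountered wins.
    """
    merged = {}
    for rec in records:
        entry = merged.setdefault(
            rec["inchi"],
            {"inchi": rec["inchi"], "cas": set(), "properties": {}},
        )
        entry["cas"].add(rec["cas"])
        for key, value in rec.items():
            if key not in ("inchi", "cas", "source"):
                entry["properties"].setdefault(key, value)
    return merged


if __name__ == "__main__":
    for inchi, entry in merge_by_inchi(records).items():
        flag = " <- conflicting CAS numbers" if len(entry["cas"]) > 1 else ""
        print(inchi, sorted(entry["cas"]), entry["properties"], flag)
```

Keyed on the CAS number, the two water records would have stayed separate; keyed on the InChI, they merge and the conflicting CAS numbers are flagged for curation.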
Oh, a very nice 3D molecular viewer: CanvasMol.