Category Archives: multivariate analysis

Different kind of map. Map of road deaths

The UN has launched a ‘decade for action’ to tackle road traffic accidents, which kill more people around the world than malaria, and are the leading cause of death for young people – especially in developing countries.

I wanted to visualize the most recent data on traffic deaths and injuries, from the 2009 Global status report on road safety, using principal component analysis (PCA). I used a subset of countries for which all of the data were available, which also makes the “statistical map” less cluttered by small countries.

The map shows countries (green squares) and statistics (red diamonds). The closer countries are to each other on the map, the more similar they are in the parameters describing them. In this case those are the number of deaths, the percentage of each type of death, GNI, etc. The closer a parameter lies to a group of countries, the more significant it is (the larger its values) for that group. For example, Russia, Iran, Chile and South Africa have the largest number of deaths per capita and percentage of pedestrians killed (the two red diamonds that are closest to this group of countries).
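
For readers who want to build this kind of map themselves, here is a minimal sketch of how biplot coordinates can be computed with PCA. It is not the exact procedure behind the map above: the country labels, statistic names and numbers are made up for illustration, and autoscaling (mean-centering and scaling to unit variance) is assumed as the preprocessing step.

```python
import numpy as np

def pca_biplot_coordinates(X, n_components=2):
    """Return (scores, loadings, explained) for an autoscaled data matrix."""
    X = np.asarray(X, dtype=float)
    # Autoscale: mean-center each column and scale it to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA through the singular value decomposition Z = U S Vt.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # one row per country
    loadings = Vt[:n_components].T                     # one row per statistic
    explained = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, loadings, explained

# Hypothetical numbers, for illustration only
# (rows: countries, columns: statistics).
countries = ["A", "B", "C", "D", "E"]
statistics = ["deaths_per_capita", "pct_pedestrians", "gni"]
X = np.array([
    [25.0, 40.0, 5.0],
    [22.0, 38.0, 7.0],
    [6.0, 12.0, 40.0],
    [5.0, 10.0, 45.0],
    [12.0, 20.0, 20.0],
])

scores, loadings, explained = pca_biplot_coordinates(X)
```

Plotting the rows of `scores` as squares and the rows of `loadings` as diamonds on the same axes gives a biplot: countries with similar rows in `X` end up close together, and a statistic points toward the countries where its values are largest.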

The resulting map (a biplot of principal components) speaks for itself. In poor countries, the majority of road deaths are pedestrians, with cyclists and bicyclists following behind. Developed countries have more vehicles and a larger percentage of deaths in car accidents. Interestingly, Japan, which has the largest fleet, has a small number of deaths in cars. The Netherlands, not surprisingly given how many bicyclists it has, stands apart from the rest of Europe and other developed countries, with a larger percentage of bicyclist deaths.


Optimization of ink composition based on a non-platinum cathode for single membrane electrode assembly proton exchange membrane fuel cells

The paper on research we did a long time ago is out.

XPS structural information is correlated with electrochemical performance and stability in the fuel cell by principal component analysis.

Non-Pt based oxygen reduction catalyst fuel cell performance is reported for various electrode compositions. Ink formulations for pyrolyzed Co porphyrin based cathode electrocatalysts were evaluated in a membrane electrode assembly (MEA) configuration, and X-ray photoelectron spectroscopy (XPS) was performed on the MEA catalyst layers. The effects of the cooling time trajectory of the catalysts after pyrolysis and of the Nafion content in the ink formulation were studied. By building statistical structure-to-property relationships between XPS and MEA performance using multivariate analysis, we have determined that the higher stability of inks containing fast-cooled catalyst is mainly associated with better preserved graphitic carbon from the carbon black and C–F moieties of the Nafion, while better MEA performance is a result of the presence of these moieties as well as pyridinic nitrogen and nitrogen associated with metal in the pyropolymer. The optimal Nafion content is determined to be a 1:1 catalyst:Nafion weight ratio, while higher Nafion concentrations cause oxidation of the Nafion backbone itself as well as leaching of the CoxOy particles from the catalyst and formation of oxidized species of Co, O, C and F.

What data behind “change in trust in science” really show

A post by Razib Khan made me want to look a bit more closely at the data behind the questionable change in trust in science from 1998 to 2008.

The question of whether trust in science vs. religion was affected by the “broadsides against religion” was approached by asking respondents whether they agree with this statement: “We trust too much in science and not enough in religious faith.” The possible responses were:

– Strongly agree
– Agree
– Neither agree nor disagree
– Disagree
– Strongly disagree

The data are right here:

Looking at these data, Razib drew a very reasonable conclusion: “don’t see much difference”.

I could not pass up an opportunity to apply principal component analysis to the table above.

The biplot below shows both responses and demographic categories.

Demographic categories in 1998 are shown in green and those in 2008 in blue. Arrows connect the same demographic category between the two years. It is clear that there is no change in the total and in the majority of individual categories.

However, three peculiarities caught my attention: the three red arrows on the plot, which show quite a significant change. That’s why I love PCA: it is an easy way to visualize data with multiple variables, and the data are still there for us to explore (some think that PCA is black magic that eats all the data away)!

So, back to the original data. The changes among those with “none” as religious preference and those with “liberal” political views are quite similar (such overlap between these groups is not surprising): a large share of people who were uncertain (“neither”) have moved into the “disagree” group. For the “independent” class, responses in all categories changed except those in the “agree” group. (An interesting observation in itself; does it indicate the fluid, unpredictable character of independent voters?) A large share of “strongly agree” and “neither” is lost (from 14% to 5% and from 34% to 28%, respectively), while the “disagree” share grew from 22% to 36%.
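
As a quick arithmetic check on the “independent” numbers quoted above, here is a small sketch. It assumes that the five response shares sum to 100% in both years and that “agree” did not change at all; with rounded percentages the result is only approximate.

```python
# Percentage-point changes for the "independent" group, 1998 -> 2008,
# taken from the figures quoted in the text.
changes = {
    "strongly agree": 5 - 14,
    "agree": 0,              # assumption: "did not change" means exactly 0
    "neither": 28 - 34,
    "disagree": 36 - 22,
}

# Assumption: the five shares sum to 100% in both years,
# so the five changes must sum to zero.
changes["strongly disagree"] = -sum(changes.values())

# Net shift toward the two responses that side with science.
science_side = changes["disagree"] + changes["strongly disagree"]
```

Under these assumptions “strongly disagree” moves by about +1 point, so the two responses that side with science gain about 15 points in total, all of it drawn from “strongly agree” and “neither”.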

To sum up, careful analysis of the data shows that in all three categories of respondents with the largest changes from 1998 to 2008, the group supporting science grew (the “disagree” and “strongly disagree” responses). The major source of this growth seems to be the pool of those with a neutral opinion (“neither”), except for independents, among whom a large percentage of those who “strongly agree” also switched to “disagree”.

So, I am confused. People in the conservative and religious groups, who would be affected in the way Robert Wright hypothesizes, did not show changes in how they view trust in science vs. religion. At the same time, more people from liberal groups disagree with the statement, indicating that they trust science more than before. How exactly is this a sign of weakening trust in science?

By the way, I find the statement to be a pretty confusing way to ask such a straightforward question…

Is fast, label-free detection of viruses, toxins or even DNA fragments possible in nanochannels?

Fluids confined in nanometer-sized structures exhibit physical behaviors not observed in larger structures, such as those of micrometer dimensions and above, because the characteristic physical scaling lengths of the fluid very closely coincide with the dimensions of the nanostructure itself. For example, confinement of molecular transport in fluidic channels with transport-limiting pore sizes of nanoscopic dimensions gives rise to unique molecular separation capabilities. Such nanofluidic structures are used widely for separating fluids with disparate characteristics. The development of bio-nanofluidic technology for chip-based analysis systems makes it possible to investigate DNA behavior at the single-molecule level.

Various molecular separation techniques, such as nanochannel electrophoresis, microchannel capillary electrophoresis and gel electrophoresis, rely on differences in the velocities at which molecules move, which arise from differences in size, charge or a combination of the two. After sufficient time has passed, clearly visible bands of separated molecules can be observed using one of various detection schemes.

Think about two fish moving down a stream. If they look identical and weigh the same, how do we know whether one of them has eaten a smaller fish for dinner? And that is a very important question, believe me! The only way to answer it is to take your stopwatch and wait for both fish to swim far enough down the stream that the difference in their velocities becomes apparent, which depends on the time sensitivity of your stopwatch.

In this analogy the fish is the antibody, and the dinner is the antigen of the biochemical world. If no antigen is present, a single band of antibody will move down the nanofluidic channel with velocity v1. When antigen is present, however, part of the antibody will form an antibody-antigen complex and move with a slower velocity v2, while the rest of the antibody will stay unbound and travel with the same velocity v1. As time passes, this results in two clear bands separated along the length of the separation platform.
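
The waiting time in this picture can be estimated with a few lines of arithmetic. The sketch below is only an illustration under simple assumptions: both bands start at the same position, move at constant velocities, do not broaden, and count as “clearly separated” once their centers are one band width apart. The function names and the numbers are hypothetical.

```python
def band_separation(v1, v2, t):
    """Distance between the two band centers after time t."""
    return abs(v1 - v2) * t

def time_to_resolve(v1, v2, band_width):
    """Time until the band centers are one band width apart."""
    if v1 == v2:
        return float("inf")   # identical velocities never separate
    return band_width / abs(v1 - v2)

# Hypothetical values: velocities in micrometers per second,
# band width in micrometers.
v1, v2, band_width = 10.0, 8.0, 50.0
t_resolve = time_to_resolve(v1, v2, band_width)   # 25.0 seconds
```

The smaller the difference between v1 and v2, the longer the wait, which is why a method that does not need visibly separated bands is attractive.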

If the two separated molecules are fluorescently labeled with different dyes, they can be imaged by fluorescence microscopy. This is shown in the example below, where a model receptor/toxin system has been separated by capillary electrophoresis.

Green-labeled GM1 forms a complex with red-labeled CTB and moves more slowly than the excess of unbound GM1. A clear green band of GM1 is followed by an orange band of the complexed receptor/toxin mixture, confirming the presence of toxin in the system.

This is the main principle of using separation assays to detect various viruses, toxins, etc. The problem with all these detection systems is that they often require labeling of analytes and binding agents with dyes, and it may sometimes take a long time before the bands are separated clearly enough to be certain that the analyte is present. The two different flow velocities, as in the example with the fish, are either obvious from visual analysis of the images (clearly separate bands) or can be determined by manual calculations from images as a function of separation time, which is a tedious, time-consuming and quite subjective process that depends on the analyst doing the calculations.

This is where the patent “Method for multivariate analysis of confocal temporal image sequences for velocity estimation” comes in handy. It allows identifying whether two flow velocities are present in images acquired as a function of separation time, based on the way the image intensities themselves change with time.

  • The first, very important benefit of this methodology is that this can be done at the very beginning of the experiment, when no clearly visible separation is present, with as few as the first four acquired images.
  • The second benefit is that no labeling of molecular species is necessary, as the presence of two flow velocities can be determined from the grayscale intensity of the Green channel, the Red channel, or the overall RGB image converted to grayscale.
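
To give a feel for why a few early grayscale frames can already reveal a second velocity, here is a toy sketch. It is not the patented method (which relies on multivariate curve resolution); it swaps in a much simpler statistic, the spatial variance of the intensity profile, and it runs on synthetic one-dimensional Gaussian bands in arbitrary units. All names and numbers are assumptions made for illustration.

```python
import numpy as np

def profile_variance(frames, x):
    """Spatial variance of each 1-D intensity profile (one row per frame)."""
    frames = np.asarray(frames, dtype=float)
    weights = frames / frames.sum(axis=1, keepdims=True)
    mean = weights @ x
    return (weights * (x[None, :] - mean[:, None]) ** 2).sum(axis=1)

def simulate(x, times, velocities, fractions, width=5.0):
    """Synthetic frames: Gaussian bands moving at the given velocities."""
    frames = np.zeros((len(times), len(x)))
    for v, f in zip(velocities, fractions):
        centers = 50.0 + v * times
        frames += f * np.exp(-0.5 * ((x[None, :] - centers[:, None]) / width) ** 2)
    return frames

x = np.linspace(0.0, 200.0, 2001)   # position along the channel
times = np.arange(4.0)              # the first four frames only

one = simulate(x, times, [10.0], [1.0])             # single velocity
two = simulate(x, times, [10.0, 8.0], [0.5, 0.5])   # two velocities

var_one = profile_variance(one, x)
var_two = profile_variance(two, x)
growth_one = var_one[-1] - var_one[0]   # stays near zero
growth_two = var_two[-1] - var_two[0]   # grows as the bands drift apart
```

In the two-velocity case the bands still overlap into a single visible peak after four frames, yet the profile variance has already grown, while with one velocity it stays constant. Real image sequences add noise and band broadening, which is where a proper multivariate treatment earns its keep.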

We have shown this in “Detecting molecular separation in nano-fluidic channels through velocity analysis of temporal image sequences by multivariate curve resolution”, published in the journal Microfluidics and Nanofluidics.

Visualizing Life Satisfaction data by Multivariate Analysis

This week the OECD relaunched its Better Life Index and provided the data behind it.

I’ve applied multivariate statistical data analysis methods to the average values, and you can see the results below. Quite interesting groups of countries have emerged.

The X axis separates countries with a high Life Satisfaction index from those with a low one. The Y axis separates countries by job availability.

  • The most satisfied group of countries is in the top right quadrant and consists entirely of highly developed countries.
  • The least satisfied group of countries is in the bottom left quadrant of the plot, with unemployment being the major factor contributing to their dissatisfaction. Unemployment is highest for the Eastern European bloc, which has experienced economic difficulties in recent years.
  • Countries in the top left quadrant are less happy than those in the top right quadrant, but not by much. The major factors are a high level of crime and long, hard working hours. The least satisfied in this group is Turkey (farthest on the plot from the Life Satisfaction index). Interestingly, Israel has some of the highest wealth and health indicators and the lowest crime, but at the same time long working hours and worse housing conditions.
  • Countries in the center of the plot are where all indicators balance out. The level of life satisfaction for this group falls in between the other groups. It is balanced out by not very high “positive” indicators, such as wealth and health, and not very high “negative” indicators, such as unemployment and crime.
  • Education does not seem to affect life satisfaction as much as other parameters. It is lowest for the group of countries in the top left quadrant and highest for the group in the top right quadrant, yet both of these groups are quite satisfied with life.
A separate analysis of the values for women and men will be done soon as well.
PLS-DA (PLS_Toolbox 6. in Matlab) was used, with autoscaling as the preprocessing option.
Original data used for analysis: