lipyd – A Python module for

lipidomics LC MS/MS data analysis

1: Chemical calculator

In [157]:

The mass module knows exact masses of isotopes, isotopic abundances, weights, etc. The MassBase class is able to process chemical formula, calculate masses and do arithmetics.

In [3]:
Out[3]:
22.989769282
In [6]:
Out[6]:
46.04186481376

Expression evaluation with mass.expr:

In [8]:
Out[8]:
46.04186481376
In [9]:
Out[9]:
162.0528234236

Make a deuterium:

In [11]:
Out[11]:
2.0164899481400003

Hydrogensulphate ion:

In [12]:
Out[12]:
96.96010326786924

Additional attributes can be provided in keyword arguments to carry metadata.

In [40]:
Out[40]:
'lactic acid'

A galactose:

In [15]:
Out[15]:
'C12H22O11'

2: Calculations with adducts

In [4]:

This is an oleic acid. What is the mass of the [M-H]- adduct?

In [17]:
Out[17]:
281.24860387047

We have seen a mass and wondering what is the exact mass if it is an [M+NH4]+ adduct:

In [22]:
Out[22]:
836.5426570444603

Calculate the [M+Li]+ adduct for the same molecule:

In [24]:
Out[24]:
843.558111907551

3: Metabolite model

Metabolites consist of a core and optionally substituents. Substituents might be formulas or moieties with aliphatic chains.

In [158]:

Make all combinations of halogenated methanes:

In [6]:

Check the first 3:

In [7]:
Out[7]:
[('C1H4', 16.031300129039998),
 ('C1F1H3', 34.02187826038),
 ('C1Cl1H3', 49.99232782678)]

Do the same with all alcohols up to 1-8 carbon count with 0-2 unsaturated bonds:

In [34]:
Out[34]:
[('C1H4O1', 32.02621474924),
 ('C2H6O1', 46.04186481376),
 ('C3H8O1', 60.057514878279996)]

Make some ceramides:

In [9]:
Out[9]:
[('Cer(18:1/16:0)', 'C34H67N1O3', 537.51209502622, 555.5459205795507),
 ('Cer(18:1/16:1)', 'C34H65N1O3', 535.4964449617, 553.5302705150308),
 ('Cer(18:1/18:0)', 'C36H71N1O3', 565.54339515526, 583.5772207085907),
 ('Cer(18:1/18:1)', 'C36H69N1O3', 563.52774509074, 581.5615706440708),
 ('Cer(18:1/20:0)', 'C38H75N1O3', 593.5746952843, 611.6085208376307),
 ('Cer(18:1/20:1)', 'C38H73N1O3', 591.55904521978, 609.5928707731108)]

4: Lipid definitions

In the lipyd.lipid module more than 150 lipid varieties are predefined.

In [11]:
In [16]:
In [30]:
Out[30]:
[('Cer(DH18:0/14:0)', 511.49644496170004),
 ('Cer(DH18:0/14:1)', 509.48079489718003),
 ('Cer(DH18:0/14:2)', 507.46514483266003)]

5: External databases

The lipyd.moldb module provides access to SwissLipis and LipidMaps. Automatically downloads and processes the databases which you can search by various names and identifiers, you can also access structures as OpenBabel objects, InChI and SMILE strings.

In [39]:

5.1: SwissLipids

In [40]:
        Indexing SwissLipids -- finished: 100%|██████████| 623M/623M [00:13<00:00, 47.4Mit/s]

Get phosphatidylethanolamines as OpenBabel objects:

In [155]:
In [49]:
Out[49]:
['Phosphatidylethanolamine (28:1)',
 'Phosphatidylethanolamine (39:3)',
 'Phosphatidylethanolamine (O-29:1)']

5.2: LipidMaps

In [52]:
	:: Indexed 42556 records from `cache/LMSDFDownload12Dec17FinalAll.sdf`.
In [53]:
Out[53]:
{'COMMON_NAME': 'gibberellin A17',
 'SYNONYMS': 'GA17; gibberellin 17',
 'PUBCHEM_CID': '5460657',
 'CHEBI_ID': '24236',
 'INCHI': 'InChI=1S/C20H26O7/c1-10-8-18-9-19(10,27)7-4-11(18)20(16(25)26)6-3-5-17(2,15(23)24)13(20)12(18)14(21)22/h11-13,27H,1,3-9H2,2H3,(H,21,22)(H,23,24)(H,25,26)/t11-,12-,13-,17-,18+,19+,20-/m1/s1'}

LipidMaps too is able to yield OpenBabel objects:

In [156]:

6: Lipid name parser

In order to make the databases computationally useful and use them as a combined database, we need to process their nomenclature. The lipyd.name module is able to recognize dozens of lipid names used in SwissLipids and LipidMaps.

In [67]:
In [68]:
In [70]:
In [71]:
(
  Headgroup(main='PE', sub=()),
  ChainSummary(
    c = 36,
    u = 4,
    typ = ('FA', 'FA'),
    attr = (
        ChainAttr(sph='', ether=False, oh=()),
     ChainAttr(sph='', ether=False, oh=())
        ),
    iso = None
  ),
  (
    Chain(
      c = 16,
      u = 0,
      typ = 'FA',
      attr = ChainAttr(sph='', ether=False, oh=()),
      iso = ()
    ),
   Chain(
      c = 20,
      u = 4,
      typ = 'FA',
      attr = ChainAttr(sph='', ether=False, oh=()),
      iso = ('5Z', '8Z', '11Z', '14Z')
    )
    )
  )

It understands even greek names:

In [76]:
Out[76]:
(Headgroup(main='FA', sub=()),
 ChainSummary(c=20, u=5, typ=('FA',), attr=(ChainAttr(sph='', ether=False, oh=()),), iso=None),
 [Chain(c=20, u=5, typ='FA', attr=ChainAttr(sph='', ether=False, oh=()), iso=())])

7: Combined molecule database

In [56]:
        Indexing SwissLipids -- finished: 100%|██████████| 623M/623M [00:13<00:00, 16.0Mit/s]
	:: Indexed 42556 records from `cache/LMSDFDownload12Dec17FinalAll.sdf`.
        Generating metabolites -- finished: 100%|██████████| 44.0/44.0 [00:09<00:00, 4.50it/s]
        Generating metabolites -- finished: 100%|██████████| 18.0/18.0 [00:05<00:00, 4.10it/s]
        Generating metabolites -- finished: 100%|██████████| 106/106 [00:09<00:00, 11.4it/s] 
        Generating metabolites -- finished: 100%|██████████| 1.00/1.00 [00:00<00:00, 11.9it/s]
        Generating metabolites -- finished: 100%|██████████| 1.00/1.00 [00:00<00:00, 48.9it/s]

Either exact masses or adducts can be searched in the database by lookup and adduct_lookup methods, respectively.

In [60]:

Take a closer look at one of the resulted records:

In [62]:
LipidRecord(
  lab = LipidLabel(db_id=None, db='lipyd.lipid', names=('PC(33:4)',)),
  hg = Headgroup(main='PC', sub=()),
  chainsum = ChainSummary(
      c = 33,
      u = 4,
      typ = ('FA', 'FA'),
      attr = (
          ChainAttr(sph='', ether=False, oh=()),
      ChainAttr(sph='', ether=False, oh=())
          ),
      iso = None
    ),
  chains = ()
)

The exact masses and errors for all hits are also provided. Errors in ppm:

In [65]:
Out[65]:
array([-1.69750819e-02, -1.69750819e-02, -2.69161082e-02, -2.69161082e-02,
       -2.69161082e-02, -2.69161082e-02, -3.04974547e-02, -3.04974547e-02,
       -3.04974547e-02, -3.04974547e-02, -3.04974547e-02, -3.04974547e-02,
       -3.04974547e-02, -3.04974547e-02, -3.04974547e-02, -3.04974547e-02,
       -3.04974547e-02, -3.04974547e-02, -3.04974547e-02, -3.04974547e-02,
       -3.04974547e-02, -3.04974547e-02, -3.04974547e-02, -3.04974547e-02,
       -3.04974547e-02, -3.04974547e-02, -3.04974547e-02, -3.04974547e-02,
       -3.04974547e-02, -3.04974547e-02, -3.04974547e-02, -3.23033863e+00,
       -3.23033863e+00, -3.23033863e+00, -1.11723392e+01,  1.73995818e+01,
        1.73995818e+01,  1.73995818e+01])

Repeat the lookup with lower tolerance, and the items with high ppm disappear:

In [66]:
Out[66]:
array([-0.01697508, -0.01697508, -0.02691611, -0.02691611, -0.02691611,
       -0.02691611, -0.03049745, -0.03049745, -0.03049745, -0.03049745,
       -0.03049745, -0.03049745, -0.03049745, -0.03049745, -0.03049745,
       -0.03049745, -0.03049745, -0.03049745, -0.03049745, -0.03049745,
       -0.03049745, -0.03049745, -0.03049745, -0.03049745, -0.03049745,
       -0.03049745, -0.03049745, -0.03049745, -0.03049745, -0.03049745,
       -0.03049745, -3.23033863, -3.23033863, -3.23033863])

8: MS2 fragment definitions

The fragment database provided by lipyd.fragment and lipyd.fragdb modules works similar way as lipyd.lipid and lipyd.moldb. lipyd.fragment contains near 100 predefined aliphatic chain derived fragments. In addition 140 headgroup derived fragments are included like for example 184 for choline.

In [79]:

As an example take a look at a [Sph-NH2-OH]- fragment:

In [85]:

At this fragment type the constraints tell us which lipid varieties this fragment can be observed. In this case dCer and DHCer.

In [86]:
Out[86]:
(FragConstraint(hg='Cer', family=None, sub=None, sph='d', oh=0, chaintype='Sph'),
 FragConstraint(hg='Cer', family=None, sub=None, sph='DH', oh=0, chaintype='Sph'))
In [87]:
Out[87]:
(-1, 263.2380392001693)

9: MS2 fragment database

Look up a negative mode fragment m/z in the database. It results an array with mass, fragment name, fragment type, aliphatic chain type, carbon count, unsaturation and charge in each row. At neutral losses the charge is 0.

In [89]:
Out[89]:
array([[283.2642539494093, '[FA(18:0)-H]-', 'FA-H', 'FA', 18, 0, -1],
       [283.2642539494093, '[Sph(20:0)-C2H4-NH2-2H]-', 'Sph-C2H4-NH2-2H',
        'Sph', 20, 0, -1]], dtype=object)

Now let's annotate an MS2 scan with possible fragment identifications. To do this we open an example MGF file included in the module. The lipyd.mgf module serves MS2 scans from MGF files on demand. Btw the lipyd.settings module gives easy access for and control over near 100 customizable parameters.

In [104]:

We found the following scans for precursor 590.455:

In [105]:
Out[105]:
array([1941, 1929,  427,  423,  589,  645,  642,  308,  481,  478,  586,
        368,  696,  535,  532,  755,  752,  700,  721])

Select a scan from the ones above and annotate its fragments:

In [106]:

One example of the annotations. This fragment ranks 25 by intensity.

In [112]:
(
  FragmentAnnotation(
    mz = 228.23219101229077,
    name = '[Sph(14:0)-H2O+H]+',
    fragtype = 'Sph-H2O+H',
    chaintype = 'Sph',
    c = 14,
    u = 0,
    charge = 1
  ),
  FragmentAnnotation(
    mz = 228.23219101229077,
    name = '[FA(14:0)+NH2-O]+',
    fragtype = 'FA+NH2-O',
    chaintype = 'FA',
    c = 14,
    u = 0,
    charge = 1
  ),
  FragmentAnnotation(
    mz = 228.23219101229077,
    name = '[Sph(14:1)-O+H]+',
    fragtype = 'Sph-O+H',
    chaintype = 'Sph',
    c = 14,
    u = 1,
    charge = 1
  )
  )

10: MS2 spectrum analysis

The lipyd.ms2.Scan class is able to perform the entire identification workflow. By an alternative constructor method it can be initialized by providing and MGF file and scan ID.

In [114]:
In [138]:

If not provided the Scan instance performs the database lookup of the precursor ion. Here are the results:

In [143]:
LipidRecord(
  lab = LipidLabel(db_id=None, db='lipyd.lipid', names=('Hex2-Cer(t42:2)',)),
  hg = Headgroup(main='Cer', sub=('Hex2',)),
  chainsum = ChainSummary(
      c = 42,
      u = 2,
      typ = ('Sph', 'FA'),
      attr = (
          ChainAttr(sph='t', ether=False, oh=()),
      ChainAttr(sph='', ether=False, oh=())
          ),
      iso = None
    ),
  chains = ()
)

The identify method attempts to confirm each of the records by analysing the MS2 spectrum.

In [145]:

The results are grouped by lipid species and come with a score. Hex2-Cer(t42:2) got a score of 45, which is the highest at this scan:

In [148]:
MS2Identity(
  score = 45,
  hg = Headgroup(main='Cer', sub=('Hex2',)),
  chainsum = ChainSummary(
      c = 42,
      u = 2,
      typ = ('Sph', 'FA'),
      attr = (
          ChainAttr(sph='t', ether=False, oh=()),
      ChainAttr(sph='', ether=False, oh=())
          ),
      iso = None
    ),
  chains = (
      Chain(
        c = 18,
        u = 1,
        typ = 'Sph',
        attr = ChainAttr(sph='t', ether=False, oh=()),
        iso = ()
      ),
    Chain(
        c = 24,
        u = 1,
        typ = 'FA',
        attr = ChainAttr(sph='', ether=False, oh=()),
        iso = ()
      )
      ),
  details = ChainIdentificationDetails(rank = (0, None), i = (1.0, None), fragtype = ('Sph-2xH2O-H', None))
)

At the same time there were attempts to confirm for example Hex-Cer(d53:9-2OH) but it resulted a score of 0.

In [151]:
MS2Identity(
  score = 0,
  hg = Headgroup(main='Cer', sub=('Hex',)),
  chainsum = ChainSummary(
      c = 53,
      u = 9,
      typ = ('Sph', 'FAOH'),
      attr = (
          ChainAttr(sph='d', ether=False, oh=()),
      ChainAttr(sph='', ether=False, oh=('2OH',))
          ),
      iso = None
    ),
  chains = (
      Chain(
        c = 17,
        u = 1,
        typ = 'Sph',
        attr = ChainAttr(sph='d', ether=False, oh=()),
        iso = ()
      ),
    Chain(
        c = 36,
        u = 8,
        typ = 'FAOH',
        attr = ChainAttr(sph='', ether=False, oh=('2OH',)),
        iso = ()
      )
      ),
  details = ChainIdentificationDetails(
      rank = (1, None),
      i = (0.18172325900094663, None),
      fragtype = ('Sph-2xH2O+H', None)
    )
)

Let's see one more example.

In [152]:

We see that this is a PI(34:1) with score 11 and both fatty acyl fragments are confirmed by [FA-H]- ions (see the ChainIdentificationDetails object). These fragments are the 1st and 2nd most abundant with relative intensities of 100% and 99%.

In [154]:
MS2Identity(
  score = 11.0,
  hg = Headgroup(main='PI', sub=()),
  chainsum = ChainSummary(
      c = 34,
      u = 1,
      typ = ('FA', 'FA'),
      attr = (
          ChainAttr(sph='', ether=False, oh=()),
      ChainAttr(sph='', ether=False, oh=())
          ),
      iso = None
    ),
  chains = (
      Chain(
        c = 18,
        u = 1,
        typ = 'FA',
        attr = ChainAttr(sph='', ether=False, oh=()),
        iso = ()
      ),
    Chain(
        c = 16,
        u = 0,
        typ = 'FA',
        attr = ChainAttr(sph='', ether=False, oh=()),
        iso = ()
      )
      ),
  details = ChainIdentificationDetails(rank = (0, 1), i = (1.0, 0.9897746748655405), fragtype = ('FA-H', 'FA-H'))
)
In [ ]: