Data Explorer

Data Explorer¶

Table of Contents¶

Introduction
Setup
Part I. Event detection
Time series analyses:

Electric Organ Discharge (EOD) Physiology¶

As you explore your data, process and analyze it, think about some of the following questions:

Why were there two sets of differential electrodes (what if there had only been one)?
Why do events in the signal look different from each other (when do they look similar)?
How does the sampling rate of the Analog-to-Digital conversion effect the “raw” signal and my ability to observe EODs?
How are EOD events distributed through time?

First, go through this analysis using your group’s 50kHz recording.
Then, use other recordings as needed to address Prompts in your Responses notebook.

Setup¶

Import and define functions

#@title {display-mode: "form" }

#@markdown Run this code cell to import packages and define functions 
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import ndimage
from scipy.signal import hilbert,medfilt,resample, find_peaks, unit_impulse
import seaborn as sns
from datetime import datetime,timezone,timedelta
pal = sns.color_palette(n_colors=15)
pal = pal.as_hex()
import matplotlib.pyplot as plt
import random

from pathlib import Path

from ipywidgets import widgets, interact, interactive
%config InlineBackend.figure_format = 'retina'
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/course-content/master/nma.mplstyle")

print('Task completed at ' + str(datetime.now(timezone(-timedelta(hours=5)))))

Mount Google Drive

#@title {display-mode: "form" }

#@markdown Run this cell to mount your Google Drive.

from google.colab import drive
drive.mount('/content/drive')

print('Task completed at ' + str(datetime.now(timezone(-timedelta(hours=5)))))

Import data digitized with Nidaq USB6211 and recorded using Bonsai-rx as a .bin file

If you would like sample this Data Explorer, but do not have data, you can download an example from here and then upload the file to Google Colab (or access the file through Drive after uploading it to your Drive). If you are using this example file, the samplerate was 50000 on two channels (each channel was a set of bipolar electrodes perpendicular to each other with a fisn in the middle).

#@title {display-mode: "form" }

#@markdown Specify the file path 
#@markdown to your recorded data on Drive (find the filepath in the colab file manager:

filepath = "full filepath goes here"  #@param 
# filepath = '/Users/kperks/Downloads/bribrach_samplerate100kHz.bin'

#@markdown Specify the sampling rate and number of channels recorded.

sampling_rate = 100000 #@param
number_channels = 2 #@param

# downsample = False #@param
# newfs = 10000 #@param

#@markdown After you have filled out all form fields, 
#@markdown run this code cell to import the data. 

filepath = Path(filepath)

# No need to edit below this line
#################################
data = np.fromfile(Path(filepath), dtype = np.float64)
data = data.reshape(-1,number_channels)
data_dur = np.shape(data)[0]/sampling_rate
print('duration of recording was %0.2f seconds' %data_dur)

fs = sampling_rate
# if downsample:
#     # newfs = 10000 #downsample emg data
#     chunksize = int(sampling_rate/newfs)
#     data = data[0::chunksize,:]
#     fs = int(np.shape(data)[0]/data_dur)

time = np.linspace(0,data_dur,np.shape(data)[0])

print('Data upload completed at ' + str(datetime.now(timezone(-timedelta(hours=5)))))

#@title {display-mode: "form"}

#@markdown Run this code cell to plot your imported data. <br> 
#@markdown Use the range slider to scroll through the data in time.
#@markdown Be patient with the range refresh... the more data you are plotting the slower it will be. 

slider = widgets.FloatRangeSlider(
    min=0,
    max=data_dur,
    value=(0,1),
    step= 0.05,
    readout=True,
    continuous_update=False,
    description='Data Time Range (s)',
    style = {'description_width': '200px'})
slider.layout.width = '600px'

# a function that will modify the xaxis range
def update_plot(x):
    fig, ax = plt.subplots(figsize=(10,5),num=1); #specify figure number so that it does not keep creating new ones
    starti = int(x[0]*fs)
    stopi = int(x[1]*fs)
    ax.plot(time[starti:stopi], data[starti:stopi,:])

w = interact(update_plot, x=slider);

For a more extensive RAW Data Explorer than the one provided in the above figure, use the DataExplorer.py application found in the howto section of the course website.

Part I. Event Detection¶

To analyse the distribution of EOD events acrosst time, we need to figure out the times of each EOD. We will do this by detecting ‘peaks’ in the signal. Instead of detecting peaks on just one channel, we will use a combination of signals from both channels (do you remember why we would want to do this?). First, the signal on each channel is “rectified” (absolute value) and then the signal is summed across all channels (if you recorded more than one). Using this processed signal, we will detect peaks.

Python has built-in algorithms for detecting “peaks” in a signal. However, it will detect all peaks. Therefore, we need to specify a threshold for peak detection (the minimum height that can count as a peak).

#@title {display-mode: "form"}

#@markdown Run this cell to create an interactive plot with a slider to scroll 
#@markdown through the processed EOD signal from all measurement channels 
#@markdown and select an appropraite EOD event detection threshold

y = data - np.median(data)
y = np.sum(np.abs(y),1)

slider_xrange = widgets.FloatRangeSlider(
    min=0,
    max=data_dur,
    value=(0,1),
    step= 0.1,
    readout=True,
    continuous_update=False,
    description='Time Range (s)',
    style = {'description_width': '200px'})
slider_xrange.layout.width = '600px'

slider_yrange = widgets.FloatRangeSlider(
    min=np.min(y)-0.5,
    max=np.max(y)+0.5,
    value=[np.min(y),np.max(y)],
    step=0.05,
    continuous_update=False,
    readout=True,
    description='yrange')
slider_yrange.layout.width = '600px'

slider_threshold = widgets.FloatSlider(
    min=np.min(y)-0.1,
    max=np.max(y)+0.1,
    value=np.mean(y)+np.std(y)*5,
    step=0.01,
    continuous_update=False,
    readout=True,
    description='event threshold',
    style = {'description_width': '200px'})
slider_threshold.layout.width = '600px'


# a function that will modify the xaxis range
def update_plot(thresh_,xrange): #,yrange):
    if np.diff(xrange)>0:
        fig, ax = plt.subplots(figsize=(10,5),num=1); #specify figure number so that it does not keep creating new ones

        d = 0.0003*fs #minimum time allowed between distinct events
        r = find_peaks(y,height=thresh_,distance=d)

        eod_times = r[0]/fs

        starti = int(xrange[0]*fs)+1
        stopi = int(xrange[1]*fs)-1
    
        ax.plot(time[starti:stopi], y[starti:stopi], color='black')

        # ax.plot(tmp,color='black')
        ax.hlines(thresh_, time[starti],time[stopi],linestyle='--',color='green')
        ax.scatter(eod_times,[thresh_]*len(eod_times),marker='^',s=50,color='purple',zorder=3)
        # ax.set_ylim(yrange[0],yrange[1])
        ax.set_xlim(xrange[0],xrange[1])

        return eod_times

w_events_ = interactive(update_plot, thresh_=slider_threshold, xrange=slider_xrange)#, yrange=slider_yrange);
display(w_events_)

#@title {display-mode: "form"}

#@markdown After you are satisfied with your threshold setting, 
#@markdown run this cell to store the list of EOD times in a variable (an array) called "eod_times".
eod_times = w_events_.result

Once you know the times of each peak (each event), we can look at the waveforms of those events. To do this, we plot the signal at the event time and some duration before and after that event time.

#@title {display-mode: "form"}

#@markdown Now, run this code cell to create an interactive plot to examine individual EOD events across channels.
#@markdown You can select one channel at a time or hold 'command' or 'control' key on your keyboard while you mouse click to select multiple channels


slider_xrange = widgets.FloatSlider(
    min=0.05,
    max=2,
    value=0.6,
    step=0.05,
    continuous_update=False,
    readout=True,
    description='xrange (ms)'
)
slider_xrange.layout.width = '600px'

slider_yrange = widgets.FloatRangeSlider(
    min=np.min(np.min(data)),
    max=np.max(np.max(data)),
    value=[np.min(np.min(data)),np.max(np.max(data))],
    step=0.01,
    continuous_update=False,
    readout=False,
    description='yrange'
)
slider_yrange.layout.width = '600px'

slider_eod = widgets.IntSlider(
    min=0,
    max=len(eod_times),
    value=0,
    step= 1,
    continuous_update=False,
    description='EOD number')
slider_eod.layout.width = '600px'

select_chan = widgets.SelectMultiple(
    options=np.arange(np.shape(data)[1]), # start with a single trial on a single bout... it will update when runs ,
    value=[0],
    #rows=10,
    description='Channels',
    disabled=False
)


# a function that will modify the xaxis range
def update_plot(eodi,chan_list,xrange,yrange):
    fig, ax = plt.subplots(figsize=(10,5),num=1); #specify figure number so that it does not keep creating new ones
    
    eod_range = xrange/2
    win_ = int(eod_range/1000*fs)
    
    colors = ['green','purple']
    for chan in chan_list:
        events = np.asarray([data[(int(fs*t)-win_):(int(fs*t)+win_),chan] for t in eod_times 
              if (((int(fs*t)-win_)>0) & ((int(fs*t)+win_)<np.shape(data)[0]))]).T
        etime = np.linspace(-eod_range,eod_range,win_*2)

        ax.plot(etime,events[:,eodi],color=colors[chan],linewidth=3,label='channel ' + str(chan))
    ax.set_ylim(yrange[0],yrange[1]);
    ax.legend()

w = interact(update_plot, eodi=slider_eod, chan_list=select_chan, xrange=slider_xrange, yrange=slider_yrange);

Take some time to explore and observe the variation in EOD waveforms. You can use plots you create with this interactive tool in your Responses as needed.

Importantly, now that we have EOD times, we can analyze how EODs are distributed across time. There are a fundamental set of quantitative techniques commonly use to describe event time series (such as spikes from a neuron or EODs from a fish). This notebook introduces you to three of them:

rate
isi
convolution

In Part II - Part IV, you will work with each of these analyses.

Part II. Rate¶

Average rate¶

First, think about how you would calculate the average EOD rate (given your detected EOD times and knowledge of the duration of your recording). Now, you can implement that calculation with computer code (in this case python). To do so, you will need the following tips…

Tips:

len(variable) : len() is a function used to get the number of elements in an array called *variable*

+ - * / are the symbols for addition, subtraction, multiplication, and division

eod_times is a variable that contains the list of EOD times

In the code cell below, write code that would calculate the average EOD rate in your recording. Store the result as a variable called average_rate by replacing ... with your equation.

average_rate = ...

#@title {display-mode:"form"}

#@markdown Run this code cell to print the average rate calculated using your equation. 
print(f'The average EOD rate is {average_rate}')

Subsampling (bootstrapping) the average rate¶

In order to do statistics and compare EOD rates between different conditions or groups (for example, among different species of fish), we need more than just one estimate of the EOD rate.

Note that this week you will not explicitly do this statistical comparison. You will just apply this analysis to your own group’s fish.

Bootstrapping is an analytic technique used to subsample your data.

In this case, we will calculate a set of EOD rate averages by subsampling the data. Then, we can get the mean and standard deviation of the average EOD rate across those samples/estimates. A subsample of the data is a smaller continuous section of the data.

The script (hidden) in the code cell below randomly selects chunks of time (duration) from throughout the total recording. It repeats this random selection process N times (to provide N subsamples of the data).

Running the code cell below generates an interactive plot in which you can control the number of subsamples taken from the data and the duration of each subsample. You will see a plot of the average rate of each subsample (each black point in the scatterplot), and the distribution (quantiles) of the set of subsamples in boxplot format.

#@title {display-mode: "form"}

#@markdown Run this code cell to enable the interactive analysis.
#@markdown Explore what changing these parameters does to your result.

slider_duration = widgets.FloatSlider(
    min=0,
    max=np.min([15,data_dur/2]),
    value=1,
    step= 0.001,
    readout=True,
    continuous_update=False,
    description='sample duration (seconds)',
    style = {'description_width': '200px'})
slider_duration.layout.width = '600px'

slider_N = widgets.IntSlider(
    min=0,
    max=100,
    value=10,
    step= 1,
    readout=True,
    continuous_update=False,
    description='number of subsamples (N)',
    style = {'description_width': '200px'})
slider_N.layout.width = '600px'

# a function that will modify the xaxis range
def update_plot(duration,N):
    
    rate_ = []
    for i in range(N):
        t = random.uniform(np.min(eod_times)+duration,np.max(eod_times)-duration)
        rate_.append(sum((eod_times>t) & (eod_times<t+duration))/duration)

    fig,ax = plt.subplots(figsize=(3,5),num=1);
    sns.boxplot(y=rate_, color = 'grey',ax = ax);
    sns.stripplot(y = rate_, color = 'black',size=10,ax = ax);
    ax.set_ylabel('rate (eod/sec)')#,fontsize=14);

w = interact(update_plot, duration=slider_duration, N=slider_N);

As you explore the plot, think about how duration and N effect the results of your analysis:

If you increase/decrease the duration of the subsampled data, does the variance of the estimated rate increase/decrease? Is there an ‘asymptote’ to the change in variance? How does that asymptote influence your choice of the duration parameter in analyzing your data?
If you decrease/increase N, does the distribution of the estimate change? How?

Part III. ISI¶

The time between events is called the inter-event interval. Since events in neurons are called spikes the metric is called an inter-SPIKE interval (ISI). In electric fish, the metric is also called the IPI (inter-pulse interval), which refers to each EOD event as a pulse.

How would you calculate the ISI from your recording?
Again, first think about it in terms of the equation you would use. Then, use the following tips to implement that equation.

Tips:

np.diff(variable) : diff() is a numpy module that calculates the numerical difference between each element in a list named variable and returns the result as a list

eod_times is a variable that contains the list of EOD times

In the code cell below, write a script that performs this calculation. Store the result as a variable called isi

isi = ...

#@title {display-mode: "form"}

#@markdown Running this code cell uses your equation to make a scatter plot of ISI at each EOD time.
#@markdown The average isi across the recording will also be printed. Use the time range slider to look at more/less of your data at once. 

print(f'Average isi is {np.mean(isi):0.2f}.')

# plot the isi at each EOD time.


e_time = eod_times[1:]

slider_xrange = widgets.FloatRangeSlider(
    min=0,
    max=data_dur,
    value=(0,1),
    step= 0.1,
    readout=True,
    continuous_update=False,
    description='Time Range (s)',
    style = {'description_width': '200px'})
slider_xrange.layout.width = '600px'

# a function that will modify the xaxis range
def update_plot(xrange): #,yrange):
    if np.diff(xrange)>0:
        fig, ax = plt.subplots(figsize=(10,5),num=1); #specify figure number so that it does not keep creating new ones

        starti = int(xrange[0]*fs)+1
        stopi = int(xrange[1]*fs)-1
        
        mask = ((e_time>xrange[0])&(e_time<xrange[1]))
    
        ax.scatter(e_time[mask], isi[mask], color='black')
        ax.set_ylabel('ISI (seconds)')
        ax.set_xlabel('Time of each EOD (seconds)')

w_ = interact(update_plot, xrange=slider_xrange)#, yrange=slider_yrange);

How does the isi relate to the rate? For example, think about how you could calculate instantaneous rate (as opposed to the average rate across some window of time)?

Would you consider the isi relatively variable or constant? Why?

We can quantify how variable/constant something is by looking at the distribution of its values.

#@title {display-mode:"form"}

#@markdown Run this code cell to plot the isi distribution for your recording (in boxplot format) 

plt.figure(figsize=(3,5));
sns.boxplot(y=isi, color = 'grey');
sns.stripplot(y = isi, color = 'black',size=10);
plt.ylabel('isi',fontsize=14);
plt.yticks(fontsize=14);

Part IV. Convolution¶

Sometimes we want a “smooth” estimate of the eod rate/isi over time.

By convolving a waveform with a time series, each event is transformed into a waveform. When all of these event waveforms are added together, you get a continuous signal instead of a discrete time series. This transformation is sometimes called “smoothing” and is required before some calculations can be made (such as correlation analysis).

In this instance, we are using a Gaussian filter. Therefore, the smoothness of the signal is controlled by a parameter called sigma that you can interactively control with a slider after running the code cell below.

Details of the function used to accomplish this processing step can be found at scipy.ndimage.gaussian_filter. For more general information on what a Gaussian is, you can consult the wikipedia page on the Gaussian distribution. In this case, the sigma parameter of the Gaussian filter is equivalent to the standard deviation of a Gaussian distribution.

#@title {display-mode: "form"}

#@markdown Run this code cell to plot the smoothed signal from discrete EOD times.  
#@markdown Change the value of ```sigma``` (the standard deviation of the *gaussian kernel* in seconds) to see how that effects the smoothed signal. <br>

slider_sigma = widgets.FloatSlider(
    min=0,
    max=1,
    value=(0.1),
    step= 0.01,
    readout=True,
    continuous_update=False,
    description='sigma',
    style = {'description_width': '200px'})
slider_sigma.layout.width = '600px'

slider = widgets.FloatRangeSlider(
    min=0,
    max=data_dur,
    value=(0,1),
    step= 1,
    readout=True,
    continuous_update=False,
    description='Data Time Range (s)',
    style = {'description_width': '200px'})
slider.layout.width = '600px'

# a function that will modify the xaxis range
def update_plot(x,sigma):
    
    filtered_fs = 1000
    sigma = sigma*filtered_fs

    eod_samps = [int(t*filtered_fs) for t in eod_times]

    filtered_time = np.linspace(0,data_dur,int(data_dur*filtered_fs))
    filtered_y = unit_impulse(len(filtered_time),eod_samps)
    filtered_y = ndimage.gaussian_filter1d(filtered_y,sigma=sigma)*filtered_fs

    fig, ax = plt.subplots(figsize=(10,5),num=1); #specify figure number so that it does not keep creating new ones
    starti = int(x[0]*filtered_fs)
    stopi = int(x[1]*filtered_fs)
    ax.plot(filtered_time[starti:stopi], filtered_y[starti:stopi])
    ax.set_xlabel('msec')
    ax.set_ylabel('a.u.')

w = interact(update_plot, x=slider, sigma=slider_sigma);

Great. That is it for the data processing and analysis tools for this week. You will be implementing these throughout the semester so you will get more practice.

Written by Dr. Krista Perks for courses taught at Wesleyan University.

Optional Extensions¶

When you calculated the EOD rate, you could have done something even more fancy than using the known length of your data recording.

This is a good time to introduce "indexing" lists in python, because eod_times is a list (with the earliest eod time in the first position of the list and the latest eod time in the last position of the list).

The following table shows, by example, how you would get the value at each position in a list (L) by indexing the list. In this example, the values in the list are t0, t1, t2.

Consider the list where L=[t0, t1, t2]

If you type:	Then you will get:	Because...
L[2]	t2	you are asking for an offset of 2 positions (start at zero)
L[-2]	t1	negative offsets count from the right
L[1:]	[t1, t2]	a colon "slices" a list, which means it returns a section of the list (in this case, from position 1 until the end)

Finally, if you want to print the entire contents of a list to the output of a code cell, use the command print(list) where “list” can be the name of any list (in this case eod_times)

Lab Manual

Responses

Neurophysiology Lab