Richardson Lab Data Science Log
Message ID: 10     Entry time: Mon May 27 13:24:03 2024
Author: Rutuja Gurav 
Type: Infrastructure 
Category: Scripts/Programs 
Subject: Large data collection scripts just die without error 
I can't figure out why my standard data collection scripts die without any error logged in the console dump. I only get the progress bar printed by NDS, and it stops in the middle of the data download. The ETA is also unexpectedly high. I was able to download these same channels for O3b without trouble, but the O3a download keeps failing. I could also download the ISI-GND_STS BLRMS channels and the PEM WIND channels for both O3a and O3b in reasonable time (<10 hrs).

Example console dump from a data collection run for the ACC channels:

PROJECT_DIR: /home/rutuja/ligo_seismic_state_characterization
Config file loaded
Run config:
{'channels_list_path': 'data/channels_lists/L1/pem_acc_channels.txt',
'data_agg': 'rms',
'data_download_dir': 'data/download/L1',
'data_trend': 's-trend',
'end_time': '2019-08-01T00:00:00',
'gaps_pad': 'nan',
'ifo': 'L1',
'start_time': '2019-07-01T00:00:00'}
Period: 2019-07-01T00:00:00 to 2019-08-01T00:00:00
pem_acc_channels channels list loaded
61 channels to be used
Attempting to access data from frames...
unknown datafind configuration, cannot discover data
Failed to access data from frames, trying NDS...
Opening new connection to nds.ligo.caltech.edu... connected
[nds.ligo.caltech.edu] set ALLOW_DATA_ON_TAPE='True'
Checking channels list against NDS2 database... done
Querying for data availability... done
Found 1 viable segments of data with 100.00% coverage

Downloading data: | | 0/2678400.0 ( 0%) ETA ?
Downloading data: | | 0/2678400.0 ( 0%) ETA ?
Downloading data: |▌ | 137519.0/2678400.0 ( 5%) ETA 77:29:58
Downloading data: |█ | 275038.0/2678400.0 ( 10%) ETA 49:49:41
Downloading data: |█ | 275038.0/2678400.0 ( 10%) ETA 49:49:41

It hangs here and then the script gets killed, I think. I have restarted this particular run many times.
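One thing that might limit the damage from a hang is requesting the month in smaller windows, so that a stalled request costs one window rather than the whole run. Below is a minimal sketch of the splitting step only; `split_into_windows` is a hypothetical helper, not part of the existing scripts, and the one-day window size is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def split_into_windows(start, end, window=timedelta(days=1)):
    """Yield (window_start, window_end) pairs that cover [start, end) without overlap.

    Hypothetical helper for illustration; the last window is shortened
    if the period is not a whole multiple of the window size.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        yield cursor, upper
        cursor = upper


# Same period as the run config in the dump above.
start = datetime.fromisoformat("2019-07-01T00:00:00")
end = datetime.fromisoformat("2019-08-01T00:00:00")

windows = list(split_into_windows(start, end))
total_seconds = sum((b - a).total_seconds() for a, b in windows)
print(len(windows), total_seconds)  # 31 2678400.0
```

The total of 2678400 seconds matches the denominator in the progress bar, so the 31 daily windows cover exactly the span of the original request. Each window could then be downloaded and saved separately, which would also allow a restart to skip windows that already finished.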

On other occasions, the script does exit gracefully, but only because it hits a peculiar error, and only partial data is downloaded:
PROJECT_DIR: /home/rutuja/ligo_seismic_state_characterization
Config file loaded
Run config:
{'channels_list_path': 'data/channels_lists/L1/pem_acc_channels.txt',
'data_agg': 'rms',
'data_download_dir': 'data/download/L1',
'data_trend': 's-trend',
'end_time': '2019-10-01T00:00:00',
'gaps_pad': 'nan',
'ifo': 'L1',
'start_time': '2019-09-01T00:00:00'}
Period: 2019-09-01T00:00:00 to 2019-10-01T00:00:00
pem_acc_channels channels list loaded
61 channels to be used
Opening new connection to nds.ligo.caltech.edu... connected
[nds.ligo.caltech.edu] set ALLOW_DATA_ON_TAPE='True'
Checking channels list against NDS2 database... done
Querying for data availability... done
Found 1 viable segments of data with 100.00% coverage
Downloading data: | | 0/2592000.0 ( 0%) ETA ?
Downloading data: | | 0/2592000.0 ( 0%) ETA ? read_server_response: Wrong length read (0)
Downloading data: | | 0/2592000.0 ( 0%) ETA ?
Elapsed time: 5:15:47.576076
Data spans 2019-09-01 00:00:00 to 2019-10-01 00:00:00
Saving data to: /home/rutuja/ligo_seismic_state_characterization/data/download/L1/pem_acc_channels/agg_rms--trend_s-trend--gaps_nan/2019-09-01T00:00:00--2019-10-01T00:00:00.hdf5
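Since the run config pads gaps with NaN (`'gaps_pad': 'nan'`), one way to gauge how partial a saved download is would be to count the fraction of samples that are not NaN. The sketch below assumes that missing samples really do show up as NaN in the saved series, which I have not verified; `coverage_fraction` is a hypothetical helper and the example values are made up for illustration, not taken from the file.

```python
import math


def coverage_fraction(samples):
    """Return the fraction of samples that are real values rather than NaN padding.

    Hypothetical helper; returns 0.0 for an empty series.
    """
    samples = list(samples)
    if not samples:
        return 0.0
    valid = sum(1 for value in samples if not math.isnan(value))
    return valid / len(samples)


# Made-up series: two real values followed by two NaN-padded gaps.
example = [1.0, 2.0, float("nan"), float("nan")]
print(coverage_fraction(example))  # 0.5
```

Running this per channel on the saved series would show whether the run that printed "Wrong length read (0)" wrote mostly padding.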


On chat.ligo.org, someone said there was nothing I could do about the read_server_response: Wrong length read (0) error...
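Even if the error itself cannot be fixed on the client side, retrying the failed request after a pause might be worth a try. The following is only a sketch under assumptions: `fetch` stands for whatever function downloads one time window (hypothetical, not an NDS or GWpy call), and the retry only helps if the failure surfaces as a Python exception, which the second dump above suggests may not always be the case.

```python
import time


def fetch_with_retries(fetch, start, end, max_attempts=3, base_delay=1.0,
                       sleep=time.sleep):
    """Call fetch(start, end), retrying with exponential backoff on any exception.

    `fetch` is a hypothetical caller-supplied download function. Catching
    Exception is deliberately broad here because the exact error type raised
    by the download is an unknown.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(start, end)
        except Exception as err:
            last_error = err
            if attempt < max_attempts:
                # Wait base_delay, then 2x, 4x, ... before the next attempt.
                sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"window {start} to {end} failed after {max_attempts} attempts"
    ) from last_error


# Stand-in for a flaky download: fails twice, then succeeds.
attempts = []


def flaky_fetch(start, end):
    attempts.append((start, end))
    if len(attempts) < 3:
        raise RuntimeError("read_server_response: Wrong length read (0)")
    return "data"


result = fetch_with_retries(flaky_fetch, 0, 86400, sleep=lambda seconds: None)
print(result, len(attempts))  # data 3
```

The `sleep` parameter is injectable only so the example runs without waiting; in a real run the default `time.sleep` would apply the backoff.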